AI Data Generation

Task #Task ID

The system provides a tool that uses a large language model (LLM) to populate audit forms automatically with realistic, context-aware dummy data. It is primarily used for testing dashboards, demonstrating features to clients, and stress-testing form logic without manual data entry.

Usage

This feature is accessible via the Django Admin.

  1. Navigate to the Audit Forms change list: /meg-admin/megforms/auditform/.

  2. Select exactly one audit form.

  3. From the “Action” dropdown, select Generate form data with AI.

  4. Click Go.

A progress bar appears and tracks the background task. Once the task completes, a success message shows the number of observations created.

Note

Single Form Only: This action cannot be run on multiple forms simultaneously. If you select multiple forms, the system will return an error message to prevent overloading the AI service.
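
The single-selection rule can be illustrated with a small, framework-free check. This is only a sketch: check_single_form_selected and its message text are hypothetical and are not taken from the project's admin code.

```python
from typing import Optional, Sequence


def check_single_form_selected(selected_form_ids: Sequence[int]) -> Optional[str]:
    """Return an error message unless exactly one form is selected.

    Hypothetical helper: in a Django admin action the same check would
    typically be made on the action's queryset, with the error reported
    to the user through ModelAdmin.message_user().
    """
    if len(selected_form_ids) != 1:
        return (
            "Select exactly one audit form; "
            f"{len(selected_form_ids)} were selected."
        )
    return None
```

Returning a message instead of raising keeps the check easy to call from an admin action, which can show the message and stop before any background task is queued.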

How it Works

Unlike standard “lorem ipsum” generators, this tool uses the form’s own schema to generate answers that make sense for each field.

  1. Schema Export: The tool exports a simplified JSON representation of the form (including fields, widgets, and subforms) using export_audit_form().

  2. Context Construction: It constructs a prompt containing the form structure, valid wards, and the current date/time.

  3. AI Generation: The LLM generates a JSON list of observations.

  4. Validation: The raw output is sanitized and validated against the form definition before being saved to the database.
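
The context-construction and parsing steps above might look roughly like the following. This is a minimal sketch under stated assumptions: build_prompt, parse_observations, and the prompt wording are hypothetical, and the schema dictionary merely stands in for whatever export_audit_form() actually returns.

```python
import json
from datetime import datetime
from typing import Any


def build_prompt(form_schema: dict[str, Any], wards: list[str], now: datetime) -> str:
    """Assemble the prompt context: form structure, valid wards, current date/time."""
    return "\n".join([
        "Form schema (JSON):",
        json.dumps(form_schema, sort_keys=True),
        "Valid wards: " + ", ".join(wards),
        "Current date/time: " + now.isoformat(),
    ])


def parse_observations(raw_output: str) -> list[dict[str, Any]]:
    """Parse the model's reply, accepting only a JSON list of objects."""
    try:
        data = json.loads(raw_output)
    except json.JSONDecodeError:
        return []  # Not JSON at all: nothing usable.
    if not isinstance(data, list):
        return []  # The reply must be a list of observations.
    # Drop any list items that are not JSON objects.
    return [item for item in data if isinstance(item, dict)]


# Example with a made-up schema and ward names.
schema = {"name": "Hand Hygiene", "fields": [{"name": "gloves_worn", "widget": "radio"}]}
prompt = build_prompt(schema, ["Ward A", "Ward B"], datetime(2024, 1, 15, 9, 30))
observations = parse_observations('[{"gloves_worn": "Yes"}, "stray text"]')
```

Parsing defensively before validation means a malformed reply yields an empty list rather than an exception in the background task.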

See also

The core logic resides in meg_forms.auto_data_generation.

Data Validation & Consistency

Because LLMs can “hallucinate” (generate invalid data), strict validation layers are in place to ensure database integrity.

Pydantic Validation

The system uses Pydantic models to strictly type-check the incoming data.

  • Choice Fields: If the AI generates an answer that is not in the valid choices for a Select/Radio widget, the system will attempt to pick a random valid choice instead of crashing.

  • Required Fields: If a required field is missing or invalid, the observation may be skipped entirely.

  • Dates: Dates are distributed over the last 3 months to provide trend data for line charts.
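
The rules above can be illustrated in plain Python. This sketch does not use Pydantic and is not the project's validation code: the FIELDS layout, sanitize_observation, and random_recent_datetime are hypothetical, and "3 months" is approximated as 90 days.

```python
import random
from datetime import datetime, timedelta
from typing import Any, Optional

# Hypothetical field definitions; the real form schema may look different.
FIELDS = {
    "gloves_worn": {"required": True, "choices": ["Yes", "No", "N/A"]},
    "comment": {"required": False, "choices": None},
}


def sanitize_observation(
    answers: dict[str, Any],
    fields: dict[str, dict[str, Any]],
    rng: random.Random,
) -> Optional[dict[str, Any]]:
    """Return cleaned answers, or None if the observation should be skipped."""
    cleaned: dict[str, Any] = {}
    for name, spec in fields.items():
        value = answers.get(name)
        choices = spec["choices"]
        if choices is not None and value is not None and value not in choices:
            # Invalid choice: fall back to a random valid one.
            value = rng.choice(choices)
        if value is None:
            if spec["required"]:
                return None  # Missing required answer: skip the observation.
            continue
        cleaned[name] = value
    return cleaned


def random_recent_datetime(now: datetime, rng: random.Random, days: int = 90) -> datetime:
    """Pick a timestamp within the last `days` days (roughly 3 months)."""
    return now - timedelta(seconds=rng.uniform(0, days * 24 * 60 * 60))
```

Passing a random.Random instance rather than using the module-level functions makes the fallback reproducible in tests by seeding the generator.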

Standard vs. Subforms

The generator automatically detects the form structure, that is, whether the form is a standard form or one that uses subforms.

Warning

Compliance Logic: The AI attempts to vary data based on the compliant_value and ignored_value definitions in the form. However, it does not execute the actual calculation logic; the standard compliance calculation task does that after the data is saved.

Configuration

The behavior of the generator is driven by a strict system prompt defined in AUTO_DATA_GEN_PROMPT.

Key generation rules include:

  • Subject Profile: The AI assigns a “clinical profile” (e.g., “High Risk Patient”) to each observation to ensure answers within that submission are semantically consistent with each other.

  • Wards: It distributes observations across the wards available to the auditor running the task.

  • Permissions: The data is created under the user account of the admin running the action.