=========================
AI Data Generation
=========================

:task:`Task ID`

The system provides a tool that automatically populates :term:`audit forms <form>` with realistic, context-aware dummy data generated by a large language model (LLM). It is primarily used for testing dashboards, demonstrating features to clients, and stress-testing form logic without manual data entry.

Usage
=====

This feature is accessible via the :term:`Django Admin`.

1. Navigate to the **Audit Forms** change list: :url:`/meg-admin/megforms/auditform/`.
2. Select **exactly one** audit form.
3. From the "Action" dropdown, select **Generate form data with AI**.
4. Click **Go**.

A progress bar appears while the background task runs. When the task completes, a success message shows the number of observations created.

.. note::
    **Single Form Only:** This action cannot be run on multiple forms at once. If more than one form is selected, the system displays an error message instead, which prevents overloading the AI service.

How it Works
============

Unlike standard "lorem ipsum" generators, this tool reads the form's own schema, so the generated answers are meaningful for each field rather than placeholder text.

1. **Schema Export:** The tool exports a simplified JSON representation of the form (including fields, widgets, and subforms) using :func:`~audit_builder.export.export_schema.export_audit_form`.
2. **Context Construction:** It constructs a prompt containing the form structure, valid :term:`wards <ward>`, and the current date/time.
3. **AI Generation:** The LLM generates a JSON list of :term:`observations <observation>`.
4. **Validation:** The raw output is sanitized and validated against the form definition before being saved to the database.
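
The prompt-construction and sanitization steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the code in the module referenced below: the function names ``build_prompt`` and ``parse_llm_output``, the prompt layout, and the idea that sanitization means stripping Markdown code fences the model may wrap around its JSON are all hypothetical.

.. code-block:: python

    import json
    import re
    from datetime import datetime


    def build_prompt(schema: dict, wards: list[str], now: datetime) -> str:
        """Hypothetical: combine form schema, wards and current time into one prompt."""
        return "\n\n".join([
            "Form schema (JSON):",
            json.dumps(schema, indent=2, sort_keys=True),
            "Valid wards: " + ", ".join(wards),
            "Current date/time: " + now.isoformat(timespec="minutes"),
            "Return a JSON list of observations.",
        ])


    # Assumed sanitization step: models often wrap JSON in ```json ... ``` fences.
    _FENCE = re.compile(r"^```(?:json)?\s*|\s*```$")


    def parse_llm_output(raw: str) -> list[dict]:
        """Strip optional Markdown fences, parse JSON and keep only object items."""
        cleaned = _FENCE.sub("", raw.strip())
        data = json.loads(cleaned)
        if not isinstance(data, list):
            raise ValueError("expected a JSON list of observations")
        # Non-object entries cannot become observations, so drop them here.
        return [item for item in data if isinstance(item, dict)]

Each item returned by ``parse_llm_output`` would then go through the field-level validation described in the next section.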

.. seealso::
    The core logic resides in :mod:`meg_forms.auto_data_generation`.

Data Validation & Consistency
=============================

LLMs can "hallucinate", producing output that looks plausible but does not match the form definition, so strict validation layers are in place to ensure database integrity.

Pydantic Validation
-------------------
The system uses `Pydantic <https://docs.pydantic.dev/>`_ models to strictly type-check the incoming data.

* **Choice Fields:** If the AI generates an answer that is not among the valid choices for a Select/Radio widget, the system attempts to substitute a random valid choice instead of failing.
* **Required Fields:** If a required field is missing or invalid, the observation may be skipped entirely.
* **Dates:** Dates are distributed over the last 3 months to provide trend data for line charts.
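
The rules above can be illustrated without dependencies. The real system uses Pydantic models; this sketch deliberately swaps them for plain functions so it runs on the standard library alone. The field-definition shape (``name``, ``required``, ``choices``), the function names, and the reading of "the last 3 months" as a uniform 90-day window are assumptions.

.. code-block:: python

    import random
    from datetime import datetime, timedelta
    from typing import Optional


    def coerce_choice(value, choices: list, rng: random.Random):
        """Keep a valid choice; otherwise substitute a random valid one."""
        return value if value in choices else rng.choice(choices)


    def clean_observation(raw: dict, fields: list[dict], rng: random.Random) -> Optional[dict]:
        """Return cleaned answers, or None if a required field is missing."""
        answers = {}
        for field in fields:
            name = field["name"]
            value = raw.get(name)
            if value is None:
                if field.get("required"):
                    return None  # skip the whole observation
                continue  # optional field left unanswered
            if field.get("choices"):
                value = coerce_choice(value, field["choices"], rng)
            answers[name] = value
        return answers


    def random_recent_datetime(now: datetime, rng: random.Random, days: int = 90) -> datetime:
        """Spread observation dates uniformly over the last `days` days."""
        return now - timedelta(seconds=rng.uniform(0, days * 24 * 3600))

Passing a seeded ``random.Random`` makes the substitutions reproducible, which helps when comparing dashboards across test runs.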

Standard vs. Subforms
---------------------

The generator automatically detects whether the form has subforms and shapes the output accordingly:

* **Standard Forms:** Generates a single :class:`~audit_builder.models.CustomObservation` with answers in ``custom_answers``.
* **Forms with Subforms:** Generates a parent observation and multiple linked :class:`~audit_builder.models.CustomSubObservationInstance` records.
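
The branching between standard forms and forms with subforms can be sketched with plain dictionaries standing in for the Django models. The schema key ``subforms``, the keys of each generated item, and the function name ``shape_records`` are hypothetical; the real code creates :class:`~audit_builder.models.CustomObservation` and :class:`~audit_builder.models.CustomSubObservationInstance` instances instead.

.. code-block:: python

    def shape_records(schema: dict, item: dict) -> dict:
        """Map one generated item to a parent record plus optional sub-records.

        Hypothetical shapes: a standard form yields only a parent with
        ``custom_answers``; a form with subforms also yields one child
        record per generated subform entry.
        """
        parent = {"custom_answers": item.get("answers", {})}
        if not schema.get("subforms"):
            return {"observation": parent, "sub_observations": []}

        subform_names = {sub["name"] for sub in schema["subforms"]}
        children = [
            {"subform": entry["subform"], "answers": entry.get("answers", {})}
            for entry in item.get("subforms", [])
            if entry.get("subform") in subform_names  # drop unknown subforms
        ]
        return {"observation": parent, "sub_observations": children}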

.. warning::
    **Compliance Logic:** The AI attempts to vary data based on the ``compliant_value`` and ``ignored_value`` definitions in the form. However, it does not run the compliance calculation itself; that is done by the standard compliance calculation task after the data is saved.

Configuration
=============

The behavior of the generator is driven by a strict system prompt defined in :const:`~auto_data_generation.constants.AUTO_DATA_GEN_PROMPT`.

Key generation rules include:

* **Subject Profile:** The AI assigns a "clinical profile" (e.g., "High Risk Patient") to each observation to ensure answers within that submission are semantically consistent with each other.
* **Wards:** It distributes observations across the wards available to the :term:`auditor` running the task.
* **Permissions:** The data is created under the user account of the admin running the action.
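
A minimal sketch of how observations could be planned so that each one carries a ward and a single clinical profile. Cycling through the wards in round-robin order and picking profiles at random are assumptions made for illustration; in the real system these rules are expressed in the system prompt and followed by the LLM. Apart from "High Risk Patient", the profile names are invented examples.

.. code-block:: python

    import random
    from itertools import cycle

    # Hypothetical profile labels; the real prompt defines its own.
    PROFILES = ["High Risk Patient", "Low Risk Patient", "Post-operative Patient"]


    def plan_observations(count: int, wards: list[str], rng: random.Random) -> list[dict]:
        """Assign each planned observation a ward (round-robin) and one clinical profile."""
        if not wards:
            raise ValueError("the user running the action has no wards available")
        ward_iter = cycle(wards)
        return [
            {"ward": next(ward_iter), "profile": rng.choice(PROFILES)}
            for _ in range(count)
        ]

Fixing one profile per observation, rather than per answer, is what keeps the answers within a single submission consistent with each other.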