Project

Clinical Data Management Agents

AI program for clinical data management and study setup, designed around multiple focused automation tasks rather than one monolithic agent. The system supports protocol understanding, case report form (CRF) design, historical project mining for reusable templates, generation of edit checks from data validation plans (DVPs), targeted supervised fine-tuning (SFT) for difficult rule patterns, preference alignment with Direct Preference Optimization (DPO), and expert review of structured study-build artifacts.

Agentic workflows, LLM orchestration, SFT, DPO, Clustering

Overview

Clinical data management and study setup require teams to interpret protocols, identify visits and assessments, design CRFs, write edit checks, and maintain consistency across study-build artifacts. The work is difficult because protocol language is dense, requirements are distributed across sections, and each build decision has downstream consequences.

This project approaches clinical data management automation as a portfolio of focused AI tasks rather than a single end-to-end agent. Different parts of the system support protocol understanding, CRF design, historical project mining, DVP-to-edit-check generation, targeted model training, preference alignment, validation, and expert review.

The design goal is to make clinical data management work faster and more consistent while keeping the workflow controllable: models draft structured outputs, specialized components handle narrow tasks, and domain experts can inspect or correct important decisions before production use.

AI Components

Protocol and study understanding
LLM workflows analyze protocol text and study setup context to identify visits, assessments, forms, procedures, timing constraints, and data collection requirements.
CRF build support
Study-build assistance turns interpreted requirements into reviewable CRF design outputs, including form suggestions, field-level design support, visit-form organization, and data collection structure.
Historical template mining
A smaller traditional-ML branch helps convert historical study builds into reusable template and library assets through clustering, matching, normalization, and deduplication.
DVP-to-edit-check SFT
The SFT stage teaches the model to translate DVP requirements into structured edit-check outputs, including formulas, variable bindings, actions, query messages, and stable JSON-style schemas.
Specialized rule reinforcement
Targeted SFT slices strengthen difficult clinical rule patterns, such as multi-timepoint logic, pharmacokinetic/pharmacodynamic (PK/PD) scenarios, cross-visit checks, and complex conditional combinations, that need more focused examples than the base training set provides.
DPO and expert review
DPO aligns generated edit checks with business review preferences after SFT has established the base capability, while validation checkpoints and human review help catch mismatches before production use.
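
As an illustration of what a validation checkpoint on structured edit-check output might look like, here is a minimal sketch in Python. The field names (`check_id`, `formula`, `variables`, `action`, `query_message`), the action vocabulary, and the `validate_edit_check` helper are all hypothetical; the description above names the kinds of content an edit check carries but not the actual schema.

```python
import json

# Hypothetical field names and types: the project description lists the kinds
# of content (formula, variable bindings, action, query message), not a schema.
REQUIRED_FIELDS = {
    "check_id": str,
    "formula": str,
    "variables": dict,
    "action": str,
    "query_message": str,
}
ALLOWED_ACTIONS = {"open_query", "flag_for_review"}  # assumed action vocabulary


def validate_edit_check(raw_json):
    """Return a list of problems found in one generated edit-check draft."""
    try:
        check = json.loads(raw_json)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc.msg}"]
    if not isinstance(check, dict):
        return ["top-level value must be a JSON object"]

    problems = []
    for name, expected_type in REQUIRED_FIELDS.items():
        if name not in check:
            problems.append(f"missing field: {name}")
        elif not isinstance(check[name], expected_type):
            problems.append(f"wrong type for field: {name}")

    action = check.get("action")
    if isinstance(action, str) and action not in ALLOWED_ACTIONS:
        problems.append(f"unknown action: {action}")

    # Every bound variable should actually appear in the formula text.
    formula = check.get("formula")
    variables = check.get("variables")
    if isinstance(formula, str) and isinstance(variables, dict):
        for var in variables:
            if var not in formula:
                problems.append(f"unused variable binding: {var}")
    return problems


# Example draft with made-up variable names and dataset paths.
draft = json.dumps({
    "check_id": "EC-001",
    "formula": "SYSBP > DIABP",
    "variables": {"SYSBP": "VS.SYSBP", "DIABP": "VS.DIABP"},
    "action": "open_query",
    "query_message": "Systolic pressure should exceed diastolic pressure.",
})
print(validate_edit_check(draft))
```

A draft that returns an empty problem list would move on to human review; anything else could be regenerated or routed to an expert with the problems attached. Structural checks like these do not judge clinical correctness, which is why the review step remains.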

Design Highlights

Multi-task framing that separates protocol understanding, CRF build support, template mining, edit-check generation, preference alignment, and review.
SFT strategy that first establishes core DVP-to-edit-check generation ability, then adds specialized training slices for harder clinical logic patterns.
DPO stage focused on aligning outputs with business review preferences rather than relearning the entire edit-check task.
Traditional ML branch for turning historical study artifacts into reusable templates and libraries that strengthen future study builds.
Validation and human-review checkpoints that make generated CRF and edit-check artifacts inspectable before production use.
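
DPO trains on triples of a prompt, a preferred response, and a rejected response. One plausible way to obtain such triples from the review loop is sketched below in Python. This is an assumption about data preparation, not a description of the actual pipeline: the `ReviewedDraft` record, its fields, and `build_preference_pairs` are hypothetical names introduced for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class ReviewedDraft:
    """One generated edit check plus a reviewer verdict (hypothetical record)."""
    dvp_requirement: str  # prompt side: the DVP rule text
    edit_check: str       # serialized model output
    approved: bool        # reviewer decision


def build_preference_pairs(reviews):
    """Pair each approved draft with each rejected draft for the same requirement."""
    by_prompt = defaultdict(lambda: {"chosen": [], "rejected": []})
    for review in reviews:
        bucket = "chosen" if review.approved else "rejected"
        by_prompt[review.dvp_requirement][bucket].append(review.edit_check)

    pairs = []
    for prompt, group in by_prompt.items():
        for chosen in group["chosen"]:
            for rejected in group["rejected"]:
                pairs.append(
                    {"prompt": prompt, "chosen": chosen, "rejected": rejected}
                )
    return pairs


# Made-up review records for illustration only.
reviews = [
    ReviewedDraft("Visit date must not precede consent date.",
                  '{"formula": "VISITDT >= CONSENTDT"}', True),
    ReviewedDraft("Visit date must not precede consent date.",
                  '{"formula": "VISITDT > CONSENTDT"}', False),
    ReviewedDraft("Weight must be recorded at screening.",
                  '{"formula": "WEIGHT != null"}', True),
]
pairs = build_preference_pairs(reviews)
print(len(pairs))
```

Requirements that have only approved or only rejected drafts produce no pair, since a preference needs both sides. Keeping the prompt identical within a pair means the training signal reflects reviewer preference between two outputs rather than a difference in the task.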

My Role

Led product framing and system design for the clinical data management AI program.
Defined how clinical data management automation should be decomposed into focused AI subtasks rather than a single monolithic workflow.
Worked with engineering and domain experts to translate protocol interpretation, CRF design, DVP rules, and edit-check requirements into reusable workflow components.
Shaped the SFT and DPO strategy for DVP-to-edit-check generation, including the role of specialized datasets for complex rule patterns.
Designed validation checkpoints and human-review loops for study-build and edit-check artifacts.

Impact

Improved consistency in protocol interpretation, CRF design support, and edit-check generation workflows.
Reduced manual effort in repetitive study setup tasks while preserving expert control over clinical interpretation and build decisions.
Created a path for historical study builds to become reusable template and library assets instead of one-off project knowledge.
Made DVP-to-edit-check generation more controllable by separating base generation capability, specialized rule coverage, and preference alignment.
Enabled reusable automation patterns across studies, rule types, and study-build artifacts.