Asthra · Regulatory AI Governance
Where Inference Enters the Record
A sophisticated regulator is not asking whether you used AI. The question is where inference entered your document, and who owns it. Here is how we build for that question.
The FDA's concern about AI in regulatory documents is often misread as a concern about AI itself. It is not. On a close reading of the January 2025 draft guidance, the worry is specific: not that the technology is used, but that a qualified person stops thinking because the output sounds authoritative and complete. The guidance puts it plainly, warning that AI can present multiple supporting arguments that lead to excessive confidence and diminish the due diligence a reviewer would otherwise apply. AI is most dangerous, in other words, when it is persuasive, because it can talk a competent reviewer out of the scrutiny they were hired to perform.
This is no longer a thought experiment. In its first warning letter to address AI as a compliance tool, the FDA made the stakes explicit: a company remains fully responsible for AI-generated output, including its errors and omissions, and reliance on AI is not a defense against a regulatory violation. The agency expects personnel to exercise their own judgment rather than defer to what the model produced. The EMA's 2024 reflection paper arrives at the same place from a risk-based, human-centric direction. Both regulators are watching for one failure above all others: a person who signed off on a decision the machine actually made.
The better question to ask of a sentence
Governing this well starts with sorting sentences not by whether AI touched them, but by what they do. A sentence in a results section is either reporting what the data show, which is safe and checkable, or drawing a conclusion about what the data mean. Drawing the conclusion is the valuable part, because it is the clinical judgment a sponsor is paying for, but it has to be owned by a named person who can defend the leap. Trouble starts when a conclusion is dressed to look like plain reporting, so no one at the table registers that a judgment was just made.
Sorting sentences this way matters because it exposes the two things that actually set the risk: how much the model influences a decision about safety, efficacy, or quality, and how bad the outcome is if the model is wrong and the error is not caught. Influence multiplied by consequence sets the inherent risk, and the quality of the human review sets what remains. A construction that amounts to a minor edit in an efficacy paragraph can become a patient-safety problem when it appears in a hepatic-safety narrative. What a capable reviewer evaluates is the mode of use, not whether AI was present.
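The arithmetic in the paragraph above can be made concrete with a minimal sketch. The three-step scales, the names `ModelUse`, `inherent_risk`, and `residual_risk`, and the idea of expressing review quality as a fraction are all assumptions made for illustration; neither the guidance nor this paper assigns numbers.

```python
from dataclasses import dataclass

# Hypothetical ordinal scales. The text names the two factors but gives no numbers.
LEVELS = {"low": 1, "medium": 2, "high": 3}


@dataclass(frozen=True)
class ModelUse:
    description: str
    influence: str    # how much the model shapes the decision
    consequence: str  # how bad an uncaught error would be


def inherent_risk(use: ModelUse) -> int:
    """Influence multiplied by consequence on the assumed 1-3 scales (range 1-9)."""
    return LEVELS[use.influence] * LEVELS[use.consequence]


def residual_risk(use: ModelUse, review_effectiveness: float) -> float:
    """Risk left after human review; review_effectiveness is an assumed 0.0-1.0 fraction."""
    if not 0.0 <= review_effectiveness <= 1.0:
        raise ValueError("review_effectiveness must be between 0.0 and 1.0")
    return inherent_risk(use) * (1.0 - review_effectiveness)


# Synthetic, illustrative cases: the same construction in two different contexts.
efficacy_edit = ModelUse("wording edit in an efficacy paragraph", "medium", "low")
hepatic_claim = ModelUse("same construction in a hepatic-safety narrative", "medium", "high")

print(inherent_risk(efficacy_edit))        # 2
print(inherent_risk(hepatic_claim))        # 6
print(residual_risk(hepatic_claim, 0.5))   # 3.0
```

The model's influence is identical in both cases; only the consequence changes, and the inherent risk triples with it.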
Five ways AI shows up in a results sentence
We found it useful to give the patterns plain names. The five below are the ones that come up in practice.
The calculator. A validated program runs the pre-specified analysis and produces the same number from the same data on every run, so the output is the answer rather than an interpretation. The only thing to confirm is the dull but required thing: that the program was validated and matches the statistical analysis plan.
The checklist-walker. The model steps through fixed, pre-set criteria where every branch resolves to a defined rule, such as reportability logic or a testing hierarchy. The conclusion traces back to the rule, not to the model's opinion, so it is defensible because it is traceable.
The devil's advocate. The analysis is already done, and the model only hands the reviewer a sharper set of questions, such as whether the other causes have been ruled out or whether one site could be driving the result. Because it cannot change the result, only make the review harder to pass, this is the best use of all: it works against automation bias rather than with it.
Drafting and summarizing. The model drafts prose or condenses an existing section. This is fine as a first draft, and fine for a synopsis, but only when the model is explicitly constrained to carry forward conclusions a human already made and forbidden from inventing new ones. The safety net is the writer who edits the draft. The danger is rubber-stamping prose because it reads well.
The smooth talker. The output starts from a real result, then states a conclusion the data do not support, in language that still reads like reporting. Because it borrows the credibility of a true first half and spends it on an oversell, it passes review when it should not, never announcing that a judgment was made. This is the exact pattern the FDA's concern describes.
What matters here is that the same task can be built as any of these, since a sensitivity-analysis sentence can be the calculator, the devil's advocate, or the smooth talker depending only on how it is written. Choosing the mode is the mitigation decision, and it is where the risk is decided and where most tools lose control of it.
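One way to picture the mode as a design decision is a small lookup from mode to what happens to inference. This is a sketch only: the enum and function names are hypothetical, the label `DRAFT_OR_SUMMARY` is a stand-in for the drafting and summarizing pattern, and the mapping simply restates the paper's own description of where inference is eliminated, visible, handed back, or disguised.

```python
from enum import Enum


class Mode(Enum):
    CALCULATOR = "calculator"
    CHECKLIST_WALKER = "checklist-walker"
    DEVILS_ADVOCATE = "devil's advocate"
    DRAFT_OR_SUMMARY = "draft or summary"  # hypothetical label for the drafting pattern
    SMOOTH_TALKER = "smooth talker"


class Inference(Enum):
    ELIMINATED = "eliminated"
    VISIBLE = "visible"
    HANDED_BACK = "handed back to a person"
    DISGUISED = "disguised as derivation"


# What happens to inference under each mode, as the text describes it.
INFERENCE_BY_MODE = {
    Mode.CALCULATOR: Inference.ELIMINATED,
    Mode.CHECKLIST_WALKER: Inference.ELIMINATED,
    Mode.DRAFT_OR_SUMMARY: Inference.VISIBLE,   # visible only once a writer edits it
    Mode.DEVILS_ADVOCATE: Inference.HANDED_BACK,
    Mode.SMOOTH_TALKER: Inference.DISGUISED,
}


def is_acceptable(mode: Mode) -> bool:
    """A mode is acceptable unless it disguises inference as plain reporting."""
    return INFERENCE_BY_MODE[mode] is not Inference.DISGUISED


# The same sensitivity-analysis sentence, built three different ways.
for mode in (Mode.CALCULATOR, Mode.DEVILS_ADVOCATE, Mode.SMOOTH_TALKER):
    print(f"{mode.value}: inference {INFERENCE_BY_MODE[mode].value}, "
          f"acceptable={is_acceptable(mode)}")
```

The task never changes in that loop; only the mode does, and the mode alone decides whether the sentence is defensible.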
What Asthra does: the model performs only selected tasks
Most tools that generate regulatory text put the model first and the constraint second. They write from training data or from uploaded documents with no constraint on what may be asserted, produce fluent prose, and hand it to a writer to correct afterward. The judgment lives at the end of the process, in the review, where it competes against output engineered to sound finished. That is precisely the arrangement the FDA warns about.
Asthra reverses the sequence. The writer specifies, before a word is generated, which sources the model may draw from and which statement types it may not produce. The model cannot go beyond describing those sources. When it reaches a boundary, it stops and flags rather than filling the gap with inference. The model is never put in a position to make a required decision in the first place, because it is restricted to tasks where it either cannot introduce a judgment or where any judgment it introduces is visible and must be confirmed by a person. It may report and compute, walk defined rules, raise questions for the reviewer, and draft or summarize within the limits above. What it may not do is move from real data to an unsupported conclusion in the register of plain reporting. Rather than hoping that pattern gets caught in review, we engineer it out, so that the required judgments about what a result means, whether a finding generalizes, and whether something is safe stay with the sponsor's qualified people by design.
The output that comes back is therefore not a draft to be corrected. It is a record of what the sources support, with the unsupported positions shown as flags rather than buried in confident prose. The writer moves from the end of the process to the front of it.
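The constrain-first sequence described in this section can be sketched as a specification that is checked before anything reaches the record. This is an illustration under stated assumptions, not Asthra's implementation: `GenerationSpec`, `Candidate`, `Record`, and `apply_spec` are hypothetical names, the statement types are invented labels, and the example sentences and source identifiers are synthetic.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class GenerationSpec:
    """Set by the writer before generation (hypothetical structure)."""
    allowed_sources: FrozenSet[str]
    forbidden_statement_types: FrozenSet[str]


@dataclass(frozen=True)
class Candidate:
    text: str
    statement_type: str     # e.g. "report", "compute", "conclusion"
    source: Optional[str]   # None when nothing in the sources supports the text


@dataclass
class Record:
    supported: List[str] = field(default_factory=list)
    flags: List[str] = field(default_factory=list)


def apply_spec(spec: GenerationSpec, candidates: List[Candidate]) -> Record:
    """Keep what the sources support; flag everything else instead of writing it."""
    record = Record()
    for c in candidates:
        if c.statement_type in spec.forbidden_statement_types:
            record.flags.append(f"FLAG (forbidden type '{c.statement_type}'): {c.text}")
        elif c.source not in spec.allowed_sources:
            record.flags.append(f"FLAG (no permitted source): {c.text}")
        else:
            record.supported.append(c.text)
    return record


# Synthetic, illustrative inputs only.
spec = GenerationSpec(
    allowed_sources=frozenset({"safety_table_A"}),
    forbidden_statement_types=frozenset({"conclusion"}),
)
candidates = [
    Candidate("Three participants had the event of interest.", "report", "safety_table_A"),
    Candidate("The product is therefore safe for the liver.", "conclusion", "safety_table_A"),
    Candidate("Similar rates were seen in earlier studies.", "report", None),
]

record = apply_spec(spec, candidates)
print(len(record.supported), "supported,", len(record.flags), "flagged")  # 1 supported, 2 flagged
for flag in record.flags:
    print(flag)
```

The design choice worth noticing is the order of the checks: a forbidden statement type is flagged even when its source is permitted, because a true source does not license an unsupported conclusion.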
How we prove it: we test our own prompts
Restricting the model only means something if the restriction can be shown to hold, so we built an internal control that reviews the instructions driving the product. Every prompt we ship is run through a risk-based review that classifies the kind of output the prompt is constructed to produce and returns a pass or fail against the modes above. A prompt that would let the model draw a disguised conclusion fails and does not go into use. A synopsis or summary prompt passes only if it does two things, checked as separate conditions: it explicitly directs the model to summarize without adding inference, and it points to where the existing human-owned conclusions live while forbidding any new ones. Meeting one condition is not enough.
Two properties make this a control rather than a gesture. It fails closed, so if the review cannot clearly confirm a prompt is safe, the prompt fails and nothing passes by default. And it produces a record, a documented review of every prompt, which is the kind of governance evidence the FDA now expects a company to be able to produce.
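A minimal sketch of a fail-closed review with a record might look like the following. Everything here is an assumption made for illustration: the names `PromptFindings`, `ReviewRecord`, and `review_prompt` are hypothetical, and the sketch takes the reviewer's findings as inputs rather than showing how a prompt would actually be classified. A finding of `None` stands for "could not confirm".

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple


@dataclass(frozen=True)
class PromptFindings:
    """Findings from a review of one prompt (hypothetical structure)."""
    prompt_id: str
    is_summary: bool
    allows_disguised_conclusion: Optional[bool]          # None = could not confirm
    directs_no_added_inference: Optional[bool] = None    # summary condition 1
    points_to_human_conclusions: Optional[bool] = None   # summary condition 2


@dataclass(frozen=True)
class ReviewRecord:
    prompt_id: str
    passed: bool
    reasons: Tuple[str, ...]
    reviewed_at: str


def review_prompt(f: PromptFindings) -> ReviewRecord:
    """Fail closed: anything not positively confirmed safe counts as a failure."""
    reasons = []
    if f.allows_disguised_conclusion is not False:
        reasons.append("not confirmed that disguised conclusions are excluded")
    if f.is_summary:
        # The two summary conditions are checked separately; one is not enough.
        if f.directs_no_added_inference is not True:
            reasons.append("does not direct the model to summarize without adding inference")
        if f.points_to_human_conclusions is not True:
            reasons.append("does not point to existing human-owned conclusions and forbid new ones")
    return ReviewRecord(
        prompt_id=f.prompt_id,
        passed=not reasons,
        reasons=tuple(reasons),
        reviewed_at=datetime.now(timezone.utc).isoformat(),
    )


one_condition = review_prompt(PromptFindings("synopsis-1", True, False, True, False))
both_conditions = review_prompt(PromptFindings("synopsis-2", True, False, True, True))
unconfirmed = review_prompt(PromptFindings("draft-1", False, None))

print(one_condition.passed, both_conditions.passed, unconfirmed.passed)  # False True False
```

Every call returns a `ReviewRecord`, pass or fail, so the review leaves documentation behind whether or not the prompt goes into use.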
This control governs how prompts are built to steer the model. It does not certify the model's actual output, and it does not replace the qualified human who reviews and approves the final document. That person remains the last and accountable check, which is exactly where the FDA wants it. What the system guarantees is narrower and more useful: the model was never set up to make a decision that should have been a person's.
What this means for the next review
The FDA and EMA are not auditing your tools. They are auditing where inference enters the record, and who owns it. That gives a writing team a clear design instruction: build each task so that inference is either eliminated, as in the calculator and the checklist-walker, or visible, as in an edited draft, or deliberately handed back to a person, as in the devil's advocate, rather than disguised as derivation. A platform that enforces this before generation, and keeps a record that it did, makes the regulatory case easier to defend, not harder.
Ingrid Witherell is Asthra's Regulatory Writing Partner. Based in Chicago, she has 25+ years of FDA-submission experience across 100+ filings (including INDs, DSURs, and NDAs), with a focus on safety surveillance and aggregate safety reporting. She partners with sponsor regulatory writing teams on safety-document architecture and reviews every Asthra output before handoff.
This paper reflects the FDA's January 2025 draft guidance on risk-based credibility and human oversight of AI used in regulatory decision-making, the FDA's first warning letter addressing AI as a compliance tool, and the EMA's 2024 risk-based, human-centric reflection paper. All clinical examples referenced in our discussion materials are synthetic and illustrative. This is not legal or regulatory advice; specific submissions should be assessed against the current text of the applicable FDA and EMA guidance and with qualified regulatory counsel.