The Hidden Hours Behind Every Safety Section

Open any CSR safety section, any PSUR adverse-event table, any PMS narrative — the prose is short, the data is huge. Behind a single line that reads "Treatment-emergent adverse events of Grade 3 or higher were reported in 13.4% of pembrolizumab patients versus 35.0% of docetaxel patients" sits a line listing of thousands of rows that someone filtered, grouped, counted, and cross-checked before the sentence could be written.

The same is true for a PSUR sales-exposure table. Behind a row that says "Cumulative patient exposure: 12,847 patient-years" sits a sales dataset that someone pulled, sliced by reporting period, mapped to country exposure factors, and aggregated.

For most regulatory writers, this is the slowest part of the work. Not the writing — the data prep. Hours in Excel filtering pivot tables, hours validating that the SOC mapping is right, hours rebuilding the same table for a different cut. And then the figure for the appendix has to be made too, often in a separate tool. It is the kind of work that AI writing tools have stayed away from — because if the AI gets the numbers wrong, the whole submission is worthless.

Asthra's data analysis agent handles this step inside the chat, and it does so differently from how an LLM-only tool would.

How It Works

The agent reads your request in plain language. "From the AE line listing, give me a Grade 3+ frequency table by SOC, treated vs placebo." It writes a short Python script that does the filtering, grouping, and aggregation. Python runs the script. The output is the table — or a chart, if you asked for one — ready to insert into the draft.

The split matters. The LLM proposes the analysis plan; Python executes it. The LLM is not doing the arithmetic. Python is. That distinction is what makes the output trustworthy.
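
To make the split concrete, here is a minimal sketch of the kind of script the request above might turn into. It is an illustration, not the code Asthra actually generates: it assumes pandas, SDTM-style column names (USUBJID, ARM, AESOC, AETOXGR), and a tiny made-up listing.

```python
import pandas as pd

# Hypothetical AE line listing; the column names are assumed, SDTM-style.
ae = pd.DataFrame({
    "USUBJID": ["001", "001", "002", "003", "004", "005"],
    "ARM":     ["Treated", "Treated", "Treated", "Placebo", "Placebo", "Treated"],
    "AESOC":   ["Blood disorders", "Blood disorders", "Cardiac disorders",
                "Blood disorders", "Cardiac disorders", "Blood disorders"],
    "AETOXGR": [3, 4, 2, 3, 1, 3],
})

def grade3_plus_by_soc(listing: pd.DataFrame) -> pd.DataFrame:
    """Count subjects with at least one Grade 3+ event, by SOC and arm."""
    severe = listing[listing["AETOXGR"] >= 3]          # filter
    return (severe.groupby(["AESOC", "ARM"])["USUBJID"]  # group
            .nunique()                                  # count subjects, not event rows
            .unstack(fill_value=0))                     # one column per arm

table = grade3_plus_by_soc(ae)
print(table)
```

Counting unique subjects rather than rows is a choice the writer would confirm: subject 001 has two Grade 3+ events in the same SOC but contributes once to the cell.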

Every step is captured in the audit trail with the exact code that produced the output. The trail includes the line-listing file the agent read, the filter expression, the group-by columns, the aggregation function, and the result. If a reviewer asks how a particular cell was computed, the answer is a Python script that can be re-run against the same source data.

A typical interaction looks like this. You point the agent at the line listing in your source set, say what you need, and a few seconds later you have a table preview in the chat. You either accept the cut as the agent proposed it, or you ask for variations: "Same table, but only for the safety population. Add a percentage column. Group by preferred term within Blood disorders." The agent revises the analysis, runs it again, and updates the preview. When you are satisfied, one click inserts the table as a numbered figure in the draft, with the citation back to the source file and the analysis transaction.
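
The revision request in that exchange could translate into something like the sketch below. The safety-population flag (SAFFL), the preferred-term column (AEDECOD), and the sample rows are assumptions made for illustration.

```python
import pandas as pd

# Hypothetical subject-level and event-level data; all column names are assumed.
subjects = pd.DataFrame({
    "USUBJID": ["001", "002", "003", "004"],
    "ARM":     ["Treated", "Treated", "Placebo", "Placebo"],
    "SAFFL":   ["Y", "Y", "Y", "N"],   # safety population flag
})
ae = pd.DataFrame({
    "USUBJID": ["001", "001", "002", "003", "004"],
    "AESOC":   ["Blood disorders"] * 4 + ["Cardiac disorders"],
    "AEDECOD": ["Anaemia", "Neutropenia", "Anaemia", "Anaemia", "Palpitations"],
})

# "Only for the safety population": restrict subjects, then keep their events.
safety = subjects[subjects["SAFFL"] == "Y"]
denominators = safety.groupby("ARM")["USUBJID"].nunique()   # N per arm
events = ae.merge(safety, on="USUBJID")                     # inner join drops subject 004

# "Group by preferred term within Blood disorders".
blood = events[events["AESOC"] == "Blood disorders"]
counts = (blood.groupby(["AEDECOD", "ARM"])["USUBJID"]
          .nunique()
          .rename("n")
          .reset_index())

# "Add a percentage column": n divided by the arm's safety-population size.
counts["pct"] = (100 * counts["n"] / counts["ARM"].map(denominators)).round(1)
print(counts)
```

The denominator is computed from the subject-level data, not from the event listing, so arms with few events still get the correct N.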

What It Is Best For

Five categories of work where the agent is meaningfully faster than manual analysis.

Adverse event tables. Frequency tables by SOC, by preferred term, by severity grade, by serious / non-serious, by relatedness. Sub-population cuts (safety population, ITT, per-protocol). Treatment-emergent vs all events. Onset windows. The agent handles the slice-and-dice in seconds; the writer applies judgement to which cuts to include.

Lab parameter shifts. Baseline-to-worst-value shifts, baseline-to-end-of-treatment shifts, NCI-CTCAE grade transitions. The agent runs the shift table; the writer interprets it.
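
A baseline-to-worst-value shift table can be sketched in a few lines. The example assumes pandas, one row per subject per visit, a numeric GRADE column where a higher grade is worse, and made-up values.

```python
import pandas as pd

# Hypothetical lab records; VISIT labels and the GRADE column are assumptions.
labs = pd.DataFrame({
    "USUBJID": ["001", "001", "001", "002", "002", "003", "003"],
    "VISIT":   ["Baseline", "Week 4", "Week 8", "Baseline", "Week 4", "Baseline", "Week 4"],
    "GRADE":   [0, 2, 1, 1, 1, 0, 3],
})

# Baseline grade per subject.
baseline = (labs[labs["VISIT"] == "Baseline"]
            .set_index("USUBJID")["GRADE"].rename("baseline_grade"))

# Worst (highest) post-baseline grade per subject.
worst = (labs[labs["VISIT"] != "Baseline"]
         .groupby("USUBJID")["GRADE"].max().rename("worst_grade"))

# Keep subjects with both values, then cross-tabulate baseline against worst.
per_subject = pd.concat([baseline, worst], axis=1).dropna()
shift = pd.crosstab(per_subject["baseline_grade"], per_subject["worst_grade"])
print(shift)
```

Rows are baseline grades and columns are worst post-baseline grades, so cells to the right of the diagonal are the worsening shifts the writer then interprets.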

Exposure data. Patient-years, cumulative drug exposure, cumulative dose, exposure-adjusted event rates. The PSUR's mandatory sales-and-exposure block becomes a chat exchange rather than a separate spreadsheet pull.
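
The arithmetic behind an exposure row is simple, which is exactly why it should be computed rather than generated. The sketch below uses only the standard library; the sales figures, the reporting-period labels, and the definition of an exposure factor (patient-years represented by one unit sold) are all hypothetical.

```python
# All figures below are made up for illustration.
sales = [
    {"country": "DE", "period": "2023H1", "units": 12_000},
    {"country": "DE", "period": "2023H2", "units": 15_000},
    {"country": "FR", "period": "2023H2", "units": 8_000},
]

# Assumed: patient-years of treatment represented by one unit sold, per country.
exposure_factor = {"DE": 0.08, "FR": 0.10}

def patient_years(rows, factors, period=None):
    """Sum units * country factor, optionally restricted to one reporting period."""
    return sum(r["units"] * factors[r["country"]]
               for r in rows
               if period is None or r["period"] == period)

def rate_per_100_py(events, py):
    """Exposure-adjusted event rate per 100 patient-years."""
    return 100 * events / py

cumulative = patient_years(sales, exposure_factor)                  # all periods
interval = patient_years(sales, exposure_factor, period="2023H2")   # one period
print(round(cumulative, 1), round(interval, 1))
print(round(rate_per_100_py(37, cumulative), 2))
```

How an exposure factor is actually derived (pack size, daily dose, treatment duration) is product-specific and stays a decision for the writer and the safety team.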

Demographics and baseline. Tables by treatment arm, by region, by demographic stratum. Median, IQR, frequency. Standard regulatory cuts, derived in seconds from the line listing.
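
A baseline summary by treatment arm might look like the following sketch, assuming pandas and hypothetical DM-style columns. One caveat worth stating: pandas computes quartiles by linear interpolation by default, which can differ slightly from the defaults in other statistical packages, so the quantile method should be agreed before numbers are compared.

```python
import pandas as pd

# Hypothetical demographics data; the column names are assumed.
dm = pd.DataFrame({
    "USUBJID": ["001", "002", "003", "004", "005", "006"],
    "ARM":     ["Treated"] * 3 + ["Placebo"] * 3,
    "AGE":     [54, 61, 70, 48, 59, 66],
    "SEX":     ["F", "M", "F", "M", "M", "F"],
})

def q1(s):
    """First quartile (pandas default: linear interpolation)."""
    return s.quantile(0.25)

def q3(s):
    """Third quartile (pandas default: linear interpolation)."""
    return s.quantile(0.75)

# Continuous variable: n, median, and IQR bounds per arm.
age_summary = dm.groupby("ARM")["AGE"].agg(["count", "median", q1, q3])

# Categorical variable: frequency per arm.
sex_counts = pd.crosstab(dm["SEX"], dm["ARM"])

print(age_summary)
print(sex_counts)
```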

Figures. Bar charts of AE frequencies, forest plots for sub-population analyses, time-course plots for exposure, line plots for lab values over time. The agent generates the figure with Python's plotting libraries and the output is a publication-ready image that drops into the draft.
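
As one example of such a figure, the sketch below draws a grouped bar chart of AE frequencies with matplotlib and returns it as PNG bytes. The choice of matplotlib, the percentages, and the function name are assumptions; the article does not say which plotting library the agent uses.

```python
import io

import matplotlib
matplotlib.use("Agg")  # render off-screen; no display needed
import matplotlib.pyplot as plt

# Hypothetical percentages of subjects with a Grade 3+ event, by SOC.
socs = ["Blood disorders", "Cardiac disorders", "GI disorders"]
treated = [12.0, 4.5, 8.0]
placebo = [6.0, 3.0, 7.5]

def ae_bar_chart(labels, arm_a, arm_b):
    """Grouped bar chart of AE frequency by SOC; returns the image as PNG bytes."""
    x = list(range(len(labels)))
    width = 0.4
    fig, ax = plt.subplots(figsize=(6, 3.5))
    ax.bar([i - width / 2 for i in x], arm_a, width, label="Treated")
    ax.bar([i + width / 2 for i in x], arm_b, width, label="Placebo")
    ax.set_xticks(x)
    ax.set_xticklabels(labels)
    ax.set_ylabel("Subjects with Grade 3+ AE (%)")
    ax.legend()
    fig.tight_layout()
    buf = io.BytesIO()
    fig.savefig(buf, format="png", dpi=300)
    plt.close(fig)  # free the figure once it has been written
    return buf.getvalue()

png = ae_bar_chart(socs, treated, placebo)
```

Because the chart is built from the same numbers as the table, the figure and the table cannot drift apart the way two separately maintained files can.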

What It Is Not

The agent is not a black-box data scientist. It does not invent insights or run analyses the writer did not request. It is a faster path to the cuts the writer already wants — not a replacement for the writer's judgement about which analyses belong in the section.

It is also not a substitute for the statistician's work. Pre-specified primary and secondary analyses still come from the SAP, derived by the biostatistics team, locked before unblinding. The data analysis agent handles the descriptive, exploratory, and supportive cuts that a regulatory writer assembles to support the narrative — the cuts that today live in writer-owned Excel files.

And it is not an LLM doing arithmetic. The numbers come from Python running against your line listing. The LLM is the interface, not the calculator.

What the Audit Trail Captures

For every analysis the agent runs, the audit trail records:

The source artifact — which line listing, which sales dataset, which lab file. Identified by the file name, the version, and the hash of the underlying data.

The analysis script — the Python code that ran. Captured verbatim, with the libraries and versions used.

The input parameters — the filter expressions, the group-by columns, the aggregations, the chart specifications.

The output — the resulting table or figure. Stored as part of the run bundle alongside the draft.

The transaction ID — links the analysis to the section of the draft where its output appears.

A reviewer who asks "how was Table 14.3.1.2 produced" gets a complete answer: the source file, the script, the parameters, the timestamp. The same level of traceability a regulator expects for an SDTM derivation, applied to the descriptive cuts that until now have lived in informal Excel workbooks.
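
One way to picture a single audit entry is as a small structured record. The sketch below is an assumption about shape only: the field names, the use of SHA-256 as the hash, and the example values are illustrative, not Asthra's actual schema.

```python
import hashlib
import json
import platform
from datetime import datetime, timezone

def build_audit_record(source_name, source_version, source_bytes,
                       script, libraries, parameters, output_ref, transaction_id):
    """Assemble one audit entry; the field names here are illustrative only."""
    return {
        "source": {
            "file": source_name,
            "version": source_version,
            # Hash of the underlying data, so a re-run can confirm the same input.
            "sha256": hashlib.sha256(source_bytes).hexdigest(),
        },
        "script": script,  # the analysis code, captured verbatim
        "runtime": {"python": platform.python_version(), "libraries": libraries},
        "parameters": parameters,
        "output": output_ref,
        "transaction_id": transaction_id,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    }

# Hypothetical inputs for one analysis run.
data = b"USUBJID,AESOC,AETOXGR\n001,Blood disorders,3\n"
record = build_audit_record(
    source_name="ae_listing.csv", source_version="v3", source_bytes=data,
    script="severe = listing[listing['AETOXGR'] >= 3]",
    libraries={"pandas": "2.2.2"},
    parameters={"filter": "AETOXGR >= 3", "group_by": ["AESOC", "ARM"], "agg": "nunique"},
    output_ref="tables/grade3_by_soc.csv",
    transaction_id="txn-0001",
)
print(json.dumps(record, indent=2))
```

Hashing the source bytes is what lets a reviewer confirm that a re-run used exactly the same data, not just a file with the same name.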

What Is Next

We are working toward two extensions. The first is statistical-method extensibility — letting biostatistics teams ship custom analysis recipes as skills (the same skill registry that hosts reviewer personas) so the agent can run pre-specified company-standard analyses without per-project setup. The second is multi-source synthesis — analyses that join a line listing with the exposure dataset and produce exposure-adjusted event rates in one step.
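
The second extension can be pictured as a join followed by a division. The sketch below assumes pandas and hypothetical column names, and it computes a simple event rate per 100 patient-years: it counts all events and uses total exposure, whereas a formal exposure-adjusted incidence rate would typically count first events and censor exposure at the first event.

```python
import pandas as pd

# Hypothetical event listing and per-subject exposure data.
ae = pd.DataFrame({
    "USUBJID": ["001", "001", "002", "003"],
    "AEDECOD": ["Anaemia", "Nausea", "Anaemia", "Nausea"],
})
exposure = pd.DataFrame({
    "USUBJID": ["001", "002", "003", "004"],
    "ARM": ["Treated", "Treated", "Placebo", "Placebo"],
    "EXPOSURE_DAYS": [730.5, 365.25, 365.25, 365.25],
})

# Patient-years per arm, including subjects who had no events.
exposure["PY"] = exposure["EXPOSURE_DAYS"] / 365.25
py_by_arm = exposure.groupby("ARM")["PY"].sum()

# Join events to arms; validate guards against duplicate subjects in the exposure data.
joined = ae.merge(exposure[["USUBJID", "ARM"]], on="USUBJID",
                  how="left", validate="many_to_one")

# Events per preferred term and arm, divided by that arm's patient-years.
events_by_arm = joined.groupby(["AEDECOD", "ARM"]).size().unstack(fill_value=0)
rate_per_100_py = 100 * events_by_arm.div(py_by_arm, axis="columns")
print(rate_per_100_py.round(1))
```

Taking the denominator from the exposure dataset rather than from the joined table matters: subject 004 had no events but still contributes a patient-year to the placebo arm.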

The principle holds: the LLM proposes, Python validates. For data analysis, "validates" means that Python actually computes the answer. The reviewer gets prose grounded in source data. The writer gets hours back.


See it on the product page: "Tables & figures, in the chat" walks through the workflow with a real Python-execution mock and a table preview. For the broader studio surface, see /product.