Statistics & Methodology — Glossary

The statistical concepts, trial design choices, and clinical endpoints that drive trial design, analysis, and regulatory defence.

Terms covered:

Power analysis
Sample size calculation
Adaptive design
MTD vs RP2D
Common oncology endpoints (ORR, PFS, OS, DCR, DOR)
MSS vs MSI-H
Exploratory vs confirmatory endpoints
Log-rank test / Cox regression
Bayesian vs frequentist
Operational Qualification (OQ)

Power analysis

Definition. A statistical calculation that determines the probability that a clinical trial will detect a true effect of a specified size, given the sample size, variability, and significance threshold. Power = 1 - β, where β is the probability of a Type II error (false negative).

In practice. A typical Phase 3 trial is designed for 80-90% power to detect the protocol-specified treatment effect at a 5% two-sided significance threshold. Under-powered trials risk missing real effects (and burning the asset's value); over-powered trials enrol more patients and take longer than the question requires.

Common inputs to a power calculation:

Effect size (e.g., hazard ratio of 0.7 for PFS)
Variability assumption
Sample size per arm
Significance threshold (alpha)
Trial duration / event accrual assumptions
For survival endpoints: censoring patterns, accrual rate

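Why it matters. Power analysis underpins sample size calculation, which underpins trial duration, cost, and feasibility. A small change in the assumed effect size can change the required sample size substantially, and therefore trial cost. For a time-to-event endpoint under the usual Schoenfeld approximation, the required number of events scales with 1/(ln HR)², so weakening the assumed hazard ratio from 0.70 to 0.75 raises the required events by roughly 50%.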
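
To make the sensitivity to effect size concrete, here is a minimal sketch of a power calculation for a time-to-event endpoint. It assumes the Schoenfeld approximation for a two-sided log-rank test; the function name, the 250-event figure, and the hazard ratios are illustrative assumptions, not values from any protocol or from Flusso's engine.

```python
from math import log, sqrt
from statistics import NormalDist


def logrank_power(events: int, hazard_ratio: float,
                  alpha: float = 0.05, alloc: float = 0.5) -> float:
    """Approximate power of a two-sided log-rank test (Schoenfeld approximation).

    events       -- total number of events observed across both arms
    hazard_ratio -- assumed true hazard ratio (treatment vs control)
    alloc        -- fraction of patients randomised to the treatment arm
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Under the alternative, the log-rank Z statistic is roughly normal with
    # unit variance and mean |ln(HR)| * sqrt(events * alloc * (1 - alloc)).
    drift = abs(log(hazard_ratio)) * sqrt(events * alloc * (1 - alloc))
    # The probability of rejecting in the wrong direction is negligible
    # and is ignored here.
    return NormalDist().cdf(drift - z_alpha)


# Hypothetical scenario: the same 250 events under three assumed effect sizes.
for hr in (0.65, 0.70, 0.75):
    print(f"HR {hr:.2f}: power with 250 events = {logrank_power(250, hr):.2f}")
```

With these assumptions, 250 events give about 80% power at a hazard ratio of 0.70, and noticeably less if the true effect is closer to 0.75. A validated tool should be used for any real design.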

Where Flusso fits. Statistical computation engine with versioned, auditable power analyses. Every assumption is captured; every analysis is reproducible; results are exportable as regulator-ready PDF artefacts. Particularly relevant for FDA Type B meetings defending sample-size decisions.

Sample size calculation

Definition. The determination of the number of patients required in a clinical trial to achieve the desired statistical power for the planned analysis. It is the inverse of power analysis: given a desired power, what sample size is needed?

In practice. Sample size calculations are typically performed at multiple levels:

Per-arm sample size — patients in each treatment arm
Total sample size — sum across arms
Enrolment target — adjusted for expected dropouts and screen failures
Event-driven trials — for survival endpoints, the calculation is event-count driven (e.g., 200 PFS events) and sample size flows from accrual + event-rate assumptions
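
The event-driven case can be sketched as follows. This is an illustration only, assuming the Schoenfeld approximation for the required event count and a deliberately crude inflation for event probability and dropout; the function names and the 70% / 10% figures are hypothetical.

```python
from math import ceil, log
from statistics import NormalDist


def required_events(hazard_ratio: float, power: float = 0.80,
                    alpha: float = 0.05, alloc: float = 0.5) -> int:
    """Events needed for a two-sided log-rank test (Schoenfeld approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    return ceil((z_alpha + z_beta) ** 2
                / (alloc * (1 - alloc) * log(hazard_ratio) ** 2))


def enrolment_target(events: int, event_probability: float,
                     dropout_rate: float) -> int:
    """Patients to randomise so that `events` are expected by the analysis.

    event_probability -- assumed chance a randomised patient has an event
                         before the analysis cut-off (an accrual/follow-up
                         assumption, hypothetical here)
    dropout_rate      -- assumed fraction lost before contributing an event
    """
    patients = events / event_probability
    return ceil(patients / (1 - dropout_rate))


events = required_events(hazard_ratio=0.70, power=0.80)
print("Required events:", events)
print("Enrolment target:", enrolment_target(events, 0.70, 0.10))
```

Screen failures would inflate the number of patients screened rather than the number randomised, so they are left out of this sketch. Real designs model accrual and censoring explicitly rather than using a single event probability.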

Common software: nQuery, PASS, SAS PROC POWER, R packages (rpact for adaptive designs, gsDesign for group-sequential designs, samplesize for general-purpose calculations).

Why it matters. Sample size determines cost, duration, and feasibility. Under-powered trials waste resources on non-results; over-powered trials waste patients (an ethical concern) and time. The defensibility of sample-size decisions is increasingly important under Project Optimus.

Where Flusso fits. Sample size and power analyses become reproducible artefacts rather than ad hoc Excel / SAS outputs. Versioned computation envelopes capture every assumption and result; export pipelines produce regulator-ready documentation.

Related: Power analysis · Project Optimus

Adaptive design

Definition. A clinical trial design that allows pre-specified modifications to the trial based on accumulating data — without compromising the trial's statistical validity or operational integrity. Modifications can include sample size re-estimation, treatment-arm selection, dose adjustment, or population enrichment.

In practice. Common adaptive design types:

Group-sequential design — pre-planned interim analyses with stopping rules for efficacy or futility
Sample size re-estimation — interim analysis updates sample size based on observed variability
Adaptive randomisation — treatment assignment ratios adjust based on accumulating efficacy data
Treatment-arm selection (drop-the-loser) — early-phase trials drop underperforming arms
Population enrichment — restrict enrolment to patients showing benefit signals
Master protocols (basket, umbrella, platform) — adaptive trial structures across multiple indications or treatments
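
For the group-sequential case, one common way to pre-plan interim looks is an alpha-spending function. The sketch below assumes the Lan-DeMets O'Brien-Fleming-type spending function and shows only the cumulative alpha spent at each look; it does not compute the stopping boundaries themselves, which require numerical integration in a validated package. The look schedule is hypothetical.

```python
from statistics import NormalDist


def obf_alpha_spent(information_fraction: float, alpha: float = 0.05) -> float:
    """Cumulative two-sided alpha spent at a given information fraction.

    Uses the Lan-DeMets O'Brien-Fleming-type spending function:
        alpha(t) = 2 * (1 - Phi(z_{1 - alpha/2} / sqrt(t)))
    """
    if not 0 < information_fraction <= 1:
        raise ValueError("information_fraction must be in (0, 1]")
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return 2 * (1 - z.cdf(z_alpha / information_fraction ** 0.5))


# Hypothetical schedule: interim looks at 50% and 75% of planned events.
for t in (0.50, 0.75, 1.00):
    print(f"information fraction {t:.2f}: "
          f"cumulative alpha spent = {obf_alpha_spent(t):.4f}")
```

This spending function releases very little alpha at early looks and the full alpha by the final analysis, which is why early efficacy stopping under it requires a very strong signal.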

Common software: rpact (R package), East (Cytel), ADDPLAN (ICON, formerly Aptiv Solutions).

Why it matters. Adaptive designs can reduce trial cost and duration substantially when used appropriately, and have become increasingly mainstream in oncology over the past decade. They require sophisticated upfront planning and ongoing operational discipline.

Where Flusso fits. Adaptive design decision points are exactly the kind of high-stakes, defensibility-critical analyses where versioned computation envelopes pay off. Every decision (continue, stop, adapt) needs a reproducible analytical record for regulatory defence.

Related: Power analysis · Sample size calculation · DSMB

MTD vs RP2D

Definition. Two related but distinct concepts in oncology dose-finding:

MTD (Maximum Tolerated Dose) — the highest dose at which dose-limiting toxicities (DLTs) occur in fewer than a defined fraction of patients (often 33%). Historical anchor for oncology dose selection.
RP2D (Recommended Phase 2 Dose) — the dose recommended for further development. Increasingly distinct from MTD because the RP2D may be lower than MTD when the lower dose has equivalent efficacy with better tolerability.
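
As a simplified illustration of the MTD definition above, the sketch below picks the highest dose whose observed DLT rate stays under a threshold. It assumes toxicity increases with dose and uses raw observed rates, which real dose-escalation designs (3+3 rules or model-based methods) do not rely on alone. The cohort data are hypothetical and unrelated to any study named here.

```python
from fractions import Fraction


def observed_mtd(cohorts, threshold=Fraction(1, 3)):
    """Return the highest dose with an observed DLT rate below `threshold`.

    cohorts -- iterable of (dose, n_patients, n_dlts)
    Returns None if even the lowest dose meets or exceeds the threshold.
    """
    mtd = None
    for dose, n_patients, n_dlts in sorted(cohorts):
        # Stop at the first dose that is untested or too toxic; higher
        # doses are assumed to be at least as toxic.
        if n_patients == 0 or Fraction(n_dlts, n_patients) >= threshold:
            break
        mtd = dose
    return mtd


# Hypothetical cohorts: (dose in mg/kg, patients treated, patients with a DLT)
cohorts = [(1, 3, 0), (3, 3, 0), (6, 6, 1), (10, 6, 2)]
print("Observed MTD:", observed_mtd(cohorts), "mg/kg")
```

The RP2D is a separate decision: it may equal this dose or sit below it once PK/PD, efficacy, and tolerability are weighed together.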

In practice. Historically, oncology Phase 1 trials selected the MTD as the RP2D and proceeded directly. Under Project Optimus, the RP2D is now expected to be selected through evaluation of multiple doses with PK/PD and benefit-risk analysis — not simply equal to MTD.

For Adagene's muzastotug, the Phase 2 study evaluating 10 mg/kg vs. 20 mg/kg is precisely this dose-comparison work — selecting the RP2D for Phase 3, which may or may not be the higher dose tested.

Why it matters. The shift from MTD = RP2D to evidence-based RP2D selection is one of the most consequential changes in oncology development of the past decade. Defending the RP2D selection at the end-of-Phase-2 FDA meeting is now a critical analytical exercise.

Where Flusso fits. RP2D selection requires aggregating safety, PK, PD, and efficacy data across multiple cohorts and presenting a defensible rationale. The same evidence-aggregation infrastructure that supports milestone evidence packets supports RP2D defence packs.

Common oncology endpoints (ORR, PFS, OS, DCR, DOR)

Definition. The standard efficacy endpoints in oncology clinical trials:

ORR (Objective Response Rate) — proportion of patients with a defined tumour response (typically complete or partial response per RECIST criteria). Often used as a primary endpoint in early-phase or accelerated-approval contexts.
PFS (Progression-Free Survival) — time from randomisation to disease progression or death from any cause. Commonly used in Phase 2 and Phase 3 trials.
OS (Overall Survival) — time from randomisation to death from any cause. The "gold standard" endpoint; often the primary endpoint of pivotal Phase 3 trials.
DCR (Disease Control Rate) — proportion of patients with response or stable disease for a defined period.
DOR (Duration of Response) — time from first response to disease progression or death (among responders).
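
The two proportion-based endpoints can be sketched directly from a list of best overall responses. This is a minimal example, assuming RECIST-style category codes (CR, PR, SD, PD, NE) and assuming that every recorded SD already meets the protocol's minimum duration; the patient data are hypothetical.

```python
from collections import Counter


def response_rates(best_responses):
    """Return (ORR, DCR) from a list of best overall response codes.

    ORR = (CR + PR) / all evaluated patients
    DCR = (CR + PR + SD) / all evaluated patients
    Non-evaluable (NE) patients stay in the denominator here, which is one
    common convention; the protocol defines the actual analysis set.
    """
    counts = Counter(best_responses)
    n = len(best_responses)
    responders = counts["CR"] + counts["PR"]
    return responders / n, (responders + counts["SD"]) / n


# Hypothetical cohort of ten patients.
cohort = ["CR", "PR", "PR", "SD", "SD", "SD", "PD", "PD", "PD", "NE"]
orr, dcr = response_rates(cohort)
print(f"ORR = {orr:.0%}, DCR = {dcr:.0%}")
```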

In practice. The choice of primary endpoint shapes trial design — sample size, duration, statistical analysis plan. OS is the strongest endpoint but requires longer follow-up and more events; PFS is faster but a weaker indicator of clinical benefit; ORR is fastest but generally only sufficient for accelerated approvals or early-phase decisions.

Why it matters. Endpoint selection drives trial economics. ORR-driven trials are faster and cheaper but produce weaker regulatory and commercial outcomes; OS-driven trials are slower and more expensive but produce stronger outcomes. The trade-off is central to development strategy.

Where Flusso fits. Endpoint event tracking is part of operational reporting. Real-time visibility on event accrual (PFS events, OS events) supports finance forecasting of when readouts will occur.

Related: Database lock · Adaptive design

MSS vs MSI-H

Definition. A molecular distinction in colorectal and other cancers based on microsatellite instability:

MSS (Microsatellite Stable) — no defects in DNA mismatch repair; the more common phenotype (~85% of CRC)
MSI-H (Microsatellite Instability — High) — defects in DNA mismatch repair (sporadic in most cases, Lynch syndrome related in a minority); the less common phenotype (~15% of CRC)

In practice. MSS and MSI-H tumours respond very differently to immunotherapy. MSI-H tumours typically respond well to anti-PD-1 / anti-PD-L1 monotherapy; MSS tumours typically don't, making MSS-CRC a major unmet need that has been the focus of intensive combination-therapy development.

For Adagene specifically, muzastotug's lead indication is MSS-CRC in combination with pembrolizumab. Muzastotug is intended to make MSS tumours responsive to checkpoint inhibition by depleting regulatory T cells in the tumour microenvironment.

Why it matters. MSS-CRC is a high-value indication precisely because it's an immunotherapy-resistant population that lacks effective options. Successful development against MSS-CRC has high commercial and clinical impact.

Related: SAFEbody · CTLA-4

Exploratory vs confirmatory endpoints

Definition. A distinction in trial endpoint hierarchy:

Confirmatory endpoints (typically primary and key secondary) — pre-specified, statistically tested with controlled type I error, designed to support regulatory claims
Exploratory endpoints — additional analyses included to generate hypotheses or characterise the asset, but not powered for confirmatory testing

In practice. A typical pivotal Phase 3 trial has:

One or two primary confirmatory endpoints (PFS, OS)
A handful of key secondary confirmatory endpoints (ORR, DOR, quality of life)
Many exploratory endpoints (biomarker analyses, subgroup analyses, post hoc analyses)

The statistical analysis plan (SAP) defines the analytical hierarchy and multiplicity adjustments.
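
One simple way a SAP can control type I error across a hierarchy is fixed-sequence (gatekeeping) testing, sketched below. This is only one of several multiplicity procedures and is shown as an assumption, not as the method any particular SAP uses; the endpoint order and p-values are hypothetical.

```python
def fixed_sequence(ordered_p_values, alpha=0.05):
    """Fixed-sequence testing of endpoints in their pre-specified order.

    ordered_p_values -- list of (endpoint_name, p_value), most important first
    Each endpoint is tested at the full alpha, but only while every earlier
    endpoint in the hierarchy has been significant.
    """
    results = {}
    gate_open = True
    for endpoint, p_value in ordered_p_values:
        gate_open = gate_open and p_value <= alpha
        results[endpoint] = gate_open
    return results


# Hypothetical hierarchy and p-values.
hierarchy = [("PFS", 0.003), ("OS", 0.041), ("ORR", 0.080), ("DOR", 0.010)]
for endpoint, significant in fixed_sequence(hierarchy).items():
    print(f"{endpoint}: {'significant' if significant else 'not claimable'}")
```

In this example DOR cannot be claimed despite its small p-value, because ORR failed earlier in the sequence and closed the gate.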

Why it matters. Only confirmatory endpoints support regulatory claims. Exploratory endpoints inform clinical understanding and can drive future trial design, but cannot be cited as evidence in support of approval or a labelling claim. Conflating the two is a common source of regulatory-marketing tension.

Related: Common oncology endpoints

Log-rank test / Cox regression

Definition. Two foundational statistical methods for time-to-event (survival) analysis:

Log-rank test — non-parametric test comparing survival curves between groups; the standard primary analysis for time-to-event endpoints in clinical trials
Cox proportional hazards regression — semi-parametric regression model that estimates hazard ratios (treatment effect on event rate) and supports adjustment for covariates

In practice. A typical PFS or OS analysis presents:

Kaplan-Meier curves visualising survival in each arm
Log-rank p-value testing the difference between arms
Cox model hazard ratio with 95% confidence interval

For trials with non-proportional hazards (e.g., delayed treatment effects), alternative methods (restricted mean survival time, weighted log-rank) may be more appropriate.
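
The log-rank comparison listed above can be sketched in a few lines of standard-library Python. This is an unstratified, unweighted two-sample version written for illustration, with hypothetical data; a production analysis would use validated statistical software.

```python
from math import erfc, sqrt


def logrank_test(times_a, events_a, times_b, events_b):
    """Two-sample log-rank test; returns (chi-square statistic, p-value).

    times_*  -- follow-up time for each patient
    events_* -- 1 if the event was observed at that time, 0 if censored
    """
    data = ([(t, e, "a") for t, e in zip(times_a, events_a)]
            + [(t, e, "b") for t, e in zip(times_b, events_b)])
    observed_a = expected_a = variance = 0.0
    for t in sorted({time for time, event, _ in data if event}):
        # Patients still at risk just before time t, by arm.
        n_a = sum(1 for ti, _, g in data if ti >= t and g == "a")
        n_b = sum(1 for ti, _, g in data if ti >= t and g == "b")
        # Events occurring exactly at time t.
        d_a = sum(1 for ti, e, g in data if ti == t and e and g == "a")
        d = d_a + sum(1 for ti, e, g in data if ti == t and e and g == "b")
        n = n_a + n_b
        observed_a += d_a
        expected_a += d * n_a / n  # expected events in arm A under the null
        if n > 1:
            variance += d * (n_a / n) * (n_b / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("no informative event times")
    chi2 = (observed_a - expected_a) ** 2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    return chi2, erfc(sqrt(chi2 / 2))


# Hypothetical data: arm A progresses early, arm B late, no censoring.
chi2, p = logrank_test([1, 2, 3, 4, 5], [1] * 5, [6, 7, 8, 9, 10], [1] * 5)
print(f"chi-square = {chi2:.2f}, p = {p:.4f}")
```

The test reports only whether the curves differ; the hazard ratio and its confidence interval come from the Cox model, which is not reproduced here.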

Why it matters. These are the universal methods for time-to-event endpoints in oncology — and therefore foundational to OS and PFS analyses, which dominate pivotal trial endpoints.

Where Flusso fits. Statistical analyses underlying these methods can be implemented in Flusso's computation engine with reproducible, auditable outputs.

Related: Common oncology endpoints · Power analysis

Bayesian vs frequentist

Definition. Two foundational statistical paradigms:

Frequentist — probability is interpreted as long-run frequency; inference is made through hypothesis tests and confidence intervals; the dominant paradigm in regulatory clinical trials
Bayesian — probability is interpreted as belief; inference combines prior beliefs with observed data to produce posterior distributions; increasingly common in adaptive designs and decision-making contexts

In practice. Most pivotal regulatory trials use frequentist methods because regulatory frameworks (Type I error control, p-values, confidence intervals) are frequentist-aligned. Bayesian methods are increasingly common in:

Early-phase dose-finding (CRM, BLRM, mTPI)
Adaptive designs
Borrowing historical control data
Internal decision-making (go / no-go decisions)
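
An internal go / no-go rule of the kind listed above is often expressed as a posterior probability. The sketch below assumes a Beta prior on a response rate, a binomial likelihood, and simple numerical integration of the posterior density; the prior, the 25% threshold, and the 12-of-30 data are hypothetical.

```python
from math import exp, lgamma, log


def beta_pdf(x, a, b):
    """Density of a Beta(a, b) distribution at x, for 0 < x < 1."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(x) + (b - 1) * log(1 - x))


def posterior_prob_exceeds(responders, n, threshold,
                           prior_a=1.0, prior_b=1.0, steps=10000):
    """P(true response rate > threshold | data) under a Beta prior.

    The posterior is Beta(prior_a + responders, prior_b + n - responders).
    The tail area is integrated with the midpoint rule, which never
    evaluates the density exactly at 0 or 1.
    """
    a = prior_a + responders
    b = prior_b + n - responders
    width = (1 - threshold) / steps
    return width * sum(beta_pdf(threshold + (i + 0.5) * width, a, b)
                       for i in range(steps))


# Hypothetical interim data: 12 responders out of 30, flat Beta(1, 1) prior.
prob = posterior_prob_exceeds(responders=12, n=30, threshold=0.25)
print(f"P(response rate > 25% | data) = {prob:.3f}")
```

A decision rule would then compare this probability with a pre-specified cut-off. The frequentist operating characteristics of such a rule still need to be evaluated, typically by simulation.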

Why it matters. The choice of paradigm shapes how analyses are presented, defended, and interpreted. Bayesian methods can be more efficient but require more upfront methodological work to gain regulatory acceptance.

Where Flusso fits. Both paradigms are supported in computation. The audit trail and defensibility infrastructure applies equally to both.

Related: Adaptive design · Power analysis

Operational Qualification (OQ)

Definition. A formal validation activity in Computer Systems Validation (CSV) that demonstrates a system operates according to its functional specification across its intended operating range. OQ is one of the IQ / OQ / PQ triad of qualification activities (Installation, Operational, and Performance Qualification).

In practice. OQ for a statistical analysis system typically involves:

Testing each statistical method against pre-computed reference values (often from R packages like rpact for sample-size, or independent SAS programs)
Documenting test cases with input, expected output, observed output, and pass/fail status
Capturing version-controlled OQ evidence for regulatory inspection
Ongoing OQ for changes (re-validation on functional changes)
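
The test-case record described above (input, expected output, observed output, pass/fail) might be sketched as follows. The class, function names, and stand-in method are hypothetical and are not Flusso's actual OQ suite; in a real OQ the expected values would come from an independent, documented reference.

```python
from dataclasses import dataclass
from math import isclose
from statistics import NormalDist


@dataclass
class OQTestCase:
    case_id: str
    inputs: dict
    expected: float          # reference value from an independent source
    tolerance: float = 1e-6  # absolute tolerance agreed in the OQ protocol


def run_case(case, method):
    """Execute one test case and return an evidence record."""
    observed = method(**case.inputs)
    passed = isclose(observed, case.expected, rel_tol=0.0, abs_tol=case.tolerance)
    return {
        "case_id": case.case_id,
        "inputs": case.inputs,
        "expected": case.expected,
        "observed": observed,
        "status": "PASS" if passed else "FAIL",
    }


def two_sided_critical_value(alpha):
    """Stand-in for a method under test (hypothetical)."""
    return NormalDist().inv_cdf(1 - alpha / 2)


cases = [
    OQTestCase("OQ-001", {"alpha": 0.05}, expected=1.959964),
    # Deliberately over-rounded reference, to show a failing record.
    OQTestCase("OQ-002", {"alpha": 0.05}, expected=1.96),
]
for case in cases:
    print(run_case(case, two_sided_critical_value))
```

Storing each record alongside the code version that produced it gives the version-controlled evidence an inspector would expect to see.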

In Flusso specifically, the statistical engine OQ suite uses rpact-derived reference values for Phase 2/3 sample-size and power calculations, with Playwright-driven browser-mode test execution.

Why it matters. OQ is the operational gate for using a system in regulated activities. Without OQ evidence, statistical outputs cannot be relied on for regulatory decision-making.

Where Flusso fits. Flusso's statistical engine OQ suite is part of the platform's validation posture — a precondition for the regulatory-defensibility value proposition.
