PMx-00: Preface – From 'How' to 'Why': Rethinking Your Modeling Mindset
Have you ever experienced a moment like this:
- You write NONMEM or Monolix code fluently, yet hesitate when choosing between `FOCE` (First-Order Conditional Estimation) and `LAPLACE` in the `$ESTIMATION` block?
- Your heart skips a beat when the output shows `Hessian Reset` or `Covariance Step Failed`, but you have no idea what mathematical issue caused it?
- When asked "What is a Likelihood Function?", all you can summon is a jumble of vague formulas, unable to articulate how it fundamentally differs from Probability?
If your answer is “Yes,” then this blog series is for you.
This series is based on the classic Pharmacometric Statistics Workshop by statistician and mathematician Adrian Dunne of University College Dublin. Adrian is not a pharmacologist; he is a statistician who once worked alongside Stuart Beal, the core developer of NONMEM. His perspective is unique and incisive — he strips away the complex “black box” of pharmacometric software to reveal the true statistical principles behind the algorithms, using the most straightforward logical language.
The Core Philosophy: Why vs. How
Adrian Dunne lays down a rule at the very start of his course: “This is not a ‘How-to’ workshop, it is a ‘Why’ workshop.”
There are countless tutorials teaching you how to write code and how to click through a software interface. But the purpose of this series is to explain why we do what we do, through understanding statistical concepts, using “English” rather than opaque mathematical notation.
Throughout this series, we will uphold three of Adrian’s core modeling philosophies:
1. Never mistake a NONMEM control stream for a model. The model is your mathematical description of the biological process; code is merely the tool to implement it. Model first, code second.
2. Always start with the simplest model. Unless the data strongly demand complexity (a significant drop in OFV), do not add it. Trying to squeeze complex parameter estimates out of limited data is futile.
3. No matter how much you, the clinical team, or your supervisor love a particular model, if the data doesn't like it (the likelihood is not high), it is meaningless. Always ask the data: "How well do you like this model?"
Series Outline
To enable a smooth progression from the basics to complex population analysis algorithms, the original course sessions have been reorganized into five core modules. This is a journey from “single individual” to “population,” and from “analytic solutions” to “numerical approximations.”
Module 1: Foundations (Blogs 01–03)
What problem are we trying to solve?
We will re-examine the relationship between “Population” and “Sample.” Statistical Inference is essentially reverse engineering — we cannot see the “Population” from a god’s-eye view; we can only use the “Sample” in hand to infer backwards. We will build intuition for probability distributions, random variables, and multivariate distributions — the bedrock for everything that follows.
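The reverse-engineering idea can be sketched in a few lines of Python (simulated data; the "true" values are visible here only because we are the ones generating them):

```python
import random
import statistics

# Statistical inference as reverse engineering: in real life we never see
# the population, only a finite sample drawn from it, and we must infer
# backwards to the population quantity that generated the data.
random.seed(42)

true_mean, true_sd = 70.0, 10.0  # the "god's-eye view" -- hidden in practice

# All we ever actually observe: one sample from that hidden population.
sample = [random.gauss(true_mean, true_sd) for _ in range(200)]

# Our inference about the hidden population mean.
estimate = statistics.mean(sample)
print(estimate)
```

With 200 observations the sample mean lands close to the hidden population mean, but never exactly on it; quantifying that gap is what the rest of the series is about.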
Module 2: The Engine — Maximum Likelihood Estimation (Blogs 04–06)
What is the software actually calculating?
In this module, we temporarily set aside population models and return to the simplest single-individual data. We will thoroughly understand:
- Why Likelihood is not the same as Probability.
- How the Score Function (the slope of the likelihood surface — which direction should we move?) and Hessian Matrix (the curvature — how steep is the peak?) work together to search for the best parameter estimates.
- Why Fisher Information (how much does the data tell us about a parameter?) determines the precision of our estimates.
- Whether the Wald Test or the Likelihood Ratio Test (LRT) is more reliable.
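As a taste of Module 2, here is a minimal Python sketch (a toy normal-mean model with made-up numbers, not anything NONMEM actually runs) showing the score and Hessian cooperating in a Newton step:

```python
# Newton's method climbs a log-likelihood using the score (the slope:
# which direction to move) and the Hessian (the curvature: how far).
# Toy model: y_i ~ Normal(mu, sigma2), with sigma2 assumed known.

data = [9.8, 10.4, 10.1, 9.6, 10.6]
sigma2 = 0.25  # assumed known residual variance

def score(mu):
    """First derivative of the log-likelihood with respect to mu."""
    return sum(y - mu for y in data) / sigma2

def hessian(mu):
    """Second derivative: negative, because we sit on a peak, not in a valley."""
    return -len(data) / sigma2

mu = 5.0  # deliberately poor starting value
for _ in range(10):
    mu = mu - score(mu) / hessian(mu)  # Newton step: slope divided by curvature

print(round(mu, 4))  # -> 10.1, the sample mean
```

Because this toy log-likelihood is exactly quadratic, Newton lands on the maximum (the sample mean) in a single step; real PK likelihoods are not quadratic, which is why the iterations, resets, and failures in Module 2 exist at all.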
Module 3: The Hurdle — Mixed Effects Models (Blogs 07–08)
When data is no longer independent, trouble begins.
This is the critical turning point into Population PK. When the same patient contributes multiple data points, those points are no longer independent. We will introduce $\eta$ (random effects) and reveal the central nightmare of population analysis — the Marginal Likelihood integral problem. To estimate population parameters, the software must mathematically “average out” each patient’s unknown random effects by integrating over all possible values. This integral has no closed-form solution, making it the single hardest computational challenge in pharmacometrics. Without solving it, we cannot compute the Objective Function Value (OFV).
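To make the integral concrete, here is a brute-force Python sketch (a hypothetical one-parameter model with made-up values) of the marginal likelihood for a single subject:

```python
import math

# Marginal likelihood for one subject: integrate the individual likelihood
# over every possible value of that subject's random effect eta.
# Toy model: y = theta * exp(eta) + eps, eta ~ N(0, omega2), eps ~ N(0, sigma2).

theta, omega2, sigma2 = 10.0, 0.09, 0.25  # hypothetical parameter values
observations = [11.2, 10.8, 11.5]         # one subject's data

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def joint_density(eta):
    """p(y | eta) * p(eta): the individual likelihood times the eta prior."""
    pred = theta * math.exp(eta)
    lik = 1.0
    for y in observations:
        lik *= normal_pdf(y, pred, sigma2)
    return lik * normal_pdf(eta, 0.0, omega2)

# Brute-force numerical integration over a wide eta grid -- exactly the
# step that FO, FOCE, and Laplace exist to avoid repeating at every
# candidate value of the population parameters.
n, lo, hi = 2000, -2.0, 2.0
h = (hi - lo) / n
marginal = h * sum(joint_density(lo + i * h) for i in range(n + 1))
print(marginal)
```

One subject, one random effect, and already 2000 model evaluations; with hundreds of subjects, multiple correlated etas, and an optimizer calling this integral thousands of times, the brute-force route collapses, which is the problem Module 4 addresses.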
Module 4: The Solutions — Algorithms (Blogs 09–10)
Those cryptic acronyms (FO, FOCE, EM, MCMC) all exist to solve the same problem.
Since the integral is intractable, we must find ways to approximate it. We will delve into the inner workings of NONMEM and Monolix:
- FO and FOCE: What kind of compromise is it to force a non-linear model to look linear (Taylor expansion)?
- Laplace: How does a second-order approximation improve accuracy?
- EM Algorithm: Why is it a “dance” between Expectation and Maximization? How does it solve the problem through the lens of “filling in missing data”?
- MCMC and SAEM: When computers are fast enough, we can stop relying on approximate formulas and directly simulate the integral through random sampling.
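The sampling idea behind the last bullet can be sketched in plain Monte Carlo (a hypothetical toy model with made-up values, far simpler than the actual SAEM/MCMC machinery): draw etas from their distribution and average the individual likelihood, instead of linearizing the model.

```python
import math
import random

# Monte Carlo approximation of the marginal likelihood: the integral
# over eta is an expectation, E_eta[ p(y | eta) ], so we can estimate it
# by averaging over random draws of eta ~ N(0, omega^2).
# Toy model: y = theta * exp(eta) + eps, eps ~ N(0, sigma2).
random.seed(0)

theta, omega, sigma2 = 10.0, 0.3, 0.25  # hypothetical parameter values
observations = [11.2, 10.8, 11.5]       # one subject's data

def individual_likelihood(eta):
    """p(y | eta) for y_i ~ Normal(theta * exp(eta), sigma2)."""
    pred = theta * math.exp(eta)
    lik = 1.0
    for y in observations:
        lik *= math.exp(-(y - pred) ** 2 / (2 * sigma2)) / math.sqrt(2 * math.pi * sigma2)
    return lik

draws = [random.gauss(0.0, omega) for _ in range(50_000)]
marginal_mc = sum(individual_likelihood(e) for e in draws) / len(draws)
print(marginal_mc)
```

No Taylor expansion anywhere: the estimate converges to the true integral as the number of draws grows, trading approximation bias for computing time, which is precisely the bargain fast modern hardware makes attractive.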
Module 5: Evaluation and Philosophy (Blogs 11–15)
The model run is finished. Can we trust the results?
Finally, we will discuss the world after the model converges:
- Shrinkage: Why does high Shrinkage cause EBEs to mislead our diagnostic plots?
- EBEs (Empirical Bayes Estimates): What does “Bayesian estimation” actually mean in this context?
- Model Selection: How to choose between non-nested models using AIC and BIC.
- The Sandwich Estimator: Your safety net when distributional assumptions aren’t quite right.
- Mu-Referencing: Not just a coding trick, but mathematical wisdom to accelerate algorithms.
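The shrinkage diagnostic above reduces to a one-line formula. A minimal Python sketch, using the commonly reported definition eta-shrinkage = 1 - SD(EBE)/omega and made-up EBE values:

```python
import math

# Eta-shrinkage: when per-subject data are sparse, EBEs shrink toward
# zero, so their spread falls below the population omega -- and diagnostic
# plots built on those EBEs become misleading.

omega = 0.3  # population SD of the random effect (the model's estimate)
# Hypothetical EBEs from a sparse-data fit: note how tightly they cluster.
ebes = [0.05, -0.04, 0.11, -0.08, 0.02, -0.06, 0.09, -0.03]

mean = sum(ebes) / len(ebes)
sd_ebe = math.sqrt(sum((e - mean) ** 2 for e in ebes) / (len(ebes) - 1))
shrinkage = 1 - sd_ebe / omega
print(f"eta-shrinkage: {shrinkage:.0%}")
```

Here the EBEs span barely a quarter of the variability the model claims exists in the population, yielding shrinkage above 70%; a common rule of thumb treats EBE-based diagnostics with suspicion well before that point.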
Advice for Readers
- Don’t be scared off by formulas. Adrian’s strength lies in explaining mathematics in English. I will strive to preserve this style, using intuition to understand the physical meaning behind the formulas. That said, a basic comfort with algebra and the idea of functions (e.g., $f(x)$) will help.
- Distinguish between two worlds. While reading, constantly remind yourself to separate the “Real World (the unknown true value $\theta$)” from the “Model World (the estimated value $\hat{\theta}$).” This distinction is easy to state but surprisingly easy to forget mid-analysis.
- Engage and think. At the end of each post, I will include thought-provoking questions from the original course. For example: “Why is the Objective Function Value (OFV) reported by NONMEM sometimes negative?”
Ready? Let us begin this statistical expedition from “Sample” to “Population.”
Note: This blog series is primarily based on the lecture recordings and slides of Adrian Dunne. It is intended as shared study notes. If there are any misunderstandings, please refer to the original course material.