PMx-02: Building Blocks – Probability Distributions for Modelers

Summary
A practical review of the probability distributions central to pharmacometric modeling: from the Binomial to the Multivariate Normal. Understanding the building blocks before we build the model.

In the previous post, we established the fundamental divide between Population and Sample, and between Probability and Inference. We said that our job as modelers is to write a model for the probability distribution of the population.

But what exactly is a probability distribution? Before we can write a model, we need to master the building blocks themselves.

This post serves as a practical review of the distributions that pharmacometricians encounter daily. If you have seen these before, treat this as a refresher focusing on the aspects that matter most for population modeling.

Refresher: The Language of Probability

Before we dive into specific distributions, let’s briefly recap the language we use to quantify uncertainty.

  • Uncertainty: Due to random variation, any prediction we make has uncertainty. Probability is how we express the degree of this uncertainty.
  • Probability $P(A)$: A number between 0 (Impossible) and 1 (Certain) describing how likely event A is to happen.

Key Concepts

  1. Basic Event Operations:

    • Union ($A \cup B$): Event A or B (or both) happens.
    • Intersection ($A \cap B$): Event A and B happen simultaneously.
    • Difference ($A \setminus B$): Event A happens but B does not.
    • Complement ($A^c$ or $\bar{A}$): Event A does not happen (everything not in A).
    Basic Event Operations from left to right: Union (A or B), Intersection (A and B), Difference (A but not B), Complement (Not A).
  2. Event Relationships:

    • Inclusion ($A \subseteq B$): If A happens, B must happen.
    • Equality ($A = B$): Events A and B contain the same outcomes ($A \subseteq B$ and $B \subseteq A$).
    • Intersection ($A \cap B \neq \emptyset$): Events A and B can happen simultaneously.
    • Mutually Exclusive ($A \cap B = \emptyset$): Events A and B cannot happen simultaneously.
    • Opposite ($A \cup \bar{A} = \Omega$): Mutually exclusive events that together cover the entire sample space (exactly one of them must happen).
    Event Relationships from left to right: Inclusion (A inside B), Equality (A is B), Intersection (Overlap), Mutually Exclusive (Disjoint), Opposite (Partition).
  3. The Addition Rule (Inclusion–exclusion principle): The probability of A or B happening. $$P(A \cup B) = P(A) + P(B) - P(A \cap B)$$ If mutually exclusive: $P(A \cup B) = P(A) + P(B)$

    Visualizing the Addition Rule. Left: General case subtracts the overlap to avoid double-counting. Right: Mutually Exclusive events have no overlap, so probabilities simply add.
  4. Conditional Probability $P(A|B)$: The probability that event A happens given that event B has already happened. This is central to Bayesian inference. $$P(A|B) = \frac{P(A \cap B)}{P(B)}$$ Combined with $P(A \cap B) = P(B|A)P(A)$, this yields Bayes' theorem: $$P(A|B) = \frac{P(B|A)P(A)}{P(B)}$$

    Visualizing Conditional Probability. Left: Prior sample space $\Omega$. Right: Conditional sample space given B, where B becomes the new universe and we look at the fraction of A inside it.
  5. The Multiplication Rule: The probability of A and B happening. $$P(A \cap B) = P(A) \cdot P(B|A)$$ If independent: $P(A \cap B) = P(A) \cdot P(B)$

    Visualizing the Multiplication Rule with a Tree Diagram. To find the probability of both events happening (A AND B), you walk down the branches and multiply the probabilities.
  6. Independent Events: If A and B are independent, knowing B tells you nothing about A. $$P(A \cap B) = P(A)\,P(B)$$ $$P(A|B) = P(A)$$
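These rules are easy to verify numerically on a small sample space. Below is a minimal Python sketch using a fair six-sided die; the events `A` and `B` are illustrative choices, not taken from the post:

```python
from fractions import Fraction

# Sample space: one roll of a fair six-sided die.
omega = {1, 2, 3, 4, 5, 6}

def prob(event):
    """P(event) under a uniform distribution on omega."""
    return Fraction(len(event & omega), len(omega))

A = {2, 4, 6}   # "roll is even"
B = {4, 5, 6}   # "roll is greater than 3"

# Addition rule (inclusion-exclusion): P(A or B) = P(A) + P(B) - P(A and B)
assert prob(A | B) == prob(A) + prob(B) - prob(A & B)

# Conditional probability: P(A|B) = P(A and B) / P(B)
p_A_given_B = prob(A & B) / prob(B)

# Multiplication rule: P(A and B) = P(B) * P(A|B)
assert prob(A & B) == prob(B) * p_A_given_B

print(p_A_given_B)  # 2/3: within the new universe {4,5,6}, the even outcomes are {4,6}
```

Note how conditioning shrinks the universe: once we know B happened, only `{4, 5, 6}` remains, and we ask what fraction of it lies in A.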

Probability Distributions

Univariate and Multivariate Data

Before analyzing distributions, it is important to distinguish the data we collect:

  • Univariate data: Record values of a single random variable.
    • Examples: Time to event, Number of adverse events (AEs) in a week, Plasma drug concentration at a particular time, AUC.
  • Multivariate data: Record values of two or more random variables.
    • Examples: Gender, Body Weight, Treatment outcome; Plasma drug concentration and pharmacodynamic response; Plasma drug concentration at 1, 2, 4, 8 hours post-dose.

Probability Distribution and Population

To understand population modeling, we must link distributions to populations:

  • A probability distribution corresponds to a population.
  • The probability of a randomly selected item from the population having a particular value is given by the probability distribution.
  • The proportion of the population having a particular value is also given by the probability distribution.

Think of it this way: if the population is a giant bowl of marbles — some red, some blue, some green — then the probability distribution is the recipe that tells you the fraction of each color. Our goal as modelers is to figure out that recipe from a handful of marbles we draw (the sample).

Discrete Univariate Distributions: Counting the Countable

Some random variables can only take on distinct, countable values (typically integers). These are governed by discrete distributions.

The Binomial Distribution

This models the number of successes in a fixed number of independent trials.

  • Example: If a drug has a 30% response rate, what is the probability that exactly 3 out of 10 patients respond?
  • Parameters: $n$ (number of trials), $p$ (probability of success).
  • PMF (Probability Mass Function):

$$P(X = k) = \binom{n}{k} p^k (1-p)^{n-k}$$

Connection to PK/PD: The Binomial distribution underpins binary outcome models (e.g., Logistic Regression) used to model “response” vs. “no response,” or the probability of a clinical event occurring.
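As a quick check of the PMF, here is a small standard-library Python sketch reproducing the response-rate example above:

```python
from math import comb

def binom_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p)**(n - k)

# A drug with a 30% response rate: P(exactly 3 of 10 patients respond)
p3 = binom_pmf(3, 10, 0.30)
print(round(p3, 4))  # 0.2668

# Sanity check: the PMF sums to 1 over all possible counts 0..10
assert abs(sum(binom_pmf(k, 10, 0.30) for k in range(11)) - 1.0) < 1e-12
```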

The Poisson Distribution

This models the number of events occurring in a fixed interval.

  • Example: How many adverse events will a patient experience in one year?
  • Parameter: $\lambda$ (the average rate of occurrence).
  • PMF:

$$P(X = k) = \frac{\lambda^k e^{-\lambda}}{k!}$$

Connection to PK/PD: The Poisson distribution is useful for modeling count data, such as the number of seizures per week or the number of skin lesions.
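The same can be done for the Poisson PMF. The rate below (2 adverse events per year) is a hypothetical illustration:

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return lam**k * exp(-lam) / factorial(k)

# Hypothetical rate: a patient averages 2 adverse events per year.
lam = 2.0
for k in range(4):
    print(k, round(poisson_pmf(k, lam), 4))

# P(more than 3 events) via the complement rule
p_gt_3 = 1 - sum(poisson_pmf(k, lam) for k in range(4))
print(round(p_gt_3, 4))  # 0.1429
```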

Summary Measures: Location and Spread

Before moving to continuous distributions, let’s formalize how we summarize any distribution. Two questions matter: Where is it centered? How wide is it?

  • Location: Mean, Median, Mode.
  • Spread: Variance (Standard Deviation), Range.

Mean (Expected Value)

The population mean (Expected Value) is given by:

$$E(Y) = \mu = \sum y f(y)$$

This is the mean of the corresponding population, i.e., it is equivalent to summing all $Y$ values in the population and dividing by the population size.

Sample Mean is given by:

$$\bar{y} = \frac{1}{n} \sum_{i=1}^{n} y_i$$

The sample mean is frequently used to estimate the population mean, i.e., use $\bar{y}$ to estimate $\mu$.

Variance

The population variance is given by:

$$Var(Y) = \sigma^2 = \sum (y - \mu)^2 f(y)$$

Standard Deviation: $\sigma = \sqrt{\sigma^2}$

This is the variance (standard deviation) of the corresponding population.

Sample Variance is given by:

$$s^2 = \frac{1}{n-1} \sum_{i=1}^{n} (y_i - \bar{y})^2$$

The sample variance is frequently used to estimate the population variance, i.e., use $s^2$ to estimate $\sigma^2$.

Why divide by $n-1$?
This is known as Bessel’s Correction. When calculating sample variance, we measure deviations from the sample mean ($\bar{y}$), which is naturally closer to the data points than the true population mean ($\mu$). This makes the spread appear smaller than it really is (biased). Dividing by $n-1$ (degrees of freedom) inflates the result slightly to create an unbiased estimator of the true population variance $\sigma^2$.
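The two divisors are easy to compare in code. The sketch below uses a small hypothetical sample; Python's stdlib exposes both conventions as `statistics.pvariance` (divide by $n$) and `statistics.variance` (divide by $n-1$):

```python
import statistics

y = [4.1, 5.0, 5.9, 6.3, 7.2]   # a small hypothetical sample
n = len(y)
ybar = sum(y) / n

ss = sum((yi - ybar) ** 2 for yi in y)   # sum of squared deviations
biased = ss / n          # divides by n: underestimates sigma^2 on average
unbiased = ss / (n - 1)  # Bessel's correction

assert abs(biased - statistics.pvariance(y)) < 1e-12
assert abs(unbiased - statistics.variance(y)) < 1e-12
print(round(biased, 3), round(unbiased, 3))  # 1.14 1.425
```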

Continuous Univariate Distributions: Measuring the Measurable

Most pharmacokinetic variables — concentration, clearance, volume — are continuous. They can take any value within a range.

For discrete variables, we used a Probability Mass Function (PMF) that assigns a probability to each distinct outcome. For continuous variables, we use a Probability Density Function (PDF) instead. A PDF does not give the probability of a single point (which is always zero for continuous variables); rather, the area under the curve over an interval gives the probability that the variable falls within that interval.

The Normal (Gaussian) Distribution

This is the queen of probability distributions in pharmacometrics.

  • Parameters: $\mu$ (mean), $\sigma^2$ (variance).
  • PDF:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{(x-\mu)^2}{2\sigma^2} \right)$$

Calculation of Mean and Variance:

For continuous distributions like the Normal, we replace summation ($\sum$) with integration ($\int$):

  • Mean (Expected Value): $$E(X) = \int_{-\infty}^{\infty} x f(x) dx = \mu$$

  • Variance: $$Var(X) = \int_{-\infty}^{\infty} (x-\mu)^2 f(x) dx = \sigma^2$$
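These integrals can be checked numerically. The sketch below approximates them with a simple midpoint rule; the values of $\mu$ and $\sigma$ are arbitrary illustrations:

```python
from math import exp, pi, sqrt

mu, sigma = 100.0, 15.0   # hypothetical location and spread

def normal_pdf(x):
    return exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * sqrt(2 * pi))

# Midpoint-rule quadrature standing in for the integral over (-inf, inf);
# +/- 10 sigma captures essentially all of the probability mass.
lo, hi, n = mu - 10 * sigma, mu + 10 * sigma, 100_000
dx = (hi - lo) / n
xs = [lo + (i + 0.5) * dx for i in range(n)]

total = sum(normal_pdf(x) for x in xs) * dx                   # ~ 1 (a PDF integrates to 1)
mean = sum(x * normal_pdf(x) for x in xs) * dx                # ~ mu
var = sum((x - mean) ** 2 * normal_pdf(x) for x in xs) * dx   # ~ sigma^2
print(round(total, 6), round(mean, 2), round(var, 2))
```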

Why is it so important?

  1. Central Limit Theorem: The average of many independent random variables tends toward a Normal distribution, regardless of the individual variable’s distribution.
  2. Error Modeling: Residual errors ($\epsilon$) are typically assumed to be Normally distributed.
  3. Random Effects: The inter-individual variability parameters ($\eta$) are assumed to follow a Multivariate Normal distribution.
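The Central Limit Theorem is easy to see in simulation. The sketch below (an illustration, not from the post) averages draws from a heavily skewed Exponential distribution; the averages behave like a Normal with the predicted mean and variance:

```python
import random

random.seed(1)

# Draws from Exponential(rate=1): heavily right-skewed, mean 1, variance 1.
# By the CLT, the average of n=40 such draws is approximately
# Normal with mean 1 and variance 1/40 = 0.025.
def sample_mean(n=40):
    return sum(random.expovariate(1.0) for _ in range(n)) / n

means = [sample_mean() for _ in range(20_000)]
m = sum(means) / len(means)
v = sum((x - m) ** 2 for x in means) / len(means)
print(round(m, 3), round(v, 4))  # close to 1 and 0.025
```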

The Log-Normal Distribution

If $X$ is Normally distributed, then $Y = e^X$ follows a Log-Normal distribution. Equivalently, a variable $Y$ is Log-Normal if $\ln(Y)$ is Normally distributed.

  • Why it matters: Pharmacokinetic parameters like Clearance ($CL$) and Volume ($V$) must be positive. The Log-Normal distribution is the natural choice because it is bounded below by zero.

This is why in NONMEM you often see:

$$CL_i = \theta_{CL} \cdot e^{\eta_i}$$

Here, $\eta_i \sim N(0, \omega^2)$, which guarantees $CL_i$ is always positive and Log-Normally distributed.
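This parameterization is easy to simulate. The sketch below uses hypothetical values for $\theta_{CL}$ and $\omega$ and checks the Log-Normal facts quoted here: the result is always positive, its median sits at $\theta_{CL} = e^{\mu}$, and its mean is pulled up to $\theta_{CL}\,e^{\omega^2/2}$:

```python
import math
import random

random.seed(7)

theta_cl = 5.0  # hypothetical typical clearance (L/h)
omega = 0.3     # hypothetical SD of eta on the log scale

# CL_i = theta_CL * exp(eta_i) with eta_i ~ N(0, omega^2)
cl = [theta_cl * math.exp(random.gauss(0.0, omega)) for _ in range(50_000)]

assert min(cl) > 0                      # exp() guarantees positivity
median = sorted(cl)[len(cl) // 2]
mean = sum(cl) / len(cl)
print(round(median, 2))  # close to theta_CL (the median e^mu, with mu = ln theta_CL)
print(round(mean, 2))    # close to theta_CL * exp(omega^2 / 2), about 5.23
```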

Try It! Drag the Shape ($\sigma$) slider.

  • The Median ($e^\mu$) stays fixed at 1 (dashed line), but the Mean ($e^{\mu + \sigma^2/2}$) pulls away to the right.
  • Notice how the curve becomes incredibly skewed (heavy right tail) as $\sigma$ increases. This is why we use it for biological parameters — it handles large variability without going negative.

Multivariate Probability Distributions

So far, we have dealt with single random variables. In clinical data, however, variables almost always come in groups.

Random Vectors

A Random Vector $\mathbf{Y}$ is simply a collection of random variables grouped into a column vector:

$$ \mathbf{Y} = \begin{bmatrix} Y_1 \\ Y_2 \\ \vdots \\ Y_n \end{bmatrix} $$

For example, in a PK study each patient yields several measurements at once — say, Clearance ($CL$), Volume of Distribution ($V$), and Absorption Rate ($k_a$). For patient $i$, we can write $\mathbf{Y}_i = [CL_i, V_i, k_{a,i}]^T$. Treating them as a vector lets us describe their joint behavior — including how they correlate with each other.

Joint Probability Distribution

The Joint Probability Distribution $f(y_1, y_2, \dots)$ tells us the probability that $Y_1$ takes a specific value $y_1$ AND $Y_2$ takes a specific value $y_2$ simultaneously.

$$ \sum_{y_1} \sum_{y_2} f(y_1, y_2) = 1.0 $$

Example: Bivariate Discrete Distribution

Let $Y_1$ be Renal Function ($0=\text{Normal}$, $1=\text{Impaired}$) and $Y_2$ be the Number of Adverse Events ($0, 1, 2, 3$).

| $Y_1$ \ $Y_2$ | 0    | 1     | 2     | 3     |
|---------------|------|-------|-------|-------|
| 0 (Normal)    | 0.40 | 0.27  | 0.20  | 0.06  |
| 1 (Impaired)  | 0.01 | 0.015 | 0.023 | 0.022 |
  • $f(1, 2)$ represents the probability of having Impaired Function AND 2 Events, which is 0.023.
3D Bar Chart of Joint Probability
Joint Probability Distribution: Renal Function vs Number of Events. This 3D bar chart visualizes the probability mass for each combination of outcomes.

Marginal Probability Distribution

Sometimes we want to look at just one variable in isolation, “averaging out” the others. This is called the Marginal Distribution.

$$ f(y_1) = \sum_{\text{all } y_2} f(y_1, y_2) $$

Using our table, we can calculate the marginal probability of Renal Function status:

| $Y_1$ \ $Y_2$ | 0    | 1     | 2     | 3     | $f(Y_1)$ |
|---------------|------|-------|-------|-------|----------|
| 0 (Normal)    | 0.40 | 0.27  | 0.20  | 0.06  | 0.93     |
| 1 (Impaired)  | 0.01 | 0.015 | 0.023 | 0.022 | 0.07     |
| $f(Y_2)$      | 0.41 | 0.285 | 0.223 | 0.082 | 1.00     |
  • P(Normal): $f(Y_1=0) = 0.93$
  • P(Impaired): $f(Y_1=1) = 0.07$

This tells us that 93% of this population has normal renal function.
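Summing out a variable is a one-liner in code. A minimal Python sketch using the joint table from this example:

```python
# Joint probability table from the example:
# rows = Y1 (0 Normal, 1 Impaired), columns = Y2 (number of AEs: 0..3)
joint = [
    [0.40, 0.27, 0.20, 0.06],     # Y1 = 0
    [0.01, 0.015, 0.023, 0.022],  # Y1 = 1
]

f_y1 = [sum(row) for row in joint]                       # sum out Y2
f_y2 = [sum(row[j] for row in joint) for j in range(4)]  # sum out Y1

print([round(p, 3) for p in f_y1])  # [0.93, 0.07]
print([round(p, 3) for p in f_y2])  # [0.41, 0.285, 0.223, 0.082]
```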

3D Chart of Joint and Marginal Distributions
Joint & Marginal Distributions. The blue cylinders show the joint probability $f(y_1, y_2)$. The yellow cylinders on the back walls show the marginal distributions $f(y_1)$ and $f(y_2)$.

Conditional Probability Distribution

Marginal distributions tell us about one variable ignoring the other. Conditional distributions update our knowledge when we know the outcome of one variable. This is the heart of Bayesian inference.

$$ f(y_1 | y_2) = \frac{f(y_1, y_2)}{f(y_2)} $$

Example: Suppose we observe a patient who had 2 Adverse Events ($Y_2=2$). What is the probability their renal function is impaired?

We zoom in on the $Y_2=2$ column (Total probability = $0.223$):

  • $f(\text{Normal}, 2) = 0.20$
  • $f(\text{Impaired}, 2) = 0.023$

The conditional probability of impairment is:

$$ f(\text{Impaired} | Y_2=2) = \frac{0.023}{0.223} \approx \mathbf{0.103} \quad (10.3\%) $$

Compare this to the baseline (marginal) probability of impairment, which was only $7\%$. Knowing the patient had adverse events increases our estimate of renal impairment.

Here are the full Conditional Distributions:

1. Probability of Renal Function given Number of Events ($f(y_1 | y_2)$) Note: Columns sum to 1.0

| $Y_1 \setminus Y_2$ | 0    | 1    | 2    | 3    |
|---------------------|------|------|------|------|
| 0 (Normal)          | 0.98 | 0.95 | 0.90 | 0.73 |
| 1 (Impaired)        | 0.02 | 0.05 | 0.10 | 0.27 |
| Sum                 | 1.0  | 1.0  | 1.0  | 1.0  |

2. Probability of Events given Renal Function ($f(y_2 | y_1)$) Note: Rows sum to 1.0

| $Y_1 \setminus Y_2$ | 0    | 1    | 2    | 3    | Sum |
|---------------------|------|------|------|------|-----|
| 0 (Normal)          | 0.43 | 0.29 | 0.22 | 0.06 | 1.0 |
| 1 (Impaired)        | 0.14 | 0.21 | 0.33 | 0.32 | 1.0 |
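Both conditional tables follow mechanically from the joint table by dividing by the appropriate marginal. A minimal sketch, reusing the joint probabilities from this example:

```python
# Joint probability table from the example (rows: Y1, columns: Y2)
joint = [
    [0.40, 0.27, 0.20, 0.06],     # Y1 = 0 (Normal)
    [0.01, 0.015, 0.023, 0.022],  # Y1 = 1 (Impaired)
]
f_y2 = [joint[0][j] + joint[1][j] for j in range(4)]  # marginal of Y2

# Worked example: P(Impaired | Y2 = 2) = f(1, 2) / f(Y2 = 2)
p_impaired_given_2 = joint[1][2] / f_y2[2]
print(round(p_impaired_given_2, 3))  # 0.103

# Full conditional f(y1 | y2): divide each column by its marginal
cond = [[joint[i][j] / f_y2[j] for j in range(4)] for i in range(2)]
for j in range(4):
    assert abs(cond[0][j] + cond[1][j] - 1.0) < 1e-12  # columns sum to 1
```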

Multivariate Parameters and Covariance Matrix

To define a multivariate distribution (like the Multivariate Normal), we need to generalize the concepts of Mean and Variance.

Mean Vector ($\boldsymbol{\mu}$)

The expected value of each individual random variable is defined by the integral: $$ E(Y_i) = \int_{-\infty}^{\infty} y_i f(y_i) dy_i $$

The Mean Vector collects these individual means for the random vector $\mathbf{Y}$: $$ E[\mathbf{Y}] = \begin{bmatrix} E(Y_1) \\ E(Y_2) \\ \vdots \\ E(Y_n) \end{bmatrix} $$

Variance and Covariance

The variance of a single variable is: $$ Var(Y_i) = \int_{-\infty}^{\infty} (y_i - E(Y_i))^2 f(y_i) dy_i $$ This measures the spread of $Y_i$ around its own mean.

However, variables can also vary together. This co-movement is measured by covariance: $$ Cov(Y_1, Y_2) = \iint (y_1 - E(Y_1))(y_2 - E(Y_2)) f(y_1, y_2) dy_1 dy_2 $$

  • Positive Covariance: When $Y_1 > E(Y_1)$, $Y_2$ also tends to be $> E(Y_2)$.
  • Correlation coefficient: A standardized version of covariance, bounded between $-1$ and $+1$: $$ Corr(Y_1, Y_2) = \frac{Cov(Y_1, Y_2)}{\sqrt{Var(Y_1)Var(Y_2)}} $$

The Covariance Matrix ($\boldsymbol{\Sigma}$)

For a random vector $\mathbf{Y} = [Y_1, Y_2, Y_3]^T$, all variabilities are organized into a symmetric matrix. Diagonal elements are variances; off-diagonal elements are covariances:

$$ Cov(\mathbf{Y}) = \begin{bmatrix} Var(Y_1) & Cov(Y_1, Y_2) & Cov(Y_1, Y_3) \\ Cov(Y_2, Y_1) & Var(Y_2) & Cov(Y_2, Y_3) \\ Cov(Y_3, Y_1) & Cov(Y_3, Y_2) & Var(Y_3) \end{bmatrix} $$

Pairwise Scatter Matrix Visualization
Visualizing Covariance: A Pairwise Scatter Matrix. 1. Diagonal (Blue): Histograms show the variance ($Var(Y_i)$) of each variable. 2. Off-Diagonal (Gray): Scatter plots show the relationship between pairs. 3. Interpretation: A slope indicates correlation (Positive Covariance), while a round cloud indicates independence (Zero Covariance).

$Cov(Y_i, Y_j) = Cov(Y_j, Y_i)$, so the matrix is always symmetric across the diagonal.

Key Properties of the Covariance Matrix:

  1. Diagonal elements $\ge 0$: Variances cannot be negative.
  2. Symmetry: $\Sigma_{ij} = \Sigma_{ji}$.
  3. Positive Semidefinite: For any non-zero vector $\mathbf{a}$: $$ \mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a} \ge 0 $$ Intuitively, this means you can never combine the variables in a way that produces negative variance — no matter which direction you “look” in parameter space, the spread is zero or positive. If this condition is violated, the matrix does not represent a valid distribution.
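These properties can be checked directly. The sketch below uses a hypothetical 3×3 covariance matrix and evaluates the quadratic form $\mathbf{a}^T \boldsymbol{\Sigma} \mathbf{a}$ in many random directions (a spot-check; a full verification would examine the eigenvalues):

```python
import random

random.seed(0)

# A hypothetical 3x3 covariance matrix (variances on the diagonal)
sigma = [
    [4.0, 1.2, 0.5],
    [1.2, 2.0, 0.3],
    [0.5, 0.3, 1.0],
]

# Property 2 - symmetry: Sigma_ij == Sigma_ji
assert all(sigma[i][j] == sigma[j][i] for i in range(3) for j in range(3))

# Property 3 - positive semidefiniteness: a^T Sigma a >= 0
for _ in range(1_000):
    a = [random.uniform(-1, 1) for _ in range(3)]
    quad = sum(a[i] * sigma[i][j] * a[j] for i in range(3) for j in range(3))
    assert quad >= 0.0
print("symmetric, and a^T Sigma a >= 0 in every sampled direction")
```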

The Multivariate Normal: When Parameters Are Correlated

In population models, we estimate the variability in several parameters simultaneously (e.g., Clearance and Volume). These are rarely independent — a patient with a large Volume might also have a large Clearance.

The Multivariate Normal (MVN) distribution extends the Normal distribution to multiple dimensions. Its characteristic “bell” shape is centered at $\boldsymbol{\mu}$, and its orientation and spread are determined by the covariance matrix $\boldsymbol{\Sigma}$.

3D Visualization of Joint, Marginal, and Conditional Distributions
Anatomy of a Bivariate Normal Distribution: 1. The mesh shows the Joint density f(y₁, y₂). 2. The blue filled curve on the wall is the Marginal density f(y₂). 3. The red line and orange projection show the Conditional density f(y₁ | y₂).

From One to Many Dimensions

Recall the univariate Normal PDF for a single variable $x$:

$$f(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2}\frac{(x-\mu)^2}{\sigma^2} \right)$$

The multivariate extension for a vector $\boldsymbol{\eta}$ of $p$ random effects replaces scalar operations with matrix operations:

$$f(\boldsymbol{\eta}) = \frac{1}{(2\pi)^{p/2} |\boldsymbol{\Omega}|^{1/2}} \exp\left( -\frac{1}{2} \boldsymbol{\eta}^T \boldsymbol{\Omega}^{-1} \boldsymbol{\eta} \right)$$

Where:

  • $\boldsymbol{\eta}$ is the vector of random effects (e.g., $[\eta_{CL}, \eta_V]$).
  • $\boldsymbol{\Omega}$ is the variance-covariance matrix. Its diagonal contains the variances ($\omega^2_{CL}$, $\omega^2_V$), and its off-diagonal elements contain the covariance ($\text{cov}(CL, V)$).
The $\Omega$ Matrix is King: In NONMEM, the `$OMEGA` block directly specifies this matrix. A diagonal $\Omega$ assumes independence between random effects; a full block $\Omega$ allows them to be correlated. Choosing between the two is a modeling decision with real consequences.
Multivariate Normal Distribution
Bivariate Normal Distribution with positive correlation. Each ellipse is a contour of equal probability density. The tilt reveals the correlation encoded in the off-diagonal elements of $\Omega$.

Reading the contour plot:

  • Each ellipse connects points of equal probability density — think of it as slicing the 3D bell horizontally. The innermost ellipse encloses the region of highest density.
  • A circular contour means $\Omega$ is diagonal (no correlation; the two parameters vary independently).
  • A tilted ellipse means $\Omega$ has non-zero off-diagonal elements (correlation). Here the ellipse tilts upward to the right, indicating positive correlation — when one parameter is above its mean, the other tends to be as well.
  • The degree of elongation reflects the strength of correlation: a narrow, cigar-shaped ellipse means strong correlation ($|r| \to 1$), while a rounder ellipse means weaker correlation ($|r| \to 0$).
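Sampling from a bivariate Normal makes the link between $\Omega$ and correlation concrete. The sketch below uses a hypothetical $\Omega$ (variances 0.09, covariance 0.045, so correlation 0.5) and correlates two independent standard Normal draws through a 2×2 Cholesky factor:

```python
import math
import random

random.seed(42)

# Hypothetical Omega: variances 0.09 for both etas, covariance 0.045,
# so the correlation is 0.045 / 0.09 = 0.5 (a tilted ellipse).
w11, w22, w12 = 0.09, 0.09, 0.045

# 2x2 Cholesky factor of Omega
l11 = math.sqrt(w11)
l21 = w12 / l11
l22 = math.sqrt(w22 - l21 ** 2)

def draw_eta():
    z1, z2 = random.gauss(0, 1), random.gauss(0, 1)
    return l11 * z1, l21 * z1 + l22 * z2

etas = [draw_eta() for _ in range(100_000)]
m1 = sum(e[0] for e in etas) / len(etas)
m2 = sum(e[1] for e in etas) / len(etas)
v1 = sum((e[0] - m1) ** 2 for e in etas) / len(etas)
v2 = sum((e[1] - m2) ** 2 for e in etas) / len(etas)
cov = sum((e[0] - m1) * (e[1] - m2) for e in etas) / len(etas)
r = cov / math.sqrt(v1 * v2)
print(round(r, 2))  # close to the 0.5 encoded in Omega
```

A scatter plot of these draws would reproduce the tilted elliptical cloud described above.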

Summary

  • Discrete distributions (Binomial, Poisson) handle count data and binary outcomes.
  • Continuous distributions (Normal, Log-Normal) are the workhorses for PK parameter modeling and error terms.
  • The Multivariate Normal distribution describes how multiple random effects vary together and is defined by the $\Omega$ matrix.

What’s Next

We now know what distributions look like. But how do we estimate their parameters from data? In the next post, we tackle Estimation Theory — what makes a good estimator, what happens when we sample, and what “Bias” really means.