Normal-gamma distribution

in short

The Normal-gamma distribution serves as a conjugate prior for a Normal distribution with unknown mean and standard deviation. The prior can be specified through a prior estimate of the mean and of the variance, each with an associated number of (pseudo-)observations.

[Figure: probability density functions of a Normal distribution, a log-Normal distribution and a Gamma distribution.]

The Normal-gamma distribution $(\mu,\lambda)\sim\mathcal{NG}\left(\gamma,\kappa,\eta,\xi^2\right)$ is a bivariate continuous probability distribution with four parameters ($\gamma,\kappa,\eta,\xi^2$). It is the conjugate prior of a Normal distribution $X\sim \mathcal{N}\left(\mu,\sigma=\sqrt{\lambda^{-1}}\right)$ with unknown mean $\mu$ and unknown precision $\lambda$, where the precision is the reciprocal of the variance: $\lambda = 1/\sigma^2$.

Definition

The pair of random variables $(\mu,\lambda)$ follows a Normal-gamma distribution $\mathcal{NG}\left(\gamma,\kappa,\eta,\xi^2\right)$ under the following conditions: (A) The conditional distribution $\mu|\lambda$ is a Normal distribution $\mathcal{N}\left(\gamma,\sigma=1/\sqrt{\kappa\lambda}\right)$ with mean $\gamma$ and standard deviation $\sigma=1/\sqrt{\kappa\lambda}$. (B) The marginal distribution of $\lambda$ is a Gamma distribution with shape parameter $\alpha=\frac{\eta}{2}$ and rate parameter $\beta=\frac{\eta\xi^2}{2}$.

With $\alpha=\frac{\eta}{2}$ and $\beta=\frac{\eta\xi^2}{2}$ as in condition (B), the probability density function of the Normal-gamma distribution is $$
f\left(\mu,\lambda|\gamma,\kappa,\alpha,\beta\right) = C \cdot \lambda^{\alpha-\frac{1}{2}} \cdot
\exp\left( -\lambda \beta - \frac{\kappa\lambda}{2}\left(\mu-\gamma\right)^2 \right)
\;,$$ with scaling constant $C$: $$
C = \frac{\beta^\alpha \sqrt{\kappa}}{\Gamma(\alpha)\sqrt{2\pi}}\;.
$$
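
The density above can be checked numerically against conditions (A) and (B). The following is a minimal sketch using only the Python standard library; the function names and the parameter values are illustrative assumptions, not part of the original definition.

```python
import math


def normal_gamma_pdf(mu, lam, gamma, kappa, eta, xi2):
    """Joint density of NG(gamma, kappa, eta, xi2) at (mu, lam); requires lam > 0."""
    alpha = eta / 2.0        # Gamma shape parameter
    beta = eta * xi2 / 2.0   # Gamma rate parameter
    # Work in log space so that Gamma(alpha) does not overflow for large alpha.
    log_c = (alpha * math.log(beta) + 0.5 * math.log(kappa)
             - math.lgamma(alpha) - 0.5 * math.log(2.0 * math.pi))
    log_f = (log_c + (alpha - 0.5) * math.log(lam)
             - lam * beta - 0.5 * kappa * lam * (mu - gamma) ** 2)
    return math.exp(log_f)


def normal_pdf(x, mean, sd):
    """Density of a Normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def gamma_pdf(x, shape, rate):
    """Density of a Gamma distribution in the shape/rate parametrization."""
    return math.exp(shape * math.log(rate) + (shape - 1.0) * math.log(x)
                    - rate * x - math.lgamma(shape))


# Illustrative parameter values (assumed for this example only).
gamma, kappa, eta, xi2 = 10.0, 5.0, 10.0, 4.0
mu, lam = 9.5, 0.3

joint = normal_gamma_pdf(mu, lam, gamma, kappa, eta, xi2)
# Conditions (A) and (B): Normal density of mu given lam, times Gamma density of lam.
product = (normal_pdf(mu, gamma, 1.0 / math.sqrt(kappa * lam))
           * gamma_pdf(lam, eta / 2.0, eta * xi2 / 2.0))
print(joint, product)  # the two values should agree up to rounding
```

Evaluating the density in log space is a design choice for numerical robustness; for small $\alpha$ a direct evaluation would work equally well.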

Marginal distributions for $\mu$ and $\lambda$

The marginal distribution for $\mu$ is a non-standardized Student's t distribution with $\eta$ degrees of freedom, location parameter $\gamma$ and scale parameter $\sqrt{\frac{\xi^2}{\kappa}}$. Thus, the standard deviation of $\mu$ is $\sqrt{\frac{\xi^2}{\kappa}\cdot\frac{\eta}{\eta-2}}$ (for $\eta>2$).

The marginal distribution for $\lambda$ is by definition a Gamma distribution with shape parameter $\alpha=\frac{\eta}{2}$ and rate parameter $\beta=\frac{\eta\xi^2}{2}$. Consequently, the marginal distribution for $\sigma^2=\lambda^{-1}$ follows an inverse gamma distribution. The mean value of $\sigma^2$ is $\xi^2\cdot\frac{\eta}{\eta-2}$ (for $\eta>2$). The coefficient of variation of $\sigma^2$ is $\frac{1}{\sqrt{\eta/2-2}}$ (for $\eta>4$).
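
As a small sketch of how these summary statistics can be evaluated, the snippet below computes them from $(\kappa,\eta,\xi^2)$ and cross-checks the results for $\sigma^2$ against the standard moments of an inverse gamma distribution with shape $\alpha$ and scale $\beta$. The helper name `marginal_summaries` and the parameter values are assumptions made for illustration.

```python
import math


def marginal_summaries(kappa, eta, xi2):
    """Return (sd of mu, mean of sigma^2, coefficient of variation of sigma^2).

    Requires eta > 4 so that all three quantities exist.
    """
    if eta <= 4:
        raise ValueError("eta must exceed 4 for the coefficient of variation to exist")
    sd_mu = math.sqrt(xi2 / kappa * eta / (eta - 2.0))
    mean_var = xi2 * eta / (eta - 2.0)
    cov_var = 1.0 / math.sqrt(eta / 2.0 - 2.0)
    return sd_mu, mean_var, cov_var


# Illustrative parameter values (assumed for this example only).
kappa, eta, xi2 = 5.0, 10.0, 4.0
sd_mu, mean_var, cov_var = marginal_summaries(kappa, eta, xi2)

# Cross-check via the inverse gamma moments in the (alpha, beta) parametrization:
# mean = beta / (alpha - 1), sd = beta / ((alpha - 1) * sqrt(alpha - 2)).
alpha, beta = eta / 2.0, eta * xi2 / 2.0
ig_mean = beta / (alpha - 1.0)
ig_sd = beta / ((alpha - 1.0) * math.sqrt(alpha - 2.0))

print(sd_mu, mean_var, cov_var)
print(ig_mean, ig_sd / ig_mean)
```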

Predictive distribution for $X$

The predictive distribution for $X$ is a location-scale t distribution (i.e., a non-standardized Student's t distribution) with:

degrees of freedom: $\eta$,
location parameter: $\gamma$,
scale parameter: $\tau = \xi\sqrt{1+\frac{1}{\kappa}}=\sqrt{\xi^2\cdot\left(1+\frac{1}{\kappa}\right)}$.

Equivalently, $X$ can be written in terms of a random variable $T$ that follows a standard Student's t distribution with $\eta$ degrees of freedom: $X=\gamma+\tau T$.

The predictive distribution for $X$ has the following properties:

mean value: $\gamma$ (for $\eta>1$)
standard deviation: $\tau\sqrt{\frac{\eta}{\eta-2}} = \sqrt{\xi^2\cdot\left(1+\frac{1}{\kappa}\right)\cdot\frac{\eta}{\eta-2}}$ (for $\eta>2$)
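
The predictive properties listed above can be checked by simulation. The sketch below draws $\lambda$, then $\mu$ given $\lambda$, then $X$ given $(\mu,\lambda)$, and compares the sample mean and standard deviation with the closed-form values. It is a Monte Carlo illustration under assumed parameter values, so the agreement is only approximate; the function name `sample_predictive` is hypothetical.

```python
import math
import random
import statistics


def sample_predictive(gamma, kappa, eta, xi2, size, seed=0):
    """Draw `size` samples of X by sampling lam, then mu | lam, then X | mu, lam."""
    rng = random.Random(seed)
    alpha = eta / 2.0          # Gamma shape
    beta = eta * xi2 / 2.0     # Gamma rate
    draws = []
    for _ in range(size):
        # random.gammavariate expects a scale parameter, i.e. 1 / rate.
        lam = rng.gammavariate(alpha, 1.0 / beta)
        mu = rng.gauss(gamma, 1.0 / math.sqrt(kappa * lam))
        draws.append(rng.gauss(mu, 1.0 / math.sqrt(lam)))
    return draws


# Illustrative parameter values (assumed for this example only).
gamma, kappa, eta, xi2 = 10.0, 5.0, 10.0, 4.0

tau = math.sqrt(xi2 * (1.0 + 1.0 / kappa))        # scale of the predictive t distribution
sd_closed = tau * math.sqrt(eta / (eta - 2.0))    # closed-form standard deviation

draws = sample_predictive(gamma, kappa, eta, xi2, size=100_000)
print(statistics.fmean(draws), gamma)
print(statistics.pstdev(draws), sd_closed)
```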

Bayesian analysis based on the Normal-gamma distribution

Posterior distribution of the parameters

Assume a data vector $\mathbf{x} = \left[x_1,x_2,\ldots,x_n\right]$, where the individual elements $x_i$ of $\mathbf{x}$ are distributed according to a Normal distribution with unknown mean $\mu$ and standard deviation $\sigma=\sqrt{\lambda^{-1}}$. If we select a Normal-gamma distribution $\mathcal{NG}\left(\gamma_0,\kappa_0,\eta_0,\xi^2_0\right)$ as prior distribution for $(\mu,\lambda)$, the posterior distribution for $(\mu,\lambda)$ is also a Normal-gamma distribution; i.e., $(\mu,\lambda)|\mathbf{x} \sim \mathcal{NG}\left(\gamma_n,\kappa_n,\eta_n,\xi^2_n\right)$.

Posterior values of the parameters of the Normal-gamma distribution can be obtained as follows: $$ \gamma_n = \frac{\kappa_0\gamma_0+nm}{\kappa_0+n} \;,$$ $$
\kappa_n = \kappa_0 + n \;,$$ $$
\eta_n = \eta_0 + n \;,$$ $$
\xi^2_n = \frac{\eta_0}{\eta_0+n} \xi^2_0 + \frac{n}{\eta_0+n} \left( s^2 + \frac{\kappa_0 \left(m-\gamma_0\right)^2}{\kappa_0+n} \right) \;,
$$ with $$
m = \frac{1}{n}\sum_{i=1}^n x_i $$ and $$
s^2 = \frac{1}{n} \sum_{i=1}^n \left(x_i-m\right)^2\;.$$
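
The updating rules above translate directly into code. The following is a minimal sketch; the names `NGParams` and `update`, as well as the prior values and the data, are assumptions chosen for illustration.

```python
from typing import NamedTuple, Sequence


class NGParams(NamedTuple):
    """Parameters of NG(gamma, kappa, eta, xi2)."""
    gamma: float
    kappa: float
    eta: float
    xi2: float


def update(prior: NGParams, x: Sequence[float]) -> NGParams:
    """Return the posterior Normal-gamma parameters given the observations x."""
    n = len(x)
    if n == 0:
        return prior  # no data: the posterior equals the prior
    m = sum(x) / n                              # sample mean
    s2 = sum((xi - m) ** 2 for xi in x) / n     # sample variance with divisor n
    kappa_n = prior.kappa + n
    eta_n = prior.eta + n
    gamma_n = (prior.kappa * prior.gamma + n * m) / kappa_n
    xi2_n = (prior.eta * prior.xi2
             + n * (s2 + prior.kappa * (m - prior.gamma) ** 2 / kappa_n)) / eta_n
    return NGParams(gamma_n, kappa_n, eta_n, xi2_n)


# Illustrative prior and data (assumed for this example only).
prior = NGParams(gamma=12.0, kappa=5.0, eta=5.0, xi2=4.0)
data = [9.0, 11.0, 10.0, 12.0, 8.0]   # sample mean 10, sample variance 2

posterior = update(prior, data)
print(posterior)
```

For these values the posterior mean parameter moves from 12 towards the sample mean 10 and lands at 11, because prior and data carry equal weight ($\kappa_0 = n = 5$).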

Interpretation of the parameters

The parameter $\kappa_0$ quantifies the information content in the prior with respect to the mean value; $\kappa_0$ corresponds to the number of (pseudo-)observations associated with the prior (for the mean). For the posterior distribution, the parameter value $\kappa_n$ is obtained by adding the number of observed samples $n$ to $\kappa_0$.

The posterior predictive mean of $X$ (which coincides with the posterior mean of $\mu$) is equal to $\gamma_n$. It is a weighted average of the prior mean $\gamma_0$ and the observed sample mean $m$, where $\kappa_0$ and $n$ serve as weights.

The parameter $\eta_0$ quantifies the information content in the prior with respect to the precision (or variance); $\eta_0$ corresponds to the number of (pseudo-)observations associated with the prior (for the precision/variance). For the posterior distribution, the parameter value $\eta_n$ is obtained by adding the number of observed samples $n$ to $\eta_0$.

The posterior mean of the precision $\lambda$ is equal to $\frac{1}{\xi^2_n}$. The posterior mean of the variance $\sigma^2$ is $\xi^2_n\cdot\frac{\eta_n}{\eta_n-2}$ (for $\eta_n>2$). The posterior value $\xi^2_n$ is a weighted average of the prior value $\xi^2_0$ and the observed sample variance $s^2$ (plus an interaction term that accounts for the difference between the prior mean $\gamma_0$ and the sample mean $m$), where $\eta_0$ and $n$ serve as weights.
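
To make the weighting concrete, consider a purely illustrative example (all numbers are assumed, not taken from any data set): a prior with $\gamma_0=30$, $\kappa_0=2$, $\eta_0=4$, $\xi^2_0=9$ and $n=8$ observations with sample mean $m=35$ and sample variance $s^2=16$. The posterior parameters are then $$
\gamma_n = \frac{2\cdot 30 + 8\cdot 35}{2+8} = 34 \;,\qquad
\kappa_n = 10 \;,\qquad
\eta_n = 12 \;,$$ $$
\xi^2_n = \frac{4}{12}\cdot 9 + \frac{8}{12}\left(16 + \frac{2\cdot\left(35-30\right)^2}{2+8}\right) = 3 + \frac{2}{3}\cdot 21 = 17 \;.
$$ Accordingly, the posterior mean of the precision is $\frac{1}{17}\approx 0.059$ and the posterior mean of the variance is $17\cdot\frac{12}{10} = 20.4$. Because $n$ exceeds both $\kappa_0$ and $\eta_0$, the posterior values lie closer to the sample statistics than to the prior values.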

Non-informative prior

A non-informative prior (that is also used in Annex D.7 of Eurocode 0 to estimate characteristic values of material properties) is obtained by setting $\kappa_0=0$ and $\eta_0=0$. Values for $\gamma_0$ and $\xi^2_0$ do not need to be specified for this particular case. The resulting prior for both $\mu$ and $\lambda$ is improper.
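
Substituting $\kappa_0=0$ and $\eta_0=0$ into the updating rules for the posterior parameters shows what this choice implies: the prior terms drop out and the posterior is determined by the sample statistics alone, $$
\gamma_n = m \;,\qquad
\kappa_n = n \;,\qquad
\eta_n = n \;,\qquad
\xi^2_n = s^2 \;.
$$ The interaction term in $\xi^2_n$ vanishes because it is proportional to $\kappa_0$. Inserting these values into the predictive distribution gives, under the model presented here, a location-scale t distribution with $n$ degrees of freedom, location parameter $m$ and scale parameter $s\sqrt{1+\frac{1}{n}}$.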

Alternative parametrizations

Variant 1

Usually, the Normal-gamma distribution is parametrized in terms of the parameters $\gamma,\kappa,\alpha,\beta$ instead of the (easier to interpret) parameters $\gamma,\kappa,\eta,\xi^2$. The relation between $\alpha, \beta$ and $\eta,\xi^2$ is: $$
\eta = 2\alpha
\Leftrightarrow
\alpha = \frac{\eta}{2} \;,
$$ $$
\xi^2 = \frac{\beta}{\alpha}
\Leftrightarrow
\beta = \frac{\eta \xi^2}{2} \;.
$$

The Bayesian updating rules for this alternative parametrization are: $$ \gamma_n = \frac{\kappa_0\gamma_0+nm}{\kappa_0+n} \;,$$ $$
\kappa_n = \kappa_0 + n \;,$$ $$
\alpha_n = \alpha_0 + \frac{n}{2} \;,$$ $$
\beta_n = \beta_0 + \frac{1}{2} \left( n s^2 + \frac{\kappa_0 n \left(m-\gamma_0\right)^2}{\kappa_0+n} \right) \;.$$
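
A short sketch of the conversion between the two parametrizations, together with the update in terms of $\alpha$ and $\beta$, is given below. It also verifies for one illustrative case that converting the updated $(\alpha_n,\beta_n)$ back yields $\eta_n=\eta_0+n$ and the expected $\xi^2_n$. All function names and numbers are assumptions made for this example.

```python
import math


def to_alpha_beta(eta, xi2):
    """Convert (eta, xi2) to the shape/rate pair (alpha, beta)."""
    return eta / 2.0, eta * xi2 / 2.0


def to_eta_xi2(alpha, beta):
    """Convert the shape/rate pair (alpha, beta) back to (eta, xi2)."""
    return 2.0 * alpha, beta / alpha


def update_alpha_beta(gamma0, kappa0, alpha0, beta0, n, m, s2):
    """Bayesian update in the (gamma, kappa, alpha, beta) parametrization.

    n, m and s2 are the sample size, sample mean and sample variance (divisor n).
    """
    kappa_n = kappa0 + n
    gamma_n = (kappa0 * gamma0 + n * m) / kappa_n
    alpha_n = alpha0 + n / 2.0
    beta_n = beta0 + 0.5 * (n * s2 + kappa0 * n * (m - gamma0) ** 2 / kappa_n)
    return gamma_n, kappa_n, alpha_n, beta_n


# Illustrative prior and sample statistics (assumed for this example only).
gamma0, kappa0, eta0, xi2_0 = 12.0, 5.0, 5.0, 4.0
n, m, s2 = 5, 10.0, 2.0

alpha0, beta0 = to_alpha_beta(eta0, xi2_0)
gamma_n, kappa_n, alpha_n, beta_n = update_alpha_beta(gamma0, kappa0, alpha0, beta0, n, m, s2)
eta_n, xi2_n = to_eta_xi2(alpha_n, beta_n)

# Reference value from the weighted-average form of the update for xi2.
xi2_ref = (eta0 * xi2_0 + n * (s2 + kappa0 * (m - gamma0) ** 2 / (kappa0 + n))) / (eta0 + n)
print(eta_n, xi2_n, xi2_ref)
```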

Variant 2

[JCSS, Part 3, material properties] uses a slightly different notation. The prior parameter values specified in JCSS map to the probabilistic model presented in this post as follows:
$\gamma_0 = m^\prime$
$\kappa_0 = n^\prime$
$\eta_0 = v^\prime$
$\xi^2_0 = \left(s^\prime\right)^2$

subject area

Probabilistic modeling
