How to evaluate the first-order sensitivity indices numerically?

  in short
The expected value of the model output conditional on fixing a specific parameter is the basis for evaluating the first-order Sobol index associated with that parameter.

The first-order sensitivity indices are a popular sensitivity measure for identifying the most influential uncertain parameters of a probabilistic model. A property that makes first-order sensitivity indices well suited for practical applications is that their evaluation is simple, computationally cheap, and easy to interpret.

The first-order sensitivity indices can be evaluated in a simple post-processing step based on the samples generated in a Monte Carlo simulation; no additional model runs are required. Let's assume we generated $N$ independent stochastic realizations of the uncertain vector of input parameters $\mathbf{X}$, namely $\{\mathbf{x}_1,\mathbf{x}_2,\ldots,\mathbf{x}_N\}$. For each sampled input vector $\mathbf{x}_i$, the associated realization of the model output is available as $y_i = h(\mathbf{x}_i)$; i.e., $\{y_1,y_2,\ldots,y_N\}$, where $h$ denotes the model that maps the input vector $\mathbf{X}$ to the output $Y$.
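As a minimal sketch in Python of this sampling step (the article does not prescribe a specific model; the Ishigami function below is a hypothetical stand-in that is widely used as a sensitivity-analysis benchmark):

```python
import numpy as np

# Hypothetical example model h: the Ishigami function, a standard
# sensitivity-analysis test case (any model h could be used instead).
def h(x):
    a, b = 7.0, 0.1
    return (np.sin(x[:, 0])
            + a * np.sin(x[:, 1]) ** 2
            + b * x[:, 2] ** 4 * np.sin(x[:, 0]))

rng = np.random.default_rng(0)
N = 10_000
# N independent realizations of X; the Ishigami inputs are uniform on [-pi, pi]
x = rng.uniform(-np.pi, np.pi, size=(N, 3))
y = h(x)  # associated model outputs {y_1, ..., y_N}
```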

Mathematical framework

Remember the definition of the first-order sensitivity index $S_i$ of model input parameter $X_i$:
$$
S_i = \frac{V_i}{\operatorname{Var}[Y]} = \frac{\operatorname{Var}_{X_i}\left[ \operatorname{E}_{\mathbf{X}_{\sim i}}\left[Y|X_i\right] \right]}{\operatorname{Var}[Y]} \;,
$$ where $\operatorname{E}_{\mathbf{X}_{\sim i}}\left[Y|X_i\right]$ is the expected value of the model output with respect to all input variables except $X_i$ (which is fixed). Note that $\operatorname{E}_{\mathbf{X}_{\sim i}}\left[Y|X_i\right] = a_i(X_i)$ is a one-dimensional function of model input parameter $X_i$. Based on knowledge about $a_i(\cdot)$, the quantity $V_i$ can be evaluated by the following one-dimensional integral:
$$
V_i = \int_{\Omega_i} \left(a_i(x_i) - \operatorname{E}[Y]\right)^2\cdot f_{X_i}(x_i) \,\operatorname{d}x_i \;,
$$ where ${\Omega_i}$ is the domain of $X_i$, $f_{X_i}(\cdot)$ is the probability density function of model input parameter $X_i$ and the expected model output $\operatorname{E}[Y]$ can be estimated through the sample mean as:
$$
\operatorname{E}[Y] = \frac{1}{N}\sum_{i=1}^N y_i\;.
$$

Estimating the conditional expectation

The problem is that normally we do not have explicit knowledge about $a_i(\cdot)$. Nevertheless, we can estimate it from the available Monte Carlo samples. Arbitrarily complex methods can be applied to learn the conditional expectation $a_i(\cdot)$. A simple and straightforward approach is to assume that $a_i(\cdot)$ is piecewise constant; i.e., to discretize $\Omega_i$ into $K$ mutually exclusive and collectively exhaustive bins $\{B_{i,1},B_{i,2},\ldots,B_{i,K}\}$ and to take $a_i(\cdot)$ as constant within each bin, so that $a_i(x_i) = a_{i,j} \,\forall\, x_i \in B_{i,j}$. The constant $a_{i,j}$ can be estimated as the sample mean of all model outputs whose associated sample of $X_i$ falls into bin $B_{i,j}$.
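The piecewise-constant estimate of $a_i(\cdot)$ can be sketched as follows; the function name and the choice of equal-width bins over the observed sample range are illustrative assumptions, not part of the article:

```python
import numpy as np

def binned_conditional_mean(x_i, y, K):
    """Piecewise-constant estimate of a_i(x_i) = E[Y | X_i], using K
    equal-width bins over the observed range of the samples x_i."""
    edges = np.linspace(x_i.min(), x_i.max(), K + 1)
    # Assign every sample to a bin index in {0, ..., K-1}
    idx = np.clip(np.digitize(x_i, edges) - 1, 0, K - 1)
    # a_{i,j}: sample mean of the outputs falling into bin B_{i,j}
    a = np.array([y[idx == j].mean() for j in range(K)])
    counts = np.bincount(idx, minlength=K)
    return edges, a, counts
```

For a model that depends linearly on $X_i$, the bin means $a_{i,j}$ recover the linear trend at the bin centers.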

A frequently employed discretization strategy is to select the bins such that approximately the same number of samples falls into each bin (compare [Straub, 2019] and Section 4.1.7 in [Prieur and Tarantola, 2017]). The total number $K$ of bins can be increased with the available number $N$ of samples; a general rule of thumb is $K=\sqrt{N}$. However, $K>40$ is usually not beneficial for most problems.
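Putting the pieces together, an estimator of $S_i$ with approximately equal-count bins and the $K=\sqrt{N}$ rule of thumb (capped at 40) might look like the following sketch; the function name is an illustrative assumption:

```python
import numpy as np

def first_order_index(x_i, y, K=None):
    """Estimate the first-order sensitivity index S_i from Monte Carlo
    samples by binning X_i into K bins with (approximately) equal
    numbers of samples per bin."""
    N = len(y)
    if K is None:
        K = min(int(np.sqrt(N)), 40)  # rule of thumb, capped at 40
    order = np.argsort(x_i)
    # Sort the outputs by their x_i value and split into K
    # nearly equal-count bins
    bins = np.array_split(y[order], K)
    a = np.array([b.mean() for b in bins])      # a_{i,j} per bin
    w = np.array([len(b) for b in bins]) / N    # probability mass per bin
    V_i = np.sum(w * (a - y.mean()) ** 2)       # variance of cond. mean
    return V_i / y.var()
```

As a sanity check, an additive test model such as $Y = X_1 + 2X_2$ with independent standard normal inputs has exact indices $S_1 = 1/5$ and $S_2 = 4/5$, which the estimator reproduces closely for moderate $N$.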

Alternative sampling-based strategy

The approach explained above evaluates the first-order sensitivity indices as a post-processing step of an already conducted Monte Carlo simulation; i.e., besides the model calls performed during the Monte Carlo simulation, no additional model calls are needed. The required number of samples does not depend on the total number $M$ of model parameters. Note that this approach differs from the Monte Carlo based strategy outlined, e.g., in [Saltelli, 2008], which requires $N\cdot(M+2)$ model calls and cannot be applied as a post-processing step.

References

[Straub, 2019]  Straub, Daniel: Lecture Notes in Engineering Risk Analysis. Technische Universität München, 2019.

[Lando and Ortobelli, 2015] Lando, Tommaso, and Ortobelli, Sergio: On the approximation of a conditional expectation. WSEAS Transactions on Mathematics 14 (2015): 237-247.

[Prieur and Tarantola, 2017] Prieur, Clémentine, and Tarantola, Stefano: Variance-based sensitivity analysis: Theory and estimation algorithms. Handbook of Uncertainty Quantification (2017): 1217-1239.

[Saltelli, 2008] Saltelli, Andrea, et al.: Global Sensitivity Analysis: The Primer. John Wiley & Sons, 2008.
