We will now follow up our previous post on automatic MCMC hyperparameter sensitivity with a detailed use case taken from the Stan examples. All the code to produce the results below, as well as more examples, can be found in the `examples` folder of the rgiordan/StanSensitivity git repo.

This simple, real-life example has a number of interesting features:

- The authors accidentally included an extremely informative prior,
- The informative prior masked a real numerical problem in MCMC,
- Fixing the model and prior qualitatively changed the conclusion of the analysis.

In other words, **formal sensitivity analysis can keep you from drawing the wrong conclusion from your MCMC analysis**. It's easy to accidentally choose excessively informative priors. We can help you discover them!

The use case we'll focus on is based on chapter 13.5 of the excellent book, Data Analysis Using Regression and Multilevel/Hierarchical Models (Gelman and Hill). In particular, we'll look at the `earnings_vary_si.stan` model from the Stan examples.

The data consists of 1059 observations of earnings and height for four different ethnicities. We will be regressing log earnings (`earn`) on height (`height`) within each ethnicity (`eth`) in a hierarchical model, with a different slope and intercept for each ethnicity. The model in the Stan language is as follows (the data and hyperparameters are omitted for brevity -- hopefully, their definitions are clear from context):

```
parameters {
  vector[4] a1;
  vector[4] a2;
  real<lower=0> sigma_a1;
  real<lower=0> sigma_a2;
  real<lower=0> sigma_y;
  real mu_a1;
  real mu_a2;
}
model {
  vector[N] y_hat;
  for (i in 1:N)
    y_hat[i] = a1[eth[i]] + a2[eth[i]] * height[i];
  mu_a1 ~ normal(mu_a1_loc, mu_a1_scale);
  mu_a2 ~ normal(mu_a2_loc, mu_a2_scale);
  a1 ~ normal(10 * mu_a1, sigma_a1);
  a2 ~ normal(0.01 * mu_a2, sigma_a2);
  sigma_a1 ~ cauchy(sigma_a1_loc, sigma_a1_scale);
  sigma_a2 ~ cauchy(sigma_a2_loc, sigma_a2_scale);
  sigma_y ~ cauchy(sigma_y_loc, sigma_y_scale);
  log_earn ~ normal(y_hat, sigma_y);
}
```
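To build intuition for the varying-intercept, varying-slope structure, here is a short sketch of our own (not from the original post, and using synthetic stand-in data rather than the real earnings data) that fits a completely unpooled per-group regression with numpy. The hierarchical model above does the same thing per ethnicity, but shrinks the group-level coefficients toward a shared mean:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for the earnings data: 4 groups ("ethnicities"),
# each with its own intercept a1[g] and slope a2[g] on height.
true_a1 = np.array([5.0, 5.5, 6.0, 5.2])
true_a2 = np.array([0.02, 0.03, 0.01, 0.025])
n_per_group = 250

rows = []
for g in range(4):
    height = rng.normal(66.0, 4.0, n_per_group)
    log_earn = true_a1[g] + true_a2[g] * height + rng.normal(0.0, 0.1, n_per_group)
    rows.append((g, height, log_earn))

def per_group_ols(rows):
    """Unpooled fit: separate least squares within each group."""
    estimates = {}
    for g, height, log_earn in rows:
        X = np.column_stack([np.ones_like(height), height])
        coef, *_ = np.linalg.lstsq(X, log_earn, rcond=None)
        estimates[g] = coef  # (intercept, slope)
    return estimates

estimates = per_group_ols(rows)
for g, (a1_hat, a2_hat) in estimates.items():
    print(g, round(a1_hat, 2), round(a2_hat, 3))
```

In the Stan model, `sigma_a2` controls how far the per-group slopes are allowed to spread around their common mean; the unpooled fit above is the limiting case of an infinitely wide prior.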

The original Stan example set all the location hyperparameters (`*_loc`) to 0 and all the scale hyperparameters (`*_scale`) to 1. We will evaluate the model's robustness at these values.

We ran Stan for 10000 MCMC draws, which took 139 seconds. Suppose we want to know whether there is evidence that the relationship between height and earnings is different for different ethnicities. The Stan results for `sigma_a2`, the scale of the variation in the height effect across ethnicities, follow.

```
Inference for Stan model: earnings_vary_si_generated.
1 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=5000.
         mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
sigma_a2 0.06    0.00 0.04 0.02 0.03 0.05 0.07  0.16  2776    1
Samples were drawn using NUTS(diag_e) at Mon Jan 8 11:54:47 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

A notable conclusion is that most of the posterior mass of `sigma_a2` is well separated from zero. This seems to provide evidence that the relationship between height and earnings *is* different for different ethnic groups.

We then ran our sensitivity analysis, which took 7 seconds. The result (after some filtering) was the following graph:
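Why is this so fast? A local sensitivity like this can be estimated from the draws you already have: for a hyperparameter eta, the derivative of a posterior expectation satisfies dE[g(theta)]/d(eta) = Cov(g(theta), d log p(theta, y | eta)/d(eta)), with the covariance taken under the posterior. Below is a short Python sketch of our own (an illustration of the identity, not the package's actual code) that checks it on a toy conjugate normal model, where the exact answer is available in closed form:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy conjugate model: theta ~ N(mu0, tau^2) prior, y_i ~ N(theta, sigma^2).
mu0, tau, sigma = 0.0, 1.0, 2.0
y = rng.normal(1.5, sigma, size=50)
n = len(y)

# Exact posterior for the conjugate normal-normal model.
post_prec = 1.0 / tau**2 + n / sigma**2
post_var = 1.0 / post_prec
post_mean = post_var * (mu0 / tau**2 + y.sum() / sigma**2)

# Exact posterior draws stand in for an MCMC run here.
draws = rng.normal(post_mean, np.sqrt(post_var), size=200_000)

# Local sensitivity of E[theta] to the prior location mu0:
#   dE[theta]/dmu0 = Cov(theta, d log p(theta | mu0) / dmu0),
# and for a normal prior the score is (theta - mu0) / tau^2.
score = (draws - mu0) / tau**2
sens_mcmc = np.cov(draws, score)[0, 1]

# Closed form for this model: the usual shrinkage weight on the prior.
sens_exact = (1.0 / tau**2) / post_prec
print(sens_mcmc, sens_exact)
```

With draws from a real Stan fit, the same estimator needs only the gradient of the log probability with respect to each hyperparameter, evaluated at the existing posterior draws -- no re-running of MCMC.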

There seems to be considerable sensitivity to the `mu_a2` prior, especially in the posterior expectation of `mu_a2` itself. You might actually have a prior belief that `mu_a2_loc` should be close to zero, meaning that a priori we don't believe that height has a strong effect on earnings. However, a strong a priori belief about the scale seems harder to justify -- if we made the prior 4x wider by setting `mu_a2_scale` to 4 instead of 1, which seems subjectively reasonable, we would hope our results would not change. But the sensitivity analysis suggests that the posterior mean of `mu_a2` would change by more than 1.25 * (4 - 1) = 3.75 posterior standard deviations!
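The sensitivities are reported in posterior standard deviations per unit change of the hyperparameter, so extrapolating to a new hyperparameter value is a single multiplication. A minimal sketch of the arithmetic behind the 3.75 figure (the 1.25 sensitivity is read off the graph above):

```python
# Sensitivity of E[mu_a2] to mu_a2_scale, in posterior standard
# deviations per unit change of the hyperparameter.
normalized_sensitivity = 1.25

# Linear extrapolation from mu_a2_scale = 1 to mu_a2_scale = 4.
old_scale, new_scale = 1.0, 4.0
predicted_shift_in_sds = normalized_sensitivity * (new_scale - old_scale)
print(predicted_shift_in_sds)  # 3.75
```

Of course, this is a linear approximation taken at the original hyperparameter values, so a large predicted shift is a flag for closer inspection rather than an exact forecast.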

That seems potentially problematic, so we investigated further by re-running the model with different values of `mu_a2_scale`. This was time-consuming -- all the re-runs together took about 27 minutes -- but at least we knew where to look. Here are the results, which show the linear approximation (and its standard errors) plotted against the actual MCMC posterior means (with their standard errors).

The linear approximation wasn't perfect, but it detected real non-robustness. We additionally noted that, for large values of `mu_a2_scale`, the model started to exhibit divergent transitions, indicating a badly scaled posterior. After more detailed analysis (which you can find in the full Jupyter notebook in the git repo), **we discovered that the model as written in the Stan examples suffered from real numerical problems which were being masked by an excessively informative prior on mu_a2**.

We corrected the model, and re-ran MCMC. The new posterior is qualitatively different from the original -- in particular, once the highly informative prior is removed and the numerical problems are corrected, there is no longer strong posterior evidence that the relationship between height and earnings is different for different ethnicities.

```
Inference for Stan model: earnings_vary_si_corrected_generated.
1 chains, each with iter=10000; warmup=5000; thin=1;
post-warmup draws per chain=5000, total post-warmup draws=5000.
         mean se_mean   sd 2.5%  25%  50%  75% 97.5% n_eff Rhat
sigma_a2 0.01    0.00 0.01 0.00 0.00 0.00 0.01  0.04   734    1
Samples were drawn using NUTS(diag_e) at Mon Jan 8 16:45:48 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at
convergence, Rhat=1).
```

We hope this example has inspired you to examine your own Stan models for hyperparameter sensitivity. Feel free to head over to the git repo rgiordan/StanSensitivity and give it a try!