docs/src/tutorials/constrained.md

In this tutorial, we will demonstrate how to deal with constrained posteriors in more detail.
Formally, by constrained posteriors, we mean that the target posterior has a density defined over a space that does not span the "full" Euclidean space $\mathbb{R}^d$:
```math
\pi : \mathcal{X} \to \mathbb{R}_{> 0} ,
```
where $\mathcal{X} \subset \mathbb{R}^d$ is a strict subset, that is, $\mathcal{X} \neq \mathbb{R}^d$.
For instance, consider the basic hierarchical model for estimating the mean of the data $y_1, \ldots, y_n$:
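A representative specification is the following (the concrete distributions are an assumption made here for illustration; what matters is that the scale parameter is constrained):

```math
\begin{aligned}
\sigma &\sim \mathrm{LogNormal}(0, 1), \\
\mu &\sim \mathrm{Normal}(0, 1), \\
y_i &\sim \mathrm{Normal}(\mu, \sigma), \quad i = 1, \ldots, n .
\end{aligned}
```

Since the scale must satisfy $\sigma > 0$, the posterior over $(\mu, \sigma)$ is supported on $\mathcal{X} = \mathbb{R} \times \mathbb{R}_{>0}$ rather than on all of $\mathbb{R}^2$.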
There are also more complicated examples of constrained spaces.
For example, a $k$-dimensional variable with a Dirichlet prior is constrained to live on the $(k-1)$-dimensional probability simplex.
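Bijectors.jl, which we will use below, ships transformations for such supports out of the box. A quick sketch, assuming `Distributions.Dirichlet` and that `Bijectors.bijector` returns a simplex bijector for it:

```julia
using Bijectors, Distributions

d = Dirichlet(3, 1.0)      # support: the probability simplex in ℝ³
b = Bijectors.bijector(d)  # simplex-to-Euclidean bijector
z = rand(d)                # a draw on the simplex
η = b(z)                   # its unconstrained representation
```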
Now, most algorithms provided by `AdvancedVI`, such as:

  - `KLMinRepGradDescent`
  - `KLMinRepGradProxDescent`
  - `KLMinNaturalGradDescent`
  - `FisherMinBatchMatch`

tend to assume the target posterior is defined over the whole Euclidean space $\mathbb{R}^d$.
Therefore, to apply these algorithms, we need to do something about the constraints.
We will describe some recommended ways of doing this.
## Transforming the Posterior
The most widely applicable way is to transform the posterior $\pi : \mathcal{X} \to \mathbb{R}_{>0}$ to be unconstrained.
That is, consider some bijective map $b : \mathcal{X} \to \mathbb{R}^{d}$ between $\mathcal{X}$ and the associated Euclidean space $\mathbb{R}^{d}$.
Using the inverse map $b^{-1}$ and its Jacobian $\mathrm{J}_{b^{-1}}$, we can apply a change of variables to the posterior and obtain its unconstrained counterpart:
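
```math
\pi_b(\eta) = \pi\big(b^{-1}(\eta)\big) \left|\det \mathrm{J}_{b^{-1}}(\eta)\right| ,
```

where $\pi_b : \mathbb{R}^d \to \mathbb{R}_{>0}$ denotes the transformed, unconstrained posterior density.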
This idea, popularized by Stan[^CGHetal2017] and TensorFlow Probability[^DLTetal2017], is in fact how most probabilistic programming frameworks enable the use of off-the-shelf Markov chain Monte Carlo algorithms.
In the context of variational inference, we will first approximate the unconstrained posterior as

```math
q^* \approx \pi_b ,
```

and then transform draws from $q^*$ back to the constrained space via $b^{-1}$:

```math
z \sim q_{b}^* \quad\Leftrightarrow\quad z \stackrel{\mathrm{d}}{=} b^{-1}(\eta) , \quad \eta \sim q^* .
```

The idea of applying a change of variables to the variational approximation to match a constrained posterior was popularized by automatic differentiation variational inference (ADVI)[^KTRGB2017].
[^KTRGB2017]: Kucukelbir, A., Tran, D., Ranganath, R., Gelman, A., & Blei, D. M. (2017). Automatic differentiation variational inference. *Journal of Machine Learning Research*, 18(14), 1–45.
Now, there are two ways to do this in Julia.
First, let's define the constrained posterior example above using the `LogDensityProblems` interface for illustration:
```julia
# Excerpted: the log density of the model, where θ = (μ, σ), `Mean` wraps the
# data `y`, and the scale σ = θ[2] must be positive. The body below is a
# minimal reconstruction under the model assumed earlier, not verbatim code.
function LogDensityProblems.logdensity(prob::Mean, θ)
    μ, σ = θ[1], θ[2]
    logpdf(Normal(0, 1), μ) + logpdf(LogNormal(0, 1), σ) +
        sum(yᵢ -> logpdf(Normal(μ, σ), yᵢ), prob.y)
end
```
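
Alongside this, the example also defines the model-specific `Bijectors.bijector` method (referred to later in this tutorial). A minimal sketch, assuming the two-coordinate model above and Bijectors.jl's `Stacked` for combining per-coordinate maps:

```julia
using Bijectors, Distributions

# Assumed for illustration: identity on μ, a log-type bijector on σ > 0,
# combined coordinate-wise with `Stacked` so that b : ℝ × ℝ₊ → ℝ².
function Bijectors.bijector(::Mean)
    return Bijectors.Stacked(
        (Bijectors.bijector(Normal(0, 1)), Bijectors.bijector(LogNormal(0, 1))),
        (1:1, 2:2),
    )
end
```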
Refer to the documentation of `Bijectors.jl` for more details.
## Wrap the `LogDensityProblem`
The most general and straightforward way to obtain an unconstrained posterior using a `Bijector` is to wrap our original `LogDensityProblem` to form a new `LogDensityProblem`.
This approach only requires the user to implement the model-specific `Bijectors.bijector` function as above.
The rest can be done by simply copy-pasting the code below:
```@example constraints
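# The original listing is abbreviated in this excerpt. Below is a minimal
# sketch of the wrapper; the type name `TransformedLogDensityProblem` is taken
# from the discussion later in this tutorial, but the implementation details
# here are assumptions for illustration. It evaluates the log density on the
# unconstrained space and adds the Jacobian correction.
using Bijectors

struct TransformedLogDensityProblem{Prob,BInv}
    prob::Prob
    binv::BInv  # inverse bijector mapping ℝᵈ back to the constrained support
end

function TransformedLogDensityProblem(prob)
    binv = Bijectors.inverse(Bijectors.bijector(prob))
    return TransformedLogDensityProblem(prob, binv)
end

# Unconstrained log density: log π(b⁻¹(η)) + log |det J_{b⁻¹}(η)|
function LogDensityProblems.logdensity(t::TransformedLogDensityProblem, η)
    z, logabsdetjac = Bijectors.with_logabsdet_jacobian(t.binv, η)
    return LogDensityProblems.logdensity(t.prob, z) + logabsdetjac
end

LogDensityProblems.dimension(t::TransformedLogDensityProblem) =
    LogDensityProblems.dimension(t.prob)

# `prob` is the constrained problem defined earlier.
prob_trans = TransformedLogDensityProblem(prob)
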
x = randn(LogDensityProblems.dimension(prob_trans)) # sample on the unconstrained space
LogDensityProblems.logdensity(prob_trans, x)
```

We can also wrap `prob_trans` with `LogDensityProblemsAD.ADgradient` to make it differentiable.

```@example constraints
using LogDensityProblemsAD
using ADTypes, ReverseDiff
prob_trans_ad = LogDensityProblemsAD.ADgradient(
    ADTypes.AutoReverseDiff(; compile=true), prob_trans; x = randn(2)
)
```
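
The excerpt skips the optimization itself here. In between, one of the `AdvancedVI` algorithms listed earlier would be run against `prob_trans_ad`, and the resulting unconstrained samples mapped back through $b^{-1}$. A minimal sketch of the mapping-back step, assuming `q` is the fitted Gaussian approximation over $\mathbb{R}^2$ (the optimization call is omitted):

```julia
using Bijectors

binv = Bijectors.inverse(Bijectors.bijector(prob))  # b⁻¹ : ℝ² → 𝒳

η = rand(q, 1_000)              # 2×1000 matrix of unconstrained draws
x = mapslices(binv, η; dims=1)  # constrained draws; x[2, :] are the σ samples
```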
```@example constraints
using Plots

Plots.stephist(x[2, :]; normed=true, xlabel="Posterior of σ", label=nothing, xlims=(0, 2))
Plots.vline!([1.0]; label="True Value")
savefig("constrained_histogram.svg")
```

We can see that the transformed variational approximation is indeed a meaningful approximation of the original posterior $\pi(\sigma \mid y_1, \ldots, y_n)$ we were interested in.
## Bake a Bijector into the `LogDensityProblem`
A problem with the general approach above is that automatically differentiating through `TransformedLogDensityProblem` can be a bit inefficient (due to `Stacked`), especially with reverse-mode AD.
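
For the running example, baking the bijector in might look like the following sketch (an assumption for illustration, reusing the model assumed earlier; not the library's implementation). The positivity constraint on $\sigma$ is absorbed by parametrizing the problem in terms of $\eta_2 = \log \sigma$ and adding the log-Jacobian $\log \sigma$ by hand:

```julia
using Distributions, LogDensityProblems

# Hypothetical unconstrained reparametrization of `Mean`: η = (μ, log σ).
struct MeanUnconstrained{D<:AbstractVector}
    y::D
end

function LogDensityProblems.logdensity(prob::MeanUnconstrained, η)
    μ, logσ = η[1], η[2]
    σ = exp(logσ)
    # constrained log density plus log |dσ/d(log σ)| = log σ
    return logpdf(Normal(0, 1), μ) + logpdf(LogNormal(0, 1), σ) +
        sum(yᵢ -> logpdf(Normal(μ, σ), yᵢ), prob.y) + logσ
end

LogDensityProblems.dimension(::MeanUnconstrained) = 2
```

Since the transformation is hard-coded per coordinate, reverse-mode AD no longer needs to differentiate through a generic `Stacked` bijector.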