`HISTORY.md`

Instead, you can calculate this yourself from the log-likelihoods stored in the chain.
For SMC samplers, the log-evidence of the entire trajectory is stored in `chain[:logevidence]` (which is the same for every particle in the 'chain').

## AdvancedVI 0.6

Turing.jl v0.42 updates `AdvancedVI.jl` compatibility to 0.6 (we skipped the breaking 0.5 update, as it does not introduce new features).
This update introduces major structural changes, including breaking changes to the interface, as well as multiple new features.
The summary below covers only the changes that affect end users of Turing.
For a more comprehensive list of changes, please refer to the [changelog](https://github.com/TuringLang/AdvancedVI.jl/blob/main/HISTORY.md) of `AdvancedVI`.

### Breaking changes

A new level of interface for defining variational algorithms was introduced in `AdvancedVI` v0.5. As a result, the function `Turing.vi` now receives a keyword argument `algorithm`. The object `algorithm <: AdvancedVI.AbstractVariationalAlgorithm` contains all of the algorithm-specific configuration. Therefore, keyword arguments of `vi` that were algorithm-specific, such as `objective`, `operator`, `averager`, and so on, are now fields of the relevant `<: AdvancedVI.AbstractVariationalAlgorithm` structs.

In addition, the outputs of `vi` have changed. Previously, `vi` returned both the last iterate of the algorithm, `q`, and the iterate average, `q_avg`. Now, for algorithms that perform parameter averaging, only `q_avg` is returned. As a result, the number of returned values has been reduced from 4 to 3.

For example,

```julia
q, q_avg, info, state = vi(
    model, q, n_iters; objective=RepGradELBO(10), operator=AdvancedVI.ClipScale()
)
```

is now written as

```julia
q_avg, info, state = vi(model, q, n_iters; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))
```

Lastly, to obtain the last iterate `q` of `KLMinRepGradDescent`, which is not returned under the new interface, simply set the averaging strategy to `AdvancedVI.NoAveraging()`. That is,

```julia
q, info, state = vi(
    model,
    q,
    n_iters;
    algorithm=KLMinRepGradDescent(
        adtype;
        n_samples=10,
        operator=AdvancedVI.ClipScale(),
        averager=AdvancedVI.NoAveraging(),
    ),
)
```

Additionally,

- The default hyperparameters of `DoG` and `DoWG` have been altered.
- `estimate_objective` now always returns the value to be minimized by the optimization algorithm. For example, for ELBO-maximization algorithms, `estimate_objective` will return the *negative ELBO*. This is a breaking change from the previous behavior, where the ELBO was returned.
- The initial values for `q_meanfield_gaussian`, `q_fullrank_gaussian`, and `q_locationscale` have changed. Specifically, the default initial value for the scale matrix has been changed from `I` to `0.6*I`.
- When using algorithms that expect to operate in unconstrained spaces, the user is now explicitly expected to provide a `Bijectors.TransformedDistribution` wrapping an unconstrained distribution (refer to the docstring of `vi`; a sketch of this is shown below).

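Below is a rough sketch of the last point. It assumes that the convenience constructor `Turing.Variational.q_meanfield_gaussian` returns a suitable `Bijectors.TransformedDistribution` for models with constrained parameters; the model, data, AD backend, and iteration count are made up for illustration.

```julia
using Turing

# A toy model with one constrained parameter (σ > 0) and one unconstrained parameter.
@model function demo(y)
    σ ~ Exponential(1.0)
    μ ~ Normal(0.0, 1.0)
    for i in eachindex(y)
        y[i] ~ Normal(μ, σ)
    end
end

model = demo(randn(100))
adtype = AutoForwardDiff()  # any AD backend supported by Turing

# Assumed to be a `Bijectors.TransformedDistribution` whose base distribution lives
# in unconstrained space, so it can be passed directly to `vi`.
q0 = Turing.Variational.q_meanfield_gaussian(model)

# Averaging-based algorithm: only the averaged iterate is returned (see above).
q_avg, info, state = vi(model, q0, 1_000; algorithm=KLMinRepGradProxDescent(adtype; n_samples=10))
```
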
### New Features

`AdvancedVI@0.6` adds numerous new features, including the following new VI algorithms:

- `KLMinWassFwdBwd`: Also known as "Wasserstein variational inference," this algorithm minimizes the KL divergence under the Wasserstein-2 metric.
- `KLMinNaturalGradDescent`: Also known as "online variational Newton," this is the canonical "black-box" natural gradient variational inference algorithm, which minimizes the KL divergence via mirror descent with the KL divergence as the Bregman divergence.
- `KLMinSqrtNaturalGradDescent`: A recent variant of `KLMinNaturalGradDescent` that operates in the Cholesky-factor parameterization of Gaussians instead of precision matrices.
- `FisherMinBatchMatch`: Also known as "batch-and-match," this algorithm minimizes the covariance-weighted (second-order) Fisher divergence via a proximal point-type algorithm.

Any of the new algorithms above can readily be used by simply swapping the `algorithm` keyword argument of `vi`.
For example, to use batch-and-match:

```julia
vi(model, q, n_iters; algorithm=FisherMinBatchMatch())
```

## External sampler interface
The interface for defining an external sampler has been reworked.

`docs/src/api.md`

See the [docs of AdvancedVI.jl](https://turinglang.org/AdvancedVI.jl/stable/) for detailed usage and the [variational inference tutorial](https://turinglang.org/docs/tutorials/09-variational-inference/) for a basic walkthrough.

| `q_locationscale` | [`Turing.Variational.q_locationscale`](@ref) | Find a numerically non-degenerate initialization for a location-scale variational family |
| `q_meanfield_gaussian` | [`Turing.Variational.q_meanfield_gaussian`](@ref) | Find a numerically non-degenerate initialization for a mean-field Gaussian family |
| `q_fullrank_gaussian` | [`Turing.Variational.q_fullrank_gaussian`](@ref) | Find a numerically non-degenerate initialization for a full-rank Gaussian family |
| `KLMinRepGradDescent` | [`Turing.Variational.KLMinRepGradDescent`](@ref) | KL divergence minimization via stochastic gradient descent with the reparameterization gradient |
| `KLMinRepGradProxDescent` | [`Turing.Variational.KLMinRepGradProxDescent`](@ref) | KL divergence minimization via stochastic proximal gradient descent with the reparameterization gradient over location-scale variational families |
| `KLMinScoreGradDescent` | [`Turing.Variational.KLMinScoreGradDescent`](@ref) | KL divergence minimization via stochastic gradient descent with the score gradient |
| `KLMinWassFwdBwd` | [`Turing.Variational.KLMinWassFwdBwd`](@ref) | KL divergence minimization via Wasserstein proximal gradient descent |
| `KLMinNaturalGradDescent` | [`Turing.Variational.KLMinNaturalGradDescent`](@ref) | KL divergence minimization via natural gradient descent |
| `KLMinSqrtNaturalGradDescent` | [`Turing.Variational.KLMinSqrtNaturalGradDescent`](@ref) | KL divergence minimization via natural gradient descent in the square-root parameterization |
| `FisherMinBatchMatch` | [`Turing.Variational.FisherMinBatchMatch`](@ref) | Covariance-weighted Fisher divergence minimization via the batch-and-match algorithm |
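
To illustrate how the entries above fit together, here is a minimal sketch. The model, data, and iteration count are made up, and the call pattern follows the `vi` examples from the changelog; consult the `vi` docstring for the exact return values.

```julia
using Turing

# A toy regression model; any Turing model works here.
@model function linreg(x, y)
    α ~ Normal(0.0, 1.0)
    β ~ Normal(0.0, 1.0)
    σ ~ Exponential(1.0)
    for i in eachindex(y)
        y[i] ~ Normal(α + β * x[i], σ)
    end
end

x = randn(50)
model = linreg(x, 2.0 .* x .+ 0.5 .+ 0.1 .* randn(50))

# Initialize a full-rank Gaussian variational family (see the table above).
q0 = Turing.Variational.q_fullrank_gaussian(model)

# Fit it with the batch-and-match algorithm; see the `vi` docstring for the return values.
result = vi(model, q0, 1_000; algorithm=FisherMinBatchMatch())
```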