Add deterministic advi #564

martiningram · 2025-08-14T10:25:16Z

Hi everyone,

I'm one of the authors of the paper on deterministic ADVI. There is an open feature request for this in PyMC here so I thought I'd kick things off with this PR.

In simple terms, DADVI is like ADVI but rather than using a new draw to estimate its objective at each step, it uses a fixed set of draws during the optimisation. That means that (1) it can use regular off-the-shelf optimisers rather than stochastic optimisation, making convergence more reliable, and (2) it's possible to use techniques to improve the variance estimates. This is in the paper, as well as tools to assess how big the error is from using fixed draws.

This PR covers only the first part -- optimising ADVI with fixed draws. This is because I thought I'd start simple and because I'm hoping that it already addresses a real problem with ADVI, which is the difficulty in assessing convergence.

In addition to adding the code, there is an example notebook in notebooks/deterministic_advi_example.ipynb. It fits DADVI to the PyMC basic linear regression example. I can add more examples, but I thought I'd start simple.

I mostly lifted the code from my research repository, so there are probably some style differences. Let me know what would be important to change.

Note that JAX is needed, but there shouldn't be any other dependencies.

Very keen to hear what you all think! :)

All the best,
Martin

review-notebook-app · 2025-08-14T10:25:21Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

jessegrabowski · 2025-08-14T14:52:06Z

This is super cool -- I'm very excited to look more closely over the next few days.

Since you're ultimately building a loss function and sending it to scipy.optimize, do you think we could re-use any of the machinery that exists for doing that in the laplace_approx module, for example this or this?

ricardoV94 · 2025-08-14T14:57:32Z

What exactly needs jax?

martiningram · 2025-08-14T16:14:17Z

Thank you both very much for having a look so quickly!

@jessegrabowski Good point, yes maybe! I'll take a look.

@ricardoV94 Currently, JAX is used to compute the hvp and the jacobian of the objective. That involves computing a gradient for each of the fixed draws and then taking an average. What's quite nice in JAX is that this can be done with vmap easily: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-1d6e8b962a8c3ca803c55bea43c19863223ed50ae3814acc55424834ade1215cR44

That said, JAX isn't strictly necessary. Anything that can provide the DADVIFuns is fine: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-48ee4e85c0ff57f5b8af20dfd608bd0e37c3a2c76169a7bbe499e77ff3802d9dR13 . In fact, I have code in the original research repo that turns the regular hvp and gradient function into the DADVIFuns. But I think it'll be slower because of the for loops e.g. here .

Are you concerned about the JAX dependency? If so, maybe I could have a go at doing a JAX-free version using the code just mentioned and then only support JAX optionally. I do think it might be nice to have since it's probably more efficient and would hopefully also run fast on GPUs. But interested in your thoughts.

Also, I see one of the pre-commit checks seem to be failing. I can do the work to make the pre-commit hooks happy, sorry I haven't done that yet.

zaxtax · 2025-08-14T16:42:29Z

I think a jax dependency is fine. But if it's optional that's obviously even better!

…

On Thu, 14 Aug 2025, 12:14 Martin Ingram, ***@***.***> wrote: *martiningram* left a comment (pymc-devs/pymc-extras#564) <#564 (comment)> Thank you both very much for having a look so quickly! @jessegrabowski <https://github.com/jessegrabowski> Good point, yes maybe! I'll take a look. @ricardoV94 <https://github.com/ricardoV94> Currently, JAX is used to compute the hvp and the jacobian of the objective. That involves computing a gradient for each of the fixed draws and then taking an average. What's quite nice in JAX is that this can be done with vmap easily: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-1d6e8b962a8c3ca803c55bea43c19863223ed50ae3814acc55424834ade1215cR44 That said, JAX isn't strictly necessary. Anything that can provide the DADVIFuns is fine: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-48ee4e85c0ff57f5b8af20dfd608bd0e37c3a2c76169a7bbe499e77ff3802d9dR13 . In fact, I have code in the original research repo <https://github.com/martiningram/dadvi/blob/main/dadvi/objective_from_model.py#L5> that turns the regular hvp and gradient function into the DADVIFuns. But I think it'll be slower because of the for loops e.g. here <https://github.com/martiningram/dadvi/blob/main/dadvi/objective_from_model.py#L56> . Are you concerned about the JAX dependency? If so, maybe I could have a go at doing a JAX-free version using the code just mentioned and then only support JAX optionally. I do think it might be nice to have since it's probably more efficient and would hopefully also run fast on GPUs. But interested in your thoughts. — Reply to this email directly, view it on GitHub <#564 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAACCUO4HELS5DTXEAZOJRT3NSYW7AVCNFSM6AAAAACD4JGBS6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTCOBZGAZTIMZRGQ> . You are receiving this because you are subscribed to this thread.Message ID: ***@***.***>

ricardoV94 · 2025-08-14T18:45:45Z

@ricardoV94 Currently, JAX is used to compute the hvp and the jacobian of the objective. That involves computing a gradient for each of the fixed draws and then taking an average. What's quite nice in JAX is that this can be done with vmap easily: h

PyTensor has the equivalent hessian_product_vector and jacobian, and vectorize_graph that does the same as vmap (or more if you have multiple batch dimensions)

The reason I ask is if you don't have anything jax specific you can still end up using jax, but also C or numba which may be better for certain users.

martiningram · 2025-08-14T19:31:30Z

@ricardoV94 Oh cool, thanks, I didn't realise! I'll take a look if I can use those. I agree it would be nice to support as many users as possible.

ricardoV94 · 2025-08-14T19:41:33Z

Happy to assist you. If you're vectorizing the Jacobian you probably want to build jacobian(vectorize=True) which can further be vectorized more nicely.

Everything is described here although a bit scattered: https://pytensor.readthedocs.io/en/latest/tutorial/gradients.html

martiningram · 2025-08-15T10:03:52Z

Hey @ricardoV94 (and potentially others!), I think I could use your advice with the vectorisation. I think I've read enough to do it without using the functions here but I'd really like to try to get this vectorised for speed.

To explain a bit: the code expects the definition of DADVIFuns (https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-48ee4e85c0ff57f5b8af20dfd608bd0e37c3a2c76169a7bbe499e77ff3802d9dR13). The first of these expects two inputs:

The variational parameter vector eta, which is all the means concatenated with the log_sds of the variational parameters. This will have length 2D, where D is the number of parameters in the model (first D is means, second D is log_sds).
A matrix of draws of shape [M, D], with D as before and M the number of draws.

The function should then return the estimate of the kl divergence using these draws, as well as its gradient with respect to the variational parameters. The KL divergence is the sum of the entropy of the approximation (a simple function of the variational parameters only) and the average of the log posterior densities from the draws. That's the part that I'd like to vectorise.

Now in JAX, the way I do this is to...:

Compute the log posterior density for a single draw (i.e. one row in the matrix of draws): https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-1d6e8b962a8c3ca803c55bea43c19863223ed50ae3814acc55424834ade1215cR37
vmap this computation across all the draws and then take the mean here: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-1d6e8b962a8c3ca803c55bea43c19863223ed50ae3814acc55424834ade1215cR42

Thanks to vectorize_graph, I am hoping I can do something like this with pytensor. My strategy idea was to...:

Define the variational parameter vector eta in pytensor, and transform a single draw with this vector
clone_replace to use the transformed draw as an input to the graph, rather than the current input
Use vectorize_graph to vectorise with respect to the draws
Compute the mean of the densities and get gradients of this mean with respect to the variational parameter vector

This makes sense in my head but the problem I see is that the pymc model's logp seems to expect a dictionary, rather than a flat vector. So as part of the step to get from the new inputs to the density, I need to turn the flat vector into the dictionary. In pymc there is DictToArrayBijection which does this, but I don't think I can use it as part of the pytensor graph.

So in essence, I think I need code to do DictToArrayBijection in pure pytensor. Is there something like that? Or is there another way I am missing? I guess it would be great if I could just have a logp function that takes a flat vector as an input already -- is there a way I can get to that?

Thanks a lot for your help :)

jessegrabowski · 2025-08-15T10:21:37Z

If you get the logp of a pymc model using model.logp (rather than compile_logp or one of the jax helpers), it will just return the symbolic logp graph, which you can then do all your vectorization/replacements on.

The path followed by the laplace code is to freeze the model and extract the negative logp , then create a flat vector input replacing the individual value inputs, then compile the loss_and_grads/hess/hessp functions, (optionally in jax)

My hope is that you can get the correct loss function for DADVI, then you should be able to directly pass it into scipy_optimize_funcs_from_loss and just completely re-use all that.

The 4 steps you outline seem correct to me. pymc.pytensorf.join_nonshared_inputs is the function I think you're looking for to do the pack/unpack operation on different the parameters; I linked to its usage above.

martiningram · 2025-08-15T10:51:43Z

Thanks a lot @jessegrabowski . I'll give it a go!

martiningram · 2025-08-16T15:30:49Z

Hey all, I think I made good progress with the pytensor version. A first version is here: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-1b6e7da940ec73fce49f5e13ae1db5369ec011cb0b55974ec04d81e519e923f6R55

I think the only major thing missing is to transform the draws back into the constrained space from the unconstrained space. Is there a code snippet anyone could point me to? Thanks for your help and for all the helpful advice you've already given!

zaxtax · 2025-08-16T16:09:52Z

You can make a pytensor function from the model value variables to the output variables. An example of that is how get_jaxified_graph is used in the jax based samplers https://github.com/pymc-devs/pymc/blob/main/pymc/sampling/jax.py#L682 If you look in the source of get_jaxified_graph you can see how it's done

…

On Sat, 16 Aug 2025, 11:31 Martin Ingram, ***@***.***> wrote: *martiningram* left a comment (pymc-devs/pymc-extras#564) <#564 (comment)> Hey all, I think I made good progress with the pytensor version. A first version is here: https://github.com/pymc-devs/pymc-extras/pull/564/files#diff-1b6e7da940ec73fce49f5e13ae1db5369ec011cb0b55974ec04d81e519e923f6R55 I think the only major thing missing is to transform the draws back into the constrained space from the unconstrained space. Is there a code snippet anyone could point me to? Thanks for your help and for all the helpful advice you've already given! — Reply to this email directly, view it on GitHub <#564 (comment)>, or unsubscribe <https://github.com/notifications/unsubscribe-auth/AAACCUP5BFLVE5ZDJ4XL6433N5FD5AVCNFSM6AAAAACD4JGBS6VHI2DSMVQWIX3LMV43OSLTON2WKQ3PNVWWK3TUHMZTCOJTG4ZTSMZXG4> . You are receiving this because you commented.Message ID: ***@***.***>

Martin Ingram added 6 commits August 1, 2025 11:54

Add first version of deterministic ADVI

488bd9c

Update API

f46f1cd

Add a notebook example

894f62b

Merge branch 'main' into add_basic_deterministic_advi

a1afaf6

Add to API and add a docstring

637fc3b

Change import in notebook

3e397f7

martiningram mentioned this pull request Aug 14, 2025

ENH: implement deterministic ADVI (DADVI) pymc-devs/pymc#7374

Open

Add jax to dependencies

d954ec7

jessegrabowski added the enhancements New feature or request label Aug 14, 2025

Martin Ingram added 3 commits August 16, 2025 16:42

Add pytensor version

aad9f21

Fix handling of pymc model

ef3d86b

Add (probably suboptimal) handling of the two backends

6bf92ef

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Add deterministic advi #564

Add deterministic advi #564

Uh oh!

martiningram commented Aug 14, 2025

Uh oh!

review-notebook-app bot commented Aug 14, 2025

Uh oh!

jessegrabowski commented Aug 14, 2025

Uh oh!

ricardoV94 commented Aug 14, 2025

Uh oh!

martiningram commented Aug 14, 2025 •

edited

Loading

Uh oh!

zaxtax commented Aug 14, 2025 via email

Uh oh!

ricardoV94 commented Aug 14, 2025 •

edited

Loading

Uh oh!

martiningram commented Aug 14, 2025

Uh oh!

ricardoV94 commented Aug 14, 2025

Uh oh!

martiningram commented Aug 15, 2025 •

edited

Loading

Uh oh!

jessegrabowski commented Aug 15, 2025

Uh oh!

martiningram commented Aug 15, 2025

Uh oh!

martiningram commented Aug 16, 2025

Uh oh!

zaxtax commented Aug 16, 2025 via email

Uh oh!

Uh oh!

Add deterministic advi #564

Are you sure you want to change the base?

Add deterministic advi #564

Uh oh!

Conversation

martiningram commented Aug 14, 2025

Uh oh!

review-notebook-app bot commented Aug 14, 2025

Uh oh!

jessegrabowski commented Aug 14, 2025

Uh oh!

ricardoV94 commented Aug 14, 2025

Uh oh!

martiningram commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

zaxtax commented Aug 14, 2025 via email

Uh oh!

ricardoV94 commented Aug 14, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

martiningram commented Aug 14, 2025

Uh oh!

ricardoV94 commented Aug 14, 2025

Uh oh!

martiningram commented Aug 15, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

jessegrabowski commented Aug 15, 2025

Uh oh!

martiningram commented Aug 15, 2025

Uh oh!

martiningram commented Aug 16, 2025

Uh oh!

zaxtax commented Aug 16, 2025 via email

Uh oh!

Uh oh!

martiningram commented Aug 14, 2025 •

edited

Loading

ricardoV94 commented Aug 14, 2025 •

edited

Loading

martiningram commented Aug 15, 2025 •

edited

Loading