Getting PyMC3 to work
In the last few days, I have been diving into Cameron Davidson-Pilon’s book Bayesian Methods for Hackers, which is freely available online.
The book is an excellent, practical introduction to probabilistic programming and, more specifically, to Markov Chain Monte Carlo (MCMC) methods, using only Python and the PyMC library.
The printed version of the book was published a while ago and uses the PyMC library on Python 2.7. Since then, the ecosystem has changed quite a bit, most notably with the release of PyMC3, which brings the library to Python 3. Although the book's Jupyter notebooks have been ported to PyMC3 in the book's GitHub repository, the libraries have kept evolving, and getting everything to run requires some extra work.
In this blog post I collect a few notes about how to get this library to work as of March 2021.
Installation
TL;DR: Use conda (and pyenv).
More precisely: use pyenv together with the pyenv-virtualenv plugin to install Anaconda and create an environment from it, then install PyMC3 from the conda-forge channel.
pyenv install anaconda3-5.3.1
pyenv virtualenv anaconda3-5.3.1 pymctest
pyenv activate pymctest
conda install -c conda-forge pymc3
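A quick way to confirm that the installation worked is to run a short script from inside the activated pymctest environment. The sketch below is my own addition, not something from the book or the PyMC3 docs: it prints which interpreter is running and tries to import `pymc3`, its `theano` backend (the Theano-PyMC fork is also imported as `theano`) and `arviz`, which conda-forge should pull in as dependencies.

```python
import importlib
import sys


def try_import(module_name):
    """Import a module and return its __version__ ("unknown" if absent); None if the import fails."""
    try:
        module = importlib.import_module(module_name)
    except Exception:  # a broken install can fail with more than just ImportError
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    # Should point inside the pyenv/conda environment created above.
    print("Interpreter:", sys.executable)
    print("Python:", sys.version.split()[0])
    # The Theano-PyMC fork is imported under the module name `theano`.
    for name in ("pymc3", "theano", "arviz"):
        print(f"{name:8s}", try_import(name) or "NOT importable")
```

If any line reports "NOT importable", the environment is most likely not the one that is active, or the conda install did not complete.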
(Yet) another tale about the pains of Python package management
I rely on pyenv to juggle a multitude of Python versions on my local machine. This is a lifesaver, since for different reasons I regularly use Python versions from 3.6 up to 3.9.
Most of the time, I use poetry to manage project dependencies.
This works well for production-quality code, since poetry affords reproducible builds and easy packaging. However, poetry in turn relies on pip and looks for packages in the PyPI index.
It turns out that the Anaconda distribution takes the prize when it comes to setting up an environment for data work on a single machine. JupyterLab, Dask and Plotly are good examples: getting each of them to work becomes an easy one-command task instead of a matter of following an installation guide.
Moreover, I did not manage to install PyMC3 from PyPI using poetry. The culprit is Theano, a tensor library that serves as the backend for PyMC3, doing the heavy lifting of manipulating multidimensional arrays.
Development of the Theano project was discontinued in 2017, and the PyMC developers decided to fork the library and continue its development. As a consequence, PyMC3 depends on the forked Theano-PyMC library, and further development continues in the aesara library.
I could not figure out which particular versions of these libraries to pin for a pip installation to work. Installing from the conda-forge repository, on the other hand, produces a working installation out of the box.
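When debugging a pip-based installation like the one I attempted, it helps to see which members of the Theano family actually ended up in the environment. The sketch below is a hypothetical helper of my own; it assumes Python 3.8+ (for `importlib.metadata`) and the distribution names `Theano`, `Theano-PyMC` and `aesara` as published on PyPI. It also flags one likely source of trouble: the original Theano and the Theano-PyMC fork both provide an importable module called `theano`, so having both installed at once is asking for conflicts.

```python
from importlib.metadata import PackageNotFoundError, version

# Assumed distribution names for PyMC3 and the Theano lineage.
BACKEND_DISTRIBUTIONS = ("pymc3", "Theano", "Theano-PyMC", "aesara")


def installed_versions(names=BACKEND_DISTRIBUTIONS):
    """Map each distribution name to its installed version, or None if it is absent."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


def theano_clash(found):
    """True if both the original Theano and the Theano-PyMC fork are installed.

    Both ship a top-level `theano` module, so only one of them should be present.
    """
    return found.get("Theano") is not None and found.get("Theano-PyMC") is not None


if __name__ == "__main__":
    report = installed_versions()
    for name, ver in report.items():
        print(f"{name:12s} {ver or '-'}")
    if theano_clash(report):
        print("Warning: both Theano and Theano-PyMC are installed")
```

This only reports what is installed; it does not tell you which combination of versions is compatible, which is exactly the part I could not work out.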
Visualizations using ArviZ
The Bayesian Methods for Hackers book contains excellent visualizations since, of course, visualization is an integral part of Bayesian inference. For this reason, a significant portion of the code in the book is actually dedicated to rendering good visualizations in matplotlib.
However, when exploring a model one usually goes back to the same kinds of plots over and over, such as posterior densities or sample traces.
Nowadays PyMC3 is paired with a dedicated visualization library, ArviZ, which makes the visualization task much easier.
ArviZ includes functions for diagnostics and visualization of Bayesian inference results and is backend-agnostic: one can obtain plots rendered with matplotlib (the default backend), but also with Bokeh.
The code in the book still produces valid plots using matplotlib and pyplot, but nowadays one can rely on ArviZ for the common ones. This is great for people like me, who do not take particular pleasure in working with matplotlib.
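To give an idea of what this looks like, here is a minimal sketch (mine, not the book's) of the two recurring plots mentioned above, sample traces and posterior densities, plus ArviZ's numerical summary. To keep it runnable without fitting a model, the posterior draws are synthetic random numbers for two hypothetical parameters, `mu` and `sigma`, packed into an ArviZ `InferenceData` object with `az.from_dict`. In a real workflow that object would come from PyMC3, for example via `pm.sample(..., return_inferencedata=True)` or `az.from_pymc3(trace)`.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; omit this in a notebook

import arviz as az
import numpy as np

# Synthetic stand-in for MCMC output: 4 chains x 1000 draws of two
# hypothetical parameters. These are NOT results from any real model.
rng = np.random.default_rng(42)
idata = az.from_dict(
    posterior={
        "mu": rng.normal(loc=0.0, scale=1.0, size=(4, 1000)),
        "sigma": rng.gamma(shape=2.0, scale=0.5, size=(4, 1000)),
    }
)

# Numerical diagnostics: mean, sd, HDI, effective sample size, r_hat, ...
summary = az.summary(idata)
print(summary)

# The plots one keeps coming back to, in one call each.
trace_axes = az.plot_trace(idata)          # sample traces + per-chain densities
posterior_axes = az.plot_posterior(idata)  # posterior densities with HDI

# Same plot with the Bokeh backend (requires bokeh to be installed):
# az.plot_posterior(idata, backend="bokeh")
```

Compared with hand-written matplotlib code, each of these is a single call that works the same way for any model's output.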