diff --git a/docs/source/api.rst b/docs/source/api.rst
index b62b9145c7..7568a9a2f1 100644
--- a/docs/source/api.rst
+++ b/docs/source/api.rst
@@ -23,6 +23,13 @@ API Reference
api/shape_utils
api/misc
+------------------
+Dimensionality
+------------------
+PyMC provides numerous methods and syntactic sugar to easily specify the dimensionality of
+random variables in a model. Refer to the :ref:`dimensionality` notebook for examples
+demonstrating this functionality.
+
--------------
API extensions
--------------
diff --git a/docs/source/learn/examples.md b/docs/source/learn/examples.md
index 7a2af2451b..4512fe9e8c 100644
--- a/docs/source/learn/examples.md
+++ b/docs/source/learn/examples.md
@@ -37,4 +37,5 @@ examples/pymc_overview
examples/GLM_linear
examples/model_comparison
examples/posterior_predictive
+examples/dimensionality
:::
diff --git a/docs/source/learn/examples/dimensionality.ipynb b/docs/source/learn/examples/dimensionality.ipynb
new file mode 100644
index 0000000000..363d8e8a9b
--- /dev/null
+++ b/docs/source/learn/examples/dimensionality.ipynb
@@ -0,0 +1,654 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "pycharm": {
+ "name": "#%% md\n"
+ }
+ },
+ "source": [
+ "# PyMC Dimensionality\n",
+ "PyMC provides a number of ways to specify the dimensionality of its distributions. In this document we will not provide an exhaustive explanation but rather an overview and current best practices. \n",
+ "\n",
+ "## Glossary\n",
+ "In this document we'll be using the term dimensionality to refer to the idea of dimensions. Each of the terms below has a specific\n",
+ "semantic and computational definition in PyMC. While we share them here, they will make much more sense when viewed in the examples below.\n",
+ "\n",
+ "+ *Implied dimensions* → dimensionality that follows from inputs to the RV\n",
+ "+ *Support dimensions* → dimensions you can NEVER get rid of\n",
+ "+ *`ndim_supp`* → the number of dimensions of the smallest possible random draw. This is a fixed attribute in the distribution definition\n",
+ "+ *Shape* → final resulting tensor shape\n",
+ "+ *Size* → shape minus the support dimensions\n",
+ "+ *Dims* → An array of dimension names\n",
+ "+ *Coords* → A dictionary mapping dimension names to coordinate values\n",
+ "\n",
+ "\n",
+ "## General Recommendations\n",
+ "### When prototyping, implied dimensions and size are convenient\n",
+ "Implied dimensions are easy to specify and great for quickly expanding an existing RV.\n",
+ "\n",
+ "### For reusable code we suggest dims\n",
+ "For any more important or reusable work we suggest dims and coords, as the labels will be passed to {class}`arviz.InferenceData`. This is best practice for both transparency and readability for others. It is also useful in single-developer workflows; for example, with a 3-dimensional or higher RV it'll help indicate which dimension corresponds to which model concept.\n",
+ "\n",
+ "### Use shape if you'd like to be explicit\n",
+ "Use shape if you'd like to bypass any dimensionality calculations implicit in PyMC. This will strictly specify the dimensionality to Aesara.\n",
+ "\n",
+ "### When debugging use unique prime numbers\n",
+ "Using prime numbers makes it easier to determine how input dimensionalities are being converted to output dimensionalities.\n",
+ "Once you are confident in the result, change the dimensionalities to match your data or modeling needs."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Code Examples"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pymc as pm\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Scalar distribution example\n",
+ "We can start with the simplest case, a single Normal distribution. We specify one as shown below"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "normal_dist = pm.Normal.dist()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can then take a random sample from that distribution and print both the draw and its shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array(-1.11530499), ())"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "random_sample = normal_dist.eval()\n",
+ "random_sample, random_sample.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "In this case we end up with a single scalar value. This is consistent with the distribution's `ndim_supp`, as the smallest random draw is a scalar, which has a dimension of zero"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pm.Normal.rv_op.ndim_supp"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Implied Example"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "If we wanted three draws from differently centered Normals, we could instead pass a vector to the parameters. When generating a random draw we would now expect a vector value, in this case a vector of size 3. This is a case of *implied dimensions*"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 1.00002897, 9.9999175 , 99.99994224]), (3,))"
+ ]
+ },
+ "execution_count": 5,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "random_sample = pm.Normal.dist(mu=[1, 10, 100], sigma=0.0001).eval()\n",
+ "random_sample, random_sample.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Shape and Size"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Alternatively we may just want three draws from identical distributions. In this case we could use either `shape` or `size` to specify this"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([-0.56435014, 0.28613655, -0.92945242]), (3,))"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "random_sample = pm.Normal.dist(size=(3,)).eval()\n",
+ "random_sample, random_sample.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([ 0.23463317, -0.24455629, -2.23058663]), (3,))"
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "random_sample = pm.Normal.dist(shape=(3,)).eval()\n",
+ "random_sample, random_sample.shape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Inspecting dimensionality with a model graph\n",
+ "A powerful tool to understand and debug dimensionality in PyMC is the `pm.model_to_graphviz` functionality. Rather than inspecting array outputs we instead can read the Graphviz output to understand the dimensionality.\n",
+ "\n",
+ "In the example below the number on the bottom left of each box indicates the dimensionality of the Random Variable. With the scalar distribution it is implied to be one random draw with `ndim_supp` dimensions"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "with pm.Model() as pmodel:\n",
+ " pm.Normal(\"scalar\") # shape=()\n",
+ " pm.Normal(\"vector (implied)\", mu=[1,2,3])\n",
+ " pm.Normal(\"vector (from shape)\", shape=(4,))\n",
+ " pm.Normal(\"vector (from size)\", size=(5,))\n",
+ " \n",
+ "pm.model_to_graphviz(pmodel)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Dims\n",
+ "A new feature of PyMC is `dims` support. With many random variables it can become confusing which dimension corresponds to which \"real world\" idea, e.g. number of observations, number of treated units, etc. The `dims` argument is an additional label to help."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "with pm.Model() as pmodel:\n",
+ " pm.Normal(\"red\", size=2, dims=\"B\")\n",
+ "\n",
+ " pm.Normal(\"one\", [1,2,3,4], dims=\"Dim_A\") # (4,)\n",
+ " pm.Normal(\"two\", dims=\"Dim_A\")\n",
+ "\n",
+ "\n",
+ "pm.model_to_graphviz(pmodel)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Where dims become increasingly powerful is with the use of `coords` specified in the model itself. With this, each dimension's labels become easy to track. As an added bonus, the coords and dims will also be present in the returned {class}`arviz.InferenceData`, simplifying the entire workflow."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "with pm.Model(coords={\n",
+ " \"year\": [2020, 2021, 2022],\n",
+ "}) as pmodel:\n",
+ " \n",
+ " pm.Normal(\"Normal_RV\", dims=\"year\")\n",
+ "\n",
+ "pm.model_to_graphviz(pmodel)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Vector Distributions\n",
+ "Some distributions by definition cannot return scalar values as random samples; instead they return an array. An example is the Multivariate Normal. The simplest possible return shape can be verified using `ndim_supp`. A value of 1 here indicates that the smallest shape that can be returned is a vector"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "1"
+ ]
+ },
+ "execution_count": 11,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pm.MvNormal.rv_op.ndim_supp"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "This can be verified with a random sample as well."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[1.00273693, 1.99583834, 2.99674157],\n",
+ " [4.01036637, 5.00775714, 5.97952974]])"
+ ]
+ },
+ "execution_count": 23,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pm.MvNormal.dist(mu=[[1,2,3], [4,5,6]], cov=np.eye(3)*.0001).eval()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As with scalar distributions, we can use all of our dimensionality tools to specify a set of Multivariate Normals"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "[3]\n",
+ "[3 2]\n",
+ "[3 2]\n",
+ "[3 2]\n"
+ ]
+ },
+ {
+ "data": {
+ "image/svg+xml": [
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 33,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "with pm.Model(coords={\n",
+ " \"year\": [2020, 2021, 2022],\n",
+ "}) as pmodel:\n",
+ " mv = pm.MvNormal(\"implied\", mu=[0, 0, 0], cov=np.eye(3))\n",
+ " print(mv.shape.eval())\n",
+ "\n",
+ " # Multivariate RVs (ndim_supp > 0)\n",
+ " assert mv.ndim == 1\n",
+ "\n",
+ " mv = pm.MvNormal(\"with size\", mu=[0, 0], cov=np.eye(2), size=3, dims=(\"repeats\", \"implied\"))\n",
+ " print(mv.shape.eval())\n",
+ " \n",
+ " # ⚠ Size dims are always __prepended__\n",
+ " mv = pm.MvNormal(\"with shape\", mu=[0, 0], cov=np.eye(2), shape=(3, ...), dims=(\"repeats\", ...))\n",
+ " print(mv.shape.eval())\n",
+ " \n",
+ " mv = pm.MvNormal(\"with coords\", mu=[0, 0], cov=np.eye(2), dims=(\"year\", ...))\n",
+ " print(mv.shape.eval())\n",
+ "\n",
+ "pm.model_to_graphviz(pmodel)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### User caution and practical tips"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "While we provide all these tools for convenience, and while PyMC does its best to understand user intent, mixing dimensionality tools may not always produce the final dimensionality intended. Sometimes the model may not indicate an error until sampling, or may not indicate an issue at all. When working with dimensionality, particularly more complex ones, we suggest\n",
+ "\n",
+ "* Using GraphViz to visualize your model before sampling\n",
+ "* Using the prior predictive to catch errors early\n",
+ "* Inspecting the returned `az.InferenceData` object to ensure all array sizes are as intended"
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "f574fac5b7e4a41f7640949d1e1759089329dd116ff7b389caa9cf21f93d872d"
+ },
+ "kernelspec": {
+ "display_name": "Python 3 (ipykernel)",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.7"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}