\n",
"\n",
"## Interactive Gibbs Sampler \n",
"\n",
"### Michael J. Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*"
]
},
{
"cell_type": "markdown",
"id": "2e15b023",
"metadata": {},
"source": [
"#### Gibbs Sampler\n",
"\n",
"I teach the Gibbs Sampler as part of my lecture on Markov chain Monte Carlo (McMC) methods. This is critical to understand solution methods Bayesian machine learning methods. See my lectures:\n",
"\n",
"* [Bayesian linear regression lecture](https://youtu.be/LzZ5b3wdZQk?si=3Uu2pvCjsl1fH5qU)\n",
"* [Markov chain Monte Carlo](https://youtu.be/7QX-yVboLhk?si=o7CSimpgFhjT1Vxo)\n",
"* [Bayesian Linear Regression Example](https://youtu.be/JG69fxKzwt8?si=ywn9xC_Pe8YQwR2f)\n",
"\n",
"Gibbs sampler is one of the most intuitive methods for McMC.\n",
"\n",
"* as usual we don't have access to the joint distribution, but we have access to the conditional distributions. \n",
"* instead of sampleing directly from the joint distribution (not available), we sequentially sample from the conditional distribution! \n",
"\n",
"For a bivariate example, features $X_1$ and $X_2$, we proceed as follows:\n",
"\n",
"1. Assign random values for $𝑋_1^{\\ell=0}$, $X_2^{\\ell=0}$\n",
"\n",
"2. Sample from $𝑓(𝑋_1|X_2^{\\ell=0})$ to get $𝑋_1^{\\ell=1}$ \n",
"\n",
"3. Sample from $𝑓(𝑋_2|X_1^{\\ell=1})$ to get $𝑋_2^{\\ell=1}$ \n",
"\n",
"4. Repeat for the next steps / samples, $\\ell = 1,\\ldots,𝐿$\n",
"\n",
"Although we only applied the conditional distribution, the resulting samples will have the correct joint distribution.\n",
"\n",
"\\begin{equation}\n",
"f(X_1,X_2)\n",
"\\end{equation}\n",
"\n",
"We never needed to use the joint distribution, we only needed the conditionals!\n",
"\n",
"* Bayesian Linear Regression - we apply Gibbs sampler to sample the posterior distributions of the model parameters given the data.\n",
"\n",
"#### Gibbs Sampler for Bivariate Gaussian Distribution\n",
"\n",
"Below I build out an interactive Gibbs sampler to sample the bivariate joint Gaussian distribution from only the conditional distributions!\n",
"\n",
"#### Load and Configure the Required Libraries\n",
"\n",
"The following code loads the required libraries and sets a plotting default."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "da837ef7",
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"supress_warnings = False\n",
"import os # to set current working directory \n",
"import sys # supress output to screen for interactive variogram modeling\n",
"import numpy as np # arrays and matrix math\n",
"import pandas as pd # DataFrames\n",
"from scipy.stats import norm # Gaussian PDF\n",
"import matplotlib.pyplot as plt # plotting\n",
"import seaborn as sns # plot PDF\n",
"from sklearn.model_selection import train_test_split # train and test split\n",
"from sklearn import tree # tree program from scikit learn (package for machine learning)\n",
"from sklearn import metrics # measures to check our models\n",
"import scipy.spatial as spatial #search for neighbours\n",
"from matplotlib.patches import Rectangle # build a custom legend\n",
"from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks\n",
"import math # sqrt operator\n",
"from ipywidgets import interactive # widgets and interactivity\n",
"from ipywidgets import widgets \n",
"from ipywidgets import Layout\n",
"from ipywidgets import Label\n",
"from ipywidgets import VBox, HBox\n",
"cmap = plt.cm.inferno # default color bar, no bias and friendly for color vision defeciency\n",
"plt.rc('axes', axisbelow=True) # grid behind plotting elements\n",
"if supress_warnings == True:\n",
" import warnings # supress any warnings for this demonstration\n",
" warnings.filterwarnings('ignore') "
]
},
{
"cell_type": "markdown",
"id": "2b57659e",
"metadata": {},
"source": [
"#### Declare Functions\n",
"\n",
"The following functions for clean code. \n",
"\n",
"* Just a improved grid for the plot."
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "a333fd85",
"metadata": {},
"outputs": [],
"source": [
"def add_grid():\n",
" plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids\n",
" plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n",
" plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks"
]
},
{
"cell_type": "markdown",
"id": "e382a4e2",
"metadata": {},
"source": [
"#### Interactive Gibbs Sampler to Sample the Bivariate Gausian Distribution Dashboard\n",
"\n",
"Here's a dashboard with a cool visualization for my interactive Gibbs sampler."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "90be8276",
"metadata": {},
"outputs": [],
"source": [
"l = widgets.Text(value=' Interactive Gibbs Sampler Demo, Prof. Michael Pyrcz, The University of Texas at Austin',\n",
" layout=Layout(width='750px', height='30px'))\n",
"\n",
"nsample = widgets.IntSlider(min=1, max = 101, value=10, step = 1, description = '$n_{sample}$',orientation='horizontal', \n",
" style = {'description_width': 'initial'},layout=Layout(width='370px', height='30px'),continuous_update=False)\n",
"rho = widgets.FloatSlider(min=-1.0, max = 1.0, value=0.7, step = 0.1, description = r'$\\rho_{X_1,X_2}$',orientation='horizontal',\n",
" style = {'description_width': 'initial'},layout=Layout(width='370px', height='30px'),continuous_update=False)\n",
"\n",
"ui = widgets.HBox([nsample,rho],)\n",
"ui2 = widgets.VBox([l,ui],)\n",
"\n",
"def run_plot(nsample,rho):\n",
" mu1 = 0.0; sig1 = 1.0; mu2 = 0.0; sig2 = 1.0; seed = 73073; nc = 200\n",
" \n",
" L = nsample\n",
" np.random.seed(seed=seed)\n",
" x1 = np.zeros(L); x2 = np.zeros(L); x = np.linspace(-3,3,nc)\n",
" \n",
" x1[0] = np.random.rand(1) * 6.0 - 3.0; x2[0] = np.random.rand(1) * 6.0 - 3.0; \n",
" \n",
" plt.subplot(111)\n",
" plt.scatter(x1[0],x2[0],color='grey',edgecolor='black',s=15,zorder=4)\n",
" \n",
" case = 0\n",
" \n",
" for l in range(1,L):\n",
" if case == 0: # update x2\n",
" x1[l] = x1[l-1]\n",
" lmu = mu2 + rho * (sig2/sig1) * (x1[l] - mu1); lstd = 1 - rho**2\n",
" x2[l] = np.random.normal(loc = lmu,scale = lstd,size = 1)\n",
" case = 1\n",
" plt.scatter(x1[l],x2[l],color='blue',edgecolor='black',s=15,alpha=1.0,zorder=100)\n",
" plt.plot([x1[l-1],x1[l]],[x2[l-1],x2[l]],color='black',lw=1,alpha = max((l-(L-20))/20,0),zorder=4)\n",
" plt.plot([x1[l-1],x1[l]],[x2[l-1],x2[l]],color='white',lw=3,alpha = max((l-(L-20))/20,0),zorder=3)\n",
" if l == L-1:\n",
" #plt.plot([x1[l],x1[l]],[-3,3],color='blue',alpha=0.7,zorder=10)\n",
" pdf = norm.pdf(x, loc=lmu, scale=lstd)*0.5\n",
" mask = pdf > np.percentile(pdf,q=40)\n",
" plt.fill_betweenx(x[mask],x1[l]+pdf[mask],np.full(len(x[mask]),x1[l]),color='blue',alpha=0.2,zorder=2)\n",
" plt.plot(x1[l]+pdf[mask],x[mask],color='blue',alpha=0.7,zorder=1)\n",
" plt.arrow(x1[l-1],x2[l-1],0,x2[l]-x2[l-1],color='black',lw=0.5,head_width=0.05,length_includes_head=True,zorder=100)\n",
" plt.scatter(x1[l],x2[l],color='white',edgecolor='blue',s=30,linewidth=1,alpha=1.0,zorder=100)\n",
" plt.annotate(r'$f_{X_2|X_1}$ = ' + str(np.round(x1[l],2)),xy=[x1[l]+0.02,max(x[mask])-0.2],color='blue',rotation=-90)\n",
" elif case == 1: # update x1\n",
" x2[l] = x2[l-1]\n",
" lmu = mu1 + rho * (sig1/sig2) * (x2[l] - mu2); lstd = 1 - rho**2\n",
" x1[l] = np.random.normal(loc = lmu,scale = lstd,size = 1)\n",
" case = 0\n",
" plt.scatter(x1[l],x2[l],color='red',edgecolor='black',s=15,alpha=1.0,zorder=100)\n",
" plt.plot([x1[l-1],x1[l]],[x2[l-1],x2[l]],color='black',lw=1,alpha = max((l-(L-20))/20,0),zorder=4)\n",
" plt.plot([x1[l-1],x1[l]],[x2[l-1],x2[l]],color='white',lw=3,alpha = max((l-(L-20))/20,0),zorder=3)\n",
" if l == L-1:\n",
" #plt.plot([-3,3],[x2[l],x2[l]],color='red',alpha=0.7,zorder=10)\n",
" pdf = norm.pdf(x, loc=lmu, scale=lstd)*0.5\n",
" mask = pdf > np.percentile(pdf,q=40)\n",
" plt.fill_between(x[mask],x2[l]+pdf[mask],np.full(len(x[mask]),x2[l]),color='red',alpha=0.2,zorder=2)\n",
" plt.plot(x[mask],x2[l]+pdf[mask],color='red',alpha=0.7,zorder=1)\n",
" plt.arrow(x1[l-1],x2[l-1],x1[l]-x1[l-1],0,color='black',lw=0.5,head_width=0.05,length_includes_head=True,zorder=100)\n",
" plt.scatter(x1[l],x2[l],color='white',edgecolor='red',s=30,linewidth=1,alpha=1.0,zorder=100)\n",
" plt.annotate(r'$f_{X_1|X_2}$ = ' + str(np.round(x2[l],2)),xy=[min(x[mask])-0.5,x2[l]+0.1],color='red')\n",
" \n",
" df = pd.DataFrame(np.vstack([x1,x2]).T, columns= ['x1','x2'])\n",
" if L > 20:\n",
" sns.kdeplot(data=df,x='x1',y='x2',color='grey',linewidths=1.0,alpha=min(((l-20)/20),1.0),levels=5,zorder=1)\n",
" add_grid()\n",
" plt.xlim([-3.5,3.5]); plt.ylim([-3.5,3.5]); plt.xlabel(r'$X_1$'); plt.ylabel(r'$X_2$'); plt.title('Gibbs Sampler - Bivariate Joint Gaussian Distribution')\n",
" plt.subplots_adjust(left=0.0,bottom=0.0,right=1.0,top=1.1); plt.show() # set plot size \n",
" \n",
"# connect the function to make the samples and plot to the widgets \n",
"interactive_plot = widgets.interactive_output(run_plot, {'nsample':nsample,'rho':rho})\n",
"interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating "
]
},
{
"cell_type": "markdown",
"id": "faaceed1",
"metadata": {},
"source": [
"### Interactive Gibbs Sampler Demonstation \n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"Set the number of samples and correlation coefficient and observe the Gibbs sampler.\n",
"\n",
"### The Inputs\n",
"\n",
"* **$n_{sample}$** - number of samples, **$\\rho_{X_1,X_2}$** - correlation coefficient"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "899c4fa6",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b530fb44f2a843f1960cddbbc6aef73d",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Text(value=' Interactive Gibbs Sampler Demo, Prof. Michael Pyr…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e0e8fc2fdb43415697a1eaf74174806f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(ui2, interactive_plot) # display the interactive plot"
]
},
{
"cell_type": "markdown",
"id": "07eb83a5",
"metadata": {},
"source": [
"#### Comments\n",
"\n",
"This was a basic demonstration of the Gibbs sampler for McMC. I have many other demonstrations and even basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n",
" \n",
"#### The Author:\n",
"\n",
"### Michael J. Pyrcz, Professor, The University of Texas at Austin \n",
"*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
"\n",
"With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
"\n",
"For more about Michael check out these links:\n",
"\n",
"#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
"\n",
"#### Want to Work Together?\n",
"\n",
"I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
"\n",
"* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
"\n",
"* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
"\n",
"* I can be reached at mpyrcz@austin.utexas.edu.\n",
"\n",
"I'm always happy to discuss,\n",
"\n",
"*Michael*\n",
"\n",
"Michael Pyrcz, Ph.D., P.Eng. Professor, The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, Jackson School of Geosciences, The University of Texas at Austin\n",
"\n",
"#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) \n",
" "
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "f51344db",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}