{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "

\n", " \n", "\n", "

\n", "\n", "## Interactive Bayesian Coin Demonstration from Sivia (1996)\n", "\n", "### Sivia, D.S., 1996, Data Analysis: A Bayesian Tutorial\n", "\n", "* interactive plot demonstration with ipywidget package\n", "\n", "#### Michael Pyrcz, Professor, The University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n", "\n", "### The Bayesian Coin Example\n", "\n", "### Bayesian Updating\n", "\n", "Here is an interactive workflow based on Sivia's (1996) Bayesian coin example to demonstrate Bayesian updating that should help you efficient learn Bayesian updating, central for Bayesian probability calculations and machine learning.\n", "\n", "I have recorded a walk-through of this interactive dashboard in my [Data Science Interactive Python Demonstrations](https://www.youtube.com/playlist?list=PLG19vXLQHvSDy26fM3hDLg3VCU7U5BGZl) series on my [YouTube](https://www.youtube.com/@GeostatsGuyLectures) channel.\n", "\n", "* Join me for walk-through of this dashboard [03 Data Science Interactive: Bayesian Coin Example](TBD). I'm stoked to guide you and share observations and things to try out! \n", "\n", "* I have a lecture on [Bayesian Probability](https://www.youtube.com/watch?v=Ppwfr8H177M&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=6) as part of my [Data Analytics and Geostatistics](https://www.youtube.com/playlist?list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ). Note, for all my recorded lecture the interactive and well-documented workflow demononstrations are available on my GitHub repository [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos).\n", "\n", "* Also, I have another interactive [Sivia's Bayesian Coin Example](https://github.com/GeostatsGuy/PythonNumericalDemos/blob/master/Interactive_Sivia_Coin_Toss.ipynb) for the Sivia (1996) Bayesian coin example and a [Walk-Through](https://www.youtube.com/watch?v=D1UKZGOYDOg&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=8&t=14s) on my YouTube channel.\n", "\n", "#### Bayesian Updating\n", "\n", "First let's remind ourselves of the Bayesian updating with a convenient notation:\n", "\n", "\\begin{equation}\n", "P( Model | Data ) = \\frac{P( Data | Model ) \\cdot P( Model )}{P( Data )} \\quad Posterior = \\frac{Likelihood \\cdot Prior}{Evidence}\n", "\\end{equation} \n", "\n", "we can marginalize or normalize to remove the need for the evidence term.\n", "\n", "\\begin{equation}\n", "P( Model | Data ) \\propto P( Data | Model ) \\cdot P( Model )\n", "\\end{equation} \n", "\n", "#### Setting Up the Bayesian Coin Example\n", "\n", "I have a coin and you need to figure out if it is a fair coin!\n", "\n", "* a fair coin would have a 50% probability of heads (and a 50% probability of tails)\n", "\n", "You start with your prior to data assessment of my coin, a **prior distribution** representing your uncertainty of the probability of heads, $P(H)$\n", "\n", "\\begin{equation}\n", "P(H | Tosses) \\propto P(Tosses | H) \\cdot P(H) \\quad Posterior \\propto Likelihood \\cdot Prior\n", "\\end{equation}\n", "\n", "* it could be based on how honest you think I am\n", "\n", "Then you perform a set of coin tosses to build a **likelihood distribution**, $P(Tosses | H)$\n", "\n", "* the more coin tosses, the narrower this distribution\n", "\n", "Then you update the prior distribution with the likelihood distribution to get the **posterior distribution**, $P(H | Tosses)$.\n", "\n", "\\begin{equation}\n", "P( H | Tosses ) \\propto P( Tosses | H ) \\cdot P( H ) \\quad Posterior \\propto Likelihood \\cdot Prior\n", "\\end{equation} \n", "\n", "#### More Details About this Example\n", "\n", "Here's some more details to assist:\n", "\n", "* **Prior Distribution** - we assume a Guassian distributed prior for probability of heads, $P(H)$, so it is specified by prior mean, $\\overline{P(H)}$, and standard deviation, $\\sigma_{P(H)}$, and we can calculate the probaiblities over a range of possible probability of heads, $P_{Heads}$.\n", "\n", "\\begin{equation}\n", "P(H) \\sim N \\left[\\overline{P(heads)},\\sigma_{P(Heads)} \\right]\n", "\\end{equation}\n", "\n", "* **Likelihood Distribution** - we use the binomial distribution to calculate the probablity of the data, observed tosses, given the coin probability of heads, $P(Tosses|H)$. Given we know the data, $n_{heads}$ out of $n_{tosses}$, we can substitute each case of probability of heads, $P_{heads}$, into the binomial PDF equation (where we substitute a range of $P_{heads}$ values from 0.0 to 1.0 to calculate the entire likelihood distribution):\n", "\n", "\\begin{equation}\n", "P(Tosses | H) = \\begin{pmatrix} n_{tosses} \\\\ n_{heads} \\end{pmatrix} P_{heads}^{n_{heads}} (1 - P_{heads})^{n_{tosses} - n_{heads}}\n", "\\end{equation}\n", " \n", "* **Posterior Distribution** - then for each $P_{heads}$ value we multiple the associated prior and likelihood to get the posterior.\n", "\n", "\\begin{equation}\n", "P(H|Tosses)^{\\prime} \\propto \\begin{pmatrix} n_{tosses} \\\\ n_{heads} \\end{pmatrix} P_{heads}^{n_{heads}} (1 - P_{heads})^{(n_{tosses} - n_{heads})} \\cdot \\frac{1}{\\sigma_{P(H)}\\sqrt{2 \\pi}} e^{-\\frac{1}{2}\\left( \\frac{P_{heads} - \\overline{P(heads)} }{\\sigma_{P(Heads)}} \\right)^2}\n", "\\end{equation}\n", "\n", "* **Posterior Distribution Closure** - finally, we normalize all the posterior probabilities over the range of $P_{heads}$ values to sum to 1.0.\n", "\n", "\\begin{equation}\n", "P(H|Tosses) = \\frac{ P(H|Tosses)^{\\prime} } { \\sum_{ \\forall P_{heads} } P(H|Tosses)^{\\prime} } \n", "\\end{equation}\n", "\n", "#### Getting Started\n", "\n", "Here's the steps to get setup in Python with the GeostatsPy package:\n", "\n", "1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n", "\n", "That's all!\n", "\n", "#### Load the Required Libraries\n", "\n", "The following code loads the required libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from ipywidgets import interactive # widgets and interactivity\n", "from ipywidgets import widgets \n", "from ipywidgets import Layout\n", "import matplotlib; import matplotlib.pyplot as plt # plotting\n", "from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks\n", "plt.rc('axes', axisbelow=True) # set axes and grids in the background for all plots\n", "import numpy as np # working with arrays\n", "from scipy.stats import triang # parametric distributions\n", "from scipy.stats import binom\n", "from scipy.stats import norm" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "##### Build the Interactive Dashboard\n", "\n", "Here's the code to build the dashboard." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": false }, "outputs": [], "source": [ "# 4 slider bars for the model input\n", "l = widgets.Text(value=' Sivia (1996) Bayesian Updating Coin Example Demonstration, Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n", "a = widgets.FloatSlider(min=0.0, max = 1.0, value = 0.5, description = r'$\\overline{P(heads)}$',continuous_update=False)\n", "d = widgets.FloatSlider(min=0.01, max = 1.0, value = 0.1, step = 0.01, description = r'$\\sigma_{P(Heads)}$',continuous_update=False)\n", "b = widgets.FloatSlider(min = 0, max = 1.0, value = 0.5, description = r'$\\frac{n_{heads}}{n_{tosses}}$',continuous_update=False)\n", "c = widgets.IntSlider(min = 5, max = 1000, value = 100, description = r'$n_{tosses}$',continuous_update=False)\n", "\n", "ui = widgets.HBox([a,d,b,c],)\n", "uik = widgets.VBox([l,ui],)\n", "\n", "def f(a, b, c, d): # function to make the plot \n", " heads = int(c * b)\n", " tails = c - heads\n", " \n", " x = np.linspace(0.0, 1.0, num=1000)\n", " \n", " prior = norm.pdf(x,loc = a, scale = d)\n", " prior = prior / np.sum(prior)\n", " \n", " plt.subplot(221)\n", " \n", " plt.plot(x, prior,color='red',alpha=0.8,linewidth=3) # prior distribution of coin fairness\n", " plt.arrow(0.48,0.04,-0.4,0,linewidth=1,facecolor='grey',edgecolor='black',head_length=.03)\n", " plt.annotate('Increasingly Weighted Tails',[0.10,0.042],rotation=0.0)\n", " plt.arrow(0.52,0.04,0.4,0,linewidth=1,facecolor='grey',edgecolor='black',head_length=.03)\n", " plt.annotate('Increasingly Weighted Heads',[0.55,0.042],rotation=0.0)\n", " plt.annotate('Double Tailed Coin',[0.02,0.02],rotation=90.0)\n", " plt.annotate('Double Headed Coin',[0.97,0.02],rotation=90.0)\n", " plt.xlim(0.0,1.0)\n", " plt.xlabel('P(Heads)'); plt.ylabel('Density'); plt.title(r'$Prior$ $Distribution$ - What do you think of the coin before the tosses?')\n", " plt.ylim(0, 0.05)\n", " plt.axvline(x=0.5,color='grey',linestyle='--');\n", " plt.annotate('Fair Coin',[0.49,0.02],rotation=90.0)\n", " plt.grid()\n", " plt.gca().xaxis.grid(True, which='major',linewidth = 1.0); plt.gca().xaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().yaxis.grid(True, which='major',linewidth = 1.0); plt.gca().yaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n", " plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks\n", " \n", " plt.subplot(222) # results from the coin tosses \n", " labels = ['Heads','Tails']\n", " label_pos = [i for i, _ in enumerate(labels)]\n", " plt.bar(labels,[heads,tails],color=['red','blue'],edgecolor='black',alpha=0.8)\n", " plt.xticks(label_pos, labels)\n", " plt.ylim([0,1000]); plt.ylabel('Frequency')\n", " #plt.pie([heads, tails],labels = ['heads','tails'],radius = 0.5*(c/1000)+0.5, autopct='%1.1f%%', colors = ['#ff9999','#66b3ff'], explode = [.02,.02], wedgeprops = {\"edgecolor\":\"k\",'linewidth': 1} )\n", " plt.title('The Experimental Result from ' + str(c) + ' Coin Tosses')\n", " plt.gca().yaxis.grid(True, which='major',linewidth = 1.0); plt.gca().yaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n", " plt.gca().yaxis.set_minor_locator(AutoMinorLocator())\n", " \n", " likelihood = binom.pmf(heads,c,x)\n", " likelihood = likelihood/np.sum(likelihood)\n", " \n", " plt.subplot(223) # likelihood distribution given the coin tosses\n", " plt.plot(x, likelihood,color='red',alpha=0.8,linewidth=3)\n", " plt.arrow(0.48,0.04,-0.4,0,linewidth=1,facecolor='grey',edgecolor='black',head_length=.03)\n", " plt.annotate('Increasingly Weighted Tails',[0.10,0.042],rotation=0.0)\n", " plt.arrow(0.52,0.04,0.4,0,linewidth=1,facecolor='grey',edgecolor='black',head_length=.03)\n", " plt.annotate('Increasingly Weighted Heads',[0.55,0.042],rotation=0.0)\n", " plt.annotate('Double Tailed Coin',[0.02,0.02],rotation=90.0)\n", " plt.annotate('Double Headed Coin',[0.97,0.02],rotation=90.0)\n", " plt.xlim(0.0,1.0); plt.ylim(0, 0.05)\n", " plt.xlabel('P(Tosses | Heads)'); plt.ylabel('Density'); plt.title(r'$Likelihood$ $Distribution$ - What do the coin tosses tell you about the coin?')\n", " plt.axvline(x=0.5,color='grey',linestyle='--');\n", " plt.annotate('Fair Coin',[0.49,0.02],rotation=90.0)\n", " plt.grid()\n", " plt.gca().xaxis.grid(True, which='major',linewidth = 1.0); plt.gca().xaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().yaxis.grid(True, which='major',linewidth = 1.0); plt.gca().yaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n", " plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks\n", " \n", " post = prior * likelihood\n", " post = post / np.sum(post) \n", " \n", " plt.subplot(224) # posterior distribution\n", " plt.plot(x, post,color='red',alpha=0.8,linewidth=3)\n", " plt.arrow(0.48,0.04,-0.4,0,linewidth=1,facecolor='grey',edgecolor='black',head_length=.03)\n", " plt.annotate('Increasingly Weighted Tails',[0.10,0.042],rotation=0.0)\n", " plt.arrow(0.52,0.04,0.4,0,linewidth=1,facecolor='grey',edgecolor='black',head_length=.03)\n", " plt.annotate('Increasingly Weighted Heads',[0.55,0.042],rotation=0.0)\n", " plt.annotate('Double Tailed Coin',[0.02,0.02],rotation=90.0)\n", " plt.annotate('Double Headed Coin',[0.97,0.02],rotation=90.0)\n", " plt.xlim(0.0,1.0); plt.ylim(0, 0.05)\n", " plt.xlabel('P(Heads | Tosses)'); plt.ylabel('Density'); plt.title(r\"$Posterior$ $Distribution$ - What do you think of the coin after the tosses?\") \n", " plt.axvline(x=0.5,color='grey',linestyle='--');\n", " plt.annotate('Fair Coin',[0.49,0.02],rotation=90.0)\n", " plt.grid()\n", " plt.gca().xaxis.grid(True, which='major',linewidth = 1.0); plt.gca().xaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().yaxis.grid(True, which='major',linewidth = 1.0); plt.gca().yaxis.grid(True, which='minor',linewidth = 0.2) # add y grids\n", " plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n", " plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks\n", "\n", " \n", " plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.6, wspace=0.2, hspace=0.3)\n", " plt.show()\n", "\n", "interactive_plot = widgets.interactive_output(f, {'a': a, 'd': d, 'b': b, 'c': c})\n", "interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Bayesian Coin Example from Sivia, 1996, Data Analysis: A Bayesian Tutorial\n", "\n", "#### Michael Pyrcz, Professor, The University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n", "\n", "### The Problem\n", "\n", "What is the probability that Dr. Pyrcz has a fair coin? Let's solve this with Bayesian updating. The inputs:\n", "\n", "* prior model **$\\overline{P(heads)}$**: expectation of your prior distribution (probability of heads)\n", ", **$\\sigma_{P(Heads)}$**: standard deviation of your prior distribution \n", "\n", "* **$\\frac{n_{heads}}{n_{tosses}}$**: proportion of heads in the coin toss experiment, **$n_{tosses}$**: number of coin tosses in the coin toss experiment" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "70966136cb5b468db9200e6152c95248", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Text(value=' Sivia (1996) Bayesian Updating Coin Example Demonstra…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "962e898f7ce34e829aa373bbe4c0c652", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(uik, interactive_plot) # display the interactive plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comments\n", "\n", "This was a simple demonstration of interactive plots in Jupyter Notebook Python with the ipywidgets and matplotlib packages. \n", "\n", "I have many other demonstrations on data analytics and machine learning, e.g. on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations, trend modeling and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n", " \n", "I hope this was helpful,\n", "\n", "*Michael*\n", "\n", "#### The Author:\n", "\n", "### Michael Pyrcz, Professor, The University of Texas at Austin \n", "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n", "\n", "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n", "\n", "For more about Michael check out these links:\n", "\n", "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n", "\n", "#### Want to Work Together?\n", "\n", "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n", "\n", "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n", "\n", "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n", "\n", "* I can be reached at mpyrcz@austin.utexas.edu.\n", "\n", "I'm always happy to discuss,\n", "\n", "*Michael*\n", "\n", "Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin\n", "\n", "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 2 }