\n",
"\n",
"## P-P Plot Interactive Demonstration\n",
"\n",
"### P-P (Probability-Probablity) Plots in Python \n",
"\n",
"Interactive demonstration of P-P plots to compare two distributions, cumulative distribution functions. \n",
"\n",
"* P-P plots map data values between two distributions, and then scatter plot the cumulative probability values.\n",
"\n",
"* A lecture that covers these concepts is available [Q-Q plots and P-P plots](https://www.youtube.com/watch?v=RETZus4XBNM&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=23&t=4s).\n",
"\n",
"This interactive dashboard may be applied to support teaching data science.\n",
"\n",
"#### Jason Bott, Undergraduate Student, The University of Texas at Austin\n",
"\n",
"#### [GitHub](https://github.com/jasonbott124) | [GoogleScholar](https://scholar.google.com/citations?user=31Ae8UkAAAAJ&hl=en) | [LinkedIn](https://www.linkedin.com/in/jason-bott-a52944270/) | [Eportfolio](https://jasonseportfolio5.wordpress.com/) | Email: jbott@utexas.edu\n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n"
]
},
{
"cell_type": "markdown",
"id": "96b3cd54",
"metadata": {},
"source": [
"#### Importing Packages\n",
"\n",
"We will need some standard packages. These should have been installed with Anaconda 3."
]
},
{
"cell_type": "code",
"execution_count": 1,
"id": "7fb9fccd",
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"from ipywidgets import interactive # widgets and interactivity\n",
"from ipywidgets import widgets \n",
"from ipywidgets import Layout\n",
"from ipywidgets import Label\n",
"from ipywidgets import VBox, HBox\n",
"\n",
"import numpy as np # ndarrays for gridded data\n",
"import pandas as pd # DataFrames for tabular data\n",
"from scipy import stats # inverse percentiles, percentileofscore function for P-P plots\n",
"import os # set working directory, run executables\n",
"\n",
"import matplotlib.pyplot as plt # plotting\n",
"import matplotlib.gridspec as gridspec\n",
"plt.rc('axes', axisbelow=True)"
]
},
{
"cell_type": "markdown",
"id": "82a0811b",
"metadata": {},
"source": [
"#### Widgets and Display\n",
"Next, we need to create our widgets and format the overall display"
]
},
{
"cell_type": "code",
"execution_count": 2,
"id": "82630901",
"metadata": {},
"outputs": [],
"source": [
"# interactive calculation of the sample set (control of source parametric distribution and number of samples)\n",
"l = widgets.Text(value=' Interactive P-P Plot | Jason Bott, Undergraduate Student, the University of Texas at Austin | Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'),continuous_update=True)\n",
"\n",
"n1 = widgets.IntSlider(min=0, max = 1000, value = 100, step = 10, description = '$n_{1}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"n1.style.handle_color = 'red'\n",
"\n",
"m1 = widgets.FloatSlider(min=0.2, max = 0.8, value = 0.3, step = 0.1, description = '$\\overline{x}_{1}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"m1.style.handle_color = 'red'\n",
"\n",
"s1 = widgets.FloatSlider(min=0.0, max = 0.2, value = 0.03, step = 0.005, description = '$s_1$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"s1.style.handle_color = 'red'\n",
"\n",
"ui1 = widgets.VBox([n1,m1,s1],) # basic widget formatting \n",
"\n",
"n2 = widgets.IntSlider(min=0, max = 1000, value = 100, step = 10, description = '$n_{2}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"n2.style.handle_color = 'blue'\n",
"\n",
"m2 = widgets.FloatSlider(min=0.2, max = 0.8, value = 0.2, step = 0.1, description = '$\\overline{x}_{2}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"m2.style.handle_color = 'blue'\n",
"\n",
"s2 = widgets.FloatSlider(min=0, max = 0.2, value = 0.03, step = 0.005, description = '$s_2$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"s2.style.handle_color = 'blue'\n",
"\n",
"ui2 = widgets.VBox([n2,m2,s2],) # basic widget formatting \n",
"\n",
"nq = widgets.IntSlider(min=10, max = 1000, value = 100, step = 1, description = '$n_q$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=False)\n",
"nq.style.handle_color = 'gray'\n",
"\n",
"# plot = widgets.Checkbox(value=False,description='Make Plot')\n",
"\n",
"ui3 = widgets.VBox([nq,],) # basic widget formatting \n",
"\n",
"ui4 = widgets.HBox([ui1,ui2,ui3],) # basic widget formatting \n",
"\n",
"ui2 = widgets.VBox([l,ui4],)"
]
},
{
"cell_type": "markdown",
"id": "0bacdefc",
"metadata": {},
"source": [
"#### P-P plot Function\n",
"\n",
"We create a function that calculates and matches the values from both data distributions. And plots the cumulative probilities."
]
},
{
"cell_type": "code",
"execution_count": 3,
"id": "27c8eadb",
"metadata": {},
"outputs": [],
"source": [
"#function to take parameters, make sample and PP plot\n",
"def double_p(n1, m1, s1, n2, m2, s2, nq):\n",
" \n",
"# n1 = 100; mean1 = 0.35; stdev1 = 0.06 \n",
"# n2 = 50; mean2 = 0.3; stdev2 = 0.05\n",
" \n",
" seed = 73073; #nq = 100\n",
" xmin=0.0; xmax=0.6\n",
" np.random.seed(seed=seed)\n",
" \n",
" X1 = np.random.normal(loc=m1,scale=s1,size=n1)\n",
" X2 = np.random.normal(loc=m2,scale=s2,size=n2)\n",
" \n",
" min_X = min(X1.min(),X2.min())\n",
" max_X = max(X1.max(),X2.max())\n",
" \n",
" X_values = np.linspace(min_X,max_X,nq)\n",
"\n",
" X1_cumul_probs = []; X2_cumul_probs = []\n",
"\n",
" for X in X_values:\n",
" X1_cumul_probs.append(stats.percentileofscore(X1,X)/100)\n",
" X2_cumul_probs.append(stats.percentileofscore(X2,X)/100)\n",
" \n",
" X1_cumul_probs = np.asarray(X1_cumul_probs); X2_cumul_probs = np.asarray(X2_cumul_probs)\n",
" fig = plt.figure()\n",
" spec = fig.add_gridspec(2, 3)\n",
"\n",
" #P-P plot\n",
" ax0 = fig.add_subplot(spec[:, 1:])\n",
" plt.scatter(X1_cumul_probs,X2_cumul_probs,color='darkorange',edgecolor='black',s=20,label='P-P plot')\n",
" plt.plot([0,1.0],[0,1.0],ls='--',color='red')\n",
" plt.grid(); plt.xlim([0.0,1.0]); plt.ylim([0.0,1.0]); plt.xlabel(r'$F^{-1}_{X_1}(x)$ - Cumulative Probability'); plt.ylabel(r'$F^{-1}_{X_2}(x)$ - Cumulative Probability'); \n",
" plt.title('P-P Plot'); plt.legend(loc='lower right')\n",
"\n",
" #Histogram\n",
" ax10 = fig.add_subplot(spec[0, 0])\n",
" plt.hist(X1,bins=np.linspace(xmin,xmax,30),color='red',alpha=0.5,edgecolor='black',label=r'$X_1$',density=True)\n",
" plt.hist(X2,bins=np.linspace(xmin,xmax,30),color='yellow',alpha=0.5,edgecolor='black',label=r'$X_2$',density=True)\n",
" plt.grid(); plt.xlim([xmin,xmax]); plt.ylim([0,15]); plt.xlabel('Porosity (fraction)'); plt.ylabel('Density')\n",
" plt.title('Histograms'); plt.legend(loc='upper right')\n",
" \n",
" #CDF\n",
" ax11 = fig.add_subplot(spec[1, 0])\n",
" plt.scatter(np.sort(X1),np.linspace(0,1,len(X1)),color='red',alpha=0.5,edgecolor='black',s=30,label=r'$X_1$')\n",
" plt.scatter(np.sort(X2),np.linspace(0,1,len(X2)),color='yellow',alpha=0.5,edgecolor='black',s=30,label=r'$X_2$')\n",
" plt.grid(); plt.xlim([xmin,xmax]); plt.ylim([0,1]); plt.xlabel('Porosity (fraction)'); plt.title('CDFs'); plt.legend(loc='lower right')\n",
"\n",
" plt.subplots_adjust(left=0.0, bottom=0.0, right=1.5, top=1.4, wspace=0.3, hspace=0.3); plt.show()\n",
"\n",
"interactive_plot = widgets.interactive_output(double_p, {'n1': n1, 'm1': m1, 's1': s1, 'n2': n2, 'm2': m2, 's2': s2, 'nq': nq}) #creates an object called interactive_plot that calls the double_p() function\n",
"interactive_plot.clear_output(wait = True) #reduce flickering by delaying plot updating"
]
},
{
"cell_type": "markdown",
"id": "63fd9c7b",
"metadata": {},
"source": [
"### P-P Plot for Comparing Distributions\n",
"\n",
"* demonstration of P-P plots to compare distributions, while interactively varying the distributions\n",
"\n",
"#### Jason Bott, Undergraduate Student and Michael Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"Let's make 2 random datasets, $\\color{red}{X_1}$ and $\\color{blue}{X_2}$ and calculate their P-P plot.\n",
"\n",
"* **$\\color{red}{n_1}$**, **$\\color{blue}{n_2}$** number of samples, **$\\color{red}{\\overline{x}_1}$**, **$\\color{blue}{\\overline{x}_2}$** means and **$\\color{red}{s_1}$**, **$\\color{blue}s_2$** standard deviation of the 2 sample sets\n",
"* **$\\color{grey}{n_q}$**: number of regular bins over the range of values"
]
},
{
"cell_type": "code",
"execution_count": 4,
"id": "2ef49ab5",
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "b1b0656181e54c11b9eab66506771cce",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Text(value=' Interactive P-P Plot | Jason Bott, Undergraduate Student, the University…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "5724e2c78fa74728a3bdc66004562464",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '', 'i…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(ui2, interactive_plot) #displays the widgets and plots"
]
},
{
"cell_type": "markdown",
"id": "a0ba1653",
"metadata": {},
"source": [
"#### Comments\n",
"\n",
"This was a basic interactive demonstration of a P-P plot in Python.\n",
"\n",
"#### The Authors:\n",
"\n",
"### Jason Bott, Undergraduate Student, University of Texas at Austin\n",
"*Geostatistics, Geophysics, Polar Geophysics, Volcanism, Subglacial Volcanism, Exploration Geophysics*\n",
"\n",
"Just a passionate student who enjoys all things geoscience. If you would like to contact me, I can be reached through email at: jbott@utexas.edu\n",
"\n",
"For more about Jason check out these links:\n",
"\n",
"#### [GitHub](https://github.com/jasonbott124) | [GoogleScholar](https://scholar.google.com/citations?user=31Ae8UkAAAAJ&hl=en) | [LinkedIn](https://www.linkedin.com/in/jason-bott-a52944270/) | [Eportfolio](https://jasonseportfolio5.wordpress.com/) \n",
"\n",
"\n",
"### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
"*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
"\n",
"With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
"\n",
"For more about Michael check out these links:\n",
"\n",
"#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
"\n",
"#### Want to Work With Michael?\n",
"\n",
"I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
"\n",
"* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
"\n",
"* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
"\n",
"* I can be reached at mpyrcz@austin.utexas.edu.\n",
"\n",
"I'm always happy to discuss,\n",
"\n",
"*Michael*\n",
"\n",
"Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n",
"\n",
"#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "a1d28d92",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 5
}