{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

\n", " \n", "\n", "

\n", "\n", "## QQ-Plot Interactive Demonstration\n", "\n", "### QQ Plots in Python \n", "\n", "\n", "#### Michael Pyrcz, Professor, The University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Data Analytics: QQ Plots\n", "\n", "Here's a demonstration of calculation of QQ-plots in Python. This demonstration is part of the resources that I include for my courses in Spatial / Subsurface Data Analytics and Geostatistics at the Cockrell School of Engineering and Jackson School of Goesciences at the University of Texas at Austin. \n", "\n", "We will cover the following statistics:\n", "\n", "#### QQ-Plot\n", "* Convenient plot to compare distributions\n", "\n", "I have a lecture on QQ-plots available on [YouTube](https://www.youtube.com/watch?v=RETZus4XBNM). \n", "\n", "#### Getting Started\n", "\n", "Here's the steps to get setup in Python with the GeostatsPy package:\n", "\n", "1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n", "2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. \n", "3. In the terminal type: pip install geostatspy. \n", "4. Open Jupyter and in the top block get started by copy and pasting the code block below from this Jupyter Notebook to start using the geostatspy functionality. \n", "\n", "You will need to copy the data file to your working directory. The dataset is available on my GitHub account in my GeoDataSets repository at:\n", "\n", "* Tabular data - [2D_MV_200wells.csv](https://github.com/GeostatsGuy/GeoDataSets/blob/master/2D_MV_200wells.csv)\n", "\n", "#### Importing Packages\n", "\n", "We will need some standard packages. These should have been installed with Anaconda 3." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "from ipywidgets import interactive # widgets and interactivity\n", "from ipywidgets import widgets \n", "from ipywidgets import Layout\n", "from ipywidgets import Label\n", "from ipywidgets import VBox, HBox\n", "import numpy as np # ndarrys for gridded data\n", "import pandas as pd # DataFrames for tabular data\n", "import os # set working directory, run executables\n", "import matplotlib.pyplot as plt # plotting\n", "import matplotlib.gridspec as gridspec\n", "import matplotlib\n", "plt.rc('axes', axisbelow=True)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set the Working Directory\n", "\n", "I always like to do this so I don't lose files and to simplify subsequent read and writes (avoid including the full address each time). Set this to your working directory, with the above mentioned data file." ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "# interactive calculation of the sample set (control of source parametric distribution and number of samples)\n", "l = widgets.Text(value=' Interactive QQ-Plot, Michael Pyrcz, Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'),continuous_update=True)\n", "\n", "n1 = widgets.IntSlider(min=0, max = 1000, value = 100, step = 10, description = '$n_{1}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "n1.style.handle_color = 'red'\n", "\n", "m1 = widgets.FloatSlider(min=0.2, max = 0.8, value = 0.3, step = 0.1, description = '$\\overline{x}_{1}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "m1.style.handle_color = 'red'\n", "\n", "s1 = widgets.FloatSlider(min=0.0, max = 0.2, value = 0.03, step = 0.005, description = '$s_1$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "s1.style.handle_color = 'red'\n", "\n", "ui1 = widgets.VBox([n1,m1,s1],) # basic widget formatting \n", "\n", "n2 = widgets.IntSlider(min=0, max = 1000, value = 100, step = 10, description = '$n_{2}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "n2.style.handle_color = 'blue'\n", "\n", "m2 = widgets.FloatSlider(min=0.2, max = 0.8, value = 0.2, step = 0.1, description = '$\\overline{x}_{2}$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "m2.style.handle_color = 'blue'\n", "\n", "s2 = widgets.FloatSlider(min=0, max = 0.2, value = 0.03, step = 0.005, description = '$s_2$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "s2.style.handle_color = 'blue'\n", "\n", "ui2 = widgets.VBox([n2,m2,s2],) # basic widget formatting \n", "\n", "nq = widgets.IntSlider(min=10, max = 1000, value = 100, step = 1, description = '$n_q$',orientation='horizontal',layout=Layout(width='300px', height='30px'),continuous_update=True)\n", "nq.style.handle_color = 'gray'\n", "\n", "plot = widgets.Checkbox(value=False,description='Make Plot')\n", "\n", "ui3 = widgets.VBox([nq,plot],) # basic widget formatting \n", "\n", "ui4 = widgets.HBox([ui1,ui2,ui3],) # basic widget formatting \n", "\n", "ui2 = widgets.VBox([l,ui4],)\n", "\n", "def f_make(n1, m1, s1, n2, m2, s2, nq,plot): # function to take parameters, make sample and plot\n", "\n", " seed = 73073; np.random.seed(seed=seed)\n", " X1 = np.random.normal(loc=m1,scale=s1,size=n1)\n", " X2 = np.random.normal(loc=m2,scale=s2,size=n2)\n", "\n", " xmin=0.0; xmax=0.6 \n", " \n", " cumul_prob = np.linspace(1,99,nq)\n", " X1_percentiles = np.percentile(X1,cumul_prob)\n", " X2_percentiles = np.percentile(X2,cumul_prob)\n", "\n", " fig = plt.figure()\n", " spec = fig.add_gridspec(2, 3)\n", " \n", " ax0 = fig.add_subplot(spec[:, 1:])\n", " ax0.scatter(X1_percentiles,X2_percentiles,color='darkorange',edgecolor='black',s=10,label='QQ-plot')\n", " ax0.plot([0,1],[0,1],ls='--',color='red')\n", " plt.grid(); plt.xlim([xmin,xmax]); plt.ylim([xmin,xmax]); plt.xlabel('X1 - Porosity (fraction)'); plt.ylabel('X2 - Porosity (fraction)'); \n", " plt.title('QQ-Plot'); plt.legend(loc='upper right')\n", " \n", " ax10 = fig.add_subplot(spec[0, 0])\n", " ax10.hist(X1,bins=np.linspace(xmin,xmax,30),color='red',alpha=0.3,edgecolor='black',label='X1',density=True)\n", " ax10.hist(X2,bins=np.linspace(xmin,xmax,30),color='blue',alpha=0.3,edgecolor='black',label='X2',density=True)\n", " ax10.grid(); plt.xlim([xmin,xmax]); ax10.set_ylim([0,15]); ax10.set_xlabel('Porosity (fraction)'); ax10.set_ylabel('Density')\n", " ax10.set_title('Histograms'); ax10.legend(loc='upper right')\n", " \n", " ax11 = fig.add_subplot(spec[1, 0])\n", " ax11.scatter(np.sort(X1),np.linspace(0,1,len(X1)),color='red',edgecolor='black',s=10,label='X1')\n", " ax11.scatter(np.sort(X2),np.linspace(0,1,len(X2)),color='blue',edgecolor='black',s=10,label='X2')\n", " ax11.grid(); ax11.set_xlim([xmin,xmax]); ax11.set_ylim([0,1]); ax11.set_xlabel('Porosity (fraction)'); ax11.set_ylabel('Cumulative Probability')\n", " ax11.set_title('CDFs'); ax11.legend(loc='upper left')\n", " \n", " if plot:\n", " fig.savefig('QQ_plot.png',dpi=600,facecolor='white')\n", " \n", " plt.subplots_adjust(left=0.0, bottom=0.0, right=1.5, top=1.4, wspace=0.3, hspace=0.3); plt.show()\n", " \n", " \n", "# connect the function to make the samples and plot to the widgets \n", "interactive_plot = widgets.interactive_output(f_make, {'n1': n1, 'm1': m1, 's1': s1, 'n2': n2, 'm2': m2, 's2': s2, 'nq': nq, 'plot':plot})\n", "interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### QQ-Plot, Comparing Distributions\n", "\n", "* demonstration of QQ-plots to compare distributions while varying the distributions\n", "\n", "#### Michael Pyrcz, Professor, The University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n", "\n", "### The Problem\n", "\n", "Let's make 2 random datasets, $\\color{red}{X_1}$ and $\\color{blue}{X_2}$.\n", "\n", "* **$n_1$**, **$n_2$** number of samples, **$\\overline{x}_1$**, **$\\overline{x}_2$** means and **$s_1$**, **$s_2$** standard deviation of the 2 sample sets\n", "* **$L$**: number of bootstrap realizations\n", "* **$\\alpha$**: alpha level" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "50c6357d5de24a68a43a2463e2515e59", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Text(value=' Interactive QQ-Plot, Michael Pyrcz, Professor, The University of Texas at Austin',…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "9ada9709a2b247a19f44a21da7460f6d", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '
', 'i…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(ui2, interactive_plot) # display the interactive plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comments\n", "\n", "This was a basic demonstration of QQ-plot in Python.\n", "\n", "I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations, trend modeling and many other workflows available at [Python Demos](https://github.com/GeostatsGuy/PythonNumericalDemos) and a Python package for data analytics and geostatistics at [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy). \n", " \n", "I hope this was helpful,\n", "\n", "*Michael*\n", "\n", "#### The Author:\n", "\n", "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n", "\n", "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n", "\n", "For more about Michael check out these links:\n", "\n", "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n", "\n", "#### Want to Work Together?\n", "\n", "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n", "\n", "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n", "\n", "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n", "\n", "* I can be reached at mpyrcz@austin.utexas.edu.\n", "\n", "I'm always happy to discuss,\n", "\n", "*Michael*\n", "\n", "Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n", "\n", "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.9.12" } }, "nbformat": 4, "nbformat_minor": 2 }