{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": true }, "source": [ "

\n", " \n", "\n", "

\n", "\n", "## Data Analytics \n", "\n", "### Optimum Decision Making in Python \n", "\n", "\n", "#### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Optimum Decision Making in Python \n", "\n", "Here's a simple interactive demonstration of optimum decision making in the precense of uncertainty. This should help you get started with using your uncertainty models to make decisions. \n", "\n", "#### Optimum Decision Making in the Precense of Uncertainty \n", "\n", "Definition: the decision that minimizes the expected loss or maximizes the expected profit\n", "\n", "Procedure: \n", "\n", "1. Calculate M realizations / reservoir scenarios (the uncertainty model)\n", "2. Establish S, development scenarios (includes all development decisions, could be many)\n", "3. Establish profit metric (accounts for all relevant factors)\n", "4. Calculate profit over all reservoir realization / scenario (m) for each S development scenario (s). \n", "\n", "\\begin{equation} \n", "P_{s,m}, s=1,\\ldots,s, m=1,\\ldots,M\n", "\\end{equation}\n", "\n", "calculate the expected profit for each development scenario\n", "\n", "\\begin{equation}\n", "E{๐‘ƒ_๐‘ } = \\frac{1}{\\sum_{m=1}^M \\lambda_m} \\sum_{\\ell=1}^L \\lambda_m \\cdot P_{s,m}\n", "\\end{equation}\n", "\n", "define the optimal development scenario as the one with highest expected profit\t\n", " \n", "* If all models are equiprobable then, $๐œ†_๐‘š = \\frac{1}{m}, m=1,โ€ฆ,M$\n", "\n", "#### What Uncertainty Distribution is Best?\n", "\n", "It is natural to assume that a narrower uncertainty distribution is better.\n", "\n", "* lower dispersion or variance in the probability density function\n", "\n", "Yet, higher variance has more downside and more upside.\n", "\n", "* depending on the loss function and over- and under-estimation loss or profit the optimum estimate may shift up, shift down or stay the same\n", "\n", "To make an optimum estimate / decision from a distribution we need a function that describes the:\n", "\n", "* loss due to under and overestimation.\n", "\n", "#### Limitations\n", "\n", "The optimum decision making in the precense of uncertainty assumes:\n", "1. **representativity** - the uncertainty distribution is representative, a good uncertainty model\n", "2. **independence** - between the variables / features impacting the uncertainty distribuiton and those impacting the lose function\n", "3. **stationarity** - the uncertainty model and the loss function are invariant under translation over included spatial samples and for location of implementation\n", " \n", "#### Getting Started\n", "\n", "Here's the steps to get setup in Python with the GeostatsPy package:\n", "\n", "1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n", "2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. \n", "3. In the terminal type: pip install geostatspy. \n", "4. Open Jupyter and in the top block get started by copy and pasting the code block below from this Jupyter Notebook to start using the geostatspy functionality. \n", "\n", "There are examples below with these functions. You can go here to see a list of the available functions, https://git.io/fh4eX, other example workflows and source code. " ] }, { "cell_type": "code", "execution_count": 54, "metadata": {}, "outputs": [], "source": [ "import geostatspy.GSLIB as GSLIB # GSLIB utilies, visualization and wrapper\n", "import geostatspy.geostats as geostats # GSLIB methods convert to Python " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will also need some standard packages. These should have been installed with Anaconda 3." ] }, { "cell_type": "code", "execution_count": 55, "metadata": {}, "outputs": [], "source": [ "import numpy as np # ndarrys for gridded data\n", "import pandas as pd # DataFrames for tabular data\n", "import os # set working directory, run executables\n", "import matplotlib.pyplot as plt # for plotting\n", "from scipy import stats # summary statistics\n", "from scipy.stats.kde import gaussian_kde # PDF estimation from data\n", "import math # trig etc.\n", "import random\n", "from ipywidgets import interactive # widgets and interactivity\n", "from ipywidgets import widgets \n", "from ipywidgets import Layout\n", "from ipywidgets import Label\n", "from ipywidgets import VBox, HBox\n", "working_directory = \"C://PGE383//\" # for convenience you can set the default input file information here\n", "bootstrap_file = 'avg_por_boot.csv' # only used if you load your bootstrap results from a file\n", "feature_name = 'avg_por_boot'" ] }, { "cell_type": "code", "execution_count": 56, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "'C://PGE383//'" ] }, "execution_count": 56, "metadata": {}, "output_type": "execute_result" } ], "source": [ "working_directory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set Up the Interactive Dashboard \n", "\n", "The following code includes the:\n", "\n", "* dashboard\n", "* numerical methods\n", "* graphical displays" ] }, { "cell_type": "code", "execution_count": 57, "metadata": {}, "outputs": [], "source": [ "# interactive calculation of the random sample set (control of source parametric distribution and number of samples)\n", "l = widgets.Text(value=' Optimum Decision Making Demonstration, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n", "\n", "load = widgets.Checkbox(\n", " value=False,\n", " description='Load Data',\n", " disabled=False,\n", " button_style='', # 'success', 'info', 'warning', 'danger' or ''\n", " tooltip='Description',\n", " icon='check' # (FontAwesome names without the `fa-` prefix)\n", ")\n", "\n", "mean = widgets.FloatSlider(min=0.0, max = 1.0, value = 0.2,step=0.01,description = '$\\mu$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "mean.style.handle_color = 'red'\n", "\n", "stdev = widgets.FloatSlider(min=0.001, max = 0.25, value = 0.03,step=0.01,description = '$\\sigma$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "stdev.style.handle_color = 'red'\n", "\n", "file = widgets.Text(\n", " value= working_directory + bootstrap_file,\n", " placeholder='Type something',\n", " description='String:',\n", " disabled=False\n", ")\n", "\n", "feature = widgets.Text(\n", " value=feature_name,\n", " placeholder='Type something',\n", " description='String:',\n", " disabled=False\n", ")\n", "file.layout.display = \"none\"\n", "feature.layout.display = \"none\"\n", "\n", "ui1 = widgets.VBox([load,mean,stdev,file,feature],kwargs = {'justify_content':'center'})\n", "\n", "slope_under = widgets.FloatSlider(min=0.01, max = 1.0,value = 0.1,step=0.01,description = '$m_{under}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "slope_under.style.handle_color = 'blue'\n", "\n", "power_under = widgets.FloatSlider(min=1.0, max = 5.0, value = 0.0, description = '$p_{under}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "power_under.style.handle_color = 'blue'\n", "\n", "ui2 = widgets.VBox([slope_under,power_under],kwargs = {'justify_content':'center'})\n", "\n", "slope_over = widgets.FloatSlider(min=0.01, max = 1.0,value = 0.1,step=0.1,description = '$m_{over}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "slope_over.style.handle_color = 'blue'\n", "\n", "power_over = widgets.FloatSlider(min=1.0, max = 5.0, value = 0.0, description = '$p_{over}$',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "power_over.style.handle_color = 'blue'\n", "\n", "ui3 = widgets.VBox([slope_over,power_over],kwargs = {'justify_content':'center'})\n", "\n", "step = widgets.FloatLogSlider(min=-4, max = -1, value = 0.01, description = 'Step',orientation='horizontal',layout=Layout(width='230px', height='50px'),continuous_update=False)\n", "step.style.handle_color = 'gray'\n", "\n", "ui4 = widgets.VBox([step],kwargs = {'justify_content':'none'})\n", "\n", "ui = widgets.HBox([ui1,ui2,ui3,ui4])\n", "ui_title = widgets.VBox([l,ui],)\n", "\n", "out = widgets.Output()\n", "def update_interface(change):\n", " if change.new == True:\n", " mean.layout.display = \"none\"\n", " stdev.layout.display = \"none\"\n", " file.layout.display = \"flex\"\n", " feature.layout.display = \"flex\"\n", " else:\n", " mean.layout.display = \"flex\"\n", " stdev.layout.display = \"flex\"\n", " file.layout.display = \"none\"\n", " feature.layout.display = \"none\"\n", "\n", "load.observe(update_interface, names = 'value')\n", "\n", "# def handle_slider_change(change):\n", "# caption.value = 'The slider value is ' + (\n", "# 'negative' if change.new < 0 else 'nonnegative'\n", "# )\n", "\n", "# slider.observe(handle_slider_change, names='value')\n", "\n", "def make_plot(load,mean,stdev,file,feature,slope_under,power_under,slope_over,power_over,step):\n", " if load == False:\n", " xmin = mean - 5.0 * stdev \n", " xmax = mean + 5.0 * stdev\n", " X = np.arange(xmin,xmax,step) # build uncertainty distribution\n", " pdf = stats.norm.pdf(X, loc = mean, scale = stdev)\n", " y = stats.norm.rvs(loc=mean,scale=stdev,size=10000)\n", " \n", " else:\n", " df = pd.read_csv(file)\n", " y = df[feature]\n", " #xmin = np.min(y); xmax = np.max(y)\n", " xmin = 0; xmax = 1.0\n", " X = np.arange(xmin,xmax,step) # build uncertainty distribution\n", " kernel = gaussian_kde(y)\n", " pdf = kernel(X) \n", " \n", " delta = np.arange(-1*(xmax-xmin)*4,(xmax-xmin)*4,step) # build loss function\n", " loss = np.zeros(len(delta))\n", " rloss = np.zeros(len(delta)) # flip for convolve operator\n", " \n", " for id, d in enumerate(delta):\n", " if d < 0.0: \n", " loss[id] = pow(abs(d),power_under)*slope_under\n", " rloss[len(delta)-id-1] = pow(abs(d),power_under)*slope_under \n", " else: \n", " loss[id] = pow(d,power_over)*slope_over\n", " rloss[len(delta)-id-1] = pow(d,power_over)*slope_over\n", " \n", " pdf_norm = pdf / pdf.sum() # calculate expected loss\n", " exp_loss = np.convolve(pdf_norm,loss,mode='valid')\n", " x_exp_loss = np.arange(-0.5*step*(len(exp_loss))+mean,0.5*step*(len(exp_loss))+mean,step)\n", " x_exp_loss = x_exp_loss[0:len(exp_loss)] # avoid rounding error\n", " inside = np.logical_and(x_exp_loss >= xmin, x_exp_loss <= xmax)\n", " max_loss_plot = np.max(exp_loss,where = inside,initial=0.0)\n", " best_estimate = x_exp_loss[np.argmin(exp_loss)]\n", " min_loss = np.min(exp_loss)\n", " \n", " plt.subplot(131) # plot uncertainty distribution\n", " plt.plot(X,pdf,'--',color='black',alpha=0.4,zorder=1)\n", " hist = plt.hist(y,bins=X,edgecolor='black',color='red',alpha=0.1,density=True,stacked=True)\n", " plt.plot([best_estimate,best_estimate],[0,np.max(hist[0])*1.2],color='black',linestyle='--')\n", " plt.annotate('Optimum Estimate',[best_estimate,np.max(hist[0])*0.6], rotation=270)\n", " plt.xlim(0.0,1.0); plt.xlabel(\"Feature, $y_1$\"); plt.ylim([0,np.max(hist[0])*1.2])\n", " plt.title('Uncertainty Distribution'); plt.ylabel('Density')\n", " plt.grid(color='grey', ls = '-.', lw = 0.1)\n", "\n", " plt.subplot(132) # plot loss function, loss vs. estimate error\n", " plt.plot(delta,loss,color='blue',alpha=0.2)\n", "# plt.plot([0,0],[0,np.max(loss)],color='black',linestyle='--',alpha=0.3)\n", " plt.plot([0,0],[0,1.0],color='black',linestyle='--',alpha=0.3)\n", "# plt.annotate('Underestimation',[(xmax-xmin)*-0.1,np.max(loss)*0.8], rotation=0,horizontalalignment='right')\n", "# plt.annotate('Overestimation',[(xmax-xmin)*0.1,np.max(loss)*0.8], rotation=0,horizontalalignment='left')\n", " plt.annotate('Underestimation',[(xmax-xmin)*-0.1,0.7], rotation=0,horizontalalignment='right')\n", " plt.annotate('Overestimation',[(xmax-xmin)*0.1,0.7], rotation=0,horizontalalignment='left')\n", "# plt.xlim([-1*(xmax-xmin)*3,(xmax-xmin)*3])\n", "# plt.ylim([0,np.max(loss)])\n", " plt.xlim([-1,1]); plt.ylim([0,1.0])\n", " plt.xlabel('Error In Estimate'); plt.title('Loss Function'); plt.ylabel('Loss')\n", " plt.grid(color='grey', ls = '-.', lw = 0.1)\n", " \n", " plt.subplot(133) # plot expected loss vs. estimate\n", " plt.plot(x_exp_loss,exp_loss,color='black',alpha=0.2)\n", " plt.plot([best_estimate,best_estimate],[0,max_loss_plot],color='black',linestyle='--')\n", " plt.xlim([0.0,1.0]); # plt.ylim([0,1.0])\n", " plt.annotate('Optimum Estimate',[best_estimate,max_loss_plot*0.6], rotation=270)\n", "# plt.plot([xmin,xmax],[min_loss,min_loss],color='black',linestyle='--',alpha=0.3)\n", " plt.plot([0,1],[min_loss,min_loss],color='black',linestyle='--',alpha=0.3)\n", " plt.annotate('Minimum Expected Loss',[xmin+(xmax-xmin)*0.2,min_loss*1.1], rotation=0,horizontalalignment='left')\n", " plt.ylim([0,max_loss_plot])\n", " plt.xlabel('Estimate, $y^*_1$'); plt.title('Expected Loss vs. Estimate'); plt.ylabel('Expected Loss')\n", " plt.grid(color='grey', ls = '-.', lw = 0.1)\n", " \n", " plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.2)\n", " plt.show() \n", " \n", "interactive_plot = widgets.interactive_output(make_plot, {'load':load,'mean':mean,'stdev':stdev,'file':file,'feature':feature,'slope_under':slope_under,'power_under':power_under,'slope_over':slope_over,'power_over':power_over,'step':step})\n", "interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Optimum Decision Making Demonstration\n", "\n", "* specify the uncertainty distribuiton $\\mu$, $\\sigma$ and loss function $m$ and $p$ and observe the loss function, expected loss and optimum estimate \n", "\n", "#### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)" ] }, { "cell_type": "code", "execution_count": 58, "metadata": { "scrolled": false }, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "cca204f6bdba4cc9a1e1edc22141ca53", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Text(value=' Optimum Decision Making Demonstration, Michaeโ€ฆ" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "1dad9b547dbe4a1f8d138ff20964241a", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '
', 'iโ€ฆ" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(ui_title, interactive_plot) # display the interactive plot" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": 36, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "479538d167114d4b85aeb0deb2425029", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Label(value='The values of range1 and range2 are synchronized')" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "a90df137f6ff4d3d82b06b612413584e", "version_major": 2, "version_minor": 0 }, "text/plain": [ "IntSlider(value=1, description='Slider', max=5, min=-5)" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "caption = widgets.Label(value='The values of range1 and range2 are synchronized')\n", "slider = widgets.IntSlider(min=-5, max=5, value=1, description='Slider')\n", "\n", "def handle_slider_change(change):\n", " caption.value = 'The slider value is ' + (\n", " 'negative' if change.new < 0 else 'nonnegative'\n", " )\n", "\n", "slider.observe(handle_slider_change, names='value')\n", "\n", "display(caption, slider)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comments\n", "\n", "This was a basic demonstration of optimum decision making in the precense of uncertainty. \n", "\n", "* we build uncertainty models all the time to support decision making. Here's how it gets done.\n", "\n", "I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations, trend modeling, multivariate analysis and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n", " \n", "I hope this was helpful,\n", "\n", "*Michael*\n", "\n", "#### The Author:\n", "\n", "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n", "\n", "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n", "\n", "For more about Michael check out these links:\n", "\n", "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n", "\n", "#### Want to Work Together?\n", "\n", "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n", "\n", "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n", "\n", "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n", "\n", "* I can be reached at mpyrcz@austin.utexas.edu.\n", "\n", "I'm always happy to discuss,\n", "\n", "*Michael*\n", "\n", "Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n", "\n", "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "celltoolbar": "Raw Cell Format", "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.6" } }, "nbformat": 4, "nbformat_minor": 2 }