{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "
\n", "\n", "### Interactive Spatial Uncertainty Model Checking with Accuracy Plots\n", "\n", "\n", "#### Michael Pyrcz, Professor, The University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Uncertainty Checking\n", "\n", "Here is an interactive dashboard demonstrating spatial uncertainty model checking with the accuracy plot and uncertainty goodness cross validation-based approach proposed by Deutsch (1996) and included in Pyrcz and Deutsch (2014).\n", "\n", "I have recorded a walk-through of this interactive dashboard in my TBA series on my [YouTube](https://www.youtube.com/@GeostatsGuyLectures) channel. I'm stoked to guide you and share observations and things to try out! \n", "\n", "* I have a lecture on [Model Checking](https://www.youtube.com/watch?v=AVms8JoUWXc&list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ&index=49) as part of my [Data Analytics and Geostatistics](https://www.youtube.com/playlist?list=PLG19vXLQHvSB-D4XKYieEku9GQMQyAzjJ). 
Note, for all my recorded lectures, the interactive and well-documented workflow demonstrations are available in my GitHub repositories:\n", "\n", "    * [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos)\n", "    * [GeostatsGuy's Data Science Interactive Python](https://github.com/GeostatsGuy/DataScience_Interactive_Python)\n", "\n", "### Interactive Uncertainty Checking\n", "\n", "Here's a simple workflow for checking the uncertainty model from simple kriging estimates and the estimation variance.\n", "\n", "* we assume a Gaussian local uncertainty model\n", "\n", "#### Spatial Estimation\n", "\n", "Consider the case of making an estimate at some unsampled location, $z(\\bf{u}_0)$, where $z$ is the property of interest (e.g., porosity) and $\\bf{u}_0$ is a location vector describing the unsampled location.\n", "\n", "How would you do this given data, $z(\\bf{u}_1)$, $z(\\bf{u}_2)$, and $z(\\bf{u}_3)$?\n", "\n", "It would be natural to use a set of linear weights to formulate the estimator given the available data.\n", "\n", "\\begin{equation}\n", "z^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} z(\\bf{u}_{\\alpha})\n", "\\end{equation}\n", "\n", "We could add an unbiasedness constraint that forces the weights to sum to one. Instead, we assign the remainder of the weight (one minus the sum of weights) to the global average; therefore, if we have no informative data we will estimate with the global average of the property of interest.\n", "\n", "\\begin{equation}\n", "z^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} z(\\bf{u}_{\\alpha}) + \\left(1-\\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} \\right) \\overline{z}\n", "\\end{equation}\n", "\n", "We will make a stationarity assumption, so let's assume that we are working with residuals, $y$. 
\n", "\n", "\\begin{equation}\n", "y^{*}(\\bf{u}) = z^{*}(\\bf{u}) - \\overline{z}(\\bf{u})\n", "\\end{equation}\n", "\n", "If we substitute this form into our estimator, the estimator simplifies, since the mean of the residual is zero.\n", "\n", "\\begin{equation}\n", "y^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} y(\\bf{u}_{\\alpha})\n", "\\end{equation}\n", "\n", "while satisfying the unbiasedness constraint. \n", "\n", "#### Kriging\n", "\n", "Now the next question is: what weights should we use? \n", "\n", "We could use equal weighting, $\\lambda = \\frac{1}{n}$, and the estimator would be the average of the local data applied for the spatial estimate. This would not be very informative.\n", "\n", "We could assign weights considering the spatial context of the data and the estimate:\n", "\n", "* **spatial continuity** as quantified by the variogram (and covariance function)\n", "* **redundancy** the degree of spatial continuity between all of the available data with themselves \n", "* **closeness** the degree of spatial continuity between the available data and the estimation location\n", "\n", "The kriging approach accomplishes this, calculating the best linear unbiased weights for the local data to estimate at the unknown location. The derivation of the kriging system and the resulting linear set of equations is available in the lecture notes. Furthermore, kriging provides a measure of the accuracy of the estimate! This is the kriging estimation variance (sometimes just called the kriging variance).\n", "\n", "\\begin{equation}\n", "\\sigma^{2}_{E}(\\bf{u}) = C(0) - \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} C(\\bf{u}_0 - \\bf{u}_{\\alpha})\n", "\\end{equation}\n", "\n", "What is 'best' about this estimate? Kriging estimates are best in that they minimize the above estimation variance. 
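To make the simple kriging estimate and the estimation variance above concrete, here is a minimal NumPy sketch for a single unknown location. It is only an illustration under stated assumptions, not the GeostatsPy implementation used later in this workflow: the function names exp_cov and simple_krige_point are hypothetical, the covariance model is an isotropic exponential model with no nugget, and the sill, range, coordinates and porosity values are made up for the example.

```python
import numpy as np

def exp_cov(h, sill=1.0, vrange=300.0):
    # isotropic exponential covariance model (assumed for this sketch),
    # vrange is the practical range and C(0) equals the sill (no nugget)
    return sill * np.exp(-3.0 * np.asarray(h) / vrange)

def simple_krige_point(xy, z, xy0, zmean, sill=1.0, vrange=300.0):
    # covariance between all pairs of data (accounts for redundancy)
    C = exp_cov(np.linalg.norm(xy[:, None, :] - xy[None, :, :], axis=2), sill, vrange)
    # covariance between each datum and the unknown location (accounts for closeness)
    c0 = exp_cov(np.linalg.norm(xy - xy0, axis=1), sill, vrange)
    lam = np.linalg.solve(C, c0)                 # simple kriging weights
    est = lam @ z + (1.0 - lam.sum()) * zmean    # remainder of the weight goes to the mean
    var = sill - lam @ c0                        # C(0) minus the weighted covariances
    return est, var, lam

# three hypothetical data locations (m) and porosity values (fraction)
xy = np.array([[100.0, 100.0], [400.0, 300.0], [250.0, 600.0]])
z = np.array([0.10, 0.15, 0.12])

est, var, lam = simple_krige_point(xy, z, np.array([300.0, 300.0]), zmean=z.mean())
est_at_data, var_at_data, _ = simple_krige_point(xy, z, xy[0], zmean=z.mean())
print(est, var, lam)
print(est_at_data, var_at_data)
```

Estimating at one of the data locations should return that data value with an estimation variance of about zero, which is a quick check on the weights. Note that the variance depends only on the data configuration and the covariance model, not on the data values.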
\n", "\n", "#### Properties of Kriging\n", "\n", "Here are some important properties of kriging:\n", "\n", "* **Exact interpolator** - kriging reproduces the data values at the data locations\n", "* **Kriging variance** can be calculated before getting the sample information, as the kriging estimation variance is not dependent on the values of the data nor the kriging estimate, i.e. the kriging estimator is homoscedastic. \n", "* **Spatial context** - in addition to spatial continuity, closeness and redundancy, kriging accounts for the configuration of the data and the structural continuity of the variable being estimated.\n", "* **Scale** - kriging may be generalized to account for the support volume of the data and estimate. We will cover this later.\n", "* **Multivariate** - kriging may be generalized to account for multiple secondary data in the spatial estimate with the cokriging system. We will cover this later.\n", "* **Smoothing effect** of kriging can be forecast. We will use this to build stochastic simulations later.\n", "\n", "#### Spatial Continuity \n", "\n", "**Spatial Continuity** is the correlation between values over distance.\n", "\n", "* No spatial continuity – no correlation between values over distance, random values at each location in space regardless of separation distance.\n", "\n", "* Homogeneous phenomena have perfect spatial continuity; since all values are the same (or very similar), they are correlated. \n", "\n", "We need a statistic to quantify spatial continuity! 
A convenient method is the semivariogram.\n", "\n", "#### The Semivariogram\n", "\n", "Function of difference over distance.\n", "\n", "* One half of the expected (average) squared difference between values separated by a lag distance vector (distance and direction), $\\bf{h}$:\n", "\n", "\\begin{equation}\n", "\\gamma(\\bf{h}) = \\frac{1}{2 N(\\bf{h})} \\sum^{N(\\bf{h})}_{\\alpha=1} (z(\\bf{u}_\\alpha) - z(\\bf{u}_\\alpha + \\bf{h}))^2 \n", "\\end{equation}\n", "\n", "where $z(\\bf{u}_\\alpha)$ and $z(\\bf{u}_\\alpha + \\bf{h})$ are the spatial sample values at tail and head locations of the lag vector respectively.\n", "\n", "* Calculated over a suite of lag distances to obtain a continuous function.\n", "\n", "* The $\\frac{1}{2}$ term converts a variogram into a semivariogram, but in practice the term variogram is used instead of semivariogram.\n", "* We prefer the semivariogram because it relates directly to the covariance function, $C_x(\\bf{h})$, and the univariate variance, $\\sigma^2_x$:\n", "\n", "\\begin{equation}\n", "C_x(\\bf{h}) = \\sigma^2_x - \\gamma(\\bf{h})\n", "\\end{equation}\n", "\n", "Note the correlogram is related to the covariance function as:\n", "\n", "\\begin{equation}\n", "\\rho_x(\\bf{h}) = \\frac{C_x(\\bf{h})}{\\sigma^2_x}\n", "\\end{equation}\n", "\n", "The correlogram provides a function of the $\\bf{h}-\\bf{h}$ scatter plot correlation vs. lag offset $\\bf{h}$. \n", "\n", "\\begin{equation}\n", "-1.0 \\le \\rho_x(\\bf{h}) \\le 1.0\n", "\\end{equation}\n", "\n", "\n", "#### Accuracy Plots\n", "\n", "The accuracy plot was developed by Deutsch (1996) and described in Pyrcz and Deutsch (2014). 
\n", "\n", "* a method for checking uncertainty models\n", "\n", "* based on calculating the percentiles of withheld testing data in the estimated local uncertainty distributions, $Z(\\bf{u}_{\\alpha})$, described by CDFs, $F_Z(z,\\bf{u}_{\\alpha})$, at all testing locations $\\alpha = 1,\\ldots,n_{test}$.\n", "\n", "The accuracy plot is the proportion of data within symmetric probability intervals vs. the probability interval, $p$.\n", "\n", "* for example, 20% of withheld testing data should fall within the P40 to P60 probability interval \n", "\n", "#### Load the required libraries\n", "\n", "The following code loads the required libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import geostatspy.GSLIB as GSLIB # GSLIB utilities, visualization and wrapper\n", "import geostatspy.geostats as geostats # GSLIB methods converted to Python " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will also need some standard packages. These should have been installed with Anaconda 3." 
] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "import os # to set current working directory \n", "from sklearn.model_selection import train_test_split # train and test data split by random selection of a proportion\n", "from scipy.stats import norm # Gaussian distribution assumed for local uncertainty\n", "import sys # suppress output to screen for interactive variogram modeling\n", "import io # text buffer to trap the suppressed output\n", "import numpy as np # arrays and matrix math\n", "import pandas as pd # DataFrames\n", "import matplotlib.pyplot as plt # plotting\n", "from matplotlib.pyplot import cm # color maps\n", "from matplotlib.patches import Ellipse # plot an ellipse\n", "import math # sqrt operator \n", "from IPython.display import display # display the dashboard and plots\n", "from ipywidgets import interactive # widgets and interactivity\n", "from ipywidgets import widgets \n", "from ipywidgets import Layout\n", "from ipywidgets import Label\n", "from ipywidgets import VBox, HBox\n", "import warnings\n", "warnings.filterwarnings('ignore')\n", "cmap = plt.cm.inferno" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Simple, Simple Kriging Function\n", "\n", "Let's write a fast Python function to take data points and unknown locations and provide the:\n", "\n", "* **simple kriging estimate**\n", "\n", "* **simple kriging variance / estimation variance**\n", "\n", "* **simple kriging weights**\n", "\n", "This provides a fast method for small datasets, with fewer parameters (no search parameters) and the ability to see the simple kriging weights." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [], "source": [ "def simple_simple_krige(df,xcol,ycol,vcol,dfl,xlcol,ylcol,vario,skmean):\n", "# load the variogram\n", "    nst = vario['nst']; pmx = 9999.9\n", "    cc = np.zeros(nst); aa = np.zeros(nst); it = np.zeros(nst)\n", "    ang = np.zeros(nst); anis = np.zeros(nst)\n", "    nug = vario['nug']; sill = nug \n", "    cc[0] = vario['cc1']; sill = sill + cc[0]\n", "    it[0] = vario['it1']; ang[0] = vario['azi1']; \n", "    aa[0] = vario['hmaj1']; anis[0] = vario['hmin1']/vario['hmaj1'];\n", "    if nst == 2:\n", "        cc[1] = vario['cc2']; sill = sill + cc[1]\n", "        it[1] = vario['it2']; ang[1] = vario['azi2']; \n", "        aa[1] = vario['hmaj2']; anis[1] = vario['hmin2']/vario['hmaj2']; \n", "\n", "# set up the required matrices\n", "    rotmat, maxcov = geostats.setup_rotmat(nug,nst,it,cc,ang,pmx) \n", "    ndata = len(df); a = np.zeros([ndata,ndata]); r = np.zeros(ndata); s = np.zeros(ndata); rr = np.zeros(ndata)\n", "    nest = len(dfl)\n", "\n", "    est = np.zeros(nest); var = np.full(nest,sill); weights = np.zeros([nest,ndata])\n", "\n", "# Make and solve the kriging matrix, calculate the kriging estimate and variance \n", "    for iest in range(0,nest):\n", "        for idata in range(0,ndata):\n", "            for jdata in range(0,ndata):\n", "                a[idata,jdata] = geostats.cova2(df[xcol].values[idata],df[ycol].values[idata],df[xcol].values[jdata],df[ycol].values[jdata],\n", "                                                nst,nug,pmx,cc,aa,it,ang,anis,rotmat,maxcov)\n", "            r[idata] = 
geostats.cova2(df[xcol].values[idata],df[ycol].values[idata],dfl[xlcol].values[iest],dfl[ylcol].values[iest],\n", "                                      nst,nug,pmx,cc,aa,it,ang,anis,rotmat,maxcov)\n", "            rr[idata] = r[idata]\n", " \n", "        s = geostats.ksol_numpy(ndata,a,r) \n", "        sumw = 0.0\n", "        for idata in range(0,ndata): \n", "            sumw = sumw + s[idata]\n", "            weights[iest,idata] = s[idata]\n", "            est[iest] = est[iest] + s[idata]*df[vcol].values[idata]\n", "            var[iest] = var[iest] - s[idata]*rr[idata]\n", "        est[iest] = est[iest] + (1.0-sumw)*skmean\n", "    return est,var,weights " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set Global Parameters\n", "\n", "These impact the look and results of this demonstration." ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [], "source": [ "seed = 73073 # random number seed for train and test split and added error term\n", "cmap = plt.cm.inferno # color map\n", "vmin = 0.0; vmax = 0.20 # feature min and max for plotting\n", "error_std = 0.0 # error standard deviation\n", "bins = 20 # number of bins for the accuracy plots" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Set the working directory\n", "\n", "I always like to do this so I don't lose files and to simplify subsequent reads and writes (avoid including the full address each time). " ] }, { "cell_type": "code", "execution_count": 5, "metadata": {}, "outputs": [], "source": [ "#os.chdir(\"c:/PGE383\") # set the working directory" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Load and Visualize the Spatial Data\n", "\n", "Here's the code to load our comma-delimited data file into a pandas DataFrame object and to visualize it." ] }, { "cell_type": "code", "execution_count": 6, "metadata": {}, "outputs": [ { "data": { "image/png": "", "text/plain": [ "
" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "#df = pd.read_csv(\"sample_data_biased.csv\") # read a .csv file in as a DataFrame\n", "df = pd.read_csv(\"https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/sample_data_biased.csv\") # load the data from Dr. Pyrcz's github repository\n", "\n", "df['Porosity'] = df['Porosity']+norm.rvs(0.0,error_std,random_state = seed,size=len(df))\n", "\n", "plt.subplot(111)\n", "im = plt.scatter(df['X'],df['Y'],c=df['Porosity'],marker='o',cmap=cmap,vmin=vmin,vmax=vmax,alpha=0.8,linewidths=0.8,\n", " edgecolors=\"black\",label=\"train\")\n", "plt.title(\"Subset of the Data for the Demonstration\")\n", "plt.xlim([0,1000]); plt.ylim([0,1000])\n", "plt.xlabel('X(m)'); plt.ylabel('Y(m)'); plt.legend()\n", "cbar = plt.colorbar(im, orientation=\"vertical\", ticks=np.linspace(vmin, vmax, 10),format='%.2f')\n", "cbar.set_label('Porosity (fraction)', rotation=270, labelpad=20)\n", "\n", "plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.2, wspace=0.2, hspace=0.2)\n", "plt.show()" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Interactive Uncertainty Checking with Simple Kriging \n", "\n", "The following code includes:\n", "* dashboard with variogram model, number of data and the proportion of testing data \n", "* plots of variogram model, train and test data locations, accuracy plot and training data with testing data percentiles" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [], "source": [ "# import warnings; warnings.simplefilter('ignore')\n", "\n", "# build the dashboard\n", "style = {'description_width': 'initial'}\n", "l = widgets.Text(value=' Simple Kriging, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n", "nug = widgets.FloatSlider(min = 0, max = 1.0, value = 0.0, step = 0.1, description = 'nug',orientation='vertical',continuous_update=False,\n", " 
layout=Layout(width='50px', height='200px'))\n", "nug.style.handle_color = 'gray'\n", "it1 = widgets.Dropdown(options=['Spherical', 'Exponential', 'Gaussian'],value='Spherical',continuous_update=False,\n", "                       description='Type1:',disabled=False,layout=Layout(width='180px', height='30px'), style=style)\n", "\n", "azi = widgets.FloatSlider(min=0, max = 360, value = 45, step = 22.5, description = 'azi',continuous_update=False,\n", "                          orientation='vertical',layout=Layout(width='80px', height='200px'))\n", "azi.style.handle_color = 'gray'\n", "hmaj1 = widgets.FloatSlider(min=0.01, max = 10000.0, value = 5000.0, step = 500.0, description = 'hmaj1',continuous_update=False,\n", "                            orientation='vertical',layout=Layout(width='80px', height='200px'))\n", "hmaj1.style.handle_color = 'gray'\n", "hmin1 = widgets.FloatSlider(min = 0.01, max = 10000.0, value = 3000.0, step = 500.0, description = 'hmin1',continuous_update=False,\n", "                            orientation='vertical',layout=Layout(width='80px', height='200px'))\n", "hmin1.style.handle_color = 'gray'\n", "\n", "ptest = widgets.FloatSlider(min = 0.01, max = 0.9, value = 0.9, step = 0.1, description = 'prop test',continuous_update=False,\n", "                            orientation='vertical',layout=Layout(width='80px', height='200px'))\n", "ptest.style.handle_color = 'gray'\n", "\n", "ndata = widgets.IntSlider(min = 1, max = len(df), value = 100, step = 10, description = 'number data',continuous_update=False,\n", "                          orientation='vertical',layout=Layout(width='80px', height='200px'))\n", "ndata.style.handle_color = 'gray'\n", "\n", "uikvar = widgets.HBox([nug,it1,azi,hmaj1,hmin1,ptest,ndata],) \n", "\n", "uipars = widgets.HBox([uikvar],) \n", "uik = widgets.VBox([l,uipars],)\n", "\n", "# convenience function to convert variogram model type to an integer\n", "def convert_type(it):\n", "    if it == 'Spherical': \n", "        return 1\n", "    elif it == 'Exponential':\n", "        return 2\n", "    else: \n", "        return 3\n", "\n", "# calculate the kriging-based uncertainty distributions and match 
truth values to percentiles and produce plots\n", "def f_make_krige(nug,it1,azi,hmaj1,hmin1,ptest,ndata): \n", "    text_trap = io.StringIO()\n", "    sys.stdout = text_trap\n", "    it1 = convert_type(it1)\n", " \n", "    train, test = train_test_split(df.iloc[len(df)-ndata:,[0,1,3,]], test_size=ptest, random_state=73073)\n", " \n", "    nst = 1; xlag = 10; nlag = int(hmaj1/xlag); c1 = 1.0-nug\n", "    vario = GSLIB.make_variogram(nug,nst,it1,c1,azi,hmaj1,hmin1) # make model object\n", "    index_maj,h_maj,gam_maj,cov_maj,ro_maj = geostats.vmodel(nlag,xlag,azi,vario) # project the model in the major azimuth\n", "    index_min,h_min,gam_min,cov_min,ro_min = geostats.vmodel(nlag,xlag,azi+90.0,vario) # project the model in the minor azimuth\n", "    skmean = np.average(train['Porosity']) # calculate the input mean and sill for simple kriging\n", "    sill = np.var(train['Porosity'])\n", " \n", "    sk_est, sk_var, sk_weights = simple_simple_krige(train,'X','Y','Porosity',test,'X','Y',vario,skmean=skmean) # data, estimation locations\n", "    sk_std = np.sqrt(sk_var*sill) # rescale the standardized estimation variance by the data variance and convert to std. 
dev.\n", " \n", "    percentiles = norm.cdf(test['Porosity'],sk_est,sk_std) # calculate the percentiles of truth in the uncertainty models\n", "    test[\"Percentile\"] = percentiles\n", " \n", "    xlag = 10.0; nlag = int(hmaj1/xlag) # lags for variogram plotting\n", " \n", "    plt.subplot(221) # plot variograms\n", "    plt.plot([0,hmaj1*1.5],[1.0,1.0],color = 'black')\n", "    plt.plot(h_maj,gam_maj,color = 'black',label = 'Major ' + str(azi)) \n", "    plt.plot(h_min,gam_min,color = 'black',label = 'Minor ' + str(azi+90.0))\n", "    deltas = [22.5, 45, 67.5]; \n", "    ndelta = len(deltas); hd = np.zeros(ndelta); gamd = np.zeros(ndelta);\n", "    color=iter(cm.plasma(np.linspace(0,1,ndelta)))\n", "    for delta in deltas:\n", "        index,hd,gamd,cov,ro = geostats.vmodel(nlag,xlag,azi+delta,vario);\n", "        c=next(color)\n", "        plt.plot(hd,gamd,color = c,label = 'Azimuth ' + str(azi+delta))\n", "    plt.xlabel(r'Lag Distance $\\bf(h)$, (m)')\n", "    plt.ylabel(r'$\\gamma \\bf(h)$')\n", "    plt.title('Interpolated NSCORE Porosity Variogram Models')\n", "    plt.xlim([0,hmaj1*1.5])\n", "    plt.ylim([0,1.4])\n", "    plt.legend(loc='upper left')\n", " \n", "    plt.subplot(222) # plot the train and test data\n", "    im = plt.scatter(train['X'],train['Y'],c=train['Porosity'],marker='o',s=30,cmap=cmap,vmin=vmin,vmax=vmax,alpha=0.8,\n", "                     linewidths=2.0,edgecolors=\"black\",label=\"train\",zorder=50)\n", "    plt.scatter(test['X']+12.0,test['Y'],c=sk_est,marker='>',s=50,cmap=cmap,vmin=vmin,vmax=vmax,alpha=0.8,\n", "                linewidths=0.5,edgecolors=\"black\",label=\"test\",zorder=10)\n", "    plt.scatter(test['X']-12.0,test['Y'],c=test['Porosity'],marker='<',s=50,cmap=cmap,vmin=vmin,vmax=vmax,alpha=0.8,\n", "                linewidths=0.5,edgecolors=\"black\",label=\"truth\",zorder=10)\n", "    plt.scatter(test['X']-1,test['Y'],c='black',edgecolor='black',marker='o',s=7,cmap=cmap,vmin=vmin,vmax=vmax,alpha=0.8,\n", "                linewidths=0.5,edgecolors=\"black\",label=\"truth\",zorder=100)\n", "    plt.title(\"Training and Testing Data\")\n", "    plt.xlim([0,1000]); 
plt.ylim([0,1000])\n", "    plt.xlabel('X(m)'); plt.ylabel('Y(m)')\n", "    legend = plt.legend(loc='lower left',ncols=4,fancybox=True,facecolor='white',framealpha=1, frameon=True).set_zorder(10000)\n", "    cbar = plt.colorbar(im, orientation=\"vertical\", ticks=np.linspace(vmin, vmax, 10),format='%.2f')\n", "    cbar.set_label('Porosity (fraction)', rotation=270, labelpad=20)\n", "    plt.grid(True)\n", " \n", "    fraction_in = np.zeros(bins) # calculate and plot the accuracy plot\n", "    p_intervals = np.linspace(0.0,1.0,bins)\n", "    for i,p in enumerate(p_intervals): \n", "        test_result = (test['Percentile'] > 0.5-0.5*p) & (test['Percentile'] < 0.5+0.5*p)\n", "        fraction_in[i] = test_result.sum()/len(test)\n", "\n", "    plt.subplot(223) \n", "    plt.scatter(p_intervals,fraction_in,c='grey',edgecolor='black',marker='o',alpha=0.8,zorder=100)\n", "    plt.plot([0.0,1.0],[0.0,1.0],c='grey',zorder=100,ls='--')\n", "    plt.fill_between([0.1,1],[0,0.9],[0,0],color='red',alpha=0.2,zorder=1)\n", "    plt.fill_between([0,0.9],[0.1,1.0],[1.0,1.0],color='yellow',alpha=0.2,zorder=1)\n", "    plt.xlim([0.0,1.0]); plt.ylim([0,1.0])\n", "    plt.annotate('Accurate and Precise',xy=[0.3,0.3],rotation=40,fontsize=16)\n", "    plt.annotate('Inaccurate and Imprecise',xy=[0.4,0.1],rotation=40,fontsize=16)\n", "    plt.annotate('Accurate and Imprecise',xy=[0.2,0.5],rotation=40,fontsize=16)\n", "    plt.title('Uncertainty Model at Unknown Location')\n", "    plt.xlabel('Probability Interval'); plt.ylabel('Fraction In the Interval')\n", " \n", "    plt.subplot(224) # plot the testing percentiles with the training data\n", "    plt.scatter(train['X'],train['Y'],s=20,c='black',marker='o',cmap=cmap,vmin=vmin,vmax=vmax,alpha=0.8,linewidths=0.8,\n", "                edgecolors=\"black\",label=\"train\")\n", "    im = plt.scatter(test['X'],test['Y'],s=80.0,c=test['Percentile'],marker='^',cmap=cmap,vmin=0.0,vmax=1.0,alpha=0.8,linewidths=0.8,\n", "                     edgecolors=\"black\",label=\"test\")\n", "    plt.title(\"Cross Validation Percentiles\")\n", "    plt.xlim([0,1000]); 
plt.ylim([0,1000])\n", "    plt.xlabel('X(m)'); plt.ylabel('Y(m)'); plt.legend()\n", "    cbar = plt.colorbar(im, orientation=\"vertical\", ticks=np.linspace(0.0, 1.0, 10),format='%.2f')\n", "    cbar.set_label('Porosity Truth Percentile (fraction)', rotation=270, labelpad=20) \n", "    plt.grid(True) \n", "\n", "    plt.subplots_adjust(left=0.0, bottom=0.0, right=1.8, top=2.2, wspace=0.3, hspace=0.3)\n", "    plt.show()\n", " \n", "# connect the function to make the samples and plot to the widgets \n", "interactive_plot = widgets.interactive_output(f_make_krige, {'nug':nug, 'it1':it1, 'azi':azi, 'hmaj1':hmaj1, 'hmin1':hmin1, 'ptest':ptest, 'ndata':ndata})\n", "interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interactive Uncertainty Checking Kriging Demonstration\n", "\n", "* select the variogram model for simple kriging and observe the impact on the uncertainty model\n", "\n", "#### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n", "\n", "### The Inputs\n", "\n", "Select the variogram model, the number of data and the proportion of data withheld for testing.\n", "\n", "* **nug**: nugget effect; the remaining contribution to the sill is set to 1 - nug\n", "\n", "* **Type1**: variogram model type (spherical, exponential or Gaussian)\n", "\n", "* **azi**: azimuth of the major direction of continuity\n", "\n", "* **hmaj1 / hmin1**: range in the major and minor direction\n", "\n", "* **prop test**: proportion of data withheld for testing\n", "\n", "* **number data**: number of data used from the dataset" ] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "79db16a82acd4607b1f8bfa34dd2bac6", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Text(value=' Simple Kriging, Michael Pyrcz, Associ…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "e7ea453f379847c39804b574ff1ebd83", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output()" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(uik, interactive_plot) # display the interactive plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comments\n", "\n", "This was an interactive demonstration of uncertainty model checking with accuracy plots (Deutsch, 1996; Pyrcz and Deutsch, 2014). Much more could be done. I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n", " \n", "#### The Author:\n", "\n", "### Michael Pyrcz, Professor, The University of Texas at Austin \n", "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n", "\n", "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. 
\n", "\n", "For more about Michael check out these links:\n", "\n", "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n", "\n", "#### Want to Work Together?\n", "\n", "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n", "\n", "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n", "\n", "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n", "\n", "* I can be reached at mpyrcz@austin.utexas.edu.\n", "\n", "I'm always happy to discuss,\n", "\n", "*Michael*\n", "\n", "Michael Pyrcz, Ph.D., P.Eng. 
Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin\n", "\n", "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) \n", " " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "© Copyright daytum 2021. All Rights Reserved" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 2 }