{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "\n", "

\n", " \n", "\n", "

\n", "\n", "## Interactive Radia Basis Functions\n", "\n", "\n", "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n", "\n", "\n", "### The Interactive Workflow\n", "\n", "Here's a simple workflow for interpolation, spatiotemporal prediction (estimaton) with radial basis functions. We use a 'toy problem' with the ability to change:\n", "\n", "* the number of data (the maximum number of basis functions)\n", "* the kernel and kernel parameter (e.g. width of the Gaussian kernel)\n", "* polynomial trend model order\n", "* post-processing with smoothing\n", "\n", "#### Spatial Estimation\n", "\n", "Consider the case of making an estimate at some unsampled location, $𝑧(\\bf{u}_0)$, where $z$ is the property of interest (e.g. porosity etc.) and $𝐮_0$ is a location vector describing the unsampled location.\n", "\n", "How would you do this given data, $𝑧(\\bf{𝐮}_1)$, $𝑧(\\bf{𝐮}_2)$, and $𝑧(\\bf{𝐮}_3)$?\n", "\n", "It would be natural to use a set of linear weights to formulate the estimator given the available data.\n", "\n", "\\begin{equation}\n", "z^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} z(\\bf{u}_{\\alpha})\n", "\\end{equation}\n", "\n", "We could add an unbiasedness constraint to impose the sum of the weights equal to one. What we will do is assign the remainder of the weight (one minus the sum of weights) to the global average; therefore, if we have no informative data we will estimate with the global average of the property of interest.\n", "\n", "\\begin{equation}\n", "z^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} z(\\bf{u}_{\\alpha}) + \\left(1-\\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} \\right) \\overline{z}\n", "\\end{equation}\n", "\n", "We will make a stationarity assumption, so let's assume that we are working with residuals, $y$. \n", "\n", "\\begin{equation}\n", "y^{*}(\\bf{u}) = z^{*}(\\bf{u}) - \\overline{z}(\\bf{u})\n", "\\end{equation}\n", "\n", "If we substitute this form into our estimator the estimator simplifies, since the mean of the residual is zero.\n", "\n", "\\begin{equation}\n", "y^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} y(\\bf{u}_{\\alpha})\n", "\\end{equation}\n", "\n", "while satisfying the unbaisedness constraint. \n", "\n", "#### Radial Basis Function\n", "\n", "For radial basis functions the idea to fit the data with a set of additive radial functions centered at data locations.\n", "\n", "* the radial functions are defined as $\\theta [0,\\infty) \\Longrightarrow R$\n", "* the radial functions are a function of $\\theta_{\\alpha} = \\theta(||\\bf{u}-\\bf{u}_{\\alpha}||)$ for any set of nodes ${x_k}^n_{k=1}$\n", "\n", "For interpolation the radial basis function is the addition of radial functions\n", "\n", "\\begin{equation}\n", "y(\\bf{u}) = \\sum_{i=\\alpha}^{N} \\omega_{\\alpha} \\theta(||\\bf{u}-\\bf{u}_{\\alpha}||)\n", "\\end{equation}\n", "\n", "The weights are solved for with the matrix method of linear least squares.\n", "\n", "$$\n", "\\left[\\begin{array}{cc} \n", "\\theta(||\\bf{u}_1-\\bf{u}_1||) & \\theta(||\\bf{u}_2-\\bf{u}_1||) & \\ldots & \\theta(||\\bf{u}_n-\\bf{u}_{1}||)\\\\\n", "\\theta(||\\bf{u}_1-\\bf{u}_2||) & \\theta(||\\bf{u}_2-\\bf{u}_2||) & \\ldots & \\theta(||\\bf{u}_n-\\bf{u}_{2}||)\\\\\n", "\\vdots & \\vdots & \\ddots & \\vdots \\\\\n", "\\theta(||\\bf{u}_1-\\bf{u}_n||) & \\theta(||\\bf{u}_2-\\bf{u}_n||) & \\ldots & \\theta(||\\bf{u}_n-\\bf{u}_{n}||)\\\\\n", "\\end{array}\\right]\n", "\\left[\\begin{array}{cc} \n", "\\omega_1 \\\\\n", "\\omega_2 \\\\\n", "\\vdots \\\\\n", "\\omega_n \\\\\n", "\\end{array}\\right]\n", "=\n", "\\left[\\begin{array}{cc} \n", "\\hat{y}(\\bf{u}_1) \\\\\n", "\\hat{y}(\\bf{u}_2) \\\\\n", "\\vdots \\\\\n", "\\hat{y}(\\bf{u}_n) \\\\\n", "\\end{array}\\right]\n", "$$\n", "\n", "Types of radial basis functions include:\n", "\n", "1. Gaussian\n", "\n", "\\begin{equation}\n", "\\theta(r) = e^{-(\\epsilon r)^2}\n", "\\end{equation}\n", "\n", "2. Thin Plate Spline\n", "\n", "\\begin{equation}\n", "\\theta(r) = r^2 ln(r)\n", "\\end{equation}\n", "\n", "3. Inverse Quadratic\n", "\n", "\\begin{equation}\n", "\\theta(r) = \\frac{1}{1 + (\\epsilon r)^2}\n", "\\end{equation}\n", "\n", "\n", "\n", "#### Objective \n", "\n", "The objective is to remove the hurdles of subsurface modeling workflow construction by providing building blocks and sufficient examples. This is not a coding class per se, but we need the ability to 'script' workflows working with numerical methods. \n", "\n", "#### Getting Started\n", "\n", "Here's the steps to get setup in Python with the GeostatsPy package:\n", "\n", "1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n", "2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. \n", "3. In the terminal type: pip install geostatspy. \n", "4. Open Jupyter and in the top block get started by copy and pasting the code block below from this Jupyter Notebook to start using the geostatspy functionality. \n", "\n", "You will need to copy the data file to your working directory. They are available here:\n", "\n", "* Tabular data - sample_data.csv at https://git.io/fh4gm.\n", "\n", "There are exampled below with these functions. You can go here to see a list of the available functions, https://git.io/fh4eX, other example workflows and source code. \n", "\n", "#### Load the required libraries\n", "\n", "The following code loads the required libraries." ] }, { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "import geostatspy.GSLIB as GSLIB # GSLIB utilies, visualization and wrapper\n", "import geostatspy.geostats as geostats # GSLIB methods convert to Python " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We will also need some standard packages. These should have been installed with Anaconda 3." ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "scrolled": true }, "outputs": [], "source": [ "%matplotlib inline\n", "import os # to set current working directory \n", "import sys # supress output to screen for interactive variogram modeling\n", "import io\n", "import numpy as np # arrays and matrix math\n", "import pandas as pd # DataFrames\n", "import matplotlib.pyplot as plt # plotting\n", "from matplotlib.pyplot import cm # color maps\n", "from matplotlib.patches import Ellipse # plot an ellipse\n", "import math # sqrt operator\n", "from sklearn.model_selection import train_test_split # train / test DatFrame split\n", "from scipy.interpolate import RBFInterpolator # radial basis function interpolator\n", "from ipywidgets import interactive # widgets and interactivity\n", "from ipywidgets import widgets \n", "from ipywidgets import Layout\n", "from ipywidgets import Label\n", "from ipywidgets import VBox, HBox" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Setup the Spatial Dataset\n", "\n", "We load a simple spatial dataset from Dr. Pyrcz's GitHub account." ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "data": { "text/html": [ "
\n", "\n", "\n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", " \n", "
DepthNporosityPorosity
010.25-1.370.071576
110.50-2.080.057081
210.75-1.670.065451
311.00-1.160.075863
411.25-0.240.094646
\n", "
" ], "text/plain": [ " Depth Nporosity Porosity\n", "0 10.25 -1.37 0.071576\n", "1 10.50 -2.08 0.057081\n", "2 10.75 -1.67 0.065451\n", "3 11.00 -1.16 0.075863\n", "4 11.25 -0.24 0.094646" ] }, "execution_count": 3, "metadata": {}, "output_type": "execute_result" } ], "source": [ "#df2 = pd.read_csv(\"1D_Porosity.csv\") # read a .csv file in as a DataFrame \n", "df = pd.read_csv(\"https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/1D_Porosity.csv\") # load the data from Dr. Pyrcz's github repository\n", "df['Depth'] = df['Depth'] + 10.0 # shift depths\n", "df['Porosity'] = GSLIB.affine(df['Nporosity'].values,0.10,0.02) # transform porosity values original units\n", "df.head() # preview the resulting dataset " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Interactive Radial Basis Functions Method\n", "\n", "The following code includes:\n", "\n", "* dashboard with select the kernel, number of data, polynomial trend degree and smoothing and observe the model \n", "\n", "* plots of individual basis functions and the resulting model with train and test data" ] }, { "cell_type": "code", "execution_count": 17, "metadata": {}, "outputs": [], "source": [ "#import warnings; warnings.simplefilter('ignore')\n", "\n", "# interactive calculation of the sample set (control of source parametric distribution and number of samples)\n", "style = {'description_width': 'initial'}\n", "l = widgets.Text(value=' Radial Basis Functions, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n", "eps = widgets.FloatLogSlider(min = -2, max = 3, value = 1, step = 0.1, description = '$\\epsilon$',orientation='horizontal',\n", " layout=Layout(width='380px', height='30px'))\n", "eps.style.handle_color = 'gray'\n", "\n", "kernel = widgets.Dropdown(options=['Gaussian','thin_plate_spline','linear','inverse_quadratic'],value='Gaussian',\n", " description=r'$\\theta$',disabled=False,layout=Layout(width='120px', height='30px'), style=style,continuous_update=False)\n", "\n", "nd = widgets.IntSlider(min=2, max = len(df), value = 5, step = 1, description = '$n$',\n", " orientation='horizontal',layout=Layout(width='380px', height='30px'),continuous_update=False)\n", "nd.style.handle_color = 'gray'\n", "\n", "degree = widgets.IntSlider(min=-1, max = 5, value = 0, step = 1, description = '$\\circ$',\n", " orientation='horizontal',layout=Layout(width='380px', height='30px'),continuous_update=False)\n", "degree.style.handle_color = 'gray'\n", "\n", "smoothing = widgets.IntSlider(min=-1, max = 5, value = 0, step = 1, description = '$\\overline{x}$',\n", " orientation='horizontal',layout=Layout(width='380px', height='30px'),continuous_update=False)\n", "smoothing.style.handle_color = 'gray'\n", "\n", "uipars = widgets.HBox([kernel,eps,nd,degree,smoothing],) # basic widget formatting \n", "\n", "uik = widgets.VBox([l,uipars],)\n", "\n", "def make_model(eps,nd,kernel,degree,smoothing,): # function to take parameters, make sample and plot\n", " \n", " nbins = 10000; depth_min = 0.0; depth_max = 30.0\n", " depth_bins = np.linspace(depth_min, depth_max, nbins) # set the bins for prediction\n", " \n", " if nd < len(df):\n", " X_train, X_test, y_train, y_test = train_test_split(df.iloc[:,[0]], df.iloc[:,[2]], test_size=len(df)-nd, random_state=73074)\n", " else:\n", " X_train = df['Depth']\n", " y_train = df['Porosity']\n", " \n", " RBF_model = RBFInterpolator(X_train['Depth'].values.reshape(-1,1),y_train['Porosity'].values.reshape(-1,1),\n", " neighbors = None, smoothing = smoothing, kernel = kernel,epsilon = eps, degree=degree)\n", " RBF_model_no_poly = RBFInterpolator(X_train['Depth'].values.reshape(-1,1),y_train['Porosity'].values.reshape(-1,1),\n", " neighbors = None, smoothing = 0, kernel = kernel,epsilon = eps, degree=-1)\n", " \n", " pred_Nporosity = RBF_model(depth_bins.reshape(-1,1))\n", " pred_Nporosity_no_poly = RBF_model_no_poly(depth_bins.reshape(-1,1))\n", "\n", " plt.subplot(1,1,1)\n", " plt.plot(depth_bins,pred_Nporosity,'black',linewidth=2,label='RBF')\n", " plt.plot(depth_bins,pred_Nporosity_no_poly,'grey',linewidth=1,ls='--',label='RBF$_{0}$')\n", " plt.plot(X_train['Depth'].values,y_train['Porosity'].values, 'o', markerfacecolor='red', markeredgecolor='black', alpha=1.0, label = \"Train\")\n", " if nd < len(df):\n", " plt.plot(X_test['Depth'].values,y_test['Porosity'].values, 'o', markerfacecolor='blue', markeredgecolor='black', alpha=1.0, label = \"Test\")\n", " \n", " plt.xlim([depth_min,depth_max]); plt.ylim([0,0.2]); plt.xlabel('Depth (m)'); plt.ylabel('Porosity (faction)'); plt.legend(loc='upper right'); plt.grid(True)\n", " plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=0.8, wspace=0.2, hspace=0.2)\n", " plt.show()\n", " \n", " \n", "# connect the function to make the samples and plot to the widgets \n", "interactive_plot = widgets.interactive_output(make_model, {'eps':eps, 'nd':nd, 'kernel':kernel, 'degree':degree,\n", " 'smoothing':smoothing})\n", "interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Interactive Radial Basis Function \n", "\n", "* select the kernel, number of data, polynomial trend degree and smoothing and observe the model \n", "\n", "#### Michael Pyrcz, Professor, University of Texas at Austin \n", "\n", "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n", "\n", "### The Inputs\n", "\n", "Select the variogram model and the data locations:\n", "\n", "* **$\\theta$**: kernel, **$\\epsilon$**: kernel parameter, **$n$**: number of data, **$\\circ$**: polynomial trend degree (-1 = no trend), **$\\overline{x}$**: smoothing (0 = none)" ] }, { "cell_type": "code", "execution_count": 18, "metadata": {}, "outputs": [ { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "4fe1be0422a94bdb9c1f12f42a1f0639", "version_major": 2, "version_minor": 0 }, "text/plain": [ "VBox(children=(Text(value=' Radial Basis Functions, Michael Pyr…" ] }, "metadata": {}, "output_type": "display_data" }, { "data": { "application/vnd.jupyter.widget-view+json": { "model_id": "3ace045815ef491b82219bc3f1f262ed", "version_major": 2, "version_minor": 0 }, "text/plain": [ "Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '
', 'i…" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "display(uik, interactive_plot) # display the interactive plot" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "#### Comments\n", "\n", "This was an interactive demonstration of radial basis function for spatial data analytics. Much more could be done, I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n", " \n", "#### The Author:\n", "\n", "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n", "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n", "\n", "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n", "\n", "For more about Michael check out these links:\n", "\n", "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n", "\n", "#### Want to Work Together?\n", "\n", "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n", "\n", "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n", "\n", "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n", "\n", "* I can be reached at mpyrcz@austin.utexas.edu.\n", "\n", "I'm always happy to discuss,\n", "\n", "*Michael*\n", "\n", "Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n", "\n", "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) \n", " " ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.4" } }, "nbformat": 4, "nbformat_minor": 2 }