{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<p align=\"center\">\n",
    "    <img src=\"https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true\" width=\"220\" height=\"240\" />\n",
    "\n",
    "</p>\n",
    "\n",
    "## Interactive Trend Modeling for Spatial Estimation Demonstration\n",
    "\n",
    "\n",
    "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "\n",
    "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
    "\n",
    "\n",
    "### The Interactive Workflow\n",
    "\n",
    "Here's a interactive demonstration of:\n",
    "\n",
    "* trend fitting\n",
    "* spatial estimation with a trend model vs. a stationary mean\n",
    "\n",
    "#### Trend Modeling\n",
    "\n",
    "Trend modeling is the modeling of local features, based on data and interpretation, that are deemed certain (known).  The trend is substracted from the data, leaving a residual that is modeled stochastically with uncertainty (treated as unknown).\n",
    "\n",
    "* geostatistical spatial estimation methods will make an assumption concerning stationarity\n",
    "    * in the presence of significant nonstationarity we can not rely on spatial estimates based on data + spatial continuity model\n",
    "* if we observe a trend, we should model the trend.\n",
    "    * then model the residuals stochastically\n",
    "\n",
    "Steps: \n",
    "\n",
    "1. model trend consistent with data and intepretation at all locations within the area of itnerest, integrate all available information and expertise.\n",
    "\n",
    "\\begin{equation}\n",
    "m(\\bf{u}_\\beta), \\, \\forall \\, \\beta \\in \\, AOI\n",
    "\\end{equation}\n",
    "\n",
    "2. substract trend from data at the $n$ data locations to formulate a residual at the data locations.\n",
    "\n",
    "\\begin{equation}\n",
    "y(\\bf{u}_{\\alpha}) = z(\\bf{u}_{\\alpha}) - m(\\bf{u}_{\\alpha}), \\, \\forall \\, \\alpha = 1, \\ldots, n\n",
    "\\end{equation}\n",
    "\n",
    "3. characterize the statistical behavoir of the residual $y(\\bf{u}_{\\alpha})$ integrating any information sources and interpretations.  For example the global cumulative distribution function and a measure of spatial continuity shown here.\n",
    "\n",
    "\\begin{equation}\n",
    "F_y(y) \\quad \\gamma_y(\\bf{h})\n",
    "\\end{equation}\n",
    "\n",
    "4. model the residual at all locations with $L$ multiple realizations.\n",
    "\n",
    "\\begin{equation}\n",
    "Y^\\ell(\\bf{u}_\\beta),  \\, \\forall \\, \\beta \\, \\in \\, AOI; \\, \\ell = 1, \\ldots, L\n",
    "\\end{equation}\n",
    "\n",
    "5. add the trend back in to the stochastic residual realizations to calculate the multiple realizations, $L$, of the property of interest based on the composite model of known deterministic trend, $m(\\bf{u}_\\alpha)$ and unknown stochastic residual, $y(\\bf{u}_\\alpha)$ \n",
    "\n",
    "\\begin{equation}\n",
    "Z^\\ell(\\bf{u}_\\beta) = Y^\\ell(\\bf{u}_\\beta) + m(\\bf{u}_\\beta),  \\, \\forall \\, \\beta \\in \\, AOI; \\, \\ell = 1, \\ldots, L\n",
    "\\end{equation}\n",
    "\n",
    "6. check the model, including quantification of the proportion of variance treated as known (trend) and unknown (residual).\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma^2_{Z} = \\sigma^2_{Y} + \\sigma^2_{m} + 2 \\cdot C_{Y,m}\n",
    "\\end{equation}\n",
    "\n",
    "given $C_{Y,m} \\to 0$:\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma^2_{Z} = \\sigma^2_{Y} + \\sigma^2_{m}\n",
    "\\end{equation}\n",
    "\n",
    "I can now describe the proportion of variance allocated to known and unknown components as follows:\n",
    "\n",
    "\\begin{equation}\n",
    "Prop_{Known} = \\frac{\\sigma^2_{m}}{\\sigma^2_{Y} + \\sigma^2_{m}} \\quad Prop_{Unknown} = \\frac{\\sigma^2_{Y}}{\\sigma^2_{Y} + \\sigma^2_{m}}\n",
    "\\end{equation}\n",
    "\n",
    "I provide some practical, data-driven methods for trend model, but I should indicate that:\n",
    "\n",
    "1. trend modeling is very important in reservoir modeling as it has large impact on local model accuracy and on the undertainty model\n",
    "2. trend modeling is used in almost every subsurface model, unless the data is dense enough to impose local trends\n",
    "3. trend modeling includes a high degree of expert judgement combined with the integration of various information sources\n",
    "\n",
    "We limit ourselves to simple data-driven methods, but acknowledge much more is needed. In fact, trend modeling requires a high degree of knowledge concerning local geoscience and engineering data and knowledge. \n",
    "\n",
    "Now I add some additional details on spatial estimation, kriging and variograms to support use of this interactive demonstration. See my lectures on these topics at [My YouTube Channel](www.youtube.com/GeostatsGuyLectures).\n",
    "\n",
    "#### Spatial Estimation\n",
    "\n",
    "Consider the case of making an estimate at some unsampled location, $𝑧(\\bf{u}_0)$, where $z$ is the property of interest (e.g. porosity etc.) and $𝐮_0$ is a location vector describing the unsampled location.\n",
    "\n",
    "How would you do this given data, $𝑧(\\bf{𝐮}_1)$, $𝑧(\\bf{𝐮}_2)$, and $𝑧(\\bf{𝐮}_3)$?\n",
    "\n",
    "It would be natural to use a set of linear weights to formulate the estimator given the available data.\n",
    "\n",
    "\\begin{equation}\n",
    "z^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} z(\\bf{u}_{\\alpha})\n",
    "\\end{equation}\n",
    "\n",
    "We could add an unbiasedness constraint to impose the sum of the weights equal to one.  What we will do is assign the remainder of the weight (one minus the sum of weights) to the global average; therefore, if we have no informative data we will estimate with the global average of the property of interest.\n",
    "\n",
    "\\begin{equation}\n",
    "z^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} z(\\bf{u}_{\\alpha}) + \\left(1-\\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} \\right) \\overline{z}\n",
    "\\end{equation}\n",
    "\n",
    "We will make a stationarity assumption, so let's assume that we are working with residuals, $y$. \n",
    "\n",
    "\\begin{equation}\n",
    "y^{*}(\\bf{u}) = z^{*}(\\bf{u}) - \\overline{z}(\\bf{u})\n",
    "\\end{equation}\n",
    "\n",
    "If we substitute this form into our estimator the estimator simplifies, since the mean of the residual is zero.\n",
    "\n",
    "\\begin{equation}\n",
    "y^{*}(\\bf{u}) = \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} y(\\bf{u}_{\\alpha})\n",
    "\\end{equation}\n",
    "\n",
    "while satisfying the unbaisedness constraint.  \n",
    "\n",
    "#### Kriging\n",
    "\n",
    "Now the next question is what weights should we use?  \n",
    "\n",
    "We could use equal weighting, $\\lambda = \\frac{1}{n}$, and the estimator would be the average of the local data applied for the spatial estimate. This would not be very informative.\n",
    "\n",
    "We could assign weights considering the spatial context of the data and the estimate:\n",
    "\n",
    "* **spatial continuity** as quantified by the variogram (and covariance function)\n",
    "* **redundancy** the degree of spatial continuity between all of the available data with themselves \n",
    "* **closeness** the degree of spatial continuity between the avaiable data and the estimation location\n",
    "\n",
    "The kriging approach accomplishes this, calculating the best linear unbiased weights for the local data to estimate at the unknown location.  The derivation of the kriging system and the resulting linear set of equations is available in the lecture notes.  Furthermore kriging provides a measure of the accuracy of the estimate!  This is the kriging estimation variance (sometimes just called the kriging variance).\n",
    "\n",
    "\\begin{equation}\n",
    "\\sigma^{2}_{E}(\\bf{u}) = C(0) - \\sum^{n}_{\\alpha = 1} \\lambda_{\\alpha} C(\\bf{u}_0 - \\bf{u}_{\\alpha})\n",
    "\\end{equation}\n",
    "\n",
    "What is 'best' about this estimate? Kriging estimates are best in that they minimize the above estimation variance. \n",
    "\n",
    "#### Properties of Kriging\n",
    "\n",
    "Here are some important properties of kriging:\n",
    "\n",
    "* **Exact interpolator** - kriging estimates with the data values at the data locations\n",
    "* **Kriging variance** can be calculated before getting the sample information, as the kriging estimation variance is not dependent on the values of the data nor the kriging estimate, i.e. the kriging estimator is homoscedastic. \n",
    "* **Spatial context** - kriging takes into account, furthermore to the statements on spatial continuity, closeness and redundancy we can state that kriging accounts for the configuration of the data and structural continuity of the variable being estimated.\n",
    "* **Scale** - kriging may be generalized to account for the support volume of the data and estimate. We will cover this later.\n",
    "* **Multivariate** - kriging may be generalized to account for multiple secondary data in the spatial estimate with the cokriging system. We will cover this later.\n",
    "* **Smoothing effect** of kriging can be forecast. We will use this to build stochastic simulations later.\n",
    "\n",
    "#### Spatial Continuity \n",
    "\n",
    "**Spatial Continuity** is the correlation between values over distance.\n",
    "\n",
    "* No spatial continuity – no correlation between values over distance, random values at each location in space regardless of separation distance.\n",
    "\n",
    "* Homogenous phenomenon have perfect spatial continuity, since all values as the same (or very similar) they are correlated. \n",
    "\n",
    "We need a statistic to quantify spatial continuity! A convenient method is the Semivariogram.\n",
    "\n",
    "#### The Semivariogram\n",
    "\n",
    "Function of difference over distance.\n",
    "\n",
    "* The expected (average) squared difference between values separated by a lag distance vector (distance and direction), $h$:\n",
    "\n",
    "\\begin{equation}\n",
    "\\gamma(\\bf{h}) = \\frac{1}{2 N(\\bf{h})} \\sum^{N(\\bf{h})}_{\\alpha=1} (z(\\bf{u}_\\alpha) - z(\\bf{u}_\\alpha + \\bf{h}))^2  \n",
    "\\end{equation}\n",
    "\n",
    "where $z(\\bf{u}_\\alpha)$ and $z(\\bf{u}_\\alpha + \\bf{h})$ are the spatial sample values at tail and head locations of the lag vector respectively.\n",
    "\n",
    "* Calculated over a suite of lag distances to obtain a continuous function.\n",
    "\n",
    "* the $\\frac{1}{2}$ term converts a variogram into a semivariogram, but in practice the term variogram is used instead of semivariogram.\n",
    "* We prefer the semivariogram because it relates directly to the covariance function, $C_x(\\bf{h})$ and univariate variance, $\\sigma^2_x$:\n",
    "\n",
    "\\begin{equation}\n",
    "C_x(\\bf{h}) = \\sigma^2_x - \\gamma(\\bf{h})\n",
    "\\end{equation}\n",
    "\n",
    "Note the correlogram is related to the covariance function as:\n",
    "\n",
    "\\begin{equation}\n",
    "\\rho_x(\\bf{h}) = \\frac{C_x(\\bf{h})}{\\sigma^2_x}\n",
    "\\end{equation}\n",
    "\n",
    "The correlogram provides of function of the $\\bf{h}-\\bf{h}$ scatter plot correlation vs. lag offset $\\bf{h}$.  \n",
    "\n",
    "\\begin{equation}\n",
    "-1.0 \\le \\rho_x(\\bf{h}) \\le 1.0\n",
    "\\end{equation}   \n",
    "\n",
    "#### Getting Started\n",
    "\n",
    "Here's the steps to get setup in Python with the GeostatsPy package:\n",
    "\n",
    "1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n",
    "2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. \n",
    "3. In the terminal type: pip install geostatspy. \n",
    "4. Open Jupyter and in the top block get started by copy and pasting the code block below from this Jupyter Notebook to start using the geostatspy functionality. \n",
    "\n",
    "There are examples below with these functions. You can go here to see a list of the available functions, https://git.io/fh4eX, other example workflows and source code. \n",
    "\n",
    "#### Load the required libraries\n",
    "\n",
    "The following code loads the required libraries."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "import os                                                   # to set current working directory \n",
    "import numpy as np                                          # arrays and matrix math\n",
    "import scipy                                                # hermite polynomials\n",
    "import scipy.stats as st                                    # statistical methods\n",
    "import pandas as pd                                         # DataFrames\n",
    "import matplotlib.pyplot as plt                             # for plotting\n",
    "from astropy.convolution import Gaussian1DKernel            # 1D kernel\n",
    "from astropy.convolution import convolve                    # sparse data convolution\n",
    "import geostatspy.geostats as geostats                      # variogram calculation\n",
    "import geostatspy.GSLIB as GSLIB                            # variogram models\n",
    "from ipywidgets import interactive                          # widgets and interactivity\n",
    "from ipywidgets import widgets                            \n",
    "from ipywidgets import Layout\n",
    "from ipywidgets import Label\n",
    "from ipywidgets import VBox, HBox\n",
    "import numpy.linalg as linalg                               # for linear algebra\n",
    "import scipy.spatial as sp                                  # for fast nearest neighbor search\n",
    "from numba import jit                                       # for numerical speed up\n",
    "import math                                                 # for trig functions etc."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Declare the required functions\n",
    "\n",
    "All required functions have been added to the [GeostatsPy](https://pypi.org/project/geostatspy/) package."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Set the working directory\n",
    "\n",
    "I always like to do this so I don't lose files and to simplify subsequent read and writes (avoid including the full address each time)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "#os.chdir(\"d:/PGE337\")                                      # set the working directory"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Loading Tabular Data and Set Up the Estimation Grid\n",
    "\n",
    "Here's the command to load our comma delimited data file in to a Pandas' DataFrame object. I also make exhaustive estimation grids and assign the data to the grid."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "#df = pd.read_csv(\"1D_Porosity.csv\")                        # read a .csv file in as a DataFrame  \n",
    "df = pd.read_csv(r\"https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/1D_Porosity.csv\") # load the data from Dr. Pyrcz's github repository\n",
    "df.head()                                                   # preview the data\n",
    "nc = 201; csiz = 10 / (nc-1); cmn = 0\n",
    "depth_values = np.linspace(0.0,10.0,nc)\n",
    "Npor_resid = np.zeros(len(df)); Npor_trend = np.zeros(len(df))\n",
    "Npor_values = np.full(nc,np.NaN)\n",
    "for idata, Npor in enumerate(df['Nporosity']):\n",
    "    idepth =  min(int((df['Depth'].values[idata] - cmn) / csiz), nc - 1)\n",
    "    Npor_values[idepth] = Npor  \n",
    "Y = np.full(len(df),cmn)\n",
    "df['Y'] = Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Interactive Trend Modeling Dashboard\n",
    "\n",
    "Let's build an interactive display to fit a trend model and provide diagnostic plots.\n",
    "\n",
    "* data and trend model together\n",
    "* variograms of data, trend and residual\n",
    "* variance components of trend, residual and covariance"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {},
   "outputs": [],
   "source": [
    "import warnings; warnings.simplefilter('ignore')\n",
    "\n",
    "# dashboard: number of simulation locations and variogram parameters\n",
    "style = {'description_width': 'initial'}\n",
    "l = widgets.Text(value='                                              Trend Modeling, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n",
    "nwin = widgets.IntSlider(min = 1, max = 101, value = 5, step = 2, description = 'Trend Moving Window Size in Cells',orientation='horizontal',\n",
    "                          layout=Layout(width='800px', height='40px'),continuous_update = False,style = style)\n",
    "\n",
    "uik = widgets.VBox([l,nwin],)\n",
    "\n",
    "def f_make_trend(nwin): # function to take parameters, make sample and plot\n",
    "\n",
    "    gauss_1D_kernel = Gaussian1DKernel(nwin)                        # calculate trend model\n",
    "    trend = convolve(Npor_values, gauss_1D_kernel,fill_value=0.0)\n",
    "\n",
    "    for idata, Npor in enumerate(df['Nporosity']):                  # calculate residuals at data\n",
    "        idepth =  min(int((df['Depth'].values[idata] - cmn) / csiz), nc - 1)\n",
    "        Npor_resid[idata] = df['Nporosity'].values[idata] - trend[idepth]\n",
    "        Npor_trend[idata] = trend[idepth]\n",
    "        \n",
    "    df['Nporosity_resid'] = Npor_resid\n",
    "    df['Nporosity_trend'] = Npor_trend\n",
    "    \n",
    "    var_trend = np.var(df['Nporosity_trend'].values); var_resid = np.var(df['Nporosity_resid'].values)\n",
    "    cov_tr = np.cov(df['Nporosity_trend'].values,df['Nporosity_resid'].values)[1,0]\n",
    "    var_total = var_trend + var_resid + 2* cov_tr\n",
    "    \n",
    "    ptrend = var_trend / var_total; presid = var_resid / var_total; pcov = 2*cov_tr / var_total\n",
    "     \n",
    "    plt.subplot(131)                                                # plot dataset and trend\n",
    "    plt.scatter(df['Depth'],df['Nporosity'], label='Data', color = 'red', alpha = 1.0, edgecolor = 'black')\n",
    "    plt.plot(depth_values,trend, label='Trend Model', color = 'black', alpha = 0.8)\n",
    "    for idata, Npor in enumerate(df['Nporosity']):\n",
    "        plt.plot([df['Depth'].values[idata],df['Depth'].values[idata]],[df['Nporosity_trend'].values[idata],Npor],alpha=0.4,color='blue')\n",
    "          \n",
    "    plt.title('1D Dataset and Trend Model')\n",
    "    plt.xlabel('Z (m)'); plt.ylabel('NPorosity'); plt.xlim([0,10]); plt.ylim([-2.5,2.5])\n",
    "    plt.legend(); plt.grid()\n",
    "    \n",
    "    plt.subplot(132)                                                # plot variograms\n",
    "    nlags = 20\n",
    "    lags,var,npairs = geostats.gam_1D(df['Nporosity'].values,-999,999,0.25,1,nlags,0)\n",
    "    plt.scatter(lags,var,color='red',edgecolor='black',alpha=0.4,label='Data');\n",
    "    trend_lags,trend_var,trend_npairs = geostats.gam_1D(trend,-999,999,csiz,1,nlags*10,0)\n",
    "    plt.scatter(trend_lags,trend_var,color='black',edgecolor='black',alpha=0.4,label='Trend Model');\n",
    "    resid_lags,resid_var,resid_npairs = geostats.gam_1D(df['Nporosity_resid'].values,-999,999,0.25,1,nlags,0)\n",
    "    plt.scatter(resid_lags,resid_var,color='blue',edgecolor='black',alpha=0.4,label='Residual');\n",
    "    \n",
    "    plt.plot([0,1500],[1.0,1.0],color='red',linestyle = \"--\"); plt.xlim([0,nlags*0.25]); plt.ylim([0,2.0]); plt.xlabel('Lag Distance ($h$)'); plt.ylabel('$\\gamma$')\n",
    "    plt.plot([0,1500],[var_trend,var_trend],color='black',linestyle = \"--\"); plt.xlim([0,nlags*0.25]); plt.ylim([0,2.0]); plt.xlabel('Lag Distance ($h$)'); plt.ylabel('$\\gamma$')\n",
    "    plt.plot([0,1500],[var_resid,var_resid],color='blue',linestyle = \"--\"); plt.xlim([0,nlags*0.25]); plt.ylim([0,2.0]); plt.xlabel('Lag Distance ($h$)'); plt.ylabel('$\\gamma$')\n",
    "    plt.legend(loc='upper right'); plt.title('Data, Trend and Residual Variograms')\n",
    "    \n",
    "    plt.subplot(133)                                                # plot the pie chart \n",
    "    labels = ['Trend','Residual', 'Covariance']\n",
    "    plt.pie([ptrend, presid, pcov],radius = 1, autopct='%1.1f%%', \n",
    "            colors = ['#808080','#0066CD','#FFFF00'], explode = [.02,.02,0.02],wedgeprops = {\"edgecolor\":\"k\",'linewidth': 1})\n",
    "    plt.title('Variance Decomposition of Trend and Residual')\n",
    "    plt.legend(labels,loc='lower left')\n",
    "    \n",
    "    plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.2)\n",
    "    plt.show()\n",
    "    \n",
    "# connect the function to make the samples and plot to the widgets    \n",
    "interactive_plot = widgets.interactive_output(f_make_trend, {'nwin':nwin,})\n",
    "interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive Trend Fitting Demonstration\n",
    "\n",
    "* select the width of the convolution window and evaluate the resulting trend model \n",
    "\n",
    "#### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "\n",
    "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n",
    "\n",
    "### The Inputs\n",
    "\n",
    "Select the variogram model and the data locations:\n",
    "\n",
    "* **Trend Moving Window Size in Cells**: the size of the Gaussian kernel for the convolution-based trend model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "641124f676ac4fd386300ce82bc1d16c",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Text(value='                                              Trend Modeling, Michael Pyrcz, Associ…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a3a51c25b1334bd192a33678141ebe29",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(uik, interactive_plot)                            # display the interactive plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Reload the Data and Add Gaps\n",
    "\n",
    "To explore the impact of interpolation and extrapolation in spaital estimation, let's remove data to creat gaps in data coverage."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [],
   "source": [
    "#df2 = pd.read_csv(\"1D_Porosity.csv\")                        # read a .csv file in as a DataFrame  \n",
    "df2 = pd.read_csv(r\"https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/1D_Porosity.csv\") # load the data from Dr. Pyrcz's github repository\n",
    "df2 = df2.drop(labels = range(3,9), axis = 0); df2 = df2.drop(labels = range(20,27), axis = 0)\n",
    "df2 = df2.sample(frac = 0.7,random_state = 13)\n",
    "nc = 201; csiz = 10 / (nc-1); cmn = 0\n",
    "\n",
    "depth_values2 = np.linspace(0.0,10.0,nc)\n",
    "Npor_resid2 = np.zeros(len(df2)); Npor_trend2 = np.zeros(len(df2))\n",
    "Npor_values2 = np.full(nc,np.NaN)\n",
    "for idata, Npor in enumerate(df2['Nporosity']):\n",
    "    idepth =  min(int((df2['Depth'].values[idata] - cmn) / csiz), nc - 1)\n",
    "    Npor_values2[idepth] = Npor\n",
    "    \n",
    "Y = np.full(len(df2),cmn)\n",
    "df2['Y'] = Y"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Interactive Spatial Estimation Without and With Trend Dashboard\n",
    "\n",
    "Let's build an interactive display to fit a trend model and calculate spatial estimates."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [],
   "source": [
    "from IPython.utils import io\n",
    "import warnings; warnings.simplefilter('ignore')\n",
    "\n",
    "# dashboard: number of simulation locations and variogram parameters\n",
    "style = {'description_width': 'initial'}\n",
    "l = widgets.Text(value='                                              Trend Modeling for Spatial Estimation, Michael Pyrcz, Associate Professor, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n",
    "nwin = widgets.IntSlider(min = 1, max = 101, value = 5, step = 2, description = 'Trend Moving Window Size in Cells',orientation='horizontal',\n",
    "                          layout=Layout(width='800px', height='40px'),continuous_update = False,style = style)\n",
    "\n",
    "vrange = widgets.IntSlider(min = 1, max = 100, value = 5, step = 1, description = 'Variogram Range in Cells',orientation='horizontal',\n",
    "                          layout=Layout(width='800px', height='40px'),continuous_update = False,style = style)\n",
    "\n",
    "uika = widgets.HBox([nwin,vrange],)\n",
    "uik = widgets.VBox([l,uika],)\n",
    "\n",
    "def f_make_model(nwin,vrange): # function to take parameters, make sample and plot\n",
    "    gauss_1D_kernel = Gaussian1DKernel(nwin)                        # calculate trend model\n",
    "    trend2 = convolve(Npor_values, gauss_1D_kernel)\n",
    "    for idata, Npor in enumerate(df2['Nporosity']):                  # calculate residuals at data\n",
    "        idepth =  min(int((df2['Depth'].values[idata] - cmn) / csiz), nc - 1)\n",
    "        Npor_resid2[idata] = df2['Nporosity'].values[idata] - trend2[idepth]\n",
    "        Npor_trend2[idata] = trend2[idepth]\n",
    "            \n",
    "    df2['Nporosity_resid'] = Npor_resid2\n",
    "    df2['Nporosity_trend'] = Npor_trend2\n",
    "    \n",
    "    vario = GSLIB.make_variogram(nug=0.0,nst=1,it1=1,cc1=1.0,azi1=0,hmaj1=vrange*csiz,hmin1=vrange*csiz)\n",
    "    with io.capture_output() as captured:                          # supress function print output\n",
    "        est, kvar = geostats.kb2d(df2,'Depth','Y','Nporosity_resid',-999.9,999.9,nc,cmn,csiz,1,cmn,csiz,1,1,0,10,10,0,0,vario)\n",
    "    total_est = est + trend2\n",
    "    \n",
    "    with io.capture_output() as captured:                          # supress function print output\n",
    "        no_trend_est, no_trend_kvar = geostats.kb2d(df2,'Depth','Y','Nporosity',-999.9,999.9,nc,cmn,csiz,1,cmn,csiz,1,1,0,10,10,0,0,vario)\n",
    "    \n",
    "    plt.subplot(121)\n",
    "    plt.plot(depth_values2,no_trend_est[0], color = \"Red\", alpha = 0.4, label = \"Estimate with Stationary Mean\")\n",
    "    plt.scatter(df2['Depth'],df2['Nporosity'], label='Data', color = 'red', alpha = 1.0, edgecolor = 'black')\n",
    "    plt.title('Simple Kriging with Stationary Mean')\n",
    "    plt.xlabel('Z (m)'); plt.ylabel('Standardized Porosity'); plt.xlim([0,10]); plt.ylim([-2.5,2.5])\n",
    "    plt.legend(); plt.grid()\n",
    "    \n",
    "    plt.subplot(122)\n",
    "    plt.plot(depth_values2,total_est[0],color = \"red\",alpha = 0.4, label = \"Estimate with Trend Model\")\n",
    "    plt.plot(depth_values2,trend2, color = \"black\",alpha = 1.0,linestyle = \"--\", label = \"Trend\")\n",
    "    plt.scatter(df2['Depth'],df2['Nporosity'], label='Data', color = 'red', alpha = 1.0, edgecolor = 'black')\n",
    "    plt.title('Simple Kriging with Trend Model and Residual')\n",
    "    plt.xlabel('Z (m)'); plt.ylabel('Standardized Porosity'); plt.xlim([0,10]); plt.ylim([-2.5,2.5])\n",
    "    plt.legend(); plt.grid()\n",
    "    \n",
    "    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.2, wspace=0.2, hspace=0.2)\n",
    "    plt.show()\n",
    "    \n",
    "# connect the function to make the samples and plot to the widgets    \n",
    "interactive_plot2 = widgets.interactive_output(f_make_model, {'nwin':nwin,'vrange':vrange,})\n",
    "interactive_plot2.clear_output(wait = True)               # reduce flickering by delaying plot updating"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Interactive Spatial Estimation Without and With Trend Demostration\n",
    "\n",
    "* select the width of the convolution window and variogram range and evaluate the kriging estimates with stationary mean and trend model\n",
    "\n",
    "#### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "\n",
    "##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n",
    "\n",
    "### The Inputs\n",
    "\n",
    "Select the variogram model and the data locations:\n",
    "\n",
    "* **Trend Moving Window Size in Cells**: the size of the Gaussian kernel for the convolution-based trend model.\n",
    "* **Variogram Range in Cells**: the range of the spherical variogram model in grid cells."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "05ef142aedbb4172969e1291531b65a1",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "VBox(children=(Text(value='                                              Trend Modeling for Spatial Estimation…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    },
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "b5166d6edcfa4737a26d8cda74b17f08",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "Output()"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "display(uik, interactive_plot2)                            # display the interactive plot"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "#### Comments\n",
    "\n",
    "This was an interactive demonstration of trend modeling with spatial prediction for spatial data analytics. Much more could be done, I have other demonstrations on the basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n",
    "  \n",
    "#### The Author:\n",
    "\n",
    "### Michael Pyrcz, Associate Professor, University of Texas at Austin \n",
    "*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
    "\n",
    "With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
    "\n",
    "For more about Michael check out these links:\n",
    "\n",
    "#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
    "\n",
    "#### Want to Work Together?\n",
    "\n",
    "I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
    "\n",
    "* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
    "\n",
    "* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
    "\n",
    "* I can be reached at mpyrcz@austin.utexas.edu.\n",
    "\n",
    "I'm always happy to discuss,\n",
    "\n",
    "*Michael*\n",
    "\n",
    "Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n",
    "\n",
    "#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)  \n",
    "  "
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": []
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": "Python 3 (ipykernel)",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.11.4"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}