\n",
"\n",
"## Interactive Demonstration of Machine Learning Norms\n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)\n",
"\n",
"### Norms, Vector Norms\n",
"\n",
"Here is an interactive workflows demonstrationing the impact of choice on norm on a simple predictive machine learning model, linear regression, that should help you efficiently learn modeling parameter training, central for predictive machine learning.\n",
"\n",
"I have recorded a walk-through of this interactive dashboard in my [Data Science Interactive Python Demonstrations](https://www.youtube.com/playlist?list=PLG19vXLQHvSDy26fM3hDLg3VCU7U5BGZl) series on my [YouTube](https://www.youtube.com/@GeostatsGuyLectures) channel.\n",
"\n",
"* Join me for walk-through of this dashboard [04 Data Science Interactive: Norms](TBD). I'm stoked to guide you and share observations and things to try out! \n",
"\n",
"* I have a lecture on [Norms](https://www.youtube.com/watch?v=JmxGlrurQp0&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=20) as part of my [Machine Learning](https://www.youtube.com/playlist?list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf) course. Note, for all my recorded lecture the interactive and well-documented workflow demononstrations are available on my GitHub repository [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos).\n",
"\n",
"* Also, I have lecture with a summary of [Machine Learning](https://www.youtube.com/watch?v=zOUM_AnI1DQ&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=11).\n",
"\n",
"* Finally, I have lecture predictive machine learning wiwth [Linear Regression](https://www.youtube.com/watch?v=0fzbyhWiP84&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=21).\n",
"\n",
"#### Norms\n",
"\n",
"When we are training a machine learning model, or other statistical model, to training data we calculate the error at all training data, $\\Delta y_{\\alpha} = \\hat{y}_{\\alpha} - y_{\\alpha}, \\ \\alpha = 1,. \\ldots ,n_{train}$. Yet, for the purpose finding the best set of model parameters we need:\n",
"\n",
"1. to convert the error into a measure of loss, in other words, assign a cost to error\n",
"2. summarize the errors over all training data as a single value to support optimization\n",
"\n",
"Here is our vector of errors over all of the training data:\n",
"\n",
"\\begin{equation}\n",
"\\begin{pmatrix} \\Delta y_1 \\\\ \\Delta y_2 \\\\ \\Delta y_3 \\\\ \\vdots \\\\ \\Delta y_n \\end{pmatrix}\n",
"\\end{equation}\n",
"\n",
"Firstly, we can convert the error to a loss by adding a power to the errors. We use absolute value to avoid negative loss for odd $p$.\n",
"\n",
"\\begin{equation}\n",
"\\begin{pmatrix} |\\Delta y_1|^p \\\\ |\\Delta y_2|^p \\\\ |\\Delta y_3|^p \\\\ \\vdots \\\\ |\\Delta y_n|^p \\end{pmatrix}\n",
"\\end{equation}\n",
"\n",
"where $p$ is the power. The higher the $p$ the greater the sensitivity to large errors, e.g., outliers.\n",
"\n",
"Next we take this vector and we summarize as a single value, known as a **norm**, or as a **vector norm**.\n",
"\n",
"\\begin{equation}\n",
"\\begin{pmatrix} |\\Delta y_1|^p \\\\ |\\Delta y_2|^p \\\\ |\\Delta y_3|^p \\\\ \\vdots \\\\ |\\Delta y_n|^p \\end{pmatrix} \\rightarrow ||\\Delta y||_p \\quad ||\\Delta y||_p = \\left( \\sum_{\\alpha=1}^{n_{train}} | \\Delta y_{\\alpha} |^p \\right)^{\\frac{1}{p}}\n",
"\\end{equation}\n",
"\n",
"such that our norm of our error vector maps to a value $\\rightarrow [0,\\infty)$.\n",
"\n",
"#### Common Norms, Manhattan, Euclidean and the General p-Norm\n",
"\n",
"These are the common choices for norm.\n",
"\n",
"**Manhattan Norm**, known as the **L1 Norm**, $L^1$, where $p=1$ is defined as:\n",
"\n",
"\\begin{equation}\n",
"||\\Delta y||_1 = \\sum_{\\alpha=1}^{n_{train}} |\\Delta y_{\\alpha}| \n",
"\\end{equation}\n",
"\n",
"**Euclidean Norm**, known as the **L2 Norm**, $L^2$, where $p=2$ is defined as:\n",
"\n",
"\\begin{equation}\n",
"||\\Delta y||_2 = \\sqrt{ \\sum_{\\alpha=1}^{n_{train}} \\left( \\Delta y_{\\alpha} \\right)^2 }\n",
"\\end{equation}\n",
"\n",
"**p-Norm**, $L^p$, is defined as:\n",
"\n",
"\\begin{equation}\n",
"||\\Delta y||_p = \\left( \\sum_{\\alpha=1}^{n_{train}} | \\Delta y_{\\alpha} |^p \\right)^{\\frac{1}{p}}\n",
"\\end{equation}\n",
"\n",
"I provide more information in my [Norms](https://www.youtube.com/watch?v=JmxGlrurQp0&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=20) lecture, but it is good to mention that there are important differences between norms, e.g., L1 norm and L2 norm.\n",
"\n",
"| L1 Norm | L2 Norm |\n",
"| :-: | :-: |\n",
"| Robust | Not Very Robust |\n",
"| Unstable | Stable |\n",
"| Possibly Mulitple Solutions | Always a Single Solution |\n",
"| Feature Selection Built-in | No Feature Selection |\n",
"| Sparse Outputs | Non-sparse Outputs |\n",
"| No Analytics Solutions | Analytical Solutions Possible |\n",
"\n",
"A couple of definitions that will assist with understanding the differences above that you may observe in the interactivity:\n",
"\n",
"* **Robust**: resistant to outliers. \n",
"* **Unstable**: for small changes in training the trained model predictions may ‘jump’\n",
"* **Multiple Solutions**: multiple paths same lengths in a city (Manhattan distance)\n",
"* **Sparse Output**: model coefficients tend to 0.0.\n",
"\n",
"#### Norm Dashboard\n",
"\n",
"To demonstrate the impact of the choice of norms I wrote a linear regression algorithm that allows us to choose any $p$-norm! Yes, you can actually use fractional norms!\n",
"\n",
"* let's change the norm with and without an outlier and observe the impact on the linear regression model.\n",
"\n",
"#### Getting Started\n",
"\n",
"Here's the steps to get setup in Python with the GeostatsPy package:\n",
"\n",
"1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n",
"\n",
"That's all!\n",
"\n",
"#### Load the Required Libraries\n",
"\n",
"We will also need some standard Python packages. These should have been installed with Anaconda 3."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [],
"source": [
"%matplotlib inline\n",
"import sys # supress output to screen for interactive variogram modeling\n",
"import io\n",
"import numpy as np # arrays and matrix math\n",
"import pandas as pd # DataFrames\n",
"import matplotlib.pyplot as plt # plotting\n",
"from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks\n",
"from scipy.optimize import minimize # linear regression training by-hand with variable norms\n",
"from ipywidgets import interactive # widgets and interactivity\n",
"from ipywidgets import widgets \n",
"from ipywidgets import Layout\n",
"from ipywidgets import Label\n",
"from ipywidgets import VBox, HBox"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Declare Functions\n",
"\n",
"We have functions to perform linear regression for any norm. The code was modified from [N. Wouda](https://stackoverflow.com/questions/51883058/l1-norm-instead-of-l2-norm-for-cost-function-in-regression-model).\n",
"* I modified the original functions for a general p-norm linear regression method"
]
},
{
"cell_type": "code",
"execution_count": 6,
"metadata": {
"scrolled": true
},
"outputs": [],
"source": [
"def predict(X, params): # linear prediction\n",
" return X.dot(params)\n",
"\n",
"def loss_function(params, X, y, p): # custom p-norm, linear regression cost function\n",
" return np.sum(np.power(np.abs(y - predict(X, params)),p))\n",
"\n",
"def add_grid():\n",
" plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids\n",
" plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n",
" plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Interactive Dashboard\n",
"\n",
"This code designed the interactive dashboard, prediction model and plots"
]
},
{
"cell_type": "code",
"execution_count": 7,
"metadata": {},
"outputs": [],
"source": [
"# widgets and dashboard\n",
"l = widgets.Text(value=' Machine Learning Norms Demonstration, Prof. Michael Pyrcz, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))\n",
"\n",
"p_norm = widgets.FloatSlider(min=0.1, max = 10, value=1.0, step = 0.2, description = '$L^{p}$',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
"n = widgets.IntSlider(min=15, max = 80, value=30, step = 1, description = '$n$',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
"std = widgets.FloatSlider(min=0.0, max = .95, value=0.00, step = 0.05, description = 'Error ($\\sigma$)',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)\n",
"xn = widgets.FloatSlider(min=0, max = 1.0, value=0.5, step = 0.05, description = '$X_{n+1}$',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)\n",
"yn = widgets.FloatSlider(min=0, max = 1.0, value=0.5, step = 0.05, description = '$Y_{n+1}$',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)\n",
"\n",
"ui1 = widgets.HBox([p_norm,n,std],)\n",
"ui2 = widgets.HBox([xn,yn],)\n",
"ui = widgets.VBox([l,ui1,ui2],)\n",
"\n",
"def run_plot(p_norm,n,std,xn,yn): # make data, fit models and plot\n",
"\n",
" np.random.seed(73073) # set random number seed for repeatable results\n",
"\n",
" X_seq = np.linspace(0,100.0,1000) # make data and add noise\n",
" X_seq = np.asarray([np.ones((len(X_seq),)), X_seq]).T\n",
" X = np.random.rand(n)*0.5\n",
" y = X*X + 0.0 # fit a parabola\n",
" y = y + np.random.normal(loc = 0.0,scale=std,size=n) # add noise\n",
" X = np.asarray([np.ones((n,)), X]).T # concatenate a vector of 1's for the constant term\n",
" \n",
" X = np.vstack([X,[1,xn]]); y = np.append(y,yn) # add the user specified data value to X and y\n",
" \n",
" x0 = [0.5,0.5] # initial guess of model parameters\n",
" p = 2.0\n",
" output_l2 = minimize(loss_function, x0, args=(X, y, p)) # train the L2 norm linear regression model\n",
" p = 1.0\n",
" output_l1 = minimize(loss_function, x0, args=(X, y, p)) # train the L1 norm linear regression model\n",
" p = 3.0\n",
" output_l3 = minimize(loss_function, x0, args=(X, y, p)) # train the L3 norm linear regression model\n",
" \n",
" p = p_norm\n",
" output_lcust = minimize(loss_function, x0, args=(X, y, p)) # train the p-norm linear regression model\n",
"\n",
" y_hat_l1 = predict(X_seq, output_l1.x) # predict over the range of X for all models\n",
" y_hat_l2 = predict(X_seq, output_l2.x)\n",
" y_hat_l3 = predict(X_seq, output_l3.x)\n",
" y_hat_lcust = predict(X_seq, output_lcust.x)\n",
" \n",
" plt.subplot(111) # plot the results\n",
" plt.scatter(X[:(n-1),1],y[:(n-1)],s=40,facecolor = 'white',edgecolor = 'black',alpha = 1.0,zorder=100)\n",
" plt.scatter(X[n,1],y[n],s=40,marker='x',color = 'black',alpha = 1.0,zorder=100)\n",
" plt.scatter(X[n,1],y[n],s=200,marker='o',lw=1.0,edgecolor = 'black',facecolor = 'white',alpha = 1.0,zorder=98)\n",
" plt.annotate(r'$n+1$',[X[n,1]+0.02,y[n]+0.02])\n",
" plt.plot(X_seq[:,1],y_hat_l1,c = 'blue',lw=7,alpha = 1.0,label = \"L1 Norm\",zorder=10)\n",
" plt.plot(X_seq[:,1],y_hat_l2,c = 'red',lw=7,alpha = 1.0,label = \"L2 Norm\",zorder=10)\n",
" plt.plot(X_seq[:,1],y_hat_l3,c = 'green',lw=7,alpha = 1.0,label = \"L3 Norm\",zorder=10)\n",
" plt.plot(X_seq[:,1],y_hat_lcust,c = 'white',lw=4,alpha = 1.0,zorder=18)\n",
" plt.plot(X_seq[:,1],y_hat_lcust,c = 'black',lw=2,alpha = 1.0,label = \"L\"+ str(p_norm) + \" Norm\",zorder=20)\n",
" plt.xlabel(r'Predictor Feature, $X_{1}$'); plt.ylabel(r'Response Feature, $y$'); plt.title('Linear Regression with Variable Norm')\n",
" plt.xlim([0.0,1.0]); plt.ylim([0.0,1.0])\n",
" plt.legend(loc = 'upper left'); add_grid()\n",
" \n",
" plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.2, wspace=0.9, hspace=0.3)\n",
" plt.show()\n",
" \n",
"# connect the function to make the samples and plot to the widgets \n",
"interactive_plot = widgets.interactive_output(run_plot, {'p_norm':p_norm,'n':n,'std':std,'xn':xn,'yn':yn})\n",
"interactive_plot.clear_output(wait = True) # reduce flickering by delaying plot updating"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interactive Machine Learning Norms Demonstation \n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"Observe the impact of choice of norm with variable number of sample data, the data noise, and an outlier! \n",
"\n",
"### The Inputs\n",
"\n",
"* **p-norm** - 1 = Manhattan norm, 2 = Euclidean norm, etc., **n** - number of data, **Error** - random error in standard deviations\n",
"* **$x_{n+1}$**, **$y_{n+1}$** - x and y location of an additional data value, potentially an outlier"
]
},
{
"cell_type": "code",
"execution_count": 8,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "bcc568fddf18475a810f87504430ac93",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Text(value=' Machine Learning Norms Demonstration, Prof. …"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "1bd5c2acf117461abba7994efec7897e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '', 'i…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(ui, interactive_plot) # display the interactive plot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Comments\n",
"\n",
"This was a basic demonstration of machining learning norms. I have many other demonstrations and even basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. \n",
" \n",
"#### The Author:\n",
"\n",
"### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
"\n",
"With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
"\n",
"For more about Michael check out these links:\n",
"\n",
"#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
"\n",
"#### Want to Work Together?\n",
"\n",
"I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
"\n",
"* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
"\n",
"* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
"\n",
"* I can be reached at mpyrcz@austin.utexas.edu.\n",
"\n",
"I'm always happy to discuss,\n",
"\n",
"*Michael*\n",
"\n",
"Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin\n",
"\n",
"#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) "
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}