
<p align="center">
    <img src="https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true" width="220" height="240" />

</p>

## Interactive Demonstration of Machine Learning Norms

#### Michael Pyrcz, Professor, The University of Texas at Austin 

##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)

### Norms, Vector Norms

Here is an interactive workflow demonstrating the impact of the choice of norm on a simple predictive machine learning model, linear regression. It should help you efficiently learn about training model parameters, which is central to predictive machine learning.

I have recorded a walk-through of this interactive dashboard in my [Data Science Interactive Python Demonstrations](https://www.youtube.com/playlist?list=PLG19vXLQHvSDy26fM3hDLg3VCU7U5BGZl) series on my [YouTube](https://www.youtube.com/@GeostatsGuyLectures) channel.

* Join me for a walk-through of this dashboard, [04 Data Science Interactive: Norms](TBD). I'm stoked to guide you and share observations and things to try out!

* I have a lecture on [Norms](https://www.youtube.com/watch?v=JmxGlrurQp0&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=20) as part of my [Machine Learning](https://www.youtube.com/playlist?list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf) course. Note, for all my recorded lectures the interactive and well-documented workflow demonstrations are available on my GitHub repository [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos).

* Also, I have a lecture with a summary of [Machine Learning](https://www.youtube.com/watch?v=zOUM_AnI1DQ&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=11).

* Finally, I have a lecture on predictive machine learning with [Linear Regression](https://www.youtube.com/watch?v=0fzbyhWiP84&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=21).

#### Norms

When we train a machine learning model, or other statistical model, with training data we calculate the error at all training data, $\Delta y_{\alpha} = \hat{y}_{\alpha} - y_{\alpha}, \ \alpha = 1, \ldots, n_{train}$. Yet, for the purpose of finding the best set of model parameters we need:

1. to convert the error into a measure of loss, in other words, assign a cost to error
2. to summarize the errors over all training data as a single value to support optimization

Here is our vector of errors over all of the training data:

\begin{equation}
\begin{pmatrix} \Delta y_1 \\ \Delta y_2 \\ \Delta y_3 \\ \vdots \\ \Delta y_n \end{pmatrix}
\end{equation}

First, we convert the error to a loss by raising the absolute value of each error to a power, $p$. We use the absolute value to avoid negative loss for odd $p$.

\begin{equation}
\begin{pmatrix} |\Delta y_1|^p \\ |\Delta y_2|^p \\ |\Delta y_3|^p \\ \vdots \\ |\Delta y_n|^p \end{pmatrix}
\end{equation}

where $p$ is the power. The higher the $p$, the greater the sensitivity to large errors, e.g., outliers.

Next we take this vector and summarize it as a single value, known as a **norm**, or a **vector norm**.

\begin{equation}
\begin{pmatrix} |\Delta y_1|^p \\ |\Delta y_2|^p \\ |\Delta y_3|^p \\ \vdots \\ |\Delta y_n|^p \end{pmatrix} \rightarrow ||\Delta y||_p \quad ||\Delta y||_p = \left( \sum_{\alpha=1}^{n_{train}} | \Delta y_{\alpha} |^p \right)^{\frac{1}{p}}
\end{equation}

such that the norm of our error vector maps to a single value in $[0,\infty)$.
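
As a minimal sketch of the two steps above (assign a loss with a power, then summarize as a single value), here is the calculation written out with NumPy. The function name `vector_p_norm` and the error values are hypothetical and only for illustration; they are not part of the dashboard code.

```python
import numpy as np

def vector_p_norm(errors, p):
    """Return (sum |e|^p)^(1/p) for a 1D array of errors."""
    errors = np.asarray(errors, dtype=float)
    return np.sum(np.abs(errors) ** p) ** (1.0 / p)

# hypothetical error vector, the last value acts like an outlier
delta_y = np.array([0.5, -1.0, 0.25, -4.0])

outlier_share = {}
for p in [1, 2, 3]:
    loss = np.abs(delta_y) ** p                  # step 1: error to loss
    outlier_share[p] = loss[-1] / np.sum(loss)   # fraction of total loss from the outlier
    print(f"p = {p}: norm = {vector_p_norm(delta_y, p):.3f}, "
          f"outlier share of loss = {outlier_share[p]:.2f}")
```

The share of the total loss contributed by the largest error grows with $p$, which is one way to see the increased sensitivity to outliers.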

#### Common Norms, Manhattan, Euclidean and the General p-Norm

These are the common choices for norm.

**Manhattan Norm**, known as the **L1 Norm**, $L^1$, where $p=1$ is defined as:

\begin{equation}
||\Delta y||_1 = \sum_{\alpha=1}^{n_{train}} |\Delta y_{\alpha}| 
\end{equation}

**Euclidean Norm**, known as the **L2 Norm**, $L^2$, where $p=2$ is defined as:

\begin{equation}
||\Delta y||_2 = \sqrt{ \sum_{\alpha=1}^{n_{train}} \left( \Delta y_{\alpha} \right)^2 }
\end{equation}

**p-Norm**, $L^p$, is defined as:

\begin{equation}
||\Delta y||_p = \left( \sum_{\alpha=1}^{n_{train}} | \Delta y_{\alpha} |^p \right)^{\frac{1}{p}}
\end{equation}
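
For a quick numerical check of these definitions, the sketch below uses NumPy's `np.linalg.norm` with the `ord` argument on a small, made-up error vector (the values 3 and -4 are an assumption chosen so the arithmetic is easy to verify by hand).

```python
import numpy as np

delta_y = np.array([3.0, -4.0])                 # hypothetical error vector

l1 = np.linalg.norm(delta_y, ord=1)             # |3| + |-4| = 7
l2 = np.linalg.norm(delta_y, ord=2)             # sqrt(9 + 16) = 5
l3 = np.linalg.norm(delta_y, ord=3)             # (27 + 64)^(1/3)
linf = np.linalg.norm(delta_y, ord=np.inf)      # largest absolute error = 4

print(f"L1 = {l1:.3f}, L2 = {l2:.3f}, L3 = {l3:.3f}, L-infinity = {linf:.3f}")
```

For the same error vector the norm does not increase as $p$ increases, and it approaches the largest absolute error as $p$ becomes large.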

I provide more information in my [Norms](https://www.youtube.com/watch?v=JmxGlrurQp0&list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf&index=20) lecture, but it is good to mention that there are important differences between norms, e.g., L1 norm and L2 norm.

| L1 Norm | L2 Norm |
| :-: | :-: |
| Robust | Not Very Robust |
| Unstable | Stable |
| Possibly Multiple Solutions | Always a Single Solution |
| Feature Selection Built-in | No Feature Selection |
| Sparse Outputs | Non-sparse Outputs |
| No Analytical Solution | Analytical Solution Possible |

A couple of definitions that will assist with understanding the differences above that you may observe in the interactivity:

* **Robust**: resistant to outliers. 
* **Unstable**: for small changes in the training data the trained model predictions may ‘jump’.
* **Multiple Solutions**: more than one model may share the same minimum loss, like the multiple paths of the same length between two points in a city (Manhattan distance).
* **Sparse Output**: model coefficients tend to 0.0.
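
To make robustness and multiple solutions concrete, here is a small sketch with the simplest possible model, a single constant prediction $c$ for all data. For this model the L2 loss is minimized by the mean and the L1 loss is minimized by the median. The data values and the function names `l1_loss` and `l2_loss` are hypothetical, chosen only for illustration.

```python
import numpy as np

def l1_loss(c, y):
    """Sum of absolute errors for a constant prediction c."""
    return np.sum(np.abs(y - c))

def l2_loss(c, y):
    """Sum of squared errors for a constant prediction c."""
    return np.sum((y - c) ** 2)

y = np.array([1.0, 2.0, 3.0, 4.0])              # hypothetical data
y_outlier = np.array([1.0, 2.0, 3.0, 40.0])     # same data with the last value as an outlier

# multiple solutions: any c between the two middle values gives the same L1 loss
for c in [2.0, 2.5, 3.0]:
    print(f"c = {c}: L1 loss = {l1_loss(c, y):.2f}, L2 loss = {l2_loss(c, y):.2f}")

# robustness: compare the L2 solution (mean) and an L1 solution (median)
print("no outlier:   mean =", np.mean(y), ", median =", np.median(y))
print("with outlier: mean =", np.mean(y_outlier), ", median =", np.median(y_outlier))
```

The L1 loss is flat between the two middle data values while the L2 loss has a single minimum, and the outlier moves the mean but not the median.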

#### Norm Dashboard

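To demonstrate the impact of the choice of norms I wrote a linear regression algorithm that allows us to choose any $p$-norm! Yes, you can actually use fractional norms! Note, for $p < 1$ this is no longer a true norm, since the triangle inequality does not hold, but it can still be minimized as a loss.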

* let's change the norm with and without an outlier and observe the impact on the linear regression model.
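
If you would like to try this experiment without the widgets, here is a minimal, non-interactive sketch. It is not the dashboard code; the helper names `p_loss` and `fit_line`, the noise-free data, the outlier location and the choice of the Nelder-Mead solver are all assumptions made for this illustration.

```python
import numpy as np
from scipy.optimize import minimize

def p_loss(params, x, y, p):
    """Sum of absolute errors raised to the power p for the line y = b0 + b1 * x."""
    b0, b1 = params
    return np.sum(np.abs(y - (b0 + b1 * x)) ** p)

def fit_line(x, y, p):
    """Estimate intercept and slope by numerically minimizing the p-norm loss."""
    result = minimize(p_loss, x0=[0.5, 0.5], args=(x, y, p), method="Nelder-Mead")
    return result.x

x = np.linspace(0.0, 0.5, 20)                   # hypothetical predictor values
y = x.copy()                                    # noise-free line, intercept 0 and slope 1
x_out = np.append(x, 1.0)                       # add one outlier at (1.0, 0.0)
y_out = np.append(y, 0.0)

slopes_with_outlier = {}
for p in [1.0, 2.0]:
    b0, b1 = fit_line(x, y, p)                  # fit without the outlier
    b0_out, b1_out = fit_line(x_out, y_out, p)  # fit with the outlier
    slopes_with_outlier[p] = b1_out
    print(f"L{p:.0f}: slope without outlier = {b1:.2f}, with outlier = {b1_out:.2f}")
```

With this setup the L2 slope should be pulled strongly toward the outlier, while the L1 slope should stay much closer to the slope of the original data.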

#### Getting Started

Here are the steps to get set up in Python with the GeostatsPy package:

1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). 

That's all!

#### Load the Required Libraries

We will also need some standard Python packages. These should have been installed with Anaconda 3.

In [5]:
%matplotlib inline
import sys                                              # system utilities, e.g., to suppress output to screen
import io
import numpy as np                                      # arrays and matrix math
import pandas as pd                                     # DataFrames
import matplotlib.pyplot as plt                         # plotting
from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks
from scipy.optimize import minimize                     # linear regression training by-hand with variable norms
from ipywidgets import interactive                      # widgets and interactivity
from ipywidgets import widgets                            
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox

#### Declare Functions

We have functions to perform linear regression for any norm. The code was modified from [N. Wouda](https://stackoverflow.com/questions/51883058/l1-norm-instead-of-l2-norm-for-cost-function-in-regression-model).
* I modified the original functions for a general p-norm linear regression method

In [6]:
def predict(X, params):                                 # linear prediction
    return X.dot(params)

def loss_function(params, X, y, p):                     # custom p-norm, linear regression cost function
    return np.sum(np.power(np.abs(y - predict(X, params)),p))

def add_grid():
    plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids
    plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)
    plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks   
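
As a sanity check on these functions, here is a short sketch that trains the $p = 2$ case numerically and compares it with the analytical least squares solution from `np.linalg.lstsq`. The two functions are repeated so the block runs on its own; the synthetic data, the noise level and the use of `np.random.default_rng` are assumptions for this illustration and differ from the dashboard.

```python
import numpy as np
from scipy.optimize import minimize

def predict(X, params):                                 # linear prediction
    return X.dot(params)

def loss_function(params, X, y, p):                     # p-norm linear regression cost function
    return np.sum(np.power(np.abs(y - predict(X, params)), p))

rng = np.random.default_rng(73073)                      # repeatable synthetic data
x = rng.random(30) * 0.5
y = x * x + rng.normal(loc=0.0, scale=0.05, size=30)    # parabola plus noise
X = np.column_stack([np.ones_like(x), x])               # column of 1's for the constant term

numerical = minimize(loss_function, [0.5, 0.5], args=(X, y, 2.0)).x
analytical, *_ = np.linalg.lstsq(X, y, rcond=None)      # closed-form least squares

print("numerical  [b0, b1]:", numerical)
print("analytical [b0, b1]:", analytical)
```

The two sets of coefficients should agree closely, since the by-hand loss with $p = 2$ is the usual sum of squared errors.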

#### Interactive Dashboard

This code defines the interactive dashboard, the prediction models and the plots.

In [7]:
# widgets and dashboard
l = widgets.Text(value='                                       Machine Learning Norms Demonstration, Prof. Michael Pyrcz, The University of Texas at Austin',layout=Layout(width='950px', height='30px'))

p_norm = widgets.FloatSlider(min=0.1, max = 10, value=1.0, step = 0.2, description = '$L^{p}$',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)
n = widgets.IntSlider(min=15, max = 80, value=30, step = 1, description = '$n$',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)
std = widgets.FloatSlider(min=0.0, max = .95, value=0.00, step = 0.05, description = r'Error ($\sigma$)',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)
xn = widgets.FloatSlider(min=0, max = 1.0, value=0.5, step = 0.05, description = '$X_{n+1}$',orientation='horizontal',style = {'description_width': 'initial'}, continuous_update=False)
yn = widgets.FloatSlider(min=0, max = 1.0, value=0.5, step = 0.05, description = '$Y_{n+1}$',orientation='horizontal', style = {'description_width': 'initial'}, continuous_update=False)

ui1 = widgets.HBox([p_norm,n,std],)
ui2 = widgets.HBox([xn,yn],)
ui = widgets.VBox([l,ui1,ui2],)

def run_plot(p_norm,n,std,xn,yn):                       # make data, fit models and plot

    np.random.seed(73073)                               # set random number seed for repeatable results

    X_seq = np.linspace(0,100.0,1000)                   # make data and add noise
    X_seq = np.asarray([np.ones((len(X_seq),)), X_seq]).T
    X = np.random.rand(n)*0.5
    y = X*X + 0.0 # the underlying truth model is a parabola
    y = y + np.random.normal(loc = 0.0,scale=std,size=n) # add noise
    X = np.asarray([np.ones((n,)), X]).T                 # concatenate a vector of 1's for the constant term
    
    X = np.vstack([X,[1,xn]]); y = np.append(y,yn)       # add the user specified data value to X and y
    
    x0 = [0.5,0.5]                                       # initial guess of model parameters
    p = 2.0
    output_l2 = minimize(loss_function, x0, args=(X, y, p)) # train the L2 norm linear regression model
    p = 1.0
    output_l1 = minimize(loss_function, x0, args=(X, y, p)) # train the L1 norm linear regression model
    p = 3.0
    output_l3 = minimize(loss_function, x0, args=(X, y, p)) # train the L3 norm linear regression model
    
    p = p_norm
    output_lcust = minimize(loss_function, x0, args=(X, y, p)) # train the p-norm linear regression model

    y_hat_l1 = predict(X_seq, output_l1.x)               # predict over the range of X for all models
    y_hat_l2 = predict(X_seq, output_l2.x)
    y_hat_l3 = predict(X_seq, output_l3.x)
    y_hat_lcust = predict(X_seq, output_lcust.x)
    
    plt.subplot(111)                                     # plot the results
    plt.scatter(X[:n,1],y[:n],s=40,facecolor = 'white',edgecolor = 'black',alpha = 1.0,zorder=100)
    plt.scatter(X[n,1],y[n],s=40,marker='x',color = 'black',alpha = 1.0,zorder=100)
    plt.scatter(X[n,1],y[n],s=200,marker='o',lw=1.0,edgecolor = 'black',facecolor = 'white',alpha = 1.0,zorder=98)
    plt.annotate(r'$n+1$',[X[n,1]+0.02,y[n]+0.02])
    plt.plot(X_seq[:,1],y_hat_l1,c = 'blue',lw=7,alpha = 1.0,label = "L1 Norm",zorder=10)
    plt.plot(X_seq[:,1],y_hat_l2,c = 'red',lw=7,alpha = 1.0,label = "L2 Norm",zorder=10)
    plt.plot(X_seq[:,1],y_hat_l3,c = 'green',lw=7,alpha = 1.0,label = "L3 Norm",zorder=10)
    plt.plot(X_seq[:,1],y_hat_lcust,c = 'white',lw=4,alpha = 1.0,zorder=18)
    plt.plot(X_seq[:,1],y_hat_lcust,c = 'black',lw=2,alpha = 1.0,label = "L"+ str(p_norm) + " Norm",zorder=20)
    plt.xlabel(r'Predictor Feature, $X_{1}$'); plt.ylabel(r'Response Feature, $y$'); plt.title('Linear Regression with Variable Norm')
    plt.xlim([0.0,1.0]); plt.ylim([0.0,1.0])
    plt.legend(loc = 'upper left'); add_grid()
    
    plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.2, wspace=0.9, hspace=0.3)
    plt.show()
    
# connect the function to make the samples and plot to the widgets    
interactive_plot = widgets.interactive_output(run_plot, {'p_norm':p_norm,'n':n,'std':std,'xn':xn,'yn':yn})
interactive_plot.clear_output(wait = True)               # reduce flickering by delaying plot updating

### Interactive Machine Learning Norms Demonstration

#### Michael Pyrcz, Professor, The University of Texas at Austin 

Observe the impact of choice of norm with variable number of sample data, the data noise, and an outlier! 

### The Inputs

* **p-norm** - 1 = Manhattan norm, 2 = Euclidean norm, etc., **n** - number of data, **Error** - standard deviation of the random error added to the response
* **$x_{n+1}$**, **$y_{n+1}$** - x and y location of an additional data value, potentially an outlier

In [8]:
display(ui, interactive_plot)                           # display the interactive plot


#### Comments

This was a basic demonstration of machine learning norms. I have many other demonstrations and even basics of working with DataFrames, ndarrays, univariate statistics, plotting data, declustering, data transformations and many other workflows available at https://github.com/GeostatsGuy/PythonNumericalDemos and https://github.com/GeostatsGuy/GeostatsPy. 
  
#### The Author:

### Michael Pyrcz, Professor, The University of Texas at Austin 
*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. 

For more about Michael check out these links:

#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)

#### Want to Work Together?

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! 

* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

* I can be reached at mpyrcz@austin.utexas.edu.

I'm always happy to discuss,

*Michael*

Michael Pyrcz, Ph.D., P.Eng. Professor, Cockrell School of Engineering and The Jackson School of Geosciences, The University of Texas at Austin
