\n",
"\n",
"## Interactive Linear Model Hypeparameter Tuning, Ridge and LASSO Regression\n",
"\n",
"#### Michael J. Pyrcz, Professor, The University of Texas at Austin \n",
"\n",
"##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Here's a set of interactive dashboards to explore the use of a hyperparameter to tune linear machine learning models. The hyperparameter controls the degree of fit to the data, known as model regularization. We will visualize this for both ridge and LASSO regression. \n",
"\n",
"To assist you with background content, I have a lectures available with linked interactive codes and well-documented workflows in Python:\n",
"\n",
"* [Linear Regression](https://youtu.be/0fzbyhWiP84) \n",
"* [Ridge Regression](https://youtu.be/pMGO40yXZ5Y?si=FwAFWWSqd10SV19h) \n",
"* [LASSO Regression](https://youtu.be/cVFYhlCCI_8?si=IoAmCEKGvzlGULON)\n",
"\n",
"I also have lectures on:\n",
"\n",
"* [Machine Learning Basics](https://youtu.be/zOUM_AnI1DQ?si=L1FxPRc-n9y8Yuk6)\n",
"* [Machine Learning Model Generalization and Overfit](https://youtu.be/GGoNTMrCBbk?si=itx1p3G6PG7witpe)\n",
"* [Machine Learning Model Norms](https://youtu.be/JmxGlrurQp0?si=FPC7-Et66bWRMhFl)\n",
" \n",
"these are all part of my [Machine Learning](https://www.youtube.com/playlist?list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf) course. Note, all recorded lectures, interactive and well-documented workflow demononstrations are available on my GitHub repository [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos). \n",
"\n",
"#### Linear Regression\n",
"\n",
"Linear regression for prediction. Here are some key aspects of linear regression:\n",
"\n",
"**Parametric Model**\n",
"\n",
"* the fit model is a simple weighted linear additive model based on all the available features, $x_1,\\ldots,x_m$.\n",
"\n",
"* the general form of the multilinear regression model takes the form of $y = \\sum_{\\alpha = 1}^m b_{\\alpha} X_{\\alpha} + b_0$\n",
"\n",
"* the specific form of the linear regression model takes the form $y = b_1 x + b_0$\n",
"\n",
"**Least Squares**\n",
"\n",
"* least squares optimization is applied to select the model parameters, $b_1,\\ldots,b_m,b_0$ \n",
"\n",
"* we minize the error over the trainind data $\\sum_{i=1}^n (y_i - (\\sum_{\\alpha = 1}^m b_{\\alpha} x_{\\alpha} + b_0))^2$\n",
"\n",
"* this could be simplified as the sum of square error over the training data, $\\sum_{i=1}^n (\\Delta y_i)^2$\n",
"\n",
"**Assumptions**\n",
"\n",
"* **Error-free** - predictor variables are error free, not random variables \n",
"* **Linearity** - response is linear combination of feature(s)\n",
"* **Constant Variance** - error in response is constant over predictor(s) value\n",
"* **Independence of Error** - error in response are uncorrelated with each other\n",
"* **No multicollinearity** - none of the features are redundant with other features \n",
"\n",
"#### Ridge Regression\n",
"\n",
"With ridge regression we add a hyperparameter, $\\lambda$, to our minimization, with a L2 shrinkage penalty term, $\\sum_{j=1}^m b_{\\alpha}^2$.\n",
"\n",
"\\begin{equation}\n",
"\\sum_{i=1}^n (y_i - (\\sum_{\\alpha = 1}^m b_{\\alpha} x_{\\alpha} + b_0))^2 + \\lambda \\sum_{j=1}^m b_{\\alpha}^2\n",
"\\end{equation}\n",
"\n",
"As a result ridge regression has 2 criteria:\n",
"\n",
"* set the model parameters to minimize the error with training data\n",
"\n",
"* shrink the estimates of the slope parameters towards zero\n",
"\n",
"Note: the intercept is not affected by lambda.\n",
"\n",
"The $\\lambda$ is a hyperparameter that controls the degree of fit of the model and may be related to the model variance and bias trade-off.\n",
"\n",
"* for $\\lambda \\rightarrow 0$ the solution approaches linear regression, there is no bias (relative to a linear model fit), but the variance is high\n",
"\n",
"* as $\\lambda$ increases the model variance decreases and the model bias increases\n",
"\n",
"* for $\\lambda \\rightarrow \\infty$ the coefficients approach 0.0 and the model approaches the global mean\n",
"\n",
"#### Lasso Regression\n",
"\n",
"With the lasso we add a hyperparameter, $\\lambda$, to our minimization, with a L1 shrinkage penalty term.\n",
"\n",
"\\begin{equation}\n",
"\\sum_{i=1}^n \\left(y_i - \\left(\\sum_{\\alpha = 1}^m b_{\\alpha} x_{\\alpha} + b_0 \\right) \\right)^2 + \\lambda \\sum_{j=1}^m |b_{\\alpha}|\n",
"\\end{equation}\n",
"\n",
"As a result the lasso has 2 criteria:\n",
"\n",
"1. set the model parameters to minimize the error with training data\n",
"\n",
"2. shrink the estimates of the slope parameters towards zero. Note: the intercept is not affected by the lambda, $\\lambda$, hyperparameter.\n",
"\n",
"Note the only difference between the lasso and ridge regression is:\n",
"\n",
"* for the lasso the shrinkage term is posed as an $\\ell_1$ penalty ($\\lambda \\sum_{\\alpha=1}^m |b_{\\alpha}|$) \n",
"\n",
"* for ridge regression the shrinkage term is posed as an $\\ell_2$ penalty ($\\lambda \\sum_{\\alpha=1}^m \\left(b_{\\alpha}\\right)^2$).\n",
"\n",
"While both ridge regression and the lasso shrink the model parameters ($b_{\\alpha}, \\alpha = 1,\\ldots,m$) towards zero:\n",
"\n",
"* the lasso parameters reach zero at different rates for each predictor feature as the lambda, $\\lambda$, hyperparameter increases. \n",
"\n",
"* as a result the lasso provides a method for feature ranking and selection!\n",
"\n",
"The lambda, $\\lambda$, hyperparameter controls the degree of fit of the model and may be related to the model variance and bias trade-off.\n",
"\n",
"* for $\\lambda \\rightarrow 0$ the prediction model approaches linear regression, there is lower model bias, but the model variance is higher\n",
"\n",
"* as $\\lambda$ increases the model variance decreases and the model bias increases\n",
"\n",
"* for $\\lambda \\rightarrow \\infty$ the coefficients all become 0.0 and the model is the global mean\n",
"\n",
"\n",
"#### Other Resources\n",
"\n",
"This is a tutorial / demonstration of **Linear Regression**. In $Python$, the $SciPy$ package, specifically the $Stats$ functions (https://docs.scipy.org/doc/scipy/reference/stats.html) provide excellent tools for efficient use of statistics. \n",
"I have previously provided this example in R and posted it on GitHub:\n",
"\n",
"* [Linear Regression in R](https://github.com/GeostatsGuy/geostatsr/blob/master/linear_regression_demo_v2.R)\n",
"* [Linear Regression in R markdown](https://github.com/GeostatsGuy/geostatsr/blob/master/linear_regression_demo_v2.Rmd) with docs \n",
"* [Linear Regression in R document](https://github.com/GeostatsGuy/geostatsr/blob/master/linear_regression_demo_v2.html) knit as an HTML document\n",
"\n",
"and also in Excel:\n",
"\n",
"* [Linear Regression in Excel](https://github.com/GeostatsGuy/ExcelNumericalDemos/blob/master/Linear_Regression_Demo_v2.xlsx)\n",
"\n",
"#### Getting Started\n",
"\n",
"Here's the steps to get setup in Python with the GeostatsPy package:\n",
"\n",
"1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). \n",
"2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. \n",
"3. In the terminal type: pip install geostatspy. \n",
"4. Open Jupyter and in the top block get started by copy and pasting the code block below from this Jupyter Notebook to start using the geostatspy functionality. \n",
"\n",
"You may want to copy the data file to your working directory. They are available here:\n",
"\n",
"* Tabular data - [Density_Por_data.csv](https://raw.githubusercontent.com/GeostatsGuy/GeoDataSets/master/Density_Por_data.csv).\n",
"\n",
"or you can use the code below to load the data directly from my GitHub [GeoDataSets](https://github.com/GeostatsGuy/GeoDataSets) repository.\n",
"\n",
"#### Import Required Packages\n",
"\n",
"Let's import the GeostatsPy package."
]
},
{
"cell_type": "code",
"execution_count": 14,
"metadata": {},
"outputs": [],
"source": [
"supress_warnings = True\n",
"import math\n",
"import os # to set current working directory \n",
"import numpy as np # arrays and matrix math\n",
"import scipy.stats as st # statistical methods\n",
"import pandas as pd # DataFrames\n",
"import matplotlib.colors as colors # color bar normalization \n",
"import matplotlib.pyplot as plt # for plotting\n",
"from matplotlib.ticker import (MultipleLocator, AutoMinorLocator) # control of axes ticks\n",
"from sklearn.linear_model import LinearRegression # linear regression\n",
"from sklearn.linear_model import Ridge # ridge regression\n",
"from sklearn.linear_model import Lasso # lASSO regression\n",
"from ipywidgets import interactive # widgets and interactivity\n",
"from ipywidgets import widgets \n",
"from ipywidgets import Layout\n",
"from ipywidgets import Label\n",
"from ipywidgets import VBox, HBox\n",
"cmap = plt.cm.inferno # default color bar, no bias and friendly for color vision defeciency\n",
"plt.rc('axes', axisbelow=True) # grid behind plotting elements\n",
"seed = 73073 # random number seed\n",
"if supress_warnings == True:\n",
" import warnings # supress any warnings for this demonstration\n",
" warnings.filterwarnings('ignore') "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs. \n",
"\n",
"#### Declare functions\n",
"\n",
"Let's define a couple of functions to streamline plotting correlation matrices and visualization of a decision tree regression model. "
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [],
"source": [
"def add_grid():\n",
" plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids\n",
" plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)\n",
" plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Set the Working Directory\n",
"\n",
"I always like to do this so I don't lose files and to simplify subsequent read and writes (avoid including the full address each time). Also, in this case make sure to place the required (see below) data file in this working directory. "
]
},
{
"cell_type": "code",
"execution_count": 3,
"metadata": {},
"outputs": [],
"source": [
"#os.chdir(\"C:\\PGE337\") # set the working directory"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Make a Dataset\n",
"\n",
"Let's make a simple dataset with the following characteristics:\n",
"\n",
"* response feature, $Y$, is a linear combination of predictor features, $X_1$ and $X_2$\n",
"* intercept term, $b_0$, is 0.0 and can be neglected, for ease of model parameter space visualization (2D only)"
]
},
{
"cell_type": "code",
"execution_count": 4,
"metadata": {
"scrolled": false
},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"n = 50; seed = 73073\n",
"mean_x1 = 10.0; mean_x2 = 10.0; var_x1 = 4.0; var_x2 = 4.0; cov12 = 2.5\n",
"db1 = 3.0; db2 = 2.0; eps_std = 0.5\n",
"cov = np.array([[var_x1,cov12],[cov12,var_x2]])\n",
"np.random.seed(seed = seed)\n",
"X = np.random.multivariate_normal(mean = [mean_x1,mean_x2],cov = cov, size = n)\n",
"X1, X2 = np.split(X,indices_or_sections = 2,axis=1)\n",
"# y = b1*X1 + b2*X2 + np.random.normal(loc = 0.0,scale = eps_std,size = n)\n",
"y = np.reshape(db1*X1 + db2*X2,newshape=[n]) + np.reshape(np.random.normal(loc = 0.0,scale = eps_std,size = n),newshape=[n])\n",
"\n",
"sc = plt.scatter(X1,X2,c=y,edgecolor='black',cmap = plt.cm.inferno,vmin=20.0,vmax=60.0)\n",
"plt.xlim([5,15]); plt.ylim([5,15]); add_grid(); plt.xlabel('$X_1$'); plt.ylabel('$X_2$'); plt.title('Synthetic Linear Data')\n",
"cbar = plt.colorbar(sc, orientation=\"vertical\", ticks=np.linspace(20, 60, 10))\n",
"cbar.set_label('$Y$', rotation=270, labelpad=20)\n",
"\n",
"plt.subplots_adjust(left=0.0, bottom=0.0, right=1.0, top=1.0, wspace=0.2, hspace=0.2); plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Calculate the Solution Space\n",
"\n",
"Sample of mesh of possible linear regression model parameters and calculate the MSE over all data.\n",
"\n",
"* Note we are not attemption train and test split for model hyperparameter tuning. While we could add a hyper parameter through regularization with LASSO or ridge regression, we just want to visualize this problem."
]
},
{
"cell_type": "code",
"execution_count": 5,
"metadata": {},
"outputs": [
{
"data": {
"image/png": "\n",
"text/plain": [
""
]
},
"metadata": {
"needs_background": "light"
},
"output_type": "display_data"
}
],
"source": [
"clam = 0.0 \n",
"\n",
"linear = LinearRegression(fit_intercept = False).fit(X, y) # fit a linear model\n",
"linear_b1, linear_b2 = linear.coef_\n",
"\n",
"ridge = Ridge(alpha=clam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
"ridge_b1, ridge_b2 = ridge.coef_\n",
"\n",
"L = 10\n",
"mult_b1 = np.zeros(L); mult_b2 = np.zeros(L); mult_lam = np.zeros(L); \n",
"for i,lam in enumerate(np.logspace(-4,6,L)):\n",
" ridge = Ridge(alpha=lam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" mult_b1[i], mult_b2[i] = ridge.coef_; mult_lam[i] = lam\n",
" \n",
"vmin = 0.0; vmax = 100.0; #vmax = np.max(MSE_mat) # set min and max MSE for visualization\n",
"norm = colors.Normalize(vmin=vmin, vmax=vmax)\n",
"nstep = 200; RSS_mat = np.zeros([nstep,nstep]) # set up the madel parameter mesh\n",
"reg_mat = np.zeros([nstep,nstep]); loss_mat = np.zeros([nstep,nstep]) \n",
"sb1 = -7.0; eb1 = 7.0; stepb1 = (eb1-sb1)/nstep; \n",
"sb2 = -7.0; eb2 = 7.0; stepb2 = (eb2-sb2)/nstep;\n",
"b1_vector = np.arange(sb1, eb1, stepb1); b2_vector = np.arange(eb2, sb2, -1*stepb2)\n",
"b1_mat, b2_mat = np.meshgrid(b1_vector, b2_vector)\n",
"\n",
"for ib1, b1 in enumerate(b1_vector): # calculate the MSE for all possible model parameters\n",
" for ib2, b2 in enumerate(b2_vector):\n",
" y_hat = np.reshape(b1 * X1 + b2 * X2,newshape=[n])\n",
" RSS_mat[ib2,ib1] =((y - y_hat) ** 2).sum()\n",
" reg_mat[ib2,ib1] = b1*b1 + b2*b2\n",
" loss_mat[ib2,ib1] = RSS_mat[ib2,ib1]+clam*reg_mat[ib2,ib1]\n",
"\n",
"plt.subplot(131)\n",
"vmin = np.percentile(RSS_mat.flatten(),q=1); vmax = np.percentile(RSS_mat.flatten(),q=40)\n",
"lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
"im = plt.imshow(RSS_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
"plt.contour(b1_mat,b2_mat,RSS_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(3,0.3,5),\n",
" colors='black',alpha=0.7,zorder=10)\n",
"plt.scatter([linear_b1],[linear_b2],color='black',marker='x',s=30,zorder=100)\n",
"plt.title('Ridge RSS L2 Only and Optimal Model Parameters')\n",
"plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
"\n",
"plt.subplot(132)\n",
"vmin = np.percentile(reg_mat.flatten(),q=1); vmax = np.percentile(reg_mat.flatten(),q=40)\n",
"lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
"im = plt.imshow(reg_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
"plt.contour(b1_mat,b2_mat,reg_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(3,0.3,5),\n",
" colors='black',alpha=0.7,zorder=10)\n",
"plt.scatter(0.0,0.0,color='black',marker='x',s=30,zorder=100)\n",
"plt.title('Ridge Shrinkage L2 Only and Optimal Model Parameters')\n",
"plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
"\n",
"plt.subplot(133)\n",
"vmin = np.percentile(loss_mat.flatten(),q=1); vmax = np.percentile(loss_mat.flatten(),q=40)\n",
"lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
"im = plt.imshow(loss_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
"plt.contour(b1_mat,b2_mat,loss_mat,levels=np.logspace(lvmin,lvmax,5),colors='black',alpha=1.0,\n",
" linewidths=np.linspace(2,0.3,5),zorder=10)\n",
"plt.scatter([linear_b1],[linear_b2],color='grey',marker='x',s=30,zorder=100)\n",
"plt.plot(mult_b1,mult_b2,color='grey',lw=1,ls='--')\n",
"plt.scatter(0.0,0.0,color='grey',marker='x',s=30,zorder=100)\n",
"\n",
"plt.scatter([ridge_b1],[ridge_b2],color='black',marker='x',s=30,zorder=100)\n",
"plt.title('Ridge RSS L2 + Shrinkage L2 and Regularized Model Parameters')\n",
"plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
"\n",
"plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.1); plt.show()"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Interactive Dashboard - Ridge Regression Loss Surface\n",
"\n",
"Let's start by visualizing the change in loss surface for ridge regression as we change the hyperparameter, $\\lambda$.\n",
"\n",
"* we use on 2 predictor feature and assume the intercept is 0.0 ($b_0 = 0.0$) so we can conveniently visualize the entire model parameter space in 2D."
]
},
{
"cell_type": "code",
"execution_count": 21,
"metadata": {},
"outputs": [],
"source": [
"l = widgets.Text(value=' Ridge Regression Regularization Demo, Michael Pyrcz, Professor, The University of Texas at Austin',\n",
" layout=Layout(width='950px', height='30px'))\n",
"# P_happening_label = widgets.Text(value='Probability of Happening',layout=Layout(width='50px',height='30px',line-size='0 px'))\n",
"clam = widgets.FloatLogSlider(min=1, max = 6, value=0, step = 0.25,description = r'$\\lambda$',orientation='horizontal', \n",
" style = {'description_width':'initial','button_color':'green'},layout=Layout(width='900px',height='40px'),\n",
" continuous_update=False,readout_format='.0f')\n",
"\n",
"ui_summary = widgets.HBox([clam],)\n",
"ui_summary1 = widgets.VBox([l,ui_summary],)\n",
"\n",
"def run_plot_summary(clam): \n",
" \n",
" linear = LinearRegression(fit_intercept = False).fit(X, y) # fit a linear model\n",
" linear_b1, linear_b2 = linear.coef_\n",
" \n",
" ridge = Ridge(alpha=clam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" ridge_b1, ridge_b2 = ridge.coef_\n",
" \n",
" L = 10\n",
" mult_b1 = np.zeros(L); mult_b2 = np.zeros(L); mult_lam = np.zeros(L); \n",
" for i,lam in enumerate(np.logspace(-4,6,L)):\n",
" ridge = Ridge(alpha=lam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" mult_b1[i], mult_b2[i] = ridge.coef_; mult_lam[i] = lam\n",
" \n",
" vmin = 0.0; vmax = 100.0; #vmax = np.max(MSE_mat) # set min and max MSE for visualization\n",
" norm = colors.Normalize(vmin=vmin, vmax=vmax)\n",
" nstep = 200; RSS_mat = np.zeros([nstep,nstep]) # set up the madel parameter mesh\n",
" reg_mat = np.zeros([nstep,nstep]); loss_mat = np.zeros([nstep,nstep]) \n",
" sb1 = -7.0; eb1 = 7.0; stepb1 = (eb1-sb1)/nstep; \n",
" sb2 = -7.0; eb2 = 7.0; stepb2 = (eb2-sb2)/nstep;\n",
" b1_vector = np.arange(sb1, eb1, stepb1); b2_vector = np.arange(eb2, sb2, -1*stepb2)\n",
" b1_mat, b2_mat = np.meshgrid(b1_vector, b2_vector)\n",
" \n",
" for ib1, b1 in enumerate(b1_vector): # calculate the MSE for all possible model parameters\n",
" for ib2, b2 in enumerate(b2_vector):\n",
" y_hat = np.reshape(b1 * X1 + b2 * X2,newshape=[n])\n",
" RSS_mat[ib2,ib1] =((y - y_hat) ** 2).sum()\n",
" reg_mat[ib2,ib1] = b1*b1 + b2*b2\n",
" loss_mat[ib2,ib1] = RSS_mat[ib2,ib1] + clam*reg_mat[ib2,ib1]\n",
" \n",
" plt.subplot(131)\n",
" vmin = np.percentile(RSS_mat.flatten(),q=1); vmax = np.percentile(RSS_mat.flatten(),q=40)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" im = plt.imshow(RSS_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
" plt.contour(b1_mat,b2_mat,RSS_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(3,0.3,5),\n",
" colors='black',alpha=0.7,zorder=10)\n",
" plt.scatter([linear_b1],[linear_b2],color='black',marker='x',s=30,zorder=100)\n",
" plt.title('Ridge RSS L2 Only and Optimal Model Parameters')\n",
" plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
" \n",
" plt.subplot(132)\n",
" vmin = np.percentile(reg_mat.flatten(),q=1); vmax = np.percentile(reg_mat.flatten(),q=40)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" im = plt.imshow(reg_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
" plt.contour(b1_mat,b2_mat,reg_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(3,0.3,5),\n",
" colors='black',alpha=0.7,zorder=10)\n",
" plt.scatter(0.0,0.0,color='black',marker='x',s=30,zorder=100)\n",
" plt.title('Ridge Shrinkage L2 Only and Optimal Model Parameters')\n",
" plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
" \n",
" plt.subplot(133)\n",
" vmin = np.percentile(loss_mat.flatten(),q=1); vmax = np.percentile(loss_mat.flatten(),q=40)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" im = plt.imshow(loss_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
" plt.contour(b1_mat,b2_mat,loss_mat,levels=np.logspace(lvmin,lvmax,5),colors='black',alpha=1.0,\n",
" linewidths=np.linspace(2,0.3,5),zorder=10)\n",
" plt.scatter([linear_b1],[linear_b2],color='grey',marker='x',s=30,zorder=100)\n",
" plt.plot(mult_b1,mult_b2,color='grey',lw=1,ls='--')\n",
" plt.scatter(0.0,0.0,color='grey',marker='x',s=30,zorder=100)\n",
" \n",
" plt.scatter([ridge_b1],[ridge_b2],color='black',marker='x',s=30,zorder=100)\n",
" plt.title('Ridge RSS L2 + Shrinkage L2 and Regularized Model Parameters')\n",
" plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
" \n",
" plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.1); plt.show()\n",
" \n",
"interactive_plot_summary = widgets.interactive_output(run_plot_summary, {'clam':clam,})\n",
"interactive_plot_summary.clear_output(wait = True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interactive Ridge Regression Loss Function with Variable $\\lambda$ Demonstration\n",
"\n",
"* change the $\\lambda$ hyperparameter and watch the loss surface change and the solution shift from linear regression to the global mean, all model parameters, $b_{\\alpha} = 0.0, \\alpha = 1,\\ldots,m$. This demonstration based on 2 predictor features dataset.\n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)"
]
},
{
"cell_type": "code",
"execution_count": 22,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "03458204143a4e8588512e0814d8099c",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Text(value=' Ridge Regression Regularization Demo, Michae…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "766fcd6b21b5496383a1c8c6155d0ef9",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(ui_summary1, interactive_plot_summary) # display the interactive plot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Interactive Dashboard - LASSO Regression Loss Surface\n",
"\n",
"Let's start by visualizing the change in loss surface for LASSO regression as we change the hyperparameter, $\\lambda$.\n",
"\n",
"* we use on 2 predictor feature and assume the intercept is 0.0 ($b_0 = 0.0$) so we can conveniently visualize the entire model parameter space in 2D."
]
},
{
"cell_type": "code",
"execution_count": 23,
"metadata": {},
"outputs": [],
"source": [
"l = widgets.Text(value=' LASSO Regression Regularization Demo, Michael Pyrcz, Professor, The University of Texas at Austin',\n",
" layout=Layout(width='950px', height='30px'))\n",
"# P_happening_label = widgets.Text(value='Probability of Happening',layout=Layout(width='50px',height='30px',line-size='0 px'))\n",
"clam = widgets.FloatSlider(min=10, max = 3000, value=0, step = 35,description = r'$\\lambda$',orientation='horizontal', \n",
" style = {'description_width':'initial','button_color':'green'},layout=Layout(width='900px',height='40px'),\n",
" continuous_update=False,readout_format='.0f')\n",
"\n",
"ui_summary = widgets.HBox([clam],)\n",
"ui_summary3 = widgets.VBox([l,ui_summary],)\n",
"\n",
"def run_plot_summary3(clam): \n",
" \n",
" linear = LinearRegression(fit_intercept = False).fit(X, y) # fit a linear model\n",
" linear_b1, linear_b2 = linear.coef_\n",
" \n",
" lasso = Lasso(alpha=clam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" lasso_b1, lasso_b2 = lasso.coef_\n",
" \n",
" L = 10\n",
" mult_b1 = np.zeros(L); mult_b2 = np.zeros(L); mult_lam = np.zeros(L); \n",
" for i,lam in enumerate(np.linspace(10,600,L)):\n",
" lasso = Lasso(alpha=lam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" mult_b1[i], mult_b2[i] = lasso.coef_; mult_lam[i] = lam\n",
" \n",
" vmin = 0.0; vmax = 100.0; #vmax = np.max(MSE_mat) # set min and max MSE for visualization\n",
" norm = colors.Normalize(vmin=vmin, vmax=vmax)\n",
" nstep = 200; RSS_mat = np.zeros([nstep,nstep]) # set up the madel parameter mesh\n",
" reg_mat = np.zeros([nstep,nstep]); loss_mat = np.zeros([nstep,nstep]) \n",
" sb1 = -7.0; eb1 = 7.0; stepb1 = (eb1-sb1)/nstep; \n",
" sb2 = -7.0; eb2 = 7.0; stepb2 = (eb2-sb2)/nstep;\n",
" b1_vector = np.arange(sb1, eb1, stepb1); b2_vector = np.arange(eb2, sb2, -1*stepb2)\n",
" b1_mat, b2_mat = np.meshgrid(b1_vector, b2_vector)\n",
" \n",
" for ib1, b1 in enumerate(b1_vector): # calculate the MSE for all possible model parameters\n",
" for ib2, b2 in enumerate(b2_vector):\n",
" y_hat = np.reshape(b1 * X1 + b2 * X2,newshape=[n])\n",
" RSS_mat[ib2,ib1] =((y - y_hat) ** 2).sum()\n",
" reg_mat[ib2,ib1] = abs(b1) + abs(b2)\n",
" loss_mat[ib2,ib1] = (1/len(y))*RSS_mat[ib2,ib1] + 2*clam*reg_mat[ib2,ib1]\n",
" \n",
" plt.subplot(131)\n",
" vmin = np.percentile(RSS_mat.flatten(),q=1); vmax = np.percentile(RSS_mat.flatten(),q=40)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" im = plt.imshow(RSS_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
" plt.contour(b1_mat,b2_mat,RSS_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(3,0.3,5),\n",
" colors='black',alpha=0.7,zorder=10)\n",
" plt.scatter([linear_b1],[linear_b2],color='black',marker='x',s=30,zorder=100)\n",
" plt.title('Ridge RSS L2 Only and Optimal Model Parameters')\n",
" plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
" \n",
" plt.subplot(132)\n",
" vmin = np.percentile(reg_mat.flatten(),q=1); vmax = np.percentile(reg_mat.flatten(),q=40)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" im = plt.imshow(reg_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
" plt.contour(b1_mat,b2_mat,reg_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(3,0.3,5),\n",
" colors='black',alpha=0.7,zorder=10)\n",
" plt.scatter(0.0,0.0,color='black',marker='x',s=30,zorder=100)\n",
" plt.title('Ridge Shrinkage L2 Only and Optimal Model Parameters')\n",
" plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
" \n",
" plt.subplot(133)\n",
" vmin = np.percentile(loss_mat.flatten(),q=1); vmax = np.percentile(loss_mat.flatten(),q=40)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" im = plt.imshow(loss_mat,interpolation = None,extent = [sb1,eb1,sb2,eb2],alpha=0.1,vmin=vmin,vmax=vmax,\n",
" cmap = plt.cm.inferno_r,zorder=1)\n",
" plt.contour(b1_mat,b2_mat,loss_mat,levels=np.logspace(lvmin,lvmax,5),colors='black',alpha=1.0,\n",
" linewidths=np.linspace(2,0.3,5),zorder=10)\n",
" plt.scatter([linear_b1],[linear_b2],color='grey',marker='x',s=30,zorder=100)\n",
" plt.plot(mult_b1,mult_b2,color='grey',lw=1,ls='--')\n",
" plt.scatter(0.0,0.0,color='grey',marker='x',s=30,zorder=100)\n",
" \n",
" plt.scatter([lasso_b1],[lasso_b2],color='black',marker='x',s=30,zorder=100)\n",
" plt.title('Ridge RSS L2 + Shrinkage L2 and Regularized Model Parameters')\n",
" plt.xlabel('b1'); plt.ylabel('b2'); add_grid() \n",
" \n",
" plt.subplots_adjust(left=0.0, bottom=0.0, right=3.0, top=1.2, wspace=0.2, hspace=0.1); plt.show()\n",
" \n",
"interactive_plot_summary3 = widgets.interactive_output(run_plot_summary3, {'clam':clam,})\n",
"interactive_plot_summary3.clear_output(wait = True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interactive LASSO Regression Loss Function with Variable $\\lambda$ Demonstration\n",
"\n",
"* change the $\\lambda$ hyperparameter and watch the loss surface change and the solution shift from linear regression to the global mean, all model parameters, $b_{\\alpha} = 0.0, \\alpha = 1,\\ldots,m$. This demonstration based on 2 predictor features dataset.\n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)"
]
},
{
"cell_type": "code",
"execution_count": 24,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "f4d6eae7a0fa482db6de110e362de145",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Text(value=' LASSO Regression Regularization Demo, Michae…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "612c0a6f9d4442368f66607b0884657e",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output(outputs=({'output_type': 'display_data', 'data': {'text/plain': '', 'i…"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(ui_summary3, interactive_plot_summary3) # display the interactive plot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Obeserve the shift from OLS only to regularizaion only loss surfaces as the hyperparameter, $\\lambda$, increases.\n",
"\n",
"#### Interactive Dashboard - Ridge vs. LASSO OLS vs. Regularization\n",
"\n",
"Let's start by visualizing the change in OLS vs. regularization components of the loss function for ridge regression and LASSO regression as we change the hyperparameter, $\\lambda$.\n",
"\n",
"* we use on 2 predictor feature and assume the intercept is 0.0 ($b_0 = 0.0$) so we can conveniently visualize the entire model parameter space in 2D."
]
},
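{
"cell_type": "markdown",
"metadata": {},
"source": [
"For reference, the three surfaces evaluated over the $(b_1, b_2)$ mesh in the code below can be written out as follows. This is a sketch read directly from the mesh calculation, with $x_{1,i}$, $x_{2,i}$ and $y_i$ standing for the `X1`, `X2` and `y` samples (notation assumed here for illustration) and $b_0 = 0.0$:\n",
"\n",
"$$RSS(b_1,b_2) = \\sum_{i=1}^{n} \\left( y_i - b_1 x_{1,i} - b_2 x_{2,i} \\right)^2$$\n",
"\n",
"$$L_{ridge}(b_1,b_2) = RSS(b_1,b_2) + \\lambda \\left( b_1^2 + b_2^2 \\right)$$\n",
"\n",
"$$L_{LASSO}(b_1,b_2) = RSS(b_1,b_2) + \\lambda \\left( |b_1| + |b_2| \\right)$$\n",
"\n",
"One caveat when comparing magnitudes: the plotted solutions come from scikit-learn, whose documentation states the `Lasso` objective with the $RSS$ term scaled by $\\frac{1}{2n}$, while `Ridge` uses the unscaled $RSS$. The same numeric $\\lambda$ therefore weights the two penalties differently, so the ridge and LASSO values are not directly comparable."
]
},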
{
"cell_type": "code",
"execution_count": 25,
"metadata": {},
"outputs": [],
"source": [
"l = widgets.Text(value=' Ridge L2 and LASSO L1 Regression Regularization Demo, Michael Pyrcz, Professor, The University of Texas at Austin',\n",
" layout=Layout(width='950px', height='30px'))\n",
"# P_happening_label = widgets.Text(value='Probability of Happening',layout=Layout(width='50px',height='30px',line-size='0 px'))\n",
"slam = widgets.IntSlider(min=1, max = 20, value=1, step = 1.0,description = r'$\\lambda$ level',orientation='horizontal', \n",
" style = {'description_width':'initial','button_color':'green'},layout=Layout(width='900px',height='40px'),\n",
" continuous_update=False,readout_format='.0f')\n",
"\n",
"ui_summary = widgets.HBox([slam],)\n",
"ui_summary2 = widgets.VBox([l,ui_summary],)\n",
"\n",
"def run_plot_summary2(slam): \n",
" ridge_lam_mat = np.logspace(1,6,20); lasso_lam_mat = np.linspace(1,600,20)\n",
" ridge_lam = ridge_lam_mat[slam-1]; lasso_lam = lasso_lam_mat[slam-1];\n",
" \n",
" # calculate paths over multiple hyperparameters\n",
" L = 100\n",
" ridge_mult_b1 = np.zeros(L); ridge_mult_b2 = np.zeros(L); ridge_mult_lam = np.zeros(L) \n",
" lasso_mult_b1 = np.zeros(L); lasso_mult_b2 = np.zeros(L); lasso_mult_lam = np.zeros(L)\n",
" for i,lam in enumerate(np.logspace(-4,6,L)):\n",
" ridge = Ridge(alpha=lam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" ridge_mult_b1[i], ridge_mult_b2[i] = ridge.coef_; ridge_mult_lam[i] = lam\n",
" \n",
" for i,lam in enumerate(np.logspace(0,3,L)):\n",
" lasso = Lasso(alpha=lam,fit_intercept = False).fit(X, y) # fit a LASSO model\n",
" lasso_mult_b1[i], lasso_mult_b2[i] = lasso.coef_; lasso_mult_lam[i] = lam\n",
" \n",
" # fit linear, Ridge and LASSO models\n",
" linear = LinearRegression(fit_intercept = False).fit(X, y) # fit a linear model\n",
" linear_b1, linear_b2 = linear.coef_\n",
" \n",
" ridge = Ridge(alpha=ridge_lam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" ridge_b1, ridge_b2 = ridge.coef_\n",
" \n",
" lasso = Lasso(alpha=lasso_lam,fit_intercept = False).fit(X, y) # fit a ridge model\n",
" lasso_b1, lasso_b2 = lasso.coef_ \n",
" \n",
" vmin = 0.0; vmax = 100.0; #vmax = np.max(MSE_mat) # set min and max MSE for visualization\n",
" norm = colors.Normalize(vmin=vmin, vmax=vmax)\n",
" nstep = 200; RSS_mat = np.zeros([nstep,nstep]) # set up the madel parameter mesh\n",
" ridge_reg_mat = np.zeros([nstep,nstep]); ridge_loss_mat = np.zeros([nstep,nstep]) \n",
" lasso_reg_mat = np.zeros([nstep,nstep]); lasso_loss_mat = np.zeros([nstep,nstep]) \n",
" sb1 = -7.0; eb1 = 7.0; stepb1 = (eb1-sb1)/nstep; \n",
" sb2 = -7.0; eb2 = 7.0; stepb2 = (eb2-sb2)/nstep;\n",
" b1_vector = np.arange(sb1, eb1, stepb1); b2_vector = np.arange(eb2, sb2, -1*stepb2)\n",
" b1_mat, b2_mat = np.meshgrid(b1_vector, b2_vector)\n",
" \n",
" for ib1, b1 in enumerate(b1_vector): # calculate the MSE for all possible model parameters\n",
" for ib2, b2 in enumerate(b2_vector):\n",
" y_hat = np.reshape(b1 * X1 + b2 * X2,newshape=[n])\n",
" RSS_mat[ib2,ib1] =((y - y_hat) ** 2).sum()\n",
" ridge_reg_mat[ib2,ib1] = b1*b1 + b2*b2\n",
" ridge_loss_mat[ib2,ib1] = RSS_mat[ib2,ib1]+ridge_lam*ridge_reg_mat[ib2,ib1]\n",
" \n",
" lasso_reg_mat[ib2,ib1] = abs(b1) + abs(b2)\n",
" lasso_loss_mat[ib2,ib1] = RSS_mat[ib2,ib1]+lasso_lam*lasso_reg_mat[ib2,ib1]\n",
" \n",
" ### find the solution\n",
" iridge_b1 = (np.absolute(b1_vector-ridge_b1)).argmin(); iridge_b2 = (np.absolute(b2_vector-ridge_b2)).argmin()\n",
" ridge_RSS = RSS_mat[iridge_b2,iridge_b1]; ridge_reg = ridge_reg_mat[iridge_b2,iridge_b1]\n",
" \n",
" ilasso_b1 = (np.absolute(b1_vector-lasso_b1)).argmin(); ilasso_b2 = (np.absolute(b2_vector-lasso_b2)).argmin()\n",
" lasso_RSS = RSS_mat[ilasso_b2,ilasso_b1]; lasso_reg = lasso_reg_mat[ilasso_b2,ilasso_b1]\n",
" \n",
" plt.subplot(121)\n",
" \n",
" vmin = np.percentile(RSS_mat.flatten(),q=1); vmax = np.percentile(RSS_mat.flatten(),q=20)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" plt.contour(b1_mat,b2_mat,RSS_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(0.8,0.2,5),\n",
" colors='darkred',linestyles='dotted',alpha=0.4,zorder=10)\n",
" vmin = np.percentile(ridge_reg_mat.flatten(),q=1); vmax = np.percentile(ridge_reg_mat.flatten(),q=20)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" plt.contour(b1_mat,b2_mat,ridge_reg_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(0.8,0.2,5),\n",
" colors='#cc5500',alpha=0.4,zorder=10)\n",
" \n",
" plt.contour(b1_mat,b2_mat,RSS_mat,levels = [ridge_RSS],linewidths=2.0,\n",
" colors='darkred',alpha=0.7,zorder=10)\n",
" plt.scatter([linear_b1],[linear_b2],color='darkred',marker='x',s=50,zorder=100)\n",
" \n",
" plt.contour(b1_mat,b2_mat,ridge_reg_mat,levels = [ridge_reg],linewidths=2.0,\n",
" colors='#cc5500',alpha=0.7,zorder=10)\n",
" plt.scatter([0.0],[0.0],color='#cc5500',marker='x',s=50,zorder=100)\n",
" \n",
" plt.scatter([ridge_b1],[ridge_b2],color='blue',edgecolor='black',marker='o',s=30,zorder=1000)\n",
" plt.plot(ridge_mult_b1,ridge_mult_b2,color='black',lw=1)\n",
" \n",
" plt.annotate('Global \\n Mean',(0.8,-0.5),ha='center',color = '#cc5500')\n",
" plt.annotate('Ordinary Least \\n Square',(linear_b1 + 1.6,linear_b2 - 0.5), ha='center',color='darkred')\n",
" \n",
" plt.plot([ridge_b1,ridge_b1],[sb1,ridge_b2],color='blue',lw=1.0,ls='--')\n",
" plt.annotate(r'$b_1$ = ' + str(np.round(ridge_b1,1)),(ridge_b1+0.2,-6.0),ha='center',color = 'blue',rotation=270)\n",
" \n",
" plt.plot([sb1,ridge_b1],[ridge_b2,ridge_b2],color='blue',lw=1.0,ls='--')\n",
" plt.annotate(r'$b_2$ = ' + str(np.round(ridge_b2,1)),(-6.0,ridge_b2+0.2),ha='center',color = 'blue')\n",
" \n",
" plt.title(r'Ridge Regression RSS L2 and Regularization L2 for $\\lambda = $' + str(np.round(ridge_lam)))\n",
" plt.xlabel(r'$b_1$ Model Parameter'); plt.ylabel(r'$b_2$ Model Parameter'); add_grid() \n",
" plt.xlim([sb1,eb1]); plt.ylim([sb2,eb2])\n",
" \n",
" plt.subplot(122)\n",
" \n",
" vmin = np.percentile(RSS_mat.flatten(),q=1); vmax = np.percentile(RSS_mat.flatten(),q=20)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" plt.contour(b1_mat,b2_mat,RSS_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(0.8,0.2,5),\n",
" colors='darkred',linestyles='dotted',alpha=0.4,zorder=10)\n",
" vmin = np.percentile(lasso_reg_mat.flatten(),q=1); vmax = np.percentile(lasso_reg_mat.flatten(),q=20)\n",
" lvmin = math.log10(vmin); lvmax = math.log10(vmax)\n",
" plt.contour(b1_mat,b2_mat,lasso_reg_mat,levels = np.logspace(lvmin,lvmax,5),linewidths=np.linspace(0.8,0.2,5),\n",
" colors='#cc5500',alpha=0.4,zorder=10)\n",
" \n",
" plt.contour(b1_mat,b2_mat,RSS_mat,levels = [lasso_RSS],linewidths=2.0,\n",
" colors='darkred',alpha=0.7,zorder=10)\n",
" plt.scatter([linear_b1],[linear_b2],color='darkred',marker='x',s=50,zorder=100)\n",
" \n",
" plt.contour(b1_mat,b2_mat,lasso_reg_mat,levels = [lasso_reg],linewidths=2.0,\n",
" colors='#cc5500',alpha=0.7,zorder=10)\n",
" plt.scatter([0.0],[0.0],color='#cc5500',marker='x',s=50,zorder=100)\n",
" \n",
" plt.scatter([lasso_b1],[lasso_b2],color='blue',edgecolor='black',marker='o',s=30,zorder=1000)\n",
" plt.plot(lasso_mult_b1,lasso_mult_b2,color='black',lw=1)\n",
" \n",
" plt.annotate('Global \\n Mean',(0.8,-0.5),ha='center',color = '#cc5500')\n",
" plt.annotate('Ordinary Least \\n Square',(linear_b1 + 1.6,linear_b2 - 0.5), ha='center',color='darkred')\n",
" \n",
" plt.plot([lasso_b1,lasso_b1],[sb1,lasso_b2],color='blue',lw=1.0,ls='--')\n",
" plt.annotate(r'$b_1$ = ' + str(np.round(lasso_b1,1)),(lasso_b1+0.2,-6.0),ha='center',color = 'blue',rotation=270)\n",
" \n",
" plt.plot([sb1,lasso_b1],[lasso_b2,lasso_b2],color='blue',lw=1.0,ls='--')\n",
" plt.annotate(r'$b_2$ = ' + str(np.round(lasso_b2,1)),(-6.0,lasso_b2+0.2),ha='center',color = 'blue')\n",
" \n",
" plt.title(r'LASSO Regression RSS L2 and Regularization L1 for $\\lambda = $' + str(np.round(lasso_lam,1)))\n",
" plt.xlabel(r'$b_1$ Model Parameter'); plt.ylabel(r'$b_2$ Model Parameter'); add_grid() \n",
" plt.xlim([sb1,eb1]); plt.ylim([sb2,eb2])\n",
" \n",
" plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.2, wspace=0.2, hspace=0.1); plt.show()\n",
" \n",
"interactive_plot_summary2 = widgets.interactive_output(run_plot_summary2, {'slam':slam,})\n",
"interactive_plot_summary.clear_output(wait = True) "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Interactive Ridge vs. LASSO Regression Regularization with Variable $\\lambda$ Demonstration\n",
"\n",
"* change the $\\lambda$ hyperparameter level and watch the solution shift from linear regression to the global mean, all model parameters, $b_{\\alpha} = 0.0, \\alpha = 1,\\ldots,m$. This demonstration based on 2 predictor features dataset and I use hyperparameter levels since ridge and LASSO respond quite differently to $\\lambda$ magnitude.\n",
"\n",
"#### Michael Pyrcz, Professor, The University of Texas at Austin \n",
"##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1) | [GeostatsPy](https://github.com/GeostatsGuy/GeostatsPy)"
]
},
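{
"cell_type": "markdown",
"metadata": {},
"source": [
"The mapping from slider level to hyperparameter value can be sketched on its own. This minimal example copies the two arrays from `run_plot_summary2`; the helper name `level_to_lambda` is hypothetical and used only for illustration:\n",
"\n",
"```python\n",
"import numpy as np\n",
"\n",
"# same level-to-value arrays as in run_plot_summary2\n",
"ridge_lam_mat = np.logspace(1, 6, 20)    # ridge: 10 to 1,000,000, log-spaced\n",
"lasso_lam_mat = np.linspace(1, 600, 20)  # LASSO: 1 to 600, linearly spaced\n",
"\n",
"def level_to_lambda(level):\n",
"    # hypothetical helper: slider levels run 1..20, the arrays are 0-indexed\n",
"    return ridge_lam_mat[level - 1], lasso_lam_mat[level - 1]\n",
"\n",
"for level in (1, 10, 20):\n",
"    ridge_lam, lasso_lam = level_to_lambda(level)\n",
"    print(f'level {level:2d}: ridge lambda = {ridge_lam:12.1f}, LASSO lambda = {lasso_lam:6.1f}')\n",
"```\n",
"\n",
"The ridge levels span five orders of magnitude while the LASSO levels stay within a few hundred, which is why a shared slider level is used instead of a shared $\\lambda$ value."
]
},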
{
"cell_type": "code",
"execution_count": 26,
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "558707475bc845768d45b311f242ff84",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"VBox(children=(Text(value=' Ridge L2 and LASSO L1 Regression Regularizat…"
]
},
"metadata": {},
"output_type": "display_data"
},
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "15a63008e2c44a428060603fa158879f",
"version_major": 2,
"version_minor": 0
},
"text/plain": [
"Output()"
]
},
"metadata": {},
"output_type": "display_data"
}
],
"source": [
"display(ui_summary2, interactive_plot_summary2) # display the interactive plot"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"#### Comments\n",
"\n",
"These interactive dashboards were designed as an educational tool to assist in learning about hyperparameter tuning and model regularization. The dashboards include:\n",
"\n",
"* the loss function as surfaces for ridge regression to observe the impact of the hyperparameter, model regularization, on the solution. \n",
"\n",
"* the loss function components for OLS and regularization for ridge regression and LASSO regression to observe the impact the hyperparameter on the solution.\n",
"\n",
"Specifically we can observe the balancing of the ordinary least square solution, data fit only, and regularization, lowest absolute (closest to zero) possible model parameters.\n",
"\n",
"I hope this is helpful,\n",
"\n",
"*Michael*\n",
"\n",
"Michael Pyrcz, Ph.D., P.Eng. Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n",
"On twitter I'm the @GeostatsGuy.\n",
"\n",
"***\n",
"\n",
"#### More on Michael Pyrcz and the Texas Center for Geostatistics:\n",
"\n",
"### Michael Pyrcz, Professor, University of Texas at Austin \n",
"*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*\n",
"\n",
"With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. \n",
"\n",
"For more about Michael check out these links:\n",
"\n",
"#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n",
"\n",
"#### Want to Work Together?\n",
"\n",
"I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.\n",
"\n",
"* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! \n",
"\n",
"* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!\n",
"\n",
"* I can be reached at mpyrcz@austin.utexas.edu.\n",
"\n",
"I'm always happy to discuss,\n",
"\n",
"*Michael*\n",
"\n",
"Michael Pyrcz, Ph.D., P.Eng. Associate Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin\n",
"\n",
"#### More Resources Available at: [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig) | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.4"
}
},
"nbformat": 4,
"nbformat_minor": 2
}