<p align="center">
    <img src="https://github.com/GeostatsGuy/GeostatsPy/blob/master/TCG_color_logo.png?raw=true" width="220" height="240" />

</p>

## Interactive Convolution and k-Nearest Neighbours Regression

#### Michael J. Pyrcz, Professor, The University of Texas at Austin 

##### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)

Here's a set of interactive dashboards to explore k-Nearest Neighbours-based predictive machine learning models. Since it is useful to start from convolution, my first dashboard is a simple convolution regression model. Then I switch from convolution with a set window size to k-nearest neighbours, where the window size adapts to the local data density. I also show the difference between the exhaustive and sparsely sampled cases.
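
To make the contrast between a set window and an adaptive window concrete, here is a minimal sketch in plain Python. The sample locations and the helper names `window_neighbours` and `knn_neighbours` are hypothetical and are not part of the dashboards below.

```python
import heapq

def window_neighbours(xs, x0, half_width):
    """Indices of the data that fall inside a fixed window around x0."""
    return [i for i, x in enumerate(xs) if abs(x - x0) <= half_width]

def knn_neighbours(xs, x0, k):
    """Indices of the k data nearest to x0; the implied window adapts to data density."""
    nearest = heapq.nsmallest(k, range(len(xs)), key=lambda i: abs(xs[i] - x0))
    return sorted(nearest)

# hypothetical sample locations: dense on the left, sparse on the right
xs = [1.0, 2.0, 3.0, 4.0, 20.0, 40.0]

dense_window = window_neighbours(xs, 2.5, half_width=2.0)    # many data in the window
sparse_window = window_neighbours(xs, 30.0, half_width=2.0)  # no data in the window
dense_knn = knn_neighbours(xs, 2.5, k=2)                     # always k data
sparse_knn = knn_neighbours(xs, 30.0, k=2)                   # always k data, but farther away

print(dense_window, sparse_window, dense_knn, sparse_knn)
```

With a set window the number of data changes from location to location and may drop to zero, while with k-nearest neighbours the number of data is fixed and the search distance changes instead.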

To assist you with background content, I have lectures available with linked interactive codes and well-documented workflows in Python:

* [k-Nearest Neighbours Regression](https://youtu.be/lzmeChSYvv8?si=EBxfAfSt3MD8tV_1) 
* [k-Nearest Neighbours Considerations](https://youtu.be/Zw1WAH6s5yg?si=1MiR-x8jY3wer5v6) 

I also have lectures on:

* [Machine Learning Basics](https://youtu.be/zOUM_AnI1DQ?si=L1FxPRc-n9y8Yuk6)
* [Machine Learning Model Generalization and Overfit](https://youtu.be/GGoNTMrCBbk?si=itx1p3G6PG7witpe)
* [Machine Learning Model Norms](https://youtu.be/JmxGlrurQp0?si=FPC7-Et66bWRMhFl)
    
These are all part of my [Machine Learning](https://www.youtube.com/playlist?list=PLG19vXLQHvSC2ZKFIkgVpI9fCjkN38kwf) course. Note, all recorded lectures and interactive, well-documented workflow demonstrations are available on my GitHub repository [GeostatsGuy's Python Numerical Demos](https://github.com/GeostatsGuy/PythonNumericalDemos). 

#### k-Nearest Neighbours Regression

TBD

#### Other Resources

This is a tutorial / demonstration of **k-Nearest Neighbours Regression**.  In $Python$, the $SciPy$ package, specifically the $Stats$ functions (https://docs.scipy.org/doc/scipy/reference/stats.html) provide excellent tools for efficient use of statistics.  
I have previously provided this example in R and posted it on GitHub:

#### Getting Started

Here are the steps to get set up in Python with the GeostatsPy package:

1. Install Anaconda 3 on your machine (https://www.anaconda.com/download/). 
2. From Anaconda Navigator (within Anaconda3 group), go to the environment tab, click on base (root) green arrow and open a terminal. 
3. In the terminal type: pip install geostatspy. 
4. Open Jupyter and, in the top block, get started by copying and pasting the code block below from this Jupyter Notebook to start using the geostatspy functionality. 

#### Import Required Packages

Let's import the required packages.

In [1]:
supress_warnings = True
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.ticker import (MultipleLocator,             # control of axes ticks
                               FormatStrFormatter, 
                               AutoMinorLocator) 
from ipywidgets import interactive                          # widgets and interactivity
from ipywidgets import widgets                            
from ipywidgets import Layout
from ipywidgets import Label
from ipywidgets import VBox, HBox
from IPython.display import display                         # display the dashboards (explicit import for clarity)
import heapq                                                # heap queue for the k closest elements search
cmap = plt.cm.inferno                                       # default color bar, no bias and friendly for color vision deficiency
plt.rc('axes', axisbelow=True)                              # grid behind plotting elements
seed = 73073                                                # random number seed
if supress_warnings == True:
    import warnings                                         # suppress any warnings for this demonstration
    warnings.filterwarnings('ignore')  

If you get a package import error, you may have to first install some of these packages. This can usually be accomplished by opening up a command window on Windows and then typing 'python -m pip install [package-name]'. More assistance is available with the respective package docs. 

#### Declare functions

Let's define a couple of functions to streamline plotting (major and minor grid lines) and to find the k closest data locations to a prediction location. 

In [2]:
def add_grid():
    plt.gca().grid(True, which='major',linewidth = 1.0); plt.gca().grid(True, which='minor',linewidth = 0.2) # add y grids
    plt.gca().tick_params(which='major',length=7); plt.gca().tick_params(which='minor', length=4)
    plt.gca().xaxis.set_minor_locator(AutoMinorLocator()); plt.gca().yaxis.set_minor_locator(AutoMinorLocator()) # turn on minor ticks

def findClosestElements(arr, k, x): # function from https://www.geeksforgeeks.org/find-k-closest-elements-given-value/, thank you!
    # Create a max heap to store the pairs of absolute differences and negative values
    max_heap = []
 
    for num in arr:
        # Skip if the element is equal to x
        if num == x:
            continue
 
        # Calculate the absolute difference and add the pair to the max heap
        diff = abs(num - x)
        heapq.heappush(max_heap, (-diff, num))
 
        # If the size of the max heap exceeds k, remove the element with the maximum absolute difference
        if len(max_heap) > k:
            heapq.heappop(max_heap)
 
    # Store the result in an array
    result = []
 
    # Retrieve the top k elements from the max heap
    while max_heap:
        # Get the top element from the max heap
        diff, num = heapq.heappop(max_heap)
 
        # Add the value to the result array
        result.append(num)
 
    # Return the closest numbers in ascending order
    return sorted(result)  
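
As a cross-check on the search above, here is a compact sketch of the same idea with `heapq.nsmallest`. The function `k_closest` and its `skip_equal` flag are hypothetical names for illustration; the flag mimics the way the helper above ignores values equal to the query, and tie-breaking between equally distant values may differ from the helper.

```python
import heapq

def k_closest(values, k, x, skip_equal=True):
    """Return the k values closest to x, in ascending order.

    skip_equal=True drops any value equal to x before searching,
    similar to the helper above (an assumption made for comparison).
    """
    candidates = [v for v in values if not (skip_equal and v == x)]
    return sorted(heapq.nsmallest(k, candidates, key=lambda v: abs(v - x)))

locations = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]

without_query = k_closest(locations, 4, 5)                 # value 5 is skipped
with_query = k_closest(locations, 3, 5, skip_equal=False)  # value 5 is kept

print(without_query)  # [3, 4, 6, 7]
print(with_query)     # [4, 5, 6]
```

Skipping the value equal to the query matters for prediction: a datum located exactly at the prediction location is left out of the neighbourhood unless the query is offset slightly.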

#### Interactive Dashboard - convolution-based regression in 1D

Let's start by visualizing convolution-based regression in 1D
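
Before the dashboard code, here is a minimal sketch of the calculation it performs: a weighted moving window average that is renormalized by the sum of the weights that fall on available data. The function name `window_average` and the small data set are hypothetical, and this vectorized form with `np.convolve` is an alternative to the explicit loops used in the dashboard, assuming symmetric weights no longer than the data.

```python
import numpy as np

def window_average(values, weights, available=None):
    """Weighted moving window average, renormalized by the weights on available data.

    values    : 1D array of data on a regular grid (must be finite everywhere)
    weights   : 1D symmetric weighting function with odd length <= len(values)
    available : optional boolean array, True where a sample exists
    """
    values = np.asarray(values, dtype=float)
    if available is None:
        available = np.ones(len(values), dtype=bool)
    mask = np.asarray(available, dtype=float)
    num = np.convolve(values * mask, weights, mode='same')  # weighted sum of available data
    den = np.convolve(mask, weights, mode='same')           # sum of the weights actually used
    out = np.full(len(values), np.nan)                      # NaN where no data fall in the window
    np.divide(num, den, out=out, where=den > 0)
    return out

g = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
f = np.ones(3) / 3.0                                        # uniform weights, half-width of 1

exhaustive = window_average(g, f)
sparse = window_average(g, f, available=[True, False, False, False, True])
print(exhaustive)   # window average with edge renormalization
print(sparse)       # NaN at the centre, where the window contains no samples
```

Dividing by the sum of the weights in use keeps the estimate unbiased at the edges of the data and around missing samples, instead of pulling it toward zero.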

In [3]:
l = widgets.Text(value='                                                     Convolution Regression Demo  -   Michael Pyrcz, Professor, The University of Texas at Austin',
        layout=Layout(width='950px', height='30px'))
# P_happening_label = widgets.Text(value='Probability of Happening',layout=Layout(width='50px',height='30px',line-size='0 px'))

ftype = widgets.Dropdown(value='Uniform',options=['Uniform', 'Distance'], description='Wt.Shape:',
        layout=Layout(width='300px',height='40px'))

fs = widgets.IntSlider(min=1, max = 50, value=5, step = 1,description = ' Wt.Size:',orientation='horizontal', 
        style = {'description_width':'initial','button_color':'green'},layout=Layout(width='300px',height='40px'),
        continuous_update=False,readout_format='.0f')

psamp = widgets.FloatSlider(min=0.0, max = 1.0, value=0.2, step = 0.1,description = r'Sample Proportion:',orientation='horizontal', 
        style = {'description_width':'initial','button_color':'green'},layout=Layout(width='300px',height='40px'),
        continuous_update=False,readout_format='.2f')

ui_summary = widgets.HBox([ftype,fs,psamp],)
ui_summary1 = widgets.VBox([l,ui_summary],)

def run_plot_summary(ftype,fs,psamp):
    gr = 10; seed = 13
    
    np.random.seed(seed=seed)
    x = np.linspace(1,100,100)
    g = np.convolve(np.random.normal(loc=10,scale=2,size=100+2*gr),np.ones(gr*2+1)/(gr*2+1),mode='same')[gr:-gr]
    
    f = np.ones(fs*2+1)
    if ftype == 'Uniform':
        f = f/(fs*2+1)
    elif ftype == 'Distance':
        for i in range(fs*2+1):
            f[i] = abs(1/( (i - fs) +0.01))
        f[fs] = 1.5; f = f/sum(f)
        
    # y = np.convolve(g,f,mode='full')
    y = np.zeros(len(x))
    for i in range(0,len(x)):
        ssum = 0; count = 0; sum_wt = 0.0
        for j in range(-fs,fs+1,1):                             # include both ends of the window, lags -fs,...,fs
            if i+j >= 0 and i+j < len(x):
    #                 print(j, i, i+j)
                ssum = ssum + g[i+j]*f[j+fs]
                count = count + 1; sum_wt = sum_wt + f[j+fs]
    #           print(f[j+fs],g[i+j])
    #   print(sum_wt); print(ssum)
        if(sum_wt > 0):
            y[i] = ssum / (sum_wt)
        else: 
            y[i] = np.nan
    #     print(i)
    
    y2 = np.zeros(len(x))
    np.random.seed(seed=seed)
    ind = np.where(np.random.rand(len(x)) < psamp, True, False)
    
    for i in range(0,len(x)):
        ssum = 0; count = 0; sum_wt = 0.0
        for j in range(-fs,fs+1,1):                             # include both ends of the window, lags -fs,...,fs
            if i+j >= 0 and i+j < len(x):
                if ind[i+j]:
    #                 print(j, i, i+j)
                    ssum = ssum + g[i+j]*f[j+fs]
                    count = count + 1; sum_wt = sum_wt + f[j+fs]
    #                 print(f[j+fs],g[i+j])
    #     print(sum_wt); print(ssum)
        if(sum_wt > 0):
            y2[i] = ssum / (sum_wt)
        else: 
            y2[i] = np.nan
    #     print(y2[i])
    
    plt.subplot(131)
    plt.plot(np.linspace(-fs,fs,fs*2+1),f,color='black',lw=2,label=r'Weighting Function, $f(\Delta)$',zorder=100)
    plt.plot([-fs,-fs],[0,f[0]],color='black',lw=2,zorder=100)
    plt.plot([fs,fs],[0,f[len(f)-1]],color='black',lw=2,zorder=100)
    plt.fill_between(np.linspace(-fs,fs,fs*2+1),np.zeros(fs*2+1),f,color='black',alpha=0.4,zorder=10)
    plt.xlim([-100,100]); plt.ylim([0,np.max(f)*1.5])

#    plt.gca().yaxis.set_major_locator(MultipleLocator(1)) 
    plt.gca().yaxis.set_major_formatter(FormatStrFormatter('% 1.3f')) 
    
    
    add_grid(); plt.legend(loc='upper left')
    plt.xlabel(r'$\Delta$ (m)'); plt.title(r'Weighting Function, $f(\Delta)$')
    
    plt.subplot(132)
    plt.plot(x,g,color='black',lw=2,label=r'Original Function, $g(x)$')
    # plt.plot(x,y[fs:-fs],color='red',lw=2,label=r'Convolved Function, $(f * g)$')
    plt.plot(x,y,color='red',lw=2,label=r'Convolved Function, $(f * g)$')
    plt.xlim([1,100]); plt.ylim([8,12])
    add_grid(); plt.legend(loc='upper left')
    plt.xlabel(r'$x$ (m)'); plt.title('Original and Convolved Functions')
    
    plt.subplot(133)
    plt.plot(x,g,color='black',alpha=0.3,lw=2,label=r'Original Function, $g(x)$')
    plt.scatter(x[ind],g[ind],color='black',edgecolor='black',label=r'Sample Data',zorder=10)
    plt.plot(x,y2,color='red',lw=2,label=r'Sparse Convolved Function, $(f * g)$')
    plt.xlim([1,100]); plt.ylim([8,12])
    add_grid(); plt.legend(loc='upper left')
    plt.xlabel(r'$x$ (m)'); plt.title('Original and Sparse Data Convolved Functions')
    
    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.2, wspace=0.2, hspace=0.1); plt.show()  
        
interactive_plot_summary = widgets.interactive_output(run_plot_summary, {'ftype':ftype,'fs':fs,'psamp':psamp})
interactive_plot_summary.clear_output(wait = True)  

### Interactive Convolution-based Regression Demonstration

* select weights 'Uniform' or 'Distance', weighting window size, & proportion of samples. Then compare the exhaustive & sparsely sampled cases.
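
To see what the two weight shapes look like numerically, here is a minimal sketch that builds both and standardizes them to sum to one. The function `make_weights` and its parameters are hypothetical. The centre weight of 1.5 follows the dashboard code, while the symmetric form `1 / (|lag| + eps)` is a simplifying assumption.

```python
import numpy as np

def make_weights(half_width, shape='Uniform', eps=0.01, centre_weight=1.5):
    """Return lags and weights (summing to one) over lags -half_width,...,half_width."""
    lags = np.arange(-half_width, half_width + 1)
    if shape == 'Uniform':
        w = np.ones(len(lags))                 # all data in the window count equally
    elif shape == 'Distance':
        w = 1.0 / (np.abs(lags) + eps)         # inverse distance, eps avoids division by zero
        w[half_width] = centre_weight          # cap the weight at zero lag
    else:
        raise ValueError("shape must be 'Uniform' or 'Distance'")
    return lags, w / w.sum()

lags, w_uniform = make_weights(2, 'Uniform')
_, w_distance = make_weights(2, 'Distance')
print(lags)
print(w_uniform)
print(w_distance)
```

Uniform weights give a simple moving average, while distance weights emphasize the data nearest to the prediction location, so the prediction follows the local data more closely.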


In [4]:
display(ui_summary1, interactive_plot_summary)                           # display the interactive plot


#### Interactive Dashboard - k-Nearest Neighbours in 1D

Now let's visualize the change in the k-Nearest Neighbours prediction with change in k and weighting function.
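
As a minimal sketch of the prediction itself, the function below averages the responses of the k nearest data with either uniform or inverse distance weights. The function `knn_predict`, its parameters, and the four data points are hypothetical and for illustration only; the dashboard code uses its own search and weighting functions.

```python
import heapq

def knn_predict(x_data, y_data, x0, k, weights='uniform', eps=1e-6):
    """Predict at x0 from the k nearest data (k >= 1) with a weighted average."""
    nearest = heapq.nsmallest(k, zip(x_data, y_data), key=lambda p: abs(p[0] - x0))
    if weights == 'uniform':
        w = [1.0] * len(nearest)                               # equal weights
    elif weights == 'distance':
        w = [1.0 / (abs(x - x0) + eps) for x, _ in nearest]    # closer data get more weight
    else:
        raise ValueError("weights must be 'uniform' or 'distance'")
    return sum(wi * y for wi, (_, y) in zip(w, nearest)) / sum(w)

x_data = [1.0, 2.0, 4.0, 8.0]
y_data = [10.0, 12.0, 11.0, 9.0]

pred_uniform = knn_predict(x_data, y_data, 2.2, k=2)                       # mean of the 2 nearest
pred_distance = knn_predict(x_data, y_data, 2.2, k=2, weights='distance')  # pulled toward x = 2.0
print(pred_uniform, pred_distance)
```

With the prediction location at 2.2, the two nearest data are at 1.0 and 2.0, so the uniform prediction is their average and the distance-weighted prediction moves toward the response at 2.0.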

In [7]:
l = widgets.Text(value='                                      k-Nearest Neighbours Regression Demo  -   Michael Pyrcz, Professor, The University of Texas at Austin',
        layout=Layout(width='950px', height='30px'))
# P_happening_label = widgets.Text(value='Probability of Happening',layout=Layout(width='50px',height='30px',line-size='0 px'))

ftype = widgets.Dropdown(value='Uniform',options=['Uniform', 'Distance'], description='Wt.Shape:',
        layout=Layout(width='300px',height='40px'))

k = widgets.IntSlider(min=1, max = 100, value=5, step = 1,description = ' k:',orientation='horizontal', 
        style = {'description_width':'initial','button_color':'green'},layout=Layout(width='300px',height='40px'),
        continuous_update=False,readout_format='.0f')

psamp = widgets.FloatSlider(min=0.0, max = 1.0, value=0.2, step = 0.1,description = r'Sample Proportion:',orientation='horizontal', 
        style = {'description_width':'initial','button_color':'green'},layout=Layout(width='300px',height='40px'),
        continuous_update=False,readout_format='.2f')

ui_summary = widgets.HBox([ftype,k,psamp],)
ui_summary2 = widgets.VBox([l,ui_summary],)

def run_plot_summary2(ftype,k,psamp):
    gr = 10; seed = 13; fs = 100
    
    np.random.seed(seed=seed)
    x = np.linspace(1,100,100)
    g = np.convolve(np.random.normal(loc=10,scale=2,size=100+2*gr),np.ones(gr*2+1)/(gr*2+1),mode='same')[gr:-gr]
    
    f = np.ones(fs*2+1)
    if ftype == 'Uniform':
        f = f/(fs*2+1)
    elif ftype == 'Distance':
        for i in range(fs*2+1):
            f[i] = abs(1/( (i - fs) +0.01))
        f[fs] = 1.5; f = f/sum(f)
        
    # y = np.convolve(g,f,mode='full')
    y = np.zeros(len(x))
    for i in range(0,len(x)):
        ssum = 0; count = 0; sum_wt = 0.0
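        nearest = findClosestElements(x, k, x[i])               # query at the location x[i] (not the index i); the datum at x[i] itself is skipped by the search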
#         print(nearest)
        for j in nearest:
            ssum = ssum + g[int(j-1)]*f[int(j-1)-i+fs]
            count = count + 1; sum_wt = sum_wt + f[int(j-1)-i+fs]
    #           print(f[j+fs],g[i+j])
    #   print(sum_wt); print(ssum)
        if(sum_wt > 0):
            y[i] = ssum / (sum_wt)
        else: 
            y[i] = np.nan
    #     print(i)
    
    y2 = np.zeros(len(x))
    np.random.seed(seed=seed)
    ind = np.where(np.random.rand(len(x)) < psamp, True, False)
    xind = x[ind]
    for i in range(0,len(x)):
        ssum = 0; count = 0; sum_wt = 0.0
        nearest_sp = findClosestElements(xind, k, x[i]+0.01)    # query at the location x[i]; the small offset keeps a sample located at x[i]
#        print(nearest_sp)
        for j in nearest_sp:
            ssum = ssum + g[int(j-1)]*f[int(j-1)-i+fs]
            count = count + 1; sum_wt = sum_wt + f[int(j-1)-i+fs]
    #           print(f[j+fs],g[i+j])
#         print(sum_wt); print(ssum)
        if(sum_wt > 0):
            y2[i] = ssum / (sum_wt)
        else: 
            y2[i] = np.nan
    #     print(i)
    
#     for i in range(0,len(x)):
#         ssum = 0; count = 0; sum_wt = 0.0
#         for j in range(-fs,fs,1):
#             if i+j >= 0 and i+j < len(x):
#                 if ind[i+j]:
#     #                 print(j, i, i+j)
#                     ssum = ssum + g[i+j]*f[j+fs]
#                     count = count + 1; sum_wt = sum_wt + f[j+fs]
#     #                 print(f[j+fs],g[i+j])
#     #     print(sum_wt); print(ssum)
#         if(sum_wt > 0):
#             y2[i] = ssum / (sum_wt)
#         else: 
#             y2[i] = np.NaN
#     #     print(y2[i])
    
    plt.subplot(131)
    plt.plot(np.linspace(-fs,fs,fs*2+1),f,color='black',lw=2,label=r'Weighting Function, $f(\Delta)$',zorder=100)
    plt.plot([-fs,-fs],[0,f[0]],color='black',lw=2,zorder=100)
    plt.plot([fs,fs],[0,f[len(f)-1]],color='black',lw=2,zorder=100)
    plt.fill_between(np.linspace(-fs,fs,fs*2+1),np.zeros(fs*2+1),f,color='black',alpha=0.4,zorder=10)
    plt.xlim([-100,100]); plt.ylim([0,np.max(f)*1.5])

#    plt.gca().yaxis.set_major_locator(MultipleLocator(1)) 
    plt.gca().yaxis.set_major_formatter(FormatStrFormatter('% 1.3f')) 
    
    
    add_grid(); plt.legend(loc='upper left')
    plt.xlabel(r'$\Delta$ (m)'); plt.title(r'Weighting Function, $f(\Delta)$')
    
    plt.subplot(132)
    plt.plot(x,g,color='black',lw=2,label=r'Original Function, $g(x)$')
    # plt.plot(x,y[fs:-fs],color='red',lw=2,label=r'Convolved Function, $(f * g)$')
    plt.plot(x,y,color='red',lw=2,label=r'k-Nearest Neighbours Prediction')
    plt.xlim([1,100]); plt.ylim([8,12])
    add_grid(); plt.legend(loc='upper left')
    plt.xlabel(r'$x$ (m)'); plt.title('Exhaustive Truth & k-Nearest Neighbours')
    
    plt.subplot(133)
    plt.plot(x,g,color='black',alpha=0.3,lw=2,label=r'Original Function, $g(x)$')
    plt.scatter(x[ind],g[ind],color='black',edgecolor='black',label=r'Sample Data',zorder=10)
    plt.plot(x,y2,color='red',lw=2,label=r'Sparse k-Nearest Neighbours Prediction')
    plt.xlim([1,100]); plt.ylim([8,12])
    add_grid(); plt.legend(loc='upper left')
    plt.xlabel(r'$x$ (m)'); plt.title('Exhaustive Truth, Sparse Data, & k-Nearest Neighbours')
    
    plt.subplots_adjust(left=0.0, bottom=0.0, right=2.0, top=1.2, wspace=0.2, hspace=0.1); plt.show()  
        
interactive_plot_summary2 = widgets.interactive_output(run_plot_summary2, {'ftype':ftype,'k':k,'psamp':psamp})
interactive_plot_summary2.clear_output(wait = True)  

### Interactive k-Nearest Neighbours Regression Demonstration

* select weights 'Uniform' or 'Distance', k, & proportion of samples. Then compare the exhaustive & sparsely sampled cases.
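
When changing k in the dashboard, two limiting cases are worth keeping in mind; the minimal sketch below checks them with uniform weights. The function `knn_uniform` and the data are hypothetical and for illustration only.

```python
def knn_uniform(x_data, y_data, x0, k):
    """Uniform-weighted k-nearest neighbours prediction at x0."""
    order = sorted(range(len(x_data)), key=lambda i: abs(x_data[i] - x0))
    chosen = order[:k]                         # at most k data, fewer if k exceeds the data count
    return sum(y_data[i] for i in chosen) / len(chosen)

x_data = [1.0, 3.0, 6.0, 10.0]
y_data = [8.0, 12.0, 10.0, 14.0]

# k = 1 at a data location reproduces that datum (most flexible model)
at_data_k1 = knn_uniform(x_data, y_data, 3.0, k=1)

# k = number of data returns the global mean everywhere (least flexible model)
all_data_k = [knn_uniform(x_data, y_data, x0, k=4) for x0 in (0.0, 5.0, 12.0)]

print(at_data_k1, all_data_k)
```

Small k follows the data closely and is sensitive to noise, while large k smooths toward the global mean, so k acts as the hyperparameter that controls model complexity.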


In [8]:
display(ui_summary2, interactive_plot_summary2)                           # display the interactive plot


#### Comments

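These interactive dashboards were designed as an educational tool to assist in learning about convolution and, specifically, k-nearest neighbours predictive machine learning. The dashboards include:

* convolution-based regression in 1D with a set weighting window size
* k-nearest neighbours regression in 1D with a window size that adapts to the local data density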


I hope this is helpful,

*Michael*

Michael Pyrcz, Ph.D., P.Eng. Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin
On twitter I'm the @GeostatsGuy.

***

#### More on Michael Pyrcz and the Texas Center for Geostatistics:

### Michael Pyrcz, Professor, University of Texas at Austin 
*Novel Data Analytics, Geostatistics and Machine Learning Subsurface Solutions*

With over 17 years of experience in subsurface consulting, research and development, Michael has returned to academia driven by his passion for teaching and enthusiasm for enhancing engineers' and geoscientists' impact in subsurface resource development. 

For more about Michael check out these links:

#### [Twitter](https://twitter.com/geostatsguy) | [GitHub](https://github.com/GeostatsGuy) | [Website](http://michaelpyrcz.com) | [GoogleScholar](https://scholar.google.com/citations?user=QVZ20eQAAAAJ&hl=en&oi=ao) | [Book](https://www.amazon.com/Geostatistical-Reservoir-Modeling-Michael-Pyrcz/dp/0199731446) | [YouTube](https://www.youtube.com/channel/UCLqEr-xV-ceHdXXXrTId5ig)  | [LinkedIn](https://www.linkedin.com/in/michael-pyrcz-61a648a1)

#### Want to Work Together?

I hope this content is helpful to those that want to learn more about subsurface modeling, data analytics and machine learning. Students and working professionals are welcome to participate.

* Want to invite me to visit your company for training, mentoring, project review, workflow design and / or consulting? I'd be happy to drop by and work with you! 

* Interested in partnering, supporting my graduate student research or my Subsurface Data Analytics and Machine Learning consortium (co-PIs including Profs. Foster, Torres-Verdin and van Oort)? My research combines data analytics, stochastic modeling and machine learning theory with practice to develop novel methods and workflows to add value. We are solving challenging subsurface problems!

* I can be reached at mpyrcz@austin.utexas.edu.

I'm always happy to discuss,

*Michael*

Michael Pyrcz, Ph.D., P.Eng. Professor The Hildebrand Department of Petroleum and Geosystems Engineering, Bureau of Economic Geology, The Jackson School of Geosciences, The University of Texas at Austin

