{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Introduction: Poisson Process and Poisson Distribution\n", "\n", "In this notebook, we'll look at a Poisson process and model both the probability of the expected number of events and the waiting time between events. Poisson processes occur frequently in real life (or many phenomonenon can be approximated by a Poisson process) and provide a relatively simple distribution to explore. " ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# Poisson Process: Observing shooting stars\n", "\n", "We'll work through the following Poisson Process:\n", "\n", "The average time between shooting stars = 12 minutes (5 meteors / hour). " ] }, { "cell_type": "code", "execution_count": 1, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.161934Z", "start_time": "2019-01-20T22:28:27.550065Z" } }, "outputs": [ { "data": { "text/html": [ "" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "# Standard data science\n", "import pandas as pd\n", "import numpy as np\n", "\n", "np.random.seed(42)\n", "\n", "# Display all cell outputs\n", "from IPython.core.interactiveshell import InteractiveShell\n", "InteractiveShell.ast_node_interactivity = 'all'\n", "\n", "# Visualizations\n", "import plotly.plotly as py\n", "import plotly.graph_objs as go\n", "from plotly.offline import iplot\n", "\n", "# Cufflinks for dataframes\n", "import cufflinks as cf\n", "cf.go_offline()\n", "cf.set_config_file(world_readable=True, theme='pearl')" ] }, { "cell_type": "code", "execution_count": 2, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.166993Z", "start_time": "2019-01-20T22:28:29.164623Z" } }, "outputs": [], "source": [ "from scipy.special import factorial" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "The only parameter of the Poisson distribution is $\\lambda$, the event rate, (or rate parameter). This represents the expected number of events in an interval. If we have a rate in events / time, we can get to the expected events by multiplying the time.\n", "\n", "We find the probability of a number of events, k, in an interval using the Poisson Probability Density Function:\n", "\n", "$$P(k{\\text{ events in interval}})=e^{-\\lambda }{\\frac {\\lambda ^{k}}{k!}}$$\n", "\n", "Let's work through an example." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Poisson Probabilities" ] }, { "cell_type": "code", "execution_count": 3, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.172786Z", "start_time": "2019-01-20T22:28:29.168539Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The probability of 3 meteors in 60 minutes is 14.04%.\n" ] } ], "source": [ "events_per_minute = 1/12\n", "minutes = 60\n", "\n", "# Expected events\n", "lam = events_per_minute * minutes\n", "\n", "k = 3\n", "p_k = np.exp(-lam) * np.power(lam, k) / factorial(k)\n", "print(f'The probability of {k} meteors in {minutes} minutes is {100*p_k:.2f}%.')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can do the same calculation by simulating 10,000 hours." ] }, { "cell_type": "code", "execution_count": 4, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.183418Z", "start_time": "2019-01-20T22:28:29.174178Z" } }, "outputs": [ { "data": { "text/plain": [ "0.1377" ] }, "execution_count": 4, "metadata": {}, "output_type": "execute_result" } ], "source": [ "x = np.random.poisson(lam, 10000)\n", "(x == 3).mean()" ] }, { "cell_type": "markdown", "metadata": { "ExecuteTime": { "end_time": "2019-01-20T02:07:31.875121Z", "start_time": "2019-01-20T02:07:31.585896Z" } }, "source": [ "Let's write a quick function to calculate probabilities for a number of events. " ] }, { "cell_type": "code", "execution_count": 5, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.190041Z", "start_time": "2019-01-20T22:28:29.185512Z" } }, "outputs": [ { "data": { "text/plain": [ "0.14037389581428056" ] }, "execution_count": 5, "metadata": {}, "output_type": "execute_result" } ], "source": [ "def calc_prob(events_per_minute, minutes, k):\n", " # Calculate probability of k events in specified number of minutes\n", " lam = events_per_minute * minutes\n", " return np.exp(-lam) * np.power(lam, k) / factorial(k)\n", " \n", "calc_prob(events_per_minute, minutes, 3)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "We can use this function to generate a distribution of probabilities for different numbers of events." ] }, { "cell_type": "code", "execution_count": 6, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.195580Z", "start_time": "2019-01-20T22:28:29.191745Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The most likely value is 4 with probability 0.1755\n" ] } ], "source": [ "# Different numbers\n", "ns = np.arange(12)\n", "p_n = calc_prob(events_per_minute, minutes, ns)\n", "\n", "print(f'The most likely value is {np.argmax(p_n)} with probability {np.max(p_n):.4f}')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "It turns out in this situation, 4 and 5 events have the exact same probability (because our rate parameter is a whole number)." ] }, { "cell_type": "code", "execution_count": 7, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.200635Z", "start_time": "2019-01-20T22:28:29.196894Z" } }, "outputs": [ { "data": { "text/plain": [ "array([0.17546737, 0.17546737])" ] }, "execution_count": 7, "metadata": {}, "output_type": "execute_result" } ], "source": [ "p_n[4:6]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Poisson Distribution\n", "\n", "To show the distribution, we plot the probabiliy on the y-axis versus the number of events on the x-axis. This represents the probability density function of the Poisson process." ] }, { "cell_type": "code", "execution_count": 8, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:29.207265Z", "start_time": "2019-01-20T22:28:29.202112Z" } }, "outputs": [], "source": [ "def plot_pdf(x, p_x, title=''):\n", " # Plot PDF of Poisson distribution\n", " df = pd.DataFrame({'x': x, 'y': p_x})\n", " print(f'The most likely value is {np.argmax(p_x)} with probability {np.max(p_x):.4f}')\n", " annotations = [dict(x=x, y=y+0.01, text=f'{y:.2f}', \n", " showarrow=False, textangle=0) for x, y in zip(df['x'], df['y'])]\n", " df.iplot(kind='scatter', mode='markers+lines',\n", " x='x', y='y', xTitle='Number of Events',\n", " yTitle='Probability', annotations=annotations,\n", " title=title)" ] }, { "cell_type": "code", "execution_count": 9, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:30.101912Z", "start_time": "2019-01-20T22:28:29.208588Z" } }, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "The most likely value is 4 with probability 0.1755\n" ] }, { "data": { "application/vnd.plotly.v1+json": { "config": { "linkText": "Export to plot.ly", "plotlyServerURL": "https://plot.ly", "showLink": true }, "data": [ { "line": { "color": "rgba(255, 153, 51, 1.0)", "dash": "solid", "shape": "linear", "width": 1.3 }, "marker": { "size": 12, "symbol": "circle" }, "mode": "markers+lines", "name": "y", "text": "", "type": "scatter", "uid": "5e03f831-b1fb-4e4f-9ae9-63496e0c10cb", "x": [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11 ], "y": [ 0.006737946999085467, 0.03368973499542734, 0.08422433748856833, 0.14037389581428056, 0.1754673697678507, 0.1754673697678507, 0.1462228081398756, 0.104444862957054, 0.06527803934815875, 0.03626557741564375, 0.01813278870782187, 0.00824217668537358 ] } ], "layout": { "annotations": [ { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.01", "textangle": 0, "x": 0, "xref": "x", "y": 0.016737946999085467, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.03", "textangle": 0, "x": 1, "xref": "x", "y": 0.04368973499542734, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.08", "textangle": 0, "x": 2, "xref": "x", "y": 0.09422433748856833, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.14", "textangle": 0, "x": 3, "xref": "x", "y": 0.15037389581428057, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.18", "textangle": 0, "x": 4, "xref": "x", "y": 0.18546736976785072, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.18", "textangle": 0, "x": 5, "xref": "x", "y": 0.18546736976785072, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.15", "textangle": 0, "x": 6, "xref": "x", "y": 0.1562228081398756, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.10", "textangle": 0, "x": 7, "xref": "x", "y": 0.114444862957054, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.07", "textangle": 0, "x": 8, "xref": "x", "y": 0.07527803934815874, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.04", "textangle": 0, "x": 9, "xref": "x", "y": 0.04626557741564375, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.02", "textangle": 0, "x": 10, "xref": "x", "y": 0.02813278870782187, "yref": "y" }, { "arrowcolor": "#9499A3", "arrowhead": 7, "ax": 0, "ay": -100, "font": { "color": "#4D5663", "size": 12 }, "showarrow": false, "text": "0.01", "textangle": 0, "x": 11, "xref": "x", "y": 0.01824217668537358, "yref": "y" } ], "legend": { "bgcolor": "#F5F6F9", "font": { "color": "#4D5663" } }, "paper_bgcolor": "#F5F6F9", "plot_bgcolor": "#F5F6F9", "title": "Probability of Number of Meteors in One Hour", "titlefont": { "color": "#4D5663" }, "xaxis": { "gridcolor": "#E1E5ED", "showgrid": true, "tickfont": { "color": "#4D5663" }, "title": "Number of Events", "titlefont": { "color": "#4D5663" }, "zerolinecolor": "#E1E5ED" }, "yaxis": { "gridcolor": "#E1E5ED", "showgrid": true, "tickfont": { "color": "#4D5663" }, "title": "Probability", "titlefont": { "color": "#4D5663" }, "zerolinecolor": "#E1E5ED" } } }, "text/html": [ "
" ], "text/vnd.plotly.v1+html": [ "" ] }, "metadata": {}, "output_type": "display_data" } ], "source": [ "plot_pdf(ns, p_n, title='Probability of Number of Meteors in One Hour')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Distribution with Differing Rates\n", "\n", "Let's make the probability density function with differing numbers of meteors per hour." ] }, { "cell_type": "code", "execution_count": 10, "metadata": { "ExecuteTime": { "end_time": "2019-01-20T22:28:30.108956Z", "start_time": "2019-01-20T22:28:30.103234Z" } }, "outputs": [], "source": [ "def plot_different_rates(events_per_minute, minutes, ns, title=''):\n", " df = pd.DataFrame()\n", " annotations=[]\n", " colors = ['orange', 'green', 'red', 'blue', 'purple', 'brown']\n", " for i, events in enumerate(events_per_minute):\n", " probs = calc_prob(events, minutes, ns)\n", " annotations.append(dict(x=np.argmax(probs)+1, y=np.max(probs)+0.025, \n", " text=f'{int(events * minutes)} MPH