# Tools for analyzing Medium article statistics

The Medium stats Python toolkit is a suite of tools for retrieving, analyzing, predicting, and visualizing your Medium article stats. You can also run it on my Medium statistics, which are located in `data/`.

* Note: running on Mac may first require setting `export OBJC_DISABLE_INITIALIZE_FORK_SAFETY=YES` from the command line [to enable parallel processing](https://stackoverflow.com/questions/50168647/multiprocessing-causes-python-to-crash-and-gives-an-error-may-have-been-in-progr)
* For complete usage, refer to the `Medium Stats Analysis` notebook
* Data retrieval code lives in `retrieval.py`
* Visualization and analysis code is in `visuals.py`
* See also the Medium article ["Medium Analysis in Python"](https://medium.com/@williamkoehrsen/analyzing-medium-story-stats-with-python-24c6491a8ff0)
* Contributions are welcome and appreciated
* For help, contact wjk68@case.edu or twitter.com/@koehrsen_will

## Basic usage

### Use your own Medium statistics

1. Go to the stats page https://medium.com/me/stats
2. Scroll all the way down to the bottom so all the articles are loaded
3. Right click, and hit 'save as'
4. Save the file as `stats.html` in the `data/` directory. You can also save the responses to do a similar analysis.

![](images/stats-saving-medium.gif)

If you don't do this, you can still go to the next step and use the provided data!

## Retrieving Statistics

* Open up a Jupyter Notebook or Python terminal in the `medium/` directory and run

```
# Parse the saved stats page into a dataframe
from retrieval import get_data
df = get_data(fname='stats.html')
```

## Analysis and Visualization

* Interactive plots are not rendered on GitHub. To view the plots with their full capability, use NBviewer ([`Medium Stats Analysis` on NBviewer](https://nbviewer.jupyter.org/github/WillKoehrsen/Data-Analysis/blob/master/medium/Medium%20Stats%20Analysis.ipynb))
* All plots can be opened in the plotly online editor to finish up for publication
* __Histogram__: `make_hist(df, x, category=None)`
* __Cumulative plot__: `make_cum_plot(df, y, category=None, ranges=False)`
* __Scatter plots__: `make_scatter_plot(df, x, y, fits=None, xlog=False, ylog=False, category=None, scale=None, sizeref=2, annotations=None, ranges=False, title_override=None)`
* __Scatter plot with three variables__: pass in `category` or `scale` to `make_scatter_plot`
* __Univariate Linear Regression__: `make_linear_regression(df, x, y, intercept_0)`
* __Univariate polynomial fitting__: `make_poly_fits(df, x, y, degree=6)`
* __Multivariate Linear Regression__: pass in a list of `x` to `make_linear_regression`
* __Future extrapolation__: `make_extrapolation(df, y, years, degree=4)`
* More methods will be coming soon!
* Submit pull requests with your own code, or open issues for suggestions!
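
For readers curious what a univariate fit boils down to, here is a minimal pure-Python sketch of ordinary least squares. It is not the code in `visuals.py`: the function name `fit_line` is hypothetical, the sample numbers are made up for illustration, and it assumes that `intercept_0` in `make_linear_regression` means forcing the fitted line through the origin.

```python
def fit_line(xs, ys, intercept_0=False):
    """Least-squares fit of y = slope * x + intercept (hypothetical helper).

    When intercept_0 is True, the line is forced through the origin
    (an assumption about what the toolkit's `intercept_0` flag means).
    """
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired observations")

    if intercept_0:
        # Through-origin fit: slope = sum(x*y) / sum(x^2)
        sxx = sum(x * x for x in xs)
        if sxx == 0:
            raise ValueError("all x values are zero")
        slope = sum(x * y for x, y in zip(xs, ys)) / sxx
        return slope, 0.0

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Standard OLS: slope = cov(x, y) / var(x)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("all x values are identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


# Made-up example values, e.g. x = minutes of read time, y = some stat
slope, intercept = fit_line([1, 2, 3, 4], [3, 5, 7, 9])
print(f"slope={slope:.2f}, intercept={intercept:.2f}")
```

The toolkit's own functions take a dataframe and column names and return plots, so treat this only as a picture of the arithmetic behind a straight-line fit.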