{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: analysis of pandas docstring errors\n", "\n", "In this tutorial we will perform an exploratory data analysis (EDA) of\n", "the pandas docstring errors.\n", "\n", "We will use two source files obtained from previous tutorials:\n", "\n", "- `docstring_errors_pandas023.hd5`\n", "- `pandas_page_views_2018.parquet`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import os\n", "import pandas\n", "\n", "DOCSTRING_ERRORS_FNAME = os.path.join('data', 'docstring_errors_pandas023.hd5')\n", "PAGE_VIEWS_FNAME = os.path.join('data', 'pandas_page_views_2018.parquet')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Join the two sources of data\n", "\n", "- Load the data from each source\n", "- Transform the \"primary key\" of each source so that the two match\n", "- Join both sources into a single `DataFrame`" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Use pandas to answer your questions about the data\n", "\n", "- Discuss which questions you want answered\n", "- Use pandas to answer them\n", "- Is more cleaning needed?\n", "- Do you need to reshape the data?\n", "- Do you need to group it (`groupby`)?\n", "- Can pandas plotting help you explore the results?"
] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Solutions" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load solutions/page_views_eda.py" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.3" } }, "nbformat": 4, "nbformat_minor": 2 }