{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Part 1 - Basic operations\n", "\n", "![](img/titanic.jpg)\n", "\n", "## Titanic dataset\n", "\n", "Dataset source: https://www.kaggle.com/c/titanic/data\n", "\n", "Features:\n", "- **PassengerId:** Unique ID of each passenger.\n", "- **Survived:** Whether the passenger survived (0 = no, 1 = yes).\n", "- **Pclass:** Passenger class: 1 (first), 2 (second) or 3 (third).\n", "- **Name:** Name of the passenger.\n", "- **Sex:** Gender of the passenger.\n", "- **Age:** Age of the passenger in years.\n", "- **SibSp:** Number of siblings and spouses aboard.\n", "- **Parch:** Number of parents and children aboard.\n", "- **Ticket:** Ticket number.\n", "- **Fare:** Passenger fare.\n", "- **Cabin:** Cabin number.\n", "- **Embarked:** Port of embarkation (C = Cherbourg, Q = Queenstown, S = Southampton).\n", "- **Initial:** Initial name of the passenger." ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Loading data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "\n", "titanic = pandas.read_csv('data/titanic.csv.gz')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Exploring the data" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "Sample of the data" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.describe()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%matplotlib inline\n", "\n", "titanic[['Age', 'Fare']].boxplot();" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Passenger gender" ] }, { "cell_type": "code", "execution_count": null, 
"metadata": {}, "outputs": [], "source": [ "titanic['Sex'].head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic['Sex'].value_counts()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic['Sex'].value_counts(normalize=True)" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "from matplotlib import pyplot\n", "\n", "titanic['Sex'].value_counts().plot(kind='bar')\n", "\n", "pyplot.title('Number of Titanic passengers by gender')\n", "pyplot.ylabel('Number of passengers');" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic['Sex'].replace({'male': 'M', 'female': 'F'}).value_counts().plot(kind='bar')\n", "\n", "pyplot.title('Number of Titanic passengers by gender')\n", "pyplot.ylabel('Number of passengers')\n", "pyplot.xticks(rotation=0);" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "EXERCISE 1: Checking where the passengers embarked\n", "\n", "Tasks:\n", "\n", "Hints:\n", "\n", "
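Here is one possible starting point, sketched on a few made-up rows rather than on the real `titanic` DataFrame. The port names come from the Kaggle data dictionary. The `sample` DataFrame and the `ports` mapping are illustrative assumptions, not part of the exercise.

```python
import pandas

# Hypothetical toy data standing in for the Embarked column; the values are made up.
sample = pandas.DataFrame({'Embarked': ['S', 'C', 'S', 'Q', 'S', None]})

# Port codes as listed in the Kaggle data dictionary (assumed mapping).
ports = {'C': 'Cherbourg', 'Q': 'Queenstown', 'S': 'Southampton'}

# Replace the one-letter codes by port names, then count passengers per port.
# value_counts() skips missing values by default.
counts = sample['Embarked'].replace(ports).value_counts()
print(counts)
```

On the real data the same chain would apply to `titanic['Embarked']`. Passing `normalize=True` to `value_counts`, as done above for `Sex`, returns proportions instead of counts.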
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load solutions/titanic_1.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Passenger classes" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic.pivot_table(values='PassengerId', index='Pclass', columns='Survived', aggfunc='count')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "(titanic.assign(Survived=titanic['Survived'].replace({0: 'No', 1: 'Yes'}))\n", " .pivot_table(values='PassengerId', index='Pclass', columns='Survived', aggfunc='count')\n", " .loc[:, ['Yes', 'No']])" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "EXERCISE 2: Checking survival by sex and class\n", "\n", "Tasks:\n", "\n", "Hints:\n", "\n", "
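This minimal sketch, built on hypothetical toy rows, shows one way to compare survival across sex and class at the same time. Because `Survived` is coded 0/1, averaging it within each group of a pivot table gives a survival rate. The `count` aggregation used above gives head counts instead.

```python
import pandas

# Hypothetical toy rows standing in for the Sex, Pclass and Survived columns.
sample = pandas.DataFrame({
    'Sex': ['female', 'female', 'male', 'male', 'male', 'female'],
    'Pclass': [1, 3, 1, 1, 3, 1],
    'Survived': [1, 0, 1, 0, 0, 1],
})

# Survived is 0/1, so its mean inside each (Sex, Pclass) group is the survival rate.
rates = sample.pivot_table(values='Survived', index='Sex', columns='Pclass', aggfunc='mean')
print(rates)
```

The numbers in this toy example carry no meaning. Run on `titanic`, the same call should give one row per sex and one column per class.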
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load solutions/titanic_2.py" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "### Passenger names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic['Name'].head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "name = 'Futrelle, Mrs. Jacques Heath (Lily May Peel)'\n", "name" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "splitted_names = name.split(',')\n", "splitted_names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "reversed_splitted_names = splitted_names[::-1]\n", "reversed_splitted_names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "joined_names = ' '.join(reversed_splitted_names)\n", "joined_names" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "joined_names.strip()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "full_names = (titanic['Name'].str.split(',')\n", " .str[::-1]\n", " .str.join(' ')\n", " .str.strip())\n", "full_names.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "full_names.str.startswith('Rev.').head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "full_names[full_names.str.startswith('Rev.')]" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "titanic[full_names.str.startswith('Rev.')]" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "EXERCISE 3: Who was the Titanic Captain?\n", "\n", "Tasks:\n", "\n", "Hints:\n", "\n", "
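This minimal sketch reuses the surname-reversal chain from the Passenger names section on three invented names; the `names` Series is hypothetical. After the reversal, a title such as `Capt.` sits at the start of each full name, so `str.startswith` can pick it out.

```python
import pandas

# Hypothetical names in the dataset's 'Surname, Title Given names' format.
names = pandas.Series(['Doe, Mr. John', 'Roe, Capt. Richard', 'Poe, Mrs. Anna (Anna Smith)'])

# Move the surname to the end, as in the Passenger names section.
full_names = (names.str.split(',')
              .str[::-1]
              .str.join(' ')
              .str.strip())

# Keep only the names whose title is Capt.
captains = full_names[full_names.str.startswith('Capt.')]
print(captains)
```

On the real dataset, a title match only shows who was listed with the title Capt. It does not by itself prove that the person commanded the ship.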
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "%load solutions/titanic_3.py" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }