{ "cells": [ { "cell_type": "markdown", "metadata": {}, "source": [ "# Tutorial: Part 3 - Loading and merging data\n", "\n", "![](img/earth.jpg)\n", "\n", "## GeoNames dataset\n", "\n", "Dataset source: http://download.geonames.org/export/dump/\n", "\n", "Features:\n", "- **geonameid:** integer id of record in geonames database\n", "- **name:** name of geographical point (utf8) varchar(200)\n", "- **asciiname:** name of geographical point in plain ascii characters, varchar(200)\n", "- **alternatenames:** alternatenames, comma separated, ascii names automatically transliterated, convenience attribute from alternatename table, varchar(10000)\n", "- **latitude:** latitude in decimal degrees (wgs84)\n", "- **longitude:** longitude in decimal degrees (wgs84)\n", "- **feature class:** see http://www.geonames.org/export/codes.html, char(1)\n", "- **feature code:** see http://www.geonames.org/export/codes.html, varchar(10)\n", "- **country code:** ISO-3166 2-letter country code, 2 characters\n", "- **cc2:** alternate country codes, comma separated, ISO-3166 2-letter country code, 200 characters\n", "- **admin1 code:** fipscode (subject to change to iso code), see exceptions below, see file admin1Codes.txt for display names of this code; varchar(20)\n", "- **admin2 code:** code for the second administrative division, a county in the US, see file admin2Codes.txt; varchar(80) \n", "- **admin3 code:** code for third level administrative division, varchar(20)\n", "- **admin4 code:** code for fourth level administrative division, varchar(20)\n", "- **population:** bigint (8 byte int) \n", "- **elevation:** in meters, integer\n", "- **dem:** digital elevation model, srtm3 or gtopo30, average elevation of 3''x3'' (ca 90mx90m) or 30''x30'' (ca 900mx900m) area in meters, integer. srtm processed by cgiar/ciat.\n", "- **timezone:** the iana timezone id (see file timeZone.txt) varchar(40)\n", "- **modification date:** date of last modification in yyyy-MM-dd format" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "import pandas\n", "\n", "cities = pandas.read_csv('data/cities1000.zip',\n", " sep='\\t',\n", " names=['geonameid', 'name', 'asciiname', 'alternatenames',\n", " 'latitude', 'longitude', 'feature class',\n", " 'feature code', 'country code', 'cc2',\n", " 'admin1 code', 'admin2 code', 'admin3 code', 'admin4 code',\n", " 'population', 'elevation', 'dem', 'timezone', 'modification date'],\n", " index_col='geonameid',\n", " parse_dates=['modification date'],\n", " low_memory=False)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "
\n", "

EXERCICE: Explore the dataset

\n", "

Tasks:\n", "

\n", "

\n", "

Hints:\n", "

\n", "

\n", "
" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cities.info()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cities.nlargest(n=5, columns='population')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "cities.nlargest(n=20, columns='elevation')" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "https://github.com/mledoze/countries" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "countries = pandas.read_json('data/countries.json.gz')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "countries.head()" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [ "pandas.read_table('http://download.geonames.org/export/dump/featureCodes_en.txt',\n", " names=('code', 'name', 'description'), index_col='code')" ] }, { "cell_type": "code", "execution_count": null, "metadata": {}, "outputs": [], "source": [] } ], "metadata": { "kernelspec": { "display_name": "Python 3", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.0" } }, "nbformat": 4, "nbformat_minor": 2 }