Project A

Using the data files loaded in notebook 4, or other data that you’ve located as a .csv or other file type, please do the following:

Start a new notebook and create the first cell as a markdown cell. Write your name as a header, followed by a paragraph of text that describes the dataset, where you found it, and what the different columns represent, i.e. what your variables are.

1. Slice the data by column.
2. Slice the data by row using both .loc and .iloc.
3. Use the .groupby method to create a grouped set of data.
4. Create the following visualizations, using the Seaborn Tutorial for help (https://seaborn.pydata.org/tutorial/categorical.html):

- distplot (histplot in Seaborn 0.11 and later, where distplot is deprecated)
- boxplot
- violinplot
- barplot

Write a brief summary of any patterns noticed and differences between categorical distributions.
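As a rough illustration of steps 1 to 3, here is a minimal sketch on a small made-up table. The column names, index labels, and values are hypothetical stand-ins for whatever dataset you choose.

```python
import pandas as pd

# Hypothetical toy data; replace with your own dataset.
df = pd.DataFrame(
    {
        "school": ["A", "A", "B", "B", "C", "C"],
        "grade": [3, 4, 3, 4, 3, 4],
        "score": [662, 670, 655, 681, 640, 690],
    },
    index=["r0", "r1", "r2", "r3", "r4", "r5"],
)

# 1. Slice by column: one name gives a Series, a list of names gives a DataFrame.
scores = df["score"]
subset = df[["school", "score"]]

# 2. Slice by row: .loc uses index labels (end label included),
#    .iloc uses integer positions (end position excluded).
by_label = df.loc["r1":"r3"]
by_position = df.iloc[1:3]

# 3. Group by a categorical column and aggregate each group.
mean_by_grade = df.groupby("grade")["score"].mean()
print(mean_by_grade)
```

Note that the .loc slice returns three rows while the .iloc slice returns two, because label slices include the end label and position slices do not.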

Additional API Examples

In our first project, we will use datasets obtained through web APIs to write a report that includes visualizations and reproducible code, including the data. Our options are the NYC Open Data portal API or the World Bank Climate Data API.

NYC Open Data

Below, we load a dataset from the NYC Open Data site. You can search for other datasets if you would like, or you may use the city’s recent data on mathematics performance in grades 3 - 8. To begin, we load the requests library and enter the API endpoint URL from the site. The response comes back as JSON (JavaScript Object Notation) text, so we use the pandas read_json function to convert it to a DataFrame.

In [1]:

import requests

In [2]:

math = requests.get('https://data.cityofnewyork.us/resource/uqrh-uk4g.json')

In [5]:

math

Out[5]:

<Response [200]>

In [7]:

math.text[:300]

Out[7]:

'[{"dbn":"01M015","demographic":"Asian","grade":"3","mean_scale_score":"s","num_level_1":"s","num_level_2":"s","num_level_3":"s","num_level_3_and_4":"s","num_level_4":"s","number_tested":"3","pct_level_1":"s","pct_level_2":"s","pct_level_3":"s","pct_level_3_and_4":"s","pct_level_4":"s","year":"2006"}'

In [8]:

import pandas as pd

In [10]:

math = pd.read_json(math.text)

In [11]:

math.head()

Out[11]:

	dbn	demographic	grade	mean_scale_score	num_level_1	num_level_2	num_level_3	num_level_3_and_4	num_level_4	number_tested	pct_level_1	pct_level_2	pct_level_3	pct_level_3_and_4	pct_level_4	year
0	01M015	Asian	3	s	s	s	s	s	s	3	s	s	s	s	s	2006
1	01M015	Black	3	662	0	3	9	9	0	12	0	25	75	75	0	2006
2	01M015	Hispanic	3	670	1	8	10	15	5	24	4.2	33.3	41.7	62.5	20.8	2006
3	01M015	Asian	3	s	s	s	s	s	s	3	s	s	s	s	s	2007
4	01M015	Black	3	s	s	s	s	s	s	4	s	s	s	s	s	2007
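Many cells in the table above hold the letter s instead of a number. These appear to be placeholders for withheld values, which is an assumption worth checking against the dataset's documentation. The sketch below shows one way to turn such columns into numbers. The sample string is a shortened reconstruction of the first two records shown above, not a live request, and wrapping it in io.StringIO hands pandas a file-like object instead of a bare string.

```python
import io

import pandas as pd

# Shortened sample reconstructed from the first two rows of the output above.
sample = (
    '[{"dbn":"01M015","demographic":"Asian","grade":"3",'
    '"mean_scale_score":"s","number_tested":"3","year":"2006"},'
    '{"dbn":"01M015","demographic":"Black","grade":"3",'
    '"mean_scale_score":"662","number_tested":"12","year":"2006"}]'
)

# dtype=False keeps every value as text so the conversion step below is explicit.
scores = pd.read_json(io.StringIO(sample), dtype=False)

# Convert the numeric columns; anything that is not a number (such as "s")
# becomes NaN instead of raising an error.
num_cols = ["mean_scale_score", "number_tested"]
scores[num_cols] = scores[num_cols].apply(pd.to_numeric, errors="coerce")

print(scores[["demographic"] + num_cols])
```

Once the placeholders are NaN, methods such as .mean() and .groupby() skip them by default.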

Climate Data

The World Bank has an API that allows access to a large amount of climate data. Here is a snippet from the documentation:

About the Climate Data API

The Climate Data API provides programmatic access to most of the climate data used on the World Bank’s Climate Change Knowledge Portal. Web developers can use this API to access the knowledge portal’s data in real time to support their own applications, so long as they abide by the World Bank’s Terms of Use.

In [12]:

url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'

In [13]:

canada = requests.get(url)

In [18]:

canada

Out[18]:

<Response [200]>

In [19]:

canada.text[:199]

Out[19]:

'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.79092931747436'

In [25]:

df = pd.read_table(canada.text)

---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-25-009399ea74d6> in <module>()
----> 1 df = pd.read_table(canada.text)

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654
--> 655         return _read(filepath_or_buffer, kwds)
    656
    657     parser_f.__name__ = name

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    403
    404     # Create the parser.
--> 405     parser = TextFileReader(filepath_or_buffer, **kwds)
    406
    407     if chunksize or iterator:

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    762             self.options['has_index_names'] = kwds['has_index_names']
    763
--> 764         self._make_engine(self.engine)
    765
    766     def close(self):

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    983     def _make_engine(self, engine='c'):
    984         if engine == 'c':
--> 985             self._engine = CParserWrapper(self.f, **self.options)
    986         else:
    987             if engine == 'python':

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1603         kwds['allow_leading_cols'] = self.index_col is not False
   1604
-> 1605         self._reader = parsers.TextReader(src, **kwds)
   1606
   1607         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:4209)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source (pandas/_libs/parsers.c:8873)()

FileNotFoundError: File b'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.790929317474365\n1909,-8.23930549621582\n1910,-7.774611473083496\n1911,-8.114446640014648\n1912,-7.885402679443359\n1913,-7.987940311431885\n1914,-7.965937614440918\n1915,-7.144039154052734\n1916,-8.132978439331055\n1917,-8.499914169311523\n1918,-8.204662322998047\n1919,-8.035985946655273\n1920,-7.830679893493652\n1921,-7.685777187347412\n1922,-8.334989547729492\n1923,-8.022125244140625\n1924,-7.574568271636963\n1925,-7.951625823974609\n1926,-7.792789459228516\n1927,-7.961727142333984\n1928,-7.237975120544434\n1929,-8.123651504516602\n1930,-7.302305698394775\n1931,-6.646339416503906\n1932,-7.821688652038574\n1933,-8.693134307861328\n1934,-7.964327335357666\n1935,-8.166967391967773\n1936,-8.59422492980957\n1937,-7.3392534255981445\n1938,-6.856348991394043\n1939,-7.669107913970947\n1940,-6.799381256103516\n1941,-7.233104705810547\n1942,-7.097812652587891\n1943,-7.2231879234313965\n1944,-6.603946685791016\n1945,-7.646345615386963\n1946,-7.739509582519531\n1947,-7.161524295806885\n1948,-7.659969329833984\n1949,-7.696352958679199\n1950,-8.517829895019531\n1951,-7.903929710388184\n1952,-6.680769920349121\n1953,-6.7539520263671875\n1954,-7.334064483642578\n1955,-7.586000919342041\n1956,-8.27430534362793\n1957,-7.763300895690918\n1958,-6.903257846832275\n1959,-7.8713836669921875\n1960,-6.951033115386963\n1961,-7.946412086486816\n1962,-7.465360164642334\n1963,-7.363328456878662\n1964,-8.241130828857422\n1965,-8.078269958496094\n1966,-7.83267879486084\n1967,-7.973592281341553\n1968,-7.3681206703186035\n1969,-7.0392913818359375\n1970,-7.720573425292969\n1971,-7.469780921936035\n1972,-9.525187492370605\n1973,-6.853341579437256\n1974,-8.428787231445312\n1975,-7.621856689453125\n1976,-7.588895320892334\n1977,-6.557257652282715\n1978,-7.993335723876953\n1979,-7.8457
17430114746\n1980,-7.049171447753906\n1981,-5.506665229797363\n1982,-8.5137939453125\n1983,-7.463536262512207\n1984,-7.414198398590088\n1985,-7.432916164398193\n1986,-7.71035099029541\n1987,-6.4537835121154785\n1988,-6.610030174255371\n1989,-7.706485748291016\n1990,-7.6779985427856445\n1991,-7.095147132873535\n1992,-7.697887420654297\n1993,-6.986419677734375\n1994,-6.888780117034912\n1995,-6.850322723388672\n1996,-7.337457180023193\n1997,-6.88342809677124\n1998,-5.186192989349365\n1999,-5.975519180297852\n2000,-6.7265448570251465\n2001,-5.930727958679199\n2002,-6.852164268493652\n2003,-6.402592658996582\n2004,-7.529717445373535\n2005,-5.863758563995361\n2006,-5.543209552764893\n2007,-6.819293975830078\n2008,-7.2008957862854\n2009,-6.997011661529541\n2010,-4.703649520874023\n2011,-5.9335737228393555\n2012,-5.714600563049316\n' does not exist
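The FileNotFoundError arises because pd.read_table treats a plain string as a file path, not as file contents. One way around this, sketched here on the assumption that the response body is ordinary comma-separated text, is to wrap the text in io.StringIO so that pandas sees a file-like object. The csv_text value is a short excerpt copied from the output above and stands in for canada.text.

```python
import io

import pandas as pd

# Short excerpt of the response body shown above (stands in for canada.text).
csv_text = (
    "year,data\n"
    "1901,-7.67241907119751\n"
    "1902,-7.862711429595947\n"
    "1903,-7.910782814025879\n"
)

# StringIO turns the string into an in-memory file that read_csv can parse.
df = pd.read_csv(io.StringIO(csv_text))
print(df.head())
```

With the live response, the same call would read pd.read_csv(io.StringIO(canada.text)).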

In [22]:

df.head()

Out[22]:

	year	data
0	1901	-7.672419
1	1902	-7.862711
2	1903	-7.910783
3	1904	-8.155729
4	1905	-7.547311

In [26]:

frame = pd.DataFrame(canada.text)

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-9d5b746d4789> in <module>()
----> 1 frame = pd.DataFrame(canada.text)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    352                                          copy=False)
    353             else:
--> 354                 raise ValueError('DataFrame constructor not properly called!')
    355
    356         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!
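The DataFrame constructor expects structured data, such as a dict of columns or a list of records, rather than one long string, which is why the call above raises a ValueError. As a rough sketch, the text can first be split into records with the standard csv module and then passed to the constructor. The excerpt and variable names here are illustrative.

```python
import csv
import io

import pandas as pd

# Short excerpt of the response body shown above (stands in for canada.text).
csv_text = "year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n"

# DictReader yields one dict per data row, keyed by the header line.
records = list(csv.DictReader(io.StringIO(csv_text)))

# The csv module returns every field as text, so convert the columns to numbers.
frame = pd.DataFrame(records).apply(pd.to_numeric)
print(frame)
```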

In [29]:

canada.text

Out[29]:

'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.790929317474365\n1909,-8.23930549621582\n1910,-7.774611473083496\n1911,-8.114446640014648\n1912,-7.885402679443359\n1913,-7.987940311431885\n1914,-7.965937614440918\n1915,-7.144039154052734\n1916,-8.132978439331055\n1917,-8.499914169311523\n1918,-8.204662322998047\n1919,-8.035985946655273\n1920,-7.830679893493652\n1921,-7.685777187347412\n1922,-8.334989547729492\n1923,-8.022125244140625\n1924,-7.574568271636963\n1925,-7.951625823974609\n1926,-7.792789459228516\n1927,-7.961727142333984\n1928,-7.237975120544434\n1929,-8.123651504516602\n1930,-7.302305698394775\n1931,-6.646339416503906\n1932,-7.821688652038574\n1933,-8.693134307861328\n1934,-7.964327335357666\n1935,-8.166967391967773\n1936,-8.59422492980957\n1937,-7.3392534255981445\n1938,-6.856348991394043\n1939,-7.669107913970947\n1940,-6.799381256103516\n1941,-7.233104705810547\n1942,-7.097812652587891\n1943,-7.2231879234313965\n1944,-6.603946685791016\n1945,-7.646345615386963\n1946,-7.739509582519531\n1947,-7.161524295806885\n1948,-7.659969329833984\n1949,-7.696352958679199\n1950,-8.517829895019531\n1951,-7.903929710388184\n1952,-6.680769920349121\n1953,-6.7539520263671875\n1954,-7.334064483642578\n1955,-7.586000919342041\n1956,-8.27430534362793\n1957,-7.763300895690918\n1958,-6.903257846832275\n1959,-7.8713836669921875\n1960,-6.951033115386963\n1961,-7.946412086486816\n1962,-7.465360164642334\n1963,-7.363328456878662\n1964,-8.241130828857422\n1965,-8.078269958496094\n1966,-7.83267879486084\n1967,-7.973592281341553\n1968,-7.3681206703186035\n1969,-7.0392913818359375\n1970,-7.720573425292969\n1971,-7.469780921936035\n1972,-9.525187492370605\n1973,-6.853341579437256\n1974,-8.428787231445312\n1975,-7.621856689453125\n1976,-7.588895320892334\n1977,-6.557257652282715\n1978,-7.993335723876953\n1979,-7.845717430114746\n1980,-7.0491
71447753906\n1981,-5.506665229797363\n1982,-8.5137939453125\n1983,-7.463536262512207\n1984,-7.414198398590088\n1985,-7.432916164398193\n1986,-7.71035099029541\n1987,-6.4537835121154785\n1988,-6.610030174255371\n1989,-7.706485748291016\n1990,-7.6779985427856445\n1991,-7.095147132873535\n1992,-7.697887420654297\n1993,-6.986419677734375\n1994,-6.888780117034912\n1995,-6.850322723388672\n1996,-7.337457180023193\n1997,-6.88342809677124\n1998,-5.186192989349365\n1999,-5.975519180297852\n2000,-6.7265448570251465\n2001,-5.930727958679199\n2002,-6.852164268493652\n2003,-6.402592658996582\n2004,-7.529717445373535\n2005,-5.863758563995361\n2006,-5.543209552764893\n2007,-6.819293975830078\n2008,-7.2008957862854\n2009,-6.997011661529541\n2010,-4.703649520874023\n2011,-5.9335737228393555\n2012,-5.714600563049316\n'

Using the Documentation

Loading the raw response text directly did not work, as the errors above show. Luckily, the climate data is also available through the wbdata package. Use its documentation at http://wbdata.readthedocs.io/en/latest/ to pull and analyze data related to climate indicators, or another indicator of your choice.