Project A

Using the data files loaded in notebook 4, or other data that you have located as a .csv or similar file, please do the following:

Start a new notebook and create the first cell as a markdown cell. Write your name as a header, followed by a paragraph of text that describes the dataset, where you found it, and what the different columns represent, i.e. what your variables are.

1. Slice the data by column
2. Slice the data by row using both .loc and .iloc
3. Use the .groupby method to create a grouped set of data
4. Create the following visualizations, using the Seaborn Tutorial for help (https://seaborn.pydata.org/tutorial/categorical.html)
- distplot
- boxplot
- violinplot
- barplot

Write a brief summary of any patterns noticed and differences between categorical distributions.
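
As a rough illustration of steps 1 to 3 above, here is a minimal sketch on a small made-up table. The column names school, grade, and score and all of the values are hypothetical stand-ins for whatever your own dataset contains.

```python
import pandas as pd

# Hypothetical toy data standing in for your own dataset.
df = pd.DataFrame(
    {
        "school": ["A", "A", "B", "B", "C", "C"],
        "grade": [3, 4, 3, 4, 3, 4],
        "score": [662.0, 670.0, 655.0, 681.0, 690.0, 675.0],
    }
)

# 1. Slice by column: one name gives a Series, a list of names gives a DataFrame.
scores = df["score"]
subset = df[["school", "score"]]

# 2. Slice by row.
first_three = df.iloc[0:3]   # by position; the end position is excluded
by_label = df.loc[0:3]       # by index label; the end label is included
grade_three = df.loc[df["grade"] == 3, ["school", "score"]]  # boolean mask plus columns

# 3. Group the rows by a categorical column and summarize each group.
mean_by_grade = df.groupby("grade")["score"].mean()
print(mean_by_grade)
```

Note that .iloc[0:3] returns three rows while .loc[0:3] returns four, because label-based slices include the end label.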

Additional API Examples

In our first project, we will use datasets obtained through web APIs to write a report that includes visualizations and reproducible code, together with the data. Our options are the NYC Open Data portal API and the World Bank Climate Data API.

NYC Open Data

Below, we load a dataset from the NYC Open Data site. You can search for other datasets if you would like, or you may use the city’s recent data on mathematics performance in grades 3 - 8. To begin, we import the requests library and request the API Endpoint url given on the site. The response comes back as JSON (JavaScript Object Notation) text, so we use the Pandas read_json function to turn it into a DataFrame.

In [1]:
import requests
In [2]:
math = requests.get('https://data.cityofnewyork.us/resource/uqrh-uk4g.json')
In [5]:
math
Out[5]:
<Response [200]>
In [7]:
math.text[:300]
Out[7]:
'[{"dbn":"01M015","demographic":"Asian","grade":"3","mean_scale_score":"s","num_level_1":"s","num_level_2":"s","num_level_3":"s","num_level_3_and_4":"s","num_level_4":"s","number_tested":"3","pct_level_1":"s","pct_level_2":"s","pct_level_3":"s","pct_level_3_and_4":"s","pct_level_4":"s","year":"2006"}'
In [8]:
import pandas as pd
In [10]:
math = pd.read_json(math.text)
In [11]:
math.head()
Out[11]:
dbn demographic grade mean_scale_score num_level_1 num_level_2 num_level_3 num_level_3_and_4 num_level_4 number_tested pct_level_1 pct_level_2 pct_level_3 pct_level_3_and_4 pct_level_4 year
0 01M015 Asian 3 s s s s s s 3 s s s s s 2006
1 01M015 Black 3 662 0 3 9 9 0 12 0 25 75 75 0 2006
2 01M015 Hispanic 3 670 1 8 10 15 5 24 4.2 33.3 41.7 62.5 20.8 2006
3 01M015 Asian 3 s s s s s s 3 s s s s s 2007
4 01M015 Black 3 s s s s s s 4 s s s s s 2007
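
Many cells in the output above hold the letter s rather than a number, so those columns are read as text. The sketch below shows one possible way to make them numeric, on the assumption that s marks a withheld value and can be treated as missing. To stay offline it parses a short sample string (two records trimmed to a few of the fields shown above) with the standard json module instead of calling the API; the names raw, records, and numeric_cols are illustrative only.

```python
import json

import pandas as pd

# Two sample records, trimmed to a few of the fields shown in the output above.
raw = (
    '[{"dbn":"01M015","demographic":"Asian","grade":"3",'
    '"mean_scale_score":"s","number_tested":"3","year":"2006"},'
    '{"dbn":"01M015","demographic":"Black","grade":"3",'
    '"mean_scale_score":"662","number_tested":"12","year":"2006"}]'
)

# Parse the JSON text into a list of dictionaries, then build a DataFrame.
records = json.loads(raw)
scores = pd.DataFrame(records)

# Convert the numeric-looking columns; anything unparseable (such as "s")
# becomes NaN. Treating "s" as missing is an assumption about the data.
numeric_cols = ["grade", "mean_scale_score", "number_tested", "year"]
for col in numeric_cols:
    scores[col] = pd.to_numeric(scores[col], errors="coerce")

print(scores.dtypes)
```

With a live response, the same list of dictionaries should be available from the json() method of the requests response object, so the DataFrame can be built from that directly.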

Climate Data

The World Bank has an API that allows access to a large amount of climate data. Here is a snippet from the documentation:

About the Climate Data API
The Climate Data API provides programmatic access to most of the climate data used on the World Bank’s Climate Change Knowledge Portal. Web developers can use this API to access the knowledge portal’s data in real time to support their own applications, so long as they abide by the World Bank’s Terms of Use.
In [12]:
url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'
In [13]:
canada = requests.get(url)
In [18]:
canada
Out[18]:
<Response [200]>
In [19]:
canada.text[:199]
Out[19]:
'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.79092931747436'
In [25]:
df = pd.read_table(canada.text)
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-25-009399ea74d6> in <module>()
----> 1 df = pd.read_table(canada.text)

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
    653                     skip_blank_lines=skip_blank_lines)
    654
--> 655         return _read(filepath_or_buffer, kwds)
    656
    657     parser_f.__name__ = name

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
    403
    404     # Create the parser.
--> 405     parser = TextFileReader(filepath_or_buffer, **kwds)
    406
    407     if chunksize or iterator:

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
    762             self.options['has_index_names'] = kwds['has_index_names']
    763
--> 764         self._make_engine(self.engine)
    765
    766     def close(self):

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
    983     def _make_engine(self, engine='c'):
    984         if engine == 'c':
--> 985             self._engine = CParserWrapper(self.f, **self.options)
    986         else:
    987             if engine == 'python':

~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
   1603         kwds['allow_leading_cols'] = self.index_col is not False
   1604
-> 1605         self._reader = parsers.TextReader(src, **kwds)
   1606
   1607         # XXX

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:4209)()

pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source (pandas/_libs/parsers.c:8873)()

FileNotFoundError: File b'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.790929317474365\n1909,-8.23930549621582\n1910,-7.774611473083496\n1911,-8.114446640014648\n1912,-7.885402679443359\n1913,-7.987940311431885\n1914,-7.965937614440918\n1915,-7.144039154052734\n1916,-8.132978439331055\n1917,-8.499914169311523\n1918,-8.204662322998047\n1919,-8.035985946655273\n1920,-7.830679893493652\n1921,-7.685777187347412\n1922,-8.334989547729492\n1923,-8.022125244140625\n1924,-7.574568271636963\n1925,-7.951625823974609\n1926,-7.792789459228516\n1927,-7.961727142333984\n1928,-7.237975120544434\n1929,-8.123651504516602\n1930,-7.302305698394775\n1931,-6.646339416503906\n1932,-7.821688652038574\n1933,-8.693134307861328\n1934,-7.964327335357666\n1935,-8.166967391967773\n1936,-8.59422492980957\n1937,-7.3392534255981445\n1938,-6.856348991394043\n1939,-7.669107913970947\n1940,-6.799381256103516\n1941,-7.233104705810547\n1942,-7.097812652587891\n1943,-7.2231879234313965\n1944,-6.603946685791016\n1945,-7.646345615386963\n1946,-7.739509582519531\n1947,-7.161524295806885\n1948,-7.659969329833984\n1949,-7.696352958679199\n1950,-8.517829895019531\n1951,-7.903929710388184\n1952,-6.680769920349121\n1953,-6.7539520263671875\n1954,-7.334064483642578\n1955,-7.586000919342041\n1956,-8.27430534362793\n1957,-7.763300895690918\n1958,-6.903257846832275\n1959,-7.8713836669921875\n1960,-6.951033115386963\n1961,-7.946412086486816\n1962,-7.465360164642334\n1963,-7.363328456878662\n1964,-8.241130828857422\n1965,-8.078269958496094\n1966,-7.83267879486084\n1967,-7.973592281341553\n1968,-7.3681206703186035\n1969,-7.0392913818359375\n1970,-7.720573425292969\n1971,-7.469780921936035\n1972,-9.525187492370605\n1973,-6.853341579437256\n1974,-8.428787231445312\n1975,-7.621856689453125\n1976,-7.588895320892334\n1977,-6.557257652282715\n1978,-7.993335723876953\n1979,-7.8457
17430114746\n1980,-7.049171447753906\n1981,-5.506665229797363\n1982,-8.5137939453125\n1983,-7.463536262512207\n1984,-7.414198398590088\n1985,-7.432916164398193\n1986,-7.71035099029541\n1987,-6.4537835121154785\n1988,-6.610030174255371\n1989,-7.706485748291016\n1990,-7.6779985427856445\n1991,-7.095147132873535\n1992,-7.697887420654297\n1993,-6.986419677734375\n1994,-6.888780117034912\n1995,-6.850322723388672\n1996,-7.337457180023193\n1997,-6.88342809677124\n1998,-5.186192989349365\n1999,-5.975519180297852\n2000,-6.7265448570251465\n2001,-5.930727958679199\n2002,-6.852164268493652\n2003,-6.402592658996582\n2004,-7.529717445373535\n2005,-5.863758563995361\n2006,-5.543209552764893\n2007,-6.819293975830078\n2008,-7.2008957862854\n2009,-6.997011661529541\n2010,-4.703649520874023\n2011,-5.9335737228393555\n2012,-5.714600563049316\n' does not exist
In [22]:
df.head()
Out[22]:
year data
0 1901 -7.672419
1 1902 -7.862711
2 1903 -7.910783
3 1904 -8.155729
4 1905 -7.547311
In [26]:
frame = pd.DataFrame(canada.text)
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-26-9d5b746d4789> in <module>()
----> 1 frame = pd.DataFrame(canada.text)

~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
    352                                          copy=False)
    353             else:
--> 354                 raise ValueError('DataFrame constructor not properly called!')
    355
    356         NDFrame.__init__(self, mgr, fastpath=True)

ValueError: DataFrame constructor not properly called!
In [29]:
canada.text
Out[29]:
'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.790929317474365\n1909,-8.23930549621582\n1910,-7.774611473083496\n1911,-8.114446640014648\n1912,-7.885402679443359\n1913,-7.987940311431885\n1914,-7.965937614440918\n1915,-7.144039154052734\n1916,-8.132978439331055\n1917,-8.499914169311523\n1918,-8.204662322998047\n1919,-8.035985946655273\n1920,-7.830679893493652\n1921,-7.685777187347412\n1922,-8.334989547729492\n1923,-8.022125244140625\n1924,-7.574568271636963\n1925,-7.951625823974609\n1926,-7.792789459228516\n1927,-7.961727142333984\n1928,-7.237975120544434\n1929,-8.123651504516602\n1930,-7.302305698394775\n1931,-6.646339416503906\n1932,-7.821688652038574\n1933,-8.693134307861328\n1934,-7.964327335357666\n1935,-8.166967391967773\n1936,-8.59422492980957\n1937,-7.3392534255981445\n1938,-6.856348991394043\n1939,-7.669107913970947\n1940,-6.799381256103516\n1941,-7.233104705810547\n1942,-7.097812652587891\n1943,-7.2231879234313965\n1944,-6.603946685791016\n1945,-7.646345615386963\n1946,-7.739509582519531\n1947,-7.161524295806885\n1948,-7.659969329833984\n1949,-7.696352958679199\n1950,-8.517829895019531\n1951,-7.903929710388184\n1952,-6.680769920349121\n1953,-6.7539520263671875\n1954,-7.334064483642578\n1955,-7.586000919342041\n1956,-8.27430534362793\n1957,-7.763300895690918\n1958,-6.903257846832275\n1959,-7.8713836669921875\n1960,-6.951033115386963\n1961,-7.946412086486816\n1962,-7.465360164642334\n1963,-7.363328456878662\n1964,-8.241130828857422\n1965,-8.078269958496094\n1966,-7.83267879486084\n1967,-7.973592281341553\n1968,-7.3681206703186035\n1969,-7.0392913818359375\n1970,-7.720573425292969\n1971,-7.469780921936035\n1972,-9.525187492370605\n1973,-6.853341579437256\n1974,-8.428787231445312\n1975,-7.621856689453125\n1976,-7.588895320892334\n1977,-6.557257652282715\n1978,-7.993335723876953\n1979,-7.845717430114746\n1980,-7.0491
71447753906\n1981,-5.506665229797363\n1982,-8.5137939453125\n1983,-7.463536262512207\n1984,-7.414198398590088\n1985,-7.432916164398193\n1986,-7.71035099029541\n1987,-6.4537835121154785\n1988,-6.610030174255371\n1989,-7.706485748291016\n1990,-7.6779985427856445\n1991,-7.095147132873535\n1992,-7.697887420654297\n1993,-6.986419677734375\n1994,-6.888780117034912\n1995,-6.850322723388672\n1996,-7.337457180023193\n1997,-6.88342809677124\n1998,-5.186192989349365\n1999,-5.975519180297852\n2000,-6.7265448570251465\n2001,-5.930727958679199\n2002,-6.852164268493652\n2003,-6.402592658996582\n2004,-7.529717445373535\n2005,-5.863758563995361\n2006,-5.543209552764893\n2007,-6.819293975830078\n2008,-7.2008957862854\n2009,-6.997011661529541\n2010,-4.703649520874023\n2011,-5.9335737228393555\n2012,-5.714600563049316\n'
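
Both errors above have a similar cause. read_table treats a plain string as a file path, which is why the whole response text shows up in the FileNotFoundError message, and the DataFrame constructor does not parse raw text at all. One way around this, sketched here on the first three rows of the response rather than a live request, is to wrap the text in an in-memory file object with io.StringIO and pass that to read_csv. The variable name text is a stand-in for canada.text.

```python
import io

import pandas as pd

# First three rows of the response text shown above, standing in for canada.text.
text = (
    "year,data\n"
    "1901,-7.67241907119751\n"
    "1902,-7.862711429595947\n"
    "1903,-7.910782814025879\n"
)

# StringIO makes the string behave like an open file, so read_csv parses
# its contents instead of looking for a file with that name.
df = pd.read_csv(io.StringIO(text))
print(df.head())
```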

Using the Documentation

Working with the raw response text is less straightforward than it first appears. Luckily, the climate data is also available as part of the wbdata package. Use its documentation at http://wbdata.readthedocs.io/en/latest/ to pull and analyze data related to climate indicators, or to another indicator of your choice.