Project A¶
Using the data files loaded in notebook 4, or other data that you’ve located using a .csv or other, please do the following:
.loc
and .iloc
3. Use the
.groupby
method to create a grouped set of data 4. Create the
following visualizations use the Seaborn Tutorial for help
(https://seaborn.pydata.org/tutorial/categorical.html)distplot
- boxplot
- violin_plot
- barplot
Write a brief summary of any patterns noticed and differences between categorical distributions.
Additional API Examples¶
In our first project, we will use datasets obtained through web API’s to
write a nice report that includes visualizations, and reproducible code
including data. Our options involve using the NYCOpenData
portal API
or the World Bank Climate Data API.
NYC Open Data¶
Below, we load a dataset from the NYC Open Data site. You can search for
other datasets if you would like, or you may use the city’s recent data
on mathematics performance in grades 3 - 8. To begin, we load the
requests
library, and enter the API Endpoint url from the site. This
comes as a JSON or javascript file, so we need to use the read_json
method to change this to a Pandas DataFrame.
In [1]:
import requests
In [2]:
math = requests.get('https://data.cityofnewyork.us/resource/uqrh-uk4g.json')
In [5]:
math
Out[5]:
<Response [200]>
In [7]:
math.text[:300]
Out[7]:
'[{"dbn":"01M015","demographic":"Asian","grade":"3","mean_scale_score":"s","num_level_1":"s","num_level_2":"s","num_level_3":"s","num_level_3_and_4":"s","num_level_4":"s","number_tested":"3","pct_level_1":"s","pct_level_2":"s","pct_level_3":"s","pct_level_3_and_4":"s","pct_level_4":"s","year":"2006"}'
In [8]:
import pandas as pd
In [10]:
math = pd.read_json(math.text)
In [11]:
math.head()
Out[11]:
dbn | demographic | grade | mean_scale_score | num_level_1 | num_level_2 | num_level_3 | num_level_3_and_4 | num_level_4 | number_tested | pct_level_1 | pct_level_2 | pct_level_3 | pct_level_3_and_4 | pct_level_4 | year | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
0 | 01M015 | Asian | 3 | s | s | s | s | s | s | 3 | s | s | s | s | s | 2006 |
1 | 01M015 | Black | 3 | 662 | 0 | 3 | 9 | 9 | 0 | 12 | 0 | 25 | 75 | 75 | 0 | 2006 |
2 | 01M015 | Hispanic | 3 | 670 | 1 | 8 | 10 | 15 | 5 | 24 | 4.2 | 33.3 | 41.7 | 62.5 | 20.8 | 2006 |
3 | 01M015 | Asian | 3 | s | s | s | s | s | s | 3 | s | s | s | s | s | 2007 |
4 | 01M015 | Black | 3 | s | s | s | s | s | s | 4 | s | s | s | s | s | 2007 |
Climate Data¶
The World Bank has an API that allows access to a large amount of climate data. Here is a snippet from the documentation:
About the Climate Data API
The Climate Data API provides programmatic access to most of the climate data used on the World Bank’s Climate Change Knowledge Portal. Web developers can use this API to access the knowledge portal’s data in real time to support their own applications, so long as they abide by the World Bank’s Terms of Use.
In [12]:
url = 'http://climatedataapi.worldbank.org/climateweb/rest/v1/country/cru/tas/year/CAN.csv'
In [13]:
canada = requests.get(url)
In [18]:
canada
Out[18]:
<Response [200]>
In [19]:
canada.text[:199]
Out[19]:
'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.79092931747436'
In [25]:
df = pd.read_(canada.text)
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
<ipython-input-25-009399ea74d6> in <module>()
----> 1 df = pd.read_table(canada.text)
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in parser_f(filepath_or_buffer, sep, delimiter, header, names, index_col, usecols, squeeze, prefix, mangle_dupe_cols, dtype, engine, converters, true_values, false_values, skipinitialspace, skiprows, nrows, na_values, keep_default_na, na_filter, verbose, skip_blank_lines, parse_dates, infer_datetime_format, keep_date_col, date_parser, dayfirst, iterator, chunksize, compression, thousands, decimal, lineterminator, quotechar, quoting, escapechar, comment, encoding, dialect, tupleize_cols, error_bad_lines, warn_bad_lines, skipfooter, skip_footer, doublequote, delim_whitespace, as_recarray, compact_ints, use_unsigned, low_memory, buffer_lines, memory_map, float_precision)
653 skip_blank_lines=skip_blank_lines)
654
--> 655 return _read(filepath_or_buffer, kwds)
656
657 parser_f.__name__ = name
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _read(filepath_or_buffer, kwds)
403
404 # Create the parser.
--> 405 parser = TextFileReader(filepath_or_buffer, **kwds)
406
407 if chunksize or iterator:
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, f, engine, **kwds)
762 self.options['has_index_names'] = kwds['has_index_names']
763
--> 764 self._make_engine(self.engine)
765
766 def close(self):
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in _make_engine(self, engine)
983 def _make_engine(self, engine='c'):
984 if engine == 'c':
--> 985 self._engine = CParserWrapper(self.f, **self.options)
986 else:
987 if engine == 'python':
~/anaconda3/lib/python3.6/site-packages/pandas/io/parsers.py in __init__(self, src, **kwds)
1603 kwds['allow_leading_cols'] = self.index_col is not False
1604
-> 1605 self._reader = parsers.TextReader(src, **kwds)
1606
1607 # XXX
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader.__cinit__ (pandas/_libs/parsers.c:4209)()
pandas/_libs/parsers.pyx in pandas._libs.parsers.TextReader._setup_parser_source (pandas/_libs/parsers.c:8873)()
FileNotFoundError: File b'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.790929317474365\n1909,-8.23930549621582\n1910,-7.774611473083496\n1911,-8.114446640014648\n1912,-7.885402679443359\n1913,-7.987940311431885\n1914,-7.965937614440918\n1915,-7.144039154052734\n1916,-8.132978439331055\n1917,-8.499914169311523\n1918,-8.204662322998047\n1919,-8.035985946655273\n1920,-7.830679893493652\n1921,-7.685777187347412\n1922,-8.334989547729492\n1923,-8.022125244140625\n1924,-7.574568271636963\n1925,-7.951625823974609\n1926,-7.792789459228516\n1927,-7.961727142333984\n1928,-7.237975120544434\n1929,-8.123651504516602\n1930,-7.302305698394775\n1931,-6.646339416503906\n1932,-7.821688652038574\n1933,-8.693134307861328\n1934,-7.964327335357666\n1935,-8.166967391967773\n1936,-8.59422492980957\n1937,-7.3392534255981445\n1938,-6.856348991394043\n1939,-7.669107913970947\n1940,-6.799381256103516\n1941,-7.233104705810547\n1942,-7.097812652587891\n1943,-7.2231879234313965\n1944,-6.603946685791016\n1945,-7.646345615386963\n1946,-7.739509582519531\n1947,-7.161524295806885\n1948,-7.659969329833984\n1949,-7.696352958679199\n1950,-8.517829895019531\n1951,-7.903929710388184\n1952,-6.680769920349121\n1953,-6.7539520263671875\n1954,-7.334064483642578\n1955,-7.586000919342041\n1956,-8.27430534362793\n1957,-7.763300895690918\n1958,-6.903257846832275\n1959,-7.8713836669921875\n1960,-6.951033115386963\n1961,-7.946412086486816\n1962,-7.465360164642334\n1963,-7.363328456878662\n1964,-8.241130828857422\n1965,-8.078269958496094\n1966,-7.83267879486084\n1967,-7.973592281341553\n1968,-7.3681206703186035\n1969,-7.0392913818359375\n1970,-7.720573425292969\n1971,-7.469780921936035\n1972,-9.525187492370605\n1973,-6.853341579437256\n1974,-8.428787231445312\n1975,-7.621856689453125\n1976,-7.588895320892334\n1977,-6.557257652282715\n1978,-7.993335723876953\n1979,-7.845717430114746\n1980,-7.049171447753906\n1981,-5.506665229797363\n1982,-8.5137939453125\n1983,-7.463536262512207\n1984,-7.414198398590088\n1985,-7.432916164398193\n1986,-7.71035099029541\n1987,-6.4537835121154785\n1988,-6.610030174255371\n1989,-7.706485748291016\n1990,-7.6779985427856445\n1991,-7.095147132873535\n1992,-7.697887420654297\n1993,-6.986419677734375\n1994,-6.888780117034912\n1995,-6.850322723388672\n1996,-7.337457180023193\n1997,-6.88342809677124\n1998,-5.186192989349365\n1999,-5.975519180297852\n2000,-6.7265448570251465\n2001,-5.930727958679199\n2002,-6.852164268493652\n2003,-6.402592658996582\n2004,-7.529717445373535\n2005,-5.863758563995361\n2006,-5.543209552764893\n2007,-6.819293975830078\n2008,-7.2008957862854\n2009,-6.997011661529541\n2010,-4.703649520874023\n2011,-5.9335737228393555\n2012,-5.714600563049316\n' does not exist
In [22]:
df.head()
Out[22]:
year | data | |
---|---|---|
0 | 1901 | -7.672419 |
1 | 1902 | -7.862711 |
2 | 1903 | -7.910783 |
3 | 1904 | -8.155729 |
4 | 1905 | -7.547311 |
In [26]:
frame = pd.DataFrame(canada.text)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-26-9d5b746d4789> in <module>()
----> 1 frame = pd.DataFrame(canada.text)
~/anaconda3/lib/python3.6/site-packages/pandas/core/frame.py in __init__(self, data, index, columns, dtype, copy)
352 copy=False)
353 else:
--> 354 raise ValueError('DataFrame constructor not properly called!')
355
356 NDFrame.__init__(self, mgr, fastpath=True)
ValueError: DataFrame constructor not properly called!
In [29]:
canada.text
Out[29]:
'year,data\n1901,-7.67241907119751\n1902,-7.862711429595947\n1903,-7.910782814025879\n1904,-8.155729293823242\n1905,-7.547311305999756\n1906,-7.684103488922119\n1907,-8.413553237915039\n1908,-7.790929317474365\n1909,-8.23930549621582\n1910,-7.774611473083496\n1911,-8.114446640014648\n1912,-7.885402679443359\n1913,-7.987940311431885\n1914,-7.965937614440918\n1915,-7.144039154052734\n1916,-8.132978439331055\n1917,-8.499914169311523\n1918,-8.204662322998047\n1919,-8.035985946655273\n1920,-7.830679893493652\n1921,-7.685777187347412\n1922,-8.334989547729492\n1923,-8.022125244140625\n1924,-7.574568271636963\n1925,-7.951625823974609\n1926,-7.792789459228516\n1927,-7.961727142333984\n1928,-7.237975120544434\n1929,-8.123651504516602\n1930,-7.302305698394775\n1931,-6.646339416503906\n1932,-7.821688652038574\n1933,-8.693134307861328\n1934,-7.964327335357666\n1935,-8.166967391967773\n1936,-8.59422492980957\n1937,-7.3392534255981445\n1938,-6.856348991394043\n1939,-7.669107913970947\n1940,-6.799381256103516\n1941,-7.233104705810547\n1942,-7.097812652587891\n1943,-7.2231879234313965\n1944,-6.603946685791016\n1945,-7.646345615386963\n1946,-7.739509582519531\n1947,-7.161524295806885\n1948,-7.659969329833984\n1949,-7.696352958679199\n1950,-8.517829895019531\n1951,-7.903929710388184\n1952,-6.680769920349121\n1953,-6.7539520263671875\n1954,-7.334064483642578\n1955,-7.586000919342041\n1956,-8.27430534362793\n1957,-7.763300895690918\n1958,-6.903257846832275\n1959,-7.8713836669921875\n1960,-6.951033115386963\n1961,-7.946412086486816\n1962,-7.465360164642334\n1963,-7.363328456878662\n1964,-8.241130828857422\n1965,-8.078269958496094\n1966,-7.83267879486084\n1967,-7.973592281341553\n1968,-7.3681206703186035\n1969,-7.0392913818359375\n1970,-7.720573425292969\n1971,-7.469780921936035\n1972,-9.525187492370605\n1973,-6.853341579437256\n1974,-8.428787231445312\n1975,-7.621856689453125\n1976,-7.588895320892334\n1977,-6.557257652282715\n1978,-7.993335723876953\n1979,-7.845717430114746\n1980,-7.049171447753906\n1981,-5.506665229797363\n1982,-8.5137939453125\n1983,-7.463536262512207\n1984,-7.414198398590088\n1985,-7.432916164398193\n1986,-7.71035099029541\n1987,-6.4537835121154785\n1988,-6.610030174255371\n1989,-7.706485748291016\n1990,-7.6779985427856445\n1991,-7.095147132873535\n1992,-7.697887420654297\n1993,-6.986419677734375\n1994,-6.888780117034912\n1995,-6.850322723388672\n1996,-7.337457180023193\n1997,-6.88342809677124\n1998,-5.186192989349365\n1999,-5.975519180297852\n2000,-6.7265448570251465\n2001,-5.930727958679199\n2002,-6.852164268493652\n2003,-6.402592658996582\n2004,-7.529717445373535\n2005,-5.863758563995361\n2006,-5.543209552764893\n2007,-6.819293975830078\n2008,-7.2008957862854\n2009,-6.997011661529541\n2010,-4.703649520874023\n2011,-5.9335737228393555\n2012,-5.714600563049316\n'
Using the Documentation¶
Seems this is not so easy. Luckily, the climate data is also available
as part of the wbdata
package. Use the documentation to pull and
analyze data related to Climate indicators, or a different choice using
the documentation at: http://wbdata.readthedocs.io/en/latest/.