{"id":66,"date":"2017-08-17T18:05:02","date_gmt":"2017-08-17T18:05:02","guid":{"rendered":"https:\/\/www.kedwards.com\/cs6452\/?page_id=66"},"modified":"2019-05-12T12:02:54","modified_gmt":"2019-05-12T12:02:54","slug":"homework-2","status":"publish","type":"page","link":"https:\/\/www.kedwards.com\/cs6452\/homework-2\/","title":{"rendered":"Homework 2"},"content":{"rendered":"<h4>Homework 2: Analyzing Structured Data using CSV and JSON<\/h4>\n<p>Due Date: See class schedule<\/p>\n<p>In this assignment you&#8217;ll learn how to process and analyze data structured in common formats using Python. You&#8217;ll gain experience in using Python data structures, and also useful Python modules for processing CSV and JSON data.<\/p>\n<p>This homework assignment is less guided than the previous one. I want you to pick a data source that you&#8217;re interested in, from the list of available sources from <a href=\"https:\/\/catalog.data.gov\/dataset?res_format=CSV\">data.gov<\/a>. (If, for some reason, you wish to use a data source not on this page, please clear it with me first.) The data source you choose must have either a CSV or JSON format available.<\/p>\n<p><em>(Hint: &nbsp;if you don&#8217;t want to spend a lot of time going through data files looking for a good one, you&#8217;re welcome to use the file 2010-census-by-zip-simplified.csv on T-Square in the resources folder. &nbsp;It&#8217;s a version of the 2010 census data linked off of data.gov, but with spaces removed from the field names to make the assignment a bit simpler.<\/em><\/p>\n<p><em>Another hint: working with CSV data files will generally be a bit easier than working with JSON data files, but you can pick whichever you&#8217;d like.)<\/em><\/p>\n<p>You can think of these data files as representing tabular data, encoded into either CSV or JSON format. The table consists of multiple&nbsp;<em>fields,&nbsp;<\/em>which you can think of as columns. 
Each&nbsp;<em>entry<\/em> expresses the relations among a set of fields, which you can think of as being the rows in the table. While JSON data is potentially a bit more complex than this conceptually, most of the data sets at data.gov still basically represent tabular data. You must pick a data source that has at least four fields in it and at least 10 rows.<\/p>\n<p>Your code must load the data you&#8217;ve chosen, and then support a variety of queries that can be typed in at the command line by a user. You can hard-code the reference to your data file, but please set up your code so that it expects the data file to be in the same folder as your Python code (and so references it using just the base filename), and doesn&#8217;t contain any platform-specific path characters. I should be able to copy your code and data file to my computer and run it without problem. For reading user input from the command line, you&#8217;ll want to use raw_input() or one of its variants.<\/p>\n<p>The queries you should support are:<\/p>\n<ul>\n<li><strong>max &lt;field&gt;<\/strong>: given a particular&nbsp;<em>field name<\/em>, find and print the data that has the&nbsp;<em>maximum<\/em> value for this field across your entire data set.<\/li>\n<li><strong>min &lt;field&gt;<\/strong>: given a particular&nbsp;<em>field name<\/em>, find and print the data that has the&nbsp;<em>minimum<\/em> value for this field.<\/li>\n<li><strong>avg &lt;field&gt;<\/strong>: given a&nbsp;<em>field name<\/em>, compute the average of all values in your data for this field. 
(You can assume this will only be called on fields that have numeric values.)<\/li>\n<li><strong>search &lt;field&gt; &lt;value&gt;<\/strong>: find all data in your dataset for which the named&nbsp;<em>field<\/em> has the indicated&nbsp;<em>value.<\/em><\/li>\n<li><strong>range &lt;field&gt; &lt;min&gt; &lt;max&gt;<\/strong>: find all data in your dataset for which the named&nbsp;<em>field&nbsp;<\/em>has values between the specified&nbsp;<em>min&nbsp;<\/em>and&nbsp;<em>max<\/em>, and display in ascending order.<\/li>\n<li><strong>quit:&nbsp;<\/strong>exit the program.<\/li>\n<\/ul>\n<p>In essence, these commands make up a simple command language that will allow your user to explore the data. When these commands are entered into your program, your code should parse them and execute them correctly.<\/p>\n<p><em>(Hint: some of the data will have field names that contain spaces, such as &#8220;Total Population.&#8221; This will make parsing the commands a bit trickier. For this assignment, &nbsp;if you don&#8217;t want to code your program to deal with spaces in your field names, you can either choose a data set that doesn&#8217;t have spaces in the field names, or simply edit the data to remove the spaces.)<\/em><\/p>\n<p>Here&#8217;s an example interaction, for a table of population estimates from 2013:<\/p>\n<pre>STATE        POP_ESTIMATE\nAlabama      4833722\nAlaska       735132\nArizona      6626624\nArkansas     2959373\n\n&gt; min POP_ESTIMATE\n735132\n&gt; max POP_ESTIMATE\n6626624\n&gt; avg POP_ESTIMATE\n3788713\n&gt; search STATE Alaska\nSTATE          POP_ESTIMATE\nAlaska         735132\n&gt; range POP_ESTIMATE 2000000 5000000\nSTATE          POP_ESTIMATE\nArkansas       2959373\nAlabama        4833722\n\n<\/pre>\n<p>Note that if your data doesn&#8217;t lend itself well to these queries, please see me&#8230; we can come up with an alternative set that makes more sense for your data.<\/p>\n<p>To submit your assignment, please create and submit on T-Square a 
ZIP file that contains the following:<\/p>\n<ul>\n<li>Your Python program<\/li>\n<li>The data file you&#8217;re using (if this file is larger than a megabyte, please trim it down before submitting&#8230; make sure your program works on this smaller data file before submitting it)<\/li>\n<li>A short README.txt file that contains examples of the above queries that produce results using your data file.<\/li>\n<\/ul>\n<h5>EXTRA CREDIT:<\/h5>\n<p>Up to 10 points for additional sorts of queries, depending on complexity and usefulness.<\/p>\n<p>&nbsp;<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Homework 2: Analyzing Structured Data using CSV and JSON Due Date: See class schedule In this assignment you&#8217;ll learn how&hellip; <a class=\"read-more\" href=\"https:\/\/www.kedwards.com\/cs6452\/homework-2\/\">Read more <span class=\"screen-reader-text\">Homework 2<\/span><\/a><\/p>\n","protected":false},"author":1,"featured_media":0,"parent":0,"menu_order":0,"comment_status":"closed","ping_status":"closed","template":"","meta":{"footnotes":""},"class_list":["post-66","page","type-page","status-publish","hentry"],"_links":{"self":[{"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/pages\/66","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/pages"}],"about":[{"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/types\/page"}],"author":[{"embeddable":true,"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/comments?post=66"}],"version-history":[{"count":10,"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/pages\/66\/revisions"}],"predecessor-version":[{"id":192,"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/pages\/66\/revisions\/192"}],"wp:attachment":[{"href":"https:\/\/www.kedwards.com\/cs6452\/wp-json\/wp\/v2\/media?parent=66"}],"curies":[{"n
ame":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}