UT-COMPAIRQUA-18 (EPA Gov Air Data)

Related to this Jira ticket: http://jira.bigdatautah.org/browse/COMPAIRQUA-18

"EPA daily measurements of several pollutants for 4+ years.
File is tar'd and gzipped - see below.

Also available - HQL script for table: https://drive.google.com/file/d/0B3JJ714WyLZAU05WQXVZWjVCb2c/edit?usp=sharing

and stats and test query script: https://drive.google.com/file/d/0B3JJ714WyLZAdWQxTTBIa3g4aE0/edit?usp=sharing

Haven't used Google Drive before, so if there are problems, please contact me.

To get the data into HDFS and into a Hive table:

Expand files:
tar xvfz epapoll.tar.gz

Create Hive table (and HDFS dir):
hive -f mkTblEPA_all.hql

Move files to Hive directory:
hadoop fs -put _000 /data/AirQuality/EPA/EPA_all_pollutants

Collect stats and test query:
hive -f stat.hql"
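
The four load steps in the description could be wrapped in one script, sketched below. This is only an illustration and not part of the dataset: the `run` and `load_epa` helpers and the `DRY_RUN` switch are hypothetical names introduced here, and the `_000` argument is copied exactly as written in the description, so it may need adjusting to match the extracted file names. By default the script only prints the commands; setting `DRY_RUN=0` would execute them, assuming `tar`, `hive`, and `hadoop` are on the PATH and the two HQL scripts are in the current directory.

```shell
#!/bin/sh
# Sketch of the load steps from the description.
# Prints each command by default; set DRY_RUN=0 to actually run them.
set -eu

ARCHIVE="epapoll.tar.gz"                            # archive name from the description
HIVE_DIR="/data/AirQuality/EPA/EPA_all_pollutants"  # target directory from the description

# Hypothetical helper: echo the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Hypothetical wrapper around the four steps, in the order given above.
load_epa() {
    run tar xvfz "$ARCHIVE"             # expand files
    run hive -f mkTblEPA_all.hql        # create Hive table (and HDFS dir)
    run hadoop fs -put _000 "$HIVE_DIR" # move files to Hive directory (argument as written)
    run hive -f stat.hql                # collect stats and run the test query
}

load_epa
```

Because `set -e` is active, a real run stops at the first failing step, so the stats script is not run against a table that failed to load.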

Additional Info

Field Value
Source http://www.epa.gov/airdata/ad_data.html