Related to this Jira ticket:
http://jira.bigdatautah.org/browse/COMPAIRQUA-18
"EPA daily measurements of several pollutants, covering 4+ years.
The file is tarred and gzipped; see below for how to expand it.
Also available are the HQL script that creates the table: https://drive.google.com/file/d/0B3JJ714WyLZAU05WQXVZWjVCb2c/edit?usp=sharing
and the script that collects stats and runs a test query:
https://drive.google.com/file/d/0B3JJ714WyLZAdWQxTTBIa3g4aE0/edit?usp=sharing
I haven't used Google Drive before, so if there are problems with the links, please contact me.
To load the data into HDFS and a Hive table:
Expand files:
tar xvfz epapoll.tar.gz
Create Hive table (and HDFS dir):
hive -f mkTblEPA_all.hql
Copy the files into the Hive table's HDFS directory:
hadoop fs -put _000 /data/AirQuality/EPA/EPA_all_pollutants
Collect stats and test query:
hive -f stat.hql
"
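The four steps in the ticket could be chained in one small script, so that a failure in one step stops the later ones. This is only a sketch: the `run` and `load_epa` helpers and the `DRY_RUN` switch are hypothetical additions, while the archive name, script names, data file name, and HDFS path are copied as-is from the steps above. By default it only prints the commands it would run.

```shell
#!/bin/sh
# Sketch: chain the load steps from the ticket and stop at the first failure.
# DRY_RUN is a hypothetical switch; it defaults to 1, which only prints the commands.
set -e

ARCHIVE="epapoll.tar.gz"                             # archive name from the ticket
HDFS_DIR="/data/AirQuality/EPA/EPA_all_pollutants"   # HDFS path from the ticket
DATA_FILE="_000"                                     # file name exactly as given in the ticket

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

load_epa() {
    run tar xvfz "$ARCHIVE"                       # expand the files
    run hive -f mkTblEPA_all.hql                  # create the Hive table (and HDFS dir)
    run hadoop fs -put "$DATA_FILE" "$HDFS_DIR"   # copy the data into the table directory
    run hive -f stat.hql                          # collect stats and run the test query
}

load_epa
```

Saved under a name of your choosing (say `load_epa.sh`, a hypothetical name), running `sh load_epa.sh` lists the commands, and `DRY_RUN=0 sh load_epa.sh` executes them on a machine where `tar`, `hive`, and `hadoop` are on the PATH.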