Datasets

To demonstrate Athena's capabilities, this workshop uses sample datasets along with sample tables and sample data sources.
The basic Athena labs use the following datasets.

  1. Amazon Product Reviews Dataset - Over 155 million customer reviews collected since 1995, available in both TSV (tab-separated values) and Parquet formats.
    Source: https://s3.amazonaws.com/amazon-reviews-pds/readme.html

  2. Flight Delay Dataset - Historical flight delay data: 1.67 million records with wide rows (many columns per record).
    Source: http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236&DB_Short_Name=On-Time
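
As a rough illustration of the TSV format mentioned for the product reviews dataset, the sketch below parses a few tab-separated lines using Python's standard csv module. The column names and sample values are hypothetical and chosen only for illustration; they are not taken from the actual dataset, and the labs themselves query the data through Athena rather than Python.

```python
import csv
import io

# Hypothetical sample: column names and values are made up for illustration
# and are NOT taken from the actual reviews dataset.
SAMPLE_TSV = (
    "review_id\tstar_rating\treview_body\n"
    "R1\t5\tGreat product, works as described\n"
    "R2\t2\tStopped working after a week\n"
)


def parse_tsv(text):
    """Parse tab-separated text with a header row into a list of dicts."""
    # QUOTE_NONE treats quote characters as ordinary text, so only tabs
    # and newlines act as separators.
    reader = csv.DictReader(
        io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE
    )
    return list(reader)


rows = parse_tsv(SAMPLE_TSV)
for row in rows:
    print(row["review_id"], row["star_rating"])
```

Each row comes back as a dictionary keyed by the header names, with every value as a string. Parquet, by contrast, is a binary columnar format and cannot be read line by line this way.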