View Datasets

To view the source datasets in S3, open the links below.

Amazon Product Reviews Dataset

Notice that the tsv folder contains multiple gzip-compressed files, and that their sizes vary widely, from 12 MB to 2.6 GB. The parquet folder has one sub-folder per product category. One level down, the files are compressed with Snappy and their sizes are more uniform.
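To make the size comparison concrete, the sketch below groups a listing of (key, size) pairs by top-level folder and reports the file count and the smallest and largest file in each. The keys and the parquet sizes in `listing` are hypothetical and chosen only for illustration; only the 12 MB and 2.6 GB figures come from the text above. In practice the pairs could be taken from the `Key` and `Size` fields returned by boto3's `list_objects_v2`, assuming you have access to the bucket.

```python
from collections import defaultdict


def summarize_by_folder(objects):
    """Group (key, size_bytes) pairs by top-level folder.

    Returns, per folder, the number of files and the min/max size in MB.
    """
    sizes = defaultdict(list)
    for key, size in objects:
        folder = key.split("/", 1)[0]  # text before the first "/"
        sizes[folder].append(size)
    return {
        folder: {
            "files": len(values),
            "min_mb": min(values) / 1e6,
            "max_mb": max(values) / 1e6,
        }
        for folder, values in sizes.items()
    }


# Hypothetical listing: file names and parquet sizes are made up.
listing = [
    ("tsv/part_a.tsv.gz", 12_000_000),
    ("tsv/part_b.tsv.gz", 2_600_000_000),
    ("parquet/product_category=Books/part-0000.snappy.parquet", 60_000_000),
    ("parquet/product_category=Toys/part-0000.snappy.parquet", 64_000_000),
]

summary = summarize_by_folder(listing)
for folder, stats in summary.items():
    print(folder, stats)
```

A wide gap between `min_mb` and `max_mb`, as in the tsv folder, means some files are far larger than others, while a narrow gap corresponds to the more uniform sizes seen in the parquet folder.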

Flight Delay Dataset

Navigate to the flight folder and check the data under the csv and parquet folders.