To view the source datasets in S3, open the links below.
Amazon Product Reviews Dataset
Notice that the tsv folder contains multiple gzip-compressed files, and that their sizes vary widely, from 12 MB to 2.6 GB.
The parquet folder has one sub-folder per product category. One level down, the files are compressed with Snappy, and their sizes are more uniform.
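The size comparison between the two folders can also be checked programmatically once you have an object listing. The following is a minimal sketch, not part of the original walkthrough: the helper summarize_sizes and every key name and size in sample_listing are hypothetical placeholders (only the 12 MB and roughly 2.6 GB extremes echo the text above). In practice the (key, size) pairs would come from a listing of the actual bucket.

```python
from collections import defaultdict

MB = 1024 ** 2


def summarize_sizes(objects):
    """Group (key, size_in_bytes) pairs by top-level folder.

    Returns {folder: {"files": count, "min": bytes, "max": bytes}}.
    """
    groups = defaultdict(list)
    for key, size in objects:
        folder = key.split("/", 1)[0]  # e.g. "tsv" or "parquet"
        groups[folder].append(size)
    return {
        folder: {"files": len(sizes), "min": min(sizes), "max": max(sizes)}
        for folder, sizes in groups.items()
    }


# Hypothetical listing: key names and sizes are made up for illustration.
sample_listing = [
    ("tsv/reviews_a.tsv.gz", 12 * MB),
    ("tsv/reviews_b.tsv.gz", 2600 * MB),
    ("parquet/product_category=Books/part-0000.snappy.parquet", 100 * MB),
    ("parquet/product_category=Books/part-0001.snappy.parquet", 110 * MB),
]

summary = summarize_sizes(sample_listing)
for folder, stats in sorted(summary.items()):
    print(folder, stats["files"], stats["min"] // MB, stats["max"] // MB)
```

A wide gap between min and max for a folder points to unevenly sized files, which is the pattern described for the tsv folder.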
Flight Delay Dataset
Navigate to the flight folder and inspect the data under the csv and parquet folders.