Test Data & Users

To demonstrate Athena federation capabilities, this workshop uses a sample dataset along with sample tables and sample data sources.

Let’s walk through the following test datasets & data sources:

TPCH Database & Tables

This workshop uses the publicly available TPCH data. TPC-H is a decision support benchmark. It consists of a suite of business-oriented ad hoc queries and concurrent data modifications. The queries and the data populating the database have been chosen to have broad industry-wide relevance. The benchmark illustrates decision support systems that examine large volumes of data, execute queries with a high degree of complexity, and give answers to critical business questions.

TPC-H consists of eight separate and individual tables (the Base Tables). The relationships between columns in these tables are illustrated in the following diagram:

For this workshop, we will focus on the following tables from the TPCH database:
  • customer
  • supplier
  • orders
  • part
  • partsupp
  • lineitem
  • nation
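
To make the idea of a business-oriented ad hoc query over related tables concrete, here is a minimal sketch that joins customer and orders and totals spend per customer. It rests on several assumptions: an in-memory SQLite database stands in for Athena, only a handful of columns are created (c_custkey, c_name, o_orderkey, o_custkey, o_totalprice, named as in the TPC-H specification), and the rows are made up for illustration rather than taken from the workshop dataset.

```python
import sqlite3

# Assumption: in-memory SQLite is only a stand-in; the workshop queries TPCH through Athena.
conn = sqlite3.connect(":memory:")

# Simplified versions of two TPC-H tables, with only the columns this sketch needs.
conn.executescript("""
CREATE TABLE customer (c_custkey INTEGER PRIMARY KEY, c_name TEXT);
CREATE TABLE orders (
    o_orderkey   INTEGER PRIMARY KEY,
    o_custkey    INTEGER REFERENCES customer (c_custkey),
    o_totalprice REAL
);
""")

# Made-up rows for illustration only (not real TPCH data).
conn.executemany(
    "INSERT INTO customer VALUES (?, ?)",
    [(1, "Customer#000000001"), (2, "Customer#000000002")],
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, 100.0), (11, 1, 50.0), (12, 2, 75.0)],
)

# Ad hoc question: how many orders did each customer place, and for how much in total?
rows = conn.execute("""
SELECT c.c_name, COUNT(*) AS order_count, SUM(o.o_totalprice) AS total_spent
FROM customer AS c
JOIN orders AS o ON o.o_custkey = c.c_custkey
GROUP BY c.c_custkey, c.c_name
ORDER BY total_spent DESC
""").fetchall()

for name, order_count, total_spent in rows:
    print(name, order_count, total_spent)
```

The join condition (orders.o_custkey to customer.c_custkey) is one of the column relationships between the Base Tables; the other tables in the list are linked through key columns in a similar way.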

The entire TPCH data dictionary is available here.