This machine learning lab shows you how to use Amazon Athena ML to run a Federated Query that calls a SageMaker inference endpoint to detect anomalous values in the query results.
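Here is a minimal Python sketch of what such a query can look like. It assumes Athena's `USING EXTERNAL FUNCTION ... SAGEMAKER` syntax for calling an endpoint from SQL and a numeric column that can be passed as `DOUBLE`. The function name `detect_anomaly` and the endpoint, table, and column names are hypothetical placeholders, not the names this lab creates. The optional `run_query` helper submits the SQL with boto3's `start_query_execution`; its database and S3 output location arguments are also placeholders.

```python
def build_anomaly_query(endpoint_name: str, table: str, value_column: str) -> str:
    """Build an Athena ML query that scores each row with a SageMaker endpoint.

    All arguments are placeholders; substitute the endpoint, table, and
    column used in your environment.
    """
    return (
        # Declare a SQL function that is backed by the SageMaker endpoint.
        "USING EXTERNAL FUNCTION detect_anomaly(value DOUBLE)\n"
        "RETURNS DOUBLE\n"
        f"SAGEMAKER '{endpoint_name}'\n"
        # Call it once per row and surface the highest scores first.
        f"SELECT {value_column}, detect_anomaly({value_column}) AS anomaly_score\n"
        f"FROM {table}\n"
        "ORDER BY anomaly_score DESC"
    )


def run_query(sql: str, database: str, output_location: str) -> str:
    """Submit the query to Athena and return its execution ID."""
    import boto3  # imported lazily so build_anomaly_query works without AWS access

    athena = boto3.client("athena")
    response = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

For example, `build_anomaly_query("rcf-endpoint", "orders", "totalprice")` produces a query that returns each hypothetical `totalprice` value alongside its anomaly score.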
To detect anomalous values, we will use the Random Cut Forest (RCF) algorithm, an unsupervised algorithm for detecting anomalous data points within a data set.
For more information about the RCF algorithm in Amazon SageMaker, see the following link:
Random Cut Forest
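To make the idea behind RCF concrete, the following pure-Python sketch builds a small forest of random cut trees. Each tree is grown on a random sample of the data, and each cut picks a dimension with probability proportional to that dimension's range, then a cut value uniformly within the range. This is a toy, not the SageMaker implementation. As a simplifying assumption, it scores a point by the inverse of its average depth across the trees, so points isolated after fewer cuts score higher. It does not reproduce the exact scoring of SageMaker's built-in algorithm. The class name `SimpleRandomCutForest` and its parameters are hypothetical.

```python
import math
import random
from typing import Dict, List, Sequence

Point = Sequence[float]


def _build_tree(points: List[Point], rng: random.Random, depth: int, max_depth: int) -> Dict:
    """Recursively partition `points` with random axis-aligned cuts."""
    if len(points) <= 1 or depth >= max_depth:
        return {"leaf": True}
    dims = list(range(len(points[0])))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    spans = [hi - lo for lo, hi in zip(lows, highs)]
    if sum(spans) == 0:  # all remaining points are identical
        return {"leaf": True}
    # Random-cut rule: choose a dimension with probability proportional
    # to its range, then a cut value uniformly within that range.
    dim = rng.choices(dims, weights=spans)[0]
    cut = rng.uniform(lows[dim], highs[dim])
    left = [p for p in points if p[dim] <= cut]
    right = [p for p in points if p[dim] > cut]
    return {
        "leaf": False,
        "dim": dim,
        "cut": cut,
        "left": _build_tree(left, rng, depth + 1, max_depth),
        "right": _build_tree(right, rng, depth + 1, max_depth),
    }


def _path_depth(tree: Dict, point: Point) -> int:
    """Count the cuts a point passes through before it reaches a leaf."""
    depth = 0
    while not tree["leaf"]:
        tree = tree["left"] if point[tree["dim"]] <= tree["cut"] else tree["right"]
        depth += 1
    return depth


class SimpleRandomCutForest:
    """Toy forest of random cut trees (illustration only)."""

    def __init__(self, num_trees: int = 50, sample_size: int = 64, seed: int = 0):
        self.num_trees = num_trees
        self.sample_size = sample_size
        self.rng = random.Random(seed)
        self.trees: List[Dict] = []

    def fit(self, data: List[Point]) -> "SimpleRandomCutForest":
        size = min(self.sample_size, len(data))
        # Depth cap keeps recursion bounded; normal points rarely need more.
        max_depth = 2 * math.ceil(math.log2(max(size, 2)))
        self.trees = [
            _build_tree(self.rng.sample(data, size), self.rng, 0, max_depth)
            for _ in range(self.num_trees)
        ]
        return self

    def score(self, point: Point) -> float:
        # Anomalies are isolated after fewer cuts, so a shorter average
        # depth yields a higher score (in the range (0, 1]).
        avg_depth = sum(_path_depth(t, point) for t in self.trees) / len(self.trees)
        return 1.0 / (1.0 + avg_depth)
```

For example, fitting the forest on 200 points drawn around the origin plus one point at (10, 10) should give the far-away point a higher score than the origin. In the lab itself, SageMaker's built-in RCF algorithm does this work; the sketch only illustrates the mechanics.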
IAM Role for Amazon SageMaker
The CloudFormation stack we ran to build the environment for this lab has already created a new IAM role that Amazon SageMaker can use to run an Athena query that generates our training dataset, train a new model, and deploy that model to a SageMaker endpoint. To perform these tasks, the role needs the AmazonAthenaFullAccess, AmazonSageMakerFullAccess, and AmazonS3FullAccess managed policies. In a production setting, scope down the AmazonS3FullAccess policy so that it grants access only to the buckets required to train your model.
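To illustrate that scoping-down, the following sketch generates an inline IAM policy document that restricts S3 access to a named list of buckets. The bucket name in the example is a hypothetical placeholder. The action list is an assumption: Athena query results and SageMaker training may need additional S3 actions, such as multipart-upload permissions, so check it against your workload before relying on it.

```python
import json
from typing import Iterable


def scoped_s3_policy(bucket_names: Iterable[str]) -> str:
    """Return an IAM policy document limiting S3 access to the given buckets."""
    buckets = list(bucket_names)
    bucket_arns = [f"arn:aws:s3:::{b}" for b in buckets]      # bucket-level actions
    object_arns = [f"arn:aws:s3:::{b}/*" for b in buckets]    # object-level actions
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListTrainingBuckets",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arns,
            },
            {
                "Sid": "ReadWriteTrainingObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": object_arns,
            },
        ],
    }
    return json.dumps(policy, indent=2)
```

One way to use it is to attach the generated document to the role with boto3's `iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=...)` and then remove AmazonS3FullAccess with `iam.detach_role_policy`.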
Before You Begin
The CloudFormation stack we ran to build the environment for this lab has also created a SageMaker notebook instance of type ml.m4.xlarge. The notebook uses the role created in the previous step, identified by its ARN, as its IAM role when interacting with other AWS services.