In this lab, we will see how to use Athena UDFs to do the following:
Detect the dominant language of a text field
Detect the prevailing sentiment expressed—positive, negative, neither, or both
Detect or redact entities (such as items, places, or quantities)
You can also use the UDFs for other text analytics use cases, including:
Detect or redact PII
Translate text from one language to another
For more information on these use cases, see this blog post on Athena Text Analytics.
The UDF that we will use in this lab is built on top of AWS Lambda. The Lambda function invokes Amazon Comprehend APIs to detect language and extract entities from text fields. It also uses Amazon Translate for language translation.
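To make the Lambda-backed UDF more concrete, here is a minimal Python sketch that builds the kind of Athena query that calls such a function. It relies on Athena's USING EXTERNAL FUNCTION ... LAMBDA clause and on the textanalytics-udf Lambda name used in this lab. The UDF name detect_dominant_language and its single VARCHAR argument are assumptions taken from the TextAnalyticsUDFHandler application, not something this page states, and build_language_query is a hypothetical helper introduced only for illustration.

```python
# Sketch only: builds the text of an Athena query that calls a Lambda-backed UDF.
# Assumption: the UDF is named detect_dominant_language and takes one VARCHAR.
# The Lambda name "textanalytics-udf" comes from this lab's setup.

DEFAULT_LAMBDA = "textanalytics-udf"


def build_language_query(text_expr, lambda_name=DEFAULT_LAMBDA):
    """Return an Athena SQL string that detects the language of text_expr.

    text_expr is a SQL expression, for example a quoted literal or a column name.
    """
    return (
        # Declare the external function and the Lambda that implements it.
        "USING EXTERNAL FUNCTION detect_dominant_language(text_col VARCHAR) "
        "RETURNS VARCHAR "
        f"LAMBDA '{lambda_name}'\n"
        # Call the declared function like any other SQL function.
        f"SELECT detect_dominant_language({text_expr}) AS language"
    )


if __name__ == "__main__":
    print(build_language_query("'Bonjour tout le monde'"))
```

You could paste the printed statement into the Athena query editor in a workgroup that supports UDFs. Passing a column name instead of a literal would also need a FROM clause, which this sketch leaves out.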
If you are doing this lab as a part of an AWS Event, these configurations have already been applied to your account. For self-paced labs, the advanced AWS CloudFormation template from the previous section configures Athena for text analysis.
Although these configurations are already in place in your environment, the high-level steps for configuring Athena for text analysis are as follows:
Install the UDF
The UDF uses a pre-built Lambda function that is available on GitHub. The TextAnalyticsUDFHandler application from the AWS Serverless Application Repository is configured as a Lambda function called textanalytics-udf. For instructions on configuring the UDF manually, see this blog.
Create a dedicated workgroup within Athena
The Athena UDF feature requires Athena engine version 2. The CloudFormation template provisions a new workgroup with the version 2 engine. If you are working through the labs without the templates and your workgroup is not on Athena engine version 2, create a new workgroup with that engine or switch your current workgroup to it.
Follow the steps one-by-one to complete the lab: