Text Analysis using UDF

In this lab, we will see how to use Athena UDFs to do the following:
  • Detect the dominant language of a text field
  • Detect the prevailing sentiment expressed—positive, negative, neither, or both
  • Detect or redact entities (such as items, places, or quantities)

You can also use the UDFs for other text analytics use cases, including:
  • Detect or redact PII
  • Translate text from one language to another

For more information on these use cases, see this blog post on Athena Text Analytics.

Architecture

The UDF used in this lab is built on top of AWS Lambda. The Lambda function invokes Amazon Comprehend APIs to detect language and sentiment and to extract entities from text fields. It also uses Amazon Translate for language translation.
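As a rough illustration of how a query reaches the Lambda function, the sketch below assembles an Athena query that declares an external function and calls it once. The `USING EXTERNAL FUNCTION ... LAMBDA` clause is Athena's UDF syntax. The function name `detect_sentiment`, its two-argument signature, and the helper `build_udf_query` are assumptions made for illustration and should be checked against the deployed UDF. The Lambda name `textanalytics-udf` is the one used by this lab's setup.

```python
def build_udf_query(function_name, arg_types, sql_args, lambda_name="textanalytics-udf"):
    """Return an Athena query that declares one external function and calls it once.

    function_name and arg_types describe the UDF (assumed here, not confirmed
    by this lab); sql_args are SQL expressions passed to the call as-is.
    """
    # Parameter names in the declaration are placeholders (arg0, arg1, ...).
    params = ", ".join(f"arg{i} {t}" for i, t in enumerate(arg_types))
    call_args = ", ".join(sql_args)
    return (
        f"USING EXTERNAL FUNCTION {function_name}({params}) "
        f"RETURNS VARCHAR LAMBDA '{lambda_name}' "
        f"SELECT {function_name}({call_args}) AS result"
    )


# Hypothetical call: sentiment of an English sentence.
query = build_udf_query(
    "detect_sentiment",
    ["VARCHAR", "VARCHAR"],
    ["'I am very happy'", "'en'"],
)
print(query)
```

The printed string can be pasted into the Athena query editor. Building it in a helper keeps the declaration and the call in sync when you try other functions.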

Configuring Athena

If you are doing this lab as part of an AWS event, these configurations have already been applied to your account. For self-paced labs, the advanced AWS CloudFormation template from the previous section configures Athena for text analysis.

Although the configurations are already in place in your environment, the high-level steps for configuring Athena for text analysis are as follows:

  • Install the UDF
    The UDF uses a pre-built Lambda function that is available on GitHub. The TextAnalyticsUDFHandler application from the AWS Serverless Application Repository is configured as a Lambda function called textanalytics-udf. For instructions on configuring the UDF manually, see this blog post.

  • Create a dedicated workgroup within Athena
    The Athena UDF feature requires Athena engine version 2. The CloudFormation template provisions a new workgroup with the version 2 engine. If you are working on the labs without the templates and your workgroup is not on Athena engine version 2, create a new workgroup with the version 2 engine or edit your current workgroup to use it.
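The engine requirement above can be checked programmatically. The following is a minimal sketch, assuming the workgroup details arrive in the shape returned by the Athena `GetWorkGroup` API (for example via boto3's `get_work_group`), with an engine label such as "Athena engine version 2". The helper names and the sample response are hypothetical, and treating versions newer than 2 as acceptable is an assumption.

```python
def engine_major_version(work_group_response):
    """Extract the engine's major version from a GetWorkGroup-style response.

    Returns None when the version cannot be determined.
    """
    config = work_group_response.get("WorkGroup", {}).get("Configuration", {})
    engine = config.get("EngineVersion", {})
    # Prefer the effective version: the selected one may be "AUTO".
    label = engine.get("EffectiveEngineVersion") or engine.get("SelectedEngineVersion") or ""
    last_word = label.rsplit(" ", 1)[-1]
    return int(last_word) if last_word.isdigit() else None


def supports_udfs(work_group_response):
    """True if the workgroup runs engine version 2 or newer (assumed sufficient)."""
    version = engine_major_version(work_group_response)
    return version is not None and version >= 2


# Hypothetical response; in practice it would come from something like
# boto3.client("athena").get_work_group(WorkGroup="<your-workgroup>").
sample_response = {
    "WorkGroup": {
        "Name": "example-workgroup",
        "Configuration": {
            "EngineVersion": {
                "SelectedEngineVersion": "AUTO",
                "EffectiveEngineVersion": "Athena engine version 2",
            }
        },
    }
}
print(supports_udfs(sample_response))
```

Keeping the check as a pure function over the response dictionary makes it easy to try without AWS credentials.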

Follow the steps one by one to complete the lab: