Detect language for each Amazon review

In this section of the lab, we will use the textanalytics-udf user-defined function (UDF) to detect the language of a text column. To do this, we will create a new table from our base table that adds one column, language.

The create script for the new table is available in the “Saved queries” section of your workgroup. You can either run it from the saved queries or execute it manually.

Please follow the appropriate link below.

Instructions to execute saved query

Instructions to execute manually
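If you take the manual route, a CREATE TABLE AS SELECT (CTAS) statement along the lines below is one way to build the table. Treat it as a sketch under assumptions, not as the saved query itself. The base table name default.amazon_reviews_parquet is a hypothetical placeholder for your lab's base table. The UDF signature (one VARCHAR in, one VARCHAR out) is also assumed. LIMIT 2000 mirrors the 2,000 reviews processed in this lab. The saved query in your workgroup remains the authoritative version.

```sql
-- Sketch only: compare with the saved query in your workgroup before running.
CREATE TABLE default.amazon_reviews_with_language
WITH (format = 'PARQUET') AS
-- Declare the Lambda-backed UDF; the declaration applies to this query only.
USING EXTERNAL FUNCTION detect_dominant_language(text_col VARCHAR)
    RETURNS VARCHAR
    LAMBDA 'textanalytics-udf'
-- Copy every base-table column and add the detected language.
SELECT *,
       detect_dominant_language(review_body) AS language
FROM default.amazon_reviews_parquet  -- hypothetical base table name
LIMIT 2000;
```

Bounding the row count with LIMIT keeps the number of Lambda and Amazon Comprehend calls, and therefore the cost, small while you experiment.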

The above step creates the new table, amazon_reviews_with_language. Run the following query to list every detected language, sorted by the number of reviews in each.
 
SELECT language, count(*) AS count FROM default.amazon_reviews_with_language GROUP BY language ORDER BY count DESC;

What just happened

We invoked the textanalytics-udf Lambda function through its detect_dominant_language function to detect the language of 2,000 values in the review_body column of the base table. For each value, the Lambda function called Amazon Comprehend to detect the dominant language. The results were then written to a new column named language.
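To see the UDF working on its own, without creating a table, you can call it on a literal string. This is a minimal sketch that assumes the UDF takes one VARCHAR and returns the detected language code as a VARCHAR. The French sample sentence is purely illustrative.

```sql
-- Declare the Lambda-backed UDF for this query, then call it on a literal value.
USING EXTERNAL FUNCTION detect_dominant_language(text_col VARCHAR)
    RETURNS VARCHAR
    LAMBDA 'textanalytics-udf'
SELECT detect_dominant_language('il fait beau à Orlando') AS language;
```

The result should be a language code, the same kind of value that appears in the language column of amazon_reviews_with_language.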