UDF Code and Publish

  1. Open the project

    In the Cloud9 IDE, expand the aws-athena-query-federation project in the left-hand panel and navigate to the AthenaUDFHandler.java file. Double-click the file to open it for editing.
  2. Add UDF Function

    Now we will add the UDF code for a redact-string function, which redacts a string so that only its last 4 characters are shown. This UDF can be used to mask credit card numbers, SSNs, or other sensitive information such as customer names, phone numbers, and addresses.

    The function takes a string as input and returns a redacted string showing only the last 4 characters, e.g. xxxx1234. In a query, the UDF receives a column from the data source as input, processes each value in the Lambda function code, and returns the result to Athena.
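        /**
         * Redact a string to show only the last 4 characters.
         *
         * @param input the string to redact
         * @return the redacted string, e.g. xxxx1234; null or inputs of 4 characters or fewer are returned unchanged
         */
        public String redact(String input)
        {
            // Guard: without this check, null or inputs shorter than 4 characters throw an exception.
            if (input == null || input.length() <= 4) {
                return input;
            }
            int maskedLength = input.length() - 4;
            StringBuilder redacted = new StringBuilder(input.length());
            // Write one 'x' for every character that must be hidden.
            for (int i = 0; i < maskedLength; i++) {
                redacted.append('x');
            }
            // Keep the last 4 characters visible.
            return redacted.append(input.substring(maskedLength)).toString();
        }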
            
    You can copy the code snippet into AthenaUDFHandler.java at line 61, or you can get the modified UDF code by running the following command from the root of the aws-athena-query-federation project (the target path is relative to it):
            
            
            curl https://aws-data-analytics-workshops.s3.amazonaws.com/athena-workshop/scripts/AthenaUDFHandler.java > athena-udfs/src/main/java/com/amazonaws/athena/connectors/udfs/AthenaUDFHandler.java
    
            
        
    Once the file is copied, you can open it in the Cloud9 IDE to see its contents.
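
    To make the redaction behaviour concrete, here is a minimal standalone sketch that can be run outside Lambda. The class name RedactSketch and the helper redactColumn are hypothetical and not part of the workshop code; the sketch also assumes that null values and strings of 4 characters or fewer are returned unchanged.

    ```java
    import java.util.ArrayList;
    import java.util.Arrays;
    import java.util.List;

    // Standalone sketch; RedactSketch and redactColumn are hypothetical names,
    // not part of the workshop's AthenaUDFHandler.java.
    class RedactSketch {
        // Mask everything except the last 4 characters.
        // Assumption: null and inputs of 4 characters or fewer are returned unchanged.
        static String redact(String input) {
            if (input == null || input.length() <= 4) {
                return input;
            }
            int maskedLength = input.length() - 4;
            StringBuilder redacted = new StringBuilder(input.length());
            for (int i = 0; i < maskedLength; i++) {
                redacted.append('x');
            }
            return redacted.append(input.substring(maskedLength)).toString();
        }

        // Simulates per-row application: one redact call per value of the input column.
        static List<String> redactColumn(List<String> columnValues) {
            List<String> result = new ArrayList<>();
            for (String value : columnValues) {
                result.add(redact(value));
            }
            return result;
        }

        public static void main(String[] args) {
            List<String> cardNumbers = Arrays.asList("4111111111111234", "12345678", "987");
            System.out.println(redactColumn(cardNumbers));
            // prints [xxxxxxxxxxxx1234, xxxx5678, 987]
        }
    }
    ```

    Returning short values unchanged keeps the "show only the last 4 characters" rule literal; if even short values must be hidden, the guard could mask them entirely instead.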
  3. Build the JAR File

    Save the file and run mvn clean install to build your project. After a successful build, a JAR file named artifactId-version.jar is created in the target folder of your project, where artifactId is the name defined in the Maven project, for example, athena-udfs.
            
            cd ~/environment/aws-athena-query-federation/athena-udfs/
    
            mvn clean install
        
  4. Deploying the Connector

    From the athena-udfs directory, run ../tools/publish.sh S3_BUCKET_NAME athena-udfs to publish the connector to your private AWS Serverless Application Repository. S3_BUCKET_NAME is the bucket where a copy of the connector's code is stored so that the Serverless Application Repository can retrieve it. Publishing allows users with the required permissions to deploy instances of the connector via a 1-click form.
            
            ../tools/publish.sh S3_BUCKET_NAME athena-udfs
        
    S3_BUCKET_NAME can be obtained from the CloudFormation output key S3Bucket. Once the connector is published successfully, click the link shown in the terminal or, alternatively, navigate to the Serverless Application Repository and click Available Applications -> Private applications.

  5. Move to the next chapter when the connector is successfully deployed.