Now you are going to perform more advanced transformations using AWS Glue jobs.
Step 1: Go to the AWS Glue jobs console and select n1_c360_dispositions, a PySpark job.
The transformation inside this job joins three tables, general banking, account, and card, to calculate the disposition type and acquisition information.
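As a rough sketch of that join logic, plain Python dictionaries can stand in for the Spark tables (the table and column names below are assumptions for illustration, not the actual workshop schema):

```python
# Illustrative sketch only: the real job performs this join in PySpark
# inside AWS Glue. Table and column names are assumed for illustration.
general_banking = [
    {"account_id": 1, "client_id": 10},
    {"account_id": 2, "client_id": 20},
]
account = [
    {"account_id": 1, "disp_type": "OWNER"},
    {"account_id": 2, "disp_type": "DISPONENT"},
]
card = [
    {"account_id": 1, "card_type": "gold"},
]

# Index the smaller tables by the join key.
account_by_id = {row["account_id"]: row for row in account}
card_by_id = {row["account_id"]: row for row in card}

# Left-join general banking with account and card on account_id.
dispositions = []
for row in general_banking:
    acct = account_by_id.get(row["account_id"], {})
    crd = card_by_id.get(row["account_id"], {})
    dispositions.append({
        **row,
        "disp_type": acct.get("disp_type"),
        "card_type": crd.get("card_type"),  # None when the account has no card
    })
```

In the actual job the same three-way join is expressed with Spark DataFrame joins, so it scales beyond what fits in memory.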
Step 2: Click on Edit job.
Step 3: Change the Glue version to Spark 2.4, Python 3 with improved startup times (Glue Version 2.0).
Step 4: Select your stage S3 bucket, c360view-us-west-2-your_account_id-stage + '/tmp/', as the Temporary directory.
Step 5: Select n1_c360_dispositions and click Action, then Run job.
Step 6: Wait for completion.
Step 7: Check the script and logs.
Notice that this Python script converts a query result from Amazon Athena to a pandas DataFrame, applies the pandas transformation, and then writes the result as Parquet files to Amazon S3 using a Spark write operation.
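A minimal sketch of that pattern (the Athena query and the Spark write are stubbed out, since they require a Glue environment; the column names are assumptions):

```python
import pandas as pd

# In the real job, this DataFrame comes from an Amazon Athena query result;
# here it is stubbed with inline data for illustration.
athena_result = pd.DataFrame({
    "account_id": [1, 1, 2],
    "amount": [100.0, 50.0, 75.0],
})

# Example pandas transformation: total amount per account.
totals = athena_result.groupby("account_id", as_index=False)["amount"].sum()

# In the Glue job, the pandas result is handed back to Spark and written
# as Parquet to the stage bucket, roughly:
#   spark.createDataFrame(totals).write.parquet("s3://<stage-bucket>/out/")
```

Doing the heavy aggregation in pandas is fine for small query results; the Spark write step is what lands the data in S3 as Parquet for downstream jobs.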
Step 8: Now select Jobcust360etlmftrans, another PySpark job, then click Action and Edit job.
Step 9: Change the Glue version to Spark 2.4, Python 3 with improved startup times (Glue Version 2.0), and select your stage S3 bucket, c360view-us-west-2-your_account_id-stage + '/tmp/', as the Temporary directory.
Step 10: With Jobcust360etlmftrans still selected, click Action, then Run job.
Step 11: Wait for completion.
Step 12: Check the script and logs.
In this PySpark script we aggregate the transactions from the relational database by account_id over the last 3 months and the last 6 months. For this we used AWS Glue DynamicFrames.
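The windowed aggregation can be sketched like this, with plain Python standing in for the DynamicFrame filter and aggregation (field names, the reference date, and the 30-day month approximation are assumptions for illustration):

```python
from datetime import date

# Stub transaction records; in the job these come from a Glue DynamicFrame
# built over the relational database tables.
transactions = [
    {"account_id": 1, "tx_date": date(2020, 6, 15), "amount": 100.0},
    {"account_id": 1, "tx_date": date(2020, 3, 10), "amount": 40.0},
    {"account_id": 1, "tx_date": date(2019, 12, 1), "amount": 25.0},
]

def sum_last_months(rows, account_id, months, today=date(2020, 7, 1)):
    """Sum amounts for an account over the last `months` months.

    Approximates a month as 30 days for illustration; the real job
    expresses this as a filter plus aggregation on the DynamicFrame.
    """
    cutoff_days = months * 30
    return sum(
        r["amount"]
        for r in rows
        if r["account_id"] == account_id
        and (today - r["tx_date"]).days <= cutoff_days
    )

last_3m = sum_last_months(transactions, account_id=1, months=3)  # 100.0
last_6m = sum_last_months(transactions, account_id=1, months=6)  # 140.0
```

Computing both windows per account_id is exactly the shape of the output this job writes: one row per account with 3-month and 6-month transaction aggregates.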