To create a denormalized table, we are going to run a job on Amazon EMR.
Amazon EMR is a managed cluster platform that you can run with just a few machines, as in this workshop, or scale to tens to thousands of machines. Consider using Spot Instances for batch processing and terminate your clusters when you are not using them. It is also recommended to store job results on Amazon S3.
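As a point of reference, a transient cluster that follows this advice can also be created programmatically. The sketch below uses boto3; the cluster name, release label, instance types, roles, and log bucket are illustrative assumptions, not the workshop's actual configuration (the workshop cluster, c360cluster, is already provisioned for you).

```python
import boto3

emr = boto3.client("emr")

# Hypothetical transient cluster: Spot task nodes for batch work and
# automatic termination once all steps finish (results live on S3, not the cluster).
emr.run_job_flow(
    Name="c360-batch",                        # assumed name
    ReleaseLabel="emr-6.15.0",                # assumed EMR release
    Applications=[{"Name": "Spark"}],
    LogUri="s3://your-log-bucket/emr-logs/",  # assumed log bucket
    JobFlowRole="EMR_EC2_DefaultRole",
    ServiceRole="EMR_DefaultRole",
    Instances={
        "InstanceGroups": [
            {"Name": "primary", "InstanceRole": "MASTER",
             "InstanceType": "m5.xlarge", "InstanceCount": 1},
            {"Name": "core", "InstanceRole": "CORE",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
            {"Name": "task-spot", "InstanceRole": "TASK", "Market": "SPOT",
             "InstanceType": "m5.xlarge", "InstanceCount": 2},
        ],
        # False = terminate the cluster when there are no more steps to run.
        "KeepJobFlowAliveWhenNoSteps": False,
    },
)
```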
Step 1: Go to the EMR console.
Step 2: Click on c360cluster.
Step 3: Click on the Steps tab.
Step 4: Click Add step and configure it as follows:

- Application location: use the bucket browser to select **your_stage_bucket**/library/c360_analytics.py.
- Arguments: --BucketName **your analytics bucket** (pick the name from the Amazon S3 console). Leave a space between --BucketName and your bucket name, and do not include the s3:// prefix.

Then, click Add.
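If you prefer to script this instead of using the console, the same step can be submitted with boto3. This is a sketch only: the cluster ID, bucket names, and spark-submit options below are placeholder assumptions you would replace with your own values.

```python
import boto3

emr = boto3.client("emr")

# Submit the PySpark script as a step on the existing cluster.
response = emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",  # ID of c360cluster, from the EMR console
    Steps=[
        {
            "Name": "c360_analytics",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--deploy-mode", "cluster",
                    "s3://your_stage_bucket/library/c360_analytics.py",
                    "--BucketName", "your-analytics-bucket",  # no s3:// prefix
                ],
            },
        }
    ],
)
print(response["StepIds"])
```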
Step 5: Check the job status; it will go from Pending to Running.
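You can also poll the step status programmatically instead of watching the console; a brief sketch (the cluster ID is a placeholder):

```python
import boto3

emr = boto3.client("emr")

# Each step reports a state such as PENDING, RUNNING, COMPLETED, or FAILED.
for step in emr.list_steps(ClusterId="j-XXXXXXXXXXXXX")["Steps"]:
    print(step["Name"], step["Status"]["State"])
```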
After completion, the job will have created a denormalized table using PySpark.
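For context, the sketch below shows the general shape of such a PySpark denormalization job. It is not the contents of c360_analytics.py: the source table and column names are made up for illustration, and only the --BucketName argument mirrors the step you just configured.

```python
import argparse

from pyspark.sql import SparkSession

# Parse the --BucketName argument passed to the step.
parser = argparse.ArgumentParser()
parser.add_argument("--BucketName", required=True)
args = parser.parse_args()

spark = (SparkSession.builder
         .appName("c360_analytics")
         .enableHiveSupport()   # read source tables from the data catalog
         .getOrCreate())

# Hypothetical source tables; the real job's inputs and joins will differ.
customers = spark.table("c360view_stage.customers")
transactions = spark.table("c360view_stage.transactions")

# Denormalize: join the narrow tables into one wide table.
denormalized = customers.join(transactions, on="customer_id", how="left")

# Store the result on Amazon S3 as Parquet so it outlives the cluster.
denormalized.write.mode("overwrite").parquet(f"s3://{args.BucketName}/c360denormalized/")

spark.stop()
```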
Step 6: Go to the Lake Formation console and select the c360denormalized table from the c360view_analytics database.
Step 7: Grant access on it to your user or role.
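Grants can also be issued through the Lake Formation API. A minimal sketch with boto3, assuming the database and table names from this workshop and a placeholder principal ARN:

```python
import boto3

lakeformation = boto3.client("lakeformation")

# Grant SELECT on the new table to a user or role (placeholder ARN).
lakeformation.grant_permissions(
    Principal={
        "DataLakePrincipalIdentifier": "arn:aws:iam::123456789012:role/your-analyst-role"
    },
    Resource={
        "Table": {
            "DatabaseName": "c360view_analytics",
            "Name": "c360denormalized",
        }
    },
    Permissions=["SELECT"],
)
```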
Step 8: Go to the Athena console and check the new c360denormalized table in the c360view_analytics database.
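If you want to verify the table from code rather than the Athena console, you could run a quick query through boto3; the query results location below is a placeholder:

```python
import boto3

athena = boto3.client("athena")

# Run a quick sanity-check query against the new table.
athena.start_query_execution(
    QueryString="SELECT * FROM c360denormalized LIMIT 10",
    QueryExecutionContext={"Database": "c360view_analytics"},
    ResultConfiguration={"OutputLocation": "s3://your-athena-results-bucket/"},
)
```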