To create a denormalized table we are going to run a job on Amazon EMR.
Amazon EMR is a managed cluster platform that you can size with a few machines, as in this workshop, or scale to tens or thousands of machines. Consider using Spot Instances for batch processing, and terminate your clusters when you are not using them. It is also recommended to store job results on Amazon S3.
Step 1: Go to the EMR console.
Step 2: Click on c360cluster.
Step 3: Click on the Steps tab.
Step 4: Click Add step.
Use the bucket browser to select the application location.
**your analytics bucket**: pick the name from the Amazon S3 console. Leave a space between --BucketName and your bucket name, without s3://.
Then, click on Add.
Step 5: Check the job status as it goes from Pending to Running.
Once the job completes, it has created a denormalized table using PySpark.
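The console actions in Steps 4 and 5 can also be scripted. The following is a minimal sketch, not the workshop's own tooling, of how a step could be added and monitored with a boto3 EMR client. It assumes the step runs `spark-submit` through `command-runner.jar` in cluster deploy mode; the script location, the step name, and the helper names `build_step` and `submit_and_wait` are hypothetical and chosen only for illustration.

```python
import time


def build_step(script_s3_uri, bucket_name, name="c360-denormalize"):
    """Build an EMR step definition that runs a PySpark script.

    Assumption: the job accepts the bucket via --BucketName, as in the
    console instructions, with the bucket name given without "s3://".
    """
    # Strip an accidental "s3://" prefix so the argument is a bare bucket name.
    if bucket_name.startswith("s3://"):
        bucket_name = bucket_name[len("s3://"):]
    return {
        "Name": name,  # hypothetical step name
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",  # assumed deploy mode
                script_s3_uri,
                "--BucketName", bucket_name,
            ],
        },
    }


def submit_and_wait(emr, cluster_id, step, poll_seconds=30):
    """Add the step to the cluster and poll until it reaches a final state."""
    response = emr.add_job_flow_steps(JobFlowId=cluster_id, Steps=[step])
    step_id = response["StepIds"][0]
    while True:
        description = emr.describe_step(ClusterId=cluster_id, StepId=step_id)
        state = description["Step"]["Status"]["State"]
        # PENDING and RUNNING are transient; the states below are final.
        if state in ("COMPLETED", "CANCELLED", "FAILED", "INTERRUPTED"):
            return state
        time.sleep(poll_seconds)
```

To try it, create a client with `boto3.client("emr")`, build a step from the script location you would otherwise pick in the bucket browser, and pass the ID of c360cluster as `cluster_id`. A return value of `COMPLETED` corresponds to the completed status shown in the console.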
Step 6: Go to the Lake Formation console and select the c360denormalized table from
Step 7: Grant access to it to your
Step 8: Go to the Athena console and check the new c360denormalized table in the c360view_analytics database.
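The same check can be run outside the console. The sketch below is a hedged example that previews a few rows of the table through a boto3 Athena client. The helper name `preview_table` and the query result location are assumptions for illustration, and the caller needs the Lake Formation access granted in Step 7.

```python
import time


def preview_table(athena, database, table, output_location, limit=10, poll_seconds=2):
    """Run a small SELECT against the table and return (final_state, rows).

    output_location is an S3 URI where Athena writes query results; the
    value to use depends on your account setup (assumption).
    """
    query = f'SELECT * FROM "{table}" LIMIT {int(limit)}'
    execution = athena.start_query_execution(
        QueryString=query,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = execution["QueryExecutionId"]

    # Poll until the query leaves the QUEUED/RUNNING states.
    while True:
        status = athena.get_query_execution(QueryExecutionId=query_id)
        state = status["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    if state != "SUCCEEDED":
        return state, []

    results = athena.get_query_results(QueryExecutionId=query_id)
    # For a SELECT, the first returned row holds the column names.
    rows = [
        [cell.get("VarCharValue") for cell in row["Data"]]
        for row in results["ResultSet"]["Rows"]
    ]
    return state, rows
```

For example, with `athena = boto3.client("athena")`, calling `preview_table(athena, "c360view_analytics", "c360denormalized", "<your query results S3 URI>")` should return `SUCCEEDED` and a list of rows if the table exists and the permissions are in place.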