I'm stuck near the finish line on a proof of concept (POC) meant to streamline AWS S3 data loads.
The goal is to:
1) Parse the files and load them into different AWS S3 buckets, which will be queried through Athena
2) Create external tables in Athena for those files from the workflow
3) Dynamically run a script that loads partitions into the newly created Athena tables
So far, I have been able to parse and load the files to S3 and to generate scripts that can be run in Athena to create the tables and load the partitions. The major issue now is that the Dynamic Input tool, which lets me run Athena queries through the Simba Athena ODBC driver, will not let me run any DDL statements.
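One possible workaround, assuming the workflow can run Python (for example through Alteryx's Python tool) with AWS credentials available, is to bypass the ODBC driver and submit each statement through Athena's `StartQueryExecution` API, which accepts DDL. The sketch below is not a drop-in solution: the `client` argument is assumed to behave like `boto3.client("athena")`, and the database name and S3 results location are placeholders you would supply. Because Athena runs statements asynchronously, the function polls `GetQueryExecution` until the query reaches a terminal state.

```python
import time
from typing import Any

# Terminal states reported by Athena's GetQueryExecution API.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_statement(
    client: Any,
    sql: str,
    database: str,
    output_location: str,
    poll_seconds: float = 1.0,
    timeout_seconds: float = 300.0,
) -> str:
    """Submit one statement (DDL included) and block until it finishes.

    `client` is assumed to behave like boto3.client("athena").
    Returns the query execution id. Raises RuntimeError if the query
    fails or is cancelled, and TimeoutError if it never finishes.
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    query_id = response["QueryExecutionId"]

    deadline = time.monotonic() + timeout_seconds
    while True:
        status = client.get_query_execution(QueryExecutionId=query_id)[
            "QueryExecution"
        ]["Status"]
        state = status["State"]
        if state in TERMINAL_STATES:
            break
        if time.monotonic() > deadline:
            raise TimeoutError(f"Query {query_id} still {state} after {timeout_seconds}s")
        time.sleep(poll_seconds)  # wait before polling again

    if state != "SUCCEEDED":
        reason = status.get("StateChangeReason", "no reason given")
        raise RuntimeError(f"Query {query_id} ended in {state}: {reason}")
    return query_id
```

In practice you would call it as `run_athena_statement(boto3.client("athena"), ddl, "my_db", "s3://my-bucket/athena-results/")`, where `my_db` and the bucket are hypothetical. Since the daily workflow already uses the AWS CLI, note that `aws athena start-query-execution` wraps the same API.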
I have designed an app that builds the queries needed to load the files into Athena from a configuration table we maintain. The app will be used as a one-time setup to create the schema. After that, the files will be loaded every day to the same S3 bucket by a separate workflow that uses the AWS CLI instead of the native S3 Upload connector, and we run ALTER TABLE ... ADD PARTITION scripts to refresh the mapping between S3 and Athena. The app does not have any input data.
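For reference, generating the one-time `CREATE EXTERNAL TABLE` statements from a configuration table can be sketched as follows. The `TableConfig` fields (column list, partition columns, S3 location, delimiter) are a hypothetical layout for one configuration row, and the output assumes plain delimited text files; a real configuration table may carry other options such as a SerDe or header-skipping properties.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TableConfig:
    """One row of a hypothetical configuration table."""
    database: str
    table: str
    columns: List[Tuple[str, str]]  # (column name, Athena/Hive type)
    partition_columns: List[Tuple[str, str]]  # may be empty
    s3_location: str  # e.g. s3://bucket/prefix/
    delimiter: str = ","


def build_create_table(cfg: TableConfig) -> str:
    """Render a CREATE EXTERNAL TABLE statement for delimited text files."""
    cols = ",\n  ".join(f"`{name}` {dtype}" for name, dtype in cfg.columns)
    ddl = (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS `{cfg.database}`.`{cfg.table}` (\n"
        f"  {cols}\n"
        ")"
    )
    # Partition columns are declared separately and must not repeat data columns.
    if cfg.partition_columns:
        parts = ", ".join(f"`{name}` {dtype}" for name, dtype in cfg.partition_columns)
        ddl += f"\nPARTITIONED BY ({parts})"
    ddl += (
        "\nROW FORMAT DELIMITED"
        f"\nFIELDS TERMINATED BY '{cfg.delimiter}'"
        f"\nLOCATION '{cfg.s3_location}'"
    )
    return ddl
```

`IF NOT EXISTS` keeps the setup idempotent, so rerunning the one-time app does not fail on tables that already exist.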
I also need to run the partition scripts (ALTER TABLE ... ADD PARTITION), which the Dynamic Input tool does not seem to support either. I tried the solution provided, but it doesn't work in my case.
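The daily partition refresh can be generated the same way. The sketch below assumes the files land under Hive-style `key=value` prefixes (for example `s3://my-bucket/orders/load_date=2024-01-31/`, a hypothetical path) and renders one `ALTER TABLE ... ADD IF NOT EXISTS PARTITION` statement per load. The statement still has to be submitted outside the Dynamic Input tool, for example through Athena's `StartQueryExecution` API. If the prefixes are Hive-style, `MSCK REPAIR TABLE` is another option, although it scans the whole table location and tends to get slower as partitions accumulate.

```python
from typing import Dict


def build_add_partition(
    database: str, table: str, s3_root: str, partition: Dict[str, str]
) -> str:
    """Render an ALTER TABLE ... ADD IF NOT EXISTS PARTITION statement.

    Assumes Hive-style key=value prefixes under s3_root, in the same
    order as the table's PARTITIONED BY columns.
    """
    spec = ", ".join(f"{key} = '{value}'" for key, value in partition.items())
    path = "/".join(f"{key}={value}" for key, value in partition.items())
    location = s3_root.rstrip("/") + "/" + path + "/"
    return (
        f"ALTER TABLE `{database}`.`{table}` ADD IF NOT EXISTS "
        f"PARTITION ({spec}) LOCATION '{location}'"
    )
```

For a single daily load, `build_add_partition("sales_db", "orders", "s3://my-bucket/orders/", {"load_date": "2024-01-31"})` yields a statement pointing at `s3://my-bucket/orders/load_date=2024-01-31/`; `IF NOT EXISTS` makes reruns of the same day harmless.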