Hi!
I am kind of stuck at the end of the tunnel here for a POC meant to streamline AWS S3 data loads.
The goal is to,
1) Parse and load files to AWS S3 into different buckets which will be queried through Athena
2) Create external tables in Athena from the workflow for the files
3) Load partitions by running a script dynamically to load partitions in the newly created Athena tables
So far, I was able to parse and load file to S3 and generate scripts that can be run on Athena to create tables and load partitions. The major issue now is that the Dynamic Input module which allows me to run Athena queries through a Simba Athena ODBC driver will not allow me to run any DDL operations.
Following is the error received...
Dynamic Input (3) Error SQLPrepare: [Simba][Athena] (1040) An error has been thrown from the AWS Athena client. Athena Error No: 130, HTTP Response Code: 400, Exception Name: InvalidRequestException, Error Message: line 1:30: extraneous input 'CREATE' expecting {'(', 'ADD', 'ALL', 'SOME', 'ANY', 'AT', 'NO', 'SUBSTRING', 'POSITION', 'TINYINT', 'SMALLINT', 'INTEGER', 'DATE', 'TIME', 'TIMESTAMP', 'INTERVAL', 'YEAR', 'MONTH', 'DAY', 'HOUR', 'MINUTE', 'SECOND', 'ZONE', 'FILTER', 'OVER', 'PARTITION', 'RANGE', 'ROWS', 'PRECEDING', 'FOLLOWING', 'CURRENT', 'ROW', 'SCHEMA', 'COMMENT', 'VIEW', 'REPLACE', 'GRANT', 'REVOKE', 'PRIVILEGES', 'PUBLIC', 'OPTION', 'EXPLAIN', 'ANALYZE', 'FORMAT', 'TYPE', 'TEXT', 'GRAPHVIZ', 'LOGICAL', 'DISTRIBUTED', 'VALIDATE', 'SHOW', 'TABLES', 'VIEWS', 'SCHEMAS', 'CATALOGS', 'COLUMNS', 'COLUMN', 'USE', 'PARTITIONS', 'FUNCTIONS', 'TO', 'SYSTEM', 'BERNOULLI', 'POISSONIZED', 'TABLESAMPLE', 'UNNEST', 'ARRAY', 'MAP', 'SET', 'RESET', 'SESSION', 'DATA', 'START', 'TRANSACTION', 'COMMIT', 'ROLLBACK', 'WORK', 'ISOLATION', 'LEVEL', 'SERI
Here is the query syntax I have that works fine in Athena but not through the Dynamic Input in Alteryx.
CREATE EXTERNAL TABLE IF NOT EXISTS athenadbname.athenatblname (col_one string,col_two string,col_three string) PARTITIONED BY (date string) ROW FORMAT SERDE 'org.apache.hadoop.hive.ql.io.parquet.serde.ParquetHiveSerDe' STORED AS INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' LOCATION 's3://bucket/athenatblname' TBLPROPERTIES ('parquet.compress'='gzip')
Any help is much appreciated!
Thank you.
Solved! Go to Solution.
The CREATE statement only works as a pre or post sql statement, and it also looks like you want to be outputting data, not inputting it (so Dynamic Output...if there was such a Tool).
What you could do is something like this
If you wanted to run multiple queries, you would just make a batch macro that updates the Output Tool
Cheers,
Mike
Thank you Michael for responding.
I actually have designed an app which builds a query based on a configuration table we have to load the files in Athena.. This app will be used as a one time setup to create a schema. The files will be loaded everyday to the same S3 bucket from a separate workflow which uses AWS CLI instead of the native S3 Upload connector. We run ALTER PARTITION scripts to refresh the mapping between S3 and Athena thereafter. The app does not have any input data.
Additionally, I also need to run ALTER PARTITION scripts which is also not supported by Dynamic Input tool it seems. I did try using the provided solution, but doesn't work in my case.
 
					
				
				
			
		
