This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
I received from an AWS adviser the following message:
Skip Compression Analysis During COPY Checks for COPY operations delayed by automatic compression analysis.
Rebuilding uncompressed tables with column encoding would improve the performance of 2,781 recent COPY operations. This analysis checks for COPY operations delayed by automatic compression analysis. COPY performs a compression analysis phase when loading to empty tables without column compression encodings. You can optimize your table definitions to permanently skip this phase without any negative impacts.
Between 2018-10-29 00:00:00 UTC and 2018-11-01 23:33:23 UTC, COPY automatically triggered compression analysis an average of 698 times per day. This impacted 44.7% of all COPY operations during that period, causing an average daily overhead of 2.1 hours. In the worst case, this delayed one COPY by as much as 27.5 minutes.
Implement either of the following two options to improve COPY responsiveness by skipping the compression analysis phase: Use the column ENCODE parameter when creating any tables that will be loaded using COPY. Disable compression altogether by supplying the COMPUPDATE OFF parameter in the COPY command. The optimal solution is to use column encoding during table creation since it also maintains the benefit of storing compressed data on disk. Execute the following SQL command as a superuser in order to identify the recent COPY operations that triggered automatic compression analysis: WITH xids AS ( SELECT xid FROM stl_query WHERE userid>1 AND aborted=0 AND querytxt = 'analyze compression phase 1' GROUP BY xid) SELECT query, starttime, complyze_sec, copy_sec, copy_sql FROM (SELECT query, xid, DATE_TRUNC('s',starttime) starttime, SUBSTRING(querytxt,1,60) copy_sql, ROUND(DATEDIFF(ms,starttime,endtime)::numeric / 1000.0, 2) copy_sec FROM stl_query q JOIN xids USING (xid) WHERE querytxt NOT LIKE 'COPY ANALYZE %' AND (querytxt ILIKE 'copy %from%' OR querytxt ILIKE '% copy %from%')) a LEFT JOIN (SELECT xid, ROUND(SUM(DATEDIFF(ms,starttime,endtime))::NUMERIC / 1000.0,2) complyze_sec FROM stl_query q JOIN xids USING (xid) WHERE (querytxt LIKE 'COPY ANALYZE %' OR querytxt LIKE 'analyze compression phase %') GROUP BY xid ) b USING (xid) WHERE complyze_sec IS NOT NULL ORDER BY copy_sql, starttime;
Estimate the expected lifetime size of the table being loaded for each of the COPY commands identified by the SQL command. If you are confident that the table will remain under 10,000 rows, disable compression altogether with the COMPUPDATE OFF parameter. Otherwise, create the table with explicit compression prior to loading with COPY.
When I run the suggested query to check the COPY commands executed I realized all belonged to the Redshift bulk output from Alteryx.
Is there any way to implement this “Skip Compression Analysis During COPY” in alteryx to maximize performance as suggested by AWS?
As of right now Alteryx creates the query, so this would have to be a feature request. By default, the COPY command applies automatic compression whenever you run the COPY command with an empty target table, all of the table columns either have RAW encoding or no encoding.