Be sure to review our Idea Submission Guidelines for more information!
Submission GuidelinesHello All,
I received from an AWS adviser the following message:
_____________________________________________
Skip Compression Analysis During COPY
Checks for COPY operations delayed by automatic compression analysis.
Rebuilding uncompressed tables with column encoding would improve the performance of 2,781 recent COPY operations.
This analysis checks for COPY operations delayed by automatic compression analysis. COPY performs a compression analysis phase when loading to empty tables without column compression encodings. You can optimize your table definitions to permanently skip this phase without any negative impacts.
Observation
Between 2018-10-29 00:00:00 UTC and 2018-11-01 23:33:23 UTC, COPY automatically triggered compression analysis an average of 698 times per day. This impacted 44.7% of all COPY operations during that period, causing an average daily overhead of 2.1 hours. In the worst case, this delayed one COPY by as much as 27.5 minutes.
Recommendation
Implement either of the following two options to improve COPY responsiveness by skipping the compression analysis phase:
Use the column ENCODE parameter when creating any tables that will be loaded using COPY.
Disable compression altogether by supplying the COMPUPDATE OFF parameter in the COPY command.
The optimal solution is to use column encoding during table creation since it also maintains the benefit of storing compressed data on disk. Execute the following SQL command as a superuser in order to identify the recent COPY operations that triggered automatic compression analysis:
WITH xids AS (
SELECT xid FROM stl_query WHERE userid>1 AND aborted=0
AND querytxt = 'analyze compression phase 1' GROUP BY xid)
SELECT query, starttime, complyze_sec, copy_sec, copy_sql
FROM (SELECT query, xid, DATE_TRUNC('s',starttime) starttime,
SUBSTRING(querytxt,1,60) copy_sql,
ROUND(DATEDIFF(ms,starttime,endtime)::numeric / 1000.0, 2) copy_sec
FROM stl_query q JOIN xids USING (xid)
WHERE querytxt NOT LIKE 'COPY ANALYZE %'
AND (querytxt ILIKE 'copy %from%' OR querytxt ILIKE '% copy %from%')) a
LEFT JOIN (SELECT xid,
ROUND(SUM(DATEDIFF(ms,starttime,endtime))::NUMERIC / 1000.0,2) complyze_sec
FROM stl_query q JOIN xids USING (xid)
WHERE (querytxt LIKE 'COPY ANALYZE %'
OR querytxt LIKE 'analyze compression phase %') GROUP BY xid ) b USING (xid)
WHERE complyze_sec IS NOT NULL ORDER BY copy_sql, starttime;
Estimate the expected lifetime size of the table being loaded for each of the COPY commands identified by the SQL command. If you are confident that the table will remain under 10,000 rows, disable compression altogether with the COMPUPDATE OFF parameter. Otherwise, create the table with explicit compression prior to loading with COPY.
_____________________________________________
When I run the suggested query to check the COPY commands executed I realized all belonged to the Redshift bulk output from Alteryx.
Is there any way to implement this “Skip Compression Analysis During COPY” in alteryx to maximize performance as suggested by AWS?
Thank you in advance,
Gabriel
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.