Skip Compression Analysis During Redshift tables COPY
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello All,
I received from an AWS adviser the following message:
_____________________________________________
Skip Compression Analysis During COPY
Checks for COPY operations delayed by automatic compression analysis.
Rebuilding uncompressed tables with column encoding would improve the performance of 2,781 recent COPY operations.
This analysis checks for COPY operations delayed by automatic compression analysis. COPY performs a compression analysis phase when loading to empty tables without column compression encodings. You can optimize your table definitions to permanently skip this phase without any negative impacts.
Observation
Between 2018-10-29 00:00:00 UTC and 2018-11-01 23:33:23 UTC, COPY automatically triggered compression analysis an average of 698 times per day. This impacted 44.7% of all COPY operations during that period, causing an average daily overhead of 2.1 hours. In the worst case, this delayed one COPY by as much as 27.5 minutes.
Recommendation
Implement either of the following two options to improve COPY responsiveness by skipping the compression analysis phase:
Use the column ENCODE parameter when creating any tables that will be loaded using COPY.
Disable compression altogether by supplying the COMPUPDATE OFF parameter in the COPY command.
The optimal solution is to use column encoding during table creation since it also maintains the benefit of storing compressed data on disk. Execute the following SQL command as a superuser in order to identify the recent COPY operations that triggered automatic compression analysis:
WITH xids AS (
SELECT xid FROM stl_query WHERE userid>1 AND aborted=0
AND querytxt = 'analyze compression phase 1' GROUP BY xid)
SELECT query, starttime, complyze_sec, copy_sec, copy_sql
FROM (SELECT query, xid, DATE_TRUNC('s',starttime) starttime,
SUBSTRING(querytxt,1,60) copy_sql,
ROUND(DATEDIFF(ms,starttime,endtime)::numeric / 1000.0, 2) copy_sec
FROM stl_query q JOIN xids USING (xid)
WHERE querytxt NOT LIKE 'COPY ANALYZE %'
AND (querytxt ILIKE 'copy %from%' OR querytxt ILIKE '% copy %from%')) a
LEFT JOIN (SELECT xid,
ROUND(SUM(DATEDIFF(ms,starttime,endtime))::NUMERIC / 1000.0,2) complyze_sec
FROM stl_query q JOIN xids USING (xid)
WHERE (querytxt LIKE 'COPY ANALYZE %'
OR querytxt LIKE 'analyze compression phase %') GROUP BY xid ) b USING (xid)
WHERE complyze_sec IS NOT NULL ORDER BY copy_sql, starttime;
Estimate the expected lifetime size of the table being loaded for each of the COPY commands identified by the SQL command. If you are confident that the table will remain under 10,000 rows, disable compression altogether with the COMPUPDATE OFF parameter. Otherwise, create the table with explicit compression prior to loading with COPY.
_____________________________________________
When I run the suggested query to check the COPY commands executed I realized all belonged to the Redshift bulk output from Alteryx.
Is there any way to implement this “Skip Compression Analysis During COPY” in alteryx to maximize performance as suggested by AWS?
Thank you in advance,
Gabriel
Solved! Go to Solution.
- Labels:
- Output
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello @bbi-solutions,
As of right now Alteryx creates the query, so this would have to be a feature request. By default, the COPY command applies automatic compression whenever you run the COPY command with an empty target table, all of the table columns either have RAW encoding or no encoding.
To request this, please head on over to: Product Ideas
Thanks!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
