Alteryx Server Discussions

Find answers, ask questions, and share expertise about Alteryx Server.
SOLVED

Skip Compression Analysis During Redshift tables COPY

bbi-solutions
5 - Atom

Hello All,

I received from an AWS adviser the following message:

_____________________________________________

Skip Compression Analysis During COPY
Checks for COPY operations delayed by automatic compression analysis.

Rebuilding uncompressed tables with column encoding would improve the performance of 2,781 recent COPY operations.
This analysis checks for COPY operations delayed by automatic compression analysis. COPY performs a compression analysis phase when loading to empty tables without column compression encodings. You can optimize your table definitions to permanently skip this phase without any negative impacts.

Observation

Between 2018-10-29 00:00:00 UTC and 2018-11-01 23:33:23 UTC, COPY automatically triggered compression analysis an average of 698 times per day. This impacted 44.7% of all COPY operations during that period, causing an average daily overhead of 2.1 hours. In the worst case, this delayed one COPY by as much as 27.5 minutes.

Recommendation

Implement either of the following two options to improve COPY responsiveness by skipping the compression analysis phase:
Use the column ENCODE parameter when creating any tables that will be loaded using COPY.
Disable compression altogether by supplying the COMPUPDATE OFF parameter in the COPY command.
The optimal solution is to use column encoding during table creation since it also maintains the benefit of storing compressed data on disk. Execute the following SQL command as a superuser in order to identify the recent COPY operations that triggered automatic compression analysis:
WITH xids AS (
SELECT xid FROM stl_query WHERE userid>1 AND aborted=0
AND querytxt = 'analyze compression phase 1' GROUP BY xid)
SELECT query, starttime, complyze_sec, copy_sec, copy_sql
FROM (SELECT query, xid, DATE_TRUNC('s',starttime) starttime,
SUBSTRING(querytxt,1,60) copy_sql,
ROUND(DATEDIFF(ms,starttime,endtime)::numeric / 1000.0, 2) copy_sec
FROM stl_query q JOIN xids USING (xid)
WHERE querytxt NOT LIKE 'COPY ANALYZE %'
AND (querytxt ILIKE 'copy %from%' OR querytxt ILIKE '% copy %from%')) a
LEFT JOIN (SELECT xid,
ROUND(SUM(DATEDIFF(ms,starttime,endtime))::NUMERIC / 1000.0,2) complyze_sec
FROM stl_query q JOIN xids USING (xid)
WHERE (querytxt LIKE 'COPY ANALYZE %'
OR querytxt LIKE 'analyze compression phase %') GROUP BY xid ) b USING (xid)
WHERE complyze_sec IS NOT NULL ORDER BY copy_sql, starttime;

Estimate the expected lifetime size of the table being loaded for each of the COPY commands identified by the SQL command. If you are confident that the table will remain under 10,000 rows, disable compression altogether with the COMPUPDATE OFF parameter. Otherwise, create the table with explicit compression prior to loading with COPY.

_____________________________________________

 

When I run the suggested query to check the COPY commands executed I realized all belonged to the Redshift bulk output from Alteryx.

 

Is there any way to implement this “Skip Compression Analysis During COPY” in alteryx to maximize performance as suggested by AWS?

 

Thank you in advance,

 

Gabriel

2 REPLIES 2
KyleB
Alteryx
Alteryx

Hello @bbi-solutions,

 

As of right now Alteryx creates the query, so this would have to be a feature request.  By default, the COPY command applies automatic compression whenever you run the COPY command with an empty target table, all of the table columns either have RAW encoding or no encoding.  

 

To request this, please head on over to: Product Ideas

 

Thanks!

bbi-solutions
5 - Atom

@KyleB, thank you for your reply.

I will make a request in the Product Ideas.

 

Best regards, Gabriel