I have a workflow that starts with In-DB inputs/joins, then streams the data to my local computer for another join, runs Auto Field on the string fields, and outputs a Tableau Hyper file. It's 44M records and maybe 100 columns.
When I run this under the normal engine, it takes about 6 hours and the Hyper file is 5.3GB.
Using the same workflow, but turning on AMP, it runs faster (4 hrs) but the resulting Hyper file is now 7.5GB.
The documentation doesn't list Tableau as a supported output for AMP yet and says it will run under the normal engine, which is fine. But why is the file size 50% larger? I have a Block Until Done tool right before the Output tool, so all processing is completed first.
Has anyone else had their output file size increase for Hyper format?
We have seen the same behavior during our testing. This is likely due to the way that AMP handles data type assignment on Input.
AMP attempts to address a historic issue where the size of the field may not be large enough when processed by a downstream tool. There should be less need to add Select tools to change data types when resulting data will exceed the length of the original data type. AMP creates the maximum size field for strings and integers so that subsequent operations will have the necessary room to hold larger downstream values. This will result in larger byte sizes of resulting files when AMP is enabled.
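To make the size effect concrete, here is a minimal illustration of how wider declared types inflate storage. It assumes a simple uncompressed fixed-width layout, which is not how Hyper actually stores data (Hyper compresses its data, and that is not modeled here). The helper names, the sample columns, and the widths are hypothetical and chosen only for illustration.

```python
import struct

def int_column_bytes(values, fmt):
    # '<' forces standard sizes: 'h' is 2 bytes (like Int16), 'q' is 8 bytes (like Int64)
    return len(struct.pack("<" + fmt * len(values), *values))

def padded_string_column_bytes(values, width):
    # Hypothetical fixed-width layout: each value is space-padded to the declared width
    # (ljust does not truncate, so values longer than width keep their full length)
    return sum(len(v.ljust(width).encode("utf-8")) for v in values)

ids = list(range(1000))           # small values that fit comfortably in Int16
codes = ["CA", "NY", "TX"] * 100  # short two-character strings

print(int_column_bytes(ids, "h"), int_column_bytes(ids, "q"))  # 2000 8000
print(padded_string_column_bytes(codes, 2), padded_string_column_bytes(codes, 255))  # 600 76500
```

The same values take four times the space as Int64 instead of Int16, and far more as 255-character padded strings. Whether, and by how much, this carries over to a compressed Hyper file is not established in this thread.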
Thanks for the response. I submitted a ticket to tech support, and their solution was to add a Sort tool before the Output tool that writes the Hyper file when using AMP. This seems to correct the issue, though I'm not clear why the Hyper engine in the Output tool cares about the order of the data. It implies that, depending on the order of my input data, I could get drastically different output files if I don't sort them. Personally, I think such a hidden "feature" is more of a bug: couldn't a sort option be added within the Output tool (or just for Hyper), or couldn't someone find out why the Hyper engine cares? A user wouldn't know about this without reading this thread. But it seems to work, so I'll go with it. Maybe you can figure out the true cause?
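The thread never pins down why sorting helps. One plausible explanation, which is an assumption and not confirmed here, combines two effects. First, AMP processes records in parallel and may not preserve the original record order. Second, columnar formats tend to compress better when equal values sit next to each other. The sketch below uses run-length encoding as a stand-in for whatever compression Hyper actually applies. It shows that the same column collapses into far fewer runs once sorted. The column contents are hypothetical.

```python
import random
from itertools import groupby

def run_length_encode(column):
    # Collapse consecutive repeated values into (value, count) pairs
    return [(value, sum(1 for _ in group)) for value, group in groupby(column)]

random.seed(0)
states = ["CA", "NY", "TX", "WA"] * 250  # 1000 values, 4 distinct
shuffled = states[:]
random.shuffle(shuffled)                 # stands in for an unordered record stream

print(len(run_length_encode(shuffled)))          # hundreds of runs
print(len(run_length_encode(sorted(shuffled))))  # 4 runs
```

Under this assumption, sorting before the Output tool helps less because Hyper "needs" sorted input and more because sorted data gives the compressor long runs to work with. It would also mean that unsorted input in a different order could produce a different file size.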