Hey,
I'm currently restructuring some processes. So far, we have retrieved data from a Google BigQuery instance with the help of the Alteryx Python tool / Google BigQuery tool directly in an Alteryx workflow. The data was written to a YXDB with a size of roughly 3 GB.
For several reasons, we now retrieve the data directly through Python via Databricks and store it as a CSV. Because some Alteryx workflows will still use the data, we read the CSV back in, adjust the field types and sizes, and store it as a YXDB. The resulting YXDB has a size of roughly 6 GB.
I checked the file contents (on samples) and the available metadata (column names, types, and sizes), and they all appear to be identical. Does anyone have an idea why the second YXDB is twice the size of the first?
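For context, the type-adjustment step after reading the CSV looks roughly like this. This is a simplified sketch, not our actual pipeline: the column names (`customer_id`, `revenue`, `region`) and types are placeholders, since CSV carries no type metadata and every column comes back as text unless we convert it explicitly.

```python
import csv
import io

# Placeholder sample rows; the real data comes from BigQuery via Databricks.
rows = [
    {"customer_id": "1001", "revenue": "12.50", "region": "EMEA"},
    {"customer_id": "1002", "revenue": "8.75", "region": "APAC"},
]

# Explicit per-column type map -- CSV stores everything as text,
# so types must be reapplied on read before writing the YXDB.
converters = {"customer_id": int, "revenue": float, "region": str}

def load_typed(reader, converters):
    """Apply the per-column converters to each CSV row."""
    return [
        {col: converters[col](val) for col, val in row.items()}
        for row in reader
    ]

# Round-trip through an in-memory CSV to mimic the Databricks export.
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
buf.seek(0)

typed = load_typed(csv.DictReader(buf), converters)
```

After this step the typed records are handed to the YXDB writer with the field sizes set to match the original schema.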