Alteryx Designer Desktop Discussions

chloehong445 · ‎01-28-2024

Hi there! I had a large file that contained a unique ID. I wanted to check whether this ID had been duplicated in the previous step, so I added a Unique tool and output the results. As you know, the Unique tool automatically outputs sorted data. I discovered that the sorted file is much smaller than the unsorted one (as below), even though they contain the same records.

My question is: should these two files be identical apart from the record order? Thanks!

Qiu · ‎01-28-2024

@chloehong445
I have never noticed this or thought about it this way.
The record count is the same, so maybe the data structure is optimized after sorting?

chloehong445 · ‎01-28-2024

I think so, just want to make sure 🤣

Qiu · ‎01-28-2024

@chloehong445
I think you are correct.

I opened the sample workflow for the Sort tool, and the data size changes slightly before and after sorting.
The original data size is 904 bytes, and after the Sort tool it varies from 891 to 901 bytes.

Given how small the sample data is, the size difference could be amplified for large data.

caltang · ‎01-28-2024

Yes, it is normal. Nothing gets lost, by the way; only the order changes.

Calvin Tang
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/

gawa · ‎01-28-2024

hi @chloehong445 Great observation! I didn't know that.

According to the manual, the .yxdb format is compressed, so sorting the data might help it compress more compactly. (I'm not sure of the exact principle or algorithm, though 😅)

https://help.alteryx.com/current/en/designer/data-sources/alteryx-database-file-format.html#alteryx-...
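To illustrate the general idea that sorted data tends to compress better, here is a minimal Python sketch. It is only an analogy: it uses zlib as a stand-in compressor, and the ID format, row count, and the `compressed_size` helper are made up for the example. The thread does not say which algorithm .yxdb uses, so this does not describe how Alteryx compresses files.

```python
import random
import zlib


def compressed_size(rows):
    """Return the zlib-compressed size in bytes of rows joined by newlines."""
    payload = "\n".join(rows).encode("utf-8")
    return len(zlib.compress(payload))


# Hypothetical data: 50,000 random 8-digit IDs (assumption for illustration).
random.seed(0)
rows = [f"ID{random.randrange(10_000_000):08d}" for _ in range(50_000)]

unsorted_size = compressed_size(rows)
sorted_size = compressed_size(sorted(rows))

# Same records in both cases; only the order differs.
print("unsorted:", unsorted_size, "bytes")
print("sorted:  ", sorted_size, "bytes")
```

After sorting, neighbouring rows share long prefixes, so the compressor finds more repeated byte sequences close together and the output is smaller, even though the records are identical.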

aatalai · ‎01-29-2024

You can also use the Test tool for extra comfort that they are the same.

aatalai · ‎01-29-2024

or at least check things such as the number of records, etc.
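Outside of Designer, the same kind of check can be sketched in plain Python, assuming both files have been exported to something readable as a list of rows. This is not the Test tool, and the `same_records` helper and the sample rows are hypothetical; it only shows what "same records, different order" means.

```python
from collections import Counter


def same_records(a, b):
    """True if both inputs hold the same rows the same number of times,
    regardless of order."""
    return Counter(a) == Counter(b)


# Hypothetical rows standing in for the unsorted and sorted outputs.
original = [("A001", 10), ("B002", 20), ("A001", 10)]
reordered = sorted(original)

print(len(original) == len(reordered))      # record counts match
print(same_records(original, reordered))    # contents match, order ignored
```

Comparing counts of each row, instead of sets, means a duplicated or dropped record is still detected.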


Is it normal that yxdb gets smaller after sorted?
