Hi there! I had a large file that contained a unique ID. I wanted to check whether this ID had been duplicated in the previous step, so I added a Unique tool and output the results. As you know, the Unique tool automatically outputs sorted data. I discovered that the sorted file is much smaller than the unsorted one (as below), even though they have the same records.
My question: should these two files be identical apart from the record order? Thanks!
@chloehong445
I have never observed this or thought about it this way.
The record count is the same, so maybe the data structure is optimized after sorting?
I think so, just want to make sure 🤣
@chloehong445
I think you are correct.
I opened the sample workflow for the Sort tool, and the data size changes slightly before and after sorting.
The original data size is 904 bytes, and after the Sort tool it varies from 891 to 901 bytes.
Given how small the sample data is, the size difference could be amplified for large data sets.
Yes, it is normal. Nothing gets lost, by the way; only the order changes.
Hi @chloehong445, great observation! I didn't know that.
According to the manual, the .yxdb format is compressed, so sorting the data might help it compress more compactly. (I'm not sure of the exact principle or algorithm, though 😅)
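To illustrate the general idea outside Alteryx, here is a minimal Python sketch. It assumes zlib as a stand-in compressor (the thread does not say which algorithm .yxdb uses) and uses made-up IDs, so treat it as a demonstration of the principle rather than of the actual file format. Sorted records put similar values next to each other, which gives the compressor longer repeated runs to exploit.

```python
import random
import zlib


def compressed_size(records):
    """Return the zlib-compressed size, in bytes, of newline-joined records."""
    payload = "\n".join(records).encode("utf-8")
    return len(zlib.compress(payload))


# Hypothetical data: 50,000 unique IDs in random order (not from the original post).
rng = random.Random(0)
ids = [f"ID{n:08d}" for n in rng.sample(range(1_000_000), 50_000)]

unsorted_size = compressed_size(ids)
sorted_size = compressed_size(sorted(ids))

# Same records, different order: only the compressed size differs.
print("unsorted:", unsorted_size, "bytes")
print("sorted:  ", sorted_size, "bytes")
```

With this kind of data the sorted version should come out smaller, because neighbouring IDs share most of their leading digits.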
You can also use the Test tool for extra comfort that they are the same,
or at least check things such as the number of records.
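If you export both outputs and want to double-check outside Alteryx, an order-insensitive comparison can be sketched in Python as below. The rows and the `same_records` helper are hypothetical, purely for illustration; the check compares record counts first, then the records themselves while ignoring order.

```python
from collections import Counter


def same_records(a, b):
    """True if both inputs hold the same records the same number of times, ignoring order."""
    return Counter(a) == Counter(b)


# Hypothetical rows standing in for the unsorted and sorted outputs.
unsorted_rows = [("1003", "carol"), ("1001", "alice"), ("1002", "bob")]
sorted_rows = sorted(unsorted_rows)

counts_match = len(unsorted_rows) == len(sorted_rows)
records_match = same_records(unsorted_rows, sorted_rows)

print("record counts match:", counts_match)
print("records match (ignoring order):", records_match)
```

Using a `Counter` rather than a `set` means duplicated records are also accounted for, which matters when the whole point is to check for duplicate IDs.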