I have a huge data set of about 57 million rows, and after racking my brain over it for a whole day, I (and my team) still cannot find the reason for the following discrepancy. I have a Browse tool and a Summarize tool connected to the same data set, as shown in the first screen capture. The output of the Summarize tool shows a quantity of 791 million pounds. However, the Browse output shows a quantity of 70 million, as seen in the other two screen captures. Because the file is so large, I have no way to identify the true value. Any help would be appreciated.
Workflow:
Summarize Output:
Browse Output:
Hi Jean, please find below a screenshot of the Browse tool:
I would expect quantity to be an integer, but it's a Double field. Does it contain decimal values?
I wouldn't expect that field to contain decimals. I tried converting it to Int64 using a Select tool before the Summarize and Browse tools, but got a similar result.
Can you try computing the min, max, and average with Summarize, to check consistency with the Browse tool?
Yes, I did that. There were discrepancies in Max, Avg, and Median as well. Min looks good, though.
Summarize Result:
Sum_Act.delivery qty | CountNonNull_Act.delivery qty | Min_Act.delivery qty | Max_Act.delivery qty | Avg_Act.delivery qty | Median_Act.delivery qty |
---|---|---|---|---|---|
791970318 | 56901306 | -28330 | 2166528 | 13.9183153 | 2 |
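
As a quick sanity check, the Summarize figures above can be tested for internal consistency: the reported average should equal the sum divided by the non-null count. The sketch below only re-does that arithmetic in Python with the numbers copied from the table. It does not prove the sum is right, but it does show that Sum, CountNonNull, and Avg were computed over the same set of rows. The tolerance of 1e-6 is an assumption chosen to allow for the rounding of the displayed average.

```python
# Values copied from the Summarize output in this thread.
total = 791_970_318          # Sum_Act.delivery qty
count_non_null = 56_901_306  # CountNonNull_Act.delivery qty
reported_avg = 13.9183153    # Avg_Act.delivery qty

# If all three statistics cover the same rows, sum / count must match the average.
implied_avg = total / count_non_null

# Tolerance is an assumption: the displayed average is rounded to 7 decimals.
consistent = abs(implied_avg - reported_avg) < 1e-6

print(f"implied average = {implied_avg:.7f}, consistent = {consistent}")
```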
And Browse output:
I just discovered that Browse data profiling is capped at 300 MB. Is the amount of data greater than that?
Yes!! The data is over 2GB! That must be it, then!
Are the summarize numbers true then? Can I use those for further calculation?
Extract from the Browse tool documentation: https://help.alteryx.com/20212/designer/browse-tool
So you can trust the Summarize result!
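
If an independent confirmation outside Alteryx is wanted, one option is to stream the data once and total the column without loading the whole file into memory. The following is a minimal sketch, assuming the data has been exported to a CSV file with a header row. The function name `column_stats` and the file name in the usage note are hypothetical, and only the Python standard library is used.

```python
import csv


def column_stats(path, column):
    """Stream a CSV once and return sum, non-null count, min, and max of one column."""
    total = 0.0
    count = 0
    lo = hi = None
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle):
            raw = row[column]
            # Skip empty or missing cells so the count matches a non-null count.
            if raw is None or raw.strip() == "":
                continue
            value = float(raw)
            total += value
            count += 1
            lo = value if lo is None else min(lo, value)
            hi = value if hi is None else max(hi, value)
    return {"sum": total, "count": count, "min": lo, "max": hi}
```

For example, `column_stats("deliveries.csv", "Act.delivery qty")` (hypothetical file name) reads one row at a time, so a 2 GB file is not a problem for memory, only for run time. The result can then be compared with the Sum, CountNonNull, Min, and Max reported by Summarize.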