Hi all,
I have a set of data consisting of relatively large numbers, and I am trying to aggregate them using the Summarize tool. The issue is that I seem to get inconsistent results every time I run the workflow.
I understand that the Double data type has 15 digits of precision, and my initial understanding was that 15 digits of precision means 15 significant figures. However, based on my run results, this does not seem to be the case.
Does 15 digits of precision actually mean that any digit beyond the 15th is effectively arbitrary? Are such inconsistent results expected with large numbers, and does this mean I will not be able to reproduce exactly the same result between runs?
[Edit: I saw a related post here: Big numbers and Alteryx. Not sure if it has anything to do with my issue.]
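For what it's worth, I can reproduce this kind of order-sensitivity with plain Python floats, which as far as I know are the same IEEE-754 doubles as Alteryx's Double type (the values below are made up, not my actual data):

```python
# Addition of doubles is not associative: the grouping (and therefore the
# order in which rows are summed) changes how intermediate results round.
a, b, c = 1e16, -1e16, 1.0

print((a + b) + c)   # 1.0
print(a + (b + c))   # 0.0 -- the 1.0 is absorbed when added to -1e16 first

# A double's 53-bit significand guarantees only ~15 decimal significant
# digits, so if the engine sums rows in a different order on each run
# (e.g. due to multi-threading), the last digits can legitimately differ.
```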
@Alteryx_newbie can you try with the Fixed Decimal data type?
@Alteryx_newbie
Can you do the following: add a Count of the rows being summed, then run the tool again and check whether Alteryx is using the same number of rows each time - just to make sure you are actually summing the same data.
@OTrieger the count is consistent
@binuacs I get 5,973,734,380,328,303 when I run with fixed decimal data type. Although the data type keeps my results consistent, I try not to use fixed decimal as it might increase run time.
Still don't understand why the results are so inconsistent with the Double data type, though. I'd expect to get 5,973,734,380,328,300 (15 sig. fig.) every time.
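To show what I mean by inconsistent, here's a toy reproduction in Python with made-up numbers (not my actual data) - summing the very same rows in two different orders:

```python
import random

random.seed(1)
rows = [random.uniform(1e3, 1e10) for _ in range(1_000_000)]

total_a = sum(rows)
random.shuffle(rows)          # same rows, different order
total_b = sum(rows)

print(total_a == total_b)     # typically False
print(f"{total_a:.1f}")
print(f"{total_b:.1f}")       # agree to ~15 significant digits, then diverge
```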
@Alteryx_newbie you mentioned:
"I try not to use fixed decimal as it might increase run time."
The article Big numbers and Alteryx mentions:
"Only when precision falls to about 15 significant digits do the values converge."
"If you really need to do precise addition, you can. Just don't expect it to be as fast."
You can choose either reliable precision or fast runtime, not both.
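As a rough analogy in Python (Decimal standing in for Fixed Decimal here - this is not Alteryx's engine code), you can see the precision/runtime trade-off directly:

```python
import time
from decimal import Decimal

rows = ["5973734380.328303"] * 1_000_000   # hypothetical rows

t0 = time.perf_counter()
fast = sum(float(r) for r in rows)         # Double-style: fast, rounds
t1 = time.perf_counter()
exact = sum(Decimal(r) for r in rows)      # fixed-point: exact, slower
t2 = time.perf_counter()

print(fast,  f"({t1 - t0:.2f}s)")
print(exact, f"({t2 - t1:.2f}s)")          # exactly 5973734380328303.000000
```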
Chris
Fixed Decimal is the CORRECT format to use here - not Double. Doubles are binary fractions. They round. They have issues at very large or very small numbers. Fixed Decimals are precise. You want precision. Use precision.
"I try not to use fixed decimal as it might increase run time." - This is illogical. You use the correct format for your data.
Hi both @ChrisTX and @apathetichell, thanks for the responses. I understand that Fixed Decimal gives me precision, but the issue here is that I am trying to understand why the results are so inconsistent with the Double data type. Could you please give me more insight into that?
When you mention that "they have issues at very large or very small numbers", what exactly are those issues and what causes them?
Hey - as I noted in my response, Double is stored as a binary fraction. Instead of your exact multi-digit decimal number, the computer keeps the nearest value of the form m / 2^e: a 53-bit integer numerator over a power-of-two denominator.
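You can see the stored fraction directly in Python (same IEEE-754 doubles under the hood):

```python
from decimal import Decimal

third = 1 / 3
print(third.as_integer_ratio())
# (6004799503160661, 18014398509481984) -- i.e. m / 2**54
print(Decimal(third))
# 0.333333333333333314829616256247... (the exact value stored)
```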
Try this test:
1) Text Input tool:
   field1 = 1
   field2 = 3
2) Generate Rows -> generate 1,000,000 rows.
3) Formula tool - Double type: [field1]/[field2]
4) Formula tool - Fixed Decimal type: [field1]/[field2]
5) Summarize tool - sum both.
At a certain point the Double field can no longer represent the running sum exactly as a binary fraction, so it rounds on every addition and the error accumulates.
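Here's the same test in Python if you want to see the drift without building the workflow (Fraction standing in for Fixed Decimal):

```python
from fractions import Fraction

n = 1_000_000
as_double = sum([1 / 3] * n)            # rounds on every addition
as_exact  = sum([Fraction(1, 3)] * n)   # exact rational arithmetic

print(as_double)          # last digits drift away from 1000000/3
print(float(as_exact))    # 333333.3333333333
```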
@Alteryx_newbie for this statement:
"I have a set of data consisting of relatively large numbers"
Can you provide examples of these large numbers? I'm guessing the numbers include decimals? Or is every single number an integer, with no decimals?
Are you using any "blue tools" (from the Preparation category) to process your large numbers?
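The reason I ask: integers and decimals behave differently in a Double. A quick Python illustration (same double type under the hood):

```python
print(2.0 ** 53)           # 9007199254740992.0
print(2.0 ** 53 + 1.0)     # 9007199254740992.0 -- the +1 is lost
print(0.1 + 0.2 == 0.3)    # False -- decimal fractions round in binary
```

Every integer up to 2**53 is represented exactly by a double, so integer-only sums that stay below 9,007,199,254,740,992 are reproducible - sums involving decimal fractions generally are not.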
Chris