Hi All
Today we noticed some odd behavior in the Round function.
The formula Round(4.975, 0.01) returns 4.97; shouldn't it be 4.98? We can't figure out why, and the Select tool behaves the same way.
Hi there @Yongcan
You've stumbled across one of the deep problems with floating point representation. As I'm sure you know, integers are held in binary and can be represented exactly: 0101 = 5, 0110 = 6, etc. Floats, however, are stored in 3 parts:
1 sign bit
8 exponent bits
23 Mantissa bits
The mantissa is the closest value of the series b1*(1/2) + b2*(1/4) + b3*(1/8) + … + b23*(1/2^23). For example, 1*(1/2) + 0*(1/4) + 1*(1/8) + 0*(1/16) + … + 0*(1/2^23) = 0.625.
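To make that layout concrete, here is a minimal sketch in Python (not Alteryx; just an illustration) that unpacks the three fields of a 32-bit float. One caveat: hardware stores the mantissa with an implicit leading 1 and a biased exponent, so the raw bits differ slightly from the simplified series above.

import struct

def float32_bits(x):
    """Return the (sign, exponent, mantissa) bit fields of x as an IEEE 754 single."""
    bits = int.from_bytes(struct.pack('>f', x), 'big')
    sign = bits >> 31                 # 1 sign bit
    exponent = (bits >> 23) & 0xFF    # 8 exponent bits (biased by 127)
    mantissa = bits & 0x7FFFFF        # 23 mantissa bits
    return sign, exponent, mantissa

sign, exp, man = float32_bits(0.625)
print(sign, exp, f"{man:023b}")       # 0 126 01000000000000000000000
# i.e. 0.625 = +1.01 (binary) * 2^(126-127)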
Some decimals are exactly represented:
0.5 = 1*(1/2) + 0*(1/4)
0.25 = 0*(1/2) + 1*(1/4)
0.75 = 1*(1/2) + 1*(1/4)
The problem arises because not all decimals can be exactly represented by this series. 0.1 is one particularly vexing example: carried out to 17 decimal places, its expansion is actually 0.10000000000000001.
These results were generated using the following formulas; the Fixed17 column gives an approximation of how the number is represented internally.
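You can reproduce the Fixed17 idea without the workflow: Python's floats are IEEE 754 doubles, like Alteryx's Double type, so printing them to 17 decimal places exposes the stored value. The 4.975 line is the answer to the original question:

for x in (0.5, 0.25, 0.75, 0.1, 4.975):
    print(f"{x} -> {x:.17f}")

# 0.5 -> 0.50000000000000000      exact
# 0.25 -> 0.25000000000000000     exact
# 0.75 -> 0.75000000000000000     exact
# 0.1 -> 0.10000000000000001      not exact
# 4.975 -> 4.97499999999999964    stored just below 4.975, so rounding to 0.01 lands on 4.97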
What's troubling about these results is that the round() function and the conversion to FixedDecimal 19.2 don't give the same results, suggesting that there are two different rounding mechanisms in place. I think you should open a ticket to have this addressed.
But how to solve your problem? If you add a small amount to each number before rounding, you'll get consistent results. The small amount should be several orders of magnitude smaller than the precision you round to, but several orders of magnitude larger than the error in the internal representation. In my RoundPlusSmall formula I used:
round([Number]+0.00000001,0.01)
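You can sanity-check the same trick outside Alteryx. Python's round() works on the same doubles, and the 1e-8 nudge (mirroring the formula above) flips the result the same way:

print(round(4.975, 2))          # 4.97 - the stored value 4.97499999999999964 rounds down
print(round(4.975 + 1e-8, 2))   # 4.98 - the nudge pushes it past the true 4.975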
Dan
We just stumbled across this problem and are wondering: did anyone ever open a ticket to have it addressed?
Hi Dan,
Thanks for providing insight into why this happens due to floating point representation.
Understanding the problem is great and necessary, but we are still left with a very problematic situation: just reading a CSV file and re-writing a new CSV file (without doing any manipulations/conversions/rounding in between) produces two different files.
The proposed solution of manually hard-coding a specific rounding formula works for simple one-time workflows, but it is not viable for more complex/general data engineering workflows that process data dynamically/programmatically and can have (a) different fields coming in each time the process is run and (b) different/unknown-in-advance decimal places for different fields.
It seems there should be ways for Alteryx to solve this problem, because it doesn't happen in most other tools that read data into decimal variables (maybe by first detecting how many decimal places each field has and applying the rounding you described automatically, along the lines of the sketch below...?).
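For what it's worth, here is a hedged sketch of that idea in Python, not an Alteryx feature: infer each column's decimal places from the raw CSV text, then apply Dan's nudge-and-round before re-writing. The file names, the 1e-8 epsilon, and the per-column max scale are all illustrative assumptions.

import csv

def decimal_places(token):
    """Digits after the decimal point in a raw CSV token, e.g. '4.975' -> 3."""
    return len(token.split('.')[1]) if '.' in token else 0

with open('input.csv', newline='') as f:    # hypothetical input file
    rows = list(csv.DictReader(f))

# Use the widest scale seen in each column to decide how to round it.
scales = {col: max(decimal_places(r[col]) for r in rows) for col in rows[0]}

for r in rows:
    for col, places in scales.items():
        try:
            # Nudge then format; a signed nudge would be needed for negative values.
            r[col] = f"{float(r[col]) + 1e-8:.{places}f}"
        except ValueError:
            pass  # leave non-numeric fields untouched

with open('output.csv', 'w', newline='') as f:
    w = csv.DictWriter(f, fieldnames=rows[0].keys())
    w.writeheader()
    w.writerows(rows)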
This problem seriously challenges the viability of the product, as we get worse data quality just by reading data into Alteryx (and an even bigger data engineering burden if we want to fix it manually as proposed).
Regards