Dear community,
As a new user of Alteryx, I am puzzled by the behavior of the Profile in a Browse tool. In my very simple analysis I determine the length of a field, with the following Profile result:
The profile tells me I have approx. 8.2M records in my data set, but the counts for the distinct values of the field length in that data set (between 2 and 18 characters) add up to only approx. 5.5M records. It seems that data is missing for about 2.7M records. Generating the profile takes a lot of calculation time, yet the information presented seems incomplete. I have seen somewhere in the Community that the result is capped at 300MB, but if that is happening, should there not be a visible hint of that fact? What gives?
I hope somebody can shed some light on this. I am using Designer version 2020.4.
Cheers,
Arjan
Hello @ArjanF
Thanks for reaching out to the Community!
Are you able to share your workflow so that the Community can help to troubleshoot further?
From your screenshot, it appears you have begun to dig into the data within the Browse tool, so seeing the workflow and some sample data will help us understand what is causing the difference between the records you believe you should have and the records being reported.
Here are some resources for you to check out about the Browse tool and its data profiling:
Tool Mastery | Browse
Data Profiling in the Browse Tool
Thanks!
TrevorS
Hello TrevorS,
It took some time to anonymize my data set, as a smaller set doesn't show the problematic behavior in the Browse tool. Based on that, I do believe the behavior is caused by the size of the input file. Apologies upfront for sharing a packaged workflow of 24MB, but I could not make it any smaller. The problematic behavior is shown in the Browse tool marked in the picture below. I have added a Summarize tool and another Browse tool on the same node as the problematic Browse, to make it clear where the problem is (I hope). Any pointers would be very welcome.
Thanks for any support/insights on this topic.
ArjanF
Any chance this is a memory issue on your end?
This is what I get when I summarize the output of your Summarize tool to get a total count:
Sum_Count
8552322
The Browse tool you marked also showed 8552322 records.
Have you tried a frequency table tool?
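For readers who want to reproduce this cross-check outside Designer, here is a minimal Python sketch of the idea: build a frequency table of field lengths, sum its counts, and compare that sum with the total record count. The function names and the sample values are made up for illustration; nothing here is Alteryx functionality.

```python
from collections import Counter


def length_frequency(values):
    """Count how many records have each field length."""
    return Counter(len(v) for v in values)


def coverage_gap(total_records, frequency):
    """Return how many records the frequency table does not account for."""
    return total_records - sum(frequency.values())


# Tiny stand-in data; the real field values are not shown in the thread.
sample = ["ab", "abc", "abcd", "xy", "xyz"]
freq = length_frequency(sample)
gap = coverage_gap(len(sample), freq)  # 0 means every record is counted
```

A gap of zero means the frequency table covers every record. With the approximate figures from the original question (8.2M records in total, 5.5M accounted for by the profile), the same subtraction gives the roughly 2.7M missing records.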
Hi apathetichell,
The total number of records is not the problem. Memory on my end should not be the problem either, as I am running on a 16 GB machine, which should be enough, I believe.
I made a picture to point out the issue more clearly, I hope.
Kind regards,
ArjanF
Honestly - I'd chalk it up to limits of Browse.
For this amount of data you should be using something like Field Summary for this kind of analysis. Clearly the data is there and the machine isn't cutting anything out - as the Summarize functions are working properly.
That might all be true, but, in my humble opinion, the software should give some visual indication of that fact (assuming data volume is triggering this behavior). With the implementation in 2020.4 I get no visible feedback that the results are unreliable because the profile is not created on the full data set! That in itself is very bad, as unaware users will reach the wrong conclusion, as I initially did. Again, in my humble opinion, either show a full and correct result, or show no result at all (only a message about the data size), because the current, potentially partial result leaves users guessing at the correctness of the Profile information.
Is there a feature request or bug report section somewhere where this can be brought to the attention of the Alteryx developers?
Kind regards,
ArjanF
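The behavior requested above (either profile everything, or say clearly that the profile is partial) could be sketched roughly as follows in Python. This is purely illustrative: `profile_lengths`, `LengthProfile`, and the record-count cap `max_records` are hypothetical names, and the cap is an assumption made for the example. The thread does not confirm how, or whether, the Browse tool limits its profile.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class LengthProfile:
    counts: Counter   # field length -> number of records with that length
    profiled: int     # records actually included in the profile
    total: int        # records in the full data set

    @property
    def is_partial(self):
        """True when the profile covers fewer records than the data set holds."""
        return self.profiled < self.total

    def summary(self):
        """One-line summary that flags a partial profile explicitly."""
        note = ""
        if self.is_partial:
            note = f" (partial: {self.profiled} of {self.total} records profiled)"
        return f"{len(self.counts)} distinct lengths{note}"


def profile_lengths(values, max_records):
    """Profile field lengths for at most max_records records (hypothetical cap)."""
    values = list(values)
    subset = values[:max_records]
    counts = Counter(len(v) for v in subset)
    return LengthProfile(counts, len(subset), len(values))
```

The point of the design is that the partial state travels with the result, so whatever displays the profile cannot show the counts without also being able to show the warning.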