NOTE: There are other Idea posts about improving the Browse Profiling functionality, but I did not find anything specific to this one, and I feel these ideas should be kept separate anyway.
I just discovered that the plot in the profiling section of the Browse tool behaves differently for numeric values depending on how many unique values are present.
According to the documentation, "Once more than 10,000 unique values are profiled, binning is applied to increase performance and to represent data in a more meaningful way."
What this means is that for numeric data, a scatterplot is shown if there are fewer than 10,000 unique values, and a frequency plot (bar chart) is shown if there are more than 10,000 unique values. The frequency plot then carries an indication that "Only the top 20 unique values are shown".
I can see that in some situations (e.g., integer values), a frequency plot that shows the most predominant values would be a good thing to see.
However, I would argue that a frequency plot of numeric data that is basically a “double” data type can be pretty meaningless. Out of 10,001 values, you might have 10,001 UNIQUE values, so each of the top 20 bars has a count of 1 and the plot is not of much value, whereas the scatterplot would still allow a user to see the dispersion of the ENTIRE data set.
I’ve attached an example to easily show this.
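The same effect can be sketched outside the Browse tool. The Python snippet below is only an illustration under assumed data, not how the tool computes its profile: the helper `top_n_coverage` and both data sets are hypothetical. It measures what fraction of the rows a "top 20 unique values" frequency plot would actually represent.

```python
import random
from collections import Counter


def top_n_coverage(values, n=20):
    """Fraction of all rows accounted for by the n most frequent values."""
    top = Counter(values).most_common(n)
    return sum(count for _, count in top) / len(values)


# Hypothetical "double" column: 10,001 continuous values, effectively all unique.
rng = random.Random(0)
doubles = [rng.gauss(0.0, 1.0) for _ in range(10_001)]

# Hypothetical integer column: 10,001 unique values, 20 of which
# (0 through 19) each occur an extra 1,000 times.
integers = list(range(10_001)) + [v for v in range(20) for _ in range(1_000)]

double_coverage = top_n_coverage(doubles)
integer_coverage = top_n_coverage(integers)

print(f"doubles:  top 20 values cover {double_coverage:.1%} of rows")
print(f"integers: top 20 values cover {integer_coverage:.1%} of rows")
```

For the integer column the top 20 values describe roughly two thirds of the rows, so the bar chart says something useful. For the double column they describe a fraction of a percent, which is why a scatterplot of the whole data set would be the more informative choice there.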
It would be great if the user could choose the plot they want for a specific set of data, similar to the choices offered when a date field is present in the data.