Alteryx Designer Desktop Ideas

Improvements to Basic Data Profiling Tool

In addition to the existing functionality, it would be good if the functionality below could also be provided.

 

1) Pattern Analysis

 

This will help profile the data better, help conform data to a standard or particular pattern, and help identify outliers so that corrective action can be taken.

 

A sample would be emails: translating 'abc@gmail.com' to 'nnn@nnnnn.nnn', so the outliers might be values where '@' or '.' is not present.
Another example might be phone numbers: 12345-678910 getting translated to 99999-999999, 123-456-78910 getting translated to 999-999-99999, (123)-(456):78910 getting translated to (999)-(999):99999, etc.

 

It would also help to have the Pattern Frequency Distribution alongside.

So from the above example we can see that phone numbers exist in 3 different patterns, which might call for relevant standardization rules.
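As a rough illustration of the pattern idea (not how the tool works today), here is a minimal Python sketch. The function names `to_pattern` and `pattern_frequencies` are made up for this example; letters map to 'n' and digits to '9' as in the samples above, and everything else is kept as-is.

```python
from collections import Counter


def to_pattern(value: str) -> str:
    """Replace every digit with '9' and every letter with 'n'; keep other characters."""
    out = []
    for ch in value:
        if ch.isdigit():
            out.append("9")
        elif ch.isalpha():
            out.append("n")
        else:
            out.append(ch)
    return "".join(out)


def pattern_frequencies(values):
    """Count how many values share each pattern (the pattern frequency distribution)."""
    return Counter(to_pattern(v) for v in values)


phones = ["12345-678910", "123-456-78910", "(123)-(456):78910"]
phone_patterns = pattern_frequencies(phones)
for pattern, count in phone_patterns.most_common():
    print(pattern, count)
```

Rare patterns in the resulting distribution are the candidates for outliers or for standardization rules.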


2) More granular control of profiling

 

It would be good if the profiling options in the tool (like Unique, Histogram, Percentile25 etc.) could be selected differently for each field.

 

A sub-idea here might be to check data against external third-party data providers, e.g. USPS Zip validation. That would be meaningful only for selected address fields, so it makes sense only if there is granular control over the type of profiling for individual fields.

 

Note - When implementing the granular control, we would also need to figure out how to present the final report in a more user-friendly format, as it might no longer conform to a standard table-like definition.
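To make the per-field idea concrete, here is a small Python sketch under my own assumptions: the `PROFILERS` table, the `profile` function and the measure names are hypothetical and not part of the existing tool. Each field gets its own list of measures, and the report is nested by field rather than being one flat table.

```python
def non_null(values):
    """Drop missing values before computing a measure."""
    return [v for v in values if v is not None]


# Hypothetical measure names mapped to the functions that compute them.
PROFILERS = {
    "unique": lambda values: len(set(values)),
    "nulls": lambda values: sum(v is None for v in values),
    "min": lambda values: min(non_null(values)),
    "max": lambda values: max(non_null(values)),
}


def profile(columns, plan):
    """Run only the measures requested for each field in the plan."""
    report = {}
    for field, measures in plan.items():
        values = columns[field]
        report[field] = {m: PROFILERS[m](values) for m in measures}
    return report


columns = {"zip": ["10001", "10001", None], "age": [30, 41, 41]}
plan = {"zip": ["unique", "nulls"], "age": ["min", "max"]}
report = profile(columns, plan)
print(report)
```

Note that "unique" here counts the missing value as its own distinct value; a real implementation would have to decide how to treat nulls.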

 

3) Uniqueness

 

Given the ongoing importance of identifying duplicates so that analytic results are valid, some more uniqueness profiling could be added.

 

For example - Soundex, which is based on how similar or different two values sound.
Edit distance, which is based on how many changes are needed to turn one value into another, etc.

 

So alongside the Unique counts, we could also have counts where uniqueness factors in Soundex, Distance and other related algorithms.

 

For example, if the First Name field has the following data -

 

Jerry
Jery
Nick
Greg
Gregg

 

The number of Unique records would be 5, whereas the number of Soundex-unique records might be only 3. That would open up more data exploration opportunities, to see whether Jerry/Jery and Greg/Gregg are really the same person/customer etc.
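The counts in this example can be checked with a short Python sketch. This is a simplified version of American Soundex written for illustration (the `soundex` function is my own, not an Alteryx function), so treat it as an approximation of whatever a real implementation would use.

```python
# Letter groups that share a Soundex digit.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(name: str) -> str:
    """Return a 4-character Soundex code: first letter plus three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        prev = code  # vowels reset prev, so a repeated code after a vowel counts again
    return (result + "000")[:4]


names = ["Jerry", "Jery", "Nick", "Greg", "Gregg"]
unique_count = len(set(names))
soundex_unique_count = len({soundex(n) for n in names})
print(unique_count, soundex_unique_count)
```

Jerry/Jery and Greg/Gregg collapse to the same code, which is why the Soundex-unique count is lower than the plain unique count.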

 

4) Custom Rule Conformance

 

I think it would also be good if some functionality similar to multi-row formula can be provided, where we can check conformance to some custom business rules.

 

For example, it might be more helpful to check how many Age Units (Days/Months/Years) are blank/null where the related Age Number (1, 10, 50 etc.) is populated, rather than having a vanilla count of null and not null for individual (but related) columns.
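A minimal Python sketch of such a cross-column rule, with made-up sample records and hypothetical field names (`age_number`, `age_unit`):

```python
def is_blank(value) -> bool:
    """Treat None and empty/whitespace-only strings as blank."""
    return value is None or str(value).strip() == ""


def unit_missing(row) -> bool:
    """Rule: Age Number is populated but the related Age Unit is blank."""
    return not is_blank(row["age_number"]) and is_blank(row["age_unit"])


def count_violations(rows, rule) -> int:
    """Count the rows that break a custom business rule."""
    return sum(1 for row in rows if rule(row))


# Illustrative data only.
records = [
    {"age_number": 10, "age_unit": "Years"},
    {"age_number": 5, "age_unit": None},
    {"age_number": None, "age_unit": None},
    {"age_number": 50, "age_unit": ""},
]
violations = count_violations(records, unit_missing)
print(violations)
```

A plain null count would report three blank Age Units here, while the rule only flags the two rows where the Age Number is actually populated.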

 

Thanks,

Rohit

10 Comments
Prasad_dup_247
5 - Atom

 

 

It's useful information for data profiling.

 

Thanks,

Prasad.

Atabarezz
13 - Pulsar

Pattern analysis is a nice idea...

On the other hand fuzzy matching tool provides info on uniqueness... can use soundex algo etc.

 

I would like to second some of the ideas but not all.

It may be wise to submit multiple ideas as separate idea inputs to enable just that...

 

Best

sundar_prithvi
5 - Atom

Would be really helpful if we have these in place in the newer version of Alteryx.

shivach
6 - Meteoroid

This will be very helpful if the above functionalities are included in upcoming versions.

Rohit_Bajaj
9 - Comet

Sure Atabarezz, going forward I will post one idea per post.

In this case I assumed that, since all the changes were proposed for the same tool (data profiling), they could be clubbed together.

But it makes sense to have them separately, as you correctly pointed out; I will do so going forward.

 

Regarding the fuzzy match tool: most or all of the functionality in the profiling tool should be achievable with some other transformation.

For example, Pattern Analysis might be possible using a Regex function. By having Soundex/Distance algorithm (i.e. common fuzzy) based profiling inside the tool, the idea was to avoid individual column-level configuration.

 

Again, fuzzy matching would not make sense for all columns, and the algorithm would vary based on the data we are dealing with (i.e. Person Name vs Address etc.). Hence the idea was to have just the basic fuzzy algorithms. If you read that along with the sub-idea 'More granular control of profiling', where the type of profiling can be controlled and can differ across columns, I hope it makes more sense.

Atabarezz
13 - Pulsar

Exactly, when you use a fuzzy matching tool on the same column, the input and output are the same; it works just as you mentioned.

It may be wise to self-develop a macro for that... I'll look into it if I get some spare time...

andrewdatakim
12 - Quasar

I would like the ability to export the Profiling reports including graphics, similar to the results table (this would eliminate the need for the Basic Data Profile Tool) , to share with team members during the development process.

RodLight
8 - Asteroid

I have added a different idea related to the profiling of numeric values...

Some of you may want to vote that one up as well...

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Choice-of-plots-in-Profiling-of-Browse-tool/...

 

Community_Admin
Alteryx
Status changed to: Inactive
 
Community_Admin
Alteryx

The status of this idea has been changed to 'Inactive'. This status indicates that:

 

1. The idea has not had activity in the form of likes or comments in over a year.

2. The idea has not reached ten likes.

3. The idea is still in the 'New Idea' status. 

 

However, this doesn't mean your idea won't be implemented! The Community can still like and comment on this idea. With enough renewed interest, this idea can be brought back into the 'New Idea' status. 

 

Thank you for contributing to the Alteryx Community and the Alteryx Product Idea Boards!