Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!
The Product Idea boards have gotten an update to better integrate them within our Product team's idea cycle! However this update does have a few unique behaviors, if you have any questions about them check out our FAQ.

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Cleanse data enhancements

Cleanse Macro

Given a choice between the delivered macro and the CReW macro, I’ll choose the CReW macro for both speed and functionality.  Wikipedia says, “Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.”  If Alteryx were to convert the macro to a true tool, here is my feature request list:

Performance:

  • AMP compatible – Fast!
  • Faster than the CReW macro for deleting empty fields/rows
  • Resolve time it takes to load the tool (current macro versions are slow), html is faster.

Feature Enhancement:

  • Allow selection of fields based on data type
  • Include incoming/outgoing SELECT functionality
  • Allow for PREFIX functionality (like multi-field formula), but NOT default
  • Read incoming metadata to provide color coding of fields to indicate where potential problems exist (e.g. NULL, Whitespace) – part of browse everywhere currently
  • Allow for Nulls to convert to 0/blank or 0/blank to convert to Null
  • When removing punctuation, provide for exceptions (e.g. Numeric set of negative, comma and period).
  • Include HTML tag removal
  • Support internationalization (character sets)

Going the extra mile:

  • Display or opt for output, cleanup metrics.  How dirty was my data?  Potentially, allow for ERROR to stop workflow if garbage is detected.
  • Optional:  Detect outliers in numeric data.  I’ve got an outlier detection macro that we can review, but while you are passing all of the data for numeric values, explaining or tagging outliers would be useful.  Could be a box-whisker on numeric values maybe?
    • Make outlier actionable
      • Identify in data (new field indicator)
      • Remove
      • Modify/Impute
    • Test/Preview against metadata:  (pre-run), see what the incoming/outgoing results would be on *all of the metadata before I run the workflow.
    • camelCase:  https://en.wikipedia.org/wiki/Camel_case
    • Identify/Replace unknown values (e.g. N/A, Not Applicable, #) with Null() or other?
    • Identify/Remove duplicate values within a cell
    • See also:  https://en.wikipedia.org/wiki/Data_cleansing
    • Option to point to a “personal” dictionary for spelling or validation
    • Provide “smart” annotation on tool
4 Comments
Joe_Lipski
13 - Pulsar
13 - Pulsar

I don't know if this would be an addition to the cleanse tool or a function that you can do directly from browse, but it would be great if you could auto-fix the problematic data from the data profiling

 

joe_lipski_0-1614772244379.png

 

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Filter-in-Browse-for-quot-not-ok-quot-record...

 

SeanAdams
17 - Castor
17 - Castor

I really like the idea of having an output that tells you how much cleanup was done!

 

I'd love to have an option in the cleanse tool to "clean up numbers" which would allow for +;-;space;comma;and number characters but clean up everything else.

 
AlteryxCommunityTeam
Alteryx Community Team
Alteryx Community Team
Status changed to: Accepting Votes
 
haraldharders
9 - Comet

I support everything which was said above (especially the possibility to replace empty cells by Null()). If you ask me, the Cleansing tool should be changed to a "true" tool rather than a macro. I personally have stopped using Data Cleansing at all and do everything with Multi-Field Formulae or Filters, depending on what I want to achieve.