ALTERYX INSPIRE | Join us this May for for a multi-day virtual analytics + data science experience like no other! Register Now

Alteryx Designer Ideas

Share your Designer product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Cleanse data enhancements

Cleanse Macro

Given a choice between the delivered macro and the CReW macro, I’ll choose the CReW macro for both speed and functionality.  Wikipedia says, “Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.”  If Alteryx were to convert the macro to a true tool, here is my feature request list:

Performance:

  • AMP compatible – Fast!
  • Faster than the CReW macro for deleting empty fields/rows
  • Resolve time it takes to load the tool (current macro versions are slow), html is faster.

Feature Enhancement:

  • Allow selection of fields based on data type
  • Include incoming/outgoing SELECT functionality
  • Allow for PREFIX functionality (like multi-field formula), but NOT default
  • Read incoming metadata to provide color coding of fields to indicate where potential problems exist (e.g. NULL, Whitespace) – part of browse everywhere currently
  • Allow for Nulls to convert to 0/blank or 0/blank to convert to Null
  • When removing punctuation, provide for exceptions (e.g. Numeric set of negative, comma and period).
  • Include HTML tag removal
  • Support internationalization (character sets)

Going the extra mile:

  • Display or opt for output, cleanup metrics.  How dirty was my data?  Potentially, allow for ERROR to stop workflow if garbage is detected.
  • Optional:  Detect outliers in numeric data.  I’ve got an outlier detection macro that we can review, but while you are passing all of the data for numeric values, explaining or tagging outliers would be useful.  Could be a box-whisker on numeric values maybe?
    • Make outlier actionable
      • Identify in data (new field indicator)
      • Remove
      • Modify/Impute
    • Test/Preview against metadata:  (pre-run), see what the incoming/outgoing results would be on *all of the metadata before I run the workflow.
    • camelCase:  https://en.wikipedia.org/wiki/Camel_case
    • Identify/Replace unknown values (e.g. N/A, Not Applicable, #) with Null() or other?
    • Identify/Remove duplicate values within a cell
    • See also:  https://en.wikipedia.org/wiki/Data_cleansing
    • Option to point to a “personal” dictionary for spelling or validation
    • Provide “smart” annotation on tool
2 Comments
joe_lipski
12 - Quasar
12 - Quasar

I don't know if this would be an addition to the cleanse tool or a function that you can do directly from browse, but it would be great if you could auto-fix the problematic data from the data profiling

 

joe_lipski_0-1614772244379.png

 

https://community.alteryx.com/t5/Alteryx-Designer-Ideas/Filter-in-Browse-for-quot-not-ok-quot-record...

 

SeanAdams
17 - Castor
17 - Castor

I really like the idea of having an output that tells you how much cleanup was done!

 

I'd love to have an option in the cleanse tool to "clean up numbers" which would allow for +;-;space;comma;and number characters but clean up everything else.