This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Given a choice between the delivered macro and the CReW macro, I’ll choose the CReW macro for both speed and functionality.Wikipedia says, “Data cleansing or data cleaning is the process of detecting and correcting (or removing) corrupt or inaccurate records from a record set, table, or database and refers to identifying incomplete, incorrect, inaccurate or irrelevant parts of the data and then replacing, modifying, or deleting the dirty or coarse data.”If Alteryx were to convert the macro to a true tool, here is my feature request list:
AMP compatible – Fast!
Faster than the CReW macro for deleting empty fields/rows
Resolve time it takes to load the tool (current macro versions are slow), html is faster.
Allow selection of fields based on data type
Include incoming/outgoing SELECT functionality
Allow for PREFIX functionality (like multi-field formula), but NOT default
Read incoming metadata to provide color coding of fields to indicate where potential problems exist (e.g. NULL, Whitespace) – part of browse everywhere currently
Allow for Nulls to convert to 0/blank or 0/blank to convert to Null
When removing punctuation, provide for exceptions (e.g. Numeric set of negative, comma and period).
Include HTML tag removal
Support internationalization (character sets)
Going the extra mile:
Display or opt for output, cleanup metrics.How dirty was my data?Potentially, allow for ERROR to stop workflow if garbage is detected.
Optional:Detect outliers in numeric data.I’ve got an outlier detection macro that we can review, but while you are passing all of the data for numeric values, explaining or tagging outliers would be useful.Could be a box-whisker on numeric values maybe?
Make outlier actionable
Identify in data (new field indicator)
Test/Preview against metadata:(pre-run), see what the incoming/outgoing results would be on *all of the metadata before I run the workflow.