Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Are there any known issues with the DataCleanse tool?

john_watkins
11 - Bolide

I have an odd issue where I have a data cleansing tool in a batch macro to clean up a string field.  It is set to remove leading/trailing whitespace, tabs, newlines, duplicate whitespace and make the result uppercase.    It seems to be missing a few records along the way.   Are there any potential known issues with 2023.1 and the cleanse or potential issues with AMP being enabled?    It is missing some tabs and spaces but on 30 rows out of 2M.

8 REPLIES 8
rzdodson
12 - Quasar

I have had various issues with the Data Cleanse tool around various versions of Alteryx.

What I would recommend to do in your use case is utilize a Multi-Field Formula tool instead. Based on what you need to accomplish, you can select all of your string fields, develop formula logic to handle the changes you need inside that MFF tool (e.g. remove leading characters, etc.), and it is a more stable approach that consistently delivers the expected results, in my opinion.

ChrisTX
15 - Aurora

I'm not aware of any issues like that.  For the characters that were missed, can you view the HEX code in a text editor like Notepad++?

Just thinking a "space" is different from a "non-breaking space".   Not sure about the tab though.

 

Maybe create a test input file with only 5 or 10 records, verify the "spaces" have the same HEX codes, and verify all spaces are cleansed correctly.

Then turn off AMP and test again.

Then add the other 2M records to see if it could be an issue with the number or records and/or with AMP.

 

Then the create an Alteryx support ticket.

 

The joys of debugging.  I'm working with an intermittent issue now with the Data Profile tool.  Not fun.

 

Chris

apathetichell
18 - Pollux

Datacleanse is a ram/process nightmare. I would recommend skipping it for large datasets and using a multi-field formula tool instead.

john_watkins
11 - Bolide

I did try a few ways to see what the characters actually were.  One was by pasting the results to Regex101, and I also did as you suggested to put that into Notepad++.  The characters seemed to be "tabs", spaces, and potentially newline characters.   The most confusing part is that the data had already gone through a data cleanser that specifically removed those characters not once, but twice, and it left these problem records.   I ended up trying to break all of this apart using the "REGEX" tool in Alteryx to parse the field into three pieces.   This was anything before what I wanted, the data I wanted, and then whatever came after.   This allowed me to look again what and how many were failing.   This code did clean all the records as I kept only the middle piece, removed the two additional columns and swapped this column out for the original.  (^[\t\s]*)([\w].*[A-Za-z0-9.~\-`]*)([\s\t\n\r]*)$

 

When the same data was run through a Data Cleansing tool with the exact same configuration it worked just fine and stripped out the characters in a NEW workflow.

 

Apathetic - I hear you on the tool, but in many cases it gets the job done with smaller datasets without issues and faster than me writing all the cleansing.  (unless it now doesn't work consistently).

 

I originally thought I had AMP turned on for either the calling workflow or the batch macro, but after review, it is turned on for neither.  My hypothesis that this was somehow AMP related should not be true based on this unless the Data Cleanse tool has been modified and needs AMP turned on.

 

ChrisTX
15 - Aurora

With a large complex workflow, calling many macros and modifying the inner XML for some of the tools in the macro, I ran into several unexplained problems, which were "fixed" when I copied the workflow to a NEW workflow, and manually deleted and re-added each macro call.  Not ideal, but with a large workflow that's been through a lot of development, sometimes a NEW workflow can help.

 

I can't imagine the problems I would have encountered if AMP was turned on.

Alteryx support was willing to help, but the problems were intermittent and not easily reproducible.

 

Chris

john_watkins
11 - Bolide

Chris, I don't know exactly why but the copy/paste to new workflow is also a good way around it.  I fought through the bug as I don't like any "magic" in my workflows, but it would have been a LOT easier to just copy/paste and not worry why the error was there.

jdminton
12 - Quasar

I've run into the issue in the past where newlines in quoted fields cause an issue with AMP. You can get around that in the input tool with the "allow newlines in quoted fields" option though, so it shouldn't be an issue by the time you get to the macro. 

ChrisTX
15 - Aurora

@john_watkins I learned a long time ago to pick my battles when deciding how much time I'll spend to determine "why the error was there".

 

If I'm confident the error is within the software, not within my workflow, I'll follow any reasonable path to get past the error, and just chalk up the experience to "no software is perfect".

 

I will frequently submit a defect ticket if I can reproduce the error.  Because if a user finds an error with my code (which of course rarely happens) /sarcasm, I appreciate it when someone takes the time to let me know.

 

Chris

Labels