This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
this is is my first look at Alteryx ! I have a large csv file of over a million lines. One of the fields is an item ID which has very poor quality with differing issues :
special characters within the item ID
item description added to the item ID
unit of measure added to the item ID
Can anyone help on creating a final clean ID field ?
The first issue is I don’t know which is the correct ID as I’m getting data from multiple sources and on a weekly basis so a master lookup will need to be constantly updated. Is it possible to group them using Soundex to start with to see the most likely matches?
For the special characters and whitespace you can use the data cleansing tool.
For the unit of measure use the find/replace tool (most likely using a list of units in a text input) or formula tool to convert them to blanks "".
Finally, if the item description follows a specific format you could use regex in the formula tool to replace it with blanks "". If that's not right could you please attach an example of what the item description is?