Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Please help with data conversion (not sure if possible)

Kittycatdance
5 - Atom

Hello all, 

I have data that I am trying to cleanse and is a very manual time consuming process I need to perform frequently. Could anyone please help me create a workflow to cleanse. I have tried but am really struggling. I am not sure it is possible but would appreciate any help.

 

I have attached an example data set containing the raw data in Tab 1, the semi cleansed data in Tab 2 (the part I think is possible to do) and the fully cleansed data in Tab 3 (the part I am less confident is possible). 

 

If anyone could provide any insight I would really appreciate it. If I figure this out myself I will let everyone know.

4 REPLIES 4
Claje
14 - Magnetar

Hi,


It doesn't look like there's an attachment on your post.


Can you try uploading it again?

Thanks!

Kittycatdance
5 - Atom

Sorry, I'm new. Uploaded now

Claje
14 - Magnetar

No need to apologize! Welcome to the Community!

You're right that this is a pretty tricky data cleansing activity.  I have built and attached a workflow which should accomplish what you need - you will just need to change the file path for the input to reference your file.

This process is based on the rules that I found in the example you sent, so there may be other data in your process that breaks the rules I built.

I wrote up some quick notes on each tool in the annotations so that you can hopefully get a feel for the "why" behind each piece.

The core element of this solution that drives it and makes it work is Regular Expressions.  There is ample documentation on the internet about these expressions, but I've tried to summarize the two I included in the workflow here:

ID (\d+) (.*) (?:\(.*)Posted by (.*) on (.*)\).*
/*This expression is used on the row where the Journal Number is found.  It looks for a string to start with "ID ", then splits any series of digits followed by a space into the "Journal Number" field.

We then take all text until a parentheses is opened followed by the words Posted by (This is the piece I am least confident of.  It seems to work in this example but I don't know if it always works).

The next field we pull is text after "Posted by" until we find " on ", which looks like the User field.  Then we pull in the Posting Date, taking care to remove the closing parentheses.

*/

 

(.*) \((.*)\)
/*
This regex is much simpler.  We are looking for any text until we find the string " (", which is our Account Name. Then we want any text inside the parentheses, which is the Account Number.  Since we allow for no characters after the closing parentheses, we will not pick up Example (2013) or Family (2013) as an account number.

*/
Kittycatdance
5 - Atom

Thank you so much! I will be taking a long time to understand everything you have done but I am very greatful and this will be great to learn. 

Labels