Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Parsing Data from Single Column

kge5293
5 - Atom

Hi alteryx community!

 

I've been working on a problem and have not been able to come to a solution, I'm hoping someone can assist me. I believe the answer to this problem will be using a regex function to parse the data but I haven't been able to figure out how to create a formula that solves all my criteria. See details of my issue below:

 

- I used the 'pdf input' tool to convert a ~750 page pdf into an excel file which was used as the starting point in the workflow. After converting the pdf into excel all the columns were combined into one. 

- I would like to separate the following: date, names (include first, middle and last), first set of two digits (these don't exist on all rows of data, see row 8 and 9 in the attached excel for an example), next five digits (also don't exist on all rows of data), the text after the previous five digits (should align with the text that has no digits preceding it - e.g. football, grass would be in the same column as left wing player - ice hockey etc.), the amount, and the three capitalized initials. As mentioned above, I believe the way to solve this would be through the use of a regex formula but I haven't been able to create one that can solve each of the above criteria.

- The telephone numbers at the end are irrelevant and can be removed.

 

Also worth noting: in some instances there is not telephone number at the end of the row of data (i.e. the three initials are the last inputs in the row). There are also a varying number of spaces between certain numbers/words in each of the rows. 

 

Below is a screenshot of how I'm trying to get my data to end up looking:

 

kge5293_0-1604638809904.png

 

 Appreciate any help, thanks!

7 REPLIES 7
atcodedog05
22 - Nova
22 - Nova

Hi @kge5293 

 

It wasnt easy but was able to crack it.

 

Here is a workflow for the task.

Output:

atcodedog05_0-1604644123742.png

Workflow:

atcodedog05_1-1604644138195.png

 

Hope this helps 🙂 Please check and let me know


If this post helps you please mark it as solution. And give a like if you dont mind 😀👍

atcodedog05
22 - Nova
22 - Nova

Hi @kge5293 

 

Did you get a chance to check it out. Does it work as accepted.

kge5293
5 - Atom

Thanks @atcodeog05 this is extremely helpful and solves 99% of my problem! When I was reviewing the data there were a handful of rows that didn't get parsed in the same format as the ones I initially shared I think because of minor nuances in their structure. Is there something that can be tweaked in the regex to parse these as well? 

 

In the attached file the first row wasn't properly parsed during the first regex (no data was carried forward) and the remaining rows lost data after the second regex (only the amount and three initials were lost). 

atcodedog05
22 - Nova
22 - Nova

Hi @kge5293 

 

I will take a look and get back to you. I could agree with you the data is badly structured.

atcodedog05
22 - Nova
22 - Nova

Hi @kge5293 

 

Here is the modified workflow which can handle new cases.

 

Output

atcodedog05_0-1604673010104.png

Please check and let me know.

kge5293
5 - Atom

@atcodedog05 Thank you!

atcodedog05
22 - Nova
22 - Nova

Hi @kge5293 

 

I have posted the modified solution in previous post please check and let me know.

Labels