Alteryx Designer Desktop Discussions

kge5293 · ‎11-05-2020

Hi alteryx community!

I've been working on a problem and have not been able to come to a solution, I'm hoping someone can assist me. I believe the answer to this problem will be using a regex function to parse the data but I haven't been able to figure out how to create a formula that solves all my criteria. See details of my issue below:

- I used the 'pdf input' tool to convert a ~750 page pdf into an excel file which was used as the starting point in the workflow. After converting the pdf into excel all the columns were combined into one.

- I would like to separate the following: date, names (include first, middle and last), first set of two digits (these don't exist on all rows of data, see row 8 and 9 in the attached excel for an example), next five digits (also don't exist on all rows of data), the text after the previous five digits (should align with the text that has no digits preceding it - e.g. football, grass would be in the same column as left wing player - ice hockey etc.), the amount, and the three capitalized initials. As mentioned above, I believe the way to solve this would be through the use of a regex formula but I haven't been able to create one that can solve each of the above criteria.

- The telephone numbers at the end are irrelevant and can be removed.

Also worth noting: in some instances there is not telephone number at the end of the row of data (i.e. the three initials are the last inputs in the row). There are also a varying number of spaces between certain numbers/words in each of the rows.

Below is a screenshot of how I'm trying to get my data to end up looking:

Appreciate any help, thanks!

atcodedog05 · ‎11-05-2020

Hi @kge5293

It wasnt easy but was able to crack it.

Here is a workflow for the task.

Output:

Workflow:

Hope this helps 🙂 Please check and let me know

If this post helps you please mark it as solution. And give a like if you dont mind 😀👍

atcodedog05 · ‎11-06-2020

Hi @kge5293

Did you get a chance to check it out. Does it work as accepted.

kge5293 · ‎11-06-2020

Thanks @atcodeog05 this is extremely helpful and solves 99% of my problem! When I was reviewing the data there were a handful of rows that didn't get parsed in the same format as the ones I initially shared I think because of minor nuances in their structure. Is there something that can be tweaked in the regex to parse these as well?

In the attached file the first row wasn't properly parsed during the first regex (no data was carried forward) and the remaining rows lost data after the second regex (only the amount and three initials were lost).

atcodedog05 · ‎11-06-2020

Hi @kge5293

I will take a look and get back to you. I could agree with you the data is badly structured.

atcodedog05 · ‎11-06-2020

Hi @kge5293

Here is the modified workflow which can handle new cases.

Output

Please check and let me know.

kge5293 · ‎11-06-2020

@atcodedog05 Thank you!

atcodedog05 · ‎11-06-2020

Hi @kge5293

I have posted the modified solution in previous post please check and let me know.

Alteryx Designer Desktop Discussions

Parsing Data from Single Column