Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Alteryx is here to help you solve your biggest data challenges. Read about the new Virtual Solution Center here.
SOLVED

Parse out data with multiple criteria

Highlighted
5 - Atom

Hi Everyone,

 

MJB_DA_0-1585063442090.png

 

The image shown above is the data set I am working with. I dropped in a PDF using the PDF input tool, and consequently the input tool included all of the data in one column. In the image above, the column includes the following information from a balance sheet:

 

1) Row Description (i.e. Cash, Trade notes and accounts receivable, etc.)

2) Row Number (i.e. 1, 2a,2b, 3, etc.)

3) Beginning balances (i.e. 3,333,678, 56,633,614, 62,080, etc.)

4) Ending balances (i.e. 4,076,325, 57,554,079, 52,484, etc.)

 

Ultimately, I would like to separate the single column into the 4 columns outlined above. The problems I am running into are the following:

 

1) The row descriptions are different lengths and include the numerous "mmmmm" 's

2) When attempting to use the Regex function to parse out the beginning and ending balances, it was including the row numbers

3) When using the Regex function to parse out the beginning and ending balances, I was only able to extract out either the beginning OR ending balance, not both.

 

Based on other posts I have read, it seems like using some form of the Regex function is my best bet. Unfortunately, I am not very experienced using this tool or formula.

 

Does anyone have any thoughts on how to get the result I want? Thanks.

 

Highlighted
14 - Magnetar

I don't think this is a million miles from what you want. I typed out a few of the rows in you image - it would be much easier to help if you pasted a sample set of data

 

DavidP_0-1585087727486.png

 

Highlighted
5 - Atom

Hi DavidP,

 

That output looks to be what I am looking for. Attached is a sample of my data that pertains to this question.

 

Thanks!

 

14 - Magnetar

Here is the workflow with your data file attached. As you can see, it works ok when there is data in the number columns. In situations like this where it becomes tricky to capture all the variances with a regex statement, I normally divide the data into different groups and build a different regex for each. You can add a record ID at the start and then use filter tools to break your data into different groups and recombine it at the end with a union tool.

 

DavidP_0-1585143711754.png

 

Highlighted
5 - Atom

Thanks!

Labels