Hi Everyone,
The image shown above is the data set I am working with. I dropped in a PDF using the PDF input tool, and consequently the input tool included all of the data in one column. In the image above, the column includes the following information from a balance sheet:
1) Row Description (i.e. Cash, Trade notes and accounts receivable, etc.)
2) Row Number (i.e. 1, 2a,2b, 3, etc.)
3) Beginning balances (i.e. 3,333,678, 56,633,614, 62,080, etc.)
4) Ending balances (i.e. 4,076,325, 57,554,079, 52,484, etc.)
Ultimately, I would like to separate the single column into the 4 columns outlined above. The problems I am running into are the following:
1) The row descriptions are different lengths and include the numerous "mmmmm" 's
2) When attempting to use the Regex function to parse out the beginning and ending balances, it was including the row numbers
3) When using the Regex function to parse out the beginning and ending balances, I was only able to extract out either the beginning OR ending balance, not both.
Based on other posts I have read, it seems like using some form of the Regex function is my best bet. Unfortunately, I am not very experienced using this tool or formula.
Does anyone have any thoughts on how to get the result I want? Thanks.
Solved! Go to Solution.
Here is the workflow with your data file attached. As you can see, it works ok when there is data in the number columns. In situations like this where it becomes tricky to capture all the variances with a regex statement, I normally divide the data into different groups and build a different regex for each. You can add a record ID at the start and then use filter tools to break your data into different groups and recombine it at the end with a union tool.
Thanks!