Hello -
I am extracting PDF data into Excel using the PDF input package and am left with a dataset in one column, for the most part with the data row title on the left and the value separate by several spaces on the right (example below). What would be the best way to parse this into separate columns, given the number of spaces are not always consistent between the title and actual data?
Thanks.
ACCRUED CUSTODIAN FIXED 23,835.41 |
ACCRUED INSURANCE FIXED 59.14 |
ACCRUED LEGAL FIXED 98,325.85 |
ACCRUED MISC EXPENSE FIXED 5,260.96 |
Solved! Go to Solution.
@jannis005 I would use a Regex tool on the parse setting.
You could have it parse based on both the spaces and the starting point of digits.
Like this:
(.+)\s+(\d+.+)
Thanks! That solution worked. Some of my data has trailing ")" characters - is there a way to remove those characters and replace them with nothing using RegEx?
@jannis005 If it's always the ")" character, then using Regex would be overkill. But if you really wanted to, you could do:
Regex_Replace([RegExOut1],'\W*$','')
Personally, since it's always the same, I would do this:
TrimRight([RegexOut1],')')
or
Replace([RegexOut1],')','')