Alteryx Designer Desktop Discussions

jannis005 · ‎11-05-2018

Hello -

I am extracting PDF data into Excel using the PDF input package and am left with a dataset in one column, for the most part with the data row title on the left and the value separate by several spaces on the right (example below). What would be the best way to parse this into separate columns, given the number of spaces are not always consistent between the title and actual data?

Thanks.

ACCRUED CUSTODIAN FIXED 23,835.41

ACCRUED INSURANCE FIXED 59.14

ACCRUED LEGAL FIXED 98,325.85

ACCRUED MISC EXPENSE FIXED 5,260.96

patrick_mcauliffe · ‎11-05-2018

@jannis005 I would use a Regex tool on the parse setting.

You could have it parse based on both the spaces and the starting point of digits.

Like this:

(.+)\s+(\d+.+)

jannis005 · ‎11-05-2018

Thanks! That solution worked. Some of my data has trailing ")" characters - is there a way to remove those characters and replace them with nothing using RegEx?

patrick_mcauliffe · ‎11-10-2018

@jannis005 If it's always the ")" character, then using Regex would be overkill. But if you really wanted to, you could do:

Regex_Replace([RegExOut1],'\W*$','')

Personally, since it's always the same, I would do this:
TrimRight([RegexOut1],')')

or

Replace([RegexOut1],')','')

Alteryx Designer Desktop Discussions

Best way to parse data by consecutive spaces separating columns

Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days

Re: Zero to Advanced in 20 days

Passed the Advanced Certification Exam!

Re: Identify duplicates in a specific column, and ...