Alteryx Designer Desktop Discussions

PDF to Text

msjac01
7 - Meteor

I have an invoice that I'm trying to extract data from and output in a table format. Currently the 'Text' output has 2 rows: row 1 holds the headers from the body of the invoice, which I want to be the headers for my table, and row 2 holds the values that I want to list under those headers. Any thoughts on how to accomplish this?

binu_acs
21 - Polaris

@msjac01 Use the Dynamic Rename tool.
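
For readers outside Designer, the idea behind this suggestion can be sketched in Python. This assumes the rows have already been split into columns and that the intent is to take the field names from the first row of data; the function name `promote_first_row` and the sample values are hypothetical, not taken from the attached workflow.

```python
def promote_first_row(rows):
    """Use the first row as column names for every row that follows it."""
    if not rows:
        return []
    header, *data = rows
    # Pair each value with the header name at the same position.
    return [dict(zip(header, values)) for values in data]


# Hypothetical sample: row 1 holds the headers, row 2 holds the values.
sample = [["Date", "Amount"], ["2024-01-15", "250.00"]]
renamed = promote_first_row(sample)
```

This only renames columns; it does not split a single text field into columns, which is what the follow-up question below is about.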

[Attached image: image.png]

msjac01
7 - Meteor

I'll try that. The column headers and data also need to be parsed, as in the attached example. Any thoughts there?

LindonB
11 - Bolide

It seems like you just need to delimit the values, mostly by spaces but with a couple of exceptions. After doing this, you can rename the fields either manually using a Select tool or dynamically by delimiting the headers separately. Attached is a workflow that does this with the data you provided. It essentially:

1. Checks whether a record contains no numbers and, if so, treats it as the header record.

2. Takes the first record and delimits the header row into columns.

3. Delimits the non-header (data) records based on spaces.

4. Renames the data records based on a pivoted list from the headers list.

Note that you might want to edit this workflow based on the structure of your full data set and your understanding of the data fields.
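
The four steps above can be sketched outside Designer as well. The following Python is a minimal illustration, not the attached workflow: the function name `text_rows_to_table` and the sample rows are hypothetical, and letting the last column absorb any extra space-separated tokens is an assumption standing in for the "couple of exceptions" mentioned above.

```python
import re


def text_rows_to_table(rows):
    """Turn raw text rows into a list of dicts keyed by header names."""
    header = None
    data_rows = []
    for row in rows:
        row = row.strip()
        if not row:
            continue
        # Step 1: the first row with no digits is treated as the header row.
        if header is None and not re.search(r"\d", row):
            # Step 2: split the header row into column names.
            header = row.split()
        else:
            data_rows.append(row)
    if header is None:
        raise ValueError("no header row found")

    table = []
    for row in data_rows:
        # Step 3: split on whitespace. Assumption: any extra tokens
        # belong to the last column (e.g. a free-text comment).
        values = row.split(maxsplit=len(header) - 1)
        # Pad short rows so every header still gets a value.
        values += [""] * (len(header) - len(values))
        # Step 4: name each value after the header in the same position.
        table.append(dict(zip(header, values)))
    return table


# Hypothetical sample data, not the invoice from this thread.
sample_rows = [
    "Invoice Date Amount Comment",
    "1001 2024-01-15 250.00 paid in full",
]
table = text_rows_to_table(sample_rows)
```

Because the header is found by its content rather than its position, this sketch does not depend on the header being in row 1, but it would misfire on any earlier row that happens to contain no digits.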

binu_acs
21 - Polaris

@msjac01 Use the RegEx tool to parse the text.

[Attached image: image.png]

msjac01
7 - Meteor

Hi there,

Are you able to send a screenshot of your parsing configuration?

Thanks!

binu_acs
21 - Polaris

@msjac01 I thought I had attached the workflow; attaching it again.

msjac01
7 - Meteor

Will this workflow still work if the header row is in a different position depending on the PDF? The header row ("Check Date Br/Co Description Reason Amount Comment") will usually not be in row 1, and its position will vary.
