
Alteryx Designer Desktop Discussions


PDF to Text

msjac01
7 - Meteor

I have an invoice that I'm trying to extract data from and output in table format. Currently the 'Text' output has 2 rows: row 1 holds the headers from the body of the invoice, which I want to be the headers of my table, and row 2 holds the values that I want listed under those headers. Any thoughts on how to accomplish this?

binuacs
21 - Polaris

@msjac01 use the Dynamic Rename tool
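In Designer, the Dynamic Rename tool has a mode that takes field names from the first row of data. A minimal Python sketch of that idea, assuming the text has already been split into one value per column (the function name `promote_first_row` and the sample values are hypothetical):

```python
def promote_first_row(rows: list[list[str]]) -> list[dict[str, str]]:
    """Treat rows[0] as field names and map each later row onto them.

    Assumes every row is already split into columns, one value per field.
    """
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


rows = [
    ["Check Date", "Br/Co", "Amount"],  # hypothetical header row
    ["01/15/2024", "0101", "125.00"],   # hypothetical value row
]
print(promote_first_row(rows))
```

Note that `zip` stops at the shorter row, so a data row with a missing value silently loses its trailing fields; the renaming only works once the text has been split cleanly.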


msjac01
7 - Meteor

I'll try that. The column headers and data need to be parsed, as in the attached file. Any thoughts there?

LindonB
11 - Bolide

It seems like you just need to delimit the values, mostly by spaces but with a couple of exceptions. After doing this, you can rename the fields either manually with a Select tool or dynamically by delimiting the headers separately. Attached is a workflow that does this with the data you provided. It essentially:

1. Identifies whether a record contains no numbers and, if so, treats it as the header row.

2. Takes the first record and delimits the header row into columns.

3. Delimits the non-header (data) records on spaces.

4. Renames the data fields based on a pivoted list built from the headers.


Note that you might want to edit this workflow based on the structure of your full data set and your understanding of the data fields.
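For readers who want to see the logic of those four steps outside Designer, here is a rough Python equivalent. It is a sketch under assumptions, not the attached workflow: the sample lines are made up (the real invoice text was only in the attachment), `Check Date` is assumed to be the only multi-word header, and every data value is assumed to be a single space-free token, which is the same limitation as splitting on spaces in Designer.

```python
import re

# Hypothetical invoice text; the real data was only in the attachment.
LINES = [
    "Check Date Br/Co Description Reason Amount Comment",
    "01/15/2024 0101 Freight Damaged 125.00 Credit",
    "01/22/2024 0102 Pallets Shortage 40.50 Rebill",
]

# Assumption: headers that contain a space and must not be split apart.
MULTI_WORD_HEADERS = ["Check Date"]


def is_header(line: str) -> bool:
    # Step 1: a record with no digits is treated as the header row.
    return not any(ch.isdigit() for ch in line)


def split_header(line: str) -> list[str]:
    # Step 2: split on spaces, but match known multi-word names first.
    pattern = "|".join([re.escape(h) for h in MULTI_WORD_HEADERS] + [r"\S+"])
    return re.findall(pattern, line)


def parse(lines: list[str]) -> list[dict[str, str]]:
    lines = [line for line in lines if line.strip()]  # ignore blank records
    header_lines = [line for line in lines if is_header(line)]
    if not header_lines:
        raise ValueError("no header row found")
    headers = split_header(header_lines[0])

    records = []
    for line in lines:
        if is_header(line):
            continue
        values = line.split()  # Step 3: delimit data records on spaces
        if len(values) != len(headers):
            raise ValueError(f"{len(values)} values for {len(headers)} headers: {line!r}")
        records.append(dict(zip(headers, values)))  # Step 4: rename by position
    return records


for record in parse(LINES):
    print(record)
```

Because step 1 finds the header by content rather than position, the header line can sit anywhere in the input. The flip side is that any other digit-free line (an invoice title, say) would also be classified as a header, so the real data may need a stricter test.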

binuacs
21 - Polaris

@msjac01 Use the RegEx tool to parse the text
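The actual pattern used here was only in the screenshot and attached workflow, so the expression below is a hypothetical stand-in. In the RegEx tool's Parse mode, each marked (parenthesized) group becomes an output column; this Python sketch uses named groups to the same end. It assumes a MM/DD/YYYY check date, single-word Br/Co and Reason values, and an amount with exactly two decimals, which together let Description contain spaces.

```python
import re
from typing import Optional

# Hypothetical pattern: one group per invoice column. Description is lazy so
# it can contain spaces; the amount pattern anchors where Reason ends.
ROW_PATTERN = re.compile(
    r"(?P<check_date>\d{2}/\d{2}/\d{4})\s+"
    r"(?P<br_co>\S+)\s+"
    r"(?P<description>.+?)\s+"
    r"(?P<reason>\S+)\s+"
    r"(?P<amount>-?[\d,]+\.\d{2})"
    r"(?:\s+(?P<comment>.*))?"  # optional trailing comment
)


def parse_row(line: str) -> Optional[dict[str, Optional[str]]]:
    """Return the columns of one data line, or None if it does not match."""
    match = ROW_PATTERN.fullmatch(line.strip())
    return match.groupdict() if match else None


print(parse_row("01/15/2024 0101 Pallet freight Damaged 125.00 Credit memo"))
```

The trade-off is that a pattern like this tolerates spaces inside one chosen field but hard-codes assumptions about every other field, so it needs checking against the full set of invoices. Header lines simply fail to match and come back as `None`.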


msjac01
7 - Meteor

Hi there,


Are you able to send a screenshot of your parsing configuration?


Thanks!

binuacs
21 - Polaris

@msjac01 I thought I attached the workflow; attaching it again.

msjac01
7 - Meteor

Will this workflow still work if the header row falls in a different row depending on the PDF? "Check Date Br/Co Description Reason Amount Comment" will usually not be in row 1, and its row position will vary.
