Alteryx Designer Desktop Discussions

SOLVED

PDF To Text - Amounts in a Credit Column shifting

w_gore
8 - Asteroid

 

I am preparing a workflow that uses the computer vision tools to read numerous purchase card statements from PDF files. I have successfully read each file using PDF to Text, and my workflow produces one record per transaction. However, I have not worked out how to flag or identify a transaction with a credit amount (as shown in the screenshot).

 

PDF to Text configuration:

Text Extraction Options - Read Text Content Only checked with Risk Score and Output selection checked

Output Options - Alteryx Table

 

Looking for suggestions on how to isolate these credit values. Thanks for the help.

 

WG

 

p.s. I should have mentioned that in the output the credit amount aligns with all other amounts (the charge column). There are no discernible spaces or breaks before a "credit" amount versus a "charge" amount.

 

10 REPLIES
alexnajm
18 - Pollux

Do any of the other output options yield the credit column? You can do multiple at once in the same PDF to Text tool to test

w_gore
8 - Asteroid

The other options have not produced a credit column amount or a workable result. In the screenshot example, that layout appears after a different layout for each employee, under a header showing the last four digits of the card number. The PDF to Text tool reads each statement row (Record ID) into nine columns. I am able to create a usable record after using RegEx and some other tools. I use a Formula tool to find the character position of the first digit after the last letter in a string containing description, reference no, MCC, purchase, and credit. The reference no, the space, and the MCC together are a fixed number of characters. Any characters beyond that length are amounts.
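For readers who want to prototype the position logic described above outside a Formula tool, here is a minimal Python sketch. The function name `amount_text`, the sample line, and the fixed width of 28 characters are all illustrative assumptions, not values taken from the actual statements. It also assumes the reference number and MCC contain no letters.

```python
import re

def amount_text(line: str, fixed_width: int) -> str:
    """Return the text that follows the fixed-width 'reference no + space + MCC' block.

    fixed_width is an assumed constant; measure it on your own statements.
    """
    # Index of the last alphabetic character (-1 if the line has no letters).
    last_letter = max((i for i, ch in enumerate(line) if ch.isalpha()), default=-1)
    # First digit after the last letter marks the start of the reference number.
    digit = re.search(r"\d", line[last_letter + 1:])
    if digit is None:
        return ""
    start = last_letter + 1 + digit.start()
    # Everything past the fixed-width block is treated as amount text.
    return line[start + fixed_width:].strip()

# Hypothetical row: 23-digit reference, one space, 4-digit MCC = 28 characters.
sample = "OFFICE DEPOT 24692164123456789012345 5943 418.86"
print(amount_text(sample, 28))
```

This only isolates the amount; it does not by itself say whether the amount came from the purchase or the credit column.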

 

Wondering if I could identify a space or something in a record string to flag an amount as belonging in credit column.  If so, I could match the Record ID further downstream after isolating the amount field.
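One way to check whether such a marker exists at all is to list the exact whitespace characters that sit directly before the trailing amount, then compare a known credit row against a known charge row. The sketch below is a diagnostic only, with a hypothetical function name and an assumed amount format of digits, optional commas, and two decimals. Whether PDF to Text preserves any distinguishing character is not known from this thread.

```python
import re
import unicodedata

# Assumed amount format: optional minus sign, digits/commas, two decimals, at end of line.
TRAILING_AMOUNT = re.compile(r"(\s*)(-?[\d,]+\.\d{2})\s*$")

def gap_before_amount(line: str):
    """List (code point, Unicode name) for each whitespace char before the trailing amount.

    Returns None when the line does not end in an amount.
    """
    match = TRAILING_AMOUNT.search(line)
    if match is None:
        return None
    return [
        (f"U+{ord(ch):04X}", unicodedata.name(ch, "UNNAMED"))
        for ch in match.group(1)
    ]

# Hypothetical rows: compare the gap on a charge row with the gap on a credit row.
print(gap_before_amount("MCC 5943 418.86"))
print(gap_before_amount("MCC 5943\u00a0 418.86"))
```

If the two lists differ consistently (for example a tab, a non-breaking space, or a longer run of spaces), that difference could drive a flag column keyed on Record ID. If they are identical, the extracted text carries no positional signal and another source of layout information is needed.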

alexnajm
18 - Pollux

I am a bit surprised that it's not coming through any of the options - are you using an Image Template tool? Is there any sample data you can provide so we can test out some solutions?

w_gore
8 - Asteroid

The attached redacted pdf should provide the necessary sample.

w_gore
8 - Asteroid

My mistake: the first sample did not include a credit amount. Here is a sample with a credit amount.

alexnajm
18 - Pollux

The 418.86 amount is coming through for me with just a change to "Read Text and image Content" (since the Read Text Content Only option doesn't work for the example provided), so can you try that?

 

Otherwise it'll be hard to help further without seeing the issue

PDF To Text - Amounts in a Credit Column shifting.png

w_gore
8 - Asteroid

Yes, that process works. This final sample shows the issue I am encountering. I am reading in 114 pages based on this sample. Page 36 of 114 presents one layout for the statement. Page 37 of 114 presents the transition to the other layout, which holds the data I want to output (the second page does not have a credit amount; however, the prior file attachment can be used for that). All of the desired data is under a "Transactions" header. The layout transition results in an output with 17 column fields (column 1, column 2, ... column 17).

 

I have also attached a screenshot of my current workflow.

 

Thank you for helping with this.

alexnajm
18 - Pollux

I would just read it in as lines then, and work on parsing from there. That will at least capture everything, and you can use Alteryx afterwards to get the data into a usable format. You could use the format of each line to branch to different parts of the workflow and parse accordingly!
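As a rough illustration of the "read as lines, then branch on format" idea, the Python sketch below keeps only lines that follow a "Transactions" header and match a transaction-shaped pattern. The `MM/DD` date prefix, the field names, and the sample lines are assumptions made for the example; the real statement layout would need its own pattern.

```python
import re

# Assumed transaction shape: MM/DD date, free text, trailing amount with two decimals.
TRANSACTION = re.compile(
    r"^(?P<date>\d{2}/\d{2})\s+(?P<body>.+?)\s+(?P<amount>[\d,]+\.\d{2})$"
)

def parse_transactions(lines):
    """Return one dict per transaction line found after a 'Transactions' header."""
    in_section = False
    rows = []
    for record_id, line in enumerate(lines, start=1):
        text = line.strip()
        if text.lower().startswith("transactions"):
            in_section = True  # everything after this header is a candidate row
            continue
        match = TRANSACTION.match(text) if in_section else None
        if match:
            rows.append({"record_id": record_id, **match.groupdict()})
    return rows

# Hypothetical extracted lines spanning a layout transition.
lines = [
    "Account summary 12.00",
    "Transactions",
    "01/05 OFFICE SUPPLY 5943 418.86",
    "Page 37 of 114",
]
print(parse_transactions(lines))
```

Keeping the line number as `record_id` mirrors the Record ID used in the workflow, so parsed rows can be joined back to the raw lines later.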

w_gore
8 - Asteroid

That will definitely be a better approach for the workflow. I'm still hung up on isolating those credit amounts. In the attached screenshot, the two highlighted amounts are in the Credit column on the statement. Is there a recommended approach or set of tools for flagging these values/records?
