I am preparing a workflow that uses the computer vision tools to read numerous purchase card statements from PDF files. I have successfully read each file using PDF to Text, and my workflow produces a record for each transaction. However, I have not worked out how to flag or identify a transaction that has a credit amount (as shown in the screenshot).
PDF to Text configuration:
Text Extraction Options - Read Text Content Only checked, with Risk Score and Output selection also checked
Output Options - Alteryx Table
Looking for suggestions on how to isolate these credit values. Thanks for the help.
WG
Do any of the other output options yield the credit column? You can select several at once in the same PDF to Text tool to test.
The other options have not produced the credit amount or any workable result. In the screenshot example, that layout follows a different layout with a header for each employee and the last four digits of the card number. The PDF to Text tool reads each statement row (Record ID) into nine columns. I am able to create a usable record with RegEx and a few other tools. I use a Formula tool to find the character position of the first digit after the last letter in a string containing the description, reference no., MCC, purchase, and credit. The reference no., the space, and the MCC are a fixed number of characters, so any characters after that length are amounts.
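For anyone following along, here is a minimal Python sketch of the "first digit after the last letter" rule described above, in case it helps to see the steps in one place (the same logic can stay in Formula/RegEx tools). The widths REF_LEN and MCC_LEN and the sample line are hypothetical assumptions, not taken from the actual statement.

```python
import re

# Hypothetical fixed widths: adjust to the real statement layout.
REF_LEN = 10  # assumed length of the reference number
MCC_LEN = 4   # assumed length of the MCC


def split_transaction(line: str) -> dict:
    """Split 'description ref mcc amounts' using the last-letter rule."""
    # Index of the last alphabetic character (assumed end of the description).
    last_letter = max((i for i, ch in enumerate(line) if ch.isalpha()), default=-1)
    # The first digit after that letter is taken as the start of the reference number.
    m = re.search(r"\d", line[last_letter + 1:])
    if m is None:
        raise ValueError(f"no digits after the description: {line!r}")
    ref_start = last_letter + 1 + m.start()
    ref_end = ref_start + REF_LEN
    mcc_start = ref_end + 1  # assumes exactly one space between reference no. and MCC
    mcc_end = mcc_start + MCC_LEN
    return {
        "description": line[:ref_start].strip(),
        "reference_no": line[ref_start:ref_end],
        "mcc": line[mcc_start:mcc_end],
        "amounts": line[mcc_end:].strip(),  # everything past the fixed width
    }


print(split_transaction("OFFICE DEPOT 1234567890 5943 418.86"))
```

One caveat with this rule: a description that ends in digits (a store number, for example) would be cut short, so it may need a special case.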
I am wondering whether I could identify a space or some other marker in the record string to flag an amount as belonging in the credit column. If so, I could match on the Record ID further downstream after isolating the amount field.
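One way to use "a space or something" is the horizontal position of the amount: if the extracted line keeps its spacing (an assumption; it depends on how the text comes out of the PDF), an amount that ends past the start of the "Credit" header belongs in the credit column. A rough Python sketch, with a hypothetical header and amount pattern:

```python
import re

# Hypothetical amount format: digits with optional thousands commas and two decimals.
AMOUNT = re.compile(r"-?\d{1,3}(?:,\d{3})*\.\d{2}")


def classify_amounts(line: str, header: str) -> list[tuple[str, str]]:
    """Label each amount in a line as Purchase or Credit by its column position.

    Assumes the line and the header line share the same character spacing and
    that amounts are right-aligned under their headers.
    """
    credit_start = header.index("Credit")
    labelled = []
    for m in AMOUNT.finditer(line):
        column = "Credit" if m.end() > credit_start else "Purchase"
        labelled.append((column, m.group()))
    return labelled


header = "Description".ljust(30) + "Purchase".rjust(12) + "Credit".rjust(12)
purchase = "OFFICE DEPOT".ljust(30) + "418.86".rjust(12)
credit = "OFFICE DEPOT".ljust(30) + "".rjust(12) + "418.86".rjust(12)
print(classify_amounts(purchase, header), classify_amounts(credit, header))
```

The resulting label can then be joined back to the transaction by Record ID.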
I am a bit surprised that it's not coming through any of the options - are you using an Image Template tool? Is there any sample data you can provide so we can test out some solutions?
The 418.86 amount is coming through for me with just a change to "Read Text and Image Content" (since the Read Text Content Only option doesn't work for the example provided), so can you try that?
Otherwise, it'll be hard to help further without seeing the issue.
Yes, that process works. This final sample shows the issue I am encountering. I am reading in 114 pages from this sample. Page 36 of 114 shows one layout for the statement, and page 37 of 114 shows the transition to the other layout, which holds the data I want to output (that second page does not have a credit amount, but the earlier attachment can be used for that). All of the desired data is under a "Transactions" header. The layout transition results in an output with 17 columns (Column 1, Column 2, ..., Column 17).
I also attached a screenshot of my current workflow.
Thank you for helping with this.
I would just read it in as lines, then, and parse from there. That will at least capture everything, and you can use Alteryx afterwards to get the data into a usable format. You could use the format of each line to branch to different parts of the workflow and parse accordingly!
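To illustrate the "branch on the format of each line" idea, here is a small Python sketch that tags each line and routes it to a group, keeping the line number as a stand-in for the Record ID. In Alteryx the same thing would be a RegEx or Formula tool feeding Filter tools. All three patterns are hypothetical guesses at the statement text and would need adjusting.

```python
import re

# Hypothetical line patterns, checked in order; the first match wins.
PATTERNS = [
    ("card_header", re.compile(r"[X*]{4,}\d{4}")),                    # masked card number ending in last four
    ("transactions_header", re.compile(r"^\s*Transactions\b", re.IGNORECASE)),
    ("transaction", re.compile(r"\d\.\d{2}\s*$")),                    # line ending in an amount
]


def classify(line: str) -> str:
    """Return the label of the first pattern that matches, else 'other'."""
    for label, pattern in PATTERNS:
        if pattern.search(line):
            return label
    return "other"


def route(lines: list[str]) -> dict[str, list[tuple[int, str]]]:
    """Group lines by label, keeping a 1-based line number like a Record ID."""
    groups: dict[str, list[tuple[int, str]]] = {label: [] for label, _ in PATTERNS}
    groups["other"] = []
    for number, line in enumerate(lines, start=1):
        groups[classify(line)].append((number, line))
    return groups


sample = [
    "JANE DOE XXXXXXXXXXXX1234",
    "Transactions",
    "OFFICE DEPOT 1234567890 5943 418.86",
    "Page 37 of 114",
]
for label, rows in route(sample).items():
    print(label, rows)
```

Each group can then get its own parsing step, and the card header rows can be filled down onto the transactions that follow them.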