Alteryx Designer Desktop Discussions

Read in PDF using Text Mining tools

missgina
7 - Meteor

I am trying out the new text mining tools to read in PDFs. I used the image template to configure the fields I want to extract. The fields are always labeled the same and appear in the same order, but the size of the cells may vary because they are free-text fields that are converted to PDF.

As I processed multiple PDFs, I noticed that some of them had their contents truncated. I assume this is because I drew the box on a template that had only 1 line of information in that field, while another file had 3 lines.

Any suggestions as to how to handle this?

Thanks,
Gina

2 REPLIES
sprakasam
Alteryx Alumni (Retired)

@missgina Currently this needs to be done manually, but we have entity pair extraction coming in a future release, which will solve this problem.

ArtApa
Alteryx

Hi @missgina - You can either draw a bigger annotation (sized for 3 lines instead of 1, as in your example), or avoid annotations completely: read the entire PDF in bulk and then parse the text into the shape you need.
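
The second option (read everything, then parse) could look roughly like the sketch below. It is a plain Python illustration, not an Alteryx feature. It assumes the PDF text has already been extracted into a single string and that each field starts on its own line as `Label:`. The label names (`Name`, `Description`, `Status`), the sample text, and the helper `split_by_labels` are all made up for the example.

```python
import re


def split_by_labels(text, labels):
    """Return {label: value}, where each value is the text between a
    label and the next label (or the end of the text)."""
    # One pattern that matches any known label at the start of a line,
    # followed by a colon. The labels themselves are hypothetical.
    pattern = re.compile(
        r"^(%s):[ \t]*" % "|".join(re.escape(label) for label in labels),
        re.MULTILINE,
    )
    matches = list(pattern.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        # The value runs until the next label starts, however many lines that is.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        value = text[match.end():end]
        # Collapse line breaks and repeated spaces into single spaces.
        fields[match.group(1)] = " ".join(value.split())
    return fields


# Hypothetical extracted text: "Description" spans three lines.
sample = """Name: Jane Doe
Description: First line of free text
second line
third line
Status: Open
"""

fields = split_by_labels(sample, ["Name", "Description", "Status"])
print(fields["Description"])
```

Because each value runs up to the next label rather than to a fixed box edge, a field with 1 line and a field with 3 lines are handled the same way. The same idea could probably be adapted to a RegEx or Python tool inside a workflow, depending on how the text comes out of the PDF.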
