
Alteryx Designer Desktop Discussions


Read in PDF using Text Mining tools

missgina
7 - Meteor

I am trying out the new Text Mining tools to read in PDFs. I used the image template to configure the fields I want to extract. The fields are always labeled the same and appear in the same order, but the size of the cells can vary because they are free-text fields that get converted to PDF.

As I processed multiple PDFs, I noticed that some of them had their contents truncated. I assume this is because I drew the box on a template that had only 1 line of information in that field, while another file had 3 lines.

Any suggestions on how to handle this?

Thanks,

Gina

2 REPLIES
sprakasam
Alteryx Alumni (Retired)

@missgina Currently this needs to be done manually. We do have entity pair extraction coming in a future release, which will solve this problem.

ArtApa
Alteryx

Hi @missgina - You can either create a bigger annotation (sized for 3 lines instead of 1, as in your example), or skip annotations entirely: read the whole PDF in as text and then parse the data into the shape you need.
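
For the second option, here is a minimal sketch of what the parsing step could look like in Python. It assumes the full text of each PDF has already been extracted into a single string, and it relies on the fact that the labels are always the same and always in the same order. The function name `extract_fields`, the labels, and the sample text are all hypothetical, made up for illustration. Each value is taken as everything between one label and the next, so it does not matter whether a field has 1 line or 3.

```python
def extract_fields(text, labels):
    """Return {label: value}; each value is the text between a label and the next one.

    Hypothetical helper: `labels` must be listed in the order they appear in the document.
    """
    # Locate each label in order, searching only after the previous label.
    positions = []
    cursor = 0
    for label in labels:
        start = text.find(label, cursor)
        if start == -1:
            continue  # label not found in this document; skip it
        positions.append((label, start, start + len(label)))
        cursor = start + len(label)

    # A field's value runs from the end of its label to the start of the next label.
    fields = {}
    for i, (label, _, value_start) in enumerate(positions):
        if i + 1 < len(positions):
            value_end = positions[i + 1][1]
        else:
            value_end = len(text)
        raw = text[value_start:value_end]
        # Collapse line breaks and extra spaces, then drop a leading colon if present.
        fields[label] = " ".join(raw.split()).lstrip(": ")
    return fields


# Made-up sample: the "Comments" field spans three lines.
sample = (
    "Name: Jane Doe\n"
    "Comments: First line\n"
    "second line\n"
    "third line\n"
    "Status: Open\n"
)
print(extract_fields(sample, ["Name", "Comments", "Status"]))
```

One limitation to keep in mind: if a label's exact wording also shows up inside an earlier free-text value, the split will land in the wrong place, so this works best when the labels are distinctive.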
