Alteryx Designer Desktop Discussions

PDF Data Extraction

Anasalter
7 - Meteor

I have a couple of PDFs in a directory. I want to extract specific data from all of them and then compare the results. The problem is that when I use the PDF to Text tool inside a macro, it extracts all of the text from every PDF, but I only want to extract the required data from each one.

Manoj_k
9 - Comet

Hi @Anasalter, you can create a batch macro, which can then be applied to all the files.

Anasalter
7 - Meteor

Hi @Manoj_k, I have already created a batch macro and I am getting all the text out of the PDFs. What I want to know is how I can get only specific data from those PDFs.

CoG
14 - Magnetar

You'd need to know which area of the page holds the data, or what format or anchors exist to search on. Then, using the Filter tool, you can isolate those text blocks and concatenate them as necessary. More information would help us provide more specific guidance.
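To illustrate the anchor idea outside of the Filter tool, here is a minimal Python sketch that pulls only the values following known labels from text that has already been extracted from a PDF. The field names, label patterns, and sample text are all hypothetical; your own documents will need their own anchors.

```python
import re

# Hypothetical anchors: the label text that precedes each value of interest.
ANCHORS = {
    "invoice_no": r"Invoice\s*(?:No\.?|Number)\s*[:#]?\s*(\S+)",
    "total": r"Total\s*(?:Due)?\s*:?\s*\$?([\d,]+\.\d{2})",
}

def extract_fields(text, anchors=ANCHORS):
    """Return {field: first captured value, or None if the anchor is absent}."""
    result = {}
    for field, pattern in anchors.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result

# Made-up sample standing in for the text extracted from one PDF.
sample = "ACME Corp\nInvoice No: INV-1042\nDate: 2024-10-08\nTotal Due: $1,250.00\n"
fields = extract_fields(sample)
```

Running the same function over the text of each PDF gives one small record per file, which is easier to compare than the full extracted text.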

nagakavyasri
12 - Quasar

@Anasalter, use a combination of the Image Input, Image Template, and Image to Text tools to extract specific data from the PDF files instead of all the data.

 

[Screenshot: Screenshot 2024-10-08 164803.png]

Provide the path to the PDF files in the Image Input tool, then add a template and define annotations in the Image Template tool.

Anasalter
7 - Meteor

@nagakavyasri, I have tried this method as well, but the problem is that some of the PDFs have 2 pages and some have 4, so the annotations do not work properly. Also, the PDFs do not all share the same format.

nagakavyasri
12 - Quasar

 

@Anasalter, you can use two such sets of tools in the same workflow: one set to read the 2-page PDFs in one format, and another set to read the 4-page PDFs in the other format.
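The same routing idea can be sketched in Python, assuming the page count alone is enough to tell the two formats apart (which may not hold for your files). The layouts, field name, and label patterns below are hypothetical placeholders.

```python
import re

# Hypothetical: one set of label patterns per layout, keyed by page count.
LAYOUTS = {
    2: {"customer": r"Customer\s*:\s*(.+)"},
    4: {"customer": r"Client Name\s*:\s*(.+)"},
}

def extract_by_layout(page_count, text):
    """Pick the pattern set for this page count and extract its fields.

    Returns None when no layout is registered for the page count,
    so unrecognized files can be reviewed separately.
    """
    patterns = LAYOUTS.get(page_count)
    if patterns is None:
        return None
    out = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text)
        out[field] = match.group(1).strip() if match else None
    return out
```

For example, `extract_by_layout(2, text)` applies only the 2-page patterns, and a PDF with an unexpected page count returns `None` instead of being read with the wrong template.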

samson211
8 - Asteroid

Does anyone have the batch macro for this? If so, please share. Thanks.
