Alteryx Designer Desktop Discussions

Handle unstructured PDF documents

Arwa_Albutainy2712
7 - Meteor

I will be working with several PDF files as inputs. The PDF files are unstructured, and I want to capture the total price from each one. Note that the PDFs may be in Arabic or English, and the price may be in riyals or US dollars.

So how can we do that automatically?

8 REPLIES
OTrieger
14 - Magnetar

@Arwa_Albutainy2712 
I guess this is one of the biggest challenges that companies have.

 

If you do not know where the data will be in the document, then go with a Python reader script and try to create a mapping based on keywords, then filter the text down to the lines that match.
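To make the keyword idea concrete, here is a minimal sketch. It assumes the text has already been pulled out of each PDF into a list of lines (the extraction step itself is not shown and depends on the library you choose, and scanned documents would need OCR first). The keyword list, the currency markers, and the function name `find_totals` are all hypothetical and would need adjusting to your own documents.

```python
import re

# Hypothetical keywords that mark a "total" line; extend for your documents.
TOTAL_KEYWORDS = ["grand total", "total price", "total amount", "الإجمالي", "المجموع"]

# Hypothetical currency markers mapped to a normalized currency code.
CURRENCY_MARKERS = {
    "sar": "SAR", "riyal": "SAR", "ريال": "SAR",
    "usd": "USD", "us dollar": "USD", "$": "USD", "دولار": "USD",
}

# Translate Arabic-Indic digits (U+0660..U+0669) and the Arabic decimal and
# thousands separators to their ASCII equivalents.
ARABIC_TO_ASCII = {0x0660 + i: str(i) for i in range(10)}
ARABIC_TO_ASCII.update({0x066B: ".", 0x066C: ","})

# A number such as 300, 1,250.50 or 1200
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def find_totals(lines):
    """Return (amount, currency) for every line that contains a total keyword."""
    results = []
    for line in lines:
        text = line.translate(ARABIC_TO_ASCII)
        lowered = text.lower()
        # Keep only the lines that mention one of the keywords.
        if not any(keyword in lowered for keyword in TOTAL_KEYWORDS):
            continue
        numbers = NUMBER_RE.findall(text)
        if not numbers:
            continue
        # Assumption: the last number on the line is the amount.
        amount = float(numbers[-1].replace(",", ""))
        currency = next(
            (code for marker, code in CURRENCY_MARKERS.items() if marker in lowered),
            None,  # currency not stated on this line
        )
        results.append((amount, currency))
    return results
```

Plain substring matching is crude (a keyword can appear inside another phrase), so expect to tune the lists against real samples before trusting the output.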

 

Honestly, though, there is no perfect solution for PDFs or scanned documents; you will need to work it out based on the data that you have.

Arwa_Albutainy2712
7 - Meteor

Do you mean the Python tool?

Arwa_Albutainy2712
7 - Meteor

And what do you mean by mapping?

OTrieger
14 - Magnetar

@Arwa_Albutainy2712 

Yes, using the Python tool. The Python tool lets you write Python code in Alteryx, so potentially whatever you can do with Python you can do in Alteryx.

Arwa_Albutainy2712
7 - Meteor

What do you mean by mapping?

OTrieger
14 - Magnetar

Creating flags that identify the text, so you will know which text is needed and which is not.

For example: if it is Grand Total then 1, if it is Date then 2, etc.
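As a rough illustration of the flag idea, the sketch below assigns a number to each line of text. The rule table `FLAG_RULES`, the Arabic keywords, and the function name `flag_line` are hypothetical; the flag values 1 and 2 follow the example above, and 0 is used here for "not needed".

```python
# Hypothetical rule table: (flag, keywords that trigger it). First match wins.
FLAG_RULES = [
    (1, ["grand total", "الإجمالي"]),
    (2, ["date", "التاريخ"]),
]


def flag_line(line):
    """Return the flag for a line of text, or 0 if no keyword matches."""
    lowered = line.lower()
    for flag, keywords in FLAG_RULES:
        if any(keyword in lowered for keyword in keywords):
            return flag
    return 0


# Flag every line, then keep only the flagged ones.
sample = ["Invoice 1001", "Date: 2024-01-05", "Grand Total: 500 SAR"]
flagged = [(flag_line(line), line) for line in sample]
needed = [(flag, line) for flag, line in flagged if flag != 0]
```

Once each line carries a flag, filtering on the flag value is all that is needed to separate the wanted text from the rest.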

Arwa_Albutainy2712
7 - Meteor

Which tool do you recommend to do the mapping?

OTrieger
14 - Magnetar

@Arwa_Albutainy2712 

I normally use the Formula tool with IF statements. However, in your case it depends on the data itself: you might need to use the Multi-Row Formula tool for the IF statement, as the label text might be a few rows above the text you need.
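The "label a few rows above the value" case can be sketched in Python as follows. This is not Multi-Row Formula expression syntax, only an illustration of the same look-across-rows logic; the function name `value_after_label`, the default label, and the `max_gap` parameter are hypothetical.

```python
import re

# A number such as 75 or 1,500.00
NUMBER_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def value_after_label(rows, label="grand total", max_gap=3):
    """Find the label row, then return the first number on that row
    or on one of the next max_gap rows. Returns None if nothing is found."""
    for i, row in enumerate(rows):
        if label not in row.lower():
            continue
        # Check the label row itself, then up to max_gap rows below it.
        for candidate in rows[i:i + max_gap + 1]:
            match = NUMBER_RE.search(candidate)
            if match:
                return float(match.group().replace(",", ""))
    return None
```

How many rows to look ahead depends on how the text comes out of each PDF, so `max_gap` is something to tune per document layout.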

 
