I will be working with several PDF files as input. The PDFs are unstructured, and I want to extract the total price from each one. Note that the PDFs may be in Arabic or English, and the price may be in riyals or US dollars.
So how can we do that automatically?
@Arwa_Albutainy2712
I guess this is one of the biggest challenges that companies have.
If you do not know where the data will be in the document, then go with a Python reader script, try to create a mapping based on keywords, and use it to filter the text down to the lines you need.
Honestly, though, there is no perfect solution for PDFs or scanned documents; you will need to work it out based on the data that you have.
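As a rough illustration of the keyword idea, here is a minimal sketch. It assumes the text has already been read out of the PDF into a list of lines (scanned PDFs would need OCR first). The keyword list, the currency markers, and the function name `find_total` are all hypothetical and would need adjusting to your own documents.

```python
import re

# Hypothetical keyword list; extend it to match the wording in your own PDFs.
TOTAL_KEYWORDS = ["grand total", "total", "الإجمالي", "المجموع"]

# Hypothetical currency markers mapped to a normalized code.
CURRENCIES = {
    "sar": "SAR", "riyal": "SAR", "ريال": "SAR", "ر.س": "SAR",
    "usd": "USD", "$": "USD", "dollar": "USD", "دولار": "USD",
}

# Convert Arabic-Indic digits and separators to their ASCII equivalents.
ARABIC_DIGITS = str.maketrans("٠١٢٣٤٥٦٧٨٩٫٬", "0123456789.,")

# A number with optional thousands separators and decimals, e.g. 1,250.50
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def find_total(lines):
    """Return (amount, currency) from the last line that mentions a total
    keyword and contains a number, or None if no such line exists."""
    result = None
    for line in lines:
        text = line.translate(ARABIC_DIGITS).lower()
        if not any(keyword in text for keyword in TOTAL_KEYWORDS):
            continue
        match = AMOUNT_RE.search(text)  # first number on the line
        if not match:
            continue
        amount = float(match.group().replace(",", ""))
        currency = next(
            (code for marker, code in CURRENCIES.items() if marker in text),
            None,
        )
        result = (amount, currency)  # later matches overwrite earlier ones
    return result


print(find_total(["Invoice 123", "Grand Total: 1,250.50 SAR"]))
```

Taking the last matching line is only a guess that the final total comes after any subtotals. Taking the first number on the line is also a simplification, so a line such as "Total for 3 items: 500" would need extra handling.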
Do you mean the Python tool?
And what do you mean by mapping?
Yes, using the Python tool. The Python tool lets you write Python code in Alteryx, so potentially whatever you can do with Python you can do in Alteryx.
What do you mean by mapping?
Creating flags that identify the text, so you know which text is needed and which is not.
For example, if it is Grand Total then 1, if it is Date then 2, etc.
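In Python, that kind of flagging could look roughly like the sketch below. The keyword lists and the names `FLAG_KEYWORDS`, `flag_line`, and `flag_lines` are made up for illustration; only the 1 = Grand Total, 2 = Date numbering comes from the example above.

```python
# Hypothetical keyword-to-flag mapping: 1 = Grand Total, 2 = Date, 0 = not needed.
FLAG_KEYWORDS = {
    1: ["grand total", "الإجمالي"],
    2: ["date", "التاريخ"],
}


def flag_line(line):
    """Return the flag of the first keyword group found in the line, else 0."""
    text = line.lower()
    for flag, keywords in FLAG_KEYWORDS.items():
        if any(keyword in text for keyword in keywords):
            return flag
    return 0


def flag_lines(lines):
    """Pair every line with its flag so unneeded lines can be filtered out."""
    return [(flag_line(line), line) for line in lines]


rows = flag_lines(["Date: 2024-01-05", "Thank you", "Grand Total 500 SAR"])
needed = [(flag, line) for flag, line in rows if flag != 0]
print(needed)
```

Plain substring matching is crude (for example, "update" also contains "date"), so real documents will probably need stricter patterns.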
Which tool do you recommend to do the mapping?
I normally use the Formula tool with IF statements. However, in your case it depends on the data itself: you might need to use the Multi-Row Formula tool for the IF statement, as the keyword text might sit a few rows above the value you need.
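To show what "a few rows above" means in practice, here is a small Python sketch of the same look-across-rows idea. It is not Alteryx expression syntax, and the function name `amount_near_label` and its default window of 3 rows are assumptions chosen for the example.

```python
import re

# A number with optional thousands separators and decimals, e.g. 1,500.00
AMOUNT_RE = re.compile(r"\d[\d,]*(?:\.\d+)?")


def amount_near_label(lines, keyword="grand total", max_rows_below=3):
    """Find the keyword row, then return the first number found on that row
    or on one of the next max_rows_below rows. Return None if nothing is found."""
    for i, line in enumerate(lines):
        if keyword not in line.lower():
            continue
        # Check the label row itself first, then the rows just below it.
        for candidate in lines[i:i + max_rows_below + 1]:
            match = AMOUNT_RE.search(candidate)
            if match:
                return float(match.group().replace(",", ""))
    return None


print(amount_near_label(["Grand Total", "", "SAR", "1,500.00"]))
```

How many rows to look across depends entirely on how the text comes out of your PDFs, so the window size is something to tune against real samples.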