Dear community,
I would like to ask how we can extract data (numbers and text) from a multi-page PDF in Alteryx. I understand that we can use the PDF tool, but it has to be downloaded and requires a license subscription.
Is there any way to extract the data without purchasing a license for the PDF tools in Intelligence Suite?
Thank you.
There are various free solutions available. Search for PDF extraction on the community or in the gallery. As with any PDF parsing, a free solution will have limitations and will need a certain amount of customisation.
PDFs, by their very nature, do not share a common structure underneath. That is the whole purpose of PDF: it is a universal presentation layer for data of any format. This means that extraction from two documents that look the same may produce different results, especially with different algorithms, and it is also why a lot of PDF extraction converts the page to an image and then extracts from that.
@KGT is right. You can find an example here: https://community.alteryx.com/t5/Community-Gallery/PDF-Reader-Tool/ta-p/937908
Though I would suggest for you to tinker and code something together with Alteryx. You can use LLMs to guide you, but stick to standard packages that are maintained.
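As a rough illustration of what "coding something together" could look like, here is a minimal Python sketch. It assumes the open-source pypdf package is installed and that the PDF contains real text rather than scanned images. The function names (read_pdf_pages, pages_to_rows) and the number pattern are made up for this example, and the pattern will need adjusting for dates, negative values, or other number formats.

```python
import re

# Assumption: plain numbers such as 1001, 1,250.50 or 3.14.
NUMBER = re.compile(r"\d[\d,]*(?:\.\d+)?")


def read_pdf_pages(pdf_path):
    """Return one text string per page of a text-based PDF."""
    # Third-party dependency (pip install pypdf), imported lazily so the
    # rest of this sketch works without it.
    from pypdf import PdfReader

    reader = PdfReader(pdf_path)
    return [page.extract_text() or "" for page in reader.pages]


def pages_to_rows(page_texts):
    """Turn page texts into one row per non-empty line, with any numbers found."""
    rows = []
    for page_no, text in enumerate(page_texts, start=1):
        for line in text.splitlines():
            line = line.strip()
            if not line:
                continue
            numbers = [float(m.replace(",", "")) for m in NUMBER.findall(line)]
            rows.append({"page": page_no, "text": line, "numbers": numbers})
    return rows
```

Calling pages_to_rows(read_pdf_pages("my_file.pdf")) (the file name is a placeholder) gives one record per line, which you can load into a data frame and pass on to the rest of the workflow.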
Hello @SH_94
Although this isn't quite as powerful as the PDF to Text tool, especially when it comes to template extraction, one method I've used before involves converting the PDFs to .docx files and then reading the .docx files into Alteryx.
The conversion from PDF to .docx can be done with a PowerShell script (.ps1 file), which you can then run from the Run Command tool.
Once your documents are converted, you can read the .docx file in as a .zip file to extract all the text. I have detailed these steps in another post here:
https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/WORD-DOCUMENT-TEMPLATE-INPUT/m...
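To show why reading a .docx as a .zip works: a .docx file is a zip archive whose main text lives in word/document.xml. The sketch below does the same thing in plain Python using only the standard library. It is not the workflow from the linked post, just an illustration, and docx_paragraphs is a name I made up for it.

```python
import xml.etree.ElementTree as ET
import zipfile

# WordprocessingML namespace used by the elements in word/document.xml.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(docx_path):
    """Return the text of each non-empty paragraph in a .docx file."""
    # Open the .docx as a zip archive and read the main document part.
    with zipfile.ZipFile(docx_path) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

This only returns paragraph text from the main document body. Headers, footers, and any layout the PDF to .docx conversion did not preserve would need separate handling.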
If you have any questions, please let me know.
Regards - Pilsner
Hi @caltang ,
Could you please elaborate on the following sentence and explain what I need to do to achieve this?
"Though I would suggest for you to tinker and code something together with Alteryx."
Thank you.