Parsing PDF or DOC files
Keerthana_Adamana
6 - Meteoroid
‎05-23-2024
10:27 AM
Hi,
I would like to parse a folder that contains both PDF and DOC files (for resume parsing). Is there any way I can achieve this?
Labels:
- Parse
- Text Mining
2 REPLIES
bertal34
9 - Comet
‎05-23-2024
10:40 AM
PDF parsing requires an Intelligence Suite (AIS) license, which includes the Computer Vision tools. For MS Word documents, check out the thread below. I was able to take the "Docx Input" macro from @RogerS and tweak it for my use case. To read multiple .docx files, place this macro inside a batch macro, which lets you feed in multiple file paths and output the data from all files.
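For anyone who wants to see the batch pattern outside of Designer, here is a minimal Python sketch of the same idea: loop over a folder, pick a reader by file extension, and collect one record per file. This is not the "Docx Input" macro itself; `read_docx_text`, `parse_folder`, and `READERS` are hypothetical names made up for illustration. It only handles .docx (which is a ZIP archive containing `word/document.xml`); legacy binary .doc files and PDFs are not covered and would each need their own reader registered in the dictionary.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_text(path):
    """Return the body text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        paragraphs.append(text)
    return "\n".join(paragraphs)


def parse_folder(folder, readers):
    """Run the reader registered for each file's extension; skip everything else."""
    records = []
    for path in sorted(Path(folder).iterdir()):
        reader = readers.get(path.suffix.lower())
        if reader is None or not path.is_file():
            continue
        records.append({"file": path.name, "text": reader(path)})
    return records


# Assumption: only .docx is registered here; a PDF reader would go under ".pdf"
READERS = {".docx": read_docx_text}
```

Calling `parse_folder("resumes", READERS)` (with "resumes" standing in for your own folder) would return a list of dictionaries, one per .docx file, which is roughly what the batch macro gives you as stacked output rows.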
18 - Pollux
‎05-23-2024
10:57 AM
To add to @bertal34's note: if you don't have the Intelligence Suite tools, there is a macro on the Community that uses R to read in PDFs: PDF Input - Alteryx Community
