Hi,
I am trying to read a table with a field that stores hyperlinks (to Word documents on my local computer). I want to extract all the text from each Word document and save it to a new field. Is this possible?
Thank you,
Muyi
Alteryx cannot read Word documents directly because they are not really a "data file".
You could write some Python code to extract the text and then process that. What content in the documents are you trying to extract?
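A minimal sketch of that Python route, assuming the linked files are .docx and the hyperlink field holds a plain file path (the field names `hyperlink` and `doc_text` are hypothetical). It uses only the standard library: a .docx is a ZIP archive, and the body text sits in `word/document.xml`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def extract_docx_text(path):
    """Return the body text of a .docx file, one line per paragraph.

    Headers, footers and footnotes live in other parts and are not read here.
    """
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    # Each <w:p> is a paragraph; its visible text lives in <w:t> elements
    for para in root.iter(W_NS + "p"):
        paragraphs.append("".join(node.text or "" for node in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def add_text_field(rows, link_field="hyperlink", text_field="doc_text"):
    """Add a field holding each linked document's text (hypothetical field names)."""
    for row in rows:
        try:
            row[text_field] = extract_docx_text(row[link_field])
        except (zipfile.BadZipFile, KeyError, OSError) as exc:
            # Not a readable .docx (e.g. a legacy .doc file): record the error instead
            row[text_field] = None
            row["error"] = str(exc)
    return rows
```

Rows whose file cannot be opened as a .docx get `None` in `doc_text` plus an `error` value, so one bad file does not stop the whole batch.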
You might have some luck with this workflow on the Alteryx Public Gallery: https://gallery.alteryx.com/#!app/Word-DocX-Parser/5cafb0b08a93370e40222b9c
It can take a directory of Word documents and extract the text from them. No Python needed at all, actually! It reads the values from the XML that sits behind the scenes in each file.
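To see the XML that this kind of workflow relies on, you can list the parts packed inside any .docx with Python's `zipfile` module. `list_docx_parts` is a hypothetical helper for illustration; the main body text is in the `word/document.xml` part.

```python
import zipfile


def list_docx_parts(path):
    """List the XML parts packed inside a .docx (which is a ZIP archive)."""
    with zipfile.ZipFile(path) as archive:
        # Relationship parts end in .rels and are skipped here
        return [name for name in archive.namelist() if name.endswith(".xml")]
```

For example, `list_docx_parts("example.docx")` on a hypothetical file would include `[Content_Types].xml` and `word/document.xml` among its entries.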
Thank you, @BrandonB! This is lifesaving!
I successfully parsed some documents (20 out of 190), but the rest failed to open. They are all in the same folder, and I have made sure those files are closed. Do you know what might be causing this?
Error: Parse DocX Batch (7): Record #168: Tool #19: Unable to open archive: T:\T_E_FILES\Construction Section\STP\current\21-permits\21-3801.doc
Muyi
So are they DOC or DOCX? The difference between the two file formats is huge and is probably why the logic is failing.
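The error message points the same way: "Unable to open archive" suggests the tool expects a ZIP archive, which a .docx is and a legacy .doc (an OLE2 compound binary file) is not. A rough sketch for checking what each file really is by its first bytes rather than its extension, assuming the files sit in one folder; the helper names are hypothetical.

```python
from collections import Counter
from pathlib import Path

# Leading bytes ("magic numbers") that identify each container format
ZIP_MAGIC = b"PK\x03\x04"                          # .docx (Office Open XML, a ZIP archive)
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # .doc (Word 97-2003, OLE2 compound file)


def word_format(path):
    """Return 'docx', 'doc' or 'unknown' based on the file's leading bytes.

    Note: other ZIP-based or OLE2-based files (e.g. Excel) share these signatures.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(ZIP_MAGIC):
        return "docx"
    if header == OLE2_MAGIC:
        return "doc"
    return "unknown"


def count_formats(folder):
    """Tally the detected formats of every *.doc* file in a folder."""
    return Counter(word_format(p) for p in Path(folder).glob("*.doc*"))
```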
They are all .doc (Word 97-2003 Document).
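One possible workaround, not suggested in the thread, is to batch-convert the .doc files to .docx first and then run the parser on the converted copies. A sketch assuming LibreOffice is installed and `soffice` is on the PATH (on Windows you may need the full path to `soffice.exe`); the helper names are hypothetical.

```python
import subprocess
from pathlib import Path


def soffice_convert_cmd(doc_path, out_dir, soffice="soffice"):
    """Build a LibreOffice command that converts one .doc file to .docx."""
    return [soffice, "--headless", "--convert-to", "docx",
            "--outdir", str(out_dir), str(doc_path)]


def convert_folder(folder, out_dir, soffice="soffice"):
    """Convert every .doc in `folder` to .docx in `out_dir` (assumes LibreOffice is installed)."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    for doc in sorted(Path(folder).glob("*.doc")):
        # check=True raises CalledProcessError if a conversion fails
        subprocess.run(soffice_convert_cmd(doc, out_dir, soffice), check=True)
```

Writing the converted files to a separate output folder leaves the originals untouched, so the parser can simply be pointed at the new folder.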