Alteryx Designer Desktop Discussions

Muyi · ‎09-07-2021

Hi,

I am trying to read a table that has a field storing hyperlinks (to Word documents on my local computer). I want to extract all the text from each Word doc and save it to a new field. Is this possible?

Thank you,

Muyi

cmcclellan · ‎09-07-2021

Alteryx cannot read Word documents, because they are not really "data files".

You could write some Python code to extract the text and then process that. What is in the documents that you're trying to extract?

BrandonB · ‎09-07-2021

You might have some luck with this workflow on the Alteryx Public Gallery: https://gallery.alteryx.com/#!app/Word-DocX-Parser/5cafb0b08a93370e40222b9c

It can take a directory of Word documents and extract the text from them. No Python at all, actually! It reads the values from the XML stored inside the files.
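For anyone curious what "reading the XML" looks like, here is a minimal Python sketch of the same idea. It is not the Gallery app's actual logic, just an illustration under the assumption that the file is a standard .docx, which is a ZIP archive whose main body lives in `word/document.xml`. The function name `docx_text` is made up for this example, and only body paragraphs are read (no headers, footers, or footnotes).

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_text(source):
    """Return the body text of a .docx (path or file-like object).

    Hypothetical helper for illustration: one output line per <w:p>
    paragraph, built by joining the paragraph's <w:t> text runs.
    """
    with zipfile.ZipFile(source) as archive:
        xml_bytes = archive.read("word/document.xml")

    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        runs = [node.text or "" for node in para.iter(f"{{{W_NS}}}t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling `docx_text(path)` on each hyperlink target would give one string per document, which could then be written to a new field.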

Muyi · ‎09-08-2021

Thank you @BrandonB! This is a lifesaver!

I successfully parsed some of the documents (20 out of 190), but the rest failed to open. They are all in the same folder, and I have made sure the files are closed. Do you know what the possible reasons might be?

Error: Parse DocX Batch (7): Record #168: Tool #19: Unable to open archive: T:\T_E_FILES\Construction Section\STP\current\21-permits\21-3801.doc

Muyi

cmcclellan · ‎09-08-2021

So are they DOC or DOCX? The difference between the two file formats is huge, and that is probably why the logic is failing.
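A quick way to check is to look at the first bytes of each file rather than the extension. A .docx is a ZIP archive and starts with the ZIP signature, while a Word 97-2003 .doc is an OLE2 compound file with a different signature, so anything that tries to unzip it would be expected to fail with an "unable to open archive" style error. The sketch below is only an illustration; the function names `sniff_word_format` and `sniff_file` are made up, and a ZIP signature only tells you the file is ZIP-based, not that it is definitely a .docx.

```python
# Leading bytes ("magic numbers") of the two container formats
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP archive, e.g. .docx
OLE2_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"   # OLE2 compound file, e.g. legacy .doc


def sniff_word_format(header: bytes) -> str:
    """Classify a file by its leading bytes (hypothetical helper)."""
    if header.startswith(ZIP_MAGIC):
        return "zip"      # ZIP-based, likely .docx
    if header.startswith(OLE2_MAGIC):
        return "ole2"     # legacy binary format, likely .doc
    return "unknown"


def sniff_file(path) -> str:
    """Read the first 8 bytes of a file and classify it."""
    with open(path, "rb") as fh:
        return sniff_word_format(fh.read(8))
```

Running `sniff_file` over the folder would show which files a ZIP-based parser can be expected to open and which would need converting first.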

Muyi · ‎09-08-2021

They are all .doc files (Word 97-2003 documents).
