Alteryx Designer Desktop Discussions

Extract data from hyperlink

Muyi
7 - Meteor

Hi,

I am trying to read a table that has a field storing hyperlinks (to Word documents on my local computer). I want to extract all the text from each Word document and save it to a new field. Is this possible?

Thank you,

Muyi 

5 REPLIES
cmcclellan
13 - Pulsar

Alteryx cannot read Word documents natively, because they are not really "data files".

You could write some Python code to extract the text and then process that. What is in the documents that you're trying to extract?

BrandonB
Alteryx

You might have some luck with this workflow on the Alteryx Public Gallery: https://gallery.alteryx.com/#!app/Word-DocX-Parser/5cafb0b08a93370e40222b9c

It can take a directory of Word documents and extract the text from them. No Python needed at all! It reads values from the XML inside the files behind the scenes.
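For anyone curious what "reading the XML behind the scenes" can look like outside Alteryx, here is a minimal Python sketch using only the standard library. This is not how the Gallery workflow is implemented. It assumes a well-formed .docx file (a ZIP archive containing word/document.xml), and the function name docx_text is made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path_or_file):
    """Return the paragraph text of a .docx file, one paragraph per line.

    Hypothetical helper: it reads only the main document body and ignores
    headers, footers, footnotes, and comments.
    """
    # A .docx file is a ZIP archive; the body lives in word/document.xml.
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # Each paragraph is split into runs; the visible text sits in <w:t>.
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

Calling docx_text on a file that is not a ZIP archive raises zipfile.BadZipFile, so a batch run would probably want to catch that and log the file name.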

Muyi
7 - Meteor

Thank you @BrandonB! This is a lifesaver!

The workflow parsed some of the documents (20 out of 190) but failed to open the rest. They are all in the same folder, and I have made sure the files are closed. Do you know what might be causing this?

Error: Parse DocX Batch (7): Record #168: Tool #19: Unable to open archive: T:\T_E_FILES\Construction Section\STP\current\21-permits\21-3801.doc

Muyi 

cmcclellan
13 - Pulsar

So are they DOC or DOCX? The two file formats are very different, which is probably why the logic is failing.
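One way to check which format each file really is, regardless of its extension, is to look at the first few bytes. Here is a rough Python sketch: .docx files are ZIP archives and start with PK\x03\x04, while Word 97-2003 .doc files are OLE compound files with a different signature. The function names and the folder argument are hypothetical, and a ZIP signature only shows that the file is ZIP-based, not that it is a valid Word document.

```python
from pathlib import Path

# File signatures (first bytes of the file).
ZIP_MAGIC = b"PK\x03\x04"                          # ZIP container, used by .docx
OLE_MAGIC = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"    # OLE compound file, used by legacy .doc


def sniff_word_format(header: bytes) -> str:
    """Classify a file from its leading bytes as 'docx', 'doc', or 'unknown'."""
    if header.startswith(ZIP_MAGIC):
        return "docx"   # ZIP-based, so most likely a .docx
    if header.startswith(OLE_MAGIC):
        return "doc"    # binary Word 97-2003 format
    return "unknown"


def split_by_format(folder):
    """Group the *.doc* files in a folder by their detected format."""
    groups = {"docx": [], "doc": [], "unknown": []}
    for path in sorted(Path(folder).glob("*.doc*")):
        with open(path, "rb") as fh:
            groups[sniff_word_format(fh.read(8))].append(path)
    return groups
```

Files that land in the "doc" group are not ZIP archives, which would likely explain an "Unable to open archive" error from a parser that expects .docx.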

Muyi
7 - Meteor

They are all .doc files (Word 97-2003 documents).
