Alteryx Designer Desktop Discussions

Extract data from hyperlink

Muyi
7 - Meteor

Hi,

I am reading a table that has a field storing hyperlinks to Word documents on my local computer. I want to extract all the text from each Word document and save it to a new field. Is this possible?

Thank you,

Muyi 

cmcclellan
13 - Pulsar

Alteryx cannot read Word documents because they are not really a "data file".

You could write some Python code to extract the text and then process that. What is in the documents that you're trying to extract?
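A minimal sketch of that Python route, using only the standard library and assuming the hyperlink field holds plain local file paths to .docx files. A .docx is a ZIP archive whose body text lives in `word/document.xml` as `<w:t>` runs inside `<w:p>` paragraphs; legacy .doc files use a different binary format and are not handled here. The names `docx_text`, `add_text_field`, `Hyperlink`, `Text` and `Error` are hypothetical, chosen for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for the tags in word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_text(path):
    """Return the body text of a .docx file, one line per paragraph (hypothetical helper).

    Only .docx (a ZIP of XML parts) is supported; a legacy .doc file raises
    zipfile.BadZipFile. Headers, footers and footnotes live in other parts
    of the archive and are not included.
    """
    with zipfile.ZipFile(path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is usually split across several <w:t> runs
        paragraphs.append("".join(t.text or "" for t in para.iter(W_NS + "t")))
    return "\n".join(paragraphs)


def add_text_field(rows, link_field="Hyperlink", out_field="Text"):
    """Add out_field to each row (a dict) holding the text of the linked .docx."""
    for row in rows:
        try:
            row[out_field] = docx_text(row[link_field])
        except (OSError, KeyError, zipfile.BadZipFile) as exc:
            # Keep going: record the problem instead of stopping the batch
            row[out_field] = None
            row["Error"] = str(exc)
    return rows
```

For example, `add_text_field([{"Hyperlink": r"C:\docs\permit.docx"}])` (a hypothetical path) would return that row with a `Text` field added, or with an `Error` field if the file could not be opened as a .docx.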

BrandonB
Alteryx

You might have some luck with this workflow on the Alteryx Public Gallery: https://gallery.alteryx.com/#!app/Word-DocX-Parser/5cafb0b08a93370e40222b9c

It can take a directory of Word documents and extract the text from them. No Python at all, actually! It reads the values from the XML behind the scenes of each file.

Muyi
7 - Meteor

Thank you, @BrandonB! This is a lifesaver!

I successfully parsed 20 of the 190 documents, but the rest failed to open. They are all in the same folder, and I have made sure those files are closed. Do you know what the possible reasons are?

Error: Parse DocX Batch (7): Record #168: Tool #19: Unable to open archive: T:\T_E_FILES\Construction Section\STP\current\21-permits\21-3801.doc

Muyi 

cmcclellan
13 - Pulsar

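So are they DOC or DOCX? The two file formats are very different, and that is probably why the logic is failing: a .docx file is a ZIP archive of XML parts, while a .doc file uses an older binary format, so a parser that opens the file as an archive would fail on a .doc with an error like "Unable to open archive".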
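One way to check which files are DOC and which are DOCX, regardless of their extension, is to look at the first bytes of each file: a .docx is a ZIP archive and starts with `PK\x03\x04`, while a Word 97-2003 .doc is an OLE compound file starting with `D0 CF 11 E0 A1 B1 1A E1`. A minimal sketch; the function name `word_format` is hypothetical.

```python
ZIP_SIGNATURE = b"PK\x03\x04"                         # .docx (Office Open XML, ZIP-based)
OLE_SIGNATURE = b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1"  # .doc (Word 97-2003, OLE compound file)


def word_format(path):
    """Classify a file as 'docx', 'doc' or 'unknown' from its leading bytes (hypothetical helper).

    Any ZIP archive (e.g. .xlsx) matches the first check and any OLE file
    (e.g. .xls) matches the second, so this only separates ZIP-based files
    from OLE-based ones.
    """
    with open(path, "rb") as f:
        header = f.read(8)
    if header.startswith(ZIP_SIGNATURE):
        return "docx"
    if header.startswith(OLE_SIGNATURE):
        return "doc"
    return "unknown"
```

Files reported as `"doc"` would need converting to .docx (for example, by re-saving them in Word) before a DOCX parser can read them.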

Muyi
7 - Meteor

They are all .doc (Word 97-2003 Document).
