
Alteryx Designer Discussions


How to use R and Python to Parse Word Documents

Alteryx Partner

Hi @ShaanM 


Please find the sample file.

Thanks in advance.

Alteryx Partner

Hi Shaan


Please find the file.

Thanks in advance.


@gururajb I tested with your file. It looks like some of the file properties have not been filled in.


I opened the document, copied its contents into a new Word document, and the new file reads in OK.


It might be down to how the original file was created.
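To check the file-properties theory before re-saving documents, a minimal stdlib-only sketch can list which core properties are empty in a .docx. It assumes the standard Office Open XML layout, where those properties live in the docProps/core.xml part of the zip package; the function name missing_core_properties and the particular properties checked are hypothetical choices, and an empty property is only a hint, not proof that it is what breaks the read.

```python
import zipfile
import xml.etree.ElementTree as ET

# Location of the core-properties part inside a .docx package (standard OOXML layout)
CORE_PATH = "docProps/core.xml"

# Hypothetical selection of core properties to check, in Clark notation {namespace}tag
PROPS = {
    "title": "{http://purl.org/dc/elements/1.1/}title",
    "creator": "{http://purl.org/dc/elements/1.1/}creator",
    "lastModifiedBy": "{http://schemas.openxmlformats.org/package/2006/metadata/core-properties}lastModifiedBy",
    "created": "{http://purl.org/dc/terms/}created",
    "modified": "{http://purl.org/dc/terms/}modified",
}


def missing_core_properties(source):
    """Return the names of checked core properties that are absent or empty.

    `source` may be a file path or a binary file-like object holding a .docx.
    """
    with zipfile.ZipFile(source) as zf:
        if CORE_PATH not in zf.namelist():
            # The whole core-properties part is missing from the package
            return ["<no docProps/core.xml>"]
        root = ET.fromstring(zf.read(CORE_PATH))
    missing = []
    for name, tag in PROPS.items():
        element = root.find(tag)  # properties are direct children of cp:coreProperties
        if element is None or not (element.text or "").strip():
            missing.append(name)
    return missing
```

For example, missing_core_properties("client_file.docx") (a hypothetical path) returns the list of unfilled properties, which can then be compared against a copy that reads in correctly.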

Alteryx Partner

Thanks for the insights @ShaanM.

I will check with the client how the files were created.


If I want to add the input file path to the Python macro so that I can link phrases back to their source documents, what might that look like? Something like this?


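from ayx import Alteryx
import pandas
import docx2txt

# Path to the source Word document (placeholder; replace with the real path)
filepath = 'XXXX'

# Extract the plain text of the .docx file
text = docx2txt.process(filepath)

# Put the extracted text and its source path into one row of a pandas DataFrame
df = pandas.DataFrame({"text": [text], "filepath": [filepath]})

# Write the data frame to output anchor 1 of the Python tool for downstream processing
Alteryx.write(df, 1)
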

Yes, I think you are on the right path.


The main thing is to define the file path as a column in the data frame. That way it stays part of the data as it passes through the stream.
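When several documents are processed and individual phrases need to be traced back, the same idea extends to one row per phrase, each carrying its file path. The sketch below is a stdlib-only illustration under stated assumptions: it reads word/document.xml directly with zipfile as a stand-in for docx2txt, splits sentences with a naive regex, and the helper names docx_paragraphs and phrases_with_source are hypothetical. Inside the Python tool, the returned rows could be turned into a data frame with pandas.DataFrame(rows) and written out with Alteryx.write(df, 1).

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used for paragraph (w:p) and text run (w:t) elements
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(source):
    """Yield the text of each non-empty paragraph (hypothetical stand-in for docx2txt)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    for para in root.iter(W_NS + "p"):
        # A paragraph's text can be split across several w:t runs; join them
        text = "".join(t.text or "" for t in para.iter(W_NS + "t"))
        if text.strip():
            yield text


def phrases_with_source(paths):
    """Return one row per sentence-like phrase, tagged with the file it came from."""
    rows = []
    for path in paths:
        for para in docx_paragraphs(path):
            # Naive split after . ! or ? followed by whitespace (an assumption; tune for real text)
            for phrase in re.split(r"(?<=[.!?])\s+", para.strip()):
                if phrase:
                    rows.append({"filepath": str(path), "phrase": phrase})
    return rows
```

Keeping filepath on every row, rather than only on the document-level record, means any downstream filter or join on phrases still knows which file each phrase came from.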