
Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

How to use R and Python to Parse Word Documents

Alteryx

A lot of people I have spoken to recently have asked how to do this, and the question seems to crop up more and more.

I thought it would be useful to build two macros to help solve this challenge.

 

Using either R or Python, the two macros take a feed of files and process them.

 

If using R, the package is called 'officer', which you will need to install separately.

For Python, the package is called 'docx2txt', which also needs to be installed separately.

 

It is a very basic example and there are a whole host of other packages that do something similar.

 

Here is the R code used:

 


 

 

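# officer reads Word documents; install it separately before running
library(officer)

# "XXXX" is a placeholder for the full path to the Word file
doc <- read_docx("XXXX")

# One row per document element, with its style and text
content <- docx_summary(doc)
head(content)

# Send the data frame to output 3 of the R tool
write.Alteryx(content, 3)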

 

 

 

 

Here is the Python code used:


 

 

from ayx import Alteryx
Alteryx.installPackages("docx2txt")

 

 

 

 

from ayx import Alteryx
import pandas
import docx2txt

# 'XXXX' is a placeholder for the full path to the Word file
text = docx2txt.process('XXXX')

print(text)

# Put the extracted document text into a single-row pandas DataFrame
df = pandas.DataFrame({"text": [text]})

# Write the data frame to the Alteryx workflow for downstream processing
Alteryx.write(df, 1)
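
For anyone who wants to see roughly what happens to a feed of files, here is a minimal sketch using only the Python standard library. It is not how docx2txt is implemented and it is not the code inside the macro. It relies on the fact that a .docx file is a zip archive holding `word/document.xml`, and it ignores headers, footers and images. The function names `docx_to_text` and `rows_for` are made up for illustration.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_to_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is split across one or more <w:t> runs
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)


def rows_for(paths):
    """Build one record per file (hypothetical helper for this sketch)."""
    return [{"path": str(p), "text": docx_to_text(p)} for p in paths]
```

The list returned by `rows_for` can be passed straight to `pandas.DataFrame(...)`, giving one row per document instead of the single-row frame in the code above.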

 

 

 

I packaged each method as a macro, using 'XXXX' in the code as a placeholder for the file name.

Attached are the workflow, the macros and a test file.

 

 

Enjoy!!

 

Shaan Mistry

Asteroid

Hi Shaan,

 

When I try to run docx2txt I get this:

 

Could not find a version that satisfies the requirement docx2txt (from versions: )
No matching distribution found for docx2txt

Can you give any guidance?


Alteryx

Hi @JTCairns

What happens if you try running just this in Python:

 

from ayx import Alteryx
Alteryx.installPackages("docx2txt")

 

 

Does it attempt to install?

 

 

Asteroid

Hi Shaan,

 

I get the below. I think it may be a network security issue, but I have no way of changing that. If that is the cause, can a package be installed from a local file? Or is it something else?

 

Capture.PNG

Alteryx

Hi @JTCairns

The packages can be downloaded outside of Alteryx and placed into this folder (default location): C:\Program Files\Alteryx\bin\Miniconda3\PythonTool_venv\Lib\site-packages

I will attempt to attach the file here. Unzip it, then place the two folders into the folder above, and it should then work.

Hope this helps.
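
As an alternative to copying folders by hand, pip itself can install from a local folder without touching the network. The sketch below only builds the command and runs it if asked. It assumes the package archive has already been downloaded on another machine and copied into a folder, and that the Python running the code is the one whose site-packages should receive the package. The function name `install_offline` and the folder in the usage line are hypothetical. Writing into a folder under Program Files may still need admin rights.

```python
import subprocess
import sys


def install_offline(package, wheel_dir, dry_run=True):
    """Build (and optionally run) a pip install that uses only local files."""
    cmd = [
        sys.executable, "-m", "pip", "install",
        "--no-index",               # do not contact the package index
        "--find-links", wheel_dir,  # look for archives in this folder instead
        package,
    ]
    if not dry_run:
        # Raises CalledProcessError if pip reports a failure
        subprocess.run(cmd, check=True)
    return cmd
```

For example, `install_offline("docx2txt", r"C:\temp\wheels")` returns the command so it can be inspected first, and passing `dry_run=False` actually runs it.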

Asteroid

Thanks for this Shaan, I hope this works, but I don't have admin permission so I will have to wait and see.

Alteryx Partner

Hi, I am getting this error while trying to parse a document using the R-based macro.

Has anyone come across this issue?

Please help.

Capture.PNG

Alteryx

@gururajb 

 

What does the data look like going into the macro?

Check that it is represented as a full path, e.g. c:\datafolder\worddoc.docx
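
A quick way to run that check over the incoming field before it reaches the macro is sketched below. It only looks at the text of the path, not at whether the file exists, and it treats the value as a Windows path regardless of where the code runs. The function name `path_problems` is made up for illustration.

```python
from pathlib import PureWindowsPath


def path_problems(value):
    """Return a list of problems found in a file path field (empty if none)."""
    text = (value or "").strip()
    if not text:
        return ["path is empty"]
    problems = []
    path = PureWindowsPath(text)
    if not path.is_absolute():
        # Needs a drive (or UNC share) plus a folder, e.g. c:\datafolder\...
        problems.append("not a full path")
    if not path.suffix:
        problems.append("no file extension")
    return problems
```

An empty list means the value at least looks like a full path to a file.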

Alteryx Partner
Hi Shaan, I realized that the file extension is .doc, which is not supported by the officer library.
I guess we will have to use a different package.
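
Since an extension does not always match what is inside a file, it can help to look at the contents. The sketch below is only a diagnostic, not part of either macro. It relies on two facts: a .docx is a zip archive containing `word/document.xml`, and legacy Office files such as .doc use the OLE2 container, which always starts with the same eight bytes. Legacy .xls and .ppt files share that container, so the result is reported as "ole2" rather than "doc". The function name `sniff_word_format` is made up for illustration.

```python
import zipfile

# Magic bytes at the start of every OLE2 compound file (legacy .doc, .xls, .ppt)
OLE2_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(path):
    """Classify a file as 'docx', 'ole2' or 'unknown' from its contents."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            if "word/document.xml" in archive.namelist():
                return "docx"
        return "unknown"
    with open(path, "rb") as handle:
        if handle.read(8) == OLE2_MAGIC:
            return "ole2"
    return "unknown"
```

A file named .doc that comes back as "docx" is really in the newer format under an old extension.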
Alteryx

Hi @gururajb

I tested on my end with a .doc and the R macro still pulls the data in OK.

Could you send me a direct message with the file, or upload it here?
