Alteryx Designer Discussions

How to use R and Python to Parse Word Documents

Alteryx Partner

Hi @ShaanM 


Please find the sample file.

Thanks in advance.

Alteryx Partner

Hi Shaan


Please find the file.

Thanks in advance.


@gururajb I tested with your file. It looks like some file properties have not been filled in.


I opened the document, copied its contents, and pasted them into a new Word document; the new file then reads in fine.


It might be down to how the original file was created.
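
The thread does not say which properties were missing, so the following is only a rough way to compare the problem file with the re-saved copy. A .docx file is a ZIP package, and Word normally writes its document properties under docProps/. The part names listed are the ones Word typically produces, not a strict requirement, and `missing_parts` is a hypothetical helper name.

```python
import zipfile

# Parts that a .docx saved by Word itself normally contains.
# Other tools may legitimately omit the docProps/ entries.
EXPECTED_PARTS = (
    "[Content_Types].xml",
    "word/document.xml",
    "docProps/core.xml",
    "docProps/app.xml",
)


def missing_parts(docx_path, expected=EXPECTED_PARTS):
    """Return the expected package parts that are absent from the .docx archive."""
    with zipfile.ZipFile(docx_path) as package:
        present = set(package.namelist())
    return [part for part in expected if part not in present]
```

Running `missing_parts` on both the original file and the copy that reads in fine would show whether the two packages differ in these parts; it does not prove that a missing part is what breaks the parser.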

Alteryx Partner

Thanks for the insights @ShaanM.

I will understand from the client how the files were created.

8 - Asteroid

If I wanted to add the input file path to the Python macro so I can link phrases back to their source documents, what might that look like? Something like this?


from ayx import Alteryx
import pandas

import docx2txt

# 'XXXX' is a placeholder for the path to the Word document
filepath = 'XXXX'
text = docx2txt.process(filepath)

# Put the extracted text and its source file path into a pandas DataFrame
df = pandas.DataFrame({"text": [text], "filepath": [filepath]})

# Write the data frame to the Alteryx workflow for downstream processing
Alteryx.write(df, 1)



Yes, I think you are on the right path.


The main thing is to include the file path as a column in the data frame. That way it travels with the text as the data passes through the stream.
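
As a rough sketch of that idea for more than one document: each row keeps the path next to the text extracted from it. `build_records` and `run_in_python_tool` are hypothetical helper names, and the incoming column name `FullPath` is an assumption about how the paths arrive at the tool. `Alteryx.read`, `Alteryx.write`, and `docx2txt.process` are the calls already used in this thread's approach.

```python
def build_records(paths, extract):
    """Pair each document path with the text extracted from it."""
    return [{"filepath": str(path), "text": extract(path)} for path in paths]


def run_in_python_tool(path_column="FullPath"):
    """Sketch of the wiring inside the Alteryx Python tool (not called here)."""
    # Imported lazily so build_records can be tried outside Alteryx.
    from ayx import Alteryx
    import docx2txt
    import pandas

    incoming = Alteryx.read("#1")  # data frame from input anchor #1
    # "FullPath" is an assumed column name; use whatever column holds the paths.
    records = build_records(incoming[path_column], docx2txt.process)
    Alteryx.write(pandas.DataFrame(records), 1)  # send to output anchor 1
```

Passing the extractor in as an argument keeps the pairing logic separate from docx2txt, so it can be checked with a stand-in function before the package is installed.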

8 - Asteroid

Hi ShaanM, thanks for the info.


I got an error installing docx2txt, so I tried saving the files where you suggested, in C:\Program Files\Alteryx\bin\Miniconda3\PythonTool_venv\Lib\site-packages.


However, I have no PythonTool_venv folder (I asked IT to look too, and they could not find it). I DO have a jupytertool_venv folder, and it seems to be looking in there, so I tried saving the files in the following location:

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\

But still no luck. It reports an environment error. Do you have any more suggestions? I am not familiar with all this back-end stuff. Thanks in advance.


Collecting docx2txt
Installing collected packages: docx2txt
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\Lib\\site-packages\\docx2txt'
Consider using the `--user` option or check the permissions.
CalledProcessError                        Traceback (most recent call last)
<ipython-input-2-72d8c39b3961> in <module>
      1 from ayx import Alteryx
----> 2 Alteryx.installPackages("docx2txt")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\ in installPackage(package, install_type, debug, **kwargs)
    138     This function will install a package or list of packages into the virtual environment used by the Python tool. If using an admin installation of Alteryx, you must run Alteryx as administrator in order to use this function and install packages.
    139     """
--> 140     __installPackages__(package, install_type=install_type, debug=debug, **kwargs)

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\ in installPackages(package, install_type, debug)
    112     print(pip_install_result['msg'])
    113     if not pip_install_result['success']:
--> 114         raise pip_install_result['err']

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\ in runSubprocess(args_list, debug)
     57     try:
---> 58         result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
     59         if debug:
     60             print("[Subprocess success!]")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\ in check_output(timeout, *popenargs, **kwargs)
    355     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 356                **kwargs).stdout
    357 

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\ in run(input, timeout, check, *popenargs, **kwargs)
    436         if check and retcode:
    437             raise CalledProcessError(retcode, process.args,
--> 438                                      output=stdout, stderr=stderr)
    439     return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\python.exe', '-m', 'pip', 'install', 'docx2txt']' returned non-zero exit status 1.
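
The `[WinError 5] Access is denied` line means pip could not write into the site-packages folder, and the docstring shown in the traceback notes that an admin installation needs Alteryx to be run as administrator before packages can be installed. A small check like the one below, run from the Python tool, can confirm whether the current process is able to write there. This is only a diagnostic sketch; `can_write_to` and `check_site_packages` are hypothetical helper names.

```python
import site
import tempfile


def can_write_to(directory):
    """Return True if the current process can create a file in `directory`."""
    try:
        # Creating a real temporary file tests actual permissions, which is
        # more reliable on Windows than inspecting file attributes.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers both "access denied" and "directory does not exist".
        return False


def check_site_packages():
    """Map each site-packages directory of this interpreter to its writability."""
    return {path: can_write_to(path) for path in site.getsitepackages()}
```

If `check_site_packages()` reports `False` for the jupytertool_venv folder, the install will keep failing until Alteryx is started with sufficient rights, whichever folder the files are copied into by hand.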
Alteryx Certified Partner

Hi @ShaanM ,


I desperately need this to work as the solution I was using has developed problems. 

I've followed the steps (I'm not overly familiar with R or Python, so I'm leaning toward the problem being between keyboard and chair) but I get the following error when using R:


Any ideas?

I get different errors when using Python, but we'll address those later if need be. I downloaded the officer package, then used the Alteryx R Package Installer to install it. It confirmed the package was installed correctly. I then needed to update the rlang package, which I did.

Now I get this error. Any ideas?

I'm literally on-site with a client now so any help will be greatly appreciated!!






Try this:


On the local machine, browse to this location (using Alteryx defaults):


C:\Program Files\Alteryx\R-3.5.3\bin\x64


This is the R location.


Once in that location, find and run RGui.exe.


RGui allows you to install R packages.


From the top menu, go to Packages > Install Packages.


Then select the CRAN mirror. I just select London. It will then give you a full list of all available packages.


Then select officer.


Once it has downloaded and unpacked (it should do this all by itself), reopen Alteryx and try again.


Hope this helps. Failing that, I would reach out to our support team:




It looks like you may have some environment discrepancies.


To fully diagnose this, please log a ticket with our client service team: