Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

How to use R and Python to Parse Word Documents

Alteryx Partner

Hi @ShaanM 


Please find the sample file.

Thanks in advance.

Alteryx Partner

Hi Shaan


Please find the file.

Thanks in advance.


@gururajb I tested with your file. It looks like some file properties have not been filled in.


I opened the doc, copied the contents, and pasted them into a new Word doc, and then the file reads in OK.


It might be down to how the original file was created.

Alteryx Partner

Thanks for the insights @ShaanM.

I will find out from the client how the files were created.


If I wanted to add the input filepath to the python macro so I can link phrases back to source documents, what might that look like? Something like this?


from ayx import Alteryx
import pandas

import docx2txt

# 'XXXX' is a placeholder for the path to the input .docx file
filepath = 'XXXX'
text = docx2txt.process(filepath)

# Turn the extracted text and its source file path into a pandas DataFrame
df = pandas.DataFrame({"text": [text], "filepath": [filepath]})

# Write the data frame to the Alteryx workflow for downstream processing
Alteryx.write(df, 1)



Yes, I think you are on the right path.


The main thing is to define the file path as a column in the data frame. That way it is part of the data as it passes through the stream.
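
As a rough illustration of keeping the file path alongside the text, here is a minimal sketch that builds one record per document. The helper name `collect_documents` and the idea of passing the extractor in as a function are assumptions made for this example, not part of the macro discussed in this thread.

```python
def collect_documents(paths, extract_text):
    """Return one record per document, pairing the extracted text
    with the path it came from.

    `extract_text` is any callable that takes a file path and returns
    a string (hypothetical parameter; docx2txt.process fits this shape).
    """
    records = []
    for path in paths:
        records.append({"filepath": str(path), "text": extract_text(str(path))})
    return records


# Inside the Alteryx Python tool this might be used as follows (not run here):
#   import docx2txt, pandas
#   from ayx import Alteryx
#   df = pandas.DataFrame(collect_documents(['XXXX'], docx2txt.process))
#   Alteryx.write(df, 1)
```

Because every row carries its own `filepath` value, any phrase parsed downstream can be traced back to its source document.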


Hi ShaanM, thanks for your info.


I got an error installing docx2txt, so I tried saving the files where you suggested: in C:\Program Files\Alteryx\bin\Miniconda3\PythonTool_venv\Lib\site-packages.


However, I have no PythonTool_venv folder (I asked IT to look too, and they could not find it). I do have a jupytertool_venv folder, and the installer seems to be looking in there, so I tried saving the files in the following location:

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\

But still no luck; it reports an EnvironmentError. Do you have any more suggestions? I am not familiar with all this back-end stuff. Thanks in advance.


Collecting docx2txt
Installing collected packages: docx2txt
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\Lib\\site-packages\\docx2txt'
Consider using the `--user` option or check the permissions.
CalledProcessError                        Traceback (most recent call last)
<ipython-input-2-72d8c39b3961> in <module>
      1 from ayx import Alteryx
----> 2 Alteryx.installPackages("docx2txt")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\ in installPackage(package, install_type, debug, **kwargs)
    138     This function will install a package or list of packages into the virtual environment used by the Python tool. If using an admin installation of Alteryx, you must run Alteryx as administrator in order to use this function and install packages.
    139     """
--> 140     __installPackages__(package, install_type=install_type, debug=debug, **kwargs)

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\ in installPackages(package, install_type, debug)
    112     print(pip_install_result['msg'])
    113     if not pip_install_result['success']:
--> 114         raise pip_install_result['err']

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\ in runSubprocess(args_list, debug)
     57     try:
---> 58         result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
     59         if debug:
     60             print("[Subprocess success!]")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\ in check_output(timeout, *popenargs, **kwargs)
    355     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 356                **kwargs).stdout
    357 

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\ in run(input, timeout, check, *popenargs, **kwargs)
    436         if check and retcode:
    437             raise CalledProcessError(retcode, process.args,
--> 438                                      output=stdout, stderr=stderr)
    439     return CompletedProcess(process.args, retcode, stdout, stderr)

CalledProcessError: Command '['c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\python.exe', '-m', 'pip', 'install', 'docx2txt']' returned non-zero exit status 1.
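
The pip output above points at two possible ways forward: the `--user` option, or fixing permissions (the docstring in the traceback notes that an admin installation requires running Alteryx as administrator). Below is a minimal sketch of the first option, building the same pip command shown in the last line of the traceback with `--user` added. The helper name `build_pip_command` is hypothetical, and whether a per-user install is visible to the Python tool's environment is an assumption that would need checking on the machine in question.

```python
import sys


def build_pip_command(package, user_install=True):
    """Build a pip install command for the current interpreter.

    With user_install=True the package is installed into the per-user
    site-packages directory instead of the (possibly write-protected)
    environment directory under Program Files.
    """
    command = [sys.executable, "-m", "pip", "install"]
    if user_install:
        command.append("--user")
    command.append(package)
    return command


# To actually run it (not executed here):
#   import subprocess
#   subprocess.check_call(build_pip_command("docx2txt"))
```

If the per-user install is not picked up, the other route suggested by the traceback is to start Alteryx Designer as administrator and rerun `Alteryx.installPackages("docx2txt")`.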