Alteryx Designer

SOLVED

How to use R and Python to Parse Word Documents

Alteryx Partner

Hi @ShaanM

Please find the sample file.

Thanks in advance.

Alteryx Partner

Hi Shaan

Please find the file.

Thanks in advance.

Alteryx

@gururajb I tested with your file. It looks like some file properties have not been filled in.

I opened the doc, copied the contents and pasted them into a new Word doc, and then the file reads in OK.

It might be down to how the original file was created.

Alteryx Partner

Thanks for the insights @ShaanM.

I will find out from the client how the files were created.

8 - Asteroid

If I wanted to add the input file path to the Python macro so I can link phrases back to their source documents, what might that look like? Something like this?

from ayx import Alteryx
import pandas

import docx2txt

# Path of the Word document to parse ('XXXX' is a placeholder)
filepath = 'XXXX'
text = docx2txt.process(filepath)

print(text)

# Put the extracted text and its source file path into a one-row pandas DataFrame
df = pandas.DataFrame({"text": [text], "filepath": [filepath]})

# Write the data frame to the Alteryx workflow for downstream processing
Alteryx.write(df, 1)

Alteryx

@coderockride

Yes, I think you are on the right path.

The main thing is to define the file path as a column in the data frame. That way it becomes part of the data as it passes through the stream.
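For several documents, the same idea can be sketched as one row per file, with the path kept next to the extracted text. This is only an illustration: `build_records` is a hypothetical helper name, and the extractor is passed in as a function so the sketch does not depend on any particular package being installed.

```python
from typing import Callable, Dict, Iterable, List


def build_records(
    filepaths: Iterable[str],
    extract: Callable[[str], str],
) -> List[Dict[str, str]]:
    """Return one row per document, keeping the source path next to its text.

    `extract` is any function that takes a file path and returns the text,
    for example docx2txt.process (assumed to be installed in the Python tool).
    """
    records = []
    for path in filepaths:
        # Store the path alongside the text so downstream tools can
        # trace each phrase back to its source document.
        records.append({"filepath": path, "text": extract(path)})
    return records
```

Inside the Python tool, something like `pandas.DataFrame(build_records(paths, docx2txt.process))` should then give a data frame with `filepath` and `text` columns that can be passed to `Alteryx.write(df, 1)`, where `paths` is whatever list of file paths the workflow supplies.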

8 - Asteroid

Hi ShaanM, thanks for your info.

I got an error installing docx2txt, so I tried saving the files where you suggested, in C:\Program Files\Alteryx\bin\Miniconda3\PythonTool_venv\Lib\site-packages.

However, I have no PythonTool_venv folder (I asked IT to look too and they could not find it). I DO have a jupytertool_venv folder, and the install seems to be looking in there, so I tried saving the files in the following location:

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\

But still no luck; it reports an environment error. Do you have any more suggestions? I am not familiar with all this back-end stuff. Thanks in advance.

Collecting docx2txt
Installing collected packages: docx2txt
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\Lib\\site-packages\\docx2txt'
Consider using the `--user` option or check the permissions.
 
---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-2-72d8c39b3961> in <module>
      1 from ayx import Alteryx
----> 2 Alteryx.installPackages("docx2txt")
      3 

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\export.py in installPackage(package, install_type, debug, **kwargs)
    138     This function will install a package or list of packages into the virtual environment used by the Python tool. If using an admin installation of Alteryx, you must run Alteryx as administrator in order to use this function and install packages.
    139     """
--> 140     __installPackages__(package, install_type=install_type, debug=debug, **kwargs)
    141 
    142 

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
    112     print(pip_install_result['msg'])
    113     if not pip_install_result['success']:
--> 114         raise pip_install_result['err']

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
     56 
     57     try:
---> 58         result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
     59         if debug:
     60             print("[Subprocess success!]")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
    354 
    355     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 356                **kwargs).stdout
    357 
    358 

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
    436         if check and retcode:
    437             raise CalledProcessError(retcode, process.args,
--> 438                                      output=stdout, stderr=stderr)
    439     return CompletedProcess(process.args, retcode, stdout, stderr)
    440 

CalledProcessError: Command '['c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\python.exe', '-m', 'pip', 'install', 'docx2txt']' returned non-zero exit status 1.
Alteryx Certified Partner

Hi @ShaanM,

I desperately need this to work as the solution I was using has developed problems.

I've followed the steps (I'm not overly familiar with R or Python, so I'm leaning toward the problem being between keyboard and chair) but I get the following error when using R:

[Screenshot: mceleavey_0-1579597866571.png]

Any ideas?

I get different errors when using Python, but we'll address those later if need be. I downloaded the officer package, then used the Alteryx R Package Installer to install. It confirmed it was installed correctly. I then needed to update the RLang package, which I did.

Now I get this error. Any ideas?

I'm literally on-site with a client now so any help will be greatly appreciated!!

M.

Alteryx

@mceleavey

Try this:

On the local machine, browse to this location (using Alteryx defaults):

C:\Program Files\Alteryx\R-3.5.3\bin\x64

This is the R location.

Once in that location, find and run RGui.exe. RGui allows you to install R packages.

From the top menu, go to Packages > Install Packages.

Then select the CRAN mirror. I just select London. It will then give you a full list of all available packages.

Then select officer.

Once it has downloaded and unpacked (it should do it all by itself), reopen Alteryx and try again.

Hope this helps. Failing that, I would reach out to our support team: support@alteryx.com

Alteryx

@G1

It looks like you may have some environment discrepancies.

To fully diagnose this, please log a ticket with our client service team: support@alteryx.com
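Before logging the ticket, a quick check like the sketch below, run from the Python tool, may help show which interpreter is in use and whether its site-packages folders are writable. The "Access is denied" message in the pip output and the note in the traceback about running Alteryx as administrator on an admin installation both point at folder permissions. The helper names `is_writable` and `describe_environment` are made up for this illustration and are not part of the ayx package.

```python
import site
import sys
import tempfile


def is_writable(directory: str) -> bool:
    """Return True if a temporary file can be created in `directory`."""
    try:
        # The file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers permission errors as well as missing folders.
        return False


def describe_environment() -> dict:
    """Collect the interpreter path and the writability of each site-packages folder."""
    # Some older virtualenv setups lack site.getsitepackages, so fall back to an empty list.
    get_dirs = getattr(site, "getsitepackages", lambda: [])
    return {
        "python": sys.executable,
        "site_packages": {path: is_writable(path) for path in get_dirs()},
    }


if __name__ == "__main__":
    info = describe_environment()
    print("Interpreter:", info["python"])
    for path, writable in info["site_packages"].items():
        print(path, "writable" if writable else "NOT writable")
```

If the folders show as not writable, that would be consistent with the WinError 5 above, and it is a useful detail to include in the support ticket.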
