A lot of people I have been speaking to recently have asked about this, and it seems to crop up more and more.
I thought it would be useful to build two macros to help solve this challenge.
Using either R or Python, the two macros take a feed of files and process them.
If using R, the package used is called 'officer', which you will need to install separately.
For Python, the package used is called 'docx2txt', also to be installed separately.
It is a very basic example and there are a whole host of other packages that do something similar.
Here is the R code used:
library(officer)
doc <- read_docx("XXXX")
content <- docx_summary(doc)
head(content)
write.Alteryx(content, 3)
Here is the Python code used:
from ayx import Alteryx
Alteryx.installPackages("docx2txt")
from ayx import Alteryx
import pandas
import docx2txt
text = docx2txt.process('XXXX')
print(text)
#Turn the variable holding the document text into a pandas DataFrame
df = pandas.DataFrame({"text":[text]})
#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1)
I packaged each method as a macro; in the code, 'XXXX' is a placeholder for the file name.
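For anyone curious what this kind of extraction involves, or who cannot install a package, here is a minimal sketch using only the Python standard library. It is not how docx2txt or officer are implemented; it assumes the usual .docx layout, where the body text sits in `word/document.xml` inside the zip archive, and the helper name `docx_paragraphs` is made up for illustration. It ignores headers, footers and footnotes.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the main document part.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_paragraphs(path_or_file):
    """Return the text of each non-empty paragraph in a .docx file.

    Hypothetical helper: accepts a file path or a file-like object.
    """
    with zipfile.ZipFile(path_or_file) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph's text is spread across one or more <w:t> elements.
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        if text.strip():
            paragraphs.append(text)
    return paragraphs
```

Inside the Python tool, the returned list could be placed in a pandas DataFrame and written out with `Alteryx.write` in the same way as the macro code above.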
Attached are the workflow, the macros and a test file.
Enjoy!!
Shaan Mistry
@gururajb I tested with your file. It looks like some file properties have not been filled in.
I opened the doc, copied the contents and pasted them into a new Word doc, and then the file reads in OK.
It might be down to how the original file was created.
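If you want to compare a file that fails with one that reads in fine, a rough diagnostic is to list which parts are inside each .docx (it is a zip archive). The sketch below is only an assumption about what to look for: the part names in `EXPECTED_PARTS` are the ones Word normally writes, with document properties usually in `docProps/core.xml`, and `missing_docx_parts` is a made-up helper name. A missing part is a hint, not proof of the cause.

```python
import zipfile

# Parts a Word-generated .docx usually contains (an assumption for this
# sketch; files produced by other tools may legitimately differ).
EXPECTED_PARTS = (
    "[Content_Types].xml",
    "word/document.xml",
    "docProps/core.xml",
    "docProps/app.xml",
)


def missing_docx_parts(path_or_file, expected=EXPECTED_PARTS):
    """Return the expected part names that are absent from the archive."""
    with zipfile.ZipFile(path_or_file) as archive:
        present = set(archive.namelist())
    return [name for name in expected if name not in present]
```

Running it on both the original file and the re-saved copy shows whether the two differ in the parts they contain.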
Thanks for the insights @ShaanM.
I will find out from the client how the files were created.
If I wanted to add the input file path to the Python macro so I can link phrases back to source documents, what might that look like? Something like this?
from ayx import Alteryx
import pandas
import docx2txt
text = docx2txt.process('XXXX')
filepath = 'XXXX'
print(text)
#Turn the document text and its file path into a pandas DataFrame
df = pandas.DataFrame({"text":[text],"filepath":[filepath]})
#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1)
Yes, I think you are on the right path.
The main thing is to define the file path in the data frame; that way it can be part of the data as it passes through the stream.
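To link individual phrases back to their source, one option is to split the extracted text into lines and carry the file path on every row. This is just a sketch of that idea: `rows_with_source` and the column names are made up for illustration, and it assumes the extracted text separates paragraphs with line breaks.

```python
def rows_with_source(text, filepath):
    """Split extracted text into non-empty lines, tagging each with its source.

    Hypothetical helper: returns a list of dicts, one per line of text.
    """
    lines = (line.strip() for line in text.splitlines())
    rows = []
    for number, line in enumerate((ln for ln in lines if ln), start=1):
        rows.append({"filepath": filepath, "paragraph": number, "text": line})
    return rows
```

In the macro, `pandas.DataFrame(rows_with_source(text, filepath))` would give one row per line, which can then go to `Alteryx.write(df, 1)` as before.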
Hi ShaanM, thanks for your info.
I got an error installing docx2txt, so I tried saving the files where you suggested, in C:\Program Files\Alteryx\bin\Miniconda3\PythonTool_venv\Lib\site-packages.
However, I have no PythonTool_venv folder (I asked IT to look too and they could not find it). I DO have a jupytertool_venv folder and it seems to be looking in there, so I tried saving the files in the following location:
c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\
But still no luck. It says environment error. Do you have any more suggestions? I am not familiar with all this back-end stuff. Thanks in advance.
Collecting docx2txt
Installing collected packages: docx2txt
ERROR: Could not install packages due to an EnvironmentError: [WinError 5] Access is denied: 'c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\Lib\\site-packages\\docx2txt'
Consider using the `--user` option or check the permissions.

---------------------------------------------------------------------------
CalledProcessError                        Traceback (most recent call last)
<ipython-input-2-72d8c39b3961> in <module>
      1 from ayx import Alteryx
----> 2 Alteryx.installPackages("docx2txt")
      3

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\export.py in installPackage(package, install_type, debug, **kwargs)
    138     This function will install a package or list of packages into the virtual environment used by the Python tool. If using an admin installation of Alteryx, you must run Alteryx as administrator in order to use this function and install packages.
    139     """
--> 140     __installPackages__(package, install_type=install_type, debug=debug, **kwargs)
    141
    142

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Package.py in installPackages(package, install_type, debug)
    112         print(pip_install_result['msg'])
    113         if not pip_install_result['success']:
--> 114             raise pip_install_result['err']

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\site-packages\ayx\Utils.py in runSubprocess(args_list, debug)
     56
     57     try:
---> 58         result = subprocess.check_output(args_list, stderr=subprocess.STDOUT)
     59         if debug:
     60             print("[Subprocess success!]")

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in check_output(timeout, *popenargs, **kwargs)
    354
    355     return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
--> 356                **kwargs).stdout
    357
    358

c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\lib\subprocess.py in run(input, timeout, check, *popenargs, **kwargs)
    436         if check and retcode:
    437             raise CalledProcessError(retcode, process.args,
--> 438                                      output=stdout, stderr=stderr)
    439     return CompletedProcess(process.args, retcode, stdout, stderr)
    440

CalledProcessError: Command '['c:\\program files\\alteryx\\bin\\miniconda3\\envs\\jupytertool_venv\\python.exe', '-m', 'pip', 'install', 'docx2txt']' returned non-zero exit status 1.
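The "[WinError 5] Access is denied" line suggests the current user cannot write to that site-packages folder, and the docstring in the traceback itself says that with an admin installation you must run Alteryx as administrator to install packages. A quick way to confirm the permission problem is the small check below. It is only a sketch: `can_write_to` is a made-up helper that tries to create a temporary file in the folder and reports whether that worked.

```python
import tempfile


def can_write_to(directory):
    """Return True if a temporary file can be created inside `directory`."""
    try:
        # The file is removed automatically when the block exits.
        with tempfile.TemporaryFile(dir=directory):
            return True
    except OSError:
        # Covers permission errors as well as a missing directory.
        return False


# Example call, using the folder named in the error message:
# can_write_to(r"c:\program files\alteryx\bin\miniconda3\envs\jupytertool_venv\Lib\site-packages")
```

If this returns False when run from the Python tool, starting Alteryx as administrator and repeating the install is the first thing to try.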
Hi @ShaanM ,
I desperately need this to work as the solution I was using has developed problems.
I've followed the steps (I'm not overly familiar with R or Python, so I'm leaning toward the problem being between keyboard and chair) but I get the following error when using R:
Any ideas?
I get different errors when using Python, but we'll address those later if need be. I downloaded the officer package, then used the Alteryx R Package Installer to install. It confirmed it was installed correctly. I then needed to update the RLang package, which I did.
Now I get this error. Any ideas?
I'm literally on-site with a client now so any help will be greatly appreciated!!
M.
Try this:
On the local machine, browse to this location (using Alteryx defaults):
C:\Program Files\Alteryx\R-3.5.3\bin\x64
This is the R location.
Once in that location, find and run: RGui.exe
RGui allows you to install R packages.
From the top menu go to: Packages > Install Packages
Then select the CRAN mirror. I just select London. It will then give you a full list of all available packages.
Then select officer.
Once it has downloaded and unpacked (it should do it all by itself), reopen Alteryx and try again.
Hope this helps. Failing that, I would reach out to our support team: support@alteryx.com
It looks like you may have some environment discrepancies.
To fully diagnose, please log a ticket with our client service team: support@alteryx.com