A lot of people I have spoken to recently have asked about this, and it seems to crop up more and more.
I thought it would be useful to build two macros to help solve this challenge.
Using either R or Python, the two macros take a feed of files and process them.
If using R, the package used is called 'officer', which you will need to install separately.
For Python, the package used is called 'docx2txt', which also needs to be installed separately.
It is a very basic example and there are a whole host of other packages that do something similar.
Here is the R code used:
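library(officer)
# Open the Word document ("XXXX" is the placeholder for the file path)
doc <- read_docx("XXXX")
# Summarise the document content as a data frame
content <- docx_summary(doc)
head(content)
# Write the data frame to Alteryx output 3
write.Alteryx(content, 3)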
Here is the Python code used:
from ayx import Alteryx
Alteryx.installPackages("docx2txt")
import pandas
import docx2txt
# Extract the document text ('XXXX' is the placeholder for the file path)
text = docx2txt.process('XXXX')
print(text)
# Turn the variable holding the document text into a one-row pandas DataFrame
df = pandas.DataFrame({"text":[text]})
# Write the data frame to the Alteryx workflow for downstream processing
Alteryx.write(df,1)
I packaged each method as a macro, using 'XXXX' in the code as a placeholder for the file name.
Attached are the workflow, the macros and a test file.
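If you would rather loop over the files inside a single Python tool instead of swapping the placeholder per file, a minimal sketch could look like the one below. The function name `extract_many` and the `file`/`text`/`error` column names are made up for illustration; the extractor is passed in as a parameter so the loop can be tried without the package installed.

```python
def extract_many(paths, extractor):
    """Run extractor on each path and return one row (dict) per file.

    A failure on one file is recorded in the 'error' field instead of
    stopping the whole run.
    """
    rows = []
    for path in paths:
        try:
            rows.append({"file": path, "text": extractor(path), "error": ""})
        except Exception as exc:
            rows.append({"file": path, "text": "", "error": str(exc)})
    return rows
```

In the Python tool you would pass `docx2txt.process` as the extractor, then turn the result into a data frame with `pandas.DataFrame(rows)` and write it out with `Alteryx.write` as in the code above.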
Enjoy!!
Shaan Mistry
Hi Shaan,
When I try to run docx2txt I get this:
Could not find a version that satisfies the requirement docx2txt (from versions: ) No matching distribution found for docx2txt
Can you give any guidance?
Hi @JTCairns
What if you try running just this in Python:
from ayx import Alteryx
Alteryx.installPackages("docx2txt")
Does it attempt to install?
Hi Shaan,
I get the below. I think it may be a network security issue, but I have no way of changing that. If it is, can a package be installed from a local file? Or is it something else?
Hi @JTCairns
The packages can be downloaded outside of Alteryx and placed into this folder (default location): C:\Program Files\Alteryx\bin\Miniconda3\PythonTool_venv\Lib\site-packages
I will attempt to attach the file here. It needs unzipping; then place the 2 folders into the folder above and it should work.
Hope this helps.
Thanks for this Shaan. I hope this works, but I don't have admin permission, so I will have to wait and see.
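After copying the folders across, one way to confirm that Python can actually see the package is a quick check like the sketch below, run from the Python tool. The helper name `is_importable` is just illustrative; it only uses the standard library.

```python
import importlib.util
import sys


def is_importable(package_name):
    """Return True if Python can locate the named top-level package."""
    return importlib.util.find_spec(package_name) is not None


# Folders Python searches for packages; the site-packages folder
# you copied into should appear somewhere in this list.
for folder in sys.path:
    print(folder)

print("docx2txt importable:", is_importable("docx2txt"))
```

If the last line prints False, the folders have probably landed in a different location from the one the Python tool is using.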
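If installing a package is not possible at all, a rough fallback is to read the file with the standard library only. This is not what docx2txt does internally, just a minimal sketch that relies on a .docx file being a zip archive with the main text stored in word/document.xml. The function name `read_docx_text` is made up, and it only returns paragraph text from the document body.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside word/document.xml
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_text(path):
    """Return the paragraph text of a .docx file, one paragraph per line."""
    with zipfile.ZipFile(path) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(W_NS + "p"):
        # A paragraph is split into runs; join the text of every run
        runs = [node.text or "" for node in para.iter(W_NS + "t")]
        paragraphs.append("".join(runs))
    return "\n".join(paragraphs)
```

The returned string can go into the same one-row data frame as the docx2txt version. Headers, footers and other parts stored outside word/document.xml are not covered by this sketch.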
Hi, I am getting this error while trying to parse a file using the R-based macro.
Has anyone come across this issue?
Please help.
What does the data look like going into the macro?
Check that it is represented as a full path, e.g. c:\datafolder\worddoc.docx
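A quick way to test the incoming values is a small check like the sketch below. It is written in Python purely for illustration (the helper name `describe_path` is made up) and it only inspects the text of the path, not whether the file exists.

```python
from pathlib import PureWindowsPath


def describe_path(path_text):
    """Report whether a path is a full Windows path and what its extension is."""
    path = PureWindowsPath(path_text.strip())
    return {
        "is_full_path": path.is_absolute(),
        "extension": path.suffix.lower(),
    }
```

A value such as `worddoc.docx` on its own comes back with `is_full_path` set to False, which is the case worth ruling out before looking at anything else.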
Hi @gururajb
I tested on my end with a .doc and the R macro still pulls the data in OK.
Could you maybe send me a direct message with the file, or upload it here?