A lot of people I have spoken to recently have asked about this, and it seems to crop up more and more.
I thought it would be useful to build two macros to help solve this challenge.
Using either R or Python, the two macros take a feed of files and process them.
If using R, the package used is called 'officer', which you will need to install separately.
For Python, the package used is called 'docx2txt', also to be installed separately.
It is a very basic example and there are a whole host of other packages that do something similar.
Here is the R code used:
library(officer)
doc <- read_docx("XXXX")
content <- docx_summary(doc)
head(content)
write.Alteryx(content, 3)
Here is the Python code used:
from ayx import Alteryx
Alteryx.installPackages("docx2txt")
from ayx import Alteryx
import pandas
import docx2txt
text = docx2txt.process('XXXX')
print(text)
#Turn the variable holding the extracted document text into a pandas DataFrame
df = pandas.DataFrame({"text":[text]})
#Write the data frame to Alteryx workflow for downstream processing
Alteryx.write(df,1)
I packaged each method as a macro; in the code, 'XXXX' is a placeholder for the file name.
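To make the placeholder idea concrete, here is a minimal sketch of what the macro effectively does for each incoming file, written as a plain Python loop. The function name `extract_texts` and its `extractor` parameter are hypothetical and introduced only for illustration; by default the sketch assumes docx2txt is installed and uses `docx2txt.process`, the same call as in the code above.

```python
def extract_texts(paths, extractor=None):
    """Return one {'file', 'text'} record per path (hypothetical helper)."""
    if extractor is None:
        # Assumption: docx2txt is installed; this is the same call used above.
        import docx2txt
        extractor = docx2txt.process

    records = []
    for path in paths:
        # Each path takes the place of the 'XXXX' placeholder.
        records.append({"file": path, "text": extractor(path)})
    return records
```

Inside the Python tool, the records could then be turned into a frame with `pandas.DataFrame(records)` and written out with `Alteryx.write` as in the code above.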
Attached are the workflow, the macros and a test file.
Enjoy!!
Shaan Mistry
Legend.
I've done all that, but I'm now getting an rlang version error. It seems to be unpacking and using an older version, so I need to override that somehow. Any ideas?
Sorry for bothering you, Shaan, but if I can get this working I can knock off early and go jet-skiing with movie stars.
Maybe not, but still...
Double-check that Alteryx is using the correct version of R.
It sounds like a mismatch somewhere.
For Alteryx 2019.4, the R version should be R-3.5.3.
Check the version of Designer and make sure it is correct. Also check that, if Designer is the non-admin install, the installed R is non-admin too.
downloads.alteryx.com is where you can download older versions, in both admin and non-admin installs.
Failing that, try this:
Browse to this location: C:\Program Files\Alteryx\R-3.5.3\library
See if you have an officer folder in that location.
I've uploaded a zip of mine. Once unzipped, replace yours with it and try again.
Thanks @ShaanM ,
I've actually done all of that, and it moves on to another error each time. I'm now having a problem with zip. It gave me the same error, so I downloaded the latest version, which gave an error saying it can't uninstall the previous version. I unpacked the zip and copied the zip folder into the library, but now it's just returning a zip error:
"Cannot open zip file for reading"
I'm stuck.
I've checked the versions and all is well. I'm on 2019.4 and the correct version of R is being used...
M.
Try taking the following components out of the zip folder I sent and copying them into the officer folder in your location:
R and Libs
Also try running RGUI.exe as admin (right-click, run as admin).
I'm getting a zip error. Copying the folders into my library folder for officer did not change anything.
I'm trying to convert .doc, not .docx so is there anything I need to change in the R macro? I tried changing the references to docx to doc and that caused an error.
zip error: 'Cannot open zip file 'C:\Users\\xxxxxx\AppData\Local\Temp\xxxxxx.doc' for reading in file zip.c:238'
It sounds like the input is what is causing that error.
Can you create a new folder on the machine and place a single Word doc in it,
then use that location as the input?
Test that. If it runs OK, the problem is more likely related to the original input.
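One possible way to check the input itself: a .docx file is a zip archive containing `word/document.xml`, whereas a legacy binary .doc is an OLE compound file and cannot be opened as a zip, whatever its extension says. The sketch below classifies a file by its contents using only the Python standard library. The helper name `sniff_word_format` is hypothetical, and this is only a rough check, not a full validation of the document.

```python
import zipfile

# Signature found at the start of legacy binary Office (OLE compound) files.
OLE_MAGIC = bytes.fromhex("D0CF11E0A1B11AE1")


def sniff_word_format(path):
    """Classify a file as 'docx', 'doc' or 'unknown' from its contents."""
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            # A Word .docx package keeps its main body in this part.
            if "word/document.xml" in archive.namelist():
                return "docx"
        return "unknown"

    with open(path, "rb") as handle:
        if handle.read(8) == OLE_MAGIC:
            return "doc"
    return "unknown"
```

If a file with a .doc extension comes back as 'docx', it is really a zip-based document that was renamed, which may explain why some .doc files load fine and others fail with the zip error.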
@mceleavey I just tested on my end with a .doc and that works.
It might be how that file is formatted or created.
Adding a Word doc for you to test.
Noooooooooo!
I've just realised the problem. I need to load the text held within the Word docs, including values that were selected from a drop-down. The .docx version runs without errors but does not return a value if it was selected in a drop-down. The .doc version simply returns an error if there is a drop-down within the document.
Is there an option to access the actual XML to return the data?
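For anyone with the same question, here is a minimal sketch of reading the XML directly, using only the Python standard library. It applies to .docx only, since a .docx is a zip archive whose body lives in `word/document.xml`. It assumes the drop-downs are either legacy form fields (`w:ddList`, where the selection is stored as an index in `w:result`) or content controls (`w:sdt` with `w:dropDownList` or `w:comboBox`, where the selected text sits in `w:sdtContent`). The helper name `dropdown_selections` is hypothetical, and results are grouped by kind rather than returned in document order.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace, in ElementTree's {uri}tag form.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def dropdown_selections(docx_path):
    """Return the selected value of each drop-down found in a .docx body."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    selections = []

    # Legacy form fields: the chosen entry is an index into the list entries.
    for ddlist in root.iter(W + "ddList"):
        entries = [e.get(W + "val") for e in ddlist.findall(W + "listEntry")]
        chosen = ddlist.find(W + "result")
        if chosen is None:
            chosen = ddlist.find(W + "default")
        # Assumption: with neither element present, the first entry applies.
        index = int(chosen.get(W + "val")) if chosen is not None else 0
        if 0 <= index < len(entries):
            selections.append({"kind": "legacy form field", "value": entries[index]})

    # Content controls: the selected text is stored as ordinary run text.
    for sdt in root.iter(W + "sdt"):
        props = sdt.find(W + "sdtPr")
        if props is None:
            continue
        if props.find(W + "dropDownList") is None and props.find(W + "comboBox") is None:
            continue
        content = sdt.find(W + "sdtContent")
        if content is not None:
            text = "".join(t.text or "" for t in content.iter(W + "t"))
            selections.append({"kind": "content control", "value": text})

    return selections
```

In the legacy form-field case the selected value is never written as run text, which would be consistent with plain text extraction coming back without it.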
I've attached an example of what I'm trying to do. The problem section is specifically this part: