Alteryx Designer Desktop Discussions

SOLVED

Input Data from Word document (.docx)

Magne
7 - Meteor

Hi,

 

I have many monthly financial reports in MS Word that are all built on the same structure (same headings, tables, etc.). I would like to import those reports into Alteryx in order to structure and analyse the data there and present it later in Power BI.

 

A good solution might be a tool that reads the Word document and outputs one row for each paragraph, including some paragraph information such as style (e.g. Heading 1, Heading 2, Normal, Punctuation, etc.). Where there are tables in the document, several fields should be added and separated into Field 1, Field 2, ... in order to extract and analyse the content of each table.

 

Are there any solutions like this in Alteryx today, or any good workarounds? Or is it planned to be added in the near future?

9 REPLIES
BenMoss
ACE Emeritus

A .docx file is just a fancy zip file that Word can read.

 

If you change the extension to .zip and then unzip the contents, you will see a series of structured XML files. When you open a .docx file in Word, it knows how to stitch these together.

 

You could look at each of the XML files and identify the one which contains the actual body text (most of them relate to styling and so on; the body text is normally in word/document.xml), and then import this XML file into Alteryx before doing some parsing.
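
To make the zip-and-XML idea concrete, here is a minimal Python sketch that uses only the standard library. It assumes the body text lives in word/document.xml (the usual location) and returns one row per top-level paragraph with its style ID, plus one row per table row with the cell texts as separate fields. The function name `read_docx_rows`, the row layout, and the fallback style name "Normal" for paragraphs without an explicit style are illustrative assumptions, not part of any Alteryx tool.

```python
import zipfile
import xml.etree.ElementTree as ET

# Main WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}


def _text(element):
    """Concatenate the text of every w:t run found below `element`."""
    return "".join(t.text or "" for t in element.iter(f"{{{W_NS}}}t"))


def read_docx_rows(source):
    """Return a list of dicts in document order (hypothetical row layout).

    `source` is a path or a binary file-like object containing a .docx file.
    Paragraph rows carry one field; table rows carry one field per cell.
    """
    with zipfile.ZipFile(source) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))

    rows = []
    body = root.find("w:body", NS)
    for child in body:
        if child.tag == f"{{{W_NS}}}p":
            style = child.find("w:pPr/w:pStyle", NS)
            # Assumption: paragraphs without an explicit style are "Normal"
            style_id = style.get(f"{{{W_NS}}}val") if style is not None else "Normal"
            rows.append({"kind": "paragraph", "style": style_id,
                         "fields": [_text(child)]})
        elif child.tag == f"{{{W_NS}}}tbl":
            for tr in child.findall("w:tr", NS):
                cells = [_text(tc) for tc in tr.findall("w:tc", NS)]
                rows.append({"kind": "table_row", "style": None,
                             "fields": cells})
    return rows
```

Note that the style value is the internal style ID (for example "Heading1"), which can differ from the display name shown in Word ("Heading 1"), and that a cell containing several paragraphs is flattened into a single string here.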

 

I have seen no sign that this will be coming to the product in the near future, but you could add it as an idea on the forum.

 

Ben

The_Data_Loop
8 - Asteroid

You would need to do server-side conversion of the documents to a repository. Once you have usable files in a repository, the sky's the limit.

 

This is possible with Spire.Doc and I believe a number of other providers. 

 

Once the documents are converted to a usable format, you would use a variety of techniques (e.g. RegEx parsing) to dynamically parse out the information needed.
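
As a rough illustration of the RegEx parsing step, the Python sketch below pulls label and amount pairs out of converted plain text. The "Label: amount" line layout, the function name `parse_amounts`, and the sample labels are all hypothetical; the pattern would need adapting to how the real reports are worded.

```python
import re

# Hypothetical line layout: "<label>: <amount>", e.g. "Revenue: 1,250.50"
LINE_RE = re.compile(
    r"^\s*(?P<label>[^:]+?)\s*:\s*(?P<amount>-?\d[\d,]*(?:\.\d+)?)\s*$"
)


def parse_amounts(text):
    """Return {label: amount} for every line matching the assumed layout."""
    results = {}
    for line in text.splitlines():
        match = LINE_RE.match(line)
        if match:
            # Drop thousands separators before converting to a number
            amount = float(match.group("amount").replace(",", ""))
            results[match.group("label")] = amount
    return results
```

Lines that do not match the assumed layout (headings, free text) are simply skipped.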

 

This could be completely automated as well so that document conversion and subsequent new file creation are done at the time of original file placement.

 

 

-AM

 
 
chinta
7 - Meteor

The new R package 'officer' can provide the details of a docx file with ease. Check out the package on CRAN. I was able to implement exactly what you are trying to do.

jmedidi
8 - Asteroid

Could you please let me know how you used the R package to read the Word files?

RogerS
Alteryx

I created the attached macro with Python. Hope it works for your needs.

 

Thx

jmedidi
8 - Asteroid

That works. Thanks.

jmedidi
8 - Asteroid

The macro works with a single file. How do I handle multiple files? Please advise.
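
One way to approach multiple files, sketched in plain Python rather than taken from the attached macro (whose internals are not shown in this thread): loop over every .docx file in a folder and tag each paragraph with its source file name. The function names `paragraph_texts` and `read_folder` and the row layout are assumptions for illustration, and the body text is assumed to be in word/document.xml.

```python
import zipfile
import xml.etree.ElementTree as ET
from pathlib import Path

# Main WordprocessingML namespace used inside word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraph_texts(docx_path):
    """Return the text of every paragraph in one .docx file."""
    with zipfile.ZipFile(docx_path) as archive:
        root = ET.fromstring(archive.read("word/document.xml"))
    return [
        "".join(t.text or "" for t in p.iter(f"{{{W_NS}}}t"))
        for p in root.iter(f"{{{W_NS}}}p")
    ]


def read_folder(folder):
    """Return (file_name, paragraph_number, text) rows for all .docx files."""
    rows = []
    for path in sorted(Path(folder).glob("*.docx")):
        if path.name.startswith("~$"):  # skip Word's temporary lock files
            continue
        for number, text in enumerate(paragraph_texts(path), start=1):
            rows.append((path.name, number, text))
    return rows
```

Keeping the file name on every row makes it possible to tell the monthly reports apart after the rows are stacked. Inside Designer, one common pattern for the same goal is a Directory tool feeding a batch macro that runs once per file.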

jmedidi
8 - Asteroid

Appreciate the response

srea541
8 - Asteroid

Excellent workflow @RogerS, thank you!!!!
