
Alteryx Designer Desktop Discussions


Extracting tables from word file using Alteryx

SahadhKuruniyan
8 - Asteroid

Hello,

I have my client's financial statement in a Word file (450 pages). I want to extract the tables from the statement into an Excel file, with one table per tab. Is there any way to do this using Alteryx?

Thanks!

cgoodman3
14 - Magnetar

Office documents are just zipped XML, so it should be possible to build a workflow that parses the tables out. I haven't done this with Word, but I have done it with Excel before. If you make a copy of your Word document, change the file extension to .zip and extract it, you will find the underlying XML. The document body, including its tables, is usually in word/document.xml.
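As a rough illustration of that idea outside Alteryx, here is a minimal Python sketch that reads the XML straight out of the .docx archive using only the standard library (no renaming needed, since zipfile can open the .docx directly). It assumes the body is stored in word/document.xml and relies on the WordprocessingML element names w:tbl (table), w:tr (row), w:tc (cell), w:p (paragraph) and w:t (text run). The function name docx_tables is hypothetical. It only looks at tables sitting directly in the document body, so tables nested inside other tables or wrapped in content controls are skipped.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the elements in word/document.xml
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W_NS}
W_T = f"{{{W_NS}}}t"  # fully qualified tag name of a text run, <w:t>


def docx_tables(source):
    """Return the top-level tables of a .docx file.

    Each table is a list of rows, and each row is a list of cell strings.
    `source` can be a path or a binary file object: a .docx file is a ZIP
    archive, so zipfile opens it directly.
    """
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))

    body = root.find("w:body", NS)
    tables = []
    # Only direct children of <w:body>; tables nested inside cells or
    # content controls are not visited.
    for tbl in body.findall("w:tbl", NS):
        rows = []
        for tr in tbl.findall("w:tr", NS):
            cells = []
            for tc in tr.findall("w:tc", NS):
                # A cell contains paragraphs (<w:p>), and a paragraph's text
                # can be split over several <w:t> runs, so join them up.
                paras = [
                    "".join(t.text or "" for t in p.iter(W_T))
                    for p in tc.findall("w:p", NS)
                ]
                cells.append("\n".join(paras))
            rows.append(cells)
        tables.append(rows)
    return tables
```

For example, docx_tables("statement.docx") (a placeholder file name) returns one list of rows per table, which could then be written out one table per sheet.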

 

There is also a great submitted use case that uses Alteryx to update a PowerPoint presentation, which might be helpful reading: https://community.alteryx.com/t5/Alteryx-Use-Cases/Adidas-Automates-PowerPoint-Presentations-to-Save...

Chris
AkimasaKajitani
17 - Castor

Hi @SahadhKuruniyan,

I made a sample workflow using the Python tool with the python-docx package.

[Screenshot: Input]

[Screenshot: Result]

Column 0: file path

Column 1: table number (counted from 1 within each file)

Column 2 onward: cell text, one column per cell

Python code

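from ayx import Alteryx, Package

# First run only: installs python-docx. Comment this line out after the
# first run; otherwise the installation is repeated every time.
Package.installPackages(['python-docx'])

from docx import Document  # provided by the python-docx package
import pandas as pd

# Input anchor #1: one record per Word file, full path in a 'FilePath' column
df1 = Alteryx.read("#1")
lines = []

for index, record in df1.iterrows():
    filepath = record['FilePath']
    document = Document(filepath)
    # Tables are numbered from 1 within each document
    for tbl_num, tbl in enumerate(document.tables, start=1):
        for row in tbl.rows:
            # One output record per table row: path, table number, cell texts
            values = [filepath, tbl_num]
            for cell in row.cells:
                values.append(cell.text)
            lines.append(values)

# Rows can have different cell counts; pandas pads the shorter ones
df = pd.DataFrame(lines)

# Send the combined result to output anchor 1
Alteryx.write(df, 1)

You only need to run the line "Package.installPackages(['python-docx'])" the first time. After the first run, comment it out; if you don't, the package installation runs again on every execution and the tool takes longer to run.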

[Screenshot: 1st run]

[Screenshot: 2nd and later runs]
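To get closer to the original request of one table per Excel tab, the records built by the Python tool (file path, table number, then cell texts) could be grouped by file and table number and written to separate worksheets. The sketch below is one way to do that; the function names split_tables and write_tables_to_excel and the sheet naming scheme "Table 1", "Table 2", ... are hypothetical, not part of the workflow above. Writing .xlsx files also needs pandas plus an Excel writer engine such as openpyxl.

```python
def split_tables(lines):
    """Group records of the form [filepath, table_no, cell, cell, ...].

    Returns a dict that maps (filepath, table_no) to that table's rows, each
    row being a list of cell texts. Dicts keep insertion order, so tables
    come out in the order they were read.
    """
    tables = {}
    for filepath, table_no, *cells in lines:
        tables.setdefault((filepath, table_no), []).append(cells)
    return tables


def write_tables_to_excel(tables, out_path):
    """Write each grouped table to its own worksheet of an .xlsx file.

    Needs pandas plus an Excel writer engine such as openpyxl.
    """
    import pandas as pd  # imported here so split_tables works without pandas

    with pd.ExcelWriter(out_path) as writer:
        for sheet_no, rows in enumerate(tables.values(), start=1):
            # pandas pads rows that have fewer cells; names like "Table 12"
            # stay well under Excel's 31-character sheet name limit.
            pd.DataFrame(rows).to_excel(
                writer, sheet_name=f"Table {sheet_no}", index=False, header=False
            )
```

Inside the Python tool this could be called after the loop as write_tables_to_excel(split_tables(lines), r"C:\temp\tables.xlsx"), where the output path is only a placeholder.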

 

 

mutama
Alteryx

Hi @SahadhKuruniyan, there’s also the Word Docx Parser macro downloadable here: https://gallery.alteryx.com/#!app/Word-DocX-Parser/5cafb0b08a93370e40222b9c

Stella
6 - Meteoroid

Hi, this Word parser is no longer available. Is there a replacement? How did you solve this?

Thank you.
