Hello,
I have my client's financial statement in a Word file (450 pages). I want to extract the tables from the statement into an Excel file (one table per tab). Is there any way to do this using Alteryx?
Thanks!
Office documents are just zipped XML, so it should be possible to build a workflow that parses the tables out. I haven’t done this with Word, but I have done it with Excel. If you make a copy of your Word document, change the file extension to .zip and extract it, you will see the underlying XML.
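To make the XML route concrete, here is a minimal sketch in plain Python (standard library only) that reads tables straight out of word/document.xml inside the .docx zip. It is an illustration, not an Alteryx feature: the function name read_docx_tables is hypothetical, and it assumes the standard WordprocessingML namespace, ignores headers and footers, does not expand merged cells, and joins the paragraphs of a cell without separators.

```python
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the <w:...> tags in word/document.xml
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def read_docx_tables(path):
    """Hypothetical helper: return a list of tables, each a list of rows,
    each row a list of cell strings."""
    with zipfile.ZipFile(path) as zf:
        # The main body of a .docx lives in this XML part inside the zip
        root = ET.fromstring(zf.read("word/document.xml"))
    tables = []
    # iter() also finds tables nested inside cells; each is returned separately
    for tbl in root.iter(W + "tbl"):
        rows = []
        for tr in tbl.findall(W + "tr"):
            # Join every text run in the cell (nested-table text is included too)
            rows.append(["".join(t.text or "" for t in tc.iter(W + "t"))
                         for tc in tr.findall(W + "tc")])
        tables.append(rows)
    return tables
```

For example, read_docx_tables("statement.docx")[0] would give the rows of the first table ("statement.docx" is a placeholder name).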
There is also this great submitted use case, which uses Alteryx to update a PowerPoint presentation and might be helpful reading: https://community.alteryx.com/t5/Alteryx-Use-Cases/Adidas-Automates-PowerPoint-Presentations-to-Save...
Hi @SahadhKuruniyan,
I made a sample workflow using the Python tool and the python-docx package.
Input: a list of Word (.docx) file paths in a field named FilePath.
Result:
Column 0: file path
Column 1: table number
Column 2 onward: cell data
Python code
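```python
from ayx import Alteryx
from docx import Document
import pandas as pd

# Read the incoming list of .docx paths from input anchor #1 (field "FilePath")
df1 = Alteryx.read("#1")

lines = []
for index, files in df1.iterrows():
    filepath = files['FilePath']
    document = Document(filepath)
    tbl_num = 0
    # Walk every table in the document, numbering them from 1
    for tbl in document.tables:
        tbl_num += 1
        for row in tbl.rows:
            # Each output row: file path, table number, then one value per cell
            values = [filepath, tbl_num]
            for cell in row.cells:
                values.append(cell.text)
            lines.append(values)

# Rows with fewer cells are padded with missing values by pandas
df = pd.DataFrame(lines)
# Send the result to output anchor 1
Alteryx.write(df, 1)
```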
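The workflow above returns every table stacked in one output, while the question asks for one table per tab. As a sketch, assuming pandas with the openpyxl engine is available in the Python tool's environment, the lines list built above could be split per table and written to separate worksheets. The function names and the Table_1, Table_2, ... sheet naming are hypothetical choices, not part of the original workflow.

```python
def group_rows_by_table(lines):
    """Group rows shaped [filepath, table_number, cell, ...] into
    {sheet_name: [[cell, ...], ...]}, keeping document order."""
    groups = {}  # dicts keep insertion order, so tables stay in document order
    for row in lines:
        key = (row[0], row[1])  # (file path, table number within that file)
        groups.setdefault(key, []).append(list(row[2:]))
    # Hypothetical naming scheme: Table_1, Table_2, ... across all input files
    return {f"Table_{i}": rows for i, rows in enumerate(groups.values(), start=1)}


def write_tables_to_excel(lines, out_path):
    """Write each table to its own worksheet (needs pandas plus openpyxl)."""
    import pandas as pd  # imported here so the grouping step works without pandas

    sheets = group_rows_by_table(lines)
    if not sheets:
        raise ValueError("no tables found")  # avoid creating an empty workbook
    with pd.ExcelWriter(out_path) as writer:
        for sheet_name, rows in sheets.items():
            # header=False/index=False keep the sheet shaped like the Word table
            pd.DataFrame(rows).to_excel(writer, sheet_name=sheet_name,
                                        header=False, index=False)
```

For example, write_tables_to_excel(lines, "tables.xlsx") at the end of the script ("tables.xlsx" is a placeholder path). Tables are keyed by file and table number, so tables from different input files do not get merged onto one tab.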
Run the python-docx installation only on the first run. After the first run, comment out the line "Package.installPackages(['python-docx'])"; if you leave it in, the tool takes a little longer on every run.
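If you would rather not comment that line in and out by hand, one option is to install only when the package is missing. This is a sketch, assuming the Alteryx Python tool's "from ayx import Package" helper mentioned in the line above; install_if_missing is a hypothetical name, and it relies on python-docx being imported under the module name docx.

```python
import importlib.util


def install_if_missing(module_name, pip_name):
    """Install pip_name through Alteryx's Package helper only when
    module_name cannot be found; returns True if an install was attempted."""
    if importlib.util.find_spec(module_name) is not None:
        return False  # already installed: skip the slow install step
    from ayx import Package  # only importable inside the Alteryx Python tool
    Package.installPackages([pip_name])
    return True
```

In the first cell you would then call install_if_missing("docx", "python-docx") before "from docx import Document", and leave it in place on later runs.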
Hi @SahadhKuruniyan, there’s also the Word Docx Parser macro downloadable here: https://gallery.alteryx.com/#!app/Word-DocX-Parser/5cafb0b08a93370e40222b9c
Hi, this Word parser is not available anymore. Is there a replacement? How did you solve this?
Thank you.
Here is the new one: Read Word Table Macro (Alteryx Community).