Hi Everyone,
I have encountered a challenge and would appreciate a solution if anyone knows one.
I have a Word document that I need to bring into Alteryx exactly as it is, and I want to use the Interface tools to create an analytical tool that can change specific areas of the document. So my question is: can you input the document in Excel without changing its format or structure? If yes, which tools would you recommend, and how should I use them?
@Gratus_tz This is more of an Office 365 question than an Alteryx question. I know that Alteryx cannot handle Word documents.
I would look for a way in Python to convert the Word document to Excel and pass the result into the Alteryx workflow.
Bacon
I would also look into this macro.
https://community.alteryx.com/t5/Community-Gallery/Word-doc-input/ta-p/1055948
Hello @Gratus_tz
Great question, there are multiple ways you could tackle this. Below I have outlined two methods:
The first would be to read the Word doc in as a .zip file and then parse the XML.
The second method involves Python.
Alternatively, you could create a .bat file that converts all your Word documents to PDFs and then use some of the computer vision tools to read these in.
For context, the Word document I am trying to read in looks like this.
Method one:
Whilst I agree with @abacon that Alteryx does not appear to support Word documents natively, there is a way around this. You have to configure the Alteryx Input Data tool to read in your Word document as a zip file.
From there you will need to open up option 3 "File in Archive" and select the following
Finally, please make sure you have selected "Return outer xml". This should allow you to read in the Word document's XML, as seen below. You will then need to use some further logic to re-tabulate your results.
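To make the idea behind Method one concrete: a .docx file is a zip archive, and the main body text lives in the archive member `word/document.xml`. The following is only a minimal sketch of that structure using the Python standard library, not the Alteryx configuration itself. The helper names `paragraphs_from_docx` and `build_sample_docx` are hypothetical, and the sample archive is a stripped-down stand-in rather than a file Word would open.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML elements such as <w:p> and <w:t>.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def paragraphs_from_docx(docx_file):
    """Return the text of every paragraph found in word/document.xml."""
    with zipfile.ZipFile(docx_file) as archive:
        xml_bytes = archive.read("word/document.xml")
    root = ET.fromstring(xml_bytes)
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph is made of runs; each run holds its text in <w:t>.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def build_sample_docx():
    """Build a tiny in-memory archive that mimics the .docx layout (demo only)."""
    xml = (
        f'<w:document xmlns:w="{W_NS}"><w:body>'
        "<w:p><w:r><w:t>Dear </w:t></w:r><w:r><w:t>[Client Name]</w:t></w:r></w:p>"
        "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


print(paragraphs_from_docx(build_sample_docx()))
# ['Dear [Client Name]', 'Second paragraph']
```

The same function should also accept a path to a real .docx file. Text inside tables is stored in paragraphs nested within table cells, so it would appear in the same flat list, which is why some re-tabulation logic is needed afterwards.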
Method 2:
There are some Python libraries that are designed specifically to extract tables from Word documents. I have used the following code to help me do this (the black box simply covers the file path):
Once you have done this, you can do further transformation in Alteryx before outputting your results to an Excel file.
One thing to note is that you will need to install the python-docx package (all from within the Python tool). In order to do this you may need to open Alteryx as an administrator - hence my screenshots switched from dark mode to light mode!
I have attached the workflow below. If you have any questions, please just let me know.
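For readers without the attached workflow, a minimal sketch of table extraction with python-docx might look like the following. It is an illustration under stated assumptions, not the exact code from the screenshot: the helper names `table_to_rows` and `read_docx_tables` are hypothetical, and the stand-in table at the bottom only imitates the `rows`, `cells` and `text` attributes that python-docx tables expose.

```python
from types import SimpleNamespace


def table_to_rows(table):
    """Return one list of cell strings per row of a python-docx style table."""
    return [[cell.text.strip() for cell in row.cells] for row in table.rows]


def read_docx_tables(path):
    """Load a Word document and return every table as a list of rows."""
    # Imported here so table_to_rows can be tried without python-docx installed.
    from docx import Document

    return [table_to_rows(table) for table in Document(path).tables]


# Stand-in table for illustration only (assumption: mirrors python-docx attributes).
def _row(*values):
    return SimpleNamespace(cells=[SimpleNamespace(text=v) for v in values])


demo_table = SimpleNamespace(rows=[_row("Name", "Value"), _row(" A ", "1")])
print(table_to_rows(demo_table))
# [['Name', 'Value'], ['A', '1']]
```

Inside the Alteryx Python tool, each list of rows could then be turned into a pandas DataFrame and passed downstream with `Alteryx.write(df, 1)`. The package itself can usually be installed from within the tool with something like `Package.installPackages(['python-docx'])` from the `ayx` module, which is where the administrator rights mentioned above may be needed.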
Regards - Pilsner
Thanks
Please do so; I believe it can be done.
Hi Pilsner
I appreciate your recommended solution, and I think it will be a great help as I look into it.
I do have a question, though, regarding the document that I have.
The document is the Letter of Engagement that we usually send to clients. The main reason I am asking for an approach is that I want to create a workflow with the letter format built in, and use the Interface tools only for the specific dynamic changes needed in the letter, such as client names, contacts, addresses, etc. Users would enter these dynamic inputs and get formatted letters as output with their inputs filled in.
Regards - Gratus Tz
Older post but I'll add: To do this, you would either need to:
Thanks for sharing your ideas and thoughts. I did find a solution to the query: I used the Python tool to write code that gets the letter from a specific path and makes the dynamic changes. The values come from a Text Input tool, which I connected to Interface and Action tools in order to supply the dynamic inputs.
This is the code that I used:
from ayx import Alteryx
import pandas as pd
from docx import Document
from datetime import datetime
import os

# Step 1: Read input
df = Alteryx.read("#1")

# Step 2: Handle empty input
if df.empty:
    output_df = pd.DataFrame({"Path": ["[Placeholder] File path will appear after actual input."]})
    Alteryx.write(output_df, 1)
else:
    # Step 3: Safe value extraction function
    def safe_value(val):
        return "" if pd.isna(val) else str(val)

    client_name = safe_value(df["Client Name"].iloc[0])
    letter_date = safe_value(df["Letter Date"].iloc[0])
    ref_no = safe_value(df["Reference No"].iloc[0])
    attention_name = safe_value(df["Attention name"].iloc[0])
    amount = safe_value(df["Amount"].iloc[0])
    currency = safe_value(df["Currency"].iloc[0])
    po_box = safe_value(df["P.O Box Number"].iloc[0])
    region = safe_value(df["Region"].iloc[0])
    country = safe_value(df["Country"].iloc[0])
    email = safe_value(df["Email"].iloc[0])
    end_date = safe_value(df["End Date"].iloc[0])
    period_end = safe_value(df["Period End"].iloc[0])
    standards = safe_value(df["Standards"].iloc[0])
    professional_fees_i = safe_value(df["Professional fees"].iloc[0])
    professional_fees_ii = safe_value(df["Professional fees ii)"].iloc[0])
    professional_fees_iii = safe_value(df["Professional fees iii)"].iloc[0])
    title_name = safe_value(df["Title"].iloc[0])

    # Step 4: Format dates using pd.to_datetime to handle extra time part
    if letter_date:
        formatted_date = pd.to_datetime(letter_date).strftime("%d %B %Y")
    else:
        formatted_date = ""
    if end_date:
        formatted_end_date = pd.to_datetime(end_date).strftime("%d %B %Y")
    else:
        formatted_end_date = ""

    # Step 5: Define paths
    output_filename = f"LOE_{client_name.replace(' ', '_')}.docx"
    output_path = rf"\\shareddata\CommonDocuments\Alteryx documents\LOE\{output_filename}"
    template_path = r"\\shareddata\CommonDocuments\Alteryx documents\LOE\Letter of Engagement for Company.docx"

    # Step 6: Test access to template
    try:
        with open(template_path, 'rb') as f:
            print(f"✅ Template file opened successfully at: {template_path}")
    except Exception as e:
        raise FileNotFoundError(f"❌ Could not access template. Reason: {e}")

    # Step 7: Load document and replace placeholders using runs
    doc = Document(template_path)
    placeholder_map = {
        "[Client Name]": client_name,
        "[Letter Date]": formatted_date,
        "[Reference No]": ref_no,
        "[Attention name]": attention_name,
        "[Amount]": amount,
        "[Currency]": currency,
        "[P.O Box No]": po_box,
        "[Region]": region,
        "[Country]": country,
        "[Email]": email,
        "[End Date]": formatted_end_date,
        "[Period End]": period_end,
        "[Standards]": standards,
        "[Professional fees]": professional_fees_i,
        "[Professional fees ii)]": professional_fees_ii,
        "[Professional fees iii)]": professional_fees_iii,
        "[Title]": title_name,
    }
    for para in doc.paragraphs:
        for run in para.runs:
            for placeholder, value in placeholder_map.items():
                if placeholder in run.text:
                    run.text = run.text.replace(placeholder, value)
    doc.save(output_path)

    # Step 8: Output plain UNC file path
    output_df = pd.DataFrame({"Path": [output_path]})
    Alteryx.write(output_df, 1)
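One caveat worth noting for anyone reusing the run-level replacement above: Word sometimes splits a visible string such as `[Client Name]` across several runs, in which case `placeholder in run.text` never matches and the placeholder stays in the letter. Below is a minimal sketch of a paragraph-level fallback. The helper name `replace_in_paragraph` is hypothetical, and the stand-in paragraph only imitates the `runs` and `text` attributes of python-docx objects so the idea can be tried without a template.

```python
from types import SimpleNamespace


def replace_in_paragraph(paragraph, placeholder_map):
    """Replace placeholders even when they are split across several runs.

    Returns True if the paragraph text was changed.
    """
    runs = paragraph.runs
    if not runs:
        return False
    full_text = "".join(run.text for run in runs)
    new_text = full_text
    for placeholder, value in placeholder_map.items():
        new_text = new_text.replace(placeholder, value)
    if new_text == full_text:
        return False
    # Put the whole text in the first run (keeping its formatting) and
    # empty the remaining runs so nothing is duplicated.
    runs[0].text = new_text
    for run in runs[1:]:
        run.text = ""
    return True


# Stand-in paragraph whose placeholder is split over two runs (illustration only).
para = SimpleNamespace(
    runs=[SimpleNamespace(text="Dear [Client"), SimpleNamespace(text=" Name],")]
)
replace_in_paragraph(para, {"[Client Name]": "Acme Ltd"})
print([run.text for run in para.runs])
# ['Dear Acme Ltd,', '']
```

The trade-off is that a changed paragraph takes on the formatting of its first run, so mixed bold or italic text inside that paragraph would be flattened. Placeholders inside tables are also not reached by `doc.paragraphs`; with python-docx they could be covered by looping over `doc.tables`, then `table.rows`, `row.cells` and `cell.paragraphs`, and calling the same helper on each paragraph.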
@Gratus_tz if you save it as a PDF, you can bring it in via the Intelligence Suite (if you have access to it). Would that be possible?
@aatalai we haven't paid for an Intelligence Suite license on our side, but hopefully it can be done.