I have a workflow that runs in 8 minutes in Designer but takes 4 hours in Gallery. Any idea why, or how to fix it?
Python Code below:
from ayx import Alteryx
import pandas as pd
import os
import subprocess
from datetime import datetime, timedelta
# Read the input data from Alteryx
df = Alteryx.read("#1")
# Path to the 7-Zip executable
seven_zip_path = r'C:\Program Files\7-Zip\7z.exe'
# Path to save the batch script
batch_script_path = 'CSD-MFT BATCH FILE.bat'
# Open the batch script file for writing
with open(batch_script_path, 'w') as bat_file:
    # Write the initial batch script setup
    bat_file.write('@echo off\n')
    bat_file.write('setlocal EnableDelayedExpansion\n\n')
    for index, row in df.iterrows():
        # Extract details from each row
        source_file_path = row['FullPath']
        file_name = row['Output File Name']
        password = row['Password']
        mft_folder = row['MFT Folder ']
        extension = row['Extension']
        # Handle different extensions
        if extension == '.csv':
            # Handle CSV files (no compression)
            target_file_path = os.path.join(mft_folder, file_name)
            bat_file.write(f'echo Copying {source_file_path} to {target_file_path}...\n')
            bat_file.write(f'copy "{source_file_path}" "{target_file_path}"\n')
        elif extension == '.zip':
            # Handle ZIP files
            target_zip_file_path = os.path.join(mft_folder, file_name.replace('.csv', '.zip'))
            if pd.notna(password) and password.strip():
                bat_file.write(f'echo Zipping {source_file_path} to {target_zip_file_path} with password...\n')
                bat_file.write(f'"{seven_zip_path}" a -tzip -p{password} "{target_zip_file_path}" "{source_file_path}"\n')
            else:
                bat_file.write(f'echo Zipping {source_file_path} to {target_zip_file_path} without password...\n')
                bat_file.write(f'"{seven_zip_path}" a -tzip "{target_zip_file_path}" "{source_file_path}"\n')
        elif extension == '.gzip':
            # Handle GZIP files
            target_gzip_file_path = os.path.join(mft_folder, file_name.replace('.csv', '.gz'))
            bat_file.write(f'echo Compressing {source_file_path} to {target_gzip_file_path} using 7-Zip...\n')
            bat_file.write(f'"{seven_zip_path}" a -tgzip "{target_gzip_file_path}" "{source_file_path}"\n')
        else:
            print(f"Unsupported file extension: {extension}")
    # Finalize the batch script
    bat_file.write('\necho Processing complete.\n')
    bat_file.write('pause\n')
print(f"Batch script created at {batch_script_path}")
# Execute the batch script
try:
    subprocess.run([batch_script_path], check=True, shell=True)
    print("Batch script executed successfully.")
except subprocess.CalledProcessError as e:
    print(f"An error occurred while executing the batch script: {e}")
# Write output to Alteryx
Alteryx.write(df, 1)
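One thing that may be worth ruling out, offered as a guess rather than a diagnosis: the generated script ends with `pause`, which waits for a key press, and a Gallery run has nobody at a console. The Python sketch below shows one way to run a command with its input detached and a time limit, so that a wait for input would fail fast instead of stalling. The name `run_unattended` and the 600-second default are hypothetical and not part of the original workflow.

```python
import subprocess
import time


def run_unattended(cmd, timeout_s=600):
    """Run cmd without console input; return (return_code, seconds_elapsed).

    Raises subprocess.TimeoutExpired if the command runs longer than timeout_s.
    """
    start = time.perf_counter()
    result = subprocess.run(
        cmd,
        stdin=subprocess.DEVNULL,  # the child gets an empty stdin, not a console
        capture_output=True,       # keep stdout/stderr so they can be logged
        text=True,
        timeout=timeout_s,         # hypothetical limit; tune to the workload
    )
    return result.returncode, time.perf_counter() - start
```

On Windows this could be called as `run_unattended(['cmd', '/c', batch_script_path])`. Comparing the elapsed time with and without the `pause` line in the script would show whether it matters on the server.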
@Peter_Guirguis
Is your Gallery server set up and configured in the same way as your computer: speed, memory, download speed, upload speed, same access, same Python packages installed, etc.?
The Gallery machine has even higher specs than my local machine. Both machines' configurations are identical.
Can you confirm that it does run, i.e. that it doesn't error out? I ask because the tool causing an error is sometimes misdiagnosed. Assuming the data stream going from Alteryx to Python (to create #1) and the other file sizes are not drastically different, I don't really see a reason why this would take considerably longer on a worker node, provided the worker node can reach the dependencies (i.e. 7-Zip) and doesn't have specific network buffers.
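A quick way to check the "can the worker node reach the dependencies" part is to run a small report in the Python tool on both machines and compare the output. This is only a sketch: `describe_runtime` is a hypothetical helper, and the 7-Zip path is the one from the script in the question.

```python
import getpass
import os


def describe_runtime(paths):
    """Report the account, working directory, and whether each path exists."""
    try:
        user = getpass.getuser()
    except Exception:
        user = "unknown"  # some service accounts expose no user name
    return {
        "user": user,
        "cwd": os.getcwd(),  # relative paths such as the .bat file resolve here
        "paths": {p: os.path.exists(p) for p in paths},
    }


if __name__ == "__main__":
    # Path taken from the script in the question.
    print(describe_runtime([r"C:\Program Files\7-Zip\7z.exe"]))
```

The working directory matters because the batch script path in the question is relative, and the account matters because the target folders may be reachable for one user and not another.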
I'd test how this runs without the Python code. My hunch is that the Python code may not be the pain point.
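If removing the Python tool entirely is awkward, timing each step inside it gives similar information. A minimal sketch, assuming a hypothetical `timed` helper that records elapsed seconds per label:

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Store the wall-clock seconds spent inside the with-block in results[label]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        results[label] = time.perf_counter() - start


timings = {}
with timed("example step", timings):
    sum(range(100_000))  # stand-in for reading input, writing the script, or running it
print(timings)
```

Wrapping the read, the script generation, and the `subprocess.run` call in separate `timed` blocks, then comparing the numbers from Designer and Gallery, would show which step accounts for the difference.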
@Peter_Guirguis Were you able to figure out what was the issue with Gallery runs? Thank you.
Yes, I have tried this, and I can confirm that Python is the pain point.
No. I ended up splitting the code: the part that copies and renames the files is exported as a .bat file, which is then run with a Run Command tool.
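For anyone trying the same split, the Python side might look roughly like the sketch below: it only builds the batch lines and leaves execution to the Run Command tool. The column names and 7-Zip path come from the script in the question; `build_batch_lines` is a hypothetical helper, and leaving out the `pause` line is an assumption for unattended runs, not something confirmed in this thread.

```python
import os

SEVEN_ZIP = r"C:\Program Files\7-Zip\7z.exe"  # path from the original script


def build_batch_lines(rows, seven_zip=SEVEN_ZIP):
    """Return the batch-file lines for an iterable of dict-like rows."""
    lines = ["@echo off"]
    for row in rows:
        src = row["FullPath"]
        name = row["Output File Name"]
        folder = row["MFT Folder "]  # trailing space kept, as in the original column name
        ext = row["Extension"]
        password = row.get("Password")
        if ext == ".csv":
            # Plain copy, no compression
            lines.append(f'copy "{src}" "{os.path.join(folder, name)}"')
        elif ext == ".zip":
            target = os.path.join(folder, name.replace(".csv", ".zip"))
            # Only pass -p when the password is a non-blank string (skips NaN/None)
            pw = f" -p{password}" if isinstance(password, str) and password.strip() else ""
            lines.append(f'"{seven_zip}" a -tzip{pw} "{target}" "{src}"')
        elif ext == ".gzip":
            target = os.path.join(folder, name.replace(".csv", ".gz"))
            lines.append(f'"{seven_zip}" a -tgzip "{target}" "{src}"')
        else:
            lines.append(f"echo Unsupported file extension: {ext}")
    return lines
```

In the Python tool, `build_batch_lines(df.to_dict("records"))` would produce the lines, which can then be joined with newlines and written to the .bat file that the Run Command tool executes.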