Alteryx Designer Desktop Discussions

Anasalter · ‎10-08-2025

Hi Community,

Data-->emp1-->expense_report1-->bill1
                                bill2
                                bill3
              expense_report2-->bill34
                                23
       emp2-->expensereport_344
              expensereport_454
       emp3-->expensereport_345

Above is the structure in which bills and invoices are stored in a folder for each employee.
I have to extract text from the images and PDFs, and then compare all bills for a particular employee with each other to find duplicates.

The problem I am facing is that when I use the Image Input and Image to Text tools, I get a memory error and the text is not extracted (there are around 2800 bills).

What approach should I use to build this workflow?

Karen763Purvis · ‎10-09-2025

Hello!

To process 2800 bills efficiently, use a batch OCR workflow with tools like Tesseract or Google Vision, avoiding memory overload by streaming files and parallelizing tasks. Preprocess images for better accuracy, store extracted text with metadata, and compare bills per employee using fuzzy matching or hashing to detect duplicates. Stick to VPP-installed apps for managed environments if using Home Assistant.
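
For the duplicate check mentioned above, here is a minimal Python sketch, assuming the text has already been extracted and is available as (employee, bill name, text) rows. The function names, the 0.9 similarity threshold, and the use of the standard-library difflib for fuzzy matching are illustrative assumptions, not something taken from the thread or from an Alteryx tool.

```python
import hashlib
import re
from collections import defaultdict
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial OCR differences are ignored."""
    return re.sub(r"\s+", " ", text).strip().lower()


def find_duplicates(bills, threshold=0.9):
    """Compare bills only within the same employee.

    bills: iterable of (employee, bill_name, extracted_text) tuples.
    threshold: assumed similarity cut-off for a "near" duplicate; tune on real data.
    Returns a list of (employee, bill_a, bill_b, kind) tuples.
    """
    by_employee = defaultdict(list)
    for employee, name, text in bills:
        clean = normalize(text)
        digest = hashlib.sha256(clean.encode("utf-8")).hexdigest()
        by_employee[employee].append((name, clean, digest))

    results = []
    for employee, items in by_employee.items():
        for (name_a, text_a, hash_a), (name_b, text_b, hash_b) in combinations(items, 2):
            if hash_a == hash_b:
                # Identical text after normalization.
                results.append((employee, name_a, name_b, "exact"))
            elif SequenceMatcher(None, text_a, text_b).ratio() >= threshold:
                # Similar but not identical text, e.g. a rescan of the same bill.
                results.append((employee, name_a, name_b, "near"))
    return results
```

The pairwise comparison is quadratic per employee, which should be acceptable when each employee has a modest number of bills; for very large folders, the hash check alone is much cheaper.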

OllieClarke · ‎10-09-2025

Hi @Anasalter

Are you currently loading all 2800 files through the tool in one go? If so, can you try batching them, so you're only working on one employee at a time?
If you take your current workflow, and use a control parameter to affect your directory input (which I'm assuming is there), that would let you make a batch macro which should limit the amount of memory being used by the tool in one go.

There's more info on batch macros here: https://knowledge.alteryx.com/index/s/article/Getting-Started-with-Batch-Macros-1583461640393
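
For anyone who wants to see the same per-employee batching idea outside of a batch macro, here is a rough Python sketch. It assumes the folder layout from the original post (one sub-folder per employee under a root folder); `extract_text` is a hypothetical callable standing in for whatever OCR step is used, and the extension list is an assumption.

```python
from pathlib import Path

# Assumed set of bill file types; adjust to what is actually in the folders.
EXTENSIONS = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def batches_by_employee(root):
    """Yield (employee, [file paths]) for one employee folder at a time."""
    root = Path(root)
    for employee_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        files = sorted(
            f for f in employee_dir.rglob("*")
            if f.is_file() and f.suffix.lower() in EXTENSIONS
        )
        if files:
            yield employee_dir.name, files


def process_all(root, extract_text):
    """Run extract_text on one employee's files at a time.

    Because this is a generator, only the current employee's results are
    held in memory until the caller asks for the next batch.
    """
    for employee, files in batches_by_employee(root):
        rows = [(employee, f.name, extract_text(f)) for f in files]
        yield employee, rows
```

Each yielded batch can be written out or compared for duplicates before the next employee is read, which is the same effect the control parameter has on the directory input in the macro.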

Hope that helps,

Ollie

Anasalter · ‎10-09-2025

Hi @OllieClarke

Yes, earlier I was trying to load all of the files in one go, but now I have used this approach and I am able to extract text from the images.

OllieClarke · ‎10-09-2025

Happy to hear it :)
