I am trying out the new Text Mining PDF Input tool.
The PDF Input Tool works great if you can use a static template. Unfortunately, a large number of my source PDFs do not conform to a static image template.
On my virtual system, it can take up to 90 seconds to convert a 4-page PDF to text. The AMP Engine is selected. I have a few hundred notifications to convert. It's much quicker to convert the PDFs to TXT in Preview or Acrobat and then import them into Alteryx.
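To put that in perspective, here is a rough back-of-the-envelope estimate of the total run time. The count of 300 documents is only an assumption standing in for "a few hundred", and 90 seconds is the worst case I observed, so treat the result as an upper bound rather than a measurement.

```python
def estimate_total_minutes(seconds_per_doc: float, doc_count: int) -> float:
    """Total sequential conversion time in minutes."""
    return seconds_per_doc * doc_count / 60


# Assumed values: 90 s per document (worst case observed), 300 documents (hypothetical).
print(estimate_total_minutes(90, 300))  # 450.0 minutes, i.e. about 7.5 hours
```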
I have read some of the forum articles on system performance. My VM runs on an iMac Pro with 10 cores (Xeon W), 64GB RAM, a 2TB SSD, and a 16GB Vega64Pro GPU. The VM is given 5 cores, 32GB of RAM, and X GB of GPU memory (set to auto).
My question is: To what extent, if at all, does GPU performance influence how long a workflow takes to complete, particularly for the new PDF Input and Image to Text tools?
The Image to Text tool uses a pre-trained image processing algorithm (Tesseract). Since we are only scoring data, not training a model, only the CPUs are utilized, so GPU performance should not affect its run time.
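Because the work is CPU-bound, the number of cores available to the VM matters more than the GPU. As an illustration only, here is a minimal sketch of running the standalone Tesseract command-line tool over several page images at once, outside of Alteryx. It is not how the Alteryx tool works internally. It assumes the `tesseract` executable is installed and on the PATH, and that the PDF pages have already been exported as image files; the function names `build_command` and `ocr_pages` are hypothetical.

```python
import os
import subprocess
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def build_command(image_path, output_dir):
    """Build the CLI call `tesseract <image> <output base>`, which writes <output base>.txt."""
    output_base = str(Path(output_dir) / Path(image_path).stem)
    return ["tesseract", image_path, output_base]


def ocr_pages(image_paths, output_dir, workers=None, run=subprocess.run):
    """Run one Tesseract process per page image, several at a time.

    Threads are sufficient here because each one only waits on its own
    external tesseract process, which does the CPU-heavy work.
    """
    workers = workers or os.cpu_count() or 1
    commands = [build_command(path, output_dir) for path in image_paths]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # check=True raises CalledProcessError if any tesseract call fails.
        list(pool.map(lambda cmd: run(cmd, check=True), commands))
    return commands
```

Calling `ocr_pages(["page1.png", "page2.png"], "out", workers=5)` would process up to five pages concurrently, provided the `out` directory exists. Whether this actually shortens the total time depends on how many threads each Tesseract process already uses, so it is worth timing on a small sample first.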