Adobe Acrobat Pro creates a type of form, saved with the suffix "distributed", that has labeled input fields. Multiple people can download the form and enter data in the fields, and when the completed files are returned, they can be added to a Response file. Response files can be exported to .csv.
I have a "distributed" form to which respondents can add one or more images. The images, of course, do not get exported to the .csv file.
The Alteryx computer vision tools seem unable to recognize this PDF format, with its fixed, labeled fields, while Acrobat Pro can. The .csv file that Acrobat exports from the response file has just the field names and, underneath, the filled-in data. All the extraneous user interface instruction text is ignored.
I've got a dozen complex forms similar to "Perpendicular-7345020F.pdf" that need to be converted to fillable PDF forms. "Ramp Form-Perpendicular2_distributed.pdf" is a partial attempt to do this. "Ramp Form-Perpendicular2.example.pdf" is what this partial attempt looks like when it comes back. "Ramp Form-Perpendicular2_responses.pdf" has two collected forms that can be exported as .csv by Acrobat, but without the images in each.
The PDF to text tool does not work, in any of its configurations, once you get beyond the simple text at the top. It falls apart at Running Slope 1 and beyond. I've not tried the Image to text tool yet, because 1) setting up all the tiny template boxes will be a TREMENDOUS time sink and 2) my experience with the template tool is that it gets confused when there are too many fields to check. It may get the first two or three forms right, and then **bleep** out when there's some minute difference in the fourth or fifth supposedly identical form.
What would be actually time saving, as Alteryx claims for its tools, would be for AIS to actually be able to recognize a fillable PDF form that has the fields already labeled and ready to go. This is possible for Acrobat Pro, as the attached .csv file shows. But my hope is that Alteryx could do the same thing, but with the addition of an image blob.
Thanks for any help anyone can give.
David
Have you looked at using the Image Template tool to specify the parts you want to pull out from these examples? You might need a couple of different Image Template configurations, but it would avoid the issue of reading everything in every time!
My experience using the Image Template is that it doesn't work reliably on anything complex, and each of these forms has 70+ very small boxes or check marks to be read... Image Template workflows often fail to read the data in one or several locations. On other fields, it reads the data, but adds a new line and garbage text after the new line, creating a great deal of cleanup work. Since Acrobat obviously can read each field precisely and put it in the correct column, I was hoping AIS could decipher the "filename_distributed.pdf" structure to do the same thing. The issue is that the Acrobat export of the data skips over the images.
I see - then I am not sure it is possible with AIS. Maybe with R or Python...
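If you go the Python route, here is a rough sketch of what that might look like. It assumes the returned forms are ordinary AcroForm PDFs (I have not checked the attached files) and that the third-party pypdf package is installed. The function names are made up for illustration. It reads the labeled field values directly instead of using OCR, and builds one CSV row per returned form. It does not pull out the images; those are probably stored inside the image fields themselves and would need separate handling.

```python
import csv
import io


def read_form_fields(pdf_path):
    """Return {field_name: raw_value} for the form fields in one PDF."""
    from pypdf import PdfReader  # third-party package, assumed installed

    reader = PdfReader(pdf_path)
    fields = reader.get_fields() or {}  # get_fields() returns None if there is no form
    # "/V" is the PDF key that holds a field's current value.
    return {name: field.get("/V") for name, field in fields.items()}


def normalize(values):
    """Convert raw field values to plain strings; empty fields become ''."""
    return {name: "" if v is None else str(v) for name, v in values.items()}


def rows_to_csv(rows):
    """Build CSV text: one line per form, header is the union of field names."""
    header = []
    for row in rows:
        for name in row:
            if name not in header:
                header.append(name)
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=header, restval="", lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()
```

Something like rows_to_csv([normalize(read_form_fields(p)) for p in paths]), where paths is your list of returned PDF files, should give roughly the same layout as the Acrobat export: field names on top, filled-in data underneath. Check box values will likely come through as PDF names such as /Yes or /Off and may need a little cleanup.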