Hi, I'm relatively new to Alteryx, and I was working on a workflow for bulk reading of payslips from PDFs.
I am using the Computer Vision PDF to Text tools with an Image Template, as all the documents to be read are identical.
Once I have defined the areas to capture in the template, the problem I am encountering is that, although the PDF payslips all share the same layout, Alteryx reads the data from some of them as a three-column table and from others as a four-column table.
Although I have built a workflow that works for the full set of last year's data, if I select only some of the input PDFs, the workflow fails with a missing-fields error, since not all the workers have the same concepts on their payslips.
When we read this year's data I may have new concepts that I did not have last year, so I'll get the same error.
I have been reading in the Community about the Multi-Field Formula tool, but I am not sure whether I can make my data transformation dynamic with it.
Many thanks,
Jordi
@Jaloy_1973
One way to solve this is to create a headers file that contains all the potential headers (new ones get added to it as they appear), and then connect the headers and the data with a Union tool. That way you will always have all the headers. It is a simple and workable approach; obviously there are other ways to get it done.
The Multi-Field Formula tool does have a "Dynamic or Unknown Fields" option.
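To make the headers-file idea concrete, here is a rough sketch of the logic in plain Python rather than in Alteryx tools. The header names, the sample rows, and the `align_to_headers` helper are all made up for illustration; in the workflow itself, the same effect would come from unioning the headers file with the parsed data.

```python
# Hypothetical master list of every payroll concept seen so far (the "headers file").
MASTER_HEADERS = ["Employee", "Base Salary", "Overtime", "Bonus"]


def align_to_headers(rows, headers):
    """Give every row all master headers, filling missing ones with None.

    Fields that are not yet in the master list are kept and appended
    (sorted by name), so new concepts are not silently dropped.
    """
    extra = sorted({key for row in rows for key in row} - set(headers))
    all_headers = list(headers) + extra
    return [{h: row.get(h) for h in all_headers} for row in rows]


# Hypothetical subset of payslips: nobody in it has a "Bonus" concept.
sample = [
    {"Employee": "A", "Base Salary": 2000, "Overtime": 150},
    {"Employee": "B", "Base Salary": 1800},
]

aligned = align_to_headers(sample, MASTER_HEADERS)
for row in aligned:
    print(row)
```

Because every row ends up with the same fields regardless of which payslips were selected, downstream steps that expect a given field should no longer fail when a subset of PDFs happens not to contain it.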
@Jaloy_1973
Please attach the workflow you created, along with some sample data.
We will then be in a better position to help.
Regards
Raj
Hi,
Unfortunately the dataset is a number of PDFs with personal payroll information of workers, with first and last names, company names and logos, information that I cannot share.
Thanks,
Jordi