Hi everyone,
I need to convert 100+ PDFs to text, specifically only the tabulated data in the PDFs. All the PDFs are in one folder, and the number of tables and the number of columns likely differ between many of them. So far I have tried sending the PDF through as an image, and got the following result for the attached PDF (I am also attaching the flow):


I expected the tabulated data to come out as it appears in the PDF table, but the order / position of some lines is not correct, e.g. "AIRFLOW" is supposed to be together with another line of text. Some of the data is truncated, some is split where there seems to be a space or a new line, and some parts of the image seem to be read incorrectly.

I am hoping to transform the PDF data to text without needing to do a lot of parsing, since there are many files to convert. Can someone help me with this? Is there a specific kind of delimiter I need to parse on to get all the data in the cells? Or will I instead need to connect to the file as a PDF and then parse the output?
Thank you for helping!
Rouche