Alteryx Designer Desktop Discussions

buddhiDB · 07-13-2025

Hi Alteryx Community,

I’m working on a workflow to extract a specific table from a batch of PDFs using the "PDF to Text" tool with the "Line" method.

In the attached Excel file:

The "Lines method" tab shows the raw extracted data from the PDFs.
The "Table snips" tab includes screenshots of the tables from the PDFs, provided for reference.
The "Expected Result" tab shows the desired output format.

I'm struggling to format the extracted data into a structured table. The main issue is that the relevant values are not always aligning correctly under the appropriate headers, likely due to inconsistencies in spacing or formatting in the original PDF files.

Could anyone guide me on how to transform the data in the "Lines method" tab into the desired format shown in the "Expected Result" tab? Any suggestions or example workflows would be greatly appreciated.

Thank you in advance for your support!

Best regards,
Buddhi

KGT · 07-14-2025

I'm not sure why you would use the Lines method output instead of the Table method. The Table method already has the values separated into columns, so you just need to re-align them. As the table header row is already tagged, you can work out which column is which from the header and join that back onto the data rows. I haven't validated the data, and I expect that with a lot more data you may need to spend a little longer than 5 minutes to build and test. I also wouldn't be surprised if there are one or two things you need to write a rule to overcome.
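To make the "figure out which column is which and join it back on" idea concrete, here is a rough Python sketch of the logic. It is not the actual Table method output: the `is_header` flag, the `cells` layout, and the sample values are all made up for illustration. In Designer the same steps would typically be done with tools such as Transpose and Join rather than code.

```python
# Hypothetical stand-in for table-method output: one record per table row,
# with the header row tagged. The field names and values are assumptions.
table_rows = [
    {"is_header": True,  "cells": ["", "Item", "Qty", "", "Amount"]},
    {"is_header": False, "cells": ["", "Widget A", "3", "", "12.50"]},
    {"is_header": False, "cells": ["", "Gadget", "10", "", "7.00"]},
]


def realign(rows):
    """Map each data cell to the header found at the same column position."""
    header = next(r for r in rows if r["is_header"])
    # Column position -> header name, ignoring empty header cells.
    col_names = {
        i: name.strip()
        for i, name in enumerate(header["cells"])
        if name.strip()
    }
    records = []
    for row in rows:
        if row["is_header"]:
            continue
        cells = row["cells"]
        # "Join" each data cell back to its header by position;
        # a short row gets an empty string for the missing column.
        records.append(
            {name: (cells[i] if i < len(cells) else "") for i, name in col_names.items()}
        )
    return records


records = realign(table_rows)
for record in records:
    print(record)
```

Rows where a value lands one column to the left or right of its header would not be fixed by this positional match; those are the cases where an extra rule would be needed.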

Bonnie219Bailey · ‎07-14-2025

Hello!

Since I can't directly access external files or view attachments like the Excel file you mentioned, I can't provide a precise, ready-to-use Alteryx workflow. However, I can offer strategies and Alteryx tools that are commonly used to tackle the challenge of extracting structured data from inconsistently formatted PDF text output, especially when using the "Line" method.

Your problem is a classic text parsing challenge where you need to normalize varying spacing and align data to headers.
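As a minimal sketch of that idea, assuming the columns in each extracted line are separated by runs of two or more spaces (the sample lines and header names below are invented, not taken from the attachment), the normalization could look like this in Python:

```python
import re

# Hypothetical sample of line-method output; real lines will differ.
HEADERS = ["Item", "Qty", "Amount"]
raw_lines = [
    "Item      Qty   Amount",
    "Widget A    3     12.50",
    "Gadget  10   7.00",
]


def split_columns(line):
    """Split a line on runs of two or more whitespace characters."""
    return re.split(r"\s{2,}", line.strip())


def lines_to_rows(lines, headers):
    """Return one dict per data line, keyed by header name."""
    rows = []
    for line in lines:
        cells = split_columns(line)
        if cells == headers:
            continue  # skip the header line itself
        if len(cells) != len(headers):
            # Keep lines that do not fit so they can be reviewed by hand.
            rows.append({"_unparsed": line})
            continue
        rows.append(dict(zip(headers, cells)))
    return rows


rows = lines_to_rows(raw_lines, HEADERS)
for row in rows:
    print(row)
```

Splitting on two or more spaces keeps values such as "Widget A" intact, but it will misalign a row whenever a cell is blank or two columns are separated by a single space, so the unparsed lines are worth checking.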

Help Needed: Formatting PDF to Text Extraction (Line Method) into Table Format
