Alteryx Designer Desktop Discussions

terry10 · ‎09-09-2021

Auto-detecting a table of data in a PDF with the new Computer Vision tools in Intelligence Suite is great new functionality, but getting it to work was not intuitive for me. It took a lot of trial and error to extract a table from a PDF. Here's how I made it work.

Image Input - browse to "Titanic Data.PDF" (attached)
Connect the output from the Image Input tool to both:
1. the D input anchor of the Image to Text tool
2. the optional input anchor of an Image Template tool
Image Template tool - set configuration Image to 'image'

The Image Template output contains a Markup field - connect that output to the T anchor of the Image to Text tool

Image to Text tool - set configuration Image to 'image'

The output contains one pipe-delimited field called table0 (there may be additional table fields depending on the structure of the input PDF).

Use a Text to Columns tool with | delimiter
Use a Sample tool to skip the first row of 0|1|2|3...
Use Dynamic Rename to 'Take Field Names from First Row of Data'
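
For anyone who wants to see what those three parsing steps do to the data, here is a minimal Python sketch of the same logic. It is not part of the attached workflow. The sample rows and the `parse_table` helper are made up for illustration, and it assumes each table row arrives as its own record.

```python
# Hypothetical sample of pipe-delimited rows; the values are invented for illustration.
rows = [
    "0|1|2",           # index row emitted ahead of the real table
    "Name|Age|Class",  # header row
    "Alice|29|1",
    "Bob|35|3",
]

def parse_table(rows, delimiter="|"):
    """Mimic Text to Columns -> Sample (skip first row) -> Dynamic Rename."""
    # Text to Columns: split every row on the delimiter.
    split_rows = [row.split(delimiter) for row in rows]
    # Sample: skip the first row (the 0|1|2... index row).
    remaining = split_rows[1:]
    # Dynamic Rename: take field names from the first remaining row.
    header, *records = remaining
    return [dict(zip(header, record)) for record in records]

table = parse_table(rows)
for record in table:
    print(record)
```

Every value stays a string here, so any numeric conversion would be a separate step.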

My workflow (version 2021.3) and the PDF of titanic data are attached. I hope you find this helpful!

terry10

pdf extract table wf.jpg

Laurap1228 · ‎09-10-2021

Thanks for sharing! It's good to know I'm not the only one who struggled to work this out.

juuustin · ‎04-21-2023

The 'Parse Table Columns' container of this workflow is not operating properly for me. When the Text to Columns tool runs, it only grabs the first line of the table0 column, which is 0|1|2|3... etc., and does not grab any of the actual table data. I'm not sure how to get the underlying data into row format so I can continue processing.

PARESH · ‎04-22-2024

Hello, I tried the steps above but was unable to extract the data in tabular format. How do I extract the table into Alteryx for further blending?

Appreciate your support.

James89 · ‎05-29-2024

Hi Juuustin,

Were you able to get the underlying data? I couldn't either.

Lumjing · ‎12-27-2024

@terry10 Thank you, this saved me a lot of time. For the 'Parse Table Columns' container returning only one column, we can update the configuration below in the Text to Columns tool to get the desired output.

MervynClarke

To parse the columns, add a "Text to Columns" tool that splits table0 to rows, with the delimiter set to "\n" so that each line becomes a new row. Place it before the current "Text to Columns" tool.

Thanks for posting this solution! Reigniting an old project with this one.
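
To illustrate the row-split fix described above, here is a small Python sketch of the idea. It is an assumption-laden example, not Alteryx code: the `table0` value and the `split_to_rows` helper are hypothetical, and it assumes the whole table arrives in a single cell with rows separated by newlines.

```python
# Hypothetical table0 value: the entire table in one cell, one line per row.
table0 = "0|1|2\nName|Age|Class\nAlice|29|1\nBob|35|3"

def split_to_rows(cell):
    """Split a multi-line cell into one string per row, dropping blank lines."""
    return [line for line in cell.splitlines() if line.strip()]

# First split: newline delimiter, split to rows.
rows = split_to_rows(table0)
# Second split: pipe delimiter, split to columns.
columns = [row.split("|") for row in rows]

print(len(rows))
print(columns[1])
```

Splitting to rows first matters because the later column split works one record at a time.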

Extract Data from PDF by Auto-detecting Table (Intelligence Suite Computer Vision)