

Extract Data from PDF by Auto-detecting Table (Intelligence Suite Computer Vision)

terry10
12 - Quasar

Auto-detecting a table of data in a PDF with the new Computer Vision tools from Intelligence Suite is great new functionality, but figuring out how to make it work was not intuitive for me. It took a lot of trial and error to extract a table of data from a PDF. Here's how I made it work.

 

  • Image Input tool - browse to "Titanic Data.PDF" (attached)
  • Connect the output from the Image Input tool to both
    1. the D input anchor of the Image to Text tool
    2. the optional input anchor of an Image Template tool
  • Image Template tool - set configuration Image to 'image'

The output contains a Markup field. Connect this output to the T anchor of the Image to Text tool.

markup.jpg

  • Image to Text tool - set configuration Image to 'image'

The output contains one pipe-delimited field called table0 (there may be additional table fields, depending on the structure of the input PDF).

titanic3.jpg

 

  • Use a Text to Columns tool with | delimiter
  • Use a Sample tool to skip the first row of 0|1|2|3...
  • Use Dynamic Rename to 'Take Field Names from First Row of Data'
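
For anyone who prefers to do the parsing in a Python tool, here is a minimal sketch of the same three steps (split on |, skip the 0|1|2|3... index row, take field names from the next row). It assumes the rows inside table0 are separated by newlines, which I have not confirmed for every PDF. The function name parse_table_field and the sample values are made up for illustration and are not taken from the attached workflow.

```python
def parse_table_field(raw: str, delimiter: str = "|") -> list[dict[str, str]]:
    """Turn one pipe-delimited table field into a list of records."""
    # Assumption: each table row sits on its own line inside the field.
    lines = [line for line in raw.splitlines() if line.strip()]
    # Equivalent of Text to Columns with the | delimiter.
    rows = [[cell.strip() for cell in line.split(delimiter)] for line in lines]
    # Equivalent of the Sample tool: skip the first row of 0|1|2|3...
    rows = rows[1:]
    if not rows:
        return []
    # Equivalent of Dynamic Rename: take field names from the first row of data.
    header, data = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in data]


# Made-up sample standing in for the contents of table0.
sample = "0|1|2\nName|Age|Class\nAlice|29|1\nBob|35|3"
records = parse_table_field(sample)
print(records)
```

Splitting into rows before splitting into columns matters here: if the whole table arrives in a single cell, splitting only on | works on the field as one long string instead of row by row.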

 

My workflow (version 2021.3) and the PDF of Titanic data are attached. I hope you find this helpful!

terry10

 

pdf extract table wf.jpg

 

 

4 REPLIES
Laurap1228
11 - Bolide

 Thanks for sharing! It's good to know I'm not the only one who struggled to work this out.

juuustin
5 - Atom

The 'Parse Table Columns' container of this workflow is operating properly. When the Text to Columns tool is run, it only grabs the first line of the Table0 column, which is 0|1|2|3... and so on, and does not grab any of the actual table data. I'm not sure how to get the actual underlying data into row format so that I can continue processing.

 

juuustin_0-1682093699529.png

 

PARESH
7 - Meteor

Hello, I tried the steps above but was unable to extract the data in tabular format. How do I extract the table into Alteryx for further blending?

Appreciate your support. 

James89
7 - Meteor

Hi Juuustin, 

 

Were you able to get the underlying data? I couldn't either.
