

Extracting Text from Semi Structured PDF tables

Vineet003
7 - Meteor

Hello All,

 

I am working with semi-structured PDFs that are mostly laid out as tables (though not the clean kind of table one would ideally hope for), and I am struggling to extract the following data:

 

1) Order No.:

2) Medical Device

3) Reviewer (Supervisor)

4) Essential- / Standard Requirement:

5) Explanation of Deficiency:

6) Manufacturers response:

 

Please find the PDF attached.

 

I would like to do two things:

1) Extract the above-mentioned information

2) Split the "Explanation of Deficiency" questions into sub-questions (e.g. 1.1, 1.2, 1.3, ...), and likewise the Manufacturer's response, if possible

 

Example of one output row:

1) Order No.:1234567890

2) Medical Device: DeLorean DMC-12 Time Machine

3) Reviewer (Supervisor): Luke Skywalker

4) Essential- / Standard Requirement: MDD Annex X

5) Explanation of Deficiency:1) Flux Capacitor Stability: The Flux Capacitor, which is at the heart of the time-travel mechanism, has shown a tendency to malfunction at high speeds, particularly 88mph. This poses a potential safety concern. What measures have been taken to ensure stability and safety during temporal transition?

6) Manufacturers response: 1) Flux Capacitor Stability: We appreciate your concerns regarding the Flux Capacitor's functionality. Our team, led by Dr. Emmett Brown, has implemented a new stabilizing algorithm to minimize flux oscillations at high speeds. Additionally, we've reinforced the housing unit to ensure that any unforeseen malfunctions do not compromise passenger safety. Testing has demonstrated a 98% stability rate during temporal transitions.

 

Thanks for the help!

Bluebird_Tim
7 - Meteor

Hi @Vineet003 - have you taken a look at the Computer Vision tools? This seems like a good use case for them. Here is a good post on them:

https://community.alteryx.com/t5/Data-Science/Unlocking-Insights-from-Images-using-Computer-Vision/b...

Felipe_Ribeir0
16 - Nebula

Hi @Vineet003 

 

If you don't have access to the Computer Vision tools, you can also use Python with the tabula library to solve this (for free). Take a look at the PDF/workflow that I created for this other topic. With some changes to it, you will be able to extract the info that you need.

 

Solved: Re: Can Alteryx Designer with Intelligence Suite P... - Alteryx Community
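
For anyone who wants a starting point in the Python tool, here is a rough sketch of the tabula route. It is not the workflow from the linked topic, just an illustration under some assumptions: the tabula-py package and Java are installed, the field labels appear in the PDF text exactly as written in the question, and sub-questions are numbered like "1.1" or "1)". The function names (read_pdf_text, extract_fields, split_subquestions) and the lattice=True setting are my own guesses and will probably need adjusting for the real file.

```python
import re

# Field labels as listed in the question. Each field is assumed to run
# until the next label appears in the text.
LABELS = [
    "Order No.:",
    "Medical Device",
    "Reviewer (Supervisor)",
    "Essential- / Standard Requirement:",
    "Explanation of Deficiency:",
    "Manufacturers response:",
]

# Item numbers such as "1.1", "1.2.3" or "1)" that start after whitespace.
# Caveat: a decimal like "3.5" in running text would also match.
ITEM = re.compile(r"(?<!\S)(\d+(?:\.\d+)+|\d+\))\s+")


def read_pdf_text(pdf_path):
    """Read all tables with tabula-py and flatten the cells into one string."""
    import tabula  # lazy import so the parsing helpers work without it

    tables = tabula.read_pdf(
        pdf_path,
        pages="all",
        multiple_tables=True,
        lattice=True,  # assumption: the table has ruling lines
        pandas_options={"header": None},  # keep the first row as data
    )
    cells = []
    for df in tables:
        for value in df.fillna("").astype(str).to_numpy().ravel():
            if value.strip():
                cells.append(value.strip())
    return "\n".join(cells)


def extract_fields(text):
    """Slice the text between consecutive labels into a dict."""
    positions = []
    cursor = 0
    for label in LABELS:
        start = text.find(label, cursor)
        if start == -1:
            continue  # label missing in this document, skip it
        positions.append((start, label))
        cursor = start + len(label)

    fields = {}
    for i, (start, label) in enumerate(positions):
        end = positions[i + 1][0] if i + 1 < len(positions) else len(text)
        value = text[start + len(label):end]
        fields[label.rstrip(":")] = value.strip(" :\n\t")
    return fields


def split_subquestions(text):
    """Split a field into (item number, item text) pairs."""
    matches = list(ITEM.finditer(text))
    items = []
    for i, m in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        items.append((m.group(1), text[m.end():end].strip()))
    return items
```

Typical use would be fields = extract_fields(read_pdf_text("your_file.pdf")) followed by split_subquestions(fields["Explanation of Deficiency"]), where the file name is a placeholder. If a label also shows up inside the free text of an earlier field, the simple slicing will cut in the wrong place, so check the output against a few documents.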
