Extracting Text from Semi Structured PDF tables

Vineet003
7 - Meteor

Hello All,

 

I am working with semi-structured PDFs that are mostly laid out as tables (though not the clean tables one would hope for). I am struggling to extract the following data:

1) Order No.:
2) Medical Device
3) Reviewer (Supervisor)
4) Essential- / Standard Requirement:
5) Explanation of Deficiency:
6) Manufacturers response:

Please find the PDF attached.

 

I would like to do two things:

1) Extract the information listed above.
2) Split the "Explanation of Deficiency" questions into sub-questions (e.g. 1.1, 1.2, 1.3, ...), and do the same for "Manufacturers response" if possible.

Example of one output row:

1) Order No.: 1234567890

2) Medical Device: DeLorean DMC-12 Time Machine

3) Reviewer (Supervisor): Luke Skywalker

4) Essential- / Standard Requirement: MDD Annex X

5) Explanation of Deficiency: 1) Flux Capacitor Stability: The Flux Capacitor, which is at the heart of the time-travel mechanism, has shown a tendency to malfunction at high speeds, particularly 88mph. This poses a potential safety concern. What measures have been taken to ensure stability and safety during temporal transition?

6) Manufacturers response: 1) Flux Capacitor Stability: We appreciate your concerns regarding the Flux Capacitor's functionality. Our team, led by Dr. Emmett Brown, has implemented a new stabilizing algorithm to minimize flux oscillations at high speeds. Additionally, we've reinforced the housing unit to ensure that any unforeseen malfunctions do not compromise passenger safety. Testing has demonstrated a 98% stability rate during temporal transitions.

Thanks for the help!

2 REPLIES
Bluebird_Tim
7 - Meteor

Hi @Vineet003 - Have you taken a look at the computer vision tools? This looks like a good use case for them. Here is a good post on them:

https://community.alteryx.com/t5/Data-Science/Unlocking-Insights-from-Images-using-Computer-Vision/b...

Felipe_Ribeir0
16 - Nebula

Hi @Vineet003

If you don't have access to the computer vision tools, you can also solve this for free with Python and the tabula library. Take a look at the PDF and workflow that I created for this other topic. With some changes, you should be able to extract the info you need.

Solved: Re: Can Alteryx Designer with Intelligence Suite P... - Alteryx Community
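
For illustration only, here is a rough sketch of what the Python side could look like. It is not the linked workflow. It assumes the tabula-py package (which needs a Java runtime) can read the table cells, that the labels appear in the PDF exactly as listed in the question, and that sub-questions start with markers such as "1)" or "1.1". The function names are made up for this example.

```python
import math
import re

# Labels taken from the question (assumption: the PDF uses the same wording).
FIELD_LABELS = [
    "Order No.",
    "Medical Device",
    "Reviewer (Supervisor)",
    "Essential- / Standard Requirement",
    "Explanation of Deficiency",
    "Manufacturers response",
]


def read_pdf_text(pdf_path):
    """Join every table cell tabula finds into one string (hypothetical helper)."""
    import tabula  # third-party: pip install tabula-py

    # lattice=True assumes the table has ruling lines; stream=True may suit other layouts.
    tables = tabula.read_pdf(
        pdf_path, pages="all", lattice=True, pandas_options={"header": None}
    )
    cells = []
    for table in tables:
        for value in table.to_numpy().ravel():
            if isinstance(value, float) and math.isnan(value):
                continue  # empty cell
            cells.append(str(value))
    return " ".join(cells)


def extract_fields(text):
    """Map each label to the text between it and the next label."""
    text = " ".join(text.split())  # collapse line breaks and repeated spaces
    pattern = re.compile(
        "(" + "|".join(re.escape(label) for label in FIELD_LABELS) + r")\s*:?\s*"
    )
    matches = list(pattern.finditer(text))
    fields = {}
    for i, match in enumerate(matches):
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        # Keep the first occurrence if a label shows up more than once.
        fields.setdefault(match.group(1), text[match.end():end].strip())
    return fields


def split_numbered_items(block):
    """Split '1) ... 2) ...' or '1.1 ... 1.2 ...' into separate items."""
    # Split before a marker that starts the text or follows whitespace.
    marker = r"(?<!\S)(?=(?:\d+\)|\d+(?:\.\d+)+[.)]?)\s)"
    return [part.strip() for part in re.split(marker, block) if part.strip()]
```

Usage would be along the lines of `fields = extract_fields(read_pdf_text("your_file.pdf"))` followed by `split_numbered_items(fields["Explanation of Deficiency"])`. Two limitations to be aware of: a label that also appears inside a free-text answer will cut that answer short, and a decimal number followed by a space (for example "3.5 mm") would be mistaken for a sub-question marker, so the patterns will likely need tuning against the real PDF.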
