user0112
Alteryx

Dental offices across the US are generally run as private practices (small businesses). While this provides advantages in terms of customer service, it limits management's ability to hire expensive software engineers to automate time-consuming administrative work. As always, this is where Alteryx comes to the rescue, putting a smile on our customers' faces by automating the complex using minimal code 😁!

 


 

Example: Among the various administrative tasks, the most common one is to evaluate insurance coverage for a patient prior to their appointment. The manual work is as follows:

  1. Prior to the appointment, the admin requests the patient's benefits and coverage information from the insurance company’s portal.
  2. The insurance company responds with the patient’s coverage as a PDF attached to an email.
  3. The admin pulls the PDF and adds it to the patient's file.
  4. Staff scroll through the entire PDF each time they need to find a data point.
  5. A decision is made.


 

Using the Alteryx Intelligence Suite (AIS), the workflow can be modified as follows:

  1. Admin retrieves coverage PDF for the entire week's appointments.
  2. All PDF files are parsed using Computer Vision tools and entered into a database system.
  3. Specific data points can be retrieved as needed.
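For readers who think in code, steps 2 and 3 above can be pictured roughly like this. It is only a minimal sketch, assuming the parsed data points land in a simple key/value table; the table name `coverage`, the helper names, and the sample values are all hypothetical and are not part of the Alteryx workflow.

```python
import sqlite3


def create_store():
    """Open an in-memory SQLite database with one table of coverage data points."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE coverage ("
        " patient_id TEXT NOT NULL,"
        " benefit TEXT NOT NULL,"
        " value TEXT NOT NULL,"
        " PRIMARY KEY (patient_id, benefit))"
    )
    return conn


def store_coverage(conn, patient_id, data_points):
    """Insert (or overwrite) every parsed data point for one patient."""
    conn.executemany(
        "INSERT OR REPLACE INTO coverage (patient_id, benefit, value) VALUES (?, ?, ?)",
        [(patient_id, benefit, value) for benefit, value in data_points.items()],
    )
    conn.commit()


def lookup(conn, patient_id, benefit):
    """Return one data point, or None if it was never parsed."""
    row = conn.execute(
        "SELECT value FROM coverage WHERE patient_id = ? AND benefit = ?",
        (patient_id, benefit),
    ).fetchone()
    return row[0] if row else None


# Hypothetical values, for illustration only.
conn = create_store()
store_coverage(conn, "patient-001", {"Annual maximum": "1500", "Deductible": "50"})
print(lookup(conn, "patient-001", "Deductible"))
```

The point is simply that once the PDFs are parsed into rows, a lookup replaces scrolling through the document each time.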

 

FUN FACT: Alteryx now also has the capability of pulling information from emails using the Outlook Connector. In some ways, we can even automate Step 1 from the above list!

 

Sound like fun? Okay, let's go over how to accomplish this. If you would like to follow along, instructions are provided in blue font!


Background information

 

Overview of AIS tool(s) we’ll be using: Computer Vision

File Used: Delta Dental of Pennsylvania Benefits Coverage (click to download)

Download the above file.

 

Sample data used

 

Things to note:

  1. The data comes as a table in a PDF file (sample above and file attached).
  2. Data must be retrieved from each cell across all columns of the table.

 

Workflow

Snapshot of the workflow

 

We start by using the Computer Vision tools to extract the data from the PDF:


  1. Image Input tool for retrieving data. (You can find this in the Computer Vision tools tab. Point it to the file you downloaded.)
    Here we point Alteryx to the file location and identify the file type. Additionally, we can use the Filter tool right after to filter the pages as needed.
  2. Image Template tool for annotating the table to parse. (Configure this to read the downloaded file and annotate as needed. You do not need an incoming connection for this tool.)
    Often we need to retrieve data from only a small portion of the page. Parsing an entire page and then cleansing the output can be expensive in both development and compute time. The Image Template tool lets us annotate portions of the page, instructing Alteryx to skip the unannotated sections. As a developer, this is easily my favorite AIS tool 😌
  3. Image to Text for retrieving the data in tables. (Connect the Image Template tool to Input T of this tool and the Image Input tool to Input D, then hit run.)
    This is where the magic happens. Data from the PDF is extracted with delimiters representing columns, ready for us to transform without leaving the Designer ecosystem.

    What’s really great about the above steps is the ability to accomplish complex PDF parsing that uses text recognition models with a few clicks and no code!
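To make "delimiters representing columns" concrete, here is a small sketch of what splitting that kind of output into cells looks like in code. It assumes a pipe (`|`) delimiter purely for illustration; the actual delimiter depends on how the tool is configured, and `split_rows` and the sample text are hypothetical.

```python
def split_rows(extracted_text, delimiter="|"):
    """Turn delimited text into a list of rows, each a list of trimmed cells."""
    rows = []
    for line in extracted_text.splitlines():
        if not line.strip():
            continue  # skip blank lines between table rows
        rows.append([cell.strip() for cell in line.split(delimiter)])
    return rows


# Made-up text standing in for the extracted output; not taken from the sample PDF.
sample = "Benefit | In-network | Out-of-network\n\nExample service | 100% | 80%\n"
rows = split_rows(sample)
for row in rows:
    print(row)
```

In Designer, the same step is handled by the standard parsing tools, with no code required.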

Once the data is in table format, we use standard Alteryx data prep tools to transform and retrieve the necessary information:

 


 

Data processing

 

Output after text extraction

 

Output after wrangling data

 


 

It’s magic!

 

Outcome & experience

 

The AIS tools were able to easily parse the necessary information out of the table.

 

Table annotation: The annotation tool made the process very easy. Rather than extracting all content and spending considerable effort transforming the output, we are able to precisely point the extraction to the content we are interested in.
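One way to picture why annotation saves effort: if every recognized word came with a bounding box, annotating a region would amount to keeping only the words that fall inside it. The sketch below illustrates that idea only. It is not how the Image Template tool is implemented, and `Box`, `Word`, `words_in_region`, and all coordinates are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


class Word(NamedTuple):
    text: str
    box: Box


def inside(inner, outer):
    """True when the inner box lies entirely within the outer box."""
    return (
        inner.left >= outer.left
        and inner.top >= outer.top
        and inner.right <= outer.right
        and inner.bottom <= outer.bottom
    )


def words_in_region(words, region):
    """Keep only the text of words that fall inside the annotated region."""
    return [word.text for word in words if inside(word.box, region)]


# Hypothetical page content: one header word above the table, two words in it.
words = [
    Word("Page", Box(10, 5, 40, 15)),
    Word("Deductible", Box(20, 110, 90, 125)),
    Word("50", Box(200, 110, 215, 125)),
]
table_region = Box(0, 100, 300, 400)  # the annotated area
kept = words_in_region(words, table_region)
print(kept)
```

Everything outside the region never reaches the cleansing stage, which is where the saved effort comes from.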

 

Text Extraction: Performed exactly as expected, parsing all the text and generating the output in table format for consumption, even though this case involved challenging data (a PDF file with tables).

 

Data cleansing: Because the extracted text is already part of an Alteryx workflow, removing artifacts and wrangling the data into the appropriate structure took very little effort using the plethora of tools Alteryx provides.
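For comparison, the kind of cleanup described here might look like the following if written by hand. This is a rough sketch that assumes the first extracted row is the header; `clean_cell`, `rows_to_records`, and the sample rows are hypothetical, and the attached workflow does this with Designer tools rather than code.

```python
import re


def clean_cell(cell):
    """Collapse runs of whitespace and trim the ends of one cell."""
    return re.sub(r"\s+", " ", cell).strip()


def rows_to_records(rows):
    """Use the first row as the header and return one dict per data row."""
    header = [clean_cell(cell) for cell in rows[0]]
    records = []
    for row in rows[1:]:
        cells = [clean_cell(cell) for cell in row]
        if not any(cells):
            continue  # drop rows that are empty after cleaning
        # zip stops at the shorter sequence, so extra cells are ignored
        records.append(dict(zip(header, cells)))
    return records


# Made-up rows with typical extraction artifacts (stray spaces, an empty row).
raw_rows = [
    ["Benefit ", " Coverage"],
    ["", "  "],
    ["Example   service", "80%"],
]
records = rows_to_records(raw_rows)
print(records)
```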

 

How to use the attached workflow

 

  1. Download the attached file.
  2. Double-click the .yxzp file. It will extract and open the workflow. The data source (PDF file) will be extracted into a subfolder.
  3. The input tool in the workflow should contain a relative link to the PDF, so you should not experience any errors when opening or executing the workflow. If you do, please configure the input tool to read from the folder containing the PDF extracted in step 2.
  4. Hit “Run Workflow” in Designer. The workflow should read in the PDF and extract all the necessary information.
  5. The Basic Table tool can be used to view the final output in PDF format.

 


 
