Alteryx Use Cases


Using the Text Mining Tools on Federal Tax Forms

12 - Quasar

Overview of Use Case

There has been a lot of excitement regarding the new Intelligence Suite released by Alteryx. I am very fortunate and thankful to my sales engineer, @MikeN, for helping me get the new Text Mining tool palette installed on my computer. Since then, I’ve been addicted to building out use cases and a business case to buy the complete suite in September 2020.

 

One of the biggest challenges I’ve faced as an Alteryx Artisan and user in my organization is telling teams throughout the organization that I cannot help them with their problems using Alteryx if most of their data is in PDF format. There were times when I’d suggest converting PDF documents to Excel or using OCR technologies, but these solutions are inefficient, inconsistent, or very expensive. The new Intelligence Suite Text Mining tool palette has changed that for me going forward.


 
Describe the business challenge or problem you needed to solve
 
I’m an accountant. I deal with PDF forms all the time. I send clients PDF forms, I file certain states with PDF forms, and I receive PDF forms from almost everyone.
 
 

In the example that sparked this use case, we were onboarding a new client that only had their prior year forms in PDF format. If you are a tax accountant and familiar with Thomson Reuters products and XML filings, you know there are certain ways of moving data within and between systems. That was not the case here.

 

I was tasked with manually entering prior year information from the client's PDF federal tax returns into Excel workbooks, since much of that information is carried forward on current year federal tax forms.

I was like uhhhh…manually? With my fingers? In Excel? Like Adobe? What?



 
Describe your working solution
 
All I have to say now is Alteryx – Text Mining Tool Palettes.

Firstly, I used the Image Template tool to map out the annotations (or fields) in my PDF form that I wanted to extract information from.

 
 
 

Image 1.png


Then I point the PDF Input tool, also found in the new Text Mining tool palette, at a PDF form.

 

I perform some simple data manipulation to make sure that the pages in my PDF document match the correct template built in the Image Template tool, and then run everything through the new Image to Text tool. The configuration is very simple if you haven’t used it before.

 

Image 2.png

 

 

With further manipulation and the Transform tools, I am able to transpose all of the data extracted from the PDF form, keyed on the file path and page it came from.

 

Image 3.png
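
For readers who think in code rather than in tools, the transpose step can be sketched roughly as below. This is only an illustration in Python, not the Alteryx workflow itself; the column names (path, page, field, value) and the sample values are my own assumptions.

```python
# Hypothetical wide output: one row per PDF page, one column per annotation.
wide_rows = [
    {"path": "return_2019.pdf", "page": 1, "Line 01": "ACME CORP", "Line 02a": "12 MAIN ST"},
    {"path": "return_2019.pdf", "page": 2, "Line 05": "1,250"},
]

# Columns that identify where a value came from (assumed names).
KEY_FIELDS = ("path", "page")


def transpose(rows):
    """Turn every annotation column into its own (path, page, field, value) record."""
    long_rows = []
    for row in rows:
        for name, value in row.items():
            if name in KEY_FIELDS:
                continue  # keys are carried along, not transposed
            long_rows.append(
                {"path": row["path"], "page": row["page"], "field": name, "value": value}
            )
    return long_rows


long_rows = transpose(wide_rows)
for record in long_rows:
    print(record)
```

The long layout makes it easy to group by field later, which is what the cleansing step relies on.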

 

 

I used a batch macro grouped by the actual fields (i.e. annotations) to cleanse all of the data. This lets me append a RecordID to each grouped set, which matters when a tabbed line on the form holds information spread across multiple lines.

 

You can see how clean the data looks after it runs through the batch macro.

 

Image 5.png

 

Also note the point I just referenced: for Line 02a you actually get three records, each with a different RecordID number. This way I know when there is a line 1, a line 2, and a line 3. Oftentimes, this is a name and address.
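
The per-field RecordID idea can be sketched in a few lines of Python. This is a rough stand-in for the batch macro, not a copy of it; it assumes that a multi-line field arrives as newline-separated text, and the field names and values are made up.

```python
def number_lines(records):
    """Split multi-line values and number each line within its field, starting at 1."""
    numbered = []
    for record in records:
        # Drop blank lines and stray whitespace left over from extraction.
        lines = [ln.strip() for ln in record["value"].splitlines() if ln.strip()]
        for record_id, text in enumerate(lines, start=1):
            numbered.append(
                {"field": record["field"], "record_id": record_id, "value": text}
            )
    return numbered


# Hypothetical extracted values; Line 02a holds a name and address on three lines.
extracted = [
    {"field": "Line 01", "value": "ACME CORP"},
    {"field": "Line 02a", "value": "JANE DOE\n12 MAIN ST\nSPRINGFIELD, IL 62701"},
]

cleaned = number_lines(extracted)
for record in cleaned:
    print(record)
```

Because the numbering restarts for every field, record_id 1, 2, and 3 under Line 02a always mean the first, second, and third line of that field.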

 

Describe the benefits you have achieved
 

This use case and tool we built is going to save us time extracting data from prior year federal tax returns that we only have via PDF file sharing. We also eliminate human error in data entry.

 

We now simply map the Excel output from the workflow to other third-party applications, Excel workbooks, and Alteryx workflows to continue the process quickly and efficiently.

 

There is much more to discuss about confirming that you received all of the information you wanted. I’ve built Text Input tools that contain all of the data points I want to extract, and I use a Find Replace tool to append the extracted data to my template. Then I can review what was extracted and what was empty on the form.
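
The review step amounts to a lookup from a list of expected data points into the extracted records. A minimal sketch in Python, assuming hypothetical field names and a simple "extracted" or "empty" status of my own invention:

```python
# Hypothetical template of every data point expected from the form.
EXPECTED_FIELDS = ["Line 01", "Line 02a", "Line 05"]


def review(expected_fields, extracted):
    """Attach extracted values to the template and flag fields that came back empty."""
    found = {}
    for record in extracted:
        found.setdefault(record["field"], []).append(record["value"])

    report = []
    for field in expected_fields:
        values = found.get(field, [])
        report.append(
            {
                "field": field,
                "value": " | ".join(values),
                "status": "extracted" if values else "empty",
            }
        )
    return report


extracted = [
    {"field": "Line 01", "value": "ACME CORP"},
    {"field": "Line 02a", "value": "JANE DOE"},
    {"field": "Line 02a", "value": "12 MAIN ST"},
]

report = review(EXPECTED_FIELDS, extracted)
for row in report:
    print(row)
```

Starting from the template rather than from the extracted data is the design choice that matters here: a field that produced nothing still shows up in the report, so it cannot be missed.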

 



Why this over OCR?

Simply put, OCR is expensive. It takes a lot of time to map documents to their proper data points, and it requires users to confirm the data extraction in order to assist with the machine learning side of the technology. With the annotations feature in the Image Template tool, I don’t have to confirm data extraction. I know that the tool will always refer to the exact same location.

 

Where is there to grow?

There are a lot of things I am still unsure of with the new Text Mining tool palette. For example, sometimes when I map out an entire PDF form, the tool bugs out and I lose the annotations I spent a lot of time creating. There are also instances where the PDF shared with me is printed in a different format or size, which causes errors in my data extraction. If that scenario comes up often, OCR is the more consistent option and the better investment. However, for the current investment we make in Alteryx, and the tools that we have at hand, this has become an amazing feature and addition to my data skills library.
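
To make the fixed-location trade-off concrete, here is a small conceptual sketch in Python. It is not how the Image Template or Image to Text tools are implemented; it assumes, purely for illustration, that each word on a page comes with a bounding box and that an annotation is a rectangle in the same coordinates. All names and numbers are hypothetical.

```python
from typing import NamedTuple


class Box(NamedTuple):
    left: float
    top: float
    right: float
    bottom: float


def center_inside(word: Box, region: Box) -> bool:
    """True when the center of the word's box falls inside the annotation region."""
    cx = (word.left + word.right) / 2
    cy = (word.top + word.bottom) / 2
    return region.left <= cx <= region.right and region.top <= cy <= region.bottom


def read_region(words, region):
    """Join the text of every word whose center lies in the region."""
    return " ".join(text for text, box in words if center_inside(box, region))


def scale(words, factor):
    """Simulate the same page rendered at a different size."""
    return [(text, Box(*(v * factor for v in box))) for text, box in words]


# Hypothetical annotation drawn once on the template.
NAME_REGION = Box(90, 40, 200, 70)

page_words = [
    ("ACME", Box(100, 50, 140, 60)),
    ("CORP", Box(145, 50, 185, 60)),
    ("2019", Box(400, 50, 430, 60)),
]

same_size = read_region(page_words, NAME_REGION)
resized = read_region(scale(page_words, 1.5), NAME_REGION)
print(repr(same_size), repr(resized))
```

On the page that matches the template the region returns the name, while on the page rendered 1.5 times larger the same region comes back empty, which mirrors the extraction errors described above.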

 

If you have any questions on anything discussed in this use case, please feel free to reach out here, via LinkedIn, or through my Instagram (if you hover over my Profile icon).

 

I am so happy to share this with the Alteryx community and I can’t wait to see what others build with the text mining tools.

 

Happy Pride – Happy Summer – Happy Health – Happy Unity.

 


 

J


Comments
8 - Asteroid

I should try this out as well, thanks.

6 - Meteoroid

Thank you for sharing and this is an awesome process! This may be a silly question, but where did you find the text mining tool palette? I was doing some searching and I'm not able to find them. Are these tools that you have developed yourself?

8 - Asteroid
Hi,

I am also not able to see the tool palette.
12 - Quasar

Hey everyone! I felt the same way when I heard about the Text Mining Tools! I'd recommend that you reach out to your Sales Engineer! I reached out and they were extremely helpful in installing the tool palette. 

Please remember that you have to have the R-Package installed on your computer; and that should be running as Admin 😉

 

Thanks 🙂

 

J

Alteryx

@ChrisK1 @rohit782192 you can also find more information about how to install the Text Mining tool categories here:

https://help.alteryx.com/current/designer/alteryx-intelligence-suite

and like Jacob said you can always contact your Sales Engineer and they will be happy to help you. 

8 - Asteroid
Hi,

There is no trial version available for these. I have a newer Alteryx version, 20.2.
Alteryx

@rohit782192 - there is a trial for this : reach out to your Alteryx rep.

6 - Meteoroid

This is really cool @the_jake_tool !  I work in Indirect Tax so I can truly see the benefits of this on our hundreds of returns each month ;).  

 

Thanks for sharing this!

Alteryx

Good content @the_jake_tool !
Can you share the workflow with us?

Thank you!

8 - Asteroid

Awesome article! Also interested in the workflow...

8 - Asteroid
Please share the workflows if possible.

Thanks and Regards
Rohit Gupta.
Alteryx Partner

can you please share your workflow? I am trying to build my first pdf extraction WF and I am not following your example.

 

Thank you

8 - Asteroid

Hi 

Great stuff. Can you please share the workflow? Thanks
