Alteryx tiles or Tetrominoes?
Tetris is a game known and played by many. Like Alteryx, there are many ways to solve Tetris. In both, the thrill and fulfillment in finishing a workflow or level are palpable.
I have recently come across a use case where the customer needed to create a master data table by extracting information from a legacy form provided by their clients. This form has supplier information in both header and tabular form. In the sample screenshot below, the information in the yellow box and green box are for two separate suppliers. Other interesting properties:
As a Solution Engineer with a development background at Alteryx, I suggested RegEx. I regretted this approach when I discussed the concept of RegEx with the customer, even with the help of ChatGPT. When working with business-focused users, adding code (even fragments of it) may complicate conversations in scenarios like this.
Even if RegEx works in extracting the values, how can we do that recursively across the pages of the document while keeping the relationship between the Supplier Number and all the other related fields intact? How can we make the approach user-friendly?
Rotating the shape in Tetris gives the player options on the potential approaches to solving the puzzle.
Pivoting a table changes its orientation "wholesale" so that approach is not applicable here. In this scenario, parts of the form need to be broken down, rotated, and made whole again to arrive at the desired outcome.
There were two approaches we explored. This blog will focus on Flat File Layout.
The customer currently uses Datasnipper to extract the information from the legacy forms. Said tool does not consistently extract information from fields that can have up to three lines in the document. Coincidentally, Alteryx Flat File Layout has a similar interface to DataSnipper, albeit with higher reliability, as it creates fixed-width columns. Tools such as the Multi-Row Formula can be used in conjunction to ensure no lines in the form are missed.
How the solution works and how it is used:
1. Each Extraction process uses the Alteryx Flat File Layout tool. Flat file layouts in legacy forms are expected to change infrequently. Should there be changes in the format of the source file, this can be reflected in the Flat File Layout tool at the Input Data (see 1 in the workflow screenshot). In the Configuration panel, look for File/Field Layout, then click on the "..." (ellipsis).
a. You will be directed to the File Layout Screen, where you can click or un-click the red vertical lines to move the flat file layout (delimiter). This approach applies to all the extraction processes for Master, Tax, Address, Bank, and Status Information.
b. There are three different instances of Input Data, but these are all referencing the same file/s. The replication is because of the need to use three different flat file layouts to extract the different sets of information.
c. NOTE: ONLY do modifications in Flat File Layout if there are changes in the page format or business requirements.
2. View the Output via the Browse tool. The download of the output is possible by configuring the Output Data tool.
Assumptions and why this may be an effective approach:
Limitations and why this workflow may not be fit for purpose:
In Tetris, using a hard drop immediately drops the shape to the stack of shapes (Tetrominoes). Players normally do a hard drop if they feel that they have solved the puzzle and they want to go on to the next step.
In Alteryx, once you have validated your solution, you can complete complex and previously manual tasks almost immediately. You also have the option to schedule your workflow via Alteryx Server, which allows for scheduling, bulk execution, and governance of your workflows, among many other enterprise-grade capabilities.
There is only one button for both rotate and hard drop in Tetris. The player taps the button quickly to rotate or holds the same button for a hard drop. In Alteryx, the workflow solution you have created (rotate) is also the automation (hard drop). As a bonus, that is also your documentation! Alteryx also allows for a user-friendly and low-code approach to parse text from legacy forms with complex formats.
I hope this post gave you ideas on how you can approach the extraction of information from legacy forms that contain header and tabular data.
Oh, and yeah - No RegEx!
Source: GIPHY
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.