- Subscribe to RSS Feed
- Mark as New
- Mark as Read
- Bookmark
- Subscribe
- Printer Friendly Page
- Notify Moderator
Alteryx tiles or Tetrominoes?
Tetris is a game known and played by many. Like Alteryx, there are many ways to solve Tetris. In both, the thrill and fulfillment in finishing a workflow or level are palpable.
I have recently come across a use case where the customer needed to create a master data table by extracting information from a legacy form provided by their clients. This form has supplier information in both header and tabular form. In the sample screenshot below, the information in the yellow box and green box are for two separate suppliers. Other interesting properties:
- "Address" can have up to three lines of text
- "Number" may be blank or not exist altogether
- "Supplier Name" can have up to three lines of text
- There are no limits to the number of suppliers or "Pages" in the form
- A supplier may be listed more than once in a document; i.e., it can appear on multiple "Pages" in the form
- While supplier master data has fewer changes compared to transactional data, updates are still expected to happen regularly
- The business team who needs to format the document does not have access to the legacy system, nor can they can make change requests to modify the form
As a Solution Engineer with a development background at Alteryx, I suggested RegEx. I regretted this approach when I discussed the concept of RegEx with the customer, even with the help of ChatGPT. When working with business-focused users, adding code (even fragments of it) may complicate conversations in scenarios like this.
Even if RegEx works in extracting the values, how can we do that recursively across the pages of the document while keeping the relationship between the Supplier Number and all the other related fields intact? How can we make the approach user-friendly?
Rotate (Solution)
Rotating the shape in Tetris gives the player options on the potential approaches to solving the puzzle.
Pivoting a table changes its orientation "wholesale" so that approach is not applicable here. In this scenario, parts of the form need to be broken down, rotated, and made whole again to arrive at the desired outcome.
There were two approaches we explored. This blog will focus on Flat File Layout.
- Using Text to Columns
- Using Flat File Layout
The customer currently uses Datasnipper to extract the information from the legacy forms. Said tool does not consistently extract information from fields that can have up to three lines in the document. Coincidentally, Alteryx Flat File Layout has a similar interface to DataSnipper, albeit with higher reliability, as it creates fixed-width columns. Tools such as the Multi-Row Formula can be used in conjunction to ensure no lines in the form are missed.
How the solution works and how it is used:
1. Each Extraction process uses the Alteryx Flat File Layout tool. Flat file layouts in legacy forms are expected to change infrequently. Should there be changes in the format of the source file, this can be reflected in the Flat File Layout tool at the Input Data (see 1 in the workflow screenshot). In the Configuration panel, look for File/Field Layout, then click on the "..." (ellipsis).
a. You will be directed to the File Layout Screen, where you can click or un-click the red vertical lines to move the flat file layout (delimiter). This approach applies to all the extraction processes for Master, Tax, Address, Bank, and Status Information.
b. There are three different instances of Input Data, but these are all referencing the same file/s. The replication is because of the need to use three different flat file layouts to extract the different sets of information.
c. NOTE: ONLY do modifications in Flat File Layout if there are changes in the page format or business requirements.
2. View the Output via the Browse tool. The download of the output is possible by configuring the Output Data tool.
Assumptions and why this may be an effective approach:
- Supplier number is the primary key, i.e., not page numbers.
- Allows supplier numbers to be repeated in one document across multiple pages.
- Multiple legacy forms can be processed SIMULTANEOUSLY as long as it follows a naming convention; in this case, "Sample Supplier Master Data*" including the asterisk (*) should be the value in the "Connect a File or Database" field in the Input Data tool.
- Uses an intuitive visual flat file layout tool (see above) to do modifications in the extraction logic.
Limitations and why this workflow may not be fit for purpose:
- Document formats change often.
- Field names and placements for the same Supplier Master Data Report differ significantly from one customer to another, which can still result in manual efforts.
- Requires creation/maintenance of multiple layouts in the Flat File Layout tool. This means changes in the file format or business requirements will require a review of all the flat file layouts and their corresponding portions of the workflow.
Hard Drop (Automation)
In Tetris, using a hard drop immediately drops the shape to the stack of shapes (Tetrominoes). Players normally do a hard drop if they feel that they have solved the puzzle and they want to go on to the next step.
In Alteryx, once you have validated your solution, you can complete complex and previously manual tasks almost immediately. You also have the option to schedule your workflow via Alteryx Server, which allows for scheduling, bulk execution, and governance of your workflows, among many other enterprise-grade capabilities.
Mic Drop (Conclusion)
There is only one button for both rotate and hard drop in Tetris. The player taps the button quickly to rotate or holds the same button for a hard drop. In Alteryx, the workflow solution you have created (rotate) is also the automation (hard drop). As a bonus, that is also your documentation! Alteryx also allows for a user-friendly and low-code approach to parse text from legacy forms with complex formats.
I hope this post gave you ideas on how you can approach the extraction of information from legacy forms that contain header and tabular data.
Oh, and yeah - No RegEx!
Source: GIPHY
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.