Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Weekly Challenges

Solve the challenge, share your solution and summit the ranks of our Community!

Also available in | Français | Português | Español | 日本語
IDEAS WANTED

Want to get involved? We're always looking for ideas and content for Weekly Challenges.

SUBMIT YOUR IDEA

Challenge #53: Parsing Report Formatted Data

GeneR
Alteryx Alumni (Retired)

The link to last week’s challenge (exercise #52) is HERE

 

This week Challenge is to perform some parsing of report formatted data to turn it into a table.  I have seen several solutions but keep in mind that you will want to make your solution as dynamic as possible.  My solution that I will post next week uses a batch macro but I look forward to seeing how you solve the problem.  Have fun and remember there is usually more than one solution.

 

Use case:  A company has a system that produces data that is not formatted into columns and rows. In this case, each row is separated by a line of dashes and each field is delimited by a |. 

 

Objective:  You goal is to create a process that will dynamically transform the data into a series of columns and rows keeping in mind that the number of columns and rows may vary on a case by case basis.

andrewdatakim
12 - Quasar
12 - Quasar

I was able to accomplish this one by using a standard macro. I did make a couple of assumptions.

 

1. The first row doesnt contain the actual headers.

2. The user was comfortable with the # instead of N/A for clientids.

3. Headers will be in the second row ( or first row of data) and the first row doesnt contain any required data.

 

After bringing in the text my primary objective was to find how many columns were in the report and what the headers were. By separating out the first row I was able to remove out the spacing and substitute the delimiter for something easier to count (@). This allowed me to count the columns and redirect the count to the Text to Columns tool. Next I renamed all the columns using the first row of data which contains all of the headers. Then I filtered out all of the rows used as dividers by filtering out the dividers containing '-'.  Next I removed the first column because it contained the full text and any field with "Field" in the name. This removes any fields that didn't have a column header. The final step was the cleaning to remove the whitespace.  This maco should take care of most of the basic needs and address the 'n' of columns. 

 

Text to Columns WorkflowText to Columns Workflow

The interface is simple but it does allow for the user to change the reporting column they wish to parse in case there is any additional metadata and the metadata can be swapped as well.

Text to Columns InterfaceText to Columns Interface

mceleavey
17 - Castor
17 - Castor

Removed.



Bulien

mceleavey
17 - Castor
17 - Castor

I've been beaten to it, but here's mine:

 

I first created a macro input to feed in the sample file. I then gave each row an ID, and given that each row is split by "-----" rows, I removed these using the mod() formula. I then replaced pipes with a better behaving "!" and did a count of these characters:

 

Initial Data Cleanse.PNG

 

I then took a branch off to sum these counts, which gave me the number of columns. This, I then fed into a list box, which I then fed into the Text to Columns tool using the action to update the number of rows with the number of columns determined earlier:

 

Create Input variable.PNG

 

This updated the "# of Columns" field to the dynamically calculated figure, did some tidying up, then used Dynamic Rename to rename the columns from the first row of data:

 

Text to columns on designated character and number of columns.PNG

 

Simple.



Bulien

mceleavey
17 - Castor
17 - Castor

Not sure why it posted twice...hey ho.



Bulien

seven
12 - Quasar

My solution is dynamic and does not need a macro. Please let me know what you think!

JoeM
Alteryx Alumni (Retired)

Thanks everyone for you solutions to challenge #53!  We have posted one solution to the challenge in the original post.

Harbinger
9 - Comet

 No need for configuration (just field to be parsed mapping) standard macro that calls two custom macros-- one Standard and one Batch. Take a look; I have attached the standard macro to the post.

Spoiler

/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the macro used in the workflow (spectacles Icon)
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/CompletedWeek53Workflow.PNG

/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the macro that is exposed to the end user (spectacles Icon)
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/CompletedWeek53Workflow.PNG
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the number of columns standard macro (No Icon) 
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
CompletedWeek53WorkflowNumberOfColumns.PNG

/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the column parsing batch Macro (Column Icon)
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/ 
CompletedWeek53WorkflowColumnParse.PNG  

  

SeanAdams
17 - Castor
17 - Castor

Wow - this has to be the most diverse set of responses for a challenge - everything from @Harbinger 's macros, each with their own icon, to @andrewdatakim and @mceleavey's analytic apps.

 

My solution is much closer to @seven - but without the double summarize component, and the @GeneR without the column counter.

Spoiler
My thinking is that the way to deal with the dynamic nature of the input is to deal with it in rows rather than columns, and then you can always cross-tab it back to columns when you're done.   That way you never need to know how many columns there are....

- Add a row ID
- text to columns to get the various columns into rows
- then put a field ID using a multi-row formula, grouped on row ID
- then cross-tab back using the field-ID as the column name 
- then dynamic rename back to the right column name

 

Note: the provided output data set is a bit of a cheat because there's a little bit of column cleaning done (client ID replaces blank & # with a 'N/A' for example - I replicated this just to make the data match, but this does impact the ability to be truly dynamic.

 

NicoleJohnson
ACE Emeritus
ACE Emeritus

Similar to @SeanAdams solution I believe... looking at the other results, though, I clearly need to spend more time playing with macros! :) 

 

Spoiler
WeeklyChallenge53.JPG