Weekly Challenge

Solve the challenge, share your solution and summit the ranks of our Community!
IDEAS WANTED

We're actively looking for ideas on how to improve Weekly Challenges and would love to hear what you think!

Submit Feedback
We've recently made an accessibility improvement to the community and therefore posts without any content are no longer allowed. Please use the spoiler feature or add a short message in the message body in order to submit your weekly challenge.

Challenge #53: Parsing Report Formatted Data

Highlighted
Alteryx Alumni (Retired)

The link to last week’s challenge (exercise #52) is HERE

 

This week Challenge is to perform some parsing of report formatted data to turn it into a table.  I have seen several solutions but keep in mind that you will want to make your solution as dynamic as possible.  My solution that I will post next week uses a batch macro but I look forward to seeing how you solve the problem.  Have fun and remember there is usually more than one solution.

 

Use case:  A company has a system that produces data that is not formatted into columns and rows. In this case, each row is separated by a line of dashes and each field is delimited by a |. 

 

Objective:  You goal is to create a process that will dynamically transform the data into a series of columns and rows keeping in mind that the number of columns and rows may vary on a case by case basis.

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

I was able to accomplish this one by using a standard macro. I did make a couple of assumptions.

 

1. The first row doesnt contain the actual headers.

2. The user was comfortable with the # instead of N/A for clientids.

3. Headers will be in the second row ( or first row of data) and the first row doesnt contain any required data.

 

After bringing in the text my primary objective was to find how many columns were in the report and what the headers were. By separating out the first row I was able to remove out the spacing and substitute the delimiter for something easier to count (@). This allowed me to count the columns and redirect the count to the Text to Columns tool. Next I renamed all the columns using the first row of data which contains all of the headers. Then I filtered out all of the rows used as dividers by filtering out the dividers containing '-'.  Next I removed the first column because it contained the full text and any field with "Field" in the name. This removes any fields that didn't have a column header. The final step was the cleaning to remove the whitespace.  This maco should take care of most of the basic needs and address the 'n' of columns. 

 

Text to Columns WorkflowText to Columns Workflow

The interface is simple but it does allow for the user to change the reporting column they wish to parse in case there is any additional metadata and the metadata can be swapped as well.

Text to Columns InterfaceText to Columns Interface

Highlighted
Alteryx Certified Partner

Removed.

Highlighted
Alteryx Certified Partner

I've been beaten to it, but here's mine:

 

I first created a macro input to feed in the sample file. I then gave each row an ID, and given that each row is split by "-----" rows, I removed these using the mod() formula. I then replaced pipes with a better behaving "!" and did a count of these characters:

 

Initial Data Cleanse.PNG

 

I then took a branch off to sum these counts, which gave me the number of columns. This, I then fed into a list box, which I then fed into the Text to Columns tool using the action to update the number of rows with the number of columns determined earlier:

 

Create Input variable.PNG

 

This updated the "# of Columns" field to the dynamically calculated figure, did some tidying up, then used Dynamic Rename to rename the columns from the first row of data:

 

Text to columns on designated character and number of columns.PNG

 

Simple.

Highlighted
Alteryx Certified Partner

Not sure why it posted twice...hey ho.

Highlighted
Alteryx Certified Partner

My solution is dynamic and does not need a macro. Please let me know what you think!

Highlighted
Alteryx Community Team
Alteryx Community Team

Thanks everyone for you solutions to challenge #53!  We have posted one solution to the challenge in the original post.

Highlighted
Alteryx Certified Partner

 No need for configuration (just field to be parsed mapping) standard macro that calls two custom macros-- one Standard and one Batch. Take a look; I have attached the standard macro to the post.

Spoiler

/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the macro used in the workflow (spectacles Icon)
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/CompletedWeek53Workflow.PNG

/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the macro that is exposed to the end user (spectacles Icon)
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/CompletedWeek53Workflow.PNG
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the number of columns standard macro (No Icon) 
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
CompletedWeek53WorkflowNumberOfColumns.PNG

/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/
Here is the column parsing batch Macro (Column Icon)
/*-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------*/ 
CompletedWeek53WorkflowColumnParse.PNG  

  

Highlighted
16 - Nebula
16 - Nebula

Wow - this has to be the most diverse set of responses for a challenge - everything from @Harbinger 's macros, each with their own icon, to @AndrewDataKim and @mceleavey's analytic apps.

 

My solution is much closer to @seven - but without the double summarize component, and the @GeneR without the column counter.

Spoiler
My thinking is that the way to deal with the dynamic nature of the input is to deal with it in rows rather than columns, and then you can always cross-tab it back to columns when you're done.   That way you never need to know how many columns there are....

- Add a row ID
- text to columns to get the various columns into rows
- then put a field ID using a multi-row formula, grouped on row ID
- then cross-tab back using the field-ID as the column name 
- then dynamic rename back to the right column name

 

Note: the provided output data set is a bit of a cheat because there's a little bit of column cleaning done (client ID replaces blank & # with a 'N/A' for example - I replicated this just to make the data match, but this does impact the ability to be truly dynamic.

 

Highlighted
14 - Magnetar
14 - Magnetar

Similar to @SeanAdams solution I believe... looking at the other results, though, I clearly need to spend more time playing with macros! :) 

 

Spoiler
WeeklyChallenge53.JPG