The link to last week’s challenge (exercise #52) is HERE
This week Challenge is to perform some parsing of report formatted data to turn it into a table. I have seen several solutions but keep in mind that you will want to make your solution as dynamic as possible. My solution that I will post next week uses a batch macro but I look forward to seeing how you solve the problem. Have fun and remember there is usually more than one solution.
Use case: A company has a system that produces data that is not formatted into columns and rows. In this case, each row is separated by a line of dashes and each field is delimited by a |.
Objective: You goal is to create a process that will dynamically transform the data into a series of columns and rows keeping in mind that the number of columns and rows may vary on a case by case basis.
I was able to accomplish this one by using a standard macro. I did make a couple of assumptions.
1. The first row doesnt contain the actual headers.
2. The user was comfortable with the # instead of N/A for clientids.
3. Headers will be in the second row ( or first row of data) and the first row doesnt contain any required data.
After bringing in the text my primary objective was to find how many columns were in the report and what the headers were. By separating out the first row I was able to remove out the spacing and substitute the delimiter for something easier to count (@). This allowed me to count the columns and redirect the count to the Text to Columns tool. Next I renamed all the columns using the first row of data which contains all of the headers. Then I filtered out all of the rows used as dividers by filtering out the dividers containing '-'. Next I removed the first column because it contained the full text and any field with "Field" in the name. This removes any fields that didn't have a column header. The final step was the cleaning to remove the whitespace. This maco should take care of most of the basic needs and address the 'n' of columns.
The interface is simple but it does allow for the user to change the reporting column they wish to parse in case there is any additional metadata and the metadata can be swapped as well.
I've been beaten to it, but here's mine:
I first created a macro input to feed in the sample file. I then gave each row an ID, and given that each row is split by "-----" rows, I removed these using the mod() formula. I then replaced pipes with a better behaving "!" and did a count of these characters:
I then took a branch off to sum these counts, which gave me the number of columns. This, I then fed into a list box, which I then fed into the Text to Columns tool using the action to update the number of rows with the number of columns determined earlier:
This updated the "# of Columns" field to the dynamically calculated figure, did some tidying up, then used Dynamic Rename to rename the columns from the first row of data:
No need for configuration (just field to be parsed mapping) standard macro that calls two custom macros-- one Standard and one Batch. Take a look; I have attached the standard macro to the post.
Note: the provided output data set is a bit of a cheat because there's a little bit of column cleaning done (client ID replaces blank & # with a 'N/A' for example - I replicated this just to make the data match, but this does impact the ability to be truly dynamic.