Ever gotten to the end of adding some new functionality to a module or app and asked yourself something like “This process used to output 77 columns—why am I seeing only 75?” Or ever have a downstream process break because you inadvertently changed an int to a string? Tracking down and preventing these little data issues can take time away from the fun of developing something new, or the fun of, say, reading more blogs like this one. Enter the new Metadata Macro. [Cue the super hero music.] The Metadata Macro is here to simplify development and save time by alerting you to metadata issues in a quick and easy way. Stick it on the end of a module to ensure your metadata always comes out the way you want it, or use it during development in the middle of a module to ensure that any metadata changes you caused are intentional.
Here’s how it works:
The Metadata Macro accepts 2 inputs:
The Metadata Macro essentially does 3 things, detailed further down:
The macro produces 2 outputs:
Record counting:
The first thing the macro does is simply count the total number of records on the test stream.
Data type comparison:
Next, the macro compares data types on the test stream to the standard stream and indicates the following:
By default the macro looks for all 3 metadata differences noted here, but you have the option to select/deselect any of the options above. For example, you may not care about extra columns being on the test stream, but you want to ensure there’s nothing missing or mismatching, so you could deselect the ‘extra columns’ checkbox and leave the other 2 selected. Selecting/deselecting these options will not affect speed performance of the macro.
Null and empty-string counting:
Finally, the macro counts nulls and empty strings on all columns in the test stream. This can be very useful in cases of:
By default the macro will count nulls and empty strings on all columns on all rows, but you have the option to count them on a sampling of rows, e.g. 1 out of every 100 rows. While counting nulls and empty strings on just a sampling of rows will not catch every single occurrence of a null or empty string, it will perform much faster on large data sets. Depending on the usage scenario, a sampling may be more than enough to capture any issues you’re trapping for. Or for maximum speed you can simply skip null and empty-string counting altogether.
2 Modes:
In addition, in either mode you have the option to make the column comparison case-sensitive, e.g. make it so that zip5 will be treated as a different column than ZIP5. This can be helpful for example on loads to relational database management systems (RDBMS’s) that are case sensitive on column names, such as Oracle, vs. other RDBMS’s like SQL Server—and the Alteryx platform itself—which are not case-sensitive.
The macro reports the results in 2 ways:
This versatility allows you to use the macro the way you want to and still have access to all the same information. Depending on the situation, you may just wish to
Two of the most common usages for the Metadata Macro are:
Both of these example usages are demonstrated on the demo module for the macro, found at the bottom of this blog post.
Save the MetadataMacro.yxmc in your Alteryx macros folder (C:\Program Files\Alteryx\Engine\RuntimeData\Macros\) and then you’ll find it in the ‘Data Investigation’ tool set in the tool palette the next time you fire up Alteryx.
Enjoy!
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.