This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 05-20-201909:11 AM - edited on 06-19-201909:40 AM by ichand
Hector Amaya, Greg Murray
BI Analyst; BI Developer
Overview of Use Case
Leveraging the power of batch macros has the potential to eliminate days of work from an analyst perspective on our BI team. We would like to share our journey of creating batch macros and how it has eliminated busy work for the analysts on our team. In the time it would have taken to generate 70% of the documentation needed to bring in a new data source into our data warehouse, we were able to create two workflows that automate this work. Follow us as we generate source to target documentation using batch macros and dynamic input tools to query the source database. We then show how to read in the documentation created in the prior Alteryx workflow and compare it against the tables that have been developed. We end with how we hacked a Crew Macro to validate sample data from Source to Stage to ODS.
Describe the business challenge or problem you needed to solve
We found ourselves spending more time formatting and prepping source to target documentation than actually analyzing and profiling when bringing in a new data source.
On top of that, whenever we bring in a new source we have to verify that the column names, order, datatypes, and primary keys are being created according to the specifications that we provided the developers. We also want to make sure to validate the data coming in by matching it to the source. We have a standard testing process that takes a lot of time, copying and pasting and making sure the target data is the same as the source data.
We sought to create something that would eliminate iterative tasks of creating source to target documentation and test planning while making it easy to identify issues. We also wanted to free up some time for the analysts to actually focus on analytics!
Describe your working solution
Although our batch macro solution is being used by our internal team, concepts of our Alteryx solution are being used by the Finance department and it has sparked interest in other departments/teams. The solution also helps free up resources within our IT department to focus on other projects.
To generate our source to target documentation for Stage and ODS, we used a SQL query (input tool) to read in the table and column information from the Information schema of a database. We use that output to identify which tables we will extract and label them accordingly. We then join the two outputs to keep only the table and column information that will be extracted from the source. This is when we introduce the Append and Formula tools to convert datatypes appropriately. Finally, we tap into the power of batch macros to group the data streams by table and write it a sheet in an Excel template document.
For our test planning solution, we broke it up into two steps. First, we validated that the metadata of the table and columns built in our test environment for our Stage, ODS, and LOAD tables match the specification document. Second, was to validate the target data in the tables matched the source data for Stage and ODS.
In order to carry out Step 1 we employed a similar strategy from our source to target solution, we queried the Information schema of our database. We then read in the source to target documentation and compare table/column names, datatypes, order, primary keys, and indexes. If everything matched, then the output was written to a specific sheet in a Test Plan xlsx template; otherwise we would address disparities with our developers.
For Step 2, we used our ability to dynamically change a SQL query to pull random samples out of our source table and compare to our target table. This is where multiple batch macros were used to do the heavy lifting of iterating the steps for each table. We were also able to hack into the CReW Expect Equal macro for comparing the sample data sets with each other to ensure that our data validated.
STM Matching (Column Names & Order, Data types, and Primary Keys
Querying DBs & creating random samples files and Prep Data for Validation by blending sample record file paths w/ SRC, STG, & ODS schema and table
Batch prepped data output to validate and insert results and sample data into ATP. ***BLOBs & SpatialObjs are excluded in sample record export, but are still validated. Also, string records are right trimmed and empty strings are converted to NULLs***
Describe the benefits you have achieved
For Greg Murray, Alteryx has completely change his career. “I was an analyst in HR. Alteryx helped me get a role in IT, it empowered me to further explore data. And, I recently took a role in another company as senior BI developer and I still use Alteryx,” he said.
For Hector Amaya “I feel like I can build workflows similar to the way my mind works. It gives the chance to someone that is analytical like I am to actually start building, especially since I don’t have a background in coding.”
Alteryx is great for integration and we want to use it more for data science and eventually spatial analytics. We currently use Informatica for our enterprise ETL, but Alteryx is a little easier to quickly identify a solution and provide a proof of concept to our business partners and developer colleagues.