Hello all, I'm hoping to learn how others in the Alteryx community approach and conduct SPC (Statistical Process Control) for QC inside of Alteryx. What best practices have you found for creating tests to validate your completed workflows? I'm referencing a Medium blog below — does anyone have workflows they're proud of that include this important stage of development?
A few examples from the blog:
-Count Verification — Check that row counts are in the right range, …
-Conformity — US Zip5 codes are five digits, US phone numbers are 10 digits, …
-History — The number of prospects always increases, …
-Balance — Week over week, sales should not vary by more than 10%, …
-Temporal Consistency — Transaction dates are in the past, end dates are later than start dates, …
-Application Consistency — Body temperature is within a range around 98.6F/37C, …
-Field Validation — All required fields are present, correctly entered, …
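A few of these input checks can be sketched in plain Python (outside Alteryx) to show the intent; the record fields and ranges below are illustrative assumptions, not from the blog:

```python
from datetime import date

# Illustrative record; field names are assumptions for the sketch.
record = {"zip5": "02139", "phone": "6175551234",
          "start": date(2020, 1, 1), "end": date(2020, 6, 30)}

def conformity_ok(rec):
    """Conformity: US Zip5 is five digits, US phone is 10 digits."""
    return (rec["zip5"].isdigit() and len(rec["zip5"]) == 5
            and rec["phone"].isdigit() and len(rec["phone"]) == 10)

def temporal_ok(rec, today=date(2020, 12, 1)):
    """Temporal consistency: dates are in the past, end is later than start."""
    return rec["start"] <= today and rec["end"] <= today and rec["end"] > rec["start"]

def count_ok(rows, lo, hi):
    """Count verification: row count falls in the expected range."""
    return lo <= len(rows) <= hi

print(conformity_ok(record), temporal_ok(record), count_ok([record], 1, 100))
```

In an Alteryx workflow the same logic would typically live in Formula, Filter, and Test tools rather than code.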
Output tests check the results of an operation, like a Cartesian join. For example:
-Completeness — Number of customer prospects should increase with time
-Range Verification — Number of physicians in the US is less than 1.5 million
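These two output tests reduce to simple predicates; a minimal sketch, with the 1.5 million cap taken from the blog's example:

```python
def completeness_ok(prev_count, new_count):
    """Completeness: prospect count should not decrease over time."""
    return new_count >= prev_count

def range_ok(physician_count, cap=1_500_000):
    """Range verification: US physician count stays under the cap."""
    return 0 <= physician_count < cap

print(completeness_ok(900, 950), range_ok(1_000_000))
```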
While I cannot share any workflows due to their confidential nature, I can comment on some of the items listed above.
We use a number of different tools, some default, some with the CrEW macro pack, and some custom in-house tools to ensure that the data pipeline is behaving as expected.
The three biggest are:
1) Expect Zero Records (CrEW): this tool throws an error if any records are passed to it. Why is this useful? In a join scenario where a tool has multiple outputs, you might expect every record to join with the other input data stream. If any were to drop off, the "expect zero" tool would let you know.
It is also useful in a filter scenario where all records should fall in a specified date range, as mentioned above. Connected to the correct output anchor, this would flag any unexpected records.
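The "expect zero" pattern can be sketched outside Alteryx as a guard that raises on any rows; assume the unjoined or leftover records arrive as a list (names here are illustrative, not the CrEW macro's internals):

```python
def expect_zero(records, where="join fallout"):
    """Raise if any records reach this point (Expect Zero Records analogue)."""
    if records:
        raise ValueError(f"Expected zero records at {where}, got {len(records)}")

# If every record joined, the left-unjoined anchor should be empty:
expect_zero([])  # passes silently
```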
2) Unique Record Check (custom): this tool functions like the Unique tool, but throws an error if any records have duplicate values. This eliminates the extra steps of additional tools after a Unique tool, and has the added benefit of error messages.
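The same check can be sketched in a few lines; the key field is an assumption for the example:

```python
from collections import Counter

def assert_unique(records, key):
    """Error if any records share a key value (Unique Record Check analogue)."""
    counts = Counter(r[key] for r in records)
    dupes = [k for k, n in counts.items() if n > 1]
    if dupes:
        raise ValueError(f"Duplicate {key} values: {dupes}")

assert_unique([{"id": 1}, {"id": 2}], key="id")  # passes silently
```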
3) Expect Equal (CrEW & custom): this tool takes in two data streams, compares them, and errors if the streams do not match.
We have created a custom tool that checks for equal number of records, independent of record content or formatting. This is useful to make sure that no records were dropped or duplicated.
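Both variants reduce to simple assertions; a sketch assuming each stream is a list of records (the count-only check ignores content, matching the custom tool's behavior described above):

```python
def assert_equal_streams(a, b):
    """Expect Equal analogue: error if the two streams differ."""
    if a != b:
        raise ValueError("Streams do not match")

def assert_equal_counts(a, b):
    """Count-only check: same number of records, content ignored.
    Catches dropped or duplicated records across a pipeline step."""
    if len(a) != len(b):
        raise ValueError(f"Record counts differ: {len(a)} vs {len(b)}")

assert_equal_counts([{"x": 1}, {"x": 2}], [{"y": "a"}, {"y": "b"}])  # passes
```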
Most of this falls within the verification steps you outlined above.
We do have a number of workflows that perform actual SPC checks on the data, most using Formula and Multi-Row Formula tools to evaluate a number of different control checks. If any are out of control, a warning email is sent to the necessary parties.
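One common control check of this kind is flagging a point outside mean ± k·sigma of a historical baseline (a simple Shewhart-style individuals check); the baseline values and k=3 limit below are illustrative assumptions, not from the poster's workflows:

```python
from statistics import mean, stdev

def out_of_control(history, value, k=3):
    """Flag a point outside mean +/- k*sigma of the historical baseline."""
    m, s = mean(history), stdev(history)
    return abs(value - m) > k * s

baseline = [100, 102, 98, 101, 99, 100, 103, 97]  # mean 100, sample stdev 2
print(out_of_control(baseline, 101))  # False: within 100 +/- 6, in control
print(out_of_control(baseline, 140))  # True: out of control, trigger the warning email
```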