Alteryx Connect Data Quality Score



When Alteryx Connect is first installed at a company with a small Alteryx Designer user base, you get little benefit from lineage.

There are not many workflows to draw on yet. So, to realize immediate benefits from Alteryx Connect, I'd like to suggest:


a company-wide Data Quality Score.


  1. Score each data element in the distributed data stores
  2. Automatically assign a simple score on a scale of one to five
    • 1 means “we don’t know”
    • 2 means the data was entered or last updated more than 1 year ago, or has conflicting values
    • 3 would be the norm and means the customer provided this data, as accurate and as up to date as when they entered it and ‘agreed’ to share it with you
    • 4 means “we cross-checked the data with 3rd-party sources, or the addresses work in Google Maps”
    • 5 means “we had the customer or their representative validate the address in the last 3 months”
  3. The score will be based on:
    • Missingness
    • Information value (whether variance is high; a column with no variance carries no useful information)
    • How many times the column is referenced in other tables
    • Format (structured, like a telephone number ###-##-##, or semi-structured, like an address)
    • Whether it is an ID column
    • Whether it is a datetime column, and any discrepancies in its values
    • Time since the data was last updated
  4. Once we have some lineage information, we'll then weight the data based on how frequently it's needed, how many formulas require the field, etc.
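
To make the idea a bit more concrete, here is a rough Python sketch of how a single column could be profiled and scored. It is only an illustration, not an Alteryx Connect feature: the names `ColumnProfile`, `profile_column` and `quality_score`, the equal weights, the one-year freshness window and the linear mapping onto the 1 to 5 scale are all my assumptions. It covers only missingness, variance, format and time since last update; reference counts and the ID/datetime checks are left out.

```python
from dataclasses import dataclass
from datetime import date
import re


@dataclass
class ColumnProfile:
    completeness: float  # 1 - missingness
    information: float   # 0.0 if the column has no variance, else 1.0
    format_match: float  # share of present values matching the expected pattern
    freshness: float     # 1.0 if just updated, falling to 0.0 at one year old


def profile_column(values, last_updated, today, pattern=None):
    """Profile one column; None and blank strings count as missing."""
    present = [v for v in values if v is not None and str(v).strip() != ""]
    completeness = len(present) / len(values) if values else 0.0
    information = 1.0 if len(set(present)) > 1 else 0.0
    if not present:
        format_match = 0.0
    elif pattern is None:
        format_match = 1.0  # no expected format, so nothing to penalize
    else:
        rx = re.compile(pattern)
        hits = sum(1 for v in present if rx.fullmatch(str(v)))
        format_match = hits / len(present)
    age_days = (today - last_updated).days
    freshness = min(1.0, max(0.0, 1.0 - age_days / 365))  # assumed 1-year window
    return ColumnProfile(completeness, information, format_match, freshness)


def quality_score(profile, weights=(0.25, 0.25, 0.25, 0.25)):
    """Collapse the profile into a 1-5 score via a weighted average (assumed weights)."""
    parts = (profile.completeness, profile.information,
             profile.format_match, profile.freshness)
    composite = sum(w * p for w, p in zip(weights, parts)) / sum(weights)
    return 1 + round(4 * composite)


phones = ["555-12-34", "555-98-76", None, "bad"]
p = profile_column(phones, date(2024, 1, 1), date(2024, 1, 31), r"\d{3}-\d{2}-\d{2}")
print(quality_score(p))  # prints 4
```

A completely empty column that has not been touched for over a year comes out as 1 (“we don’t know”). The weights are the obvious place to tune this per company.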


As soon as we install Connect, we'll have a big-picture view of our data, and we'll even be able to track the status of all our distributed data assets with a trend line showing whether we are getting better or worse. Here is an example:







Of course, we can also have Data Quality Scores for


  • specific data sources (e.g. ORACLE EXADATA, CRM Teradata, etc.)
  • LOB data sets such as CRM, CEM, and risk management
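
Rolling column scores up to a data source or LOB could be as simple as a weighted average, where the weight stands in for how often a field is used once lineage is available. A small Python sketch, assuming each column already has a 1 to 5 score; the function name `source_scores`, the tuple layout and the sample numbers are made up for illustration:

```python
from collections import defaultdict


def source_scores(columns):
    """columns: iterable of (source, score, weight) -> {source: weighted mean score}."""
    totals = defaultdict(lambda: [0.0, 0.0])  # source -> [weighted sum, weight sum]
    for source, score, weight in columns:
        totals[source][0] += score * weight
        totals[source][1] += weight
    return {s: round(num / den, 2) for s, (num, den) in totals.items() if den > 0}


# Hypothetical column scores; weight = assumed usage count from lineage.
sample = [
    ("CRM Teradata", 4, 3.0),
    ("CRM Teradata", 2, 1.0),
    ("ORACLE EXADATA", 5, 1.0),
]
result = source_scores(sample)
print(result)  # {'CRM Teradata': 3.5, 'ORACLE EXADATA': 5.0}
```

Before any lineage exists, every weight can simply be 1.0, which reduces this to a plain average per source.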

Do you think this will be useful? @AshleyK