This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
One of the biggest areas of time spent is in basic data cleaning for raw data - this can be dramatically simplified by taking a hint from the large ETL / Master data Management vendors and making this core Alteryx.
- Allow the users of the server & connect product to define their own Business Types (what Microsoft DQS calls "Domains")
- Example may be a currency code - there are many different synonyms, but in essence you want your data all cleaned back to one master list
- Then allow for different attributes to be added to these business types
- Currency code would have 2 or 3 additional columns: Currency name; Symbol; Country of issue
- Similar to Microsoft DQS - allow users to specify synonyms and cleanup rules. For example - Rupes should be Rupees and should be translated to INR
- You also need cross business type rules - if the country is AUS then $ translates to AUD not to USD.
- These rules are maintained by the Data Steward responsibility for this Business Type.
- This master data needs to be stored and queryable as a slowly changing dimension (preferrably split into a latest & history table with the same ID per entry; and timestamps and user audit details for changes)
- When you get a raw data set - user can then tag some fields as being one of these business types
- Example: I have a field bal_cur (Balance Currency) - I tag this as Business Type "Currency"
- Then Alteryx automatically checks the data; and applies my cleanup rules which were defined on the server
- For any invalid entries - it marks these as an error in the canvas; and also adds them to a workflow for the data steward for this Business Type on the server - value is set to an "unmapped" value. (ID=-1; all text columns set to "unmapped")
- For any valid entries - it gives you the option to add which normalised (conformed) columns you want - currency code; description; ID; symbol; country of issue
Data Steward Workflow:
- The data steward is notified that there is an invalid value to be checked
- They can either mark this as a valid value (in which case this will be added to the knowledge base for this business type) or a synonym of some other valid value; or an invalid value
Cleanup Audit & Logs:
- In order to drive upstream data cleaning over time - we would need to be able to query and report on data cleanups done by source; by canvas; by user; by business type; and by date - to report back to the source system so that upstream data errors can be fixed at source.