Be sure to review our Idea Submission Guidelines for more information!
Submission GuidelinesOne of the biggest areas of time spent is in basic data cleaning for raw data - this can be dramatically simplified by taking a hint from the large ETL / Master data Management vendors and making this core Alteryx.
Server Side
- Allow the users of the server & connect product to define their own Business Types (what Microsoft DQS calls "Domains")
- Example may be a currency code - there are many different synonyms, but in essence you want your data all cleaned back to one master list
- Then allow for different attributes to be added to these business types
- Currency code would have 2 or 3 additional columns: Currency name; Symbol; Country of issue
- Similar to Microsoft DQS - allow users to specify synonyms and cleanup rules. For example - Rupes should be Rupees and should be translated to INR
- You also need cross business type rules - if the country is AUS then $ translates to AUD not to USD.
- These rules are maintained by the Data Steward responsibility for this Business Type.
- This master data needs to be stored and queryable as a slowly changing dimension (preferrably split into a latest & history table with the same ID per entry; and timestamps and user audit details for changes)
Alteryx Designer:
- When you get a raw data set - user can then tag some fields as being one of these business types
- Example: I have a field bal_cur (Balance Currency) - I tag this as Business Type "Currency"
- Then Alteryx automatically checks the data; and applies my cleanup rules which were defined on the server
- For any invalid entries - it marks these as an error in the canvas; and also adds them to a workflow for the data steward for this Business Type on the server - value is set to an "unmapped" value. (ID=-1; all text columns set to "unmapped")
- For any valid entries - it gives you the option to add which normalised (conformed) columns you want - currency code; description; ID; symbol; country of issue
Data Steward Workflow:
- The data steward is notified that there is an invalid value to be checked
- They can either mark this as a valid value (in which case this will be added to the knowledge base for this business type) or a synonym of some other valid value; or an invalid value
Cleanup Audit & Logs:
- In order to drive upstream data cleaning over time - we would need to be able to query and report on data cleanups done by source; by canvas; by user; by business type; and by date - to report back to the source system so that upstream data errors can be fixed at source.
Many thanks
Sean
You must be a registered user to add a comment. If you've already registered, sign in. Otherwise, register and sign in.