Connect currently provides lineage for a data source at the table level, but we would like lineage at a more granular level, i.e. the column level. That way, if a specific column in the table changes (e.g. a data type change), we are notified of the change and can quickly identify the assets that will be affected.
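To make the request concrete, here is a minimal Python sketch of what column-level impact analysis could look like. It is not Connect's API: the schema snapshots, the `column_lineage` mapping, the table and column names, and both function names are hypothetical and exist only for illustration.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical schema snapshot: column name -> data type.
Schema = Dict[str, str]
# (old type, new type); None means the column was added or dropped.
Change = Tuple[Optional[str], Optional[str]]


def changed_columns(old: Schema, new: Schema) -> Dict[str, Change]:
    """Return every column whose type differs between two snapshots."""
    changes: Dict[str, Change] = {}
    for column in sorted(set(old) | set(new)):
        before, after = old.get(column), new.get(column)
        if before != after:
            changes[column] = (before, after)
    return changes


def affected_assets(
    table: str,
    changes: Dict[str, Change],
    column_lineage: Dict[Tuple[str, str], Set[str]],
) -> Dict[str, List[str]]:
    """Map each changed column to the downstream assets that read it."""
    return {
        column: sorted(column_lineage.get((table, column), set()))
        for column in changes
    }


# Illustrative data only (assumed names, not taken from a real catalog).
old_schema = {"client_id": "INT", "client_name": "VARCHAR(100)", "region": "VARCHAR(20)"}
new_schema = {"client_id": "BIGINT", "client_name": "VARCHAR(100)", "region": "VARCHAR(20)"}
column_lineage = {
    ("clients", "client_id"): {"Sales workflow", "Finance report"},
    ("clients", "region"): {"Regional dashboard"},
}

changes = changed_columns(old_schema, new_schema)
impact = affected_assets("clients", changes, column_lineage)
```

With table-level lineage, the same change would flag all three downstream assets. With column-level lineage, only the assets that read `client_id` are flagged, and the regional dashboard is left alone.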
In a large environment, especially an analytical environment, copies of data will often appear in multiple places. An example of this is where a shared dimension or a shared piece of reference data is copied into multiple different data marts.
To manage this, we need to be able to mark these as copies of each other, so that we can point people to the golden source and so that we don't need to document the same asset multiple times.
- Client List appears in the data lake, on the Sales data mart, on the Finance data mart, etc.
- We would want to group all of these together, mark the data lake version as the master, and mark all the others as copies.
When I navigate to any Sales asset, it tells me that the Client List is a data asset which is used.
When I click on this, it tells me that the Sales version is a copy and directs me to the one on the data lake.
NOTE: There are circumstances where a copy may be deliberately filtered or incomplete (for example, regional subsets of clients). In this case the relationship needs to be "Partial Copy", not "Copy".
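The grouping described above could be modelled roughly as follows. This is only a sketch of the requested behaviour, not an existing Connect feature: the `CopyGroup` class, the `CopyType` enum, and the "EMEA data mart" location are hypothetical names introduced for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict


class CopyType(Enum):
    COPY = "Copy"                   # full copy of the master
    PARTIAL_COPY = "Partial Copy"   # deliberately filtered or incomplete


@dataclass
class CopyGroup:
    """One logical asset: a single master plus its known copies."""
    name: str
    master: str  # location of the golden source
    copies: Dict[str, CopyType] = field(default_factory=dict)

    def add_copy(self, location: str, copy_type: CopyType = CopyType.COPY) -> None:
        if location == self.master:
            raise ValueError("the master cannot also be registered as a copy")
        self.copies[location] = copy_type

    def describe(self, location: str) -> str:
        """Say what the asset at `location` is and where the master lives."""
        if location == self.master:
            return f"{self.name} on {location} is the master."
        copy_type = self.copies[location]  # KeyError if the location is unknown
        return (
            f"{self.name} on {location} is a {copy_type.value} "
            f"of the master on {self.master}."
        )


group = CopyGroup(name="Client List", master="Data Lake")
group.add_copy("Sales data mart")
group.add_copy("Finance data mart")
# Assumed regional subset, to show the "Partial Copy" relationship.
group.add_copy("EMEA data mart", CopyType.PARTIAL_COPY)

print(group.describe("Sales data mart"))
print(group.describe("EMEA data mart"))
```

Keeping the documentation on the master and storing only a relationship type per copy means the asset is described once, while each copy still states whether it is complete or partial.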