In a large environment, especially an analytical one, copies of the same data will often appear in multiple places. For example, a shared dimension or a shared piece of reference data may be copied into several different data marts.
To manage this, we need to be able to mark these assets as copies of each other, so that we can point people to the golden source and so that we don't need to document the same asset multiple times.
Example:
- Client List appears in the data lake, on the Sales data mart, on the Finance data mart, etc.
- We would want to group all of these together, mark the Data Lake version as the master, and mark all the others as copies.
User experience:
- When I navigate to any Sales assets, it tells me that the Client List is one of the data assets used.
- When I click on this, it tells me that the Sales version is a copy and directs me to the master on the data lake.
NOTE: There are circumstances where a copy may be deliberately filtered or incomplete (for example, regional subsets of clients). In this case the relationship needs to be "Partial Copy", not "Copy".
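To make the request concrete, here is a rough sketch of how the grouping could be modelled. It is only an illustration, not a proposed implementation: the names `CopyType`, `AssetInstance`, `AssetGroup` and the "EMEA data mart" location are all hypothetical and do not refer to anything that exists in the product today.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from enum import Enum


class CopyType(Enum):
    """Relationship of one instance to its group (hypothetical names)."""
    MASTER = "Master"
    COPY = "Copy"
    PARTIAL_COPY = "Partial Copy"  # deliberately filtered or incomplete


@dataclass(frozen=True)
class AssetInstance:
    name: str        # logical asset name, e.g. "Client List"
    location: str    # where this instance lives, e.g. "Sales data mart"
    copy_type: CopyType


@dataclass
class AssetGroup:
    """All instances of one logical data asset, documented once."""
    name: str
    instances: list[AssetInstance] = field(default_factory=list)

    def add(self, location: str, copy_type: CopyType) -> AssetInstance:
        # Assumption: a group has exactly one master (the golden source).
        if copy_type is CopyType.MASTER and any(
            i.copy_type is CopyType.MASTER for i in self.instances
        ):
            raise ValueError(f"{self.name} already has a master")
        instance = AssetInstance(self.name, location, copy_type)
        self.instances.append(instance)
        return instance

    def master(self) -> AssetInstance:
        for instance in self.instances:
            if instance.copy_type is CopyType.MASTER:
                return instance
        raise LookupError(f"{self.name} has no master")

    def describe(self, location: str) -> str:
        """Text a user might see when opening the instance at `location`."""
        for instance in self.instances:
            if instance.location == location:
                break
        else:
            raise LookupError(f"{self.name} not found on {location}")
        if instance.copy_type is CopyType.MASTER:
            return f"{self.name} on {location} is the master."
        kind = instance.copy_type.value.lower()
        return (
            f"{self.name} on {location} is a {kind} "
            f"of the master on {self.master().location}."
        )


group = AssetGroup("Client List")
group.add("Data Lake", CopyType.MASTER)
group.add("Sales data mart", CopyType.COPY)
group.add("Finance data mart", CopyType.COPY)
group.add("EMEA data mart", CopyType.PARTIAL_COPY)  # hypothetical regional subset

print(group.describe("Sales data mart"))
```

In this sketch the documentation would hang off the group rather than off each instance, and every copy or partial copy resolves back to the single master.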
CC: @DavidM @Arianna_Fuller