Since Hive as technology doesn't support to get metadata tags you need to take example from different loader and base on that customize your Hive loader.
Nice example would be Tableau Loader, see container "Union all tags"
Parent_XID is Connect XID (unique indentification of asset in Connect on which label is being applied, such ID you can find on Asset URL in Connect), and TagValue is name of the tag.
Applying similar logic into different loader should work for you.
I have added logic to add tags to connect entities in the Hive metadata loader with reference to Tableau loader. In the Hive loader, got the value of TagValue and Load_Code. Can you please help me to find out the value of PARENT_XID? I have checked Asset URL in Connect but how to use that value into hive loader workflow?
hi @mohitjoshi ,
PARENT_XID is the unique id of the object you would like to add tag to.
In Tableau it's easy since we are using their UUIDs so it's coming with the record so you can use it.
In Hive (and other database objects) you don't have such persistent ID, so we are using FULLY QUALIFFIED object name in following format: [CATALOG_ENGINE]+"/"+[CATALOG_ENVIRONMENT]+"/"+[CATALOG_NAME]+"/"+[SCHEMA_NAME]+"/"+[TABLE_NAME]
and doing base64 encoding.
You can use fragment from our different loader to do so in your Hive loader. Pls go to "Azure Data Catalog Loader" and see the section Tables & files tags, there is formula tool which takes fully qualiffied object name and passes it into macro "Create XID", you should use the same inside your Hive loader.
After testing, to load the tag into Connect, we got to know that the LOADED_TAGS table works on truncate and add basis. It deletes all the existing tags and loads of new tags. Can we change the behavior of the table to work as an append basis?
hi @mohitjoshi ,
this is exactly how it works we are truncating LOADED_TAGS table for one specific source, but keeping other sources there.
Imagine you are loading TAGS from three different systems (S3, Tableau and Hive) each system will be uniquely identified (LOAD_CODE column). So once you are loading Tableau for the second time you are removing all previously loaded tags from Tableau only (identified by LOAD_CODE) and loading currently existing tags on Tableau. In this case you are able to remove old tags which are not existing anymore on Tableau.
Then inside Connect we are identifying only changes in all LOADED_TAGS and once there is needed update : tag was created on any of source systems so we are creating new tag in Connect, or has been removed and no longer exists on any ofthe source system we are removing it from Connect).
This is how such LOADED_TAGS table looks like once you have more source systems loaded:
Hope this helps,