Hi there,
I am using the Alteryx loader as a base for a custom loader.
I am trying to make a relationship to a custom file I created in Connect, but I am not able to achieve this. See the process below:
1. Created a file named 'Kafka_Topics' in Connect at the folder path Data Sources/files/kafka/; it is formatted as a file, as is its parent folder.
2. Using the AYX_WF_FS staging table in my workflow, I am inputting the row below (see the sketch after this list for how I mocked it up):
USAGE_TYPE: 2
SYSTEM_NAME: Files
SYSTEM_ENVIRONMENT: Kafka
NAME: Kafka_Topics
PATH: files/kafka/kafka_topics
PARENT: kafka
PATH_HASH: (blank)
TABLE_NAME: (blank)
TABLE_PATH_HASH: (blank)
TYPE: F
WORKFLOW_ID: DATABRICKS_2598393148941662
LOAD_CODE: DATABRICKS CUSTOM LOADER
I have also tried a few different permutations, but I am not really sure which should work.
3. When I check the workflow that should be outputting to this file, the details from the table above are shown, but no relationship is made.
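For reference, here is how I mocked up that staging row before writing it into AYX_WF_FS — a minimal Python sketch, not the loader's own code; the blank columns and my reading of USAGE_TYPE = 2 as "output/write" are guesses on my part:

```python
import csv

# Mock-up of the AYX_WF_FS staging row I am trying to load. The column
# names come from the staging table itself; anything marked "guess" is mine.
row = {
    "USAGE_TYPE": 2,                       # guess: 2 = output/write usage
    "SYSTEM_NAME": "Files",
    "SYSTEM_ENVIRONMENT": "Kafka",
    "NAME": "Kafka_Topics",
    "PATH": "files/kafka/kafka_topics",
    "PARENT": "kafka",
    "PATH_HASH": "",                       # left blank -- unsure how Connect derives it
    "TABLE_NAME": "",
    "TABLE_PATH_HASH": "",
    "TYPE": "F",                           # F = file
    "WORKFLOW_ID": "DATABRICKS_2598393148941662",
    "LOAD_CODE": "DATABRICKS CUSTOM LOADER",
}

# Write it out as a one-row CSV so it can be fed into the staging table.
with open("ayx_wf_fs_row.csv", "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=list(row))
    writer.writeheader()
    writer.writerow(row)
```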
My queries are:
1. Is this approach feasible, or is there a better way?
2. If it is feasible, what values should I be inputting into the AYX_WF_FS staging table to get the connection made?
Any ideas or suggestions would be appreciated,
Cheers,
Harry
Hi @HarryM123 , out of curiosity - is it possible that you are trying to load metadata from Databricks? If not, could you share which data source you are building the custom loader for? Maybe we have a better option to start with than the AYX loader.
Thx
Hey @VojtechT , I am using the Databricks REST API as an input. All notebooks are downloaded and parsed to isolate the read/write commands. I then join these commands back to the other notebooks to find the paths and names of the objects each notebook reads from or writes to.
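To give a sense of the parsing step, the logic is roughly equivalent to the following Python — a simplified sketch, not my actual Alteryx tooling; the two regexes only cover a couple of common Spark call shapes and will miss others:

```python
import re

# Simplified patterns for common Spark read/write calls in notebook source.
# These only handle a few call shapes (spark.read...load()/table(),
# .write...save()/saveAsTable()/insertInto()) and will miss others.
READ_PATTERN = re.compile(r"""(?:spark\.read[\w.()'" ]*?\.(?:load|table)|spark\.table)\(\s*['"]([^'"]+)['"]""")
WRITE_PATTERN = re.compile(r"""\.write[\w.()'" ]*?\.(?:save|saveAsTable|insertInto)\(\s*['"]([^'"]+)['"]""")

def extract_io(notebook_source: str):
    """Return (reads, writes) -- the object paths/names a notebook touches."""
    reads = READ_PATTERN.findall(notebook_source)
    writes = WRITE_PATTERN.findall(notebook_source)
    return reads, writes

source = 'df = spark.read.format("delta").load("/mnt/raw/events")\n' \
         'df.write.mode("overwrite").saveAsTable("analytics.events_clean")'
print(extract_io(source))  # (['/mnt/raw/events'], ['analytics.events_clean'])
```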
The REST API is first queried for directory paths using an iterative macro; these directories are then fed into another API call, which downloads each notebook's text in a subsequent iterative macro. The result is a complete download of all notebooks in the specified directory.
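Outside Alteryx, that two-step walk looks roughly like this — a sketch of what the iterative macros do, assuming the standard workspace list/export endpoints and a host/token pair supplied via environment variables (both placeholders here):

```python
import base64
import os
import requests

# Sketch of the two iterative macros: walk the workspace tree via
# /api/2.0/workspace/list, then pull each notebook's source via
# /api/2.0/workspace/export. HOST and the token are placeholders.
HOST = os.environ["DATABRICKS_HOST"]  # e.g. https://adb-....azuredatabricks.net
HEADERS = {"Authorization": f"Bearer {os.environ['DATABRICKS_TOKEN']}"}

def download_notebooks(path="/"):
    """Yield (notebook_path, source_text) for every notebook under `path`."""
    resp = requests.get(f"{HOST}/api/2.0/workspace/list",
                        headers=HEADERS, params={"path": path})
    resp.raise_for_status()
    for obj in resp.json().get("objects", []):
        if obj["object_type"] == "DIRECTORY":
            # Recurse into subdirectories, like the first iterative macro.
            yield from download_notebooks(obj["path"])
        elif obj["object_type"] == "NOTEBOOK":
            exp = requests.get(f"{HOST}/api/2.0/workspace/export",
                               headers=HEADERS,
                               params={"path": obj["path"], "format": "SOURCE"})
            exp.raise_for_status()
            yield obj["path"], base64.b64decode(exp.json()["content"]).decode("utf-8")

for nb_path, text in download_notebooks("/Shared"):
    print(nb_path, len(text))
```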
This seems to work well for the time being. I have successfully created relationships from these to other data sources in the Database/Database Servers locations; it is just when trying to create relationships to objects stored in the Files location that I run into the above issue.
I thought that the Alteryx loader would provide the best functionality for this use case, as I am able to create different subsections for each category of notebooks. If there is a more suitable loader/method, I would be keen to hear your thoughts on it!
Cheers,
Harry
@HarryM123 , we actually have a Beta version of a Databricks metadata loader already; we are just looking for someone with a real instance who could verify that it also works in a "real environment", not just our artificial one, and confirm that all the main use cases are covered. However, the loader requires a new version of Connect as well. Would you be willing to give it a shot and install a Beta Connect and run the Beta Databricks loader?
@VojtechT , I may be able to help with this. As my current Databricks instance is in a client's production environment, I would not be able to do this myself, but I can send it to a colleague who could test it on their server instance. It would be interesting to see the approach taken and how it pulls the data through to Connect.
As I would not be able to deploy it on my client's work, do you think the approach I have taken above would give me the desired results (linking my Databricks workflow to a file source)?
Is there something I am doing wrong in the configuration that is preventing the relationship? Do you know what I should be inputting into the AYX_WF_FS staging table to get the connection made?
I have successfully loaded all metadata from Azure Databricks into Alteryx Connect in our client's enterprise production environment with the Beta Databricks metadata loader. However, while trying to load metadata from our 2nd Azure Databricks site, the loader will not load the metadata into Alteryx Connect. After a few minutes, the Analytic App disappears from the Designer canvas without error. The Databricks cluster starts, but the Analytic App from the Databricks loader errors out before any metadata is downloaded. Does Alteryx Connect 2022.1 support loading data from multiple Azure Databricks sites?
UPDATE: I could finally load the 2nd Databricks site's data into Alteryx Connect with the 2022.1 loader by using the entire URL of the hive_metastore catalog in Azure Databricks within the debugged workflow. The debug failed due to the "fullpath" of the Azure Databricks catalog, azuredatabricks.net/explore/data/hive_metastore, being necessary; it would not work within the Analytic App. I saved the workflow, published it to Gallery, and the metadata now harvests on its schedule outside ConnectScheduler.
This is odd because the first Databricks site's data loaded with only the DSN. Thanks for the messages about this!