

Harvester / Scanner is not registering data sets


We are seeing some strange behaviour: the scanner / harvester registers the data sources, tables and databases used across our Alteryx canvases (you can see this when you look at the detail of a canvas), but it doesn't create these entries in the "DataSets" section of Connect.


Our expected behaviour is:

- We scan an Alteryx canvas which uses 1 file and 1 database connection

- We should then be able to see 1 Alteryx canvas under its own section in Connect, and we should see 1 database and 1 file under the Datasets section.


However, what actually happens is that the canvas is scanned and the data sets are correctly identified on the canvas details page, but the entries are NOT created under the Datasets section.


This may be something that can be configured in the scanners - but it would be good for these scanners to work this way by default.



cc:  @Svetlana @AshwiniChezhiyan


@SeanAdams Sorry, I just want to clarify this. Are you scanning the workflow AND scanning the location of the data set, and the data set is still not showing up?




Hey @Treyson

When we scan the workflow and then look at it in Connect, you can see that the workflow knows the databases and tables that it's hitting.

However, these are not added to the DataSets section of Connect until you then scan the database itself.


This seems to be very inefficient because the Tableau workbook and the Alteryx Canvas both know their databases - and you can see this on the asset when you have completed a scan of just the workbook / canvas - but we can't seem to get these to add automatically to the datasets section.


Let me know if I need to mock this up with screenshots.


Hey @SeanAdams,

This is how the current architecture works with harvested objects:

When you harvest (extract metadata) from the Gallery, you get a list of workflows, and for each workflow you get details such as the data sources it uses. A workflow has many ways to address a database table (by DNS alias, by full database name, with or without the schema name, etc.), so if we created "DataSets" inside the Datasource folder directly from workflows, we would potentially end up with many duplicates and inconsistent naming conventions. Also, the "exploration" feature would then show only the objects used in workflows, not the full object list from that data source. So in our architecture we just keep the information that a workflow uses the table (along with other identification such as technology type, server name ...), and we try to match it algorithmically to an already existing object (one harvested by the technology-specific harvester, e.g. the Oracle loader).
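
To illustrate the matching idea, here is a rough sketch in Python. It is not the actual Connect code: the `TableRef` shape, the `aliases` map, the `catalog` dictionary and the normalization rules are all assumptions made up for this example. It only shows why a reference found on a workflow is matched against an object that a technology-specific harvester has already loaded, instead of being created as a new data set.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TableRef:
    """A table reference as found on a harvested workflow (hypothetical shape)."""
    technology: str  # e.g. "oracle"
    server: str      # server name or alias, as written in the workflow
    table: str       # may be schema-qualified, e.g. "SALES.ORDERS"


def normalize(ref: TableRef, aliases: dict) -> tuple:
    """Reduce a reference to a comparable key: (technology, server, bare table name)."""
    server = ref.server.strip().lower()
    server = aliases.get(server, server)  # resolve a known alias to the canonical name
    # Assumption for this sketch: drop any schema/database prefix. A real matcher
    # would need to be more careful, since this can merge same-named tables.
    table = ref.table.strip().lower().split(".")[-1]
    return (ref.technology.strip().lower(), server, table)


def match(ref: TableRef, catalog: dict, aliases: dict) -> Optional[str]:
    """Return the id of an already-harvested object matching ref, or None.

    Nothing is created when there is no match; the workflow simply keeps
    its own record of the table it uses.
    """
    return catalog.get(normalize(ref, aliases))
```

With a catalog that contains one table loaded by a database harvester, `match` would resolve differently written references (alias vs. full server name, with or without schema) to that same object, and return `None` for a table whose database has not been harvested yet. That second case corresponds to what you are seeing: the workflow shows the table, but nothing appears under DataSets until the database itself is scanned.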