Connect currently provides lineage for a data source at the table level, but we would like to have it at a more granular level, i.e. the column level. If a specific column in a table changes (e.g. a data type change), we would then be notified of the change and could quickly identify the assets that will be affected.
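To make the request concrete, here is a minimal sketch of what column-level change detection could look like. It is not how Connect works today; the schema snapshots, the `column_usage` mapping and both function names are hypothetical and only illustrate the idea.

```python
from typing import Dict, List

# Hypothetical schema snapshot: column name -> data type.
Schema = Dict[str, str]


def diff_columns(old: Schema, new: Schema) -> Dict[str, str]:
    """Describe every column that was added, removed, or changed type."""
    changes: Dict[str, str] = {}
    for col in sorted(set(old) | set(new)):
        if col not in new:
            changes[col] = "removed"
        elif col not in old:
            changes[col] = "added"
        elif old[col] != new[col]:
            changes[col] = f"type changed: {old[col]} -> {new[col]}"
    return changes


def affected_assets(changes: Dict[str, str],
                    column_usage: Dict[str, List[str]]) -> List[str]:
    """Return the assets that use a removed or retyped column.

    column_usage is an assumed column-level lineage map:
    column name -> names of assets (workflows, reports) that read it.
    """
    hit = set()
    for col, change in changes.items():
        if change != "added":  # a brand-new column cannot break anything yet
            hit.update(column_usage.get(col, []))
    return sorted(hit)


# Made-up example data for illustration only.
old_schema = {"customer_id": "INT", "signup_date": "VARCHAR"}
new_schema = {"customer_id": "INT", "signup_date": "DATE", "region": "VARCHAR"}
usage = {
    "customer_id": ["Churn workflow"],
    "signup_date": ["Churn workflow", "Signup report"],
}

changes = diff_columns(old_schema, new_schema)
to_notify = affected_assets(changes, usage)
```

With table-level lineage only, every asset reading the table would have to be checked; with a column-level map, only the assets that actually read `signup_date` are flagged.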
Connect applies a standard set of weightings to different categories of information (people, terms etc.) when returning search results. When combined with likes/dislikes, these determine the order in which results are returned - details below:
Alteryx Connect uses the following scoring parameters for the Lucene engine:
Likes and Dislikes, using the following formula: (Number of Likes) / (Number of Likes + Number of Dislikes).
Certified assets: +1.2
Report sheet: +1.8
Alteryx workflow: +1.5
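The parameters above can be sketched in a few lines of Python. This is only an illustration: how Connect actually combines the boosts with the Lucene relevance score is not stated here, so summing them is an assumption, as are the dictionary keys, the function names, and the handling of assets with no votes.

```python
from typing import Dict, Iterable, Optional

# Boost values as quoted in this post; the key names are made up for the sketch.
DEFAULT_BOOSTS: Dict[str, float] = {
    "certified": 1.2,
    "report_sheet": 1.8,
    "alteryx_workflow": 1.5,
}


def like_ratio(likes: int, dislikes: int) -> Optional[float]:
    """(Number of Likes) / (Number of Likes + Number of Dislikes)."""
    total = likes + dislikes
    if total == 0:
        # The formula is undefined without votes; returning None is an assumption.
        return None
    return likes / total


def total_boost(categories: Iterable[str],
                boosts: Dict[str, float] = DEFAULT_BOOSTS) -> float:
    """Sum the boosts that apply to an asset (additive combination is assumed)."""
    return sum((boosts.get(c, 0.0) for c in categories), 0.0)
```

Passing a different `boosts` table to `total_boost` is the kind of control this idea asks for, e.g. `total_boost(["certified"], {"certified": 2.0})`.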
It would be useful to have control over this weighting, e.g. when you have large numbers of Person records being returned before Terms; but advice from Customer Support has been that these are not currently customisable. I'd like to request that this ability be considered for inclusion in a future release of Connect.
The email notifications that are received are not easily marked or flagged/prefixed as coming from Alteryx. I expect that this may be on purpose, so that data owners can't "hide" from changes/comments on their areas of expertise, but from a time-management point of view it would be good to be able to flag them in some way, so that users can run a rule on them in their inbox and batch up their reviews/answers.
If you remove the SMTP server configuration while you are in development (the current solution to spamming users with notifications during the setup period of the data catalog) and then put it back in when setup is finished, the changes that happened in the interim will not be sent as a bulk send. The same risk applies when an email server is unavailable for a period of time: notifications for changes/comments/tasks will be missed.
Once the ability to switch notifications on and off is implemented, I think it would be worth reviewing how feasible this is.
For the past 8 months I have been using Alteryx, mostly working with the Connect In-DB components. I am facing a number of issues that I think could be improved:
1. There is no way to create a table with keys defined, which is one of the most important pillars of a database. The output options are also limited, i.e. create a new table, delete and append, or drop the table and recreate it.
There are many times when we need to update tables based on their keys, and I find this missing. Also, when the option is set to create a new table, the next run of the job reports that the table already exists, so we have to change the option manually before the next run.
2. When switching between In-DB and Alteryx, if there are many records Alteryx lags completely and the job keeps running for hours. How can we get the flexibility of Alteryx Designer with such a bottleneck?
3. The flexibility provided by Alteryx Designer should also be available in the In-DB components.
4. Parameters defined in the workflow cannot be accessed in the In-DB Formula tools, although they can be used in the Designer Formula tools. This reduces flexibility.
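For point 1, the behaviour being asked for is roughly "create the table with a key if it does not exist yet, then insert or update by key". Below is a small sketch of that pattern outside Alteryx, using Python's built-in `sqlite3` module. The table and column names are made up, and the `ON CONFLICT ... DO UPDATE` clause assumes SQLite 3.24 or later; other databases use their own syntax (e.g. `MERGE`).

```python
import sqlite3


def ensure_table(conn: sqlite3.Connection) -> None:
    """Create the table with a primary key; safe to call on every run."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS customers (
               customer_id INTEGER PRIMARY KEY,
               city        TEXT NOT NULL
           )"""
    )


def upsert_customers(conn: sqlite3.Connection, rows) -> None:
    """Insert new keys and update the rows whose key already exists."""
    conn.executemany(
        """INSERT INTO customers (customer_id, city) VALUES (?, ?)
           ON CONFLICT(customer_id) DO UPDATE SET city = excluded.city""",
        rows,
    )
    conn.commit()


conn = sqlite3.connect(":memory:")

# First run: the table is created and two rows are loaded.
ensure_table(conn)
upsert_customers(conn, [(1, "Leeds"), (2, "York")])

# Second run: no "table already exists" error, key 2 is updated, key 3 is added.
ensure_table(conn)
upsert_customers(conn, [(2, "Hull"), (3, "Bath")])

result = conn.execute(
    "SELECT customer_id, city FROM customers ORDER BY customer_id"
).fetchall()
```

The same job can be run repeatedly without manually switching the output option between runs, which is the flexibility missing from the In-DB write options described above.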
When Alteryx Connect is first installed at a company with a small Alteryx Designer user base, you do not benefit from lineage.
There are not many workflows at hand. So, in order to realize Alteryx Connect's immediate benefits, I'd like to suggest:
a company-wide Data Quality Score.
Let's score each data element in the distributed data stores
and automatically assign it a simple score between one and five:
1 means "we don't know".
2 means the data was entered or last updated more than a year ago, or has conflicting data.
3 would be the norm and means the customer provided this data, as accurate and as up to date as they entered it, and 'agreed' to share it with you.
4 means "we cross-checked the data with 3rd-party sources, or the addresses work in Google Maps".
5 means "we had the customer or their representative validate the address in the last 3 months".
The scale will be based on:
Information value (whether the variance is high or not; a column with no variance carries no useful information)
How many times the column is referenced in other tables
Format (structured like a telephone number ###-##-## or semi structured like an address)
Whether it is an ID column
Whether it is a datetime column, and any discrepancies in datetime columns, etc.
Time since last update of data
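As a rough illustration, the one-to-five scale could be combined with two of the criteria above (information value and time since last update) like this. The function names, the 365-day and 90-day thresholds, the order of the checks, and the choice to score a column with no variance as 1 are all my own assumptions, not an existing Connect feature.

```python
from datetime import date
from typing import Optional, Sequence


def has_variance(values: Sequence) -> bool:
    """True if the column holds more than one distinct non-null value."""
    return len({v for v in values if v is not None}) > 1


def quality_score(values: Sequence,
                  last_updated: Optional[date],
                  today: date,
                  cross_checked: bool = False,
                  validated_on: Optional[date] = None) -> int:
    """Map a column to the proposed 1-5 scale (thresholds are assumptions)."""
    # 1: "we don't know" (no update date, or the column carries no information).
    if last_updated is None or not has_variance(values):
        return 1
    # 5: validated by the customer or a representative in the last 3 months.
    if validated_on is not None and (today - validated_on).days <= 90:
        return 5
    # 4: cross-checked against a 3rd-party source.
    if cross_checked:
        return 4
    # 2: entered or last updated more than a year ago.
    if (today - last_updated).days > 365:
        return 2
    # 3: the norm, data as provided by the customer.
    return 3
```

For example, `quality_score(["a", "b"], date(2022, 1, 1), date(2024, 6, 1))` falls into the "older than a year" bucket. Averaging the scores of all columns on each run would give the trend line for the whole data estate.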
Once we have some lineage information, we'll then weight the data based on how frequently it's needed, how many formulas require the field, etc.
And as soon as we install Connect we'll have a grand vision of our data, and we'll even be able to track the status of all our distributed data assets with a trend line showing whether we are getting better or worse... Here is an example;