I would like a component that analyses an incoming dataset and suggests a key for the data, i.e. detects which field or combination of fields would uniquely identify a record. The key could then be picked up by the output data component, which would add primary keys to tables when they are created. This would be great when using the drop and recreate option, i.e. the table would retain an index on the key.
Agree with your sentiment @spainn, but this does present some interesting problems - and there's a very easy way round this.
Easy way round: Add a record ID
- Computational cost: 1 run through the data. For 1M rows, that takes 1M updates, so this is O(n).
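As a rough illustration of that single pass, here is a minimal Python sketch. It assumes rows arrive as dictionaries; the function name `add_record_id` and the field name `RecordID` are made up for this example and are not part of any Alteryx API.

```python
from itertools import count


def add_record_id(rows, field="RecordID", start=1):
    """Yield a copy of each row with a sequential ID added (one pass, O(n))."""
    for record_id, row in zip(count(start), rows):
        # The new ID goes first; the original fields are copied unchanged.
        yield {field: record_id, **row}


# Two identical rows become distinguishable once the ID is attached.
rows = [
    {"name": "Ann", "city": "Leeds"},
    {"name": "Ann", "city": "Leeds"},
]
with_ids = list(add_record_id(rows))
```

Because the ID is generated rather than discovered, it is unique even when every other field is duplicated.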
Complexities of "identifyUniqueCombination":
- For keys of 1 field - relatively easy: just count the unique values in each field, and any field where every value is unique is a candidate. For 10M rows, this could require substantially more than 10M computations per column (because each value potentially needs to be compared to the ones seen previously, like a sort). So this would be O(n log n) for each column, and O(m * n * log n) for all m columns.
- For keys of 2 fields, you then need to additionally do the same for each combination of 2 fields, which is m * (m - 1) / 2 pairs.
- For keys of any size, you would need to look for unique combinations across every combination of the m fields. There are 2^m - 1 of those, so it would be O(2^m * n * log n). In other words - incredibly computationally painful.
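To make the combination search concrete, here is a brute-force sketch in Python. It is only an illustration under a few assumptions: rows are dictionaries with hashable values, the name `find_candidate_keys` and its `max_size` cap are invented for this example, and a hash set is swapped in for the sort-style comparison, so each uniqueness check is roughly O(n) rather than O(n log n). The number of combinations to test still grows very quickly with the number of fields.

```python
from itertools import combinations


def find_candidate_keys(rows, fields, max_size=2):
    """Return minimal field combinations (up to max_size fields) that are unique."""
    rows = list(rows)
    keys = []
    for size in range(1, max_size + 1):
        for combo in combinations(fields, size):
            # A superset of a known key is also unique, but not minimal: skip it.
            if any(set(key) <= set(combo) for key in keys):
                continue
            seen = set()
            for row in rows:
                value = tuple(row[f] for f in combo)
                if value in seen:
                    break  # duplicate found, so this combination is not a key
                seen.add(value)
            else:
                keys.append(combo)  # no duplicates in any row
    return keys


rows = [
    {"order": 1, "line": 1, "sku": "A"},
    {"order": 1, "line": 2, "sku": "B"},
    {"order": 2, "line": 1, "sku": "A"},
]
candidate_keys = find_candidate_keys(rows, ["order", "line", "sku"])
# No single field is unique here; ("order", "line") and ("order", "sku") both are.
```

Capping `max_size` at 1 or 2 keeps the search cheap and only returns keys small enough to be usable as a primary key.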
Computational summary: finding the smallest key is an NP-hard problem, in the same family as the classic Travelling Salesman problem. No known algorithm solves these in better than exponential time in the worst case (although nobody has proven that one cannot exist), so the search does not scale as the number of fields grows.
Practicality summary: there are cases where the only key that is guaranteed to be unique is the combination of all fields in the data - which does not make for a usable primary key. A good primary key should generally be 1 field, at most 2 (to allow database servers to make effective use of it in queries).
So - all in all, adding a unique row ID (or an IDENTITY column at table level) usually gives a better outcome, and is also incredibly computationally cheap - so it may be worth using this in your flow?
The status of this idea has been changed to 'Inactive'. This status indicates that:
1. The idea has not had activity in the form of likes or comments in over a year.
2. The idea has not reached ten likes.
3. The idea is still in the 'New Idea' status.
However, this doesn't mean your idea won't be implemented! The Community can still like and comment on this idea. With enough renewed interest, this idea can be brought back into the 'New Idea' status.
Thank you for contributing to the Alteryx Community and the Alteryx Product Idea Boards!