This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Virtually every client/project I work on, there is a nead to clean up data. Most of the time, that involved standardizing to some existing list of data. However, as we all know, data from differnet systems or being manually collected will not match perfectly in all cases. This is most often when I tend to use the Fuzzy Match tool.
However, I have to use a lot of weird steps to effectively create a "Fuzzy Join", which is something I've done using database functions in the past. I think it would be great if a new tool were created that would do the following:
Accept two inputs, one for the "raw" data and another for the "list" of data to match to.
Perform a fuzzy join based on similar functionality to the fuzzy match, convert data to metaphone keys and then run Jaro/Levenstein matches. By default, return only the highest matching result.
Expand the pre-process functionality to include words to exclude from the analysis (beyond just "and", "the" and "in").
Match on the whole string. No need to try and do joins based on partial words within a string.
This seems like a very common thing (I've created a macro for this anyway) that could be made to be simpler for everyday use.
Seems there is inconsistency across tool when it comes to being case sensitive.
One simple example might be - For Join it works based on case sensitivity, but for Formula (Multi Row) - it works based on case insensitivity.
The Idea is to make the functionality case sensitive for all tools.
1) When some code is written in Alteryx, it works same way when coming to case consistency, and does not differ tool-by-tool.
2) It is common practice to make use of functions (Uppercase/Lowercase) to ignore differences coming out of case sensitivity for items like joining with some standardized values (like List of Values etc), so it should not be much of an overhead. At times the case related differences might be something which needs to be treated differently.
3) If a tool ignores case sensitivity, there might not be an easy way (if not - no way available) to make it work for case sensitivity as per business requirement, but vice versa is attainable making use of Uppercase/Lowercase case conversion functions.
If you have a field length of say 10 in a Select Tool, then you use a Left Join tool and change that length to say 4. This turns that field red - as it should. Then add a Select tool after the Union. It should say 4 in the second Select tool. But instead it says 10. If it was changed to 10 (and it wasn't) then the field s/b red.
I am aware there are posts on this in the community and that there is a macro available to do this on the public gallery, however I think this is such important functionality that it should be incorporated into the main product. I want to be able to join 2 data sources by a date range. In SQL the code would look like this:
select ric.*,map.* from Staging.TicksHourlySummary ric LEFT OUTER JOIN Reference.ReutersInstrumentCodeMap map on ( ric.#RIC = map.ReutersInstrumentCode and ric.[Datetime] >= map.EffectiveFromDate and ric.[Datetime] < map.EffectiveToDate
While using the Join tool, I have ran across the following which I believe if included as part of Alteryx vanilla Join Tool version would be helpful -
1) Joining two data sets on Null values should be optional or should be removed. Generally Null means the value is not known so it seems like a logical error to treat two unknown values as same, unless specified otherwise.
2) Compress whitespaces, I have come across data sets coming from two entites which are all same except for the whitespace. So I think it would help to have an option wherein multiple whitespaces are compressed into one.
3) Case sensitivity/insensitivity - This is quite common for users to convert into upper case or lower case the columns on which Join condition are based. But IT developers end up coding more and at times creating new fields just for joining purpose.
4) Null matches non null - At times the requirement is such that if the join succeeds on a particular key column, null and not null values should be considered a match (but not two non equal non null values).
5) Removal of junk characters - There should be some functionality to remove junk characters from the columns on which joins are performed.
All/Any of the above points can be made available as an additional option in conjunction with the settings available today.
I'm curently creating an app using interface tools to control multiple worklflows. It would be nice if I didnt have to physically drag the interface tool to the recceiving node. For example, right now I can click on the Left node of a join tool and it gives me the option to make connections to that tool or out of that tool wireless. It would be nice if I could right click and have an option to select from a list of interface tool incoming connections.
Running into an issue when typing from keyboard to make a selection in some dropdown lists found in tool configurations. I've replicated it in the Join, Join Multiple, and Filter tools. (Sort tool worked as expected.) Running version 10.6.
Join tool configuration.
From "Left" dropdown start typing to select a column.
Click off the "Left" dropdown (into "Right" for example).
Notice that your selection in "Left" is maintained.
Now click on another tool or in the cavas.
Click back to the Join tool. Notice that the "Left" selection has reverted to the original field and your typed selection is lost.
Note: you can "commit" your selection by hitting "Enter" after typing, but I don't think this extra step should be required (as evidenced by the Sort tool).
This small mis-step can have SIGNIFICANT impact if a developer assumes their join field selection was set correctly, but in fact it changed without them noticing and is resulting in incorrect joins.
I propose an update to the Join Tool to allow users to select "Only right" or "only left" fields after a join. I like the options you have there currently (e.g., select highlighted fields), it would just be nice to add these options to it.
I came across the Find Replace Tool when I needed to find values from a column in one table in a column in another table. My first instance to solve the problem was to write a batch macro with a contains function in a formula followed by a not null filter (see attachment). This worked perfectly besides the fact that it was slow. Then I got excited when I discovered the Find Replace Tool accomplishes the same thing WAY faster, but I was wrong.
What I would love is the equivalent of an SQL query like this:
ON A.1 LIKE "%" || B.2 || "%"
which is a legal query in SQLite and is equal to the output of the attached macro. This is what I wish the Find Replace tool could do (Or a different tool), but it only finds one instance per "Find Within Field" value. The tools decision making doesn't line up with the decision-making that I need, for example it doesn't return the longest values found, instead the one with the first key to appear in the field. One way I've found to configure it better is to string a number of these together, that will give me a better result but still won't find every instance and uses 90 or so tools when I feel I should only need 1-3 to accomplish the same thing.
Instead of an Inner Join, the Find Replace is more like of Left Outer Join followed by a Unique() on A.1. Is there a way to accomplish this out-of-database in Alteryx?
When using the Levenshtein distance matching in the fuzzy match it gives you the option to select a match threshold % but does not allow for other options. Is it possible to provide these additional options in a future release?
I have long and large workflows, IMO, that are getting difficult to follow. I'd like the ability to highlight the joins and set specific colors or at the very least highlight and toggle on/off highlights. I'd also like to be able to move my joins and so they are not curving all over the canvas.
It would be great if the deselecting of fields in a select tool updated the output window(before next run) as a "review" to make sure you are removing what you expect and/or you can see other items left behind that should be removed. This would also be useful for seeing field names update as you organize and rename.
Often I join tables w/o pre-selecting the exact fields i want to pass and so I clean up at the end of the join. I know this is not the best way but a lot of times i need something downstream and have to basically walk through the whole process to move the data along.
The join tool right now does only inner joins, requiring the use of the union tool to emulate outer joins.
By itself it works fine, however when having workflows involving a large number of joins and unions with lots of data in it performance is severely impacted, leaving to situations where Alteryx Designer slows down massively even with extra CPU/RAM available.
Since the Geocoders create point data, having the spatial field recognized as a centroid can be helpful to the user. If it's marked as a centroid, it makes it easier for the user to differentiate between point vs polygon/line, especially if the user is already using polygon data in the module. If I do a Spatial Match or Join, I can have two fields called SpatialObj, which I have to track back to figure out which is which. If after combining, it showed Centroid for my geocoded points and SpatialObj for my polygon data. Kind of like you do for the Create Points tool, if I use that the field is called Centroid.
When you use a Find Replace tool, it would be GREAT if you had a select all option for what you want to append to the find data set.
It would be extra GREAT if you added the ability to draw a box over the ones you want to add, or have a feature where you could select the first on on the list and hold the shift+ctr to select multiple lines at once.