This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Have you ever used a Join tool with several (or many) Join fields, looked at the the L and R outputs and wondered, why didn't these records join? When there are many columns in your data, this can be a hard question to answer. It would be very handy if Alteryx could somehow report the Field(s) that each record failed to join on (perhaps as an optional added field to the L and R outputs).
I think it would be nice to be able to more easily reorder fields that you're joining by in the Join tool.
For example, I have already joined by CASS_Address and CASS_City. After I did this, I realized I wanted to go back and join on Name, too, and I want that to be first. How the tool is configured now, if I want Name to be first, I must redo all of the drop downs. I would like to be able to add Name to the next set of open drop downs then use some arrow buttons to be able to move them up in the order (similar to the Summarize tool).
I would like to be able to use the join tool to join on inequalities. We could join two tables, A and B on A.value is >= B.value1 AND A.value <= B.value2. This would replicate the "between" function in SQL. The equvalent feature in Tableau is pictured below.
Today, any Alteryx tool with "Select" functionality has an option for "Dynamic or Unknown Fields" which, when checked, allows any new fields to pass through that tool. This is a great function for most of the tools as you can allow workflow updates to pass through the tool without issue.
However, in the Join tool, there are some use cases where there is NEVER a reason to pass new fields from one side or the other into the tool, but you might still want new fields from a primary process. Examples being something like a lookup/cross-reference to do an inclusive join, where adding new fields to the lookup might inadvertently pass these downstream. Having the option to only allow unknown fields from one side through would greatly enhance this output.
Virtually every client/project I work on, there is a nead to clean up data. Most of the time, that involved standardizing to some existing list of data. However, as we all know, data from differnet systems or being manually collected will not match perfectly in all cases. This is most often when I tend to use the Fuzzy Match tool.
However, I have to use a lot of weird steps to effectively create a "Fuzzy Join", which is something I've done using database functions in the past. I think it would be great if a new tool were created that would do the following:
Accept two inputs, one for the "raw" data and another for the "list" of data to match to.
Perform a fuzzy join based on similar functionality to the fuzzy match, convert data to metaphone keys and then run Jaro/Levenstein matches. By default, return only the highest matching result.
Expand the pre-process functionality to include words to exclude from the analysis (beyond just "and", "the" and "in").
Match on the whole string. No need to try and do joins based on partial words within a string.
This seems like a very common thing (I've created a macro for this anyway) that could be made to be simpler for everyday use.
I am finding that I often need to use a union tool after a join to capture the entire data set I am trying to isolate. It isn't a huge deal but it does happen often enough that it seems worthwhile to consider more options to customize the join type. For example- it would be helpful to be able to configure the Join tool itself to specify a left inner or left outer join instead of needing to Union the L and J anchors afterward.
We build some pretty robust maps with multiple connections and it would be great to copy the map tool and paste it with all of the connections when we want to tweak the map slightly but keep our original map. It is a regular occurrence for us to have a very detailed map grouping by trade area name and then may want to have an overview map with all of the same connections but slightly different layout. Tracking down the connections, reconnecting them and naming them accordingly takes a substantial amount of time even in the most organized of workflows. This function would be a huge time-saver. It would also be of value with joins and unions - anywhere you have multiple streams coming in.
Back in the days of Visual Basic 6, and Delphi 3 - the development environment for software coding started introducing the idea of Optimization hints - specifically looking at things like variables you'd declared but never used.
All of the Alteryx tips guides say "as early as possible use a Select tool to remove fields that are not used in the workflow" - it would be very useful if the Alteryx system did a reverse walk through the workflow (from outputs back to inputs) and suggested fields that were not used anywhere or components that could / should be removed because they don't go anywhere.
Unused Components: if I have a filter with a summarize on both legs, and only one leg goes to an output - then the summarize on the other leg is just a waste of processor and memory.
Unused Fields: If I've been lazy and brought in all the fields in a spreadsheet and only used two of them through a complex and heavy workflow - then suggest that I only bring in 2; or that I add a select component for those two fields early on
Ineffective data types:
Numbers stored in a double that could be stored in an int16;
numbers stored in strings;
strings stored in very large containers which only need 10 characters;
dates stored in strings
Using a field for "True" and "False" instead of the Boolean True and False (1 and 0)
By adding a reverse-walk to a workflow (sort of like an A-B-Tree prune on a game-tree) - Alteryx could spot many of these kinds of issues - and by observing the workflow in action over a few iterations, it could identify the remainder.
When bringing data together it is often needed to assign a source to the data. Generally this happens when you union data and need to know things later about the data for context. It would save time to generate a source field that is assigned based upon the input connections of the union tool. Perhaps when unioning data you can assign a name to each input stream?
I've come to realize that the JOIN tool is case-sensitive by design but it would be helpful if you could turn that behavior on/off (via checkbox?) within the JOIN tool. For those of us that work predominantly in database environments that are not case-sensitive, this default behavior has caused me problems many times. Having to force the case to either upper or lower upstream of the JOIN on both flows in order to ensure a successful join is an extra step that would not be necessary if you could disable case-sensitive with a checkbox.
In other data programs like access or Toad you can put conditions on a join. You can choose if something is greater than, less than, left join, right join, etc. But with Alteryx you are only allowed to join a perfect match. It would be really great if you could add that functionality into 9.0
Seems there is inconsistency across tool when it comes to being case sensitive.
One simple example might be - For Join it works based on case sensitivity, but for Formula (Multi Row) - it works based on case insensitivity.
The Idea is to make the functionality case sensitive for all tools.
1) When some code is written in Alteryx, it works same way when coming to case consistency, and does not differ tool-by-tool.
2) It is common practice to make use of functions (Uppercase/Lowercase) to ignore differences coming out of case sensitivity for items like joining with some standardized values (like List of Values etc), so it should not be much of an overhead. At times the case related differences might be something which needs to be treated differently.
3) If a tool ignores case sensitivity, there might not be an easy way (if not - no way available) to make it work for case sensitivity as per business requirement, but vice versa is attainable making use of Uppercase/Lowercase case conversion functions.
I haven't been able to find a similar idea, but I have a scenario were I'm fully joining a dataset but I get a mismatch of column names.
When using the column renaming functionality in the Join tool, data sent through the Left or Right connector is not renamed. When I'm performing a Full Join the Union tool is adding these columns to the resulting table.
The renaming column functionality of the Join Tool should apply to all join types for consistency. My workaround has been to avoid renaming columns in the Join tool if aiming to perform a multi-join.
I am aware there are posts on this in the community and that there is a macro available to do this on the public gallery, however I think this is such important functionality that it should be incorporated into the main product. I want to be able to join 2 data sources by a date range. In SQL the code would look like this:
select ric.*,map.* from Staging.TicksHourlySummary ric LEFT OUTER JOIN Reference.ReutersInstrumentCodeMap map on ( ric.#RIC = map.ReutersInstrumentCode and ric.[Datetime] >= map.EffectiveFromDate and ric.[Datetime] < map.EffectiveToDate