Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.

Alteryx Designer Desktop Ideas

Share your Designer Desktop product ideas - we're listening!
Submitting an Idea?

Be sure to review our Idea Submission Guidelines for more information!

Submission Guidelines

Case insensitive join

Hey all,

 

The join tool currently does not allow case-insensitive joins, but the find/replace tool does.    Additionally- even if both sides are identical, the join tool will not join "Sean's house" to "Sean's house" because of the non-letter character in the middle.    Finally - if one side is a string(2), and the other is a vString(200) - even if you have a single identical character on both sides you get uncertain outcomes unless you force the type

 

Please could you consider amending the join tool to include 3 new options or capabilities:

- Case insensitive join

- Allow full Unicode character set in join

- Full match across text types (irrespective of string size) - this would allow a string(2) value to match to a string(100) value as long as the string(100) value only has the same 2 characters in it as the string(2) value

 

That would remove a load of work from every text-join that's being done on every canvas we do.

 

Thank you 

Sean

 

 

25 Comments
Rohit_Bajaj
9 - Comet

Hi,

 

Seems there is inconsistency across tool when it comes to being case sensitive.

 

One simple example might be - For Join it works based on case sensitivity, but for Formula (Multi Row) - it works based on case insensitivity.

 

The Idea is to make the functionality case sensitive for all tools.

 

Benefits -

 

1) When some code is written in Alteryx, it works same way when coming to case consistency, and does not differ tool-by-tool.

2) It is common practice to make use of functions (Uppercase/Lowercase) to ignore differences coming out of case sensitivity for items like joining with some standardized values (like List of Values etc), so it should not be much of an overhead. At times the case related differences might be something which needs to be treated differently.

3) If a tool ignores case sensitivity, there might not be an easy way (if not - no way available) to make it work for case sensitivity as per business requirement, but vice versa is attainable making use of Uppercase/Lowercase case conversion functions.

 

Thanks,

Rohit Bajaj

honytoad
5 - Atom

Hey Rohit,

 

I definitely agree that there should be consistency across tools and forumlae. It came as a surprise to me that the equals sign isn't actually case sensitive, since it's fairly common to use uppercase/lowercase functions to iron out differences in case as you say.

Failing consistency, then there needs to be more detailed documentation from Alteryx so that there aren't any nasty surprises.

 

I've included an example and workaround for checking the case of a string in this post.

 

Tony

Ned
Alteryx Alumni (Retired)

The case insensitive join is a valid request.  Right now to do that you need to place a formula tool upstream of both sides and LowerCase() both values.  I agree that it is a little cumbersome.

 

As far as the others issues you are reporting, I am not sure what you are seeing.  A string(2) and a V_WString(1000) will join if they have the same values.  The join will pay attention to the full unicode character set - if 2 strings are identical including all unicode code points.  "Sean's house" will happily join to "Sean's house".

 

I just tested all these to make sure.  If you have a workflow that shows otherwise, you should post it so we can investigate as a bug.

SeanAdams
17 - Castor
17 - Castor

Hi Ned,

 

Thank you for your reply - the fact that someone as senior as you pays attention to the forums despite your punishing responsibilities as CTO is very impressive.

 

I'll send you an example by PM where the string size seems to impact the join - not sure how to attach it to this communication thread (there doesn't seem to be an ability to attach here.

 

For the join across special characters - where we bumped into this is:

- download a file daily with company names (CSV)  - some contain ü or β

- we then write them into a dimension table in a database (MS SQL Server 12) - only if they don't already exist.    To do this we do a join to the data from this table and take only the non-joining rows and append them to the table

- However we get a lot of errors on a unique constraint on company name when doing this append because the rows with ü or β are already in the table but failed into join.

 

I've also tested this myself using two text-inputs and get a good join, but when doing this as listed above we get a join failure.     I guess this may be related to differences in the way that the CSV processor in Alteryx handles Unicode encoding, compared to the SQL server (or we may have written it into a non-Unicode field in the SQL server) so it may be worth checking this out across data-source types.

 

Thanks again Ned - will send the example to you by PM

Sean

Ned
Alteryx Alumni (Retired)

For anyone else finding this thread:  Sean & I have been communicating.  I have confirmed a bug in the RegEx tool in tokenize mode that caused the incorrect join. We will have it fixed by the next release.

BenG
Alteryx Alumni (Retired)
Status changed to: Coming Soon
 
SeanAdams
17 - Castor
17 - Castor

Hey Rohit ( @Rohit_Bajaj)

It looks like you and I have accidentally posted the same idea - the good news is that the CTO jumped on this and it's now marked as coming soon.

https://community.alteryx.com/t5/Alteryx-Product-Ideas/Case-insensitive-join/idi-p/54124

 

Not sure if that only relates to the defect that we found related to string length, or if it also relates to the case insensitive join, but glad that you and I are thinking about the same kind of problems.

 

I've starred your idea - it seems that the ideas with most stars get most attention :-)

Cheers

Sean

rho
6 - Meteoroid

This has been a big pain for us as well; we've been upper casing on both input streams. Glad to see the case sensitivity will be coming soon.

 

If I could add a couple more ideas: a very common join we do is an 'update' similar to SQL where the left input is updated by an external 'key' as the right input. This can be imagined as similar to a vlookup in excel. One of the issues we face is duplication in Keys (explained below). It would be handy if the join tool could toggle between the standard join functionality in SQL to the 'Update' functionality wherein rows are not duplicated in the left input if they repeat in the right.

 

For instance, when the case sensitivity is removed:

 

Left Input

FruitColour
AppleRed
BananaYellow
PearGreen


Joining with Right Input:

FruitOrigin
AppleFiji
BANANAColombia
BananaColombia
Pear

South Africa

 

Will result in an Inner Join of:

FruitColourRight_FruitOrigin
AppleRedAppleFiji
BananaYellowBANANAColombia
BananaYellowBananaColombia
PearGreenPearSouth Africa

 

Edit: <However what we need is similar to SQL where the Update Join would just give:

FruitColourRight_FruitOrigin
AppleRedAppleFiji
BananaYellowBANANAColombia
PearGreenPearSouth Africa

>

Another very handy add on in the join will be the option to fuzzy join since inexact matches are very common. The match threshold of 100 can be default and can resemble the existing join, but if we toggle this downwards we will be able to work with fuzzy joins. I've tried the fuzzy join macro and while it works well it doesn't have the same level of functionality as the base join tool.

 

We've been overcoming some of these issues by using multiple tools (for instance uppercasing/trimming/unique etc.) but the sheer number of joins we need to create makes this a tedious and often hard to audit process. 

 

Thanks

R

pgensler
8 - Asteroid

This is definitely a pain point for me, as I was running into issues trying to convert strings into all camel case as such for joins:

 

big top-dog

 

Using the cleanse macro in v11.0, as well as the Proper function both return the value as Big Top-dog, when I would expect Big Top-Dog Is this a bug due to the hyphen treating the second part of the string as its own string, or should it truly be making all fields Camel Case/Proper Case? Thanks.

quilemanolotete
5 - Atom

Hello Rohit, with regards to the case sensitive values, If I run a query using a specific case sensitive word, is Alteryx capable to differentiate and select the values anyways. For example if the word is "Bicycle" but the data contains the word "bicycle", is the query going to select the value regardless of the difference in letters? Thank you!!!