Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

How to remove repeating character from a string

biswajit86
7 - Meteor

I have test cases to mark which string is invalid or valid. I used below in formula tool 

IF contains([Seller City], " Blank") then "Seller City is Blank"
ELSEIF (REGEX_Match([Seller City],"[a-zA-Z\-\.\~]+"))
then "Seller Country Valid"
ELSE "Seller Country Invalid"
ENDIF

After using above formula, It is showing invalid for  last one ABC DER, The output file after using formula is attached.

Please help me if there is any function or any way to achieve this

Below is the expected result

Avalid
AAInvalid
AAAInvalid
A AAInvalid
A AA AAAInvalid
A AA AAA AAAAInvalid
A12AInvalid
1234Invalid
A~Invalid
ÃBValid
$#%ABCInvalid
-Invalid
(!@#$%^&*(?) –Invalid
A-B-CValid
A.B.CValid
A@BInvalid
A#BInvalid
A$BInvalid
A%BInvalid
A^BInvalid
@Invalid
#Invalid
$Invalid
%Invalid
^Invalid
 Blank
ABCDValid
ABC DERValid

 

Thanks

Biswajit

18 REPLIES 18
biswajit86
7 - Meteor

Thanks @mceleavey . Just brilliant   :)

Its working fine.

Whats the meaning of  "\u[B]|\u\u\u\u|\u\u\u\s\u+|\u.\u.\u")) ?

 

Where can I get these kind of learning knowledge. and Could you please suggest me?

 

Thanks

Biswajit

 

mceleavey
17 - Castor
17 - Castor

Hi @biswajit86 ,

 

it's not the most elegant of solutions, but it does the trick!

I'm not sure it will work in all cases as it is just built using the "Test Cases" tab data, so you might need to tweak it as other use cases come in.

Basically the meaning is as follows:

\u is any uppercase character (be sure to use the cleanse tool to change everything to uppercase), | is "or", \s is space, + means any amount, so if there are more than one for example.

 

So, this string is basically saying match anything that fits these formats.

 

A great place to start is the help documentation on the Regex tool which you can find here:

https://help.alteryx.com/2019.1/RegEx.htm

 

I hope this helps!

 

M.

 



Bulien

biswajit86
7 - Meteor

Hi @mceleavey ,

 

Need your help again.After test case when I ran for actual data with city name. All records almost came as invalid but it should be valid like below. When I applied the logic its coming as Invalid. Could you please check. Attaching the sheet (Test Cases Tab)for the expected result sheet.

 

SHANGHAIValid
NANTONGValid
GUANGZHOUValid
MEIZHOUValid
MELBOURNEValid
SYDNEYValid
COLLINGWOODValid
DAISY HILLValid
SYDNEYValid
MELBOURNEValid
HAWTHORNValid
PRAHRANValid

 

mceleavey
17 - Castor
17 - Castor

Hi @biswajit86 ,

 

this is because we were never given the real data and so it's very difficult to build without knowing what we're trying to match.

Can you provide a list of the cities with plenty of examples of valid and invalid according to the rules provided. Only then can we really build anything.

 

Thanks,

 

M.



Bulien

biswajit86
7 - Meteor

Hi @mceleavey ,

 

The test cases I sent you should be the check to remove junk and special character.But in most real city name where there is no junk its coming as Invalid.

I have attached the sample  for name of city list (field: sell_city_nm). 

mceleavey
17 - Castor
17 - Castor

Hi @biswajit86 ,

 

Give the attached workflow a try.

I think the rules you gave previously are incorrect as they say any single letter and double letter is invalid, which is wrong:

 

incorrect rules.PNG

 

but if they are just a mis-type then that's fine.

 

I've removed all records where there are characters repeated more than twice, and anything that contains anything other than letters and "-" and "."

If there are any errors, let me know.

 

M.

 

 

 

 



Bulien

biswajit86
7 - Meteor

Hi @mceleavey 

 

What I have knew is a city name can't be only single or double character  e.g A or AA So invalid. can we check invalid in this way?

Only one repeating character like AAA or AA AAA.. it can't be a city name.SO it is invalid

If repeating character are in a word ,then  I think this is fine.like COLLINGWOOD. here 'OO' is repeating but it is valid.

 

Thanks

biswajit86
7 - Meteor

Hi @mceleavey ,

 

Apart from last mail.Below are some  sample observations after checking the solution.yxmd workflow which is not true.The expected out is different.

 

ValuesOutputExpected Output
A@BValidInvalid
A#BValidInvalid
A$BValidInvalid
A%BValidInvalid
A^BValidInvalid
“-“ValidInvalid
“.”ValidInvalid
NANTONGInvalidValid
COLLINGWOODInvalidValid
mceleavey
17 - Castor
17 - Castor

Hi @biswajit86 ,

 

this has got you far along enoughas far as the workflow goes. In the section that removes the characters, simply add the @#$% etc that weren't included in the rules you provided to the formula.

The last two seem to be repeated characters over three times that are not together, so amend the multi row formula accordingly.

 

Almost there!

 

M.

 



Bulien

Labels
Top Solution Authors