How to remove repeating character from a string
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I have test cases to mark which string is invalid or valid. I used below in formula tool
IF contains([Seller City], " Blank") then "Seller City is Blank"
ELSEIF (REGEX_Match([Seller City],"[a-zA-Z\-\.\~]+"))
then "Seller Country Valid"
ELSE "Seller Country Invalid"
ENDIF
After using above formula, It is showing invalid for last one ABC DER, The output file after using formula is attached.
Please help me if there is any function or any way to achieve this
Below is the expected result
A | valid |
AA | Invalid |
AAA | Invalid |
A AA | Invalid |
A AA AAA | Invalid |
A AA AAA AAAA | Invalid |
A12A | Invalid |
1234 | Invalid |
A~ | Invalid |
ÃB | Valid |
$#%ABC | Invalid |
- | Invalid |
(!@#$%^&*(?) – | Invalid |
A-B-C | Valid |
A.B.C | Valid |
A@B | Invalid |
A#B | Invalid |
A$B | Invalid |
A%B | Invalid |
A^B | Invalid |
@ | Invalid |
# | Invalid |
$ | Invalid |
% | Invalid |
^ | Invalid |
Blank | |
ABCD | Valid |
ABC DER | Valid |
Thanks
Biswajit
Solved! Go to Solution.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks @mceleavey . Just brilliant :)
Its working fine.
Whats the meaning of "\u[B]|\u\u\u\u|\u\u\u\s\u+|\u.\u.\u")) ?
Where can I get these kind of learning knowledge. and Could you please suggest me?
Thanks
Biswajit
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @biswajit86 ,
it's not the most elegant of solutions, but it does the trick!
I'm not sure it will work in all cases as it is just built using the "Test Cases" tab data, so you might need to tweak it as other use cases come in.
Basically the meaning is as follows:
\u is any uppercase character (be sure to use the cleanse tool to change everything to uppercase), | is "or", \s is space, + means any amount, so if there are more than one for example.
So, this string is basically saying match anything that fits these formats.
A great place to start is the help documentation on the Regex tool which you can find here:
https://help.alteryx.com/2019.1/RegEx.htm
I hope this helps!
M.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @mceleavey ,
Need your help again.After test case when I ran for actual data with city name. All records almost came as invalid but it should be valid like below. When I applied the logic its coming as Invalid. Could you please check. Attaching the sheet (Test Cases Tab)for the expected result sheet.
SHANGHAI | Valid |
NANTONG | Valid |
GUANGZHOU | Valid |
MEIZHOU | Valid |
MELBOURNE | Valid |
SYDNEY | Valid |
COLLINGWOOD | Valid |
DAISY HILL | Valid |
SYDNEY | Valid |
MELBOURNE | Valid |
HAWTHORN | Valid |
PRAHRAN | Valid |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @biswajit86 ,
this is because we were never given the real data and so it's very difficult to build without knowing what we're trying to match.
Can you provide a list of the cities with plenty of examples of valid and invalid according to the rules provided. Only then can we really build anything.
Thanks,
M.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @mceleavey ,
The test cases I sent you should be the check to remove junk and special character.But in most real city name where there is no junk its coming as Invalid.
I have attached the sample for name of city list (field: sell_city_nm).
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @biswajit86 ,
Give the attached workflow a try.
I think the rules you gave previously are incorrect as they say any single letter and double letter is invalid, which is wrong:
but if they are just a mis-type then that's fine.
I've removed all records where there are characters repeated more than twice, and anything that contains anything other than letters and "-" and "."
If there are any errors, let me know.
M.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @mceleavey
What I have knew is a city name can't be only single or double character e.g A or AA So invalid. can we check invalid in this way?
Only one repeating character like AAA or AA AAA.. it can't be a city name.SO it is invalid
If repeating character are in a word ,then I think this is fine.like COLLINGWOOD. here 'OO' is repeating but it is valid.
Thanks
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @mceleavey ,
Apart from last mail.Below are some sample observations after checking the solution.yxmd workflow which is not true.The expected out is different.
Values | Output | Expected Output |
A@B | Valid | Invalid |
A#B | Valid | Invalid |
A$B | Valid | Invalid |
A%B | Valid | Invalid |
A^B | Valid | Invalid |
“-“ | Valid | Invalid |
“.†| Valid | Invalid |
NANTONG | Invalid | Valid |
COLLINGWOOD | Invalid | Valid |
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @biswajit86 ,
this has got you far along enoughas far as the workflow goes. In the section that removes the characters, simply add the @#$% etc that weren't included in the rules you provided to the formula.
The last two seem to be repeated characters over three times that are not together, so amend the multi row formula accordingly.
Almost there!
M.
- « Previous
-
- 1
- 2
- Next »