I have test cases to mark which string is invalid or valid. I used below in formula tool
IF contains([Seller City], " Blank") then "Seller City is Blank"
ELSEIF (REGEX_Match([Seller City],"[a-zA-Z\-\.\~]+"))
then "Seller Country Valid"
ELSE "Seller Country Invalid"
ENDIF
After using above formula, It is showing invalid for last one ABC DER, The output file after using formula is attached.
Please help me if there is any function or any way to achieve this
Below is the expected result
A | valid |
AA | Invalid |
AAA | Invalid |
A AA | Invalid |
A AA AAA | Invalid |
A AA AAA AAAA | Invalid |
A12A | Invalid |
1234 | Invalid |
A~ | Invalid |
ÃB | Valid |
$#%ABC | Invalid |
- | Invalid |
(!@#$%^&*(?) – | Invalid |
A-B-C | Valid |
A.B.C | Valid |
A@B | Invalid |
A#B | Invalid |
A$B | Invalid |
A%B | Invalid |
A^B | Invalid |
@ | Invalid |
# | Invalid |
$ | Invalid |
% | Invalid |
^ | Invalid |
Blank | |
ABCD | Valid |
ABC DER | Valid |
Thanks
Biswajit
Solved! Go to Solution.
Thanks @mceleavey . Just brilliant :)
Its working fine.
Whats the meaning of "\u[B]|\u\u\u\u|\u\u\u\s\u+|\u.\u.\u")) ?
Where can I get these kind of learning knowledge. and Could you please suggest me?
Thanks
Biswajit
Hi @biswajit86 ,
it's not the most elegant of solutions, but it does the trick!
I'm not sure it will work in all cases as it is just built using the "Test Cases" tab data, so you might need to tweak it as other use cases come in.
Basically the meaning is as follows:
\u is any uppercase character (be sure to use the cleanse tool to change everything to uppercase), | is "or", \s is space, + means any amount, so if there are more than one for example.
So, this string is basically saying match anything that fits these formats.
A great place to start is the help documentation on the Regex tool which you can find here:
https://help.alteryx.com/2019.1/RegEx.htm
I hope this helps!
M.
Hi @mceleavey ,
Need your help again.After test case when I ran for actual data with city name. All records almost came as invalid but it should be valid like below. When I applied the logic its coming as Invalid. Could you please check. Attaching the sheet (Test Cases Tab)for the expected result sheet.
SHANGHAI | Valid |
NANTONG | Valid |
GUANGZHOU | Valid |
MEIZHOU | Valid |
MELBOURNE | Valid |
SYDNEY | Valid |
COLLINGWOOD | Valid |
DAISY HILL | Valid |
SYDNEY | Valid |
MELBOURNE | Valid |
HAWTHORN | Valid |
PRAHRAN | Valid |
Hi @biswajit86 ,
this is because we were never given the real data and so it's very difficult to build without knowing what we're trying to match.
Can you provide a list of the cities with plenty of examples of valid and invalid according to the rules provided. Only then can we really build anything.
Thanks,
M.
Hi @mceleavey ,
The test cases I sent you should be the check to remove junk and special character.But in most real city name where there is no junk its coming as Invalid.
I have attached the sample for name of city list (field: sell_city_nm).
Hi @biswajit86 ,
Give the attached workflow a try.
I think the rules you gave previously are incorrect as they say any single letter and double letter is invalid, which is wrong:
but if they are just a mis-type then that's fine.
I've removed all records where there are characters repeated more than twice, and anything that contains anything other than letters and "-" and "."
If there are any errors, let me know.
M.
Hi @mceleavey
What I have knew is a city name can't be only single or double character e.g A or AA So invalid. can we check invalid in this way?
Only one repeating character like AAA or AA AAA.. it can't be a city name.SO it is invalid
If repeating character are in a word ,then I think this is fine.like COLLINGWOOD. here 'OO' is repeating but it is valid.
Thanks
Hi @mceleavey ,
Apart from last mail.Below are some sample observations after checking the solution.yxmd workflow which is not true.The expected out is different.
Values | Output | Expected Output |
A@B | Valid | Invalid |
A#B | Valid | Invalid |
A$B | Valid | Invalid |
A%B | Valid | Invalid |
A^B | Valid | Invalid |
“-“ | Valid | Invalid |
“.†| Valid | Invalid |
NANTONG | Invalid | Valid |
COLLINGWOOD | Invalid | Valid |
Hi @biswajit86 ,
this has got you far along enoughas far as the workflow goes. In the section that removes the characters, simply add the @#$% etc that weren't included in the rules you provided to the formula.
The last two seem to be repeated characters over three times that are not together, so amend the multi row formula accordingly.
Almost there!
M.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
5 |