Alteryx Designer Desktop Discussions

biswajit86 · ‎06-13-2019

I have test cases to mark which string is invalid or valid. I used below in formula tool

IF contains([Seller City], " Blank") then "Seller City is Blank"
ELSEIF (REGEX_Match([Seller City],"[a-zA-Z\-\.\~]+"))
then "Seller Country Valid"
ELSE "Seller Country Invalid"
ENDIF

After using above formula, It is showing invalid for last one ABC DER, The output file after using formula is attached.

Please help me if there is any function or any way to achieve this

Below is the expected result

A	valid
AA	Invalid
AAA	Invalid
A AA	Invalid
A AA AAA	Invalid
A AA AAA AAAA	Invalid
A12A	Invalid
1234	Invalid
A~	Invalid
ÃB	Valid
$#%ABC	Invalid
-	Invalid
(!@#$%^&*(?) â€“	Invalid
A-B-C	Valid
A.B.C	Valid
A@B	Invalid
A#B	Invalid
A$B	Invalid
A%B	Invalid
A^B	Invalid
@	Invalid
#	Invalid
$	Invalid
%	Invalid
^	Invalid
	Blank
ABCD	Valid
ABC DER	Valid

Thanks

Biswajit

biswajit86 · ‎06-13-2019

Thanks @mceleavey . Just brilliant :)

Its working fine.

Whats the meaning of "\u[B]|\u\u\u\u|\u\u\u\s\u+|\u.\u.\u")) ?

Where can I get these kind of learning knowledge. and Could you please suggest me?

Thanks

Biswajit

mceleavey · ‎06-13-2019

Hi @biswajit86 ,

it's not the most elegant of solutions, but it does the trick!

I'm not sure it will work in all cases as it is just built using the "Test Cases" tab data, so you might need to tweak it as other use cases come in.

Basically the meaning is as follows:

\u is any uppercase character (be sure to use the cleanse tool to change everything to uppercase), | is "or", \s is space, + means any amount, so if there are more than one for example.

So, this string is basically saying match anything that fits these formats.

A great place to start is the help documentation on the Regex tool which you can find here:

https://help.alteryx.com/2019.1/RegEx.htm

I hope this helps!

M.

biswajit86 · ‎06-19-2019

Hi @mceleavey ,

Need your help again.After test case when I ran for actual data with city name. All records almost came as invalid but it should be valid like below. When I applied the logic its coming as Invalid. Could you please check. Attaching the sheet (Test Cases Tab)for the expected result sheet.

SHANGHAI	Valid
NANTONG	Valid
GUANGZHOU	Valid
MEIZHOU	Valid
MELBOURNE	Valid
SYDNEY	Valid
COLLINGWOOD	Valid
DAISY HILL	Valid
SYDNEY	Valid
MELBOURNE	Valid
HAWTHORN	Valid
PRAHRAN	Valid

mceleavey · ‎06-19-2019

Hi @biswajit86 ,

this is because we were never given the real data and so it's very difficult to build without knowing what we're trying to match.

Can you provide a list of the cities with plenty of examples of valid and invalid according to the rules provided. Only then can we really build anything.

Thanks,

M.

biswajit86 · ‎06-19-2019

Hi @mceleavey ,

The test cases I sent you should be the check to remove junk and special character.But in most real city name where there is no junk its coming as Invalid.

I have attached the sample for name of city list (field: sell_city_nm).

mceleavey · ‎06-19-2019

Hi @biswajit86 ,

Give the attached workflow a try.

I think the rules you gave previously are incorrect as they say any single letter and double letter is invalid, which is wrong:

but if they are just a mis-type then that's fine.

I've removed all records where there are characters repeated more than twice, and anything that contains anything other than letters and "-" and "."

If there are any errors, let me know.

M.

biswajit86 · ‎06-19-2019

Hi @mceleavey

What I have knew is a city name can't be only single or double character e.g A or AA So invalid. can we check invalid in this way?

Only one repeating character like AAA or AA AAA.. it can't be a city name.SO it is invalid

If repeating character are in a word ,then I think this is fine.like COLLINGWOOD. here 'OO' is repeating but it is valid.

Thanks

biswajit86 · ‎06-20-2019

Hi @mceleavey ,

Apart from last mail.Below are some sample observations after checking the solution.yxmd workflow which is not true.The expected out is different.

Values	Output	Expected Output
A@B	Valid	Invalid
A#B	Valid	Invalid
A$B	Valid	Invalid
A%B	Valid	Invalid
A^B	Valid	Invalid
â€œ-â€œ	Valid	Invalid
â€œ.â€	Valid	Invalid
NANTONG	Invalid	Valid
COLLINGWOOD	Invalid	Valid

mceleavey · ‎06-20-2019

Hi @biswajit86 ,

this has got you far along enoughas far as the workflow goes. In the section that removes the characters, simply add the @#$% etc that weren't included in the rules you provided to the formula.

The last two seem to be repeated characters over three times that are not together, so amend the multi row formula accordingly.

Almost there!

M.

Alteryx Designer Desktop Discussions

How to remove repeating character from a string

Re: Is there any way the computer vision tools can...

Re: Batch Macro

Re: How to get cell reference address from excel

Re: Replacing Forecast columns with Actual Data

Re: Row creation