Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Filtering a column of data to return alpha-information only

Archaeopteryx
10 - Fireball

Hi,

I found in this forum how to use a REGEX along with the "false" path of the Filter tool to pull only those rows where a column does not contain numeric data, REGEX_Match([column name], "\w+"). I placed this in a variable expression and utilized the 'false' of the Filter path to obtain only those rows where that column contains only alpha data. But, some of the cells in the column have dates formatted, YYYY-MM-DD.

 

Can I modify the REGEX expression to include data formatted like that? Or would I need to use an AND clause in the expression? If the latter, how would I formulate the expression to target date formatted information?

 

Thank you,

Chris

 

13 REPLIES 13
mcrew
6 - Meteoroid

 

@Archaeopteryx,

 

The following expression will find 4 numbers followed by a dash followed by 4 numbers followed by a dash followed by 2 numbers.

 

Regex_Match([Column Name],"\d{4}-\d{2-\d2")

Cheers,

Mark

MarqueeCrew
20 - Arcturus
20 - Arcturus

@Archaeopteryx,

 

The following expression will find 4 numbers followed by a dash followed by 4 numbers followed by a dash followed by 2 numbers.

 

Regex_Match([Column Name],"\d{4}-\d{2}-\d{2}")

Please mark this as the accepted solution.  I mistakenly signed in with a wrong account.

 

Cheers,

Mark 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Archaeopteryx
10 - Fireball

I kinda sorta still get dates. This is the expression I'm using. I'm not sure how to attach a screen shot.

Would the below be the correct way to code the expression?

 

REGEX_Match([NEW VALUE],"\w+") OR
REGEX_Match([NEW VALUE],"\d{4}-\d{2-\d2")

 

 

MarqueeCrew
20 - Arcturus
20 - Arcturus
REGEX_Match([NEW VALUE],"\w+") OR 
REGEX_Match([NEW VALUE],"\d{4}-\d{2}-\d{2}")

That should work to test for either condition as TRUE

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Archaeopteryx
10 - Fireball

I captured the rejected data in the True path of the Filter, and while the REGEX expressions are working 99.999%, there are two rows that should not have been rejected. The [New Value] column in those two records contain "Withdrawal" and the other "Rejection". I do a data cleanse before we reach this filter to remove trailing and leading spaces. I ran the workflow a couple of times to just see if it was a fluke. But this condition remains. I inspected the data in the desired 'false' path. Those two records do not appear there.

MarqueeCrew
20 - Arcturus
20 - Arcturus

do you mean that they contain 

"Withdrawl" or "Rejection" ?

 

Including the quote symbols?

 

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Archaeopteryx
10 - Fireball

Among the data I do not want ( 'true' path of filter) there is data that I do want ('false' path of filter).

 

Two records went to the 'true' path of not wanted data that should have gone to the 'false' path of wanted data.

 

The value of the column [New Value] in one of those records is Rejection. The value of the column [New Value] in the other record is Withdrawal.

MarqueeCrew
20 - Arcturus
20 - Arcturus

They went to the True path because they match the pattern.

 

REGEX_Match([NEW VALUE],"\w+") OR
REGEX_Match([NEW VALUE],"\d{4}-\d{2}-\d{2}")

The first statement looks for anything that is a single string of characters (e.g. WhyNotMatchMe)

 

Maybe you need to tweak that statement a little bit.  Can you describe what you DO want to match a little more precisely?

 

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Archaeopteryx
10 - Fireball

Well, those two records represent about .0001% of the data that was properly rejected.

 

Here is what I understand the REGEX to be doing:

 

REGEX_Match([NEW VALUE],"\w+") will give us only numeric data (?)

 

REGEX_Match([NEW VALUE],"\d{4}-\d{2}-\d{2}") will give us only date formatted data (?)

 

The whole expression::

REGEX_Match([NEW VALUE],"\w+") OR
REGEX_Match([NEW VALUE],"\d{4}-\d{2}-\d{2}")

 

I take the above expression to mean that rows where the [New Value] column is numeric or where the [New Value] column has date formatted data will be selected. (?) This represents data that I do not want. The plot thickens.....

 

I use the false path of the filter tool because I want records where there [New Value] data is alpha only.

 

The true path of the filter would therefore contain the numeric and date information I do not want, the throw away information. Herein lies the problem. This path contains two measly rows that have defied the REGEX.

Labels