This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
on 05-09-2018 09:02 AM - edited on 10-21-2021 01:09 PM by LeahK
ATTACHMENT:Regex Introduction.yxmd
Your RegEx Journey Starts Here
My journey with Regular Expressions (RegEx) started because I was lazy. I realized that substring, findstring and other searches were difficult to configure when my data wasn't well structured. As I began working with more and more data I found that the time spent in learning about RegEx was saving me loads of time and was getting me cleaner data. I didn't learn it all overnight. Sometimes I needed help and I did ask for it.
I see many questions in the Alteryx Community regarding patterns. Some of them I answer with RegEx and some without. Just the other day I used both the contains()function as well as the RegEx replace in the same expression for a post. The pattern used was not for a beginner and I apologized for not being able to adequately explain it. I am self-taught and sometimes I just know how to write something (learned from trial and error) and explaining the expression was difficult for me. In the following video I show you ways to identify basic target data (Tokens) and use a variety of formula replacements to demonstrate how the token finds matches throughout the string.
The following 20 minute video walks you through some simple replacements and hopefully provides you with the courage to explore your own data with RegEx at your side. If you want a quick reference, you can use RegEx101 to help you. I explain in the video that I prefer to use Alteryx designer, but the choice is yours to make. There are training opportunities here in community and at Inspire for you as well.
Cheers,
Mark
Very helpful. Thanks for creating this MarqueeCrew.
What would you like to see next?
I am fresh out of questions at the moment.
I am pretty sure I will run into some snags anytime soon.
I will keep you posted.
Thanks again.
Regards,
Pushkar
P.S. On second thought, do you blog?
https://marqueecrew.wordpress.com
http://www.chaosreignswithin.com/?m=1
And you can find me here in knowledge base and blog posts.
https://community.alteryx.com/t5/Alter-Everything-Podcast/4-Defensive-configuration/ba-p/157650
cheers,
mark
This is so helpful. Thank you immensely Mark.
Hello @MarqueeCrew,
I am trying to use RegEx_Replace function in Formula tool to extract the month from a string.
But this function works more like - let us remove whatever we don't need.
I looked for a different function but did not find any.
I guess I should spend more time on writing better logic.
Any thoughts?
Does this video help?
You can also subscribe and follow me on YouTube. If you post your sample data and desired output, I or someone will quickly reply. It might be that there is a datetimeformat() solution too.
Cheers,
Mark
Aaah I understand where I went wrong @MarqueeCrew, the problem I am working on is a part of the weekly challenge.
I have already seen your solution to that challenge but wanted to come up with something by myself. This helped a lot.
Thanks a ton. I will follow the youtube channel and post queries there going forward.
Hello,
I'm trying to utilize a regex formula to help me extrapolate a CUSIP from transactional data. The data structure and type won't always be similar to this data set, but it should be relatively close. The CUSIP should be no more than 9 characters (usually 9 sometimes less, but never more than 9) either numeric or alphanumeric.
I have attached some sample data transactions in column A and the CUSIP I would need from column A in column B.
TRANSACTION DATA | CUSIP NEEDED |
NO DATACASH RECEIPTTRAN # CASH DIVIDEND-LADDER CORP-505743104EX DATE:12/7/18, PAY DATE:1/24/19RATE: 0.422942 | 505743104 |
NO DATACASH RECEIPTTRAN # STOCK DIVIDEND-505743104-LADDER CORPORATIONEX DATE:12/7/18;PAYDATE:1/24/19RATE:0.147058 | 505743104 |
NO DATACASH RECEIPTTRAN # $0.0237/SHARE ON 8 SHARES DUE 1/25/1900776M517: ADVISORS DISCIPLINED TR | 00776M517 |
NO DATACASH RECEIPTTRAN # $0.166/SHARE ON 65 SHARES DUE 04/01/19115637209BROWN FORMAN CORP CL B | 115637209 |
NO DATACASH RECEIPTTRAN # $0.22/SHARE ON 70 SHARES DUE 04/01/19654106103NIKE INC CLASS B | 654106103 |
NO DATACASH RECEIPTTRAN # $0.47/SHARE ON 89 SHARES DUE 01/16/19857477103STATE STREET CORP | 857477103 |
NO DATACASH RECEIPTTRAN # $0.19/SHARE ON 956 SHARES DUE 01/23/1920030N101COMCAST CORP CL A | 20030N101 |
NO DATACASH RECEIPTTRAN # $0.8675/SHARE ON 65 SHARES DUE 12/31/19755111507RAYTHEON CO | 755111507 |
NO DATACASH RECEIPTTRAN # $1.1/SHARE ON 59 SHARES DUE 02/11/19009158106AIR PRODS & CHEMS INC | 009158106 |
NO DATACASH RECEIPTTRAN # $0.9/SHARE ON 100 SHARES DUE 03/12/19478160104JOHNSON & JOHNSON | 478160104 |
NO DATACASH RECEIPTTRAN # $0.4/SHARE ON 10 SHARES DUE 03/29/19737630103POTLATCHDELTIC CORPORATION | 737630103 |
NO DATACASH RECEIPTTRAN # OPTIONAL DIVIDEND STOCK PAYMENTRETAIL-76133Q102 | 76133Q102 |
NO DATACASH RECEIPTTRAN # $0.7925/SHARE ON 59 SHARES DUE 01/14/1992276F100VENTAS INC | 92276F100 |
NO DATACASH RECEIPTTRAN # $0.43/SHARE ON 445 SHARES DUE 01/15/1921871N101CORECIVIC INC | 21871N101 |
Any assistance would be greatly appreciated.
Since you asked for "any" assistance, here is a suggestion:
REGEX_Replace([TRANSACTION DATA], ".*(\d\w{7}\d).*", '$1')
It works with your 9 digit CUSIP values as long as they begin and end with a number. Sometimes you need to setup your data first by identifying multiple patterns. You can use a regex match function to look for one pattern and see if it matches, then use one formula there and another formula for say 10 digit values. Of course, for that example, I would check for the larger first.
Cheers,
Mark
Hello,
This was very helpful and worked flawlessly. However, I've been trying to tweak your original REGEX for use on the sample below, but I'm not getting the desired results. This kind of data the CUSIP does not have to be 9 characters long and could begin with either a letter or number, but should end with a number.
TRANSACTION DATA | CUSIP NEEDED |
DIVIDEND @ 0.0568293 USD/AUD FX RATE 0.7077EX DATE: 12/28/2018 PAY DATE: 03/04/2019BY7QXS7 : VICINITY CENTRES | BY7QXS7 |
DIVIDEND @ 0.001658 USD/AUD FX RATE 0.7138EX DATE: 12/28/2018 PAY DATE: 02/28/20196161978 : MIRVAC GROUP | 6161978 |
DIVIDEND @ 0.2521019 USD/AUD FX RATE 0.7138EX DATE: 12/28/2018 PAY DATE: 02/28/2019B033YN6 : DEXUS | B033YN6 |
DIVIDEND @ 0.0628857 USD/AUD FX RATE 0.7138EX DATE: 12/28/2018 PAY DATE: 02/28/20196365866 : G P T GROUP | 6365866 |
DIVIDEND @ 73 USD/JPY FX RATE 108.17EX DATE: 11/13/2018 PAY DATE: 01/10/2019B0MKZN5 : TSURUHA HOLDINGS INC | B0MKZN5 |
DIVIDEND @ 7.767109 USD/50 SHSEX DATE: 12/28/2018 PAY DATE: 04/26/20194942818 : SAMSUNG ELECTRONICS CO LTD -GDR 144A | 4942818 |
CASH DIVIDEND PAID IN GBP-BD8QVH4-INTERCONTINENTALEX DATE:1/14/19;PAYDATE:1/29/19RATE:2.038 | BD8QVH4 |
DIVIDEND REINVESTMENT PAID IN AUDEX DATE:12/28/18; PAY DATE:2/15/19DIVIDEN RATE=0.046TRANSURBAN-6200882 | 6200882 |
DIVIDEND @ 0.8 USD/EUR FX RATE 1.116EX DATE: 04/26/2019 PAY DATE: 05/20/2019BYQP136 : A B N AMRO GROUP NV-CVA | BYQP136 |
DIVIDEND @ 50 USD/JPY FX RATE 110.62EX DATE: 12/26/2018 PAY DATE: 03/29/2019B5LTM93 : OTSUKA HOLDINGS CO LTD | B5LTM93 |
Any assistance would be greatly appreciated.
Very helpful. Thank you very much.
Thanks for the wonderful post. One think I personally recommend is fixing the case of the txt fields into upper or sentence case or lower case and then run the regex tokens on the fields to get more accurate answers.
Helpful, Thanks.
I need a cheat sheet in text version. I cant extrapolate from the video what I'm trying to do :(