Get Inspire insights from former attendees in our AMA discussion thread on Inspire Buzz. ACEs and other community members are on call all week to answer!

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.
SOLVED

Regex? Extract non-uniform string text

Empower49
8 - Asteroid

I welcome your expertise. I have data formatted in a long text field in Excel and need to extract the number field (such as 590288-01 in the first example below.) My challenge is that this number is not necessarily in the same place within each record. The number typically, but not always has an extension of 01, 02, etc. 

 

Sample records:

 

Originator To Beneficiary Info:    /ROC/PLAN15987 OTabcW 590288-01    AMIGOIN FUNHOUSE, 23

 

Originator To Beneficiary Info:    REFERENCE 1260422-01 BOB CASS IDY AND ASSOCIATES 

 

Originator To Beneficiary Info:    THE DIRECTED ACCOUNT PLAN      PLAN 385222-02 

 

Thank you!

 

Originator To Beneficiary Info:    STATE OF BLISS #98988-01 

 

Originator To Beneficiary Info:    Applied Pumpkin Plan 45486901 

9 REPLIES 9
Claje
14 - Magnetar

Hi,

 

Based on your examples, I identified the following defining characteristics.  If these are inaccurate, the below REGEX will not work.

 

the target number field is one of the following

a set of digits up to 6 in length followed by a dash ("-") and two digits

a set of digits of length 7 or 8 with no dashes or spaces.

 

((?:\d{0,6}-\d{2})|(?:\d{7,8}))

This regex looks for either a set of 0 to 6 digits, followed by a dash and 2 digits, OR a set of 7 or 8 digits.


Hope this helps!

MarqueeCrew
20 - Arcturus
20 - Arcturus

@Empower49,

 

Along with @Claje, I used a regex expression to create your desired output.

 

REGEX_Replace([Field1], ".*([0-9]{5,7}\-{0,1}[0-9]{2}).*", '$1')

\d is a shortcut for the set of characters between 0 and 9.  I look for a group (enclosed within parenthesis) that comes after any characters where the group contains:

  • 5-7 characters of "digits"
  • followed by 0 or 1 dash symbols
  • followed by 2 digits

This solves for your desired results too.

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Empower49
8 - Asteroid

Thank you! This is fabulous. This works for many of the records, but some of the records have other numbers in them in that are not part of the "Originator To Beneficiary Info:" string. These values are being returned in my output when I only need the value of 590288-01 below.  For example, I have the following in one Excel field:

 

Originator To Beneficiary Info:    /ROC/PLAN15987 OTabcW 590288-01    AMIGOIN FUNHOUSE, 23

Reference #: P482973

ID:893723

 

Thanks SO much!

MarqueeCrew
20 - Arcturus
20 - Arcturus

@Empower49,

 

When using my solution (above), I get 90288-01 as a result.  So I made a minor tweak to the expression:

REGEX_Replace([Field1], ".*?([0-9]{5,8}\-{0,1}[0-9]{2}).*", '$1')

The ? will look to the first occurrence of the pattern.  Now I get 590288-01

 

Cheers,

 

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Empower49
8 - Asteroid

Thanks so much Mark, for your help. I appreciate it. I may be do something wrong (if so, I'm uncertain what...) because the parse returns no data. Alteryx says "0 records were successfully parsed..." Do you have any tips? Thank you!

MarqueeCrew
20 - Arcturus
20 - Arcturus

here's my test module.

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Empower49
8 - Asteroid

You are awesome! Thanks for your help. This is fabulous because it extracts the first string, but unfortunately the string I am seeking is not necessarily the first string in the field, but it is always preceded by "Originator To Beneficiary Info:"  Sorry for so much back and forth, here's an example of the entire field that I am searching. I am trying to extract the string = 590283-01

 

R Number:                        181117498
Fed Ref:                           00905898724
Date/Time Received:               11/15/2018 10:02:00 AM
Originator:                        Amigonnafun                            2600 SHERIDAN DRIVE                TONAWANDA NY 14250 
Receiving Bank:                    1022349021 Chicago
Beneficiary:                       /10-9-9133286049                     
Beneficiary Ref:                   124264289

 

Thank you!
Originator To Beneficiary Info:    /ROC/PLAN15987 OTCFGW 590283-01    Pumpkintime. DXXXXX,,

MarqueeCrew
20 - Arcturus
20 - Arcturus

@Empower49,

 

I think that this is what you need:

 

REGEX_Replace([Field1], ".*Originator To Beneficiary Info:.*?([0-9]{5,8}\-{0,1}[0-9]{2}).*", '$1')

you can modify my prior workflow and use this formula.

 

Cheers,

 

Mark

 

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
Empower49
8 - Asteroid

You are a rock star/Alteryx star! This works. Thanks so very much. I'm very new to Alteryx and this is a huge help.

 

Many thanks to you both!

 

 

Labels