Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Alteryx is here to help you solve your biggest data challenges. Read about the new Virtual Solution Center here.
SOLVED

Modifying and Extracting a Unique ID from a Body of Text

Highlighted
10 - Fireball

 

Reference the sample body of text below.  I need to extract the contract number from it.  The pattern to the contract number is ([A-Z]{1}.{5}-[0-9]{2}-[A-Z]{1}-.{4}), with hyphens separating the different sections.  Sometimes the contract number lacks the hyphens.  In the sample below, the contract number N0018919D0004 lacks the hyphens.  

 

How can I extract the number and add the hyphens or add the hyphens and extract the number?   I need to keep the body of text.  There can be more than one contract number -- they will all have or not have hyphens, there is never a mix.

 

If the hyphens exist, I use the RegEx tool set using the expression ([A-Z]{1}.{5}-[0-9]{2}-[A-Z]{1}-.{4}) to tokenize and split to rows. 

 

Thanks

 

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Tempus quam pellentesque nec nam. Sapien pellentesque habitant morbi tristique senectus et netus et. Convallis posuere morbi leo urna molestie. Enim sed N0018919D0004 turpis in eu mi bibendum neque egestas. Eu mi bibendum neque egestas congue. Mi ipsum faucibus vitae aliquet nec ullamcorper sit. Id ornare arcu odio ut. Fermentum dui faucibus in ornare quam viverra. Integer malesuada nunc vel risus commodo viverra maecenas accumsan. Consectetur lorem donec massa sapien faucibus et molestie ac. Et magnis dis parturient montes nascetur ridiculus mus mauris.

 

 

 

Highlighted
12 - Quasar

Hi @hellyars 

 

You could add a question mark after each hyphen, which basically means 0 or 1 hyphen: [A-Z]{1}.{5}-[0-9]{2}-[A-Z]{1}-.{4}

 

Hope this helps.

Jasper

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @hellyars 

 

Does this work well for you?

 

Solution1.PNG

 

First I add hyphens to your contract number, then extract them into Rows. Workflow appended.

 

Cheers,

Highlighted
10 - Fireball

@jasperlch 

Thanks.  Two things.

 

1.  This works.

 

2.  BUT, it reveals a flaw in my expression.   My expression is now picking up spaces, so its picking up word number combinations that meet the same criteria.    How do I change my expression to ignore spaces?  Is there a way to specify that the .{5} must be either a number or a letter but not a space?

Highlighted
10 - Fireball

@Thableaus   Thanks.  Same two things.

 

1.  It works.

 

2. But I am running into the same problem.   My expression is catching other text, space and number combinations that match the pattern.  How do I mod the expression to only inlaced alpha numeric characters and ignore spaces?  I am still new to RegEx expressions.  

 

Thanks again.

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @hellyars 

 

Instead of (.), use \w

 

REGEX_Replace([Field1], "([A-Z]{1}\w{5})([0-9]{2})([A-Z]{1})(\w{4})", "$1-$2-$3-$4")

 

\w brings only word characters (letters, digits and also the underscore) and excludes punctuation.


Cheers,

10 - Fireball

@Thableaus 

 

I'm running into some problems translating it to the real world.  I can't get it to catch everything.   And, I can't sort out why.

 

Here is some real world data.  It includes contract numbers with and without hyphens.  It also includes examples with more than one contract Id per entry.

 

Thanks.

Highlighted
Alteryx Certified Partner
Alteryx Certified Partner

Hi @hellyars 

 

For some weird reason, there is a hyphen difference between some records.

That means - some records have a kind of hyphen, others have another.

Example: 

GE Medical Systems Information Technologies Inc., Wauwatosa, Wisconsin, has been awarded a maximum $450,000,000 firm‐fixed‐price, indefinite‐delivery/indefinite‐quantity contract for patient monitoring systems, accessories and training. This was a competitive acquisition with 36 responses received. This is a five-year base contract with one five‐year option period. Location of performance is Wisconsin, with a Jan. 10, 2024, performance completion date. Using customers are Army, Navy, Air Force, Marine Corps, and federal civilian agencies. Type of appropriation is fiscal 2019 through 2024 defense working capital funds. The contracting activity is the Defense Logistics Agency Troop Support, Philadelphia, Pennsylvania (SPE2D1‐19‐D‐0010).

SPE2D1‐19‐D‐0010 has a CHAR(8208), a UNICODE hyphen.

 

On the other hand

Raytheon Co. (Raytheon) Space and Airborne Systems (SAS), San Diego, California, is being awarded a single award with a contract ceiling of $9,607,811 for an indefinite-delivery/indefinite-quantity contract for Multi-Spectral Targeting System (MTS) sensor support. Raytheon will provide subject matter expertise as a member of a government-led sensor development and demonstration team and will provide research, development, fielding and test support, operations, maintenance, and as-needed repairs on the government-owned MTS-class sensors. Raytheon is the sole designer, developer, and manufacturer of the MTS-class sensor. Work will be performed at El Segundo and San Diego, California. The ordering period and the period of performance is five years from the date of award. The first task order will be awarded at the same time the basic contract is awarded. Fiscal 2018 research, development, test and evaluation funds in the amount of $1,299,520 for the first task order is being obligated at time of award. The award to Raytheon SAS is the result of a proposal submitted in response to a sole-source solicitation (HQ0147-18-R-0013) one offer was received. The Missile Defense Agency, Redstone Arsenal, Alabama, is the contracting activity. (HQ0147-19-D-0013).

 

HQ0147-19-D-0013 has a CHAR(45) hyphen, an ASCII hyphen

 

This expression solves it for all - Tokenize - ([A-Z]{1}\w{5}[[:punct:]]?[0-9]{2}[[:punct:]]?[A-Z]{1}[[:punct:]]?\w{4})(?:.)

That's because [[:punct:]] refers to any kind of punctuation (ASCII and UNICODE).

 

This was a very technical and specific case, I'm sorry you were introduced to RegEX in such a complicated situation.

 

Workflow with final solution - and also the CHAR difference between hyphens is appended to you.

 

Cheers,

 

 

 

 

Highlighted
10 - Fireball

That's some impressive root cause analysis.  Thank you.  Everything is working.  

Labels