To Extract between words
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Dear Experts,
I would like to extract "PR8470" and "PR8475" and "PR8474" from the below string.
I have multiple row of data
OUR REF : PR8470YOUR REF : NST000ECI21000206 <(>&<)> NST000ECI21000213PO AMENDED as at 23 July 2021 1.
OUR REF : PR8475YOUR REF : T/CG/10/FY19PO TO BE READ WITH
OUR REF : PR8474YOUR REF : NST000ECI21000207OPTIONAL AWARD TO BE EXERCISED IF REQUIRED
Solved! Go to Solution.
- Labels:
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hey @jerometyl, will the rows always be in this format? At least the beginning, up until the part you want to extract, and with a 6-character reference? If so, you can use something like this but if you have any problems when applying this to a larger data set, let us know and we can revisit:
:\s(.{6})
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi,
If you will have the same format always, you can use also the below. If changes, you can change the formula at which character should start and how many characters to extract.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi DataNath, some time it is 6 characters, sometimes 7. Basically, behind will end with Your Ref: I need it to be after Our Ref: and before your ref:
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@jerometyl in that case, the following should work. I've updated the expression and shown an example of a reference containing loads of characters just to show it'll pick up anything between the Our Ref: and Your Ref:
:\s(.+)YOUR REF
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I believe that you have structured data when I see the examples. You start with a FIXED string. Follow it with a VARIABLE string. Follow it with a known (FIXED) string and follow that with "stuff". Here's what I see:
[STUFF to IGNORE][Good STUFF}[STUFF to IGNORE]
The variability of "Good Stuff" makes the challenge more interesting. I like where @AuraRautoiu was going with the Substring function. That function allows you to START at a fixed position and he/she then applied a FIXED length of 6. Heck, you could replace "YOUR REF:" (note the 2 spaces between YOUR and REF) with a PIPE (|) and this problem is easy to solve with a text to columns tool. But back to the string function ...
Substring([Field1],10,FindString([Field1], "YOUR REF :") -10)
The findstring() function allows you to search for "YOUR REF :" and see it in position 16 most of the time. That means that you want what STARTS in position 10 and ends 6 characters later (16 minus 10). When the target is 7 digits, YOUR will appear in position 17. We know that 17 minus 10 equals 7 and so you will get your longer/shorter values with this expression.
A regular expression, RegEx, will work for you as @DataNath provides. I've answered many RegEx posts and know it fairly well but in your case I wouldn't choose this as a solution path. I wonder how @Emil_Kos would solve the challenge. I've tagged him and we'll see how he goes about this. I don't disagree with DataNath. If the data was more irregular or if you wanted to learn RegEx you could explore something like:
Regex_Replace([Field1],".*?(PR\d{4,})\D.*",'$1')
Look for ANYTHING until you find the FIRST appearance of "PR" followed by 4 or more digits (\d). Grab that PR9999....9 as a group. When you see your first non-digit (\D), ignore the rest.
In that expression, the result doesn't depend on a fixed input.
Cheers,
Mark
Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @MarqueeCrew,
Thank you for flagging me. The answer always depends on the data. If the amount of data is relatively small, I will go with using find and replace tool. I like having all the fields I am searching for in the input tool and in my experience having a workflow that is easy to maintain is almost always my number one priority.
I am assuming that @jerometyl is interested in only those particular REF numbers regex might be quite tricky. If the data structure is always the same, it is very possible that I would propose the same approach as you but not only because of the speed but because it would be easier to maintain by the end user.
If it isn't the same I would go with writing IF contains formulas with several IF function.
![](/skins/images/18F3BF3EB54AD3C2739B5AA9B77A7F97/responsive_peak/images/icon_anonymous_message.png)