parse string using other strings
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello,
I have a list of strings with an underscore seperator like the following:
ftThreefinalgrps_brd4_co10
ftif5_co10
ftqn5_bra1
I also have a list of "identifiers", that tell me what each part of the string means.
please note that the number of identifiers and their name is variable
br
ft
co
The goal is to parse the string using the identifiers.
Using my first variable, I want to find everything after "br" and before the underscore (if there is any), in this case "d4".
Then I want everything after "ft" and before the underscore, being "Threefinalgrps"
Finally, I want everything after co and before the underscore, so "10"
My ideal final result would be:
variable | ft | br | co |
ftThreefinalgrps_brd4_co10 | Threefinalgrps | d4 | 10 |
ftif5_co10 | if5 | 10 | |
ftqn5_bra1 | qn5 | a1 |
Does anyone have a suggestion on how to start?
Thanks!
Mikis
Solved! Go to Solution.
- Labels:
- Transformation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I've attached an example solution that's a little different from the elegant route @ponraj took. We both relied on the '_' splitting each identifier, but I used the list of strings and their solution parsed all strings as identifiers.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I would jump to Regex:
REGEX_Replace( [variables], ".*?(" + [Field1] + "([^_]+))?.*?", "$2")
But it is pretty hard so an alternative is:
IIF(Contains([variables], [Field1]), Substring([variables], FindString([variables], [Field1])+length([Field1]),1000), NULL())
This will trim the string down and then
IIF(Contains([Match], "_"), Left([Match], findstring([Match],"_")-1), [Match])
To chop off at the first _
Sample of both attached.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Woah, some excellent answers.
They all worked, but I think I'm going with @jdunkerley79's suggestion because it doesn't rely on the underscore splitting.
The problem is that I have absolutely no idea how many underscores the field will have, and not having to make the assumption sounds lovely.
It'll take some time before I understand why that regex thing works, but it works great!
What do you mean however with "its pretty hard"? Do you mean it's a difficult formula? Because you already wrote it, copying it is easy :)
Cheers!
Mikis
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
@jdunkerley79 there's still a small issue in the script.
If a string by coincidence contains one of the codes, then it would be split incorrectly.
for example the following string
ftThrebrefinalgrps_brd4_co10
Then the "br" would split "efinalgrps"
So I would actually want to split on "_br" unless it's the first 2 characters.
Does that make sense?
I changed your code to
REGEX_Replace( [variables], ".*?(" + "_" + [Field1] + "([^_]+))?.*?", "$2")
But I would want to make an exception if it's the first 2 letters :)
I hope I explained this properly...
Regards,
Mikis
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Got it :)
It's not very graceful, but it works.
(I renamed "field1" to "dimensions")
i added a helper column "before_dimension" to see what comes before the field
TRIM(Left([variables], FindString([variables], [dimensions])))
And then I check: "before_dimension" should either be null (if it's the first word) or the last symbol should be an "_"
if Length([before_dimension]) = 0 then [Match] elseif Right([before_dimension], 1) = "_" then [Match] else null() endif
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Sorry to be slow replying:
REGEX_Replace( [variables], ".*?((?=^|_)" + [dimensions] + "([^_]+))?.*?", "$2")
Should fix the issue. It does a look behind to check at a _ or the start of a string
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
[I said it worked perfectly, but I found an issue later]
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hm, now I have a new issue.
It no longer finds anything after the underscore.
i don't want to be lazy and find the solution myself, but **bleep** regular expressions are complicated :O