Having a little moment here!
I am trying to extract everything before the comma, and I need to match on the below expressions and pull out the blue sections:
[Standard][Monthly][2][Post]
[OCS][Annual][6][Pre],[Standard][Monthly][2][Post]
I have written the regex below; however, for the second pattern it does not pull out just the first part but returns the whole input.
What am I missing here?
Thank you!
Well that is beautifully simple!
Thank you!
@klambert if you want a non-regex alternative:
Basically, if it contains a comma, take the information from the start up until that point, if not, take all the information.
I prefer RegEx, but it's nice to know the more manual alternatives.
Formulas used:
FindString: used to locate the position of the comma
Substring: used to take a portion of the string. You provide the starting position and the number of characters to take from that position. In this instance, we use the result of FindString to tell us how many characters to take.
Last thing: you'll notice I have !=-1 in the if statement. This is because of how the FindString function behaves. If it doesn't find the comma, it returns -1 to indicate that nothing was found; if it does find the comma, it returns the comma's position in the string.
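For anyone who wants to try the same logic outside the tool, here is a minimal sketch in Python. It is not the formula from the post, only the same idea: Python's `str.find` stands in for FindString (it also returns -1 when nothing is found), slicing stands in for Substring, and the function name `before_comma` is made up for illustration.

```python
def before_comma(text: str) -> str:
    """Return everything before the first comma, or the whole string if there is none."""
    pos = text.find(",")   # position of the first comma, or -1 if there is no comma
    if pos != -1:
        return text[:pos]  # take pos characters, starting from the beginning
    return text            # no comma: keep the whole input


print(before_comma("[OCS][Annual][6][Pre],[Standard][Monthly][2][Post]"))
print(before_comma("[Standard][Monthly][2][Post]"))
```

With the two sample inputs from the question, the first call prints `[OCS][Annual][6][Pre]` and the second prints the input unchanged.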
Hope that's useful.
Hi @klambert, are you sure there is at most one comma per string? I'm asking because the suggested expression will match up to the last comma, so if a string contains multiple commas, you'll have to decide whether you want to extract the part before the first one, the last one or something in between. If you choose the first one, an expression like this should work for you:
([^,]+).*
This is perfect, would you mind explaining the breakdown?
I read this as "starts with at least one comma", aka ^,+, followed by any combination of characters .*
But that is obviously not what it's doing!
Sure, no problem. The round brackets (parentheses) mark a capturing group; that is what @ShankerV is referencing as $1 in the replacement text. The "[^,]" is a character set that is negated because it starts with a "^" inside the square brackets, meaning that it will match any single character that is not a comma. The "+" means that we want one or more such characters. The ".*" means zero or more of any character.
So, this could be read as: give me one or more characters (as many as possible) that are not a comma, followed by anything. The trailing part can be empty.
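To see that breakdown in action, here is a small sketch using Python's `re` module as a stand-in for the tool used in this thread. It assumes that replacing the whole match with `\1` in Python behaves like a `$1` replacement there; the helper name `first_part` is made up for illustration.

```python
import re

# One or more non-comma characters (captured as group 1), followed by anything.
PATTERN = re.compile(r"([^,]+).*")


def first_part(text: str) -> str:
    """Replace the whole match with group 1, i.e. keep only the text before the first comma."""
    return PATTERN.sub(r"\1", text)


samples = [
    "[Standard][Monthly][2][Post]",
    "[OCS][Annual][6][Pre],[Standard][Monthly][2][Post]",
    "a,b,c",  # several commas: only the part before the first one is kept
]
for sample in samples:
    print(sample, "->", first_part(sample))
```

Because `[^,]+` cannot step over a comma, the capture always stops at the first comma, no matter how many follow; when there is no comma at all, the group takes the whole string and `.*` matches nothing.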
Ah perfect, that is a very helpful explanation. Thank you!