Extracting text to the right of a keyword
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Team,
I am looking through research papers and would like to extract all text to the right of the word 'Conclusion' or 'Conclusions'. I have only just started to learn how to use the regex tool and am using the formula below:
.*\.(.*conclusion.*$)
Unfortunately it is not working for all the fields. I'm not sure what it is I'm doing wrong. Can anyone help me understand and modify the formula so it works in any scenario. Ideally I would like to parse everything to the right regardless of case and character when the word appears for the first time in the string.
Many Thanks
B
- Labels:
- Regex
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi there @Bushra_Akhtar
I actually prefer to use the regular formula tool when I do RegEx.
In this case, I would do something like this:
regex_replace([Field], ".*?conclusion(.*)", "$1")
This is saying, in Field, look for anything then the first instance of the word conclusion then keep anything after that, designated by the (.*)
Hope this helps!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
That is great, thanks so much for that - that really works. I have one more modification I would like to make. Due to the variation of the word 'conclusion' and the characters I get after it I need to strip these out. For example I am left with sentences starting with 'S:', 's:', ' ' or ':'
Example of a output sentence is:
'S: High baseline NP is associated with...'.
I just want the sentence to start from 'High baseline NP is associated with....'
How can I strip these characters out. They are at the start of the field.
Many Thanks
B
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
What if you did something like this:
trim(replace(replace(replace([Field], "S:", ""), "s:", ""), ":", ""))
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thanks,
I have modified your formula so it to the below so it also trims the random 'S' or 's' that are appearing at the start of the sentence. Only thing is it is trimming all the s's in the text - I only want it to trim those at the start of the sentence. How can I modify this?
trim(replace(replace(replace(replace(replace([conclusion_text], "S:", ""), "s:", ""), ":", ""), "s", ""), "S", ""))
Many Thanks
B
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Instead of modifying this formula, I decided to just go back to my original formula and edit that one slightly. Here is the new expression you can use from your input data:
trim(regex_replace([Field1], ".*?conclusion(s?)(\:?)(.*)", "$3"))
Starting with the inner expression, the regex_replace - this is building the regex expression that says there could possible be something before the word conclusion, there could possibly be a "s" and there could possibly be a ":" then there are any number of characters after that. These are split into groups using the parenthesis. The last part is saying to only keep the third group, i.e. the part after the ":", if there is one.
Then this is just wrapped in a trim() function so that you don't have any trailing spaces.
Hope this helps! If you have any additional questions or want more clarification, let me know!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Here is a good resource if you'd like to practice regex sometime, @Bushra_Akhtar
