Hi Team,
I am looking through research papers and would like to extract all text to the right of the word 'Conclusion' or 'Conclusions'. I have only just started to learn how to use the regex tool and am using the formula below:
.*\.(.*conclusion.*$)
Unfortunately it is not working for all the fields. I'm not sure what it is I'm doing wrong. Can anyone help me understand and modify the formula so it works in any scenario. Ideally I would like to parse everything to the right regardless of case and character when the word appears for the first time in the string.
Many Thanks
B
Hi there @Bushra_Akhtar
I actually prefer to use the regular formula tool when I do RegEx.
In this case, I would do something like this:
regex_replace([Field], ".*?conclusion(.*)", "$1")
This is saying, in Field, look for anything then the first instance of the word conclusion then keep anything after that, designated by the (.*)
Hope this helps!
That is great, thanks so much for that - that really works. I have one more modification I would like to make. Due to the variation of the word 'conclusion' and the characters I get after it I need to strip these out. For example I am left with sentences starting with 'S:', 's:', ' ' or ':'
Example of a output sentence is:
'S: High baseline NP is associated with...'.
I just want the sentence to start from 'High baseline NP is associated with....'
How can I strip these characters out. They are at the start of the field.
Many Thanks
B
What if you did something like this:
trim(replace(replace(replace([Field], "S:", ""), "s:", ""), ":", ""))
Thanks,
I have modified your formula so it to the below so it also trims the random 'S' or 's' that are appearing at the start of the sentence. Only thing is it is trimming all the s's in the text - I only want it to trim those at the start of the sentence. How can I modify this?
trim(replace(replace(replace(replace(replace([conclusion_text], "S:", ""), "s:", ""), ":", ""), "s", ""), "S", ""))
Many Thanks
B
Instead of modifying this formula, I decided to just go back to my original formula and edit that one slightly. Here is the new expression you can use from your input data:
trim(regex_replace([Field1], ".*?conclusion(s?)(\:?)(.*)", "$3"))
Starting with the inner expression, the regex_replace - this is building the regex expression that says there could possible be something before the word conclusion, there could possibly be a "s" and there could possibly be a ":" then there are any number of characters after that. These are split into groups using the parenthesis. The last part is saying to only keep the third group, i.e. the part after the ":", if there is one.
Then this is just wrapped in a trim() function so that you don't have any trailing spaces.
Hope this helps! If you have any additional questions or want more clarification, let me know!
Here is a good resource if you'd like to practice regex sometime, @Bushra_Akhtar