I’m trying to figure out how to solve what I believe is a difficult parsing problem. I’m new to regex, and I’m not even sure regex is the way to go here. I have a full paragraph of legal text that looks something like this:
BLAH BLAH 26 U.S.C.S. § 115, blah blah blah blah. BLAH BLAH 26 U.S.C.S. § 116, blah blah blah blah. Blah blah blah blah. Blah blah blah. BLAH BLAH 26 U.S.C.S. § 115 blah blah blah blah. Blah. Blah. Blah.
My goal is to determine whether a sentence contains the 116 statute, then get that sentence and all following sentences until I hit a sentence with the 115 statute. The bold would be what I’m looking to extract from the paragraph.
Similarly, sometimes the cell contains a paragraph that looks like this.
BLAH BLAH 26 U.S.C.S. § 116, blah blah blah blah. Blah blah blah blah. Blah blah blah. BLAH BLAH 26 U.S.C.S. § 115, blah blah blah blah. According to 26 U.S.C.S. § 116 blah blah blah blah. Blah. Blah. Blah.
In this instance, I still need all sentences from all 116 statutes onward until the 115 stopword (if you will) triggers stoppage. See bold above.
Any ideas how I would approach this? I’m a little overwhelmed. Thanks for any and all help!
Good morning @theinsideguy,
I've made an example using a formula, a text-to-columns and a filter. I'm changing "According to" to a delimiter ("|"), then breaking down everything to rows and filtering only rows that contain 116.
Please let me know if this worked or if you have any questions!
Thank you so much for that quick response. I shouldn't have put "according to" in the example. The truth is, I have no idea what words will precede the statute. I edited my question.
This isn't perfect, as it's hard to code where sentences end, but is this close enough for you?
Joe and Ollie, your solutions are really close, but neither includes the text BEFORE the trigger word. With Ollie's solution, I would have to include the previous "FALSE" record to get all of the text BEFORE the 116 statute. I'm going to try and fool around with Ollie's example to see if I can get it to work and report back a solution. Of course, any additional input is very much appreciated Joao.
Hey @theinsideguy do you want the text before, or the text after, or both?
If you want the text before, then this should do the trick (hopefully).
Was hoping for the entire sentence.
So, "BLAH BLAH 26 U.S.C.S. § 115, blah blah blah blah. BLAH BLAH 26 U.S.C.S. § 116, blah blah blah blah. Blah blah blah blah. Blah blah blah. BLAH BLAH 26 U.S.C.S. § 115 blah blah blah blah. Blah. Blah. Blah." returns "BLAH BLAH 26 U.S.C.S. § 116, blah blah blah blah. Blah blah blah blah. Blah blah blah."
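For anyone who wants to try this outside the workflow, here is a minimal Python sketch of the rule described above: split the paragraph into sentences, start collecting at a sentence that cites § 116, and stop at the next sentence that cites § 115. The names `extract_runs`, `SENTENCE_BREAK`, `START` and `STOP` are made up for illustration. It assumes a sentence ends at a period followed by whitespace and a capital letter, which happens to keep "U.S.C.S. §" together because "§" is not a capital letter. It also assumes that a sentence citing both statutes counts as a 116 sentence, which the thread does not settle.

```python
import re

# Assumption: a sentence boundary is a period, then whitespace, then a capital
# letter. "U.S.C.S. §" is not split because "§" is not in [A-Z].
SENTENCE_BREAK = re.compile(r"(?<=\.)\s+(?=[A-Z])")
START = re.compile(r"§\s*116\b")  # trigger: sentence cites section 116
STOP = re.compile(r"§\s*115\b")   # stop word: sentence cites section 115


def extract_runs(paragraph):
    """Return each run of sentences from a 116 sentence up to the next 115 sentence."""
    runs = []
    current = None  # None means "not collecting right now"
    for sentence in SENTENCE_BREAK.split(paragraph.strip()):
        if START.search(sentence):
            # Assumption: a sentence citing both 116 and 115 is treated as a start.
            if current is None:
                current = []
            current.append(sentence)
        elif STOP.search(sentence):
            # Close the open run, if any; the 115 sentence itself is dropped.
            if current is not None:
                runs.append(" ".join(current))
                current = None
        elif current is not None:
            current.append(sentence)
    if current is not None:
        # The paragraph ended before another 115 sentence appeared.
        runs.append(" ".join(current))
    return runs


example = (
    "BLAH BLAH 26 U.S.C.S. § 115, blah blah blah blah. "
    "BLAH BLAH 26 U.S.C.S. § 116, blah blah blah blah. "
    "Blah blah blah blah. Blah blah blah. "
    "BLAH BLAH 26 U.S.C.S. § 115 blah blah blah blah. Blah. Blah. Blah."
)
print(extract_runs(example))
```

On the example paragraph this prints a one-element list containing "BLAH BLAH 26 U.S.C.S. § 116, blah blah blah blah. Blah blah blah blah. Blah blah blah." For the second paragraph in the question it should return two runs, one per 116 sentence, with the second run continuing to the end of the paragraph because no later 115 sentence stops it.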
One of the issues I'm running into is what you've already mentioned: it's hard to determine where a sentence begins and ends in legal writing with all of the U.S.C.S stuff (sometimes U.S.C.S is in the unstructured text, and sometimes not).
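One possible way around the sentence-boundary problem, sketched in Python: temporarily mask the periods inside dotted initialisms such as "U.S.C.S." before splitting, then restore them. The names `split_sentences`, `INITIALISM` and `MASK` are hypothetical, and the pattern is only a guess at what the real text contains. A known limitation is that a sentence that genuinely ends with an initialism will be merged with the one after it.

```python
import re

MASK = "\x00"  # placeholder character assumed never to occur in the source text

# Assumption: an initialism is two or more "capital letter + period" pairs in a
# row, e.g. "U.S.C.S." or the "U.S.C." part of "U.S.C.S" written without the
# final period.
INITIALISM = re.compile(r"\b(?:[A-Z]\.){2,}")

# Assumption: a sentence ends with ., ! or ? followed by whitespace and a capital.
SENTENCE_BREAK = re.compile(r"(?<=[.!?])\s+(?=[A-Z])")


def split_sentences(text):
    """Split text into sentences without breaking inside dotted initialisms."""
    # Hide the periods that belong to initialisms so the splitter ignores them.
    masked = INITIALISM.sub(lambda m: m.group(0).replace(".", MASK), text)
    parts = SENTENCE_BREAK.split(masked.strip())
    # Put the original periods back.
    return [part.replace(MASK, ".") for part in parts]


sample = "See 26 U.S.C.S. Section 116 for details. Compare U.S.C.S § 115. Done."
for sentence in split_sentences(sample):
    print(sentence)
```

Without the masking step, "U.S.C.S. Section 116" would be cut after "U.S.C.S." because the next word starts with a capital letter. With it, the sample splits into three sentences, and the resulting list can then be filtered for the 116 and 115 citations.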