Trying to parse an outlined document.
The bulk of my rows include [Orig] between the outline and the text I want to parse. I can parse this with a Regex Tool using (\d.+) (\[Orig\] .+) (thanks to the Community). This is NOT my question.
My problem is outlined below. Pun totally intended.
There are a few "reject" rows that lack the [Orig].
How can I insert "[Add]" into those rows so they match the standard configuration and can be parsed with a similar Regex Tool expression, (\d.+) (\[Add\] .+)?
The outline is 1-N, so it's not just a matter of inserting it X positions from the left.
Thanks
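One way to approach the insertion, sketched here in Python's re module rather than in the Regex Tool (whose engine may behave differently): the outline number is the first run of non-space characters, so a tag can be inserted right after it whenever [Orig] does not follow. The function name add_missing_tag and the sample rows are hypothetical, made up for illustration.

```python
import re

# Outline number = leading run of non-whitespace characters.
# The negative lookahead skips rows where "[Orig]" already follows.
MISSING_TAG = re.compile(r"^(\S+)\s+(?!\s*\[Orig\])")

def add_missing_tag(row: str) -> str:
    """Insert "[Add]" after the outline number when "[Orig]" is absent."""
    return MISSING_TAG.sub(r"\1 [Add] ", row, count=1)

# Hypothetical rows shaped like the ones described in the question.
rows = [
    "1.2 [Orig] Some text",
    "10.1.1 Text without the tag",
]
normalized = [add_missing_tag(r) for r in rows]

# After normalization, one expression can cover both tags.
COMBINED = re.compile(r"(\d.+) (\[(?:Orig|Add)\] .+)")
all_match = all(COMBINED.match(r) for r in normalized)
```

Because the match is anchored on the first whitespace rather than a fixed column, the length of the outline number does not matter.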
I might think about this differently. Rather than specifically calling out "Orig" in the regex pattern, I would ask about the spacing.
Are the outline values always consecutive characters with no spacing? If so, then your delimiter isn't the [Orig], it's the space between them.
Your regex string could be ^.+\s.+
Then you wouldn't need the configuration for the [Orig]
So, I tried with marked groups. I want the outline in Col 1 and the Text that either begins or does not begin with [Orig] in Col 2.
( ^.+)\s(.+)
No joy. It did not throw an error, just [Null]s.
I also tried a variation of the expression I am currently using (.+) (\[Orig\] .+), without the "Orig" reference.
(.+) \s (.+)
No joy, more [Null]s.
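A possible explanation for the [Null]s, sketched in Python's re module (the Regex Tool's engine may differ): if the spaces shown in those two patterns were typed literally, each one has to match a space in the data. The sample row is hypothetical.

```python
import re

row = "10.1.1 Some text here"  # hypothetical row with single spaces only

# "( ^.+)" demands a literal space BEFORE the start-of-line anchor,
# which can never be satisfied on a single-line string.
leading_space = re.search(r"( ^.+)\s(.+)", row)

# "(.+) \s (.+)" demands three whitespace characters in a row:
# a space, then \s, then another space.
triple_space = re.search(r"(.+) \s (.+)", row)

# Without the extra spaces the pattern matches, but the greedy ".+"
# backtracks only as far as the LAST space, not the first.
greedy_split = re.search(r"(^.+)\s(.+)", row).groups()
```

So even once the stray spaces are removed, a greedy first group would put most of the text in the first column, which is why splitting on the first whitespace needs a group that cannot cross a space.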
Apologies; regex is sometimes difficult to do off the top of my head without a few test runs.
([^\s]+)\s(.*$)
should work. This would parse the full outline at the start of the string, and everything AFTER the first space, regardless of the presence of [Orig].
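As a quick check of that description, here is a minimal sketch in Python's re module; whether the Regex Tool produces the same result depends on its configuration, which is not verified here. The helper split_row and the sample rows are hypothetical.

```python
import re

# Pattern as quoted in the thread: a run of non-whitespace (the outline),
# one whitespace character, then everything up to the end of the string.
PATTERN = re.compile(r"([^\s]+)\s(.*$)")

def split_row(row: str):
    """Return (outline, text), or None when the row does not match."""
    m = PATTERN.match(row)
    return m.groups() if m else None

with_tag = split_row("1.2 [Orig] Some text")
without_tag = split_row("10.1.1 Text without the tag")
```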
Sorry, still no joy. All [Null]s, and I tried a few mods.
Here is real data.
075-1.9 [Orig] All stainless steel fasteners including bolts, threaded nuts, holes and inserts shall be lubricated with a thin coat of anti-seize lubricant, Tef-Gel or equal, prior to assembly. |
043-10.1.1 Dimensions given in inches and fractions: blah blah. More blah blah blah. |
It works. But what is the significance of "\S" vs. "\s"?
I can decipher...
^ start at the beginning of line
\S ???
+ one or more
\s space - (the space between the outline # and what follows)
. any single character (this is the start of the text)
* zero or more (characters)
$ until the end
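On the "\S" vs. "\s" question: in most regex flavours, including Python's re module used in this sketch, lowercase \s matches one whitespace character and uppercase \S is its negation, any one non-whitespace character, so \S+ behaves like [^\s]+. The full pattern ^(\S+)\s(.*)$ below is assumed from the tokens listed above, and the sample row is hypothetical.

```python
import re

row = "10.1.1 Text without the tag"  # hypothetical row

# \s : each whitespace character in the row (here, the separating spaces).
whitespace = re.findall(r"\s", row)

# \S+ : the first run of NON-whitespace characters, i.e. the outline number.
first_token = re.match(r"\S+", row).group()

# [^\s]+ is the same idea written as a negated character class.
negated_class = re.match(r"[^\s]+", row).group()

# Assumed full pattern: outline, one whitespace, then the rest of the line.
outline, text = re.match(r"^(\S+)\s(.*)$", row).groups()
```

Since \S+ cannot cross a space, the first group stops at the first whitespace no matter how long the outline number is.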