I am trying to parse paragraphs of text that appear in a larger HTML document.
I want to extract the target paragraphs as rows. I assume I need a Multi-Row Formula tool, but I don't know how to write the expression and then parse the result.
other html. |
<br /> |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <br /> |
<br /> |
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. <br /> |
<br /> |
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. <br /> |
other html |
Hi @hellyars
Are the rows coming in this format, with <br /> alone in one row and the whole paragraph in the next?
Could you share a part of your original html document? Where are you getting it from?
Cheers,
OBE
If it's only this:
<br /> |
Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. <br /> |
<br /> |
Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. <br /> |
<br /> |
Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum. <br /> |
You could use a Replace function (<br /> to "") and then filter out the empty records, and you'd have your paragraphs as rows. Plain and simple.
Cheers,
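For anyone who wants to see the replace-and-filter logic spelled out, here is a minimal sketch in Python rather than in Alteryx. It assumes each record is a plain string, and the function name strip_br_and_filter is made up for illustration. It only works when every record is either a <br /> or a target paragraph.

```python
def strip_br_and_filter(rows, marker="<br />"):
    """Remove the marker from every row, then drop rows that end up empty."""
    # Replace the marker with nothing and trim leftover whitespace.
    cleaned = (row.replace(marker, "").strip() for row in rows)
    # Keep only rows that still contain text.
    return [row for row in cleaned if row]


sample = [
    "<br />",
    "Lorem ipsum dolor sit amet. <br />",
    "<br />",
    "Duis aute irure dolor. <br />",
]
paragraphs = strip_br_and_filter(sample)
```

Any unrelated row would pass straight through this version, since nothing here distinguishes a paragraph from other text.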
But it is not only that. There is other HTML before and after the target paragraphs. That's what I tried to represent in the sample table with the first and last records, the ones labeled "other html". In the real data they stand for the hundreds of lines that remain after the initial post-download parse.
I need to isolate the paragraphs. The problem is that the paragraphs don't start with a tag. It's just straight text. I need something that plays off the fact that the target paragraphs are always preceded by a record that contains only <br />. That's how I can isolate the target paragraphs from all the HTML and random text.
Would this work?
Create a flag that marks each row whose previous row is only <br />.
Filter the paragraphs on that flag.
Cheers,
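The flag-then-filter logic can be sketched outside Alteryx as well. The Python below is only an illustration under stated assumptions: records arrive as a list of strings in document order, a marker row contains nothing but <br />, and the function name paragraphs_after_br is hypothetical. In Alteryx the look-back at the previous row would be done by the Multi-Row Formula tool instead.

```python
def paragraphs_after_br(rows, marker="<br />"):
    """Return rows whose preceding row is exactly the marker."""
    result = []
    previous = None  # there is no row before the first one
    for row in rows:
        current = row.strip()
        # Flag: the previous row is only the marker and this row is not a marker itself.
        if previous == marker and current != marker:
            # Drop the trailing marker so only the paragraph text remains.
            result.append(current.replace(marker, "").strip())
        previous = current
    return result


sample = [
    "other html.",
    "<br />",
    "Lorem ipsum dolor sit amet. <br />",
    "<br />",
    "Duis aute irure dolor. <br />",
    "<br />",
    "Excepteur sint occaecat. <br />",
    "other html",
]
paragraphs = paragraphs_after_br(sample)
```

Unlike a plain replace and filter, this skips the surrounding "other html" records because they are not directly preceded by a <br />-only row.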
Cool. I have to remember this little trick. I paired it with another if statement so that I can capture both the standard and non-standard constructs. Thanks!