I've tried the solutions in a myriad of similar posts, but I'm not at all familiar with HTML and am certainly no expert with RegEx, so I'm at a loss trying to figure this out.
I have two fields, prjct.Objective and prjct.Background, that contain HTML (see the samples below), and I only need the text that would actually be displayed in the browser.
In the samples, I replaced that text in each field with the placeholder "I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS".
This text can appear multiple times in each field. I think the overall HTML structure is the same throughout the data set, but it's hard to tell given its size.
prjct.Objective
"<HTML><head><meta http-equiv=""Content-Type"" content=""text/html; charset=utf-8"" /><title>Untitled</title><style type=""text/css"">
p { margin-top: 0px;margin-bottom: 0px;line-height: 1.15; }
body { font-family: 'Segoe UI';font-style: Normal;font-weight: normal;font-size: 13.3333333333333px; }
.Normal { telerik-style-type: paragraph;telerik-style-name: Normal;border-collapse: collapse; }
.TableNormal { telerik-style-type: table;telerik-style-name: TableNormal;border-collapse: collapse; }
.NormalWeb { telerik-style-type: paragraph;telerik-style-name: NormalWeb;margin-top: 6.66px;margin-bottom: 6.66px;border-collapse: collapse; }
.p_A43897F6 { telerik-style-type: local;text-align: left; }
.s_1858219 { telerik-style-type: local;font-family: 'Arial';font-style: Normal;font-weight: normal;font-size: 16px;color: #222222;background-color: #FFFFFF; }
.s_4D7243C3 { telerik-style-type: local;font-family: 'Segoe UI';font-size: 13.3333333333333px;color: #000000; } </style></head><body><p class=""NormalWeb p_A43897F6""><span class=""s_1858219"">I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS</span></p><p class=""Normal ""><span class=""s_4D7243C3""> </span></p></body></HTML>"
prjct.Background
"<HTML><head><meta http-equiv=""Content-Type"" content=""text/html; charset=utf-8"" /><title>Untitled</title><style type=""text/css"">
p { margin-top: 0px;margin-bottom: 0px;line-height: 1.15; }
body { font-family: 'Segoe UI';font-style: Normal;font-weight: normal;font-size: 14.6666666666667px; }
.Normal { telerik-style-type: paragraph;telerik-style-name: Normal;border-collapse: collapse; }
.TableNormal { telerik-style-type: table;telerik-style-name: TableNormal;border-collapse: collapse; }
.p_3207D3C4 { telerik-style-type: local;font-family: 'Verdana';font-style: Normal;font-weight: normal;font-size: 16px;color: #000000; }
.li_8F34398 { telerik-style-type: local;margin-left: 24px;text-indent: 0px;font-family: 'Symbol';font-style: Normal;font-weight: normal;font-size: 14.6666666666667px;color: #000000; }
.s_2CC9B3CB { telerik-style-type: local;font-family: 'Segoe UI';font-size: 14.6666666666667px;color: #000000; } </style></head><body><ul style=""list-style-type:disc""><li value=""1"" class=""li_8F34398""><p class=""Normal p_3207D3C4"">1. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS</p></li></ul><p class=""Normal "">2. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS.</p><p class=""Normal "">3. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS.</p><p class=""Normal ""4. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS</p><p class=""Normal "">5. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS</p><p class=""Normal ""><span class=""s_2CC9B3CB""></span></p></body></HTML>"
Based on the snippets above, the output should look like this:
prjct.ID | prjct.Title | prjct.Code | prjct.ActStartDate | prjct.ActEndDate | prjct.Objective | prjct.Background |
1779 | Alpha | 20200182 | 8/31/2020 | 12/3/2020 | I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS | 1. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS. 2. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS. 3. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS. 4. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS. 5. I NEED THIS PART AND IT CAN BE ANY TEXT OR CHARACTERS. |
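For comparison outside Alteryx, here is a minimal Python sketch of the requirement using the standard library's `html.parser` (a swapped-in technique, not anything used later in this thread). It assumes each field holds ordinary HTML; the doubled quotes (`""`) in the samples look like CSV escaping and would need un-doubling first. It ignores everything inside `<head>`, `<title>`, `<style>` and `<script>`, decodes entities, and joins the remaining text with single spaces. `VisibleTextParser` and `visible_text` are hypothetical names.

```python
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect the text a browser would render, ignoring <head>, <title>, <style> and <script> content."""

    SKIP = {"head", "title", "style", "script"}

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)  # decode &nbsp;, &amp;, &#8217; ... inside text
        self._skip_depth = 0  # > 0 while inside one of the SKIP elements
        self.chunks: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lower-cased, so <HTML>/<HEAD> are handled too.
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Drop whitespace-only runs such as <span> </span>, and anything in a skipped element.
        if text and not self._skip_depth:
            self.chunks.append(text)


def visible_text(html: str) -> str:
    """Return the rendered text of one field, paragraphs joined by single spaces."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)
```

Joining with a space gives roughly the single-line layout of the expected output above, but it won't add the trailing periods shown there.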
Any help is greatly appreciated!
-Craig
For something like this, I usually look for the pattern and then adjust, or build macros around finding the specific tags.
First, I split every tag onto its own line. Then I figure out which HTML comes before and after the text I want. If you're crawling a whole site, the structure usually stays fairly consistent.
You may also get things like Unicode markup that needs converting before you can get rid of it (assuming you don't want it).
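Here is a rough Python sketch of that split-then-check-the-neighbours idea (not the attached workflow itself; `split_tags` and `text_between` are made-up helper names). It puts every tag on its own row, then keeps the text rows whose neighbouring rows are the opening and closing tags you expect. Passing `<p`/`<span` as "before" and `</p>`/`</span>` as "after" covers both sample fields, while the CSS rows (preceded by `<style ...>`) are left out.

```python
import re


def split_tags(html: str) -> list[str]:
    """Put every tag on its own row and drop blank rows."""
    # Surround each <...> tag with newlines, then split into rows.
    spaced = re.sub(r"(<[^>]*>)", r"\n\1\n", html)
    return [row.strip() for row in spaced.splitlines() if row.strip()]


def text_between(rows: list[str], before: tuple[str, ...], after: tuple[str, ...]) -> list[str]:
    """Keep non-tag rows whose previous row starts with one of `before`
    and whose next row starts with one of `after`."""
    hits = []
    for prev, cur, nxt in zip(rows, rows[1:], rows[2:]):
        if not cur.startswith("<") and prev.startswith(before) and nxt.startswith(after):
            hits.append(cur)
    return hits
```

Usage: `text_between(split_tags(field), ("<p", "<span"), ("</p>", "</span>"))`. If other fields wrap their text differently, you only need to adjust the two tuples.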
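If "Unicode markup" here means HTML character references such as `&nbsp;`, `&amp;` or `&#8217;` (my assumption), Python's standard `html.unescape` converts them to real characters instead of deleting them. A minimal sketch; `clean_entities` is a hypothetical name:

```python
import html
import re


def clean_entities(text: str) -> str:
    """Convert HTML character references to real characters, then tidy the whitespace."""
    decoded = html.unescape(text)  # "&amp;" -> "&", "&eacute;" -> "é", "&nbsp;" -> U+00A0
    # In Python 3, \s also matches U+00A0, so non-breaking spaces collapse to ordinary spaces here.
    return re.sub(r"\s+", " ", decoded).strip()
```

Deleting the references outright is the other option; which one is right depends on whether characters like `&` or curly apostrophes matter in your output.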
Let me know if this attached flow makes sense.
AARRGGH... I forgot: my company hasn't deployed the most recent version of Designer desktop. I don't suppose you could save/export it as version 2020.2.3?
Sure. But technically there's no need.
To downgrade a workflow's version (this works as long as every tool used in the workflow is available in both the source and target versions), unzip the package with WinZip, 7-Zip, etc.
Open the .yxmd file in Notepad and change the version number.
Then, click save and it should open just fine.
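If you'd rather script the edit than do it in Notepad, here's a minimal Python sketch. It assumes (not confirmed in this thread) that the version is stored in a `yxmdVer="..."` attribute on the root `<AlteryxDocument>` element, so open your file and check before relying on it; the target string (e.g. "2020.2") should match whatever format your file uses. Only that one value is swapped; every other byte, including line endings, is left alone. `rewrite_version` and `downgrade_yxmd` are hypothetical helper names.

```python
import re
from pathlib import Path

# ASSUMPTION: the workflow version lives in yxmdVer="..." on the root <AlteryxDocument> element.
_VERSION_ATTR = re.compile(rb'<AlteryxDocument\b[^>]*?\byxmdVer="([^"]*)"')


def rewrite_version(data: bytes, target: str) -> tuple[bytes, str]:
    """Return (new_bytes, old_version) with only the yxmdVer value replaced."""
    m = _VERSION_ATTR.search(data)
    if m is None:
        raise ValueError("no yxmdVer attribute found on <AlteryxDocument>")
    new = data[:m.start(1)] + target.encode("ascii") + data[m.end(1):]
    return new, m.group(1).decode("ascii")


def downgrade_yxmd(path: str, target: str) -> str:
    """Edit an unzipped .yxmd in place (keep a backup!) and return the version it used to claim."""
    p = Path(path)
    new, old = rewrite_version(p.read_bytes(), target)
    p.write_bytes(new)
    return old
```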
Thanks for the version trick! That was very helpful. The parsing doesn't quite work, though.
A co-worker was able to help me out.
This worked:
REGEX_Replace(REGEX_Replace(REGEX_Replace(REGEX_Replace([FIELD],'&[^&]*;',''),'p[^*]*; \}',''),'Untitled',''),'<[^>]*>','')
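To make the formula easier to follow, here is a step-by-step Python port (`strip_html` is a hypothetical name). Alteryx evaluates the nested calls innermost first, so the steps run in the order listed. I've passed `re.IGNORECASE` because, as far as I know, Alteryx's REGEX functions ignore case unless `icase` is set to 0; drop the flag if that doesn't match your version.

```python
import re


def strip_html(field: str) -> str:
    """Python port of the nested REGEX_Replace formula, applied innermost-first."""
    flags = re.IGNORECASE  # assumed to mirror Alteryx's default case-insensitive matching
    # 1. Remove HTML entities such as &nbsp; (greedy: runs to the last ';' before the next '&').
    s = re.sub(r"&[^&]*;", "", field, flags=flags)
    # 2. Remove from the first 'p' (in practice the one in "http-equiv") through the
    #    last "; }", i.e. the end of the final CSS rule in the <style> block.
    s = re.sub(r"p[^*]*; \}", "", s, flags=flags)
    # 3. Remove the literal title text.
    s = re.sub(r"Untitled", "", s, flags=flags)
    # 4. Remove every remaining tag, leaving only the text between tags.
    s = re.sub(r"<[^>]*>", "", s, flags=flags)
    return s
```

Both of the first two patterns are greedy. `&[^&]*;` can swallow ordinary text after an entity if that text contains a semicolon, and `p[^*]*; \}` will eat body text if a paragraph itself contains "; }". A tighter entity pattern such as `&#?[a-z0-9]+;` avoids the first problem.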
Hi. I came across this post, and this little formula solved an issue I'd worked on all day yesterday. Thank your colleague for me!