Hi,
I have a text field in SharePoint that I am trying to pull data from. The field includes not only the text but also a bunch of HTML tags and other markup. Is there an easy way to extract just the content from this field and ignore the noise? Below are a few examples, and a sample data set is attached.
Raw Input:
1. <div class="ExternalClass4B42A9A42C6147E282243445F9319F2F"><p>Added more exception hadling logic for unanamous votes<br></p></div>
2. <div class="ExternalClassB558E6964BAC4AC1A6A4A7B40B0C2DE1"><p></p><ul style="list-style-type:disc;"><li>Fixed issue on multiple contacts for Trustees (greater than 2)</li><li>Added middle names for Plan Sponsor/Plan Admin</li><li>Added logic for using City and Branch ID to narrow down single result</li><li>Fixed an issue on plan effective date not within range (happened when plan effective date is next year and recordkeeping start date is on current year)<br></li></ul><p><br></p></div>
3. <div class="ExternalClass14CED75ED35F4E04BB911A9D639105D4"><p><span style="font-family:arial, sans-serif;font-size:10pt;">Added
Environment variable for processing time</span><br></p></div>
Desired Output:
1. Added more exception hadling logic for unanamous votes
2. Fixed issue on multiple contacts for Trustees (greater than 2), Added middle names for Plan Sponsor/Plan Admin, Added logic for using City and Branch ID to narrow down single result, Fixed an issue on plan effective date not within range (happened when plan effective date is next year and recordkeeping start date is on current year)
3. Added Environment variable for processing time
Thanks for your help.
Hi @mclane20 ,
this requires a bit of regex wizardry.
Regex makes me happy.
I've attached the workflow, and you need to apply a bit of logic to clean up the reserved characters, but you're 90% of the way there.
Basically, you simply need to replace every tag (everything from an opening < to its closing >) with nothing.
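For anyone reading without the attached workflow, the same idea can be sketched outside Alteryx. This is only a rough Python stand-in, not the workflow itself: the function name strip_tags and the rule that joins list items with ", " are assumptions made to match the desired output in the question.

```python
import re

# Matches any single tag: "<", then anything that is not ">", then ">".
TAG_RE = re.compile(r"<[^>]+>")
# Matches the boundary between two list items, e.g. "</li><li>".
LI_BOUNDARY_RE = re.compile(r"</li>\s*<li[^>]*>")


def strip_tags(raw: str) -> str:
    """Remove markup tags and return the remaining text on one line."""
    # Assumption: list items should be joined with ", " as in the desired output.
    text = LI_BOUNDARY_RE.sub(", ", raw)
    # Replace every remaining tag with nothing.
    text = TAG_RE.sub("", text)
    # Collapse line breaks and repeated spaces left behind by the markup.
    return " ".join(text.split())


sample = '<div class="ExternalClass14CED75ED35F4E04BB911A9D639105D4"><p><span style="font-family:arial, sans-serif;font-size:10pt;">Added\nEnvironment variable for processing time</span><br></p></div>'
print(strip_tags(sample))
```

A pattern like this is fine for flat, predictable markup such as these SharePoint fields; it is not a general HTML parser and will misbehave if a > appears inside an attribute value.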
Hope this helps,
M.
Hi @mclane20 ,
I've attached a further version which removes the &# instances as well.
For the remaining reserved characters, such as &amp;, I would keep a list of them in a Text Input tool and pair that with the Find and Replace tool.
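As a point of comparison with the lookup-table approach, here is a minimal Python sketch of the same cleanup step. It is not part of the attached workflow; decode_entities is a hypothetical helper name, and treating non-breaking and zero-width spaces as noise is an assumption about what SharePoint fields typically contain.

```python
import html


def decode_entities(text: str) -> str:
    """Turn HTML entities such as &amp; or &#58; back into plain characters."""
    # html.unescape handles both named (&amp;) and numeric (&#58;) references.
    decoded = html.unescape(text)
    # Assumption: non-breaking spaces become ordinary spaces and
    # zero-width spaces are dropped, since neither is wanted in the output.
    return decoded.replace("\u00a0", " ").replace("\u200b", "")


print(decode_entities("Plan Sponsor &amp; Plan Admin&#58; updated"))
```

Run this after the tags have been removed, so that a decoded < or > is not mistaken for markup.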
M.
This works perfectly. Thanks so much for your help. Regex really is so powerful.