Hi all,
I have a string in the following format:
<something modificationdate="D:20210316053656-07'00'" name="abcdefg-53abc-321" title="Person (ABC)" coords="39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150" subject="2021 test">
I am trying to extract each attribute and its value (the text between the double quotes), but I have struggled to get a clean output with either Text to Columns or RegEx. I'm still fairly new to both, including tokenize and the various expressions, so I thought I'd reach out to the experts. Note that some fields, such as subject, don't always exist in the dataset. I also considered using the Formula tool, e.g. finding name=" and returning everything after it up to the next quotation mark, but I have yet to find a working solution.
I can extract all data in quotation marks using RegEx:
"(.*?)"
but I'm not sure of the cleanest way to also capture the attribute name just before each value.
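For reference, one pattern that might capture both parts is (\w+)="(.*?)", with the attribute name in the first group and the value in the second. Below is a minimal Python sketch I put together only to check the idea outside of my workflow. The variable names are mine, and it assumes attribute names consist of word characters only and values never contain a double quote:

```python
import re

# Sample string from above (single quotes inside the date are escaped for Python).
sample = (
    '<something modificationdate="D:20210316053656-07\'00\'" '
    'name="abcdefg-53abc-321" title="Person (ABC)" '
    'coords="39.018093,729.771500,221.102520,729.771500,'
    '39.018093,688.728150,221.102520,688.728150" subject="2021 test">'
)

# Group 1: attribute name. Group 2: everything up to the next double quote.
ATTR_PATTERN = re.compile(r'(\w+)="(.*?)"')

# findall returns a list of (name, value) tuples, in order of appearance.
pairs = ATTR_PATTERN.findall(sample)

for attr, value in pairs:
    print(f"{attr} | {value}")
```

The tag name itself (something) is skipped because it is not followed by =". I don't know whether the same pattern translates cleanly to the RegEx tool's parse or tokenize modes.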
To clarify, the output would ideally look like this (a pipe separates each attribute name from its quoted value):
modificationdate | D:20210316053656-07'00'
name | abcdefg-53abc-321
title | Person (ABC)
coords | 39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150
subject | 2021 test (if present, otherwise null)
The output can be either rows of data (one row per attribute, with a name column and a value column) or columns (e.g. a column named title whose value for that row is Person (ABC)).
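To make the two layouts concrete, here is a rough Python sketch of what I mean. The function names and the EXPECTED list are just my own illustration, and it assumes values never contain a double quote. A missing attribute such as subject comes back as None (null) in the column layout:

```python
import re

# Attributes I expect to see; any of them may be missing from a given string.
EXPECTED = ["modificationdate", "name", "title", "coords", "subject"]

ATTR_PATTERN = re.compile(r'(\w+)="(.*?)"')

def to_rows(tag):
    """Row layout: one (attribute, value) pair per attribute found."""
    return ATTR_PATTERN.findall(tag)

def to_columns(tag):
    """Column layout: one entry per expected attribute, None if absent."""
    found = dict(ATTR_PATTERN.findall(tag))
    return {col: found.get(col) for col in EXPECTED}

# Hypothetical shortened record with no subject attribute.
no_subject = '<something name="abcdefg-53abc-321" title="Person (ABC)">'

print(to_rows(no_subject))
print(to_columns(no_subject))
```

Either layout would work for me. The row layout simply omits missing attributes, while the column layout keeps a null placeholder for them.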
Thanks in advance!
P.S. I think I may be overthinking this, so any help would be appreciated.