Extract attribute and the value between two quotation marks from a string
Hi all,
I have a string which has the format:
<something modificationdate="D:20210316053656-07'00'" name="abcdefg-53abc-321" title="Person (ABC)" coords="39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150" subject="2021 test">
I am trying to extract each attribute name together with its value (the text between the two double quotes) and have struggled to get a clean output, either through Text to Columns or RegEx (e.g. tokenize or various expressions), both of which I'm still pretty new to, so I thought I'd reach out to the experts. I should note that some fields, e.g. subject, don't always exist in the dataset. My other thought was to use the Formula tool, e.g. find name=" and then return everything after it up to the next quotation mark, but I have yet to find a successful solution.
I can extract all data in quotation marks using RegEx:
"(.*?)"
but am not quite sure how to get the attribute just before that in the cleanest way.
To clarify, the output would hopefully be (for each column with a pipe separating the value between quotes):
modificationdate | D:20210316053656-07'00'
name | abcdefg-53abc-321
title | Person (ABC)
coords | 39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150
subject | 2021 test (if present or null if not)
The output can be rows of data e.g. data type and data value, or the column name being e.g. title and its value (for that row) being Person (ABC).
Thanks in advance!
p/s: I think I may be overthinking this, so help would be appreciated.
Thanks @PhilipMannering! That has to be a record for the quickest solution!
I hadn't thought to use the spaces as the initial split and this RegEx looks nifty.
(\w+)="(.*?)"
I think this means the following?
(\w+)= equals ANYWORD=
"(.*?)" equals any value between two double quotes
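For anyone who wants to test the pattern outside Alteryx, here is a minimal sketch in Python using the same expression on the sample string from the question. The function name `extract_attributes` is made up for illustration, and this is not the workflow from the accepted solution, just the regex applied directly to the whole string.

```python
import re

# Sample string from the original question.
s = '''<something modificationdate="D:20210316053656-07'00'" name="abcdefg-53abc-321" title="Person (ABC)" coords="39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150" subject="2021 test">'''

# Group 1: attribute name (word characters before the "=").
# Group 2: everything between the next pair of double quotes (non-greedy).
PAIR = re.compile(r'(\w+)="(.*?)"')

def extract_attributes(text):
    """Return a dict mapping each attribute name to its quoted value."""
    return dict(PAIR.findall(text))

attrs = extract_attributes(s)

# Print one "name | value" row per expected field; .get() gives None
# for a field (such as subject) that is absent from a given record.
for key in ("modificationdate", "name", "title", "coords", "subject"):
    print(key, "|", attrs.get(key))
```

Because the result is a dictionary, a record without a subject attribute simply returns None for that key, which covers the "null if not present" case.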
@flick You'd be surprised...
Yeah, pretty much. Specifically,
The brackets specify what we're capturing.
\w+ is one or more word characters (letters, digits or underscores)
"(.*?)" is, like you say, anything between quotation marks. The "?" makes it 'non-greedy'. That means it stops at the next quotation mark (as opposed to matching everything between the first and the very last quotation mark). I don't think it makes a difference in this case.
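To see what non-greedy changes in practice, here is a small Python sketch comparing the two forms. The shortened sample string is an assumption made for readability. Whether greediness matters depends on the input: on a piece that holds a single attribute both patterns agree, but on a string with several attributes the greedy version runs through to the last quotation mark.

```python
import re

# Shortened version of the sample string (two attributes only).
s = '<something name="abcdefg-53abc-321" title="Person (ABC)">'

# Non-greedy: each value stops at the next double quote.
lazy = re.findall(r'(\w+)="(.*?)"', s)

# Greedy: the value runs from the first opening quote to the last quote.
greedy = re.findall(r'(\w+)="(.*)"', s)

print(lazy)    # two separate (name, value) pairs
print(greedy)  # one pair whose value swallows the rest of the string

# On a piece that contains just one attribute, both forms agree.
single = 'name="abcdefg-53abc-321"'
print(re.findall(r'(\w+)="(.*?)"', single) == re.findall(r'(\w+)="(.*)"', single))
```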
Thanks @PhilipMannering for the additional clarification! 🙂
