Alteryx Designer Desktop Discussions

SOLVED

Extract attribute and the value between two quotation marks from a string

flick
5 - Atom

Hi all,

 

I have a string which has the format: 

 

<something modificationdate="D:20210316053656-07'00'" name="abcdefg-53abc-321" title="Person (ABC)" coords="39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150" subject="2021 test">

 

I am trying to extract each attribute name and its value (the text between the two double quotes), but have struggled to get a clean output with either Text To Columns or RegEx. I'm still pretty new to both, e.g. tokenize and the various expressions, so I thought I'd reach out to the experts. I should note that some fields, e.g. subject, don't always exist in the dataset. My other thought was to use the Formula tool, e.g. find name=" and then return the bit after it up to the next quotation mark, but I have yet to find a successful solution.

 

I can extract all data in quotation marks using RegEx: 

"(.*?)"

but am not quite sure how to get the attribute just before that in the cleanest way.

 

To clarify, the output would hopefully be as follows (with a pipe separating each attribute name from the value between its quotes):

modificationdate | D:20210316053656-07'00'

name | abcdefg-53abc-321

title | Person (ABC)

coords | 39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150

subject | 2021 test    (if present or null if not)

 

The output can be rows of data e.g. data type and data value, or the column name being e.g. title and its value (for that row) being Person (ABC).

 

Thanks in advance!

 

p/s: I think I may be overthinking this, so help would be appreciated.

4 REPLIES
PhilipMannering
16 - Nebula

Hi @flick 

 

Perhaps something like this,

[Attached screenshot: PhilipMannering_0-1620070477930.png]

 

flick
5 - Atom

Thanks @PhilipMannering! That has to be a record for the quickest solution!

 

I hadn't thought to use the spaces as the initial split and this RegEx looks nifty. 

 

(\w+)="(.*?)"

 

I think this means the following?

 

(\w+)=  equals ANYWORD=

"(.*?)"  equals any value between two double quotes

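For readers who want to try the pattern outside Alteryx, here is a minimal sketch in Python's `re` module, assuming the expression behaves the same way there as in the RegEx tool. The sample string is the one from the question; `extract_attributes` and `EXPECTED_FIELDS` are hypothetical names made up for this illustration.

```python
import re

# Sample string from the original question.
SAMPLE = """<something modificationdate="D:20210316053656-07'00'" name="abcdefg-53abc-321" title="Person (ABC)" coords="39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150" subject="2021 test">"""

# (\w+)  captures the attribute name
# ="     matches the literal equals sign and the opening quote
# (.*?)  captures the value, lazily, up to the closing quote
ATTRIBUTE_PATTERN = re.compile(r'(\w+)="(.*?)"')


def extract_attributes(text):
    """Return a dict mapping each attribute name to its quoted value."""
    return dict(ATTRIBUTE_PATTERN.findall(text))


attributes = extract_attributes(SAMPLE)

# Hypothetical list of expected fields; a field that is absent from the
# string (e.g. subject) comes back as None.
EXPECTED_FIELDS = ["modificationdate", "name", "title", "coords", "subject"]
for field in EXPECTED_FIELDS:
    print(f"{field} | {attributes.get(field)}")
```

Looking fields up with `.get` is one way to cover the case where subject is missing, since it returns `None` instead of raising an error.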
PhilipMannering
16 - Nebula

@flick You'd be surprised...

 

Yeah, pretty much. Specifically,

The brackets specify what we're capturing.

\w+ is 1 or more alphanumeric characters (a letter, number or underscore)

"(.*?)" is, like you say, anything between quotation marks. The "?" makes it 'non-greedy'. That means that it stops at the second quotation mark, as opposed to finding everything between the first and the very last quotation mark. I don't think it makes a difference in this case.

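The greedy versus non-greedy behaviour can be checked with a short Python sketch, again assuming Python's `re` module treats these patterns like the Alteryx RegEx tool does. The two test strings are shortened from the question. On a string holding a single attribute both patterns agree, which fits the case where the data has already been split to one attribute per row; on a string with several attributes the greedy version runs through to the last quotation mark.

```python
import re

LAZY = re.compile(r'(\w+)="(.*?)"')   # non-greedy: stops at the next quote
GREEDY = re.compile(r'(\w+)="(.*)"')  # greedy: runs to the last quote

# One attribute per string, e.g. after splitting the data into rows.
single = 'name="abcdefg-53abc-321"'
# Two attributes left in the same string.
several = 'name="abcdefg-53abc-321" title="Person (ABC)"'

lazy_single = LAZY.findall(single)
greedy_single = GREEDY.findall(single)
lazy_several = LAZY.findall(several)
greedy_several = GREEDY.findall(several)

print(lazy_single == greedy_single)  # same result on a single attribute
print(lazy_several)                  # two separate name/value pairs
print(greedy_several)                # one pair that swallows the second attribute
```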
 

 

flick
5 - Atom

Thanks @PhilipMannering for the additional clarification! 🙂
