Alteryx Designer Desktop Discussions

flick · ‎05-03-2021

Hi all,

I have a string which has the format:

I am trying to extract the various attributes (or values between the two double quotes) for all of these, and have struggled to get a clean output through either Text to Columns or RegEx. I'm still pretty new to both (e.g. tokenize or the various expressions), so I thought I'd reach out to the experts. I should note that some fields, e.g. subject, don't always exist in the dataset. My thought was to use the Formula tool as well, e.g. find name=" and then return the bit after it up to the next quotation mark, but I have yet to find a successful solution.

I can extract all data in quotation marks using RegEx:

"(.*?)"

but am not quite sure how to get the attribute just before that in the cleanest way.
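To make the gap concrete, here is a minimal sketch in Python (the `re` module is used purely for illustration; the thread itself is about the Alteryx RegEx tool). The `sample` string is hypothetical: the original input is not shown in the post, so it is assembled from a few of the values in the expected output below. The pattern returns the quoted values but drops the attribute names in front of them.

```python
import re

# Hypothetical sample: the real input string is not shown in the post,
# so this one is built from values listed in the expected output.
sample = 'name="abcdefg-53abc-321" title="Person (ABC)" subject="2021 test"'

# "(.*?)" captures only the text between each pair of double quotes.
values = re.findall(r'"(.*?)"', sample)

print(values)  # ['abcdefg-53abc-321', 'Person (ABC)', '2021 test']
```

The attribute names (name, title, subject) are matched by nothing in the pattern, which is why they are missing from the result.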

To clarify, the output would hopefully be (for each column with a pipe separating the value between quotes):

modificationdate | D:20210316053656-07'00'

name | abcdefg-53abc-321

title | Person (ABC)

coords | 39.018093,729.771500,221.102520,729.771500,39.018093,688.728150,221.102520,688.728150

subject | 2021 test (if present or null if not)

The output can be rows of data e.g. data type and data value, or the column name being e.g. title and its value (for that row) being Person (ABC).

Thanks in advance!

p/s: I think I may be overthinking this, so help would be appreciated.

PhilipMannering · ‎05-03-2021

Hi @flick

Perhaps something like this,

flick · ‎05-03-2021

Thanks @PhilipMannering! That has to be a record for the quickest solution!

I hadn't thought to use the spaces as the initial split and this RegEx looks nifty.

(\w+)="(.*?)"

I think this means the following?

(\w+)= equals ANYWORD=

"(.*?)" equals any value between two double quotes
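A quick way to check that reading is to run the same pattern outside Alteryx. The sketch below uses Python's `re` module as a stand-in, on the assumption that this simple pattern behaves the same way in most Perl-style regex flavours. The `sample` string is hypothetical (the original input is not shown in the thread), its coords value is shortened, and subject is left out on purpose to show the "null if not present" case.

```python
import re

# Hypothetical sample built from values in the expected output;
# coords is shortened and subject is deliberately absent.
sample = 'name="abcdefg-53abc-321" title="Person (ABC)" coords="39.018093,729.771500"'

# Group 1: the attribute name (one or more word characters before =).
# Group 2: the value between the double quotes, matched non-greedily.
pairs = re.findall(r'(\w+)="(.*?)"', sample)

# One row per attribute: data type | data value.
for attribute, value in pairs:
    print(f"{attribute} | {value}")

# One column per attribute; a missing attribute comes back as None (null).
record = dict(pairs)
subject = record.get("subject")
print("subject |", subject)
```

Each match yields an (attribute, value) pair, so the same result can be laid out either as rows or as one column per attribute.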

PhilipMannering · ‎05-03-2021

@flick You'd be surprised...

Yeah, pretty much. Specifically,

The brackets specify what we're capturing.

\w+ is 1 or more alphanumeric characters (a letter, number or underscore)

"(.*?)" is, like you say, anything between quotation marks. The "?" makes it 'non-greedy'. That means that it stops at the second quotation mark (as opposed to finding everything between the first and very last quotation mark). I don't think it makes a difference in this case.
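The greedy versus non-greedy difference is easy to see side by side. This is a small sketch in Python's `re` module (an illustration only, not the Alteryx tool itself), using two made-up strings: one with two quoted values and one with a single quoted value.

```python
import re

# Made-up strings for illustration only.
two_values = 'name="abc" title="Person (ABC)"'
one_value = 'name="abc"'

# Non-greedy: .*? stops at the next closing quotation mark.
lazy = re.findall(r'"(.*?)"', two_values)

# Greedy: .* runs on to the very last quotation mark in the string.
greedy = re.findall(r'"(.*)"', two_values)

print(lazy)    # ['abc', 'Person (ABC)']
print(greedy)  # ['abc" title="Person (ABC)']

# With only one quoted value in the string, both forms agree.
lazy_single = re.findall(r'"(.*?)"', one_value)
greedy_single = re.findall(r'"(.*)"', one_value)
print(lazy_single == greedy_single)  # True
```

So the "?" only matters when a single string holds more than one quoted value; when each piece contains just one, the two forms return the same thing.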

flick · ‎05-03-2021

Thanks @PhilipMannering for the additional clarification! 🙂

Extract attribute and the value between two quotation marks from a string
