Alteryx Designer Desktop Discussions

gtg925j · ‎10-25-2021

Hi everyone,

I am pretty new to Alteryx and to the community. The community has been a great resource for learning, but I can't seem to apply other examples to this case. I could use some help with parsing an XML file. There is one column with all the text between the <p> tags in each row. I'd like to pull out the section numbers and their titles into separate columns. It looks like the chapters go down to 4 levels max. Thanks for your help!

Here is an example of what the row contains:

1.0 Main Chapter Title

1.1 Subtitle 1

1.1.1 Subtitle 2

1.1.1.1 Subtitle 3

MarqueeCrew · ‎10-25-2021

@gtg925j ,

If the data looks like above, then:

([\d\.]+)\s(.*)

Use the RegEx tool (Output Method set to Parse) with that expression.

That will get you the results (I think)
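
For anyone who wants to check the expression outside Designer, here is a minimal Python sketch that applies it to the four sample lines. It assumes Python's `re` module handles these simple character classes the same way the RegEx tool does; the helper name `parse_section` is made up for illustration.

```python
import re

# Same expression as above: group 1 = run of digits/periods, group 2 = rest of the line.
PATTERN = re.compile(r"([\d\.]+)\s(.*)")

def parse_section(line):
    """Hypothetical helper: return (section, title) for the first match, or None."""
    m = PATTERN.search(line)
    return (m.group(1), m.group(2)) if m else None

samples = [
    "1.0 Main Chapter Title",
    "1.1 Subtitle 1",
    "1.1.1 Subtitle 2",
    "1.1.1.1 Subtitle 3",
]

for line in samples:
    print(parse_section(line))
```

Each sample line should come back as a (section, title) pair, which corresponds to the two output columns the Parse method creates.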

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please subscribe to my YouTube channel.

gtg925j · ‎10-25-2021

Thanks, Mark, for the quick reply! This is definitely a step in the right direction. It works for all of the lines with sections, but it also matches rows that are not section headings: it pulls out the first number in a line as the section and the following text as the title, or it treats the first period in the row as the section and any text after it as the title.

For example:

March 15, 1997 some more text

Pages 55-57 appendix

some company, INC. 1234 Address

full sentence.

see attachment 14.

Results in:

Section	Title
1998	Some other text here
1997	Some more text
57	appendix
.	1234 Address
.
14.
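
The stray matches seem to come from the expression not being anchored: it accepts the first run of digits or periods followed by whitespace anywhere in the row. A small Python sketch of that behaviour, assuming first-match-anywhere semantics like the results above suggest (`first_match` is a made-up helper name):

```python
import re

# Unanchored expression from the earlier reply.
PATTERN = re.compile(r"([\d\.]+)\s(.*)")

def first_match(row):
    """Hypothetical helper: first (section, title) found anywhere in the row, or None."""
    m = PATTERN.search(row)
    return (m.group(1), m.group(2)) if m else None

rows = [
    "March 15, 1997 some more text",    # "15" is skipped because a comma follows it
    "Pages 55-57 appendix",             # "55" is skipped because a hyphen follows it
    "some company, INC. 1234 Address",  # the lone period counts as a "section"
]

for row in rows:
    print(row, "->", first_match(row))
```

Rows that end in a period, such as "full sentence.", only match this expression when some whitespace (for example a trailing space or line break) follows the period, so the last two result rows hint that the source text carries trailing whitespace.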

Thanks again for your input!

Josh

MarqueeCrew · ‎10-25-2021

Is it possible to post sample data that contains all forms of inputs so that a PATTERN can be returned that does the right thing to the right records? Given your original input, that formula appears to work.

Cheers,

Mark


apathetichell · ‎10-25-2021

I don't think anything is going to be perfect, but perhaps anchor the pattern to the start of the field: regex_replace([field],"^([\d\.]+\s)","")
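
To illustrate what anchoring changes, here is a rough Python sketch. `strip_section` mirrors the regex_replace call above using `re.sub`. `split_row` goes one step further with a stricter pattern that is my own assumption, not something from the thread: a section is 2 to 4 dot-separated integers at the very start of the row. Both function names are hypothetical.

```python
import re

# Assumption: a section number is 2 to 4 dot-separated integers (1.0 through 1.1.1.1)
# and must start the row.
SECTION = re.compile(r"^(\d+(?:\.\d+){1,3})\s+(.*)$")

def split_row(row):
    """Return (section, title); section is None when the row is not a heading."""
    text = row.strip()
    m = SECTION.match(text)
    if m:
        return m.group(1), m.group(2)
    return None, text

def strip_section(row):
    """Python stand-in for regex_replace([field], "^([\\d\\.]+\\s)", "")."""
    return re.sub(r"^([\d\.]+\s)", "", row)

for row in ["1.1.1 Subtitle 2", "March 15, 1997 some more text", "see attachment 14."]:
    print(split_row(row), "|", strip_section(row))
```

Neither version is perfect: body text that happens to begin with something like "3.5 inches" would still be read as a section heading, so a sample of the real data is needed to tighten the pattern further.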
