
Alteryx Designer Desktop Discussions


RegEx Question

gtg925j
5 - Atom

Hi everyone,

 

I am pretty new to Alteryx and to the community. The community has been a great resource for learning, but I can't seem to apply other examples to this case. I could use some help with parsing an XML file. There is one column that holds all the text between the <p> tags, one row per tag. I'd like to pull out the section numbers and their titles into separate columns. It looks like the chapters go down to 4 levels max. Thanks for your help!

 

Here is an example of what the row contains:

 

1.0 Main Chapter Title

1.1 Subtitle 1

1.1.1 Subtitle 2

1.1.1.1 Subtitle 3

MarqueeCrew
20 - Arcturus

@gtg925j ,

 

If the data looks like above, then:

 

([\d\.]+)\s(.*)

 

Use the RegEx tool (set to Parse) with that expression.

That should get you the results (I think).
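
To see what the two capture groups return on the sample rows, here is a minimal sketch in Python. It is not Alteryx code: `re.search` is used as a stand-in for the RegEx tool's Parse mode, which is an assumption, and `parse_row` is a hypothetical helper name.

```python
import re

# Pattern from the reply: a run of digits and dots, one whitespace
# character, then the rest of the row.
PATTERN = re.compile(r"([\d\.]+)\s(.*)")

# Sample rows from the original question.
rows = [
    "1.0 Main Chapter Title",
    "1.1 Subtitle 1",
    "1.1.1 Subtitle 2",
    "1.1.1.1 Subtitle 3",
]

def parse_row(text):
    """Return (section, title), or (None, None) when nothing matches."""
    m = PATTERN.search(text)  # finds the first match anywhere in the row
    return (m.group(1), m.group(2)) if m else (None, None)

parsed = [parse_row(r) for r in rows]
for section, title in parsed:
    print(section, "|", title)
```

Because the search is not anchored to the start of the row, a number or period that appears later in a row can also match.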

 

Cheers,

 

Mark

 

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.
gtg925j
5 - Atom

Thanks, Mark, for the quick reply! This is definitely a step in the right direction. It works for all of the lines with sections, but on other lines it also pulls out the first number it finds as the section and the text after it as the title, or it matches a period and takes whatever follows it in the row.

 

For example:

Copyright 1998 some other text here

March 15, 1997 some more text

Pages 55-57 appendix 

some company, INC. 1234 Address

full sentence. 

see attachment 14.

 

Results in:

Section    Title
1998       Some other text here
1997       Some more text
57         appendix
.          1234 Address
.          (empty)
14.        (empty)

 

Thanks again for your inputs!

 

Josh

MarqueeCrew
20 - Arcturus

Could you post sample data that contains every form of input, so that a pattern can be written that does the right thing for the right records? Given your original input, that expression appears to work.

 

Cheers,

 

Mark

apathetichell
20 - Arcturus

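I don't think anything is going to be perfect, but perhaps anchor the pattern to the start of the row. This strips a leading section number and leaves the title:

regex_replace([field],"^([\d\.]+\s)","")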
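
A rough Python sketch of the anchored idea, for checking it against the problem rows from earlier in the thread. `re.sub` stands in for `regex_replace` here, which is an assumption about equivalent behavior. The stricter `SECTION_LINE` pattern and both helper names are hypothetical and not from the thread; the pattern assumes sections are 1 to 4 dot-separated numbers at the very start of a row.

```python
import re

# Anchored variant from the reply: only strip a run of digits/dots
# (plus one whitespace character) at the very start of the row.
STRIP_SECTION = re.compile(r"^([\d\.]+\s)")

# Hypothetical stricter pattern (an assumption, not from the thread):
# 1 to 4 dot-separated numeric levels at the start, then the title.
SECTION_LINE = re.compile(r"^(\d+(?:\.\d+){0,3})\s+(.*)$")

def title_only(text):
    """Remove a leading section number; other rows pass through unchanged."""
    return STRIP_SECTION.sub("", text)

def split_section(text):
    """Return (section, title); section is None when the row has no section."""
    m = SECTION_LINE.match(text)
    return (m.group(1), m.group(2)) if m else (None, text)

for row in ["1.1.1 Subtitle 2", "Copyright 1998 some other text here",
            "Pages 55-57 appendix", "see attachment 14."]:
    print(split_section(row))
```

As the reply says, this is not perfect: a non-section row that happens to begin with a number, such as a row starting with a year, would still be treated as a section.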
