Hi everyone,
I am pretty new to Alteryx and to the community. The community has been a great resource for learning, but I can't seem to apply other examples to this case. I could use some help with parsing an XML file. There is one column with all the text between the <p> tags in each row. I'd like to pull out the section numbers and their titles into separate columns. It looks like the chapters go down to 4 levels max. Thanks for your help!
Here is an example of what the row contains:
1.0 Main Chapter Title
1.1 Subtitle 1
1.1.1 Subtitle 2
1.1.1.1 Subtitle 3
@gtg925j,
If the data looks like above, then:
([\d\.]+)\s(.*)
Use the RegEx tool (set to Parse) with that expression. The first group captures the section number and the second captures the title.
That will get you the results (I think).
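For anyone who wants to check the expression outside Alteryx, here is a minimal Python sketch of the same parse. It assumes the RegEx tool behaves like an unanchored search that returns the first match; the function name parse_line is made up for illustration.

```python
import re

# Same expression as above: a run of digits/dots, one whitespace character, then the rest.
PATTERN = re.compile(r"([\d\.]+)\s(.*)")

def parse_line(line):
    """Return (section, title), or (None, None) when the pattern is not found."""
    match = PATTERN.search(line)
    if match is None:
        return None, None
    return match.group(1), match.group(2)

rows = [
    "1.0 Main Chapter Title",
    "1.1 Subtitle 1",
    "1.1.1 Subtitle 2",
    "1.1.1.1 Subtitle 3",
]
parsed = [parse_line(row) for row in rows]
for section, title in parsed:
    print(section, "|", title)
```

On the four sample rows this splits each line at the first space after the section number.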
Cheers,
Mark
Thanks, Mark, for the quick reply! This is definitely a step in the right direction. It works for all of the lines with sections, but on other lines it also pulls out the first number in the line as the section and the text after it as the title, or it treats a period as the section and any text behind it as the title.
For example:
Copyright 1998 some other text here
March 15, 1997 some more text
Pages 55-57 appendix
some company, INC. 1234 Address
full sentence.
see attachment 14.
Results in:
Section | Title |
1998 | Some other text here |
1997 | Some more text |
57 | appendix |
. | 1234 Address |
. | |
14. | |
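The first four results above can be reproduced with a short Python sketch. It assumes the parse acts as an unanchored search, so the expression latches onto the first run of digits or dots that is followed by whitespace, wherever it sits in the line. The helper name first_match is hypothetical.

```python
import re

# The expression from the earlier reply, with no start-of-line anchor.
PATTERN = re.compile(r"([\d\.]+)\s(.*)")

def first_match(line):
    """Return (section, title) for the first match anywhere in the line, else None."""
    m = PATTERN.search(line)
    return (m.group(1), m.group(2)) if m else None

lines = [
    "Copyright 1998 some other text here",
    "March 15, 1997 some more text",
    "Pages 55-57 appendix",
    "some company, INC. 1234 Address",
]
results = {line: first_match(line) for line in lines}
for line, hit in results.items():
    print(repr(line), "->", hit)
```

In "March 15, 1997" the "15" is skipped because a comma follows it, and in "Pages 55-57" only "57" is followed by a space, which is why those particular numbers end up in the Section column.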
Thanks again for your input!
Josh
Is it possible to post sample data that contains all forms of input, so that a pattern can be written that does the right thing to the right records? Given your original input, that expression appears to work.
Cheers,
Mark
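I don't think anything is going to be perfect, but perhaps anchor the pattern to the start of the line so that only a leading section number is matched. This strips the section number and leaves the title: regex_replace([field],"^([\d\.]+\s)","")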
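A rough Python sketch of the anchored idea, for testing outside Alteryx. The first function mirrors the regex_replace above. The second is a stricter variant that is my own assumption, not something confirmed in this thread: it requires two to four dot-separated numeric levels (1.0 through 1.1.1.1) at the very start of the row. The names strip_section and split_section are hypothetical.

```python
import re

# Mirrors the anchored replace: drop a leading run of digits/dots plus one whitespace character.
LEADING = re.compile(r"^([\d\.]+\s)")

# Assumption: a section number has 2 to 4 numeric levels, e.g. 1.0 or 1.1.1.1.
SECTION = re.compile(r"^(\d+(?:\.\d+){1,3})\s+(.*)$")

def strip_section(line):
    """Return the line with a leading section number removed, if there is one."""
    return LEADING.sub("", line)

def split_section(line):
    """Return (section, title); section is empty when the row has no section number."""
    m = SECTION.match(line)
    if m:
        return m.group(1), m.group(2)
    return "", line

for row in ["1.1.1 Subtitle 2", "Copyright 1998 some other text here", "see attachment 14."]:
    print(split_section(row), "|", strip_section(row))
```

Neither version is perfect: a body line that happens to start with something like "14.5 percent" would still be treated as a section, so sample data covering every form of input is still the best way to settle the final pattern.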