
Alteryx Designer Desktop Discussions

SOLVED

Parsing with Regex - A newbie dilemma 2

l_blumberger
7 - Meteor

Hi everyone, 


I am attempting to do some web scraping to save some time when it comes to updating a database I created, but I have never used RegEx before. I am wondering if/how I can parse out "Text" from the following:

 

<a href="https://www.Example.text.2089A.html">

<span class="icon="></span>
Text

</a>

 

 

I hope my question makes sense. I can't seem to find many useful resources for learning this in Alteryx.


Thanks!

3 REPLIES
jdunkerley79
ACE Emeritus

I would suggest trying something like the Regex Coach (http://www.weitz.de/regex-coach/) to help write the RegEx.

 

Alteryx supports the standard Perl syntax, so you can find various resources for this online.

 

Looking at your specific text, something like:

<a[^>]*>\s*(<span[^>]*>\s*</span>)?\s*(.*)\s*</a>

will parse the string.

 

Breaking it down:

  • <a[^>]*> reads the opening a tag
  • \s* ignores any whitespace
  • (<span[^>]*>\s*</span>)? reads the span open and close tags if they exist (into $1)
  • (.*) greedily reads anything (into $2)
  • </a> matches the closing a tag

 

The 'Text' part will be in $2.
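For anyone who wants to try the pattern outside Alteryx, here is a minimal sketch in Python. It is not the attached sample workflow. It assumes Python's re module behaves closely enough to Alteryx's Perl-style engine for this pattern, and it uses re.DOTALL so that . also matches the line breaks in the sample. The variable names are made up for illustration.

```python
import re

# The sample from the question, including its blank lines.
sample = '''<a href="https://www.Example.text.2089A.html">

<span class="icon="></span>
Text

</a>'''

# Same pattern as above. re.DOTALL is an assumption made for Python so that
# '.' can cross line breaks in the multi-line sample.
greedy = re.compile(r'<a[^>]*>\s*(<span[^>]*>\s*</span>)?\s*(.*)\s*</a>', re.DOTALL)

match = greedy.search(sample)
span_tag = match.group(1)        # the optional span element ($1)
raw_text = match.group(2)        # the link text ($2), with trailing whitespace
link_text = raw_text.strip()     # trimmed link text

# Variant with a lazy (.*?) so the trailing whitespace is left to \s* instead.
lazy = re.compile(r'<a[^>]*>\s*(<span[^>]*>\s*</span>)?\s*(.*?)\s*</a>', re.DOTALL)
lazy_text = lazy.search(sample).group(2)

print(repr(raw_text), repr(link_text), repr(lazy_text))
```

Because (.*) is greedy, it keeps the line breaks that follow "Text", so in this sketch the captured value is trimmed afterwards. The lazy variant captures "Text" without them.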

 

You can use a Regex tool in Parse mode to read this out.

 

Sample attached 

MarqueeCrew
20 - Arcturus

Hi @l_blumberger,

 

How about something like:

 

regex_replace([text],".*\W(http.*?)\W>.*","$1")

The (http.*?) group matches the web address non-greedily. $1 is the first (and only) group.

 

I tested with your sample data and got this:

 


https://www.Example.text.2089A.html
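The same replace can be sketched in Python to see what the expression does. This is only an approximation of the Alteryx formula: re.sub stands in for regex_replace, \1 stands in for $1, and re.DOTALL is an assumption added so that .* can span the line breaks in the sample.

```python
import re

# The sample from the question as one multi-line string.
text = '''<a href="https://www.Example.text.2089A.html">

<span class="icon="></span>
Text

</a>'''

# .*          everything before the address
# \W          a non-word character (here the opening quote)
# (http.*?)   the address, matched non-greedily into group 1
# \W>         the closing quote followed by '>'
# .*          everything after it
url = re.sub(r'.*\W(http.*?)\W>.*', r'\1', text, flags=re.DOTALL)

print(url)
```

The whole input is replaced by group 1, so only the address from the href attribute is left.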

Joe_Mako
12 - Quasar

If your actual data is like your example, with the a tag not inside any other tag, then you can use the XML Parse tool configured to Root and Ignore XML Errors and Continue.

 

If the a tag is within other tags, then you can use the option Specific Child Name with a value of "a" (no quotes) instead of Root.
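As a rough analogy outside Alteryx, the XML route can be sketched with Python's standard xml.etree.ElementTree. This is not how the XML Parse tool works internally. The helper name anchor_links is hypothetical, and unlike the "Ignore XML Errors and Continue" option, this parser raises an error on malformed markup.

```python
import xml.etree.ElementTree as ET

# The sample from the question; it happens to be well-formed XML.
sample = '''<a href="https://www.Example.text.2089A.html">

<span class="icon="></span>
Text

</a>'''


def anchor_links(xml_fragment):
    """Return (href, text) pairs for every a element in the fragment.

    Hypothetical helper: iter("a") also yields the root when the root itself
    is an a tag, so it covers both the root case and the nested case.
    """
    root = ET.fromstring(xml_fragment)
    pairs = []
    for a in root.iter("a"):
        # itertext() walks all text inside the element, including the text
        # that follows the empty span; strip() drops the surrounding blanks.
        text = "".join(a.itertext()).strip()
        pairs.append((a.get("href", ""), text))
    return pairs


root_case = anchor_links(sample)
nested_case = anchor_links("<div><p>" + sample + "</p></div>")

print(root_case)
print(nested_case)
```

Both calls give the same single pair, with the address from href and the trimmed link text.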

 

