HTML parsing help

JamesGray
7 - Meteor

Hi,

 

I have limited parsing and regex experience, and I am struggling with the volume of HTML on the site, so I would really appreciate help from someone with more experience.

 

 

I am looking to parse out the data from the following gov website links: https://www.applytosupply.digitalmarketplace.service.gov.uk/g-cloud/search

 

The pages I need are the individual search results at the link above. Each page has a section called "Reseller"; for an example, see about two-thirds of the way down this page: Analytics and Data Science Service

 

I would like to parse this into a column with the header "Supplier Type", containing the text shown in the corresponding field next to it on the web page.

 

Another example, FourNet (4net) Cloud Unified Communications (UCaaS), shows a subfield underneath Reseller called "Organisation whose services are being resold". If this field exists, I would also like to parse its data into another field.

 

Thank you for any help.

 

OllieClarke
15 - Aurora

Hi @JamesGray 

 

Based on your post I've made this:

OllieClarke_0-1678880573358.png

 

The first RegEx isolates just the resellers table by grabbing everything after the resellers scroll tracking but before the next scroll tracking.

OllieClarke_1-1678880593974.png
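
For anyone who wants to try the same idea outside Alteryx, here is a minimal Python sketch of that first step. The `data-scroll` attribute and the sample markup are hypothetical stand-ins for the scroll-tracking marker, since the real page's HTML may look different, and this is not necessarily the same pattern used in the workflow.

```python
import re
from typing import Optional

# Hypothetical sample markup. "data-scroll" is an assumed stand-in for the
# scroll-tracking marker; check the real page source before reusing this.
SAMPLE_HTML = """
<div data-scroll="Pricing"><dl><dt>Price</dt><dd>Example price</dd></dl></div>
<div data-scroll="Reseller"><dl>
<dt>Supplier type</dt><dd>Reseller (no extras)</dd>
<dt>Organisation whose services are being resold</dt><dd>Example Ltd</dd>
</dl></div>
<div data-scroll="Staff security"><dl><dt>Clearance</dt><dd>None</dd></dl></div>
"""


def isolate_section(html: str, marker: str = "Reseller") -> Optional[str]:
    """Return the text after the given marker and before the next marker."""
    pattern = re.compile(
        # Lazy match so we stop at the first following marker (or end of text).
        r'data-scroll="' + re.escape(marker) + r'"(.*?)(?:data-scroll="|\Z)',
        re.DOTALL,  # let "." span line breaks, as HTML is multi-line
    )
    match = pattern.search(html)
    return match.group(1) if match else None


reseller_block = isolate_section(SAMPLE_HTML)
print(reseller_block)
```

The lazy `.*?` is what keeps the match from running on into later sections of the page.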

The next RegEx takes this isolated table and tokenises the information in it: basically, anything immediately before the </dt or </dd closing HTML tags.

OllieClarke_2-1678880814532.png
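
As a rough Python equivalent of the tokenise step, the sketch below pulls out every run of text that sits immediately before a `</dt` or `</dd`, keeping the closing tag on the end of each token. The pattern and the sample table are my own assumptions for illustration, not the exact expression from the screenshot.

```python
import re

# Hypothetical isolated table; the values are made up for illustration.
TABLE_HTML = (
    "<dl>"
    "<dt>Supplier type</dt><dd>Reseller (no extras)</dd>"
    "<dt>Organisation whose services are being resold</dt><dd>Example Ltd</dd>"
    "</dl>"
)

# After a ">", capture text containing no angle brackets, followed by the
# start of a closing </dt or </dd tag. The closing tag stays in the token.
TOKEN_PATTERN = re.compile(r">([^<>]+</d[td])")


def tokenise(table_html: str) -> list:
    """Return one token per header or value, each ending in </dt or </dd."""
    return [token.strip() for token in TOKEN_PATTERN.findall(table_html)]


tokens = tokenise(TABLE_HTML)
for token in tokens:
    print(token)
```

Each match becomes its own item here, which corresponds to splitting the tokens to rows in Alteryx.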

I keep those closing tags in the output so we can use them to tell what's a header (</dt) and what's a value (</dd). We do a bit of cleaning, then create a record ID within each type and URL, and then transform the data into the structure you want.

OllieClarke_3-1678880904458.png
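
The last stage can be sketched in Python too. This assumes the tokens arrive as (URL, token) pairs and that every header has exactly one value in the same order; the URLs and values are placeholders. The position of each item within its URL and type plays the role of the record ID, and pairing headers with values by that position gives one column per header.

```python
from collections import defaultdict

# Hypothetical tokenised rows: (url, token). URLs and values are placeholders.
rows = [
    ("https://example.invalid/service-1", "Supplier type</dt"),
    ("https://example.invalid/service-1", "Reseller (no extras)</dd"),
    ("https://example.invalid/service-1", "Organisation whose services are being resold</dt"),
    ("https://example.invalid/service-1", "Example Ltd</dd"),
    ("https://example.invalid/service-2", "Supplier type</dt"),
    ("https://example.invalid/service-2", "Not a reseller</dd"),
]


def pivot_tokens(token_rows):
    """Turn header/value tokens into one {header: value} dict per URL."""
    grouped = defaultdict(lambda: {"header": [], "value": []})
    for url, token in token_rows:
        # The kept closing tag tells us whether this is a header or a value.
        if token.endswith("</dt"):
            kind = "header"
        elif token.endswith("</dd"):
            kind = "value"
        else:
            continue  # ignore anything that is neither
        # Cleaning: drop the 4-character closing tag and trim whitespace.
        # The list index within each (url, kind) acts as the record ID.
        grouped[url][kind].append(token[:-4].strip())
    # Pair the n-th header with the n-th value for each URL.
    return {
        url: dict(zip(parts["header"], parts["value"]))
        for url, parts in grouped.items()
    }


result = pivot_tokens(rows)
for url, fields in result.items():
    print(url, fields)
```

A page without the "Organisation whose services are being resold" subfield simply has no entry for that header, which matches an empty cell after the transform.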

 

Hope that helps,

 

Ollie

 

 

 

 
