Alteryx Designer Desktop Discussions

jt_edin · ‎03-27-2018

I am quite experienced in Alteryx, but I've never used any online API or parsing tools. I want to extract some spatial point fields from this website which shows the location of special post boxes in the UK: https://www.goldpostboxes.com/

If I right-click on the page and choose 'view source', I can see that data points and descriptive fields behind the map are structured in quite an ordered way in the html source, but particularly in the javascript section pictured below.

How would I go about parsing this data into Alteryx? I could save the html to a text file and attack it from there, but I like the idea of doing this live from the url. Which tool would help me, and how would I configure it to separate the lines and records from the html/javascript structure? Thanks

jamielaird · ‎03-27-2018

Hi @jt_edin

This is a perfect situation for the Download tool. Put the URL into a Text Input tool and pass it through the Download tool, then use Text to Columns on the downloaded data to split to rows on the newline character \n. From there you should find it pretty easy to parse out the lat and lon from the relevant rows (especially if you use Regex).

Good luck!

Jamie Laird
Principal Consultant, Keyrus UK
Mob: +44(0)757 699 0053

This message has been scanned for malware by Websense. www.websense.com

jt_edin · ‎03-27-2018

Thank you @jamielaird that's a great suggestion!

Here is my attempt #1. It gets me what I need, but it's long and ugly. I struggled with using multi-character delimiters in Text to Columns, so I had to do a Find Replace to identify certain key parts of the html/javascript, using the @ symbol as my (hopefully) unique delimiter.

I would be really grateful if anyone could suggest how this could be tidied up in Regex. I'm sure it can be done in 3 tools instead of 23! Thanks

MarqueeCrew · ‎03-28-2018

@jamielaird,

Good morning. 3 was a challenge. I settled for 4. But since I believe in dropping data early, I have a couple of SELECT tools thrown in for good measure. The KEY tools are:

INPUT
DOWNLOAD
RegEx

Once you get the data parsed to this point, you have data that appears as -0.1315429 51.4995718. Then you have an easy task to break that into two separate fields with a TEXT to COLUMNS tool. After that I used a SELECT to drop the data and rename Lon and Lat fields.

I haven't reviewed your workflow and see that you are creating points at the end of the process. This should get you where you want to be quickly. There may be other data that you are looking for, but this answers the root question of getting to the POINT data.

Cheers,

Mark

Alteryx ACE & Top Community Contributor

Chaos reigns within. Repent, reflect and restart. Order shall return.
Please Subscribe to my youTube channel.

jt_edin · ‎03-28-2018

Whoa, this is amazing! Thank you. I don't have the slightest idea how the regex works, but work it does.

In my version I managed to scrape another field with a name or title. I found that this information preceded the POINT tag as follows:

So my inelegant long hand solution could look for the text immediately before the title, eg /a\u003E","title":", and then parse whatever comes next. The item order is clearly important here, because the title precedes the geometry. How would regex handle this order-specific operation?

Alteryx Designer Desktop Discussions

Best tool to extract lat / lon from webpage source

Re: Change Data Type of Input Data before Reading

Re: Change Data Type of Input Data before Reading

Re: Join versus Union

Re: Filter

Re: Regex help please - Parsing a big text area