I am quite experienced in Alteryx, but I've never used any online API or parsing tools. I want to extract some spatial point fields from this website which shows the location of special post boxes in the UK: https://www.goldpostboxes.com/
If I right-click on the page and choose 'view source', I can see that data points and descriptive fields behind the map are structured in quite an ordered way in the html source, but particularly in the javascript section pictured below.
How would I go about parsing this data into Alteryx? I could save the html to a text file and attack it from there, but I like the idea of doing this live from the url. Which tool would help me, and how would I configure it to separate the lines and records from the html/javascript structure? Thanks
Solved! Go to Solution.
Thank you @jamielaird that's a great suggestion!
Here is my attempt #1. It gets me what I need, but it's long and ugly. I struggled with using multi-character delimiters in Text to Columns, so I had to do a Find Replace to identify certain key parts of the html/javascript, using the @ symbol as my (hopefully) unique delimiter.
I would be really grateful if anyone could suggest how this could be tidied up in Regex. I'm sure it can be done in 3 tools instead of 23! Thanks
Good morning. 3 was a challenge. I settled for 4. But since I believe in dropping data early, I have a couple of SELECT tools thrown in for good measure. The KEY tools are:
Once you get the data parsed to this point, you have data that appears as -0.1315429 51.4995718. Then you have an easy task to break that into two separate fields with a TEXT to COLUMNS tool. After that I used a SELECT to drop the data and rename Lon and Lat fields.
I haven't reviewed your workflow and see that you are creating points at the end of the process. This should get you where you want to be quickly. There may be other data that you are looking for, but this answers the root question of getting to the POINT data.
Cheers,
Mark
Whoa, this is amazing! Thank you. I don't have the slightest idea how the regex works, but work it does.
In my version I managed to scrape another field with a name or title. I found that this information preceded the POINT tag as follows:
So my inelegant long hand solution could look for the text immediately before the title, eg /a\u003E","title":", and then parse whatever comes next. The item order is clearly important here, because the title precedes the geometry. How would regex handle this order-specific operation?