Alteryx Designer Desktop Discussions

Tibo · ‎02-25-2016

Hi,

I tried to use the download tool to extract a table from a wikipedia page: https://en.wikipedia.org/wiki/ISO_3166-1_alpha-2

I'm trying to get the big table with country code and country names.

I've used the download tool to get the data, used Text to column tool with "Split to Rows" and \n as delimiter.

It basically gives me the html code of the page.

What's next to isolate that specific table?

This would take seconds in powerquery and easily refresh, it'd be great to have a user friendly tool to grab data from web...

Thanks

Tibo

rdoptis · ‎02-25-2016

Hey Tibo,

In situations like this, I like to build a custom API via import.io and use it with the download tool in Alteryx. This is free, and can capture data from very complex webpages with little effort.

Cheers,

Ryan

michael_treadwell · ‎02-25-2016

This isn't perfect but it should get you started.

Using XML tool, dynamic select, and transpose you can pull the XML data for each specific row of the table you are after.

JohnJPS · ‎02-25-2016

The attached solution utilizes observed specifics in the HTML to extract the relevant rows generate the Abbr/CountryName list in straight-forward fashion. Granted, if they tweak the table, results may go bad, but that may well be the case with any solution.

(But as noted above, import.io is probably the way to go here: it captures every column in the table).

jdunkerley79 · ‎02-25-2016

Try using import.io to set up a parse process.

This then presents a JSON API endpoint you can use in Alteryx much more easily

For this page I tried the magic.import.io and it worked staright away. https://magic.import.io/?site=http:%2F%2Fen.wikipedia.org%2Fwiki%2FISO_3166-1_alpha-2

You need to create an account and save the API. After this go to Export and Simple API integration.

For this page my generated API was: https://api.import.io/store/connector/b53ea8c8-6851-4645-8522-bc3f2db005d2/_query?input=webpage/url:...

I set up a download tool to grab the JSON and then parse with standard tools.

Attached a sample

jdunkerley79 · ‎02-25-2016

Double post :(

KaneG · ‎02-25-2016

Hi Tibo,

There's a few different ways of Parsing the html table listed here, so I thought that I would include mine. The base of it is around using REGEX to parse each element to a new line (I used to replace each element with a pipe (eg Replace([DownloadData],"<table","|") and then parse on the pipe before I got used to REGEX).

I'll also jump on the love for Import.io, as once the page gets a little more complicated than a simple table, Import.io handles those elements better.

Kane

Tibo · ‎02-25-2016

Thanks Guys,

This is helpful, I'm familiar with import.io so will use that as it is the easiest here.

I just can't wait for such functionality to be embedded in a proper Alteryx tool because all the parsing methodologies outlined here are way too long for me, I'd be quicker copying and pasting the table in a text input and refresh it manually :-)

Anyway this gives me ideas for other stuff so thanks for spending the time

Tibo

cjones514 · ‎11-25-2019

Is it true that the API functionality of import.io is now a paid feature?

Alteryx Designer Desktop Discussions

Extract a table from Wikipedia