Alteryx Designer Desktop Discussions

smoskowitz · ‎10-11-2017

Hello --

I am trying to scrape some country/country code information from the following site: https://www.irs.gov/e-file-providers/foreign-country-code-listing-for-modernized-e-file

I can get the country using the regex tool, but can't seem to figure out how to get the next piece. Below is what I am trying to get:

Here is what I have done so far:

I have no regex skills so this is just Googling around. Let me know what I am missing. I should have about 258 rows of data.

Thanks,

Seth

Kenda · ‎10-11-2017

Hey @smoskowitz! I created a small workflow that may be able to help you out. I first split the DownloadData field into rows based on new lines. Then I used a Formula tool to parse out the parts of the field we wanted to keep, followed by a Multi-Row Formula tool and a Filter tool to get the country code next to the corresponding country. Hope this helps!
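For readers without the attached workflow, here is a rough Python sketch of the same four steps (split into rows, parse, look at the neighboring row, filter). It is not the workflow itself. The SAMPLE markup, the function name pair_countries, and the assumption that each country line is immediately followed by its code line are all made up for illustration; the real page's HTML may be laid out differently.

```python
import re

# Hypothetical stand-in for the DownloadData field; the real page markup may differ.
SAMPLE = """<tr>
<td class="c">Afghanistan</td>
<td class="c">AF</td>
</tr>
<tr>
<td class="c">Albania</td>
<td class="c">AL</td>
</tr>"""

# Same idea as the Formula tool step: keep the text between "> and <.
CELL = re.compile(r'.*">(.*)<.*')


def pair_countries(download_data):
    # Step 1: split the field into rows on new lines.
    lines = download_data.splitlines()

    # Step 2: parse out the part of each row we want; rows without a match are dropped.
    values = []
    for line in lines:
        match = CELL.fullmatch(line.strip())
        if match:
            values.append(match.group(1))

    # Steps 3 and 4: pair each value with the one on the next row, then keep
    # only every other pair (assumes country and code strictly alternate).
    return [(values[i], values[i + 1]) for i in range(0, len(values) - 1, 2)]
```

Calling pair_countries(SAMPLE) gives one (country, code) tuple per table row in the sample.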

GavinAttard · ‎10-11-2017

Hi @smoskowitz

Quick and crude but attached should do the trick

cheers

Gavin

Alteryx Everything, Leave no one behind.

smoskowitz · ‎10-11-2017

Thank you! What exactly is this doing:

REGEX_Replace([DownloadData], '(.*">)(.*)(<.*)', "$2")

Kenda · ‎10-11-2017

@smoskowitz I'm not sure how familiar you are with REGEX_Replace, but it has three required parameters: the field name, the pattern you're looking for, and the replacement value. Here, basically this is saying look for the text between "> and < and keep only that.

To be more specific, the pattern here groups your field into three parts, using parentheses to separate each part. The first part is everything up to (and including) the ">. The second part is everything after the "> and before the <. The third part is everything after (and including) the <. The $2 then tells Alteryx to only keep the second grouping. Hopefully that makes sense!
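If it helps to experiment outside Alteryx, the same pattern can be tried in Python. This is only a sketch: Python's re module writes the second group as \2 instead of $2, the function name keep_second_group and the sample strings are invented for illustration, and Alteryx's regex engine may differ in details.

```python
import re

# The pattern from the REGEX_Replace call above, unchanged.
PATTERN = r'(.*">)(.*)(<.*)'


def keep_second_group(text):
    # Replace the whole match with group 2 only (Python spells $2 as \2).
    # If the pattern does not match, the text comes back unchanged.
    return re.sub(PATTERN, r"\2", text)


# Hypothetical sample rows, not copied from the IRS page.
single = keep_second_group('<td class="text-left">Afghanistan</td>')

# .* is greedy, so with several "> on one line, group 1 runs to the LAST one.
greedy = keep_second_group('<a href="x">A</a><a href="y">B</a>')
```

Here single is 'Afghanistan' and greedy is 'B', which is one reason splitting the field into one row per line first keeps the results predictable.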

Web Scraping -- Country and Country Code