Alteryx Designer Desktop Discussions

Web scraping to create a currency conversion table

MikeLR

Hi

I created this in response to a question from @laubena on this blog post.

It would be great to know if anyone has suggestions on how to simplify it.

Thanks

Mike

[Attached image: Currency Conversion.jpg]

danilang

Hi @MikeLR 

"Simple" is a very broad word and shouldn't be the only criterion for deciding whether a workflow is good or even great (@clmc9601 might want to chime in here). Does it produce the correct output? Does it scale nicely (if required)? Is it easy to maintain? Does it handle errors gracefully?

The workflow as it stands appears to be as simple as it should be. The output is correct. Scaling won't be an issue, since it's unlikely that we'll ever have millions of currencies. Every step is well documented and each performs a specific task. Once you become the CEO of your company, whoever looks at it will have no problem figuring out how it works.

The only place where you could add some complexity is around error checking.  What if the layout of the webpage changes or the site is unavailable?           
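
Outside Alteryx, those two checks might look like the following Python sketch: the fetch raises a clear error if the site is unreachable, and the parser raises if the page no longer yields a plausible number of currency rows. The function names, the 100-row threshold and the row pattern (modelled on the table row Mike quotes later in the thread) are assumptions for illustration, not part of the original workflow.

```python
import re
import urllib.request

# Assumed row layout, based on the <tr> snippet quoted in this thread:
# currency code inside an <a> tag, then the name, then the rate against USD.
ROW_PATTERN = re.compile(
    r'<th scope="row"><a\b[^>]*>([A-Z]{3})</a></th>'
    r'<td>[^<]*</td><td>([\d.]+)</td>'
)


def fetch_page(url: str, timeout: float = 10.0) -> str:
    """Download the page, raising RuntimeError if the site is unavailable."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.read().decode("utf-8", errors="replace")
    except OSError as exc:  # URLError, HTTPError and socket timeouts all subclass OSError
        raise RuntimeError(f"Could not fetch {url}: {exc}") from exc


def parse_rates(html: str, min_rows: int = 100) -> dict[str, float]:
    """Extract code -> rate pairs, failing loudly if the layout looks different."""
    rates = {code: float(rate) for code, rate in ROW_PATTERN.findall(html)}
    if len(rates) < min_rows:
        raise ValueError(
            f"Expected at least {min_rows} currencies but found {len(rates)}; "
            "the page layout may have changed."
        )
    return rates
```

A run would then be parse_rates(fetch_page(url)), with url being whatever page the workflow already reads; either failure stops the run instead of silently producing an empty or partial table.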

Technically, you could replace the ahref regex tools with a single Formula tool using either Regex_Replace() or the case-insensitive Replace() function. This would reduce your tool count, and Replace() will run faster than the equivalent Regex_Replace() call. However, since you only have ~200 rows in your dataset, the speed gain won't be noticeable.
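
To illustrate the single-expression idea, here is a Python sketch using re.sub: one pattern uses alternation to match either the opening anchor tag (whatever its attributes) or the closing one, and replaces both with nothing. In a Formula tool the analogous expression would presumably be something like Regex_Replace([Field], "<a[^>]*>|</a>", ""), where [Field] is a placeholder for the workflow's actual column name.

```python
import re

# Matches either an opening <a ...> tag (any attributes) or a closing </a>.
# The \b stops it from matching other tags that start with "a", like <abbr>.
ANCHOR_TAGS = re.compile(r"<a\b[^>]*>|</a>", re.IGNORECASE)


def strip_anchor_tags(html: str) -> str:
    """Remove <a> tags in one pass, keeping the text they wrapped."""
    return ANCHOR_TAGS.sub("", html)


row = ('<th scope="row"><a href="/currency/inr-indian-rupee/" '
       'class="sc-77d4b8b-0 dhfRzz">INR</a></th><td>Indian Rupee</td>')
print(strip_anchor_tags(row))
# → <th scope="row">INR</th><td>Indian Rupee</td>
```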

It's probably possible to amalgamate most of the regex operations into a single expression, but that would make the regex more complex and less maintainable.
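
To make that trade-off concrete, here is a Python sketch of what one combined expression might look like for the row layout Mike quotes later in the thread. A single pattern with named groups pulls out the code, name and both rate columns at once; re.VERBOSE allows a comment per piece, but any change to the page markup still means re-reading the whole thing. The group names are illustrative, and the reading of the two numeric columns (units per USD, then USD per unit) is inferred from the INR values rather than stated anywhere.

```python
import re

# One expression for the whole table row. In VERBOSE mode unescaped
# whitespace is ignored, so the literal space in <th scope=...> is escaped.
ROW = re.compile(
    r"""
    <th\ scope="row"><a\b[^>]*>(?P<code>[A-Z]{3})</a></th>  # currency code
    <td>(?P<name>[^<]*)</td>                                 # currency name
    <td>(?P<rate>[\d.]+)</td>                                # units per 1 USD (inferred)
    <td>(?P<inverse>[\d.]+)</td>                             # USD per unit (inferred)
    """,
    re.VERBOSE,
)

html = ('<th scope="row"><a href="/currency/inr-indian-rupee/" class="sc-77d4b8b-0 dhfRzz">INR</a></th>'
        '<td>Indian Rupee</td><td>82.94814598102036</td><td>0.012055724551442215</td></tr>')
for m in ROW.finditer(html):
    print(m["code"], m["name"], m["rate"])
# → INR Indian Rupee 82.94814598102036
```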

To sum up, this is a good workflow.  Some error handling would turn it into a great one.

Dan

MikeLR

Hi @danilang, thanks for the feedback. Good call on the error checking.

I have to admit I was stumped on how to do the entire ahref removal in one tool whilst keeping the currency code contained within it. 

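e.g. removing just the pieces that were in bold, i.e. the opening <a href="..." class="..."> tag and the closing </a> tag, so that only INR is left: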

<th scope="row"><a href="/currency/inr-indian-rupee/" class="sc-77d4b8b-0 dhfRzz">INR</a></th><td>Indian Rupee</td><td>82.94814598102036</td><td>0.012055724551442215</td></tr>
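
One way to do the whole removal in a single step is to match the complete anchor element, capture only the text between the tags, and substitute the capture back in. The Python sketch below shows the technique with re.sub, where the backreference is written \1; Alteryx's Regex_Replace() is, as far as I know, written with $1 instead, which is worth confirming in the Designer documentation. The same capture also lets you pull the code out on its own.

```python
import re

# Match a whole <a ...>...</a> element and capture the text inside it.
ANCHOR = re.compile(r"<a\b[^>]*>([^<]*)</a>", re.IGNORECASE)


def unwrap_anchors(html: str) -> str:
    """Replace each <a> element with the text it contains (e.g. 'INR')."""
    return ANCHOR.sub(r"\1", html)


row = ('<th scope="row"><a href="/currency/inr-indian-rupee/" '
       'class="sc-77d4b8b-0 dhfRzz">INR</a></th><td>Indian Rupee</td>'
       '<td>82.94814598102036</td><td>0.012055724551442215</td></tr>')
print(unwrap_anchors(row))

# The captured group on its own is just the currency code.
code = ANCHOR.search(row).group(1)  # 'INR'
```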

I also thought about making the workflow more generic by letting the user input a desired currency, rather than relying on the current input URL, which is USD-only.
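
A possible alternative that keeps the USD-only URL: because every row is quoted against USD, a table for any other base currency can be derived by dividing each rate by the chosen base's rate (cross rates via USD), so 1 BASE = rate[X] / rate[BASE] units of X. The Python sketch assumes the scraped values mean units of each currency per 1 USD, which is how the INR row reads; the function name rebase is hypothetical. In the workflow, the same division would fit in a single Formula tool once the chosen base's rate has been joined onto every row.

```python
def rebase(usd_rates: dict[str, float], base: str) -> dict[str, float]:
    """Turn a USD-based table (units of X per 1 USD) into a `base`-based one.

    1 BASE = usd_rates[X] / usd_rates[BASE] units of X.
    """
    rates = {"USD": 1.0, **usd_rates}  # a USD page may not list USD itself
    base = base.upper()
    if base not in rates:
        raise KeyError(f"Unknown base currency: {base}")
    base_rate = rates[base]
    return {code: rate / base_rate for code, rate in rates.items()}


# Only the INR rate from the row quoted above; a real run would use the full table.
inr_table = rebase({"INR": 82.94814598102036}, "inr")
print(inr_table["USD"])  # ≈ 0.0120557, in line with that row's last column
```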

Cheers

Mike
