
Alteryx Designer Desktop Discussions

SOLVED

Web Scraping - Almost there

msve
8 - Asteroid

Hello,

 

I have created a workflow that gets data from the WSJ website (https://www.wsj.com/market-data/bonds/treasuries?mg=prod/com-wsj). I have noticed that it's bringing in more rows than expected. For example, the website has 312 rows of data, but Alteryx is bringing in extra rows that look like duplicate data but are not. How do I remove the rows that I don't need (in this case, the rows after record 312)? Also, how do I export this as a Word document? I tried using the Render tool but was not successful.

 

 

 

TheOC
16 - Nebula

Hi @msve

If you only need the first 312 rows of data, the Sample tool is best for this:

TheOC_0-1603378985096.png


Configure it with "First N rows" and set N to 312!
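
For anyone who wants to see the same idea outside Alteryx, here is a minimal Python sketch of "keep only the first N rows". It is only an illustration, not what the Sample tool does internally. The `scraped` list and the `first_n_rows` helper are hypothetical names made up for this example.

```python
from itertools import islice


def first_n_rows(rows, n):
    """Return at most the first n rows, similar in spirit to 'First N rows'."""
    return list(islice(rows, n))


# Hypothetical stand-in for the scraped table: 312 wanted rows plus extras.
scraped = [{"row": i} for i in range(1, 401)]

kept = first_n_rows(scraped, 312)
print(len(kept))  # number of rows kept
```

A fixed N only works while the page keeps the same number of rows, so this suits a one-off pull better than a daily run.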

Cheers,
TheOC
Connect with me:
LinkedIn Bulien
TheOC
16 - Nebula

@msve 
I believe I got this working, as seen below:

TheOC_0-1603379122383.png


This did output to a Word file. I believe it's best to use a Table tool first and then plug that into your Render tool. I've attached the workflow!

Let me know if I can help any further 🙂

Cheers,
TheOC
msve
8 - Asteroid

Hi @TheOC ,

 

Thanks for your response. This is going to run daily, and new rates are added to WSJ almost every day. So today we have 312 rows of data, but it could be more tomorrow. Is there a way to ignore the rows after the "timestamp" column is populated?

 

msve_0-1603379270991.png

 

TheOC
16 - Nebula

Hi @msve 
Yeah, of course! Do you want the row with the timestamp too, or is that one of the rows you don't want?


If you do want it, I use a Multi-Row Formula tool to apply an "unneeded row" value to the timestamp field of every row after the timestamp row, then a simple Filter tool to get rid of all of those!

TheOC_0-1603380224480.png


I have attached the workflow below!
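
For readers without the attachment, the flag-then-filter idea can be sketched roughly in Python. This is an assumption-laden illustration, not the contents of the workflow: the field names `name` and `timestamp`, the sample values, and the `keep_through_timestamp` helper are all hypothetical.

```python
def keep_through_timestamp(rows, column="timestamp"):
    """Keep rows up to and including the first row whose timestamp is populated."""
    kept = []
    for row in rows:
        kept.append(row)
        if row.get(column):
            # The first populated timestamp marks the last wanted row,
            # so everything after it is treated as an "unneeded row".
            break
    return kept


# Hypothetical scraped rows: data, then the timestamp row, then unwanted extras.
scraped = [
    {"name": "a", "timestamp": ""},
    {"name": "b", "timestamp": None},
    {"name": "c", "timestamp": "example timestamp"},
    {"name": "d", "timestamp": ""},
]

kept = keep_through_timestamp(scraped)
print([row["name"] for row in kept])
```

Because the cut-off follows the timestamp row rather than a fixed count, it keeps working when the number of rows changes from day to day. If no row has a timestamp, this sketch keeps everything.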

Cheers,
TheOC
msve
8 - Asteroid

Hi @TheOC ,

 

This is exactly what I wanted (I wanted the data up to the row with the timestamp) :). Thank you so much for your help 😀

TheOC
16 - Nebula

No worries at all! Glad I could help. Have a great day ahead!

Cheers,
TheOC