I am unsure whether this can be done using Alteryx, but perhaps one of you has an ingenious solution.
What I want to achieve is to scrape information from the below website into a readable Excel table (i.e. list all announcements in a tabular format).
It does not stop there, though: at the same time I want Alteryx to download the corresponding PDF files and store them in a certain folder on my laptop. If the file names of these PDF files could be the concatenation of columns "Date" and "Headline", that would be perfection. However, I would already be very happy if the workflow could automatically extract all PDF files.
Using the Download tool on the webpage URL, you can pull all of the href links out of the HTML that comes back in the download data. Then you can feed these links into a second Download tool, which downloads the PDF files to a folder.
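For anyone who wants to see the same two-step idea outside Alteryx, here is a rough Python sketch. It is not the workflow itself, just an illustration under a few assumptions: the base URL `https://example.com` is a placeholder (the thread does not give the real host), and `extract_links` and `download_pdf` are names I made up for this example. The first function pulls every href out of the page HTML; the second saves one link to disk.

```python
import re
from urllib.parse import urljoin
from urllib.request import urlretrieve

# Same pattern as in the RegEx tool: capture whatever sits between href=" and the next ".
HREF_PATTERN = re.compile(r'href="([^"]+)"')


def extract_links(html, base_url):
    """Return an absolute URL for every href="..." attribute found in html."""
    return [urljoin(base_url, href) for href in HREF_PATTERN.findall(html)]


def download_pdf(url, target_path):
    """Save the content behind url to target_path (hypothetical helper, needs network access)."""
    urlretrieve(url, target_path)


# Sample HTML built from the link quoted in this thread; the host is a placeholder.
sample_html = (
    '<a href="/asx/statistics/displayAnnouncement.do?display=pdf&idsId=02165227">'
    " 2020 First Quarter Sales Results</a>"
)
links = extract_links(sample_html, "https://example.com")
print(links)
```

In a real run you would loop over `links` and call `download_pdf` for each one, which is roughly what the second Download tool does.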
You guys are amazing, the solution works like a charm!
Just for educational purposes, can you please explain why you used href="([^"]+)" in the RegEx tool? (especially the ([^"]+) part)
As the cherry on the cake I was hoping to incorporate the part in red in the file name of the PDFs that get extracted, but first I'd need to fully understand your great solution. If I add the description in the split, then I can parse it into a new column, and then use it in the file names.
href="/asx/statistics/displayAnnouncement.do?display=pdf&idsId=02165227"> 2020 First Quarter Sales Results
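In case it helps, here is a small Python sketch of how the pattern works and how the description could be captured alongside the link. In `href="([^"]+)"`, the `[^"]` means "any character that is not a double quote", the `+` means "one or more of them", and the parentheses mark that run as the captured group, so the match stops at the closing quote of the link. The second group below, `([^<]+)`, uses the same trick to grab the text up to the next tag. The function names and the date value are placeholders I invented for the example; in the workflow the date would come from the "Date" column.

```python
import re

# Group 1: the link (everything up to the closing quote).
# Group 2: the description that follows the tag (everything up to the next "<").
LINK_PATTERN = re.compile(r'href="([^"]+)">\s*([^<]+)')


def parse_announcement(fragment):
    """Return (href, headline) from one link fragment, or None if nothing matches."""
    match = LINK_PATTERN.search(fragment)
    if match is None:
        return None
    return match.group(1), match.group(2).strip()


def safe_file_name(date, headline):
    """Join date and headline, replacing characters Windows does not allow in file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]+', " ", f"{date} {headline}")
    return re.sub(r"\s+", " ", cleaned).strip() + ".pdf"


fragment = (
    'href="/asx/statistics/displayAnnouncement.do?display=pdf&idsId=02165227">'
    " 2020 First Quarter Sales Results"
)
href, headline = parse_announcement(fragment)
# "2020-01-01" is a made-up placeholder date, not taken from the site.
file_name = safe_file_name("2020-01-01", headline)
print(href)
print(headline)
print(file_name)
```

The same two capture groups should carry over to the RegEx tool in Parse mode, giving one output column for the link and one for the description.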