Hi,
How can I scrape an academic calendar from a website? I am trying to get the bold text into a usable data table.
Here is the website: https://www.umassglobal.edu/news-and-events/academic-calendar
Here is what I want to extract from the website, in this format:
Summer Session I – 2024 – April 29, 2024 – June 23, 2024 |
Summer Session II – 2024 – June 24, 2024 – August 18, 2024 |
Fall Session I – 2024 – August 26, 2024 – October 20, 2024 |
Fall Session II – 2024 – October 21, 2024 – December 15, 2024 |
Spring Session I – 2025 – January 6, 2025 – March 2, 2025 |
Spring Session II – 2025 – March 3, 2025 – April 27, 2025 |
Summer Session I – 2025 – April 28, 2025 – June 22, 2025 |
Summer Session II – 2025 – June 23, 2025 – August 17, 2025 |
Best,
Bryan
There's always a "what's it worth" question that you should think about when you do any work like this.
As a consultant I would say: you've already got the table (because you posted it) and it won't change (unless you want to pick up 2026 dates), so the manual effort here is minimal and the effort to get a workflow running and perfected is a lot more. Usually it's the other way around: the manual effort far outweighs the effort of getting a workflow running and re-executing it for different results.
As a learning exercise (as in, who cares, I want to write the workflow anyway) I'd use the Download tool and then process the HTML. You want to extract all the H3-formatted text and then further process that to keep only the H3s that you want.
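If you'd rather see the same idea outside a workflow tool, here is a rough Python sketch of the two steps described above: pull out every H3 heading, then keep only the ones that look like session rows. It is only a sketch under assumptions: the heading pattern (session name, year, start date, end date separated by dashes) is guessed from the desired output in the question, not checked against the live page, and the names `H3Collector`, `extract_h3`, `parse_sessions`, and `fetch` are made up for illustration. The sample HTML is also invented so the example runs offline.

```python
import re
import urllib.request
from html.parser import HTMLParser


class H3Collector(HTMLParser):
    """Collect the text content of every <h3> element."""

    def __init__(self):
        super().__init__()
        self.headings = []
        self._buf = None  # holds text chunks while we are inside an <h3>

    def handle_starttag(self, tag, attrs):
        if tag == "h3":
            self._buf = []

    def handle_data(self, data):
        if self._buf is not None:
            self._buf.append(data)

    def handle_endtag(self, tag):
        if tag == "h3" and self._buf is not None:
            # Join the chunks and collapse runs of whitespace.
            self.headings.append(" ".join("".join(self._buf).split()))
            self._buf = None


def extract_h3(html):
    """Return the text of all H3 headings in an HTML string."""
    parser = H3Collector()
    parser.feed(html)
    parser.close()
    return parser.headings


# ASSUMPTION: wanted headings look like
# "<Season> Session <I|II> - <year> - <start date> - <end date>"
# with either an en dash or a hyphen as the separator.
DATE = r"[A-Z][a-z]+ \d{1,2}, \d{4}"
SESSION_RE = re.compile(
    rf"^(?P<session>\w+ Session [IVX]+)\s*[–-]\s*(?P<year>\d{{4}})\s*[–-]\s*"
    rf"(?P<start>{DATE})\s*[–-]\s*(?P<end>{DATE})"
)


def parse_sessions(headings):
    """Keep only headings that match the session pattern, as dict rows."""
    rows = []
    for heading in headings:
        match = SESSION_RE.match(heading)
        if match:
            rows.append(match.groupdict())
    return rows


def fetch(url):
    """Download a page as text (not called below; the site may reject it)."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented sample HTML standing in for the downloaded page.
sample_html = """
<h2>Academic Calendar</h2>
<h3>Summer Session I &ndash; 2024 &ndash; April 29, 2024 &ndash; June 23, 2024</h3>
<h3>Holidays</h3>
<h3><strong>Fall Session II</strong> – 2024 – October 21, 2024 – December 15, 2024</h3>
"""

rows = parse_sessions(extract_h3(sample_html))
for row in rows:
    print(row["session"], row["year"], row["start"], row["end"], sep=" | ")
```

To try it on the real page you would pass the result of `fetch(...)` to `extract_h3` instead of `sample_html`, and adjust the regular expression if the headings turn out to be laid out differently.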
@bryanmac_92 take a look at the workflow attached and let me know how you get on
@bryanmac_92
Find the workflow attached.
Mark as done if solved.