I am running into some challenges trying to scrape HTML data.
Basically, I want to extract all the field/response pairs shown in the attached page/table image below. (Each hull has its own page/table.)
I am in the process of building an iterative macro to process each URL (page), download the page HTML, and extract the table fields and responses.
There will not be a response to each field. Some fields will be blank.
The attached workflow depicts two ways I was trying to get to the data. The problem is I need to account for the blank responses. (The workflow includes 10 different page downloads.)
(Note: This is all open-source data.)
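For readers working outside Alteryx, here is a minimal Python sketch of the blank-response problem described above. It assumes each table row holds a field cell followed by a response cell; the real page markup is not shown in this thread, so the sample HTML, the field names in it, and the names `PairTableParser` and `extract_pairs` are all made up for illustration. The point is only that an empty cell is kept as an empty string, so every field still lines up with its own response.

```python
from html.parser import HTMLParser


class PairTableParser(HTMLParser):
    """Collect (field, response) pairs from table rows (hypothetical layout)."""

    def __init__(self):
        super().__init__()
        self.pairs = []
        self._row = None   # cells of the row being read, None outside <tr>
        self._cell = None  # text fragments of the cell being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        # Only keep text that sits inside a cell.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None and self._row is not None:
            # An empty cell becomes "", so blank responses are not dropped.
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            if self._row and self._row[0]:
                field = self._row[0]
                response = self._row[1] if len(self._row) > 1 else ""
                self.pairs.append((field, response))
            self._row = None


def extract_pairs(html):
    parser = PairTableParser()
    parser.feed(html)
    return parser.pairs


# Made-up sample; the second row has a blank response.
sample = """
<table>
  <tr><td>Hull Number</td><td>1234</td></tr>
  <tr><td>Builder</td><td></td></tr>
  <tr><td>Status</td><td>Active</td></tr>
</table>
"""
pairs = extract_pairs(sample)
for field, response in pairs:
    print(repr(field), repr(response))
```

Running one such parse per downloaded page would mirror the iterative macro, with one list of pairs per hull.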
@dougperez Yep. That nails it. Nice approach. I will have to remember this one. I made one slight edit: I added a Formula tool with 3 regex_replace expressions to add a ":" after Years since Launch, Years since Delivery, and Years from Commission, and (with your solution) everything snapped into place. It worked on the first 10 entries. I am going to try it against the first few hundred. THANKS!
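As a rough illustration of the colon fix, here is a Python sketch rather than the actual Formula tool expressions, which are not shown in this thread. It assumes the page text has been flattened to one field per line and that the accepted solution splits each line on ":"; both are assumptions, and the sample text and the helper names `add_missing_colons` and `split_pairs` are hypothetical. Only the three labels come from the post.

```python
import re

# The three labels named in the post that lack a trailing ":" on the page.
LABELS = ["Years since Launch", "Years since Delivery", "Years from Commission"]


def add_missing_colons(text):
    """Append ':' to each label unless a colon already follows it."""
    for label in LABELS:
        pattern = re.escape(label) + r"(?!\s*:)"
        text = re.sub(pattern, lambda m: m.group(0) + ":", text)
    return text


def split_pairs(text):
    """Split 'Field: response' lines into pairs, keeping blank responses."""
    pairs = []
    for line in text.splitlines():
        if ":" not in line:
            continue
        field, _, response = line.partition(":")
        pairs.append((field.strip(), response.strip()))
    return pairs


# Made-up sample text for illustration only.
raw = (
    "Name: Example\n"
    "Years since Launch 12\n"
    "Years since Delivery\n"
    "Years from Commission: 10"
)
fixed = add_missing_colons(raw)
result = split_pairs(fixed)
for field, response in result:
    print(repr(field), repr(response))
```

The negative lookahead makes the replacement safe to run twice, since a label that already has its colon is left alone.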