Alteryx Designer Desktop Discussions

SOLVED

Download, Parse, & Account for Empty Fields

hellyars
13 - Pulsar

I am running into some challenges trying to scrape HTML data.

Basically, I want to extract all the field/response pairs shown in the attached page/table image below. (Each hull has its own page/table.)

I am in the process of building an iterative macro to process each URL (page), download the page HTML, and extract the table fields and responses.

Not every field will have a response. Some fields will be blank.

The attached workflow shows two ways I tried to get to the data. The problem is that I need to account for the blank responses. (The workflow includes 10 different page downloads.)

(Note: This is all open-source data.)

alteryx_sample_all_fields.png
alteryx_html_parse.png

3 REPLIES
dougperez
12 - Quasar

Does this help? I used a Multi-Row Formula tool (tested with one hull; for more, just group by hull).
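
The attached workflow is not reproduced in this thread, so the following is only a rough Python sketch of the kind of look-ahead a Multi-Row Formula can do, not the workflow itself. It assumes each label ends with ":" and that the parsed rows for one hull arrive in page order; `pair_fields` and the sample rows are hypothetical.

```python
def pair_fields(tokens):
    """Pair each 'Label:' token with the token after it, or '' if it is blank."""
    pairs = []
    for i, tok in enumerate(tokens):
        if not tok.endswith(":"):
            continue  # a value; it is picked up by the look-ahead below
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        # If the next token is itself a label, this field was left blank.
        value = "" if nxt.endswith(":") else nxt
        pairs.append((tok.rstrip(":"), value))
    return pairs


# Hypothetical rows for a single hull, in page order.
rows = ["Name:", "Example", "Builder:", "Status:", "Active"]
print(pair_fields(rows))
# [('Name', 'Example'), ('Builder', ''), ('Status', 'Active')]
```

Running this once per hull plays the same role as grouping by hull: the look-ahead never crosses from one page into the next. A value that itself ends with ":" would be misread as a label under this assumption.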

dougperez
12 - Quasar

I was looking at my example and found a problem: the headers, hahaha.

Now I think it's more accurate.

Try filtering those headers out another way (I used a Filter tool and typed them in by hand, assuming they are standardized).
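
As a rough illustration of filtering against a fixed list of headers, here is a small Python sketch. The header names are placeholders, since the real ones are only in the attached workflow, and `drop_headers` is a hypothetical helper.

```python
# Hypothetical section headers; replace with the ones on the real pages.
SECTION_HEADERS = {"General Information", "Key Dates"}


def drop_headers(tokens, headers=SECTION_HEADERS):
    """Return tokens with any exact (whitespace-trimmed) header match removed."""
    return [t for t in tokens if t.strip() not in headers]


# Hypothetical parsed rows for one page.
rows = ["General Information", "Name:", "Example", "Key Dates", "Launched:"]
print(drop_headers(rows))
# ['Name:', 'Example', 'Launched:']
```

This only works if the headers really are the same on every page; a header spelled differently on one page would slip through and be treated as a value.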

hellyars
13 - Pulsar

@dougperez  Yep. That nails it.  Nice approach.  I will have to remember this one.  I made a slight edit.  I added a formula tool with 3 regex_replace expressions to add a ":" after Years since Launch, Years since Delivery, and Years from Commission and (with your solution) everything snapped into place.  It just worked with the first 10 entries.  I am going to try it against the first few hundred.  THANKS!
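
For anyone doing the same step outside Alteryx, a minimal Python sketch of the colon fix might look like the following. It is an analogue of the three regex_replace expressions, not the exact Formula tool expressions, and `add_missing_colons` is a hypothetical name.

```python
import re

# The three labels named above that have no trailing colon on the page.
LABELS = ["Years since Launch", "Years since Delivery", "Years from Commission"]


def add_missing_colons(text):
    """Append ':' to each label in LABELS unless a colon already follows it."""
    for label in LABELS:
        # (?!:) is a negative lookahead, so an existing colon is not doubled.
        text = re.sub(re.escape(label) + r"(?!:)", label + ":", text)
    return text


print(add_missing_colons("Years since Launch"))    # Years since Launch:
print(add_missing_colons("Years since Delivery:")) # Years since Delivery:
```

The lookahead makes the step safe to run more than once on the same text.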
