Alteryx Designer Desktop Discussions

SOLVED

Download, Parse, & Account for Empty Fields

hellyars
13 - Pulsar

I am running into some challenges trying to scrape HTML data.

Basically, I want to extract all the field/response pairs shown in the attached page/table image below. (Each hull has its own page/table.)

I am in the process of building an iterative macro to process each URL (page), download the page HTML, and extract the table fields and responses.

Not every field will have a response; some fields will be blank.

The attached workflow shows two ways I have tried to get to the data. The problem is that I need to account for the blank responses. (The workflow includes 10 different page downloads.)

(Note: This is all open-source data.)

Attachments: alteryx_sample_all_fields.png, alteryx_html_parse.png

3 REPLIES
dougperez
12 - Quasar

Does this help? I used a Multi-Row Formula tool (tested with one hull; for all of them, just group by hull).
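For readers outside Alteryx, the row-to-row idea behind a multi-row formula can be sketched in Python. This is only an illustration, not the attached workflow: it assumes the page text has already been split into one token per row, that every field label ends with ":", and that a label followed directly by another label has a blank response. The function name `pair_fields` and the sample labels and values are made up for the example.

```python
def pair_fields(tokens):
    """Pair each 'Label:' token with the token after it.

    Assumption: a token ending in ':' is a field label. If the next token
    is also a label (or there is no next token), the response is blank.
    """
    pairs = []
    for i, tok in enumerate(tokens):
        if not tok.endswith(":"):
            continue  # a response row; it is read via the label before it
        nxt = tokens[i + 1] if i + 1 < len(tokens) else ""
        value = "" if nxt.endswith(":") else nxt
        pairs.append((tok[:-1], value))
    return pairs


# Hypothetical rows for one hull; "Builder" has no response.
rows = ["Name:", "Example", "Builder:", "Status:", "Active"]
print(pair_fields(rows))
```

Running this per hull corresponds to the "group by" setting: each hull's rows are paired independently, so a trailing blank field on one page never borrows a value from the next page.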

dougperez
12 - Quasar

I was looking over my example and found a problem: the headers, hahaha.

Now I think it's more accurate.

Try filtering out those headers some other way (I used a Filter tool and typed them in by hand, assuming they are standardized).
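A rough Python equivalent of that hand-written filter is below. It is a sketch under assumptions: the header texts in `KNOWN_HEADERS` are placeholders (the real ones come from the pages), and `drop_headers` is a hypothetical helper name. The point is only that section headers are removed before fields and responses are paired, so a header sitting after a blank field is not mistaken for that field's response.

```python
# Placeholder header texts; replace with the section titles on the real pages.
KNOWN_HEADERS = {"General Information", "Dates"}


def drop_headers(tokens, headers=KNOWN_HEADERS):
    """Return tokens with known section headers removed (exact match after trimming)."""
    return [t for t in tokens if t.strip() not in headers]


rows = ["General Information", "Name:", "Example", "Builder:", " Dates ", "Status:"]
print(drop_headers(rows))
```

A fixed list like this only holds up if the headers really are standardized across pages, which is the same assumption made in the reply above.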

hellyars
13 - Pulsar

@dougperez Yep. That nails it. Nice approach. I will have to remember this one. I made a slight edit: I added a formula tool with 3 regex_replace expressions to add a ":" after Years since Launch, Years since Delivery, and Years from Commission, and (with your solution) everything snapped into place. It worked on the first 10 entries. I am going to try it against the first few hundred. THANKS!
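For anyone reproducing the colon fix outside Alteryx, here is a minimal Python sketch of the same idea. It is not the formula tool from the workflow: it uses one loop over the three labels instead of three separate expressions, and the function name `add_missing_colons` and the sample value are made up for the example.

```python
import re

# The three labels that appear without a trailing ":" on the pages.
LABELS = ["Years since Launch", "Years since Delivery", "Years from Commission"]


def add_missing_colons(text):
    """Append ':' to each label unless it is already followed by one."""
    for label in LABELS:
        # (?!:) is a negative lookahead: skip labels that already have a colon.
        text = re.sub(re.escape(label) + r"(?!:)", r"\g<0>:", text)
    return text


print(add_missing_colons("Years since Launch 12"))
```

With the colon in place, these three labels look like every other field, so label detection that relies on a trailing ":" treats them consistently.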
