
Alteryx Designer Desktop Discussions

SOLVED

Prevent HUGE amount of data when scraping HTML

ianjonnaCAA
8 - Asteroid

Good morning

 

We are collecting HTML data from this web page using a Download tool. When we run the workflow, the Download tool reports that we have downloaded a huge 1.1TB of data - which is obviously not good 😧 - but when we look at the raw data there are only about 50,000 records.

 

Has anyone had to overcome this kind of thing before? Perhaps there are some tricks to avoid creating such a data volume in the download or possibly a way to avoid it happening in the first place.

 

Here's hoping 🤞

 

Thanks 

ianjonna

12 REPLIES
danilang
19 - Altair

@ianjonnaCAA 

 

Uncheck Return Outer XML unless you need the xxx_outer_xml for further processing.  An example of this would be if you have 2 collections within a single element and you need to parse them both. 
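
For a rough sense of why this matters: 1.1TB spread over about 50,000 records works out to roughly 22MB per record, which suggests every row is carrying a large blob alongside its parsed values. The Python sketch below is only an illustration of that effect, not how Alteryx works internally. It assumes a made-up `<rows>` document and hypothetical helper names (`build_document`, `parsed_size`), and compares the total output size when each parsed row keeps only its values, also keeps its own outer XML, or also keeps the whole source document.

```python
import xml.etree.ElementTree as ET


def build_document(n_records: int) -> str:
    """Build a small made-up XML document with n_records <row> children."""
    rows = "".join(
        f"<row><id>{i}</id><name>item{i}</name></row>" for i in range(n_records)
    )
    return f"<rows>{rows}</rows>"


def parsed_size(doc: str, keep_source: bool, keep_outer_xml: bool) -> int:
    """Total characters across all output rows after parsing one row per child."""
    root = ET.fromstring(doc)
    total = 0
    for child in root:
        # The parsed leaf values are always part of the output row.
        total += sum(len(leaf.text or "") for leaf in child)
        if keep_outer_xml:
            # Hypothetical stand-in for an xxx_outer_xml column.
            total += len(ET.tostring(child, encoding="unicode"))
        if keep_source:
            # Hypothetical stand-in for passing the source field through on every row.
            total += len(doc)
    return total


doc = build_document(1000)
lean = parsed_size(doc, keep_source=False, keep_outer_xml=False)
with_outer = parsed_size(doc, keep_source=False, keep_outer_xml=True)
with_source = parsed_size(doc, keep_source=True, keep_outer_xml=False)

print(f"values only:        {lean:>12,} chars")
print(f"plus outer XML:     {with_outer:>12,} chars")
print(f"plus source field:  {with_source:>12,} chars")
```

In this toy case the outer XML adds about one extra copy of the document in total, while carrying the source on every row adds one full copy per record, so the second case grows with the square of the record count. Whether either is what happened in the workflow above is an assumption; checking the per-row field sizes would confirm it.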

 

Dan 

ChrisTX
16 - Nebula

@ianjonnaCAA we are testing now to upgrade our Alteryx version.

 

For the XML Parse tool, the checkbox for "Include in Output" does not exist in Version 2020.4.5

 

But it does exist in Version 2021.4.2

 

Chris

ianjonnaCAA
8 - Asteroid

Thanks

I will close this off now until we get the newer version rolled out!!!!

 

Many thanks

Cheers🖐

 

@danilang 

@ChrisTX 

@Ben_H 
