Good morning
We are collecting HTML data from this web page using a Download tool. When we run the workflow, the Download tool tells us the data we have downloaded is a huge 1.1TB, which is obviously not good 😧, but when we look at the raw data there are only about 50,000 records.
Has anyone had to overcome this kind of thing before? Perhaps there are some tricks for cutting down the data volume once it has been downloaded, or a way to avoid it happening in the first place.
Here's hoping 🤞
Thanks
ianjonna
Hi @ianjonnaCAA,
I have absolutely no idea how you've got to 1.1TB!
Do you have an example of your workflow to share?
When I try it I get about 1.3MB!
Regards,
Ben
Hi Ben
Most grateful to you for getting back to me. Having seen your workflow, it looks like the way I configured the Download tool matches yours, but it looks like my issue actually originates in the XML Parse tool that follows it. A copy of my workflow is attached, if you have any ideas.
Thanks again
Cheers
Hi @ianjonnaCAA
Ah, I see! It's due to the "Include in Output" checkbox.
With it turned on you've ended up with the full download data duplicated 100,000+ times, hence the massive size.
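
To illustrate the effect outside Alteryx, here is a minimal Python sketch. It is not how the XML Parse tool is implemented; `parse_children`, `total_chars` and the `outer_xml` field name are made up for the example. It only shows that when every parsed row carries a full copy of the source document, the output grows to roughly rows × document size. With the numbers from this thread, a 1.3MB document copied onto 100,000 rows is already around 130GB.

```python
import xml.etree.ElementTree as ET


def parse_children(xml_text, keep_outer):
    """Return one row per child element of the root.

    keep_outer=True mimics (as an assumption) a parse step that also
    attaches the whole source document to every output row.
    """
    root = ET.fromstring(xml_text)
    rows = []
    for child in root:
        row = {"value": child.text}
        if keep_outer:
            row["outer_xml"] = xml_text  # full copy on every row
        rows.append(row)
    return rows


def total_chars(rows):
    """Total number of characters stored across all fields of all rows."""
    return sum(len(value) for row in rows for value in row.values())


# Hypothetical stand-in for the downloaded data: 1,000 small child elements.
doc = "<rows>" + "".join(f"<r>{i}</r>" for i in range(1000)) + "</rows>"

with_outer = parse_children(doc, keep_outer=True)
without_outer = parse_children(doc, keep_outer=False)

print("document size (chars):   ", len(doc))
print("output with outer copy:  ", total_chars(with_outer))
print("output without the copy: ", total_chars(without_outer))
```

The only difference between the two runs is the extra copy of the document on each row, which accounts for almost all of the output size.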
Regards,
Ben
thanks @Ben_H , will look now 😀
Hi @ianjonnaCAA
If you need more examples, check out Weekly Challenge-116, A Symphony of Parsing Tools. Its input is a large, complex XML file. The file is large enough that a straight linear expansion causes your memory use to explode to the point where the workflow would take days to run. The submitted solutions demonstrate how to parse the required elements and also remove the resulting xxx_Outer_XML fields to keep your memory use in check.
Dan
Brilliant. thanks @danilang😊
@ben - The checkbox does not appear for me in the config panel.
I am on 2021.3 - is it possible that the checkbox is only in a more recent version?
(NB - I can't upgrade, it's not yet scheduled by the organisation 😟)
Hi @ianjonnaCAA
The Return Outer XML checkbox has been there since I started using Alteryx (2018), and according to the Help documentation it was there in 2021.3 as well. Can you post a screenshot?
Also, to reduce memory use, add a Select tool after each XML Parse tool to deselect any XML fields that you no longer need.
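
The same idea as a rough Python sketch, in case it helps to see it as code. This is an analogy rather than Alteryx behaviour: `parse_stage`, `deselect` and the field names (`DownloadData`, `item_xml`, `name_xml`) are invented for the example. Each stage expands one XML field into rows, and the field that was just parsed is dropped straight away so it is not carried through the following stages.

```python
import xml.etree.ElementTree as ET


def parse_stage(rows, xml_field, child_tag, out_field):
    """Emit one row per <child_tag> element found inside rows[xml_field].

    Every existing field is copied onto each new row, which is why
    large XML fields should be dropped once they have been parsed.
    """
    out = []
    for row in rows:
        root = ET.fromstring(row[xml_field])
        for child in root.iter(child_tag):
            new_row = dict(row)
            new_row[out_field] = ET.tostring(child, encoding="unicode")
            out.append(new_row)
    return out


def deselect(rows, *fields):
    """Drop the named fields from every row (the 'Select tool' step)."""
    return [{k: v for k, v in row.items() if k not in fields} for row in rows]


# Hypothetical two-level document standing in for the downloaded data.
doc = (
    "<feed>"
    "<item><name>a</name><name>b</name></item>"
    "<item><name>c</name></item>"
    "</feed>"
)
rows = [{"DownloadData": doc}]

# Parse, then immediately drop the field that was just parsed.
items = deselect(parse_stage(rows, "DownloadData", "item", "item_xml"), "DownloadData")
names = deselect(parse_stage(items, "item_xml", "name", "name_xml"), "item_xml")

for row in names:
    print(row)
```

Without the two `deselect` calls, every final row would still hold the full download and its parent item, which is the kind of duplication described earlier in the thread.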
Dan
Here's the screenshot:
Am I doing something wrong here?