
SOLVED

Alteryx Download Tool - 400 Error

NexBK
7 - Meteor

I want to scrape a website.
(https://www.songpa.go.kr/www/index.do)

However, when I request a page from that site with the Download Tool, it returns a 400 error.

(DownloadData)
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>400 Bad Request</TITLE>
</HEAD><BODY>
<H1>
400 Bad Request</H1>
</BODY></HTML>

(DownloadHeaders)
HTTP/1.1 400 Bad Request
Date: Tue, 21 Jun 2022 04:18:38 GMT
Content-Type: text/html; charset=EUC-KR
Connection: close
Content-Length: 157

Using the same URL in the Python Tool with the "beautifulsoup4" and "requests" packages, the page is retrieved without any problem.

How can I get the same page using the Download Tool?

2 REPLIES
PhilipMannering
16 - Nebula

I think you have to add a User-Agent to the Header. Try the attached. Note that a lot of the data is loaded with JavaScript after the HTML has loaded, so it might be tricky to scrape. Try [url] as well as [url2] in the Download Tool in my workflow.

Hope this helps,

Philip
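
For anyone who wants to see the same idea outside the Download Tool, here is a minimal Python sketch of a request that carries an explicit User-Agent header. It uses only the standard library (`urllib.request`) rather than the `requests` package mentioned in the question; `requests` sends its own default User-Agent (`python-requests/<version>`), which is probably why the Python Tool worked without extra configuration. The `USER_AGENT` value and the helper names `build_request` and `fetch` are assumptions made for illustration, not anything taken from the attached workflow.

```python
import urllib.request

# Hypothetical browser-like User-Agent string (an assumption for this sketch);
# the server may accept other values as well.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Return a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Send the request and return the raw response body (needs network access)."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        return resp.read()


# Build the request without sending it, then inspect the header that would go out.
# urllib normalizes header names, so the stored key is "User-agent".
req = build_request("https://www.songpa.go.kr/www/index.do")
print(req.get_header("User-agent"))
```

`fetch` returns bytes on purpose: the character set should be read from the response's Content-Type header before decoding. In the Download Tool, the equivalent step is adding a `User-Agent` name/value pair to the request headers.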

NexBK
7 - Meteor

You're right. After I added a User-Agent header, the Download Tool returned the page source correctly.
Thank you for your help.
