Alteryx Download Tool - 400 Error

NexBK
7 - Meteor

I want to scrape a website:
https://www.songpa.go.kr/www/index.do

However, when I request that page with the Download Tool, the server returns a 400 error.

(DownloadData)
<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">
<HTML><HEAD>
<TITLE>400 Bad Request</TITLE>
</HEAD><BODY>
<H1>
400 Bad Request</H1>
</BODY></HTML>

(DownloadHeaders)
HTTP/1.1 400 Bad Request
Date: Tue, 21 Jun 2022 04:18:38 GMT
Content-Type: text/html; charset=EUC-KR
Connection: close
Content-Length: 157

When I request the same URL in the Python Tool with the "beautifulsoup4" and "requests" packages, the page is retrieved without problems.

How can I get the information using the Download Tool?

PhilipMannering
16 - Nebula

I think you have to add a User-Agent to the request headers. Try the attached workflow. Note that much of the data is loaded by JavaScript after the HTML has loaded, so it might be tricky to scrape. Try [url] as well as [url2] in the Download Tool in my workflow.

Hope this helps,

Philip

NexBK
7 - Meteor

You're right. After I added a User-Agent header, the Download Tool returned the page source correctly.
Thank you for your help.
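
For anyone comparing against Python: the "requests" package sends its own default User-Agent header, which is a likely reason the Python Tool worked while the Download Tool did not. Below is a minimal sketch of the same fix using only the standard library (urllib rather than requests, so it runs without extra packages). The User-Agent string and the helper names build_request and fetch are illustrative assumptions, not taken from Philip's workflow.

```python
import urllib.request

URL = "https://www.songpa.go.kr/www/index.do"

# Assumed browser-like User-Agent value; the thread does not say which
# string was used in the attached workflow.
USER_AGENT = "Mozilla/5.0 (Windows NT 10.0; Win64; x64)"


def build_request(url: str, user_agent: str = USER_AGENT) -> urllib.request.Request:
    """Return a GET request that carries an explicit User-Agent header."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def fetch(url: str, timeout: float = 10.0) -> str:
    """Send the request and decode the body with the charset the server reports."""
    with urllib.request.urlopen(build_request(url), timeout=timeout) as resp:
        # Fall back to UTF-8 if the Content-Type header names no charset.
        charset = resp.headers.get_content_charset() or "utf-8"
        return resp.read().decode(charset, errors="replace")


if __name__ == "__main__":
    # Print the start of the page source to confirm the request was accepted.
    print(fetch(URL)[:200])
```

In the Download Tool, the equivalent step is adding a header named User-Agent with a similar value to the request headers. As Philip noted, content that the page loads with JavaScript will still be missing from the returned HTML.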
