
cURL commands for HDFS

HenrietteH

How-To

Troubleshoot cURL commands for HDFS connections

This article contains a list of cURL commands that match those Alteryx sends when connecting to HDFS. These commands can be used to troubleshoot HDFS connection issues (including In-DB Hive and Impala when using the HDFS Write option) outside of Alteryx.

See "How does the HDFS option work for Hive In-DB write?" for more details on how HDFS and Hive/Impala interact when using this option in the In-DB connection. See the webHDFS REST API documentation for more details on commands. 
 

Prerequisite:

Download and install cURL: https://curl.se/windows/


Procedure


Connecting to HDFS

These commands establish a connection and list all files available in the specified folder. This mirrors the commands Alteryx uses to populate the Input tool. See "HDFS Connection - What happens on the backend?" for more details on how the connection works.

Use the --verbose option to generate more detailed output.
Note: the URL always contains /webhdfs/v1/, even when the connection goes through HttpFS.
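For example, when the cluster is reached through an HttpFS gateway (14000 is a common default port, though your cluster may use a different one), only the host and port change; the /webhdfs/v1/ path stays the same:

curl --verbose "http://<httpfs host>:14000/webhdfs/v1/<folderpath>?op=LISTSTATUS&user.name=<user>"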


Connecting with a username

curl --verbose "http://<host>:<port>/webhdfs/v1/<folderpath>?op=LISTSTATUS&user.name=<user>"
Example output (success):
The top part shows the connection being established. The bottom part lists the directories and/or files contained in the directory specified in the path. In this example, the path is the root, so the output shows the folders located in the root directory.
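The body of a successful LISTSTATUS response is a JSON FileStatuses object along these lines (the directory names, owners, and permissions here are illustrative only, and additional fields are elided):

{"FileStatuses":{"FileStatus":[
  {"pathSuffix":"tmp","type":"DIRECTORY","owner":"hdfs","group":"supergroup","permission":"777", ...},
  {"pathSuffix":"user","type":"DIRECTORY","owner":"hdfs","group":"supergroup","permission":"755", ...}
]}}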


Example error (connection refused):
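When the namenode host name or port is wrong, or nothing is listening on that port, cURL fails before any HTTP exchange takes place. The verbose output typically ends along these lines (host and port are placeholders):

* connect to <host> port <port> failed: Connection refused
* Failed to connect to <host> port <port>: Connection refused
* Closing connection 0
curl: (7) Failed to connect to <host> port <port>: Connection refused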


Connecting using MIT Kerberos

curl -i -L --verbose --negotiate -u : "http://dncshwmk01.extendthereach.com:50070/webhdfs/v1/?op=LISTSTATUS"
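The --negotiate -u : combination tells cURL to authenticate with the Kerberos ticket already held in the MIT ticket cache, so a valid ticket must exist before running the command. If the request comes back with a 401, check the cache first (the principal shown is a placeholder):

kinit <user>@<YOUR.REALM>
klist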


Connecting using SSPI Kerberos

curl -i --verbose --negotiate -u : "http://den-it-cdh-demo:50070/webhdfs/v1/?op=LISTSTATUS"


Example output (success):
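With either MIT or SSPI Kerberos, a successful negotiation typically shows up in the -i/--verbose output as an initial 401 challenge followed by an authenticated 200, roughly like this (values are illustrative; the hadoop.auth cookie content varies by cluster):

HTTP/1.1 401 Unauthorized
WWW-Authenticate: Negotiate

HTTP/1.1 200 OK
Set-Cookie: hadoop.auth="u=<user>&p=<user>@<REALM>&t=kerberos&e=...&s=..."
Content-Type: application/json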


Connecting using SSL and a username

curl --verbose "https://<host>:<port>/webhdfs/v1/<folderpath>?op=LISTSTATUS&user.name=<user>"
Example output (success): 
You can see the SSL handshake at the beginning. 
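If the verbose output shows the handshake failing with a certificate verification error, point cURL at the certificate authority that signed the cluster's certificate; --insecure (-k) skips verification entirely and should only be used to confirm that the problem is certificate-related (the .pem path is a placeholder):

curl --verbose --cacert <path to CA bundle .pem> "https://<host>:<port>/webhdfs/v1/<folderpath>?op=LISTSTATUS&user.name=<user>"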

 


Reading from HDFS

These commands read a file from HDFS. 
Reading from HDFS happens in a multi-step process. Steps 2 and 3 happen automatically when the -L option is used; a sketch of running them manually follows the example output below.
  1. Submit a GET request that automatically follows redirects (-L)
    curl -i -L "http://<host>:<port>/webhdfs/v1/<filepath>?op=OPEN"
  2. The request is redirected to a datanode where the file can be read
  3. The client follows the redirect to the datanode and receives the file data
Example output (success): 
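To watch the redirect happen, the same read can be run without -L: the first call returns the redirect, and the URL from its Location header is then requested directly (-o saves the file locally; the datanode URL is a placeholder and real redirects carry extra query parameters):

curl -i "http://<host>:<port>/webhdfs/v1/<filepath>?op=OPEN"
The response includes HTTP/1.1 307 Temporary Redirect and a Location header such as http://<datanode>:<port>/webhdfs/v1/<filepath>?op=OPEN&...
curl -i -o <local file> "<location header from redirect>"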

 


Writing to HDFS

These commands write a file to HDFS. 
Note that Alteryx itself does not use the local file option (-T); it is only needed when testing outside of Alteryx.
Writing to HDFS happens in a multi-step process. Steps 1 and 3 are performed by the client (Alteryx); Step 2 is the server response. A worked sketch of both calls follows the example output below.
  1. A PUT request is submitted to the namenode but without submitting any actual data
    curl -i -X PUT "http://<host>:<port>/webhdfs/v1/<filepath>?op=CREATE&user.name=<user>"
  2. The Hadoop cluster returns a 307 Temporary Redirect pointing to the datanode where the file is to be written
  3. The client (Alteryx) submits another PUT request using the URL in the Location header, this time with the file to be written
    curl -i -X PUT -T <local file> "<location header from redirect>"
Example output (success): 
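As a worked sketch of both calls (the host names, file path, and Location value are placeholders, and real redirects carry additional query parameters), the second PUT should come back with 201 Created when the file is written successfully:

curl -i -X PUT "http://<namenode>:50070/webhdfs/v1/tmp/ayx_test.csv?op=CREATE&user.name=<user>"
The response includes HTTP/1.1 307 Temporary Redirect and a Location header such as http://<datanode>:<port>/webhdfs/v1/tmp/ayx_test.csv?op=CREATE&user.name=<user>&...
curl -i -X PUT -T C:\temp\ayx_test.csv "http://<datanode>:<port>/webhdfs/v1/tmp/ayx_test.csv?op=CREATE&user.name=<user>&..."
The response includes HTTP/1.1 201 Created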

Additional Information

cURL: https://ec.haxx.se/
webHDFS REST API: https://hadoop.apache.org/docs/r1.0.4/webhdfs.html