

HDFS FAQ

AngelaO
Alteryx Alumni (Retired)

Question

HDFS:

a) If I kill the program in Alteryx, will it kill the Hadoop cluster?
b) How does distributed computing work in Alteryx?
c) How much data can Alteryx support while using HDFS?
d) Can you share macros on HDFS?

Answer

HDFS:

a) If I kill the program in Alteryx, will it kill the Hadoop cluster?

No. Terminating an HDFS read or write in Alteryx simply stops the transaction and does not impact the namenode or datanode(s) on your Hadoop cluster. Note, however, that if you stop a workflow while it is writing a file, a partially written file may be left on your HDFS filesystem and may need to be removed manually (for example, hdfs dfs -rm /path/to/partial_file.csv).
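If you would rather script that cleanup, below is a minimal sketch that deletes a leftover file through WebHDFS, the REST interface an Alteryx HDFS connection typically talks to. The namenode URL, port, path, and user are placeholders for your environment, and the default port varies by Hadoop version.

import requests

# Assumed WebHDFS endpoint; 50070 is the Hadoop 2.x NameNode default.
NAMENODE = "http://namenode.example.com:50070"
PATH = "/path/to/partial_file.csv"   # file left behind by the stopped write
USER = "hdfs"                        # assumed HDFS user with delete rights

# WebHDFS exposes deletion as an HTTP DELETE with op=DELETE.
resp = requests.delete(
    f"{NAMENODE}/webhdfs/v1{PATH}",
    params={"op": "DELETE", "recursive": "false", "user.name": USER},
)
resp.raise_for_status()
print(resp.json())   # {"boolean": true} on success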

 

b) How does distributed computing work in Alteryx?

Alteryx can read data from HDFS and write data to HDFS. Using our In-Database ("In-DB") tools, you can take advantage of distributed processing in your Hadoop cluster via Impala, Hive, or Spark. Alteryx workflows (modules) themselves cannot be run in a distributed fashion.
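To make the push-down idea concrete, here is a rough sketch (not Alteryx code) of what an In-DB flow effectively does when Hive is the engine: the heavy aggregation runs as a distributed job on the cluster, and only the small result set returns over the network. It uses the third-party PyHive package, and the host and table names are invented for illustration.

from pyhive import hive

# Hypothetical HiveServer2 endpoint; 10000 is the usual default port.
conn = hive.Connection(host="hiveserver.example.com", port=10000)
cursor = conn.cursor()

# The GROUP BY executes on the cluster as a distributed job;
# only the aggregated rows travel back to the client.
cursor.execute("SELECT region, SUM(sales) FROM sales_table GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)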

 

c) How much data can Alteryx support while using HDFS?

Alteryx has no pre-set limit to the amount of data that can be written to or read from HDFS. The rate-limiting factor is the amount of bandwidth available between Alteryx and your Hadoop cluster.
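As a rough back-of-the-envelope illustration (assuming a sustained 1 Gbps link between Alteryx and the cluster, and ignoring protocol overhead), transfer time scales directly with data volume:

# Hypothetical numbers purely for illustration.
data_gb = 100                       # data to read from HDFS, in gigabytes
link_gbps = 1                       # sustained network bandwidth, gigabits/sec
seconds = data_gb * 8 / link_gbps   # 8 bits per byte
print(f"at least ~{seconds / 60:.0f} minutes")   # ~13 minutes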

 

d) Can you share macros on HDFS?

No. Alteryx currently supports only reading and writing CSV and Avro data on HDFS, so macro files cannot be stored or shared there.

Comments
SPIDER61
5 - Atom

We have a client who acquired licenses for Alteryx Server and Alteryx Designer.
Alteryx Server is installed on a VM in Azure.
In this case there are two problems:
1) When publishing a workflow from Designer to the server, a warning appears indicating that errors may occur when the workflow is run on the server.
And indeed, running the workflow on the server does produce errors. (Note that this happens with any workflow published to the server.)
2) The client needs to connect to Azure's Hadoop file system but cannot; the connection fails with the message:
"Invalid Host or Port"

 

Any ideas?