Alteryx Designer Desktop Discussions


Using Blob Input to Hash at File Level with an HDFS Input

R_L
7 - Meteor

Hello,

I'm wondering if it's possible for the Blob Input to read an entire file path that is an HDFS file path?

 

I have a use case whereby I want to output a file into HDFS. I also want to be able to run a checksum process against this same file, hence my thought to use the Blob Input.

However, the Blob Input doesn't seem to take kindly to HDFS as a file path. I'm wondering if it's a limitation of the tool itself or whether I just have access troubles.

 

Current setup:

1. Dataset has a Destination field appended, which updates the HDFS file path to include changes to the file name.

2. The Destination field is then used as the file path in the Output tool when writing to HDFS.

Ideal Goal:

3. Once the above output is generated in HDFS, the second step of the Parallel Block Until Done begins. 

4. The Destination field is also fed into the Blob Input, so that I can run a Blob Convert against the generated blob field.

5. The resulting hash is then output to a separate location in HDFS (see the minimal sketch after this list).
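
For context, here is a minimal Python sketch (e.g. inside a Python tool) of what steps 3-5 amount to: read the generated file as a blob, hash it, and write the digest out to a separate location. The paths below are hypothetical placeholders, not my real HDFS locations.

```python
import hashlib

# Hypothetical placeholder paths standing in for the real HDFS locations.
SOURCE_FILE = "/tmp/generated_output.avro"    # the file written in step 2
HASH_FILE = "/tmp/generated_output.avro.md5"  # separate checksum location

# Hash the file in chunks so large files never have to fit in memory.
md5 = hashlib.md5()
with open(SOURCE_FILE, "rb") as f:
    for chunk in iter(lambda: f.read(1024 * 1024), b""):
        md5.update(chunk)

# Write the digest to the separate checksum location.
with open(HASH_FILE, "w") as f:
    f.write(md5.hexdigest() + "\n")
```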

 

[Image: R_L_0-1612334964681.png — workflow screenshot]

 

2 REPLIES
TrevorS
Alteryx Alumni (Retired)

Hello @R_L 

So I've been looking into this for you and wanted to follow up with a couple of questions to try and clarify a few things.


Are you able to share a copy of your workflow and a sample of your input data? 

Typically, the Blob Input tool is used for images and binary data, but I have seen a post where it was used for hashing at the file level, just like you are trying to do.

 

I think it may come down to HDFS, how you are connected, and what types of files you are trying to push through the Blob tool.
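
If connectivity is the culprit, one quick way to test it outside of the Blob tool is to hit the WebHDFS REST endpoint directly. Here's a minimal sketch, assuming WebHDFS is enabled, with a hypothetical NameNode host, port, and user (a Kerberized cluster would also need Kerberos-aware auth such as requests-kerberos):

```python
import requests

# Hypothetical NameNode host/port; adjust to your cluster.
NAMENODE = "http://namenode.example.com:9870"
HDFS_PATH = "/collab/example/data/hive/data"

# GETFILESTATUS is a standard WebHDFS operation; a 200 response with
# a FileStatus JSON object means the path is reachable and readable.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1{HDFS_PATH}",
    params={"op": "GETFILESTATUS", "user.name": "myuser"},
)
resp.raise_for_status()
print(resp.json()["FileStatus"])
```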

Let me know if you are still working on this issue; and if you have found a solution, please post it so others who experience this issue have a place to start!


Thanks,
TrevorS

Community Moderator
R_L
7 - Meteor

Hi @TrevorS ,

 

Unfortunately, I'm unable to share the workflow or a sample (due to the sensitive nature of the data and the HDFS data connections).

I was attempting to do something much like the post you included; however, I believe it's a limitation of the input path that the Blob Input will accept. In that example, the Directory tool is used to pull in a directory path located on a local drive or network-attached storage, and this seems to be an acceptable format for the Blob Input.

 

As you may know, the Blob Input can accept a field as the file path:

 

[Image: image.png — Blob Input configuration screenshot]

If I try a directory path which comes in HDFS format, i.e. hdfsa:Hostname=example.uat-edge.cdp.example.au.non.c10.example.com:10000;Authenticate=true;KerbType=sspi;Tempdir=/tmp;URL=https://example.uat-edge.cdp.example.au.non.c10.example.com:10000/|||/collab/example/data/hive/data... - then it doesn't work.
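
One workaround I considered (a sketch only, with a hypothetical endpoint and paths; I haven't validated it against our Kerberized cluster) is to stage the file out of HDFS onto a plain local path first, e.g. with the HdfsCLI package in a Python tool, and then point the Blob Input at that local copy:

```python
from hdfs import InsecureClient  # HdfsCLI package (pip install hdfs)

# Hypothetical WebHDFS endpoint and user; a Kerberized cluster would
# need hdfs.ext.kerberos.KerberosClient instead of InsecureClient.
client = InsecureClient("http://namenode.example.com:9870", user="myuser")

hdfs_path = "/collab/example/data/hive/data/output.avro"
local_path = r"C:\temp\output.avro"  # plain path the Blob Input accepts

# Copy the file out of HDFS; the Blob Input can then read local_path.
client.download(hdfs_path, local_path, overwrite=True)
```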

 

I don't believe it's a limitation of the Avro file type (it works fine when I output the data to a local drive and then run the Blob Input against it).

 

For now, I've had to opt out of using the checksum.
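
If anyone else lands here, one alternative I may try instead of dropping the checksum entirely is to bypass the Blob Input and hash the file straight off WebHDFS in a Python tool. A hedged sketch with a placeholder host, path, and user (again, Kerberos would need extra auth):

```python
import hashlib
import requests

# Hypothetical endpoint and path; adjust to your cluster.
NAMENODE = "http://namenode.example.com:9870"
HDFS_PATH = "/collab/example/data/hive/data/output.avro"

# OPEN streams the file contents via WebHDFS; the NameNode redirects
# to a DataNode, which requests follows automatically.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1{HDFS_PATH}",
    params={"op": "OPEN", "user.name": "myuser"},
    stream=True,
)
resp.raise_for_status()

# Hash the byte stream in chunks without writing a local copy.
md5 = hashlib.md5()
for chunk in resp.iter_content(chunk_size=1024 * 1024):
    md5.update(chunk)
print(md5.hexdigest())
```

Note that Hadoop's own hdfs dfs -checksum returns a composite checksum of block CRCs rather than a plain hash of the file bytes, so hashing the byte stream like this is what makes the result comparable to an MD5 computed locally.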
