community
cancel
Showing results for 
Search instead for 
Did you mean: 

Alteryx designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Community v19.9

The latest release includes several enhancements designed to improve your Community experience!

Learn More

HDFS CSV output file field type format

Meteoroid

I have an output file that contains more than 4M rows of data. This file is incorporated into Hadoop.

The problem I'm having is that the majority of the fields are formatted as String which automatically gets assigned a 32k field length as default.

Since this file is utiliazed as the source data for an external table I can't alter the field type within Hadoop. The table is used by Tableau and other SAS systems as well.

 

Is there a field type within Alteryx I should utilize so that Hadoop would recognize them as VarChar instead of String?

 

Alteryx
Alteryx

Are you using Alteryx to write to HDFS in a .csv format, and this file is used for the location of an External Hive Table? 

 

Can you try to save the .csv to a new location and create a new external table, defining the string columns as varchar in your CREATE TABLE Hive syntax?

 

Meteoroid

Andrew, yes, that what i'm doing. I create the table and set the field type to VarChar but remains as String.

 

On my output in Alteryx if I unselect 'Write BOM' in the options make any difference?

Alteryx
Alteryx

Can you share the following?

  1. Select Tool before writing to hdfs: See your column types and length
  2. Your syntax to create the Hive table
  3. In Hive, describe <table>

 

I just did a quick example and was able to get VarChars to work.

 

Here is my data from Alteryx I write to a hdfs csv. The file location is /DemoData_RW/ajkramer/campaign/test.csv. It is the only file in the campaign directory.

part1.PNG

 

I then create an External Hive Table

CREATE EXTERNAL TABLE results_hive (age int, duration int, balance int, marital varchar(10), education varchar(10), y varchar(10), X_no double, X_yes double) 
row format delimited fields terminated by ','
LOCATION '/DemoData_RW/ajkramer/campaign' tblproperties ("skip.header.line.count"="1");

My variables defined as varchar are still varchar

describe results_hive;
age                     int
duration                int
balance                 int
marital                 varchar(10)
education               varchar(10)
y                       varchar(10)
x_no                    double
x_yes                   double

Let me know what you are seeing on your end.

 

Thanks,

Andrew

Labels