if i connect to hive with hortonworks hive odbc versin 2.1.2.1002 (alteryx 10.6.9.21458), is there a way to get a count of how many records are in the original database so I know how long the workflow will take to run? right now, i limit the record count, just to write the logic of the code, but I wanted to get an idea of how long running the entire data set would take
thanks
becki kain
Solved! Go to Solution.
StackOverflow has some suggestions: https://stackoverflow.com/questions/21208574/hive-query-to-quickly-find-table-size-number-of-rows
-- gives all properties
show tblproperties yourTableName
-- show just the raw data size
show tblproperties yourTableName("rawDataSize")
and...
Here is the quick command
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [noscan];
For Example,If table is partitioned
hive> ANALYZE TABLE ops_bc_log PARTITION(day) COMPUTE STATISTICS noscan;
output is
Partition logdata.ops_bc_log{day=20140523} stats: [numFiles=37, numRows=26095186, totalSize=654249957, rawDataSize=58080809507]
Partition logdata.ops_bc_log{day=20140521} stats: [numFiles=30, numRows=21363807, totalSize=564014889, rawDataSize=47556570705]
Partition logdata.ops_bc_log{day=20140524} stats: [numFiles=35, numRows=25210367, totalSize=631424507, rawDataSize=56083164109]
Partition logdata.ops_bc_log{day=20140522} stats: [numFiles=37, numRows=26295075, totalSize=657113440, rawDataSize=58496087068]
OK
Time taken: 5.252 seconds
Cheers,
Mark