Advent of Code is back! Unwrap daily challenges to sharpen your Alteryx skills and earn badges along the way! Learn more now.
Free Trial

Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

When to use the String data type rather than the V_String data type - Test & Results

pantusot
7 - Meteor

The Issue

 

Currently, there is not a clear explanation on what the real difference between String and V_String is. The documentation states, "It is more efficient to store strings as variable-length strings." So, why use the String data type?

 

You can open up the attached workflow, and follow along.

 

Test

 

A CSV file containing fixed-length, 4-digit account numbers was saved as four different YXDB files:

 

  1. With the data type V_String 255
  2. With the data type V_String 4
  3. With the data type String 255
  4. With the data type String 4

The files were then compared by file size to see the difference.

 

PowerShell script to get file sizes:

 

 

 

Get-ChildItem "*.yxdb" | ForEach-Object {
    Write-Host ([string]::Format("{0}`t{1}", $_.Length, $_.Name))
}

 

 

 

Results

 

 

Here is the PowerShell output for the file sizes:

 

 

 

53815   String 255.yxdb
30527   String 4.yxdb
49549   V_String 255.yxdb
49545   V_String 4.yxdb

 

 

 

 

The String 4 data type was the smallest file at 31 KB. The V_String data type files were both 50 KB. And the largest of the four was the String 255 data type file.

 

So, when you have a field with a fixed number of Latin-1 characters, the String data type set to the number of characters (i.e. String 10) is the most efficient. Also, limiting the max length of the V_String only has a negligible effect on the file size, so it is best to leave the max value high to avoid truncating strings.

1 REPLY 1
ChrisTX
16 - Nebula
16 - Nebula

Try your test using strings of varying lengths.

 

The String data type will reserve a static amount of space for the each occurrence of the field (each "cell"), even if the data in the field doesn't not take up all of the space.

 

Using String(10) when a field holds a maximum length of 4 characters will "waste" the storage space of 6 characters.

 

 V_String adjusts the storage space to match the actual length of each "cell".  (or maybe it adjusts to match the maximum length of all values in the field, not cell-by-cell?)

 

Use V_WString if any of your "cells" will hold unicode data (2 bytes per character).

 

Chris

Labels
Top Solution Authors