Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.

Text Analytics - Input Data VW-String Field Data Loss Due to Truncation - HELP PLEASE

5 - Atom

I am seeing significant data loss due to truncation in each record of the "Contents" string field.

 

I am trying to perform text analytics on the contents of an Outlook .pst file. I have exported the .pst into an Access database; the data fields are listed below, and the data loss occurs in the Contents field. Each record in the Contents field holds the body of an email, or the bodies of several emails if it is an email string. The length of each record ranges from a single sentence to more than 20 pages of text. I need a way to import all of the text contained in each record. I have checked the community and have already tried: the Auto Field tool, the Select tool, and increasing the field size to 10,000,000; removing all duplicate white space, line breaks, etc.; and setting the field type to String, V_String, V_WString, and V_WString forced. I have 30 GB of emails to get through, so any help is very much appreciated.

 

Data Fields
ID
Importance
Icon
Priority
Subject
From
Message To Me
Message CC to Me
Sender Name
CC
To
Received
Message Size
Contents
Created
Modified
Subject Prefix
Has Attachments
Normalized Subject
Object Type
Content Unread
SOURCE_Email_Folder
SOURCE_Email_ID
ACE Emeritus

It sounds to me like the truncation has taken place prior to loading the data into Alteryx. I'd be suspicious of the Access export in particular.

 

I would suggest that you export the PST file to a comma-delimited text format.

 

Then, instead of using Auto Field or Select to change the field size, you can specify the field length in the Input Data tool configuration (Option 7; the default is 254).

 

The trouble with using Auto Field or Select is that they are applied after the input, so if the input is already being truncated (to 254 characters), they can never recover the lost text.
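
The ordering problem can be illustrated with a toy model in Python. This is only a sketch and not how Alteryx is implemented: `read_with_cap` and `widen_field` are hypothetical helpers standing in for an input step with a 254-character field length and for a later resize of the field.

```python
def read_with_cap(value: str, field_length: int = 254) -> str:
    """Mimic an input step that keeps only the first field_length characters."""
    return value[:field_length]


def widen_field(value: str, new_size: int) -> str:
    """Mimic a downstream resize: the field can hold more, but the value is unchanged."""
    return value[:new_size]


# A made-up 1,000-character email body.
original = "x" * 1000

loaded = read_with_cap(original)            # truncation happens at input time
widened = widen_field(loaded, 10_000_000)   # resizing afterwards adds nothing back

print(len(original), len(loaded), len(widened))
```

In this model the widened value is still 254 characters long, which is why the field length has to be raised at the input step itself.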

Alteryx Certified Partner

Hi Mike,

Hope you are well! How are you inputting your data? You might be truncating the text at the very start... Could you try using:

CSV file format

\0 delimiters

A field length of 2000 (increase this accordingly)

Hope this helps,

Sasha
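
To choose a field length that is large enough, it may help to measure the longest value in the exported file first. Below is a minimal Python sketch, assuming a comma-delimited export with a header row and a column named Contents. The helper name `longest_value` and the sample data are hypothetical.

```python
import csv
import io


def longest_value(lines, column):
    """Return the length of the longest value found in one column of a CSV source."""
    # The csv module rejects fields above 131072 characters by default,
    # so raise the (module-wide) limit before reading long email bodies.
    csv.field_size_limit(2**31 - 1)
    reader = csv.DictReader(lines)
    return max((len(row[column] or "") for row in reader), default=0)


# Made-up sample standing in for the real export.
sample = io.StringIO(
    "ID,Contents\n"
    '1,"short note"\n'
    '2,"' + "a" * 5000 + '"\n'
)

print(longest_value(sample, "Contents"))
```

For a real export, the same function could be called on a file opened with `open(path, newline="", encoding="utf-8")`, where the path and encoding depend on how the file was exported. If the result is exactly 254 or 255 for many records, that would suggest the text was already cut off before it reached Alteryx.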
