on 10-18-2016 05:29 PM - edited on 03-08-2019 01:09 PM by SydneyF
Truncated data is usually defined as numeric rounding or cutoff, string shortening, or datum deletion - essentially any time information is lost. Since, as analysts, our insights are only as good as our data, we usually find ourselves trying to preserve the integrity of our data as we process it. That's not to say we can't also optimize our resource usage when our data allows for it.
While keeping your data types as small as possible is important, and can shorten run times, it is even more important to understand which data types best accommodate your data and in what situations they can be shrunk without truncation. If you’ve ever seen Office Space, you understand just how important even fractions of a cent are – and accuracy doesn’t just apply to financial data.
To avoid truncation in your data, you first have to explore your data types in the context of your analyses. What data types and sizes did you receive the data in, and why? What format would you like to see your results in? Are there opportunities to reduce memory usage between those two? And finally - what operations will you perform on the fields in your workflows that may impact each data type and size? The answers to these questions will be unique to each dataset, but once they’re addressed you can use the same techniques to keep your data both optimized and accurate.
Start by identifying the data type that most closely fits each field based on the questions above. If you want Alteryx’s best guess, try using the Auto Field Tool to assign optimized data types and sizes automatically. While this tool is immensely helpful, be sure to check that the output is not truncating data or leaving it in a form less conducive to your downstream analyses – the Auto Field Tool doesn’t know the answers to your questions above. You can have the best of both worlds by placing a Select Tool (master it here) just afterwards and adjusting the Auto Field Tool’s assignments where necessary. You’ll then have the suggested types as a starting point and can correct the less accurate or less accommodating assignments by hand. Some things to consider:
If you perform string operations later in your workflow that may increase a field’s length, pay close attention to whether values are being truncated once the maximum string length is reached. String and WString (which accepts Unicode characters) are fixed-length types and will drop any characters that exceed their size. On the other hand, V_String and V_WString (which accepts Unicode characters) are variable-length types, and will adjust to accommodate strings after assignment.
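The difference is easy to see outside of Designer too. A minimal Python sketch of the same behavior, where `store_fixed` is a hypothetical stand-in for a fixed-length String(5) field:

```python
def store_fixed(value: str, size: int) -> str:
    """Fixed-length field: silently drops characters past `size`."""
    return value[:size]

city = "Philadelphia"

# A String(5)-style field truncates the value -- information is lost.
print(store_fixed(city, 5))          # 'Phila'

# Concatenation later in the workflow doesn't help; it is cut again.
print(store_fixed(city + ", PA", 5)) # still 'Phila'

# A variable-length field (like V_String) keeps the whole value.
v_string = city + ", PA"
print(v_string)                      # 'Philadelphia, PA'
```

Note the truncation is silent: no error is raised, which is exactly why it's worth checking string fields after any operation that can lengthen them.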
Numerics may seem the most intuitive of the bunch, but pay close attention to the precision of each type to avoid unintentional rounding. Fixed Decimal is the only numeric type with an adjustable length – the rest may force your data to fit if it isn’t assigned the correct type.
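To see why precision matters, here is a short Python illustration (Python's `float` is a 64-bit double rather than any particular Designer type, so treat this as an analogy, not a reproduction of Alteryx behavior): binary floats can't represent some decimal fractions exactly, while a fixed-decimal representation keeps exact digits and rounds only where you tell it to.

```python
from decimal import Decimal, ROUND_HALF_UP

# Binary floating point: 0.1 and 0.2 have no exact representation,
# so the sum carries a tiny error.
price = 0.1 + 0.2
print(price)                # 0.30000000000000004

# Fixed-decimal arithmetic keeps exact decimal digits...
exact = Decimal("0.1") + Decimal("0.2")
print(exact)                # 0.3

# ...and rounds only when explicitly asked, with a stated rule.
cents = Decimal("2.675").quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)
print(cents)                # 2.68
```

Those stray fractions of a cent are precisely the Office Space problem from earlier: tiny per-row errors compound across millions of rows.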
Dates are not always input in the standard ISO format yyyy-mm-dd HH:MM:SS, and may require some conversion before they can be handled as Date/Time types in the Designer (this matters when sorting or filtering by date, or when applying Date/Time functions in formulas). Any other date format will be interpreted as a string, and treated as such, unless converted using the DateTimeParse() function, the DateTime Tool, the Parse Dates Macro, or the fan-favorite BB Date Macro. If you need to return to your original formatting, or simply prefer another, you can always convert your dates back into another format string after they’ve been processed, using the DateTimeFormat() function or the DateTime Tool.
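The parse-then-format round trip works the same way in most tools. A small Python sketch of the idea behind DateTimeParse()/DateTimeFormat(), using the standard library's `strptime`/`strftime` and a hypothetical US-style input string:

```python
from datetime import datetime

raw = "10/18/2016"                           # non-ISO date, arrives as a string

# Parse: turn the string into a real date value (DateTimeParse analogue).
parsed = datetime.strptime(raw, "%m/%d/%Y")  # now sortable, filterable, comparable

# Format: write it back out however you like (DateTimeFormat analogue).
print(parsed.strftime("%Y-%m-%d"))           # '2016-10-18' -- ISO, good for sorting
print(parsed.strftime("%d %b %Y"))           # a display-friendly format
```

The key point carries over directly: do your sorting, filtering, and date math on the parsed value, and only convert back to a display string at the very end.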