How are data types chosen when inputted?
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello all. I am working on a data quality app that compares the metadata of two different inputs, a reference file and the file to be tested. I'm trying to better understand how data types are chosen when, for example, excel files are inputted so I can avoid creating false negatives.
Will string columns look at the length and just set it as 'String' with a specific character length?
Same with numbers will it look at the biggest number and determine the column is Int32?
I'm thinking to avoid these potential flags in the app (where one may be Int32 and the other Int64 which could still be fine for inputting the file into a workflow) I will give the data types high level labels to compare to as compared to directly comparing the data types, but any thoughts or comments are much appreciated.
Thanks and have a good day!
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Interesting question!
I've always wondered that myself. It seems the Alteryx engine recognizes the data by patterns, which is what I initially thought was it. But then if you add an Auto Field tool to the data, sometimes the data types change too.
If you look at the Auto Field Tool documentation, it states:
"Use Auto Field to read through all of the records of an input and set the field type to the smallest possible size relative to the data contained within the column.
The tool correctly assigns a numeric field to a string data type where any record starts with zero and not a number."
So in this case, whatever that is loaded as the source is read against common data types and Alteryx fetches its closest match. For example, some dates are instantly recognized as dates, but things like August 17 1992 is recognized as a V_WString. Stemming from this, the differences between strings is the length of strings, likewise with your example on Int.
One way to mitigate your potential flag is to add dummy data and union it to your main input dataset. This will force the data type to follow whatever you declare it to be with your dummy data, thereby minimizing the issues.
Another way would be if you are uploading multiple files as inputs, then it's good to have a multifile / multisheet Batch Macro that caters to the name of your fields, then change the data type thereafter to make it consistent. This is one way to minimize your issues, though that's all I can gather for now. Would be great to hear what others know or from Alteryx directly too!
References:
- https://help.alteryx.com/20231/designer/data-types#:~:text=Alteryx%20processes%20values%20based%20on....
- https://help.alteryx.com/20231/designer/auto-field-tool
- https://community.alteryx.com/t5/Alteryx-Designer-Desktop-Discussions/Data-type-String-V-String-V-WS...
Alteryx ACE
https://www.linkedin.com/in/calvintangkw/
