This site uses different types of cookies, including analytics and functional cookies (its own and from other sites). To change your cookie settings or find out more, click here. If you continue browsing our website, you accept these cookies.
Here's my problem : I always need to check that the data files (.csv) sent by my client has a valid structure (number of columns, data type, mandatory values, max field value length, etc...) before applying my data blending workflow.
Today, this validation process is done semi-automatically with Alteryx by forcing the data types with a SELECT tool (and analyzing warnings in the log), by testing each max field length, or by using multiple formula tools for complex test.
I have a dream : A macro with 2 input :
- The data to check
- The expected structure (field type, min & max length, null value authorized)
and 1 output
- The full errors list (why not a table with only the values that does'nt match with the expected structure) with, ideally, the cause of the error
This kind of macro would be SOOOO useful !!! It's a really casual problem I encounter :)
I really don't know how to build this king of macro... could you please help me ?
In big productionised apps, a common method that I see is to have the field validations in a text file with 3 columns, (FieldName, Nullable?, REGEX format). That is then used to compare to a Field Info tool. One of the advantages of this is that the format can be adjusted, reviewed etc without editing the actual app. The comparison could be turned into a macro, but seeing as it is only a couple of tools (Field Info, Input, Join, Formula), it can just sit in a container most times.
There are a lot of things that could be tested for, attached is a macro that tests for what you have in your example so far and additionally tests for missing or unexpected fields.
The attached macro adds Record ID to the Values to Test, transposes all data fields, Formula tool to get length and test if number and test if has decimal, multi join in the Expected Structure, calculate result with expression below, keep on invalid values, select relevant fields.
Here is the formula that could be expanded upon to test for more situations:
IF IsNull([__RecordID__]) THEN 'Field Missing' ELSE '' ENDIF+
IF IsNull([FieldName]) THEN 'Extra Field' ELSE '' ENDIF+
IF [Mandatory] AND IsEmpty([Value]) THEN 'Missing|' ELSE '' ENDIF+
IF [FieldType] IN ('Double','Int32','Int64') AND ![Is Number] THEN 'Not Number' ELSE '' ENDIF+
IF [FieldType] IN ('Int32','Int64') AND [Has Decimal] THEN 'Not Integer' ELSE '' ENDIF+
IF [Length]<[MinLength] THEN 'Too Short|' ELSE '' ENDIF+
IF [Length]>[MaxLength] THEN 'Too Long|' ELSE '' ENDIF
I also attached a modification to your workflow that uses this macro.