Hi everybody,
Here's my problem : I always need to check that the data files (.csv) sent by my client has a valid structure (number of columns, data type, mandatory values, max field value length, etc...) before applying my data blending workflow.
Today, this validation process is done semi-automatically with Alteryx by forcing the data types with a SELECT tool (and analyzing warnings in the log), by testing each max field length, or by using multiple formula tools for complex test.
I have a dream : A macro with 2 input :
- The data to check
- The expected structure (field type, min & max length, null value authorized)
and 1 output
- The full errors list (why not a table with only the values that does'nt match with the expected structure) with, ideally, the cause of the error
This kind of macro would be SOOOO useful !!! It's a really casual problem I encounter :)
I really don't know how to build this king of macro... could you please help me ?
Have fun !
Julien
Solved! Go to Solution.
Hi Julien:
Have you tried the Field Info tool?
Setup your "dictionaries" accordingly to what the Field Info Tool uses, and then you can compare those values.
Hope this points you to the right direction.
Hi @Julien,
Just to drop this here,
In big productionised apps, a common method that I see is to have the field validations in a text file with 3 columns, (FieldName, Nullable?, REGEX format). That is then used to compare to a Field Info tool. One of the advantages of this is that the format can be adjusted, reviewed etc without editing the actual app. The comparison could be turned into a macro, but seeing as it is only a couple of tools (Field Info, Input, Join, Formula), it can just sit in a container most times.
Kane
Hi Guys,
Thanks for your replies.
The Field Info could be very useful for my problem.
However, for instance, when I expect a double and my field only contains integers (i.e Int16), it has not to be considered as an error.
I attach a very small workflow to illustrate what I've understood from your advices... but I struggle with new problems :
Does it means I have to set a table of acceptable field types for each expected field types ?
How to manage manage mandatory fields ? Field Info indicates the max size of the field... not the min size :(
You advise me to use regular expressions but I don't see how to implement them into my expected structure. How to use/apply a regexp specified into a string field into a formula tool ?
I hope my english is good enough to be understood... Really sorry if it is not :-S
Hi julien,
Maybe you want to evaluate within a formula, if the field is "Numeric" (or any other Type). The same can be done for the mandatory-ness or not of the field:
Also, you may want to consider standarizing the case of your field names (Uppercase or lowercase for all cases).
There are a lot of things that could be tested for, attached is a macro that tests for what you have in your example so far and additionally tests for missing or unexpected fields.
The attached macro adds Record ID to the Values to Test, transposes all data fields, Formula tool to get length and test if number and test if has decimal, multi join in the Expected Structure, calculate result with expression below, keep on invalid values, select relevant fields.
Here is the formula that could be expanded upon to test for more situations:
Trim( IF IsNull([__RecordID__]) THEN 'Field Missing' ELSE '' ENDIF+ IF IsNull([FieldName]) THEN 'Extra Field' ELSE '' ENDIF+ IF [Mandatory] AND IsEmpty([Value]) THEN 'Missing|' ELSE '' ENDIF+ IF [FieldType] IN ('Double','Int32','Int64') AND ![Is Number] THEN 'Not Number' ELSE '' ENDIF+ IF [FieldType] IN ('Int32','Int64') AND [Has Decimal] THEN 'Not Integer' ELSE '' ENDIF+ IF [Length]<[MinLength] THEN 'Too Short|' ELSE '' ENDIF+ IF [Length]>[MaxLength] THEN 'Too Long|' ELSE '' ENDIF ,'|')
I also attached a modification to your workflow that uses this macro.
It's great Joe !
It works fine and it looks pretty easy to extend :)
Christmas in the middle of August !
Thanks a lot for this new toy.
P.S : I didn't see your answer before, really sorry to take so long to reply :)
We have created a new thread out of this question. Please click the link to view the discussion
Hi all !
I've upgraded the macro. In addition, it can now manage :
You can find my "DataValidation" macro here and a sample workflow here.
Edit : Here my blog post (in french sorry) to present the macro.
User | Count |
---|---|
17 | |
15 | |
15 | |
8 | |
6 |