
Alteryx Designer Desktop Discussions

SOLVED

Challenge proposal: Build a macro to validate data against an expected data structure

Julien
6 - Meteoroid

Hi everybody, 

 

Here's my problem: I always need to check that the data files (.csv) sent by my client have a valid structure (number of columns, data types, mandatory values, maximum field length, etc.) before applying my data blending workflow.

 

Today, this validation is done semi-automatically in Alteryx: forcing the data types with a Select tool (and analysing the warnings in the log), testing each maximum field length, or using multiple Formula tools for more complex tests.

 

I have a dream: a macro with two inputs:

- The data to check
- The expected structure (field type, min & max length, whether null values are allowed)

and one output:

- The full list of errors (ideally a table containing only the values that don't match the expected structure), with the cause of each error

 

This kind of macro would be SOOOO useful! It's a problem I run into all the time :)

 

I really don't know how to build this kind of macro... could you please help me?

 

Have fun!

 

Julien

9 REPLIES
Aguisande
15 - Aurora

Hi Julien:

Have you tried the Field Info tool?

 

Set up your "dictionaries" according to what the Field Info tool outputs, and then you can compare those values.

 

Hope this points you in the right direction.

KaneG
Alteryx Alumni (Retired)

Hi @Julien,

 

Just to drop this here,

 

In big productionised apps, a common method I see is to keep the field validations in a text file with 3 columns (FieldName, Nullable?, REGEX format). That file is then compared against the output of a Field Info tool. One advantage of this is that the format can be adjusted, reviewed, etc. without editing the actual app. The comparison could be turned into a macro, but since it is only a couple of tools (Field Info, Input, Join, Formula), it can usually just sit in a container.
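For illustration, such a validation file might look something like the sample below (the field names and patterns are made up for this example, not taken from an actual production app):

FieldName,Nullable?,REGEX format
CustomerID,N,^\d{1,10}$
OrderDate,N,^\d{4}-\d{2}-\d{2}$
Email,Y,^[^@\s]+@[^@\s]+\.[^@\s]+$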

 

Kane

 

 

Julien
6 - Meteoroid

Hi Guys,

 

Thanks for your replies.

The Field Info tool could be very useful for my problem.

However, when I expect a Double and my field only contains integers (e.g. Int16), that should not be considered an error.

 

I've attached a very small workflow to illustrate what I've understood from your advice... but I'm struggling with new problems:

 

Does it mean I have to set up a table of acceptable field types for each expected field type?

How do I manage mandatory fields? Field Info indicates the max size of the field... not the min size :(

You advise me to use regular expressions, but I don't see how to fit them into my expected structure. How do I apply a regex stored in a string field within a Formula tool?

 

I hope my English is good enough to be understood... really sorry if it is not :-S

 

 

Julien
6 - Meteoroid

The "famous" attached workflow :)

Aguisande
15 - Aurora

Hi Julien,

Maybe you want to evaluate within a formula whether the field is numeric (or any other type). The same can be done for whether the field is mandatory or not:

 

isnumeric.PNG
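As a rough sketch of that idea (the screenshot isn't reproduced here, so the expression below is only an assumption of what such a test could look like, with made-up field names [Expected Type], [FieldType], [Mandatory] and [Value]):

Trim(
// Expected a numeric field, but the type reported by Field Info is not one of Alteryx's numeric types
IIF([Expected Type]='Numeric'
    AND !([FieldType] IN ('Byte','Int16','Int32','Int64','FixedDecimal','Float','Double')),
    'Not a numeric type|','')+
// Mandatory field that arrived empty
IIF([Mandatory]='Y' AND IsEmpty([Value]), 'Mandatory value missing|','')
,'|')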

 

Also, you may want to consider standardizing the case of your field names (uppercase or lowercase throughout).

Joe_Mako
12 - Quasar

There are a lot of things that could be tested for. Attached is a macro that tests for what you have in your example so far, and additionally tests for missing or unexpected fields.

 

The attached macro adds a Record ID to the Values to Test, transposes all the data fields, uses a Formula tool to get the length and to test whether the value is a number and whether it has a decimal, joins in the Expected Structure, calculates the result with the expression below, keeps only the invalid values, and selects the relevant fields.

 

DataValidation.png

 

Here is the formula that could be expanded upon to test for more situations:

 

Trim(
IF IsNull([__RecordID__]) THEN 'Field Missing|' ELSE '' ENDIF+
IF IsNull([FieldName]) THEN 'Extra Field|' ELSE '' ENDIF+
IF [Mandatory] AND IsEmpty([Value]) THEN 'Missing|' ELSE '' ENDIF+
IF [FieldType] IN ('Double','Int32','Int64') AND ![Is Number] THEN 'Not Number|' ELSE '' ENDIF+
IF [FieldType] IN ('Int32','Int64') AND [Has Decimal] THEN 'Not Integer|' ELSE '' ENDIF+
IF [Length]<[MinLength] THEN 'Too Short|' ELSE '' ENDIF+
IF [Length]>[MaxLength] THEN 'Too Long|' ELSE '' ENDIF
,'|')

I also attached a modification to your workflow that uses this macro.
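(Side note, not part of Joe's attachment: the regex question from earlier could be covered by appending one more clause inside the Trim(...) above, since REGEX_Match can take its pattern from a field. The column name [Format Regex] below is made up for this sketch; it would come from the Expected Structure input.)

// Hypothetical extra clause: flag values that don't match a regex supplied in the expected structure
IF !IsEmpty([Format Regex]) AND !REGEX_Match([Value], [Format Regex]) THEN 'Bad Format|' ELSE '' ENDIF+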

Julien
6 - Meteoroid

It's great, Joe!

It works fine and it looks pretty easy to extend :)

Christmas in the middle of August!

Thanks a lot for this new toy. 

 

P.S.: I didn't see your answer earlier; really sorry to take so long to reply :)

 

JessicaS
Alteryx Alumni (Retired)

We have created a new thread out of this question. Please click the link to view the discussion.

Julien_B
8 - Asteroid

Hi all!

I've upgraded the macro. It can now also manage:

  • specific formats (via regular expressions)
  • field name comparison
  • production of a detailed anomaly report (and a summary)

You can find my "DataValidation" macro here and a sample workflow here.

 

Edit: Here is my blog post (in French, sorry) presenting the macro.
