Belgium

message blocking test

PatrickVandeputte
7 - Meteor

Hey all,

 

Jordan Barker sent his flow with predictive and API. I'll share it next week. Plans for the next two meetings are also gradually taking shape (I'll keep you posted).

 

This week we worked on flows with missing data, i.e. where we need to add missing data fields manually. To stop the flow from running before we have added those fields, we're testing the Message tool and the CReW macro "Blocking Test". Does anyone have experience with these to share?

 

Best,

5 REPLIES
Hakimipous
10 - Fireball
Hey Patrick!

By missing data do you mean Null values and how to deal with them?

I don't really have any experience with those tools, but I take a different approach that might still interest you.

Deleting missing data is often the default method because of its simplicity. There are no decisions to make that might distort the data: you just get rid of the records that have missing values.
However, you should make sure that deleting missing data doesn't have adverse effects on your analysis. For example, if a particular demographic tended to leave a response blank in a survey, then removing records with blank entries will mean that a part of the population is underrepresented.

There are many ways to remove them. You can use a Filter tool, set it to NotNull for the variables that interest you, and you'll see the results in the T


A second method for dealing with missing data is to impute a value (it's like making one up), but it's tricky because the values you impute should be as close to reality as possible (the mean, median or mode are commonly used for this).
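To make the difference between deleting and imputing concrete, here is a minimal sketch in plain Python, outside Alteryx. The records, the field names `age` and `income`, and the helper functions are all made up for illustration; `None` stands in for a Null value.

```python
from statistics import mean, median

# Hypothetical records; None marks a missing (Null) value.
rows = [
    {"age": 34, "income": 52000},
    {"age": None, "income": 61000},
    {"age": 45, "income": None},
    {"age": 29, "income": 48000},
]

def drop_missing(records, fields):
    """Keep only records where every listed field is present (like a NotNull filter)."""
    return [r for r in records if all(r[f] is not None for f in fields)]

def impute(records, field, statistic=mean):
    """Return copies of the records with missing `field` values replaced
    by a statistic (mean, median, ...) of the observed values."""
    observed = [r[field] for r in records if r[field] is not None]
    fill = statistic(observed)
    return [{**r, field: fill if r[field] is None else r[field]} for r in records]

# Option 1: delete incomplete records (only 2 of the 4 survive).
complete = drop_missing(rows, ["age", "income"])

# Option 2: keep all 4 records, filling age with the median and income with the mean.
imputed = impute(impute(rows, "age", median), "income", mean)
```

Deleting halves this tiny data set, while imputing keeps every record at the cost of inventing two values, which is the trade-off described above.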

One important question you need to consider in this case is "How much data is really missing?". That should give you a hint on how to proceed.
Another question is "How is the missing data distributed?". For example, if you have 10 predictor variables and most of the missing values are in just 2 of them, then you can deal with them on a variable-by-variable basis. You should also consider whether these specific variables are actually significant to your analysis and model-building process.
It's a bit more statistical, but you can check their correlation with the variable you are predicting, if you have one.
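Both questions ("how much is missing?" and "where is it missing?") can be answered with a simple per-variable count. A small sketch in plain Python, with made-up field names `a`, `b`, `c` and a hypothetical helper `missing_share`:

```python
def missing_share(records):
    """Return {field: fraction of records in which that field is None}."""
    fields = list(records[0].keys())
    total = len(records)
    return {f: sum(r[f] is None for r in records) / total for f in fields}

# Hypothetical data: 'a' is complete, 'b' is missing half the time, 'c' a quarter.
data = [
    {"a": 1, "b": None, "c": 7},
    {"a": 2, "b": 5, "c": None},
    {"a": 3, "b": None, "c": 9},
    {"a": 4, "b": 6, "c": 8},
]

share = missing_share(data)
# Fields with any missing values, worst first.
worst_first = sorted((f for f in share if share[f] > 0), key=share.get, reverse=True)
```

If the missing values are concentrated in a couple of fields like this, you can decide field by field whether to drop, impute or ignore.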


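Those are simple methods, but there are also more advanced ones.
Multiple Imputation is one of those. It consists of predicting each missing value from a regression on the other variables and then adding some random error; this is repeated to produce several completed data sets, whose results are then combined.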
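For the curious, the regression-plus-random-error idea can be sketched in plain Python. This is only a rough illustration under simplifying assumptions (a single predictor `x`, a straight-line fit, noise drawn from the spread of the residuals, made-up data and helper names); real multiple-imputation software does considerably more.

```python
import random
from statistics import mean, stdev

def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx

def impute_once(pairs, rng):
    """Fill each missing y with the regression prediction plus random noise."""
    seen = [(x, y) for x, y in pairs if y is not None]
    slope, intercept = fit_line([x for x, _ in seen], [y for _, y in seen])
    residuals = [y - (intercept + slope * x) for x, y in seen]
    noise_sd = stdev(residuals)  # rough estimate of the error spread
    return [
        (x, y if y is not None else intercept + slope * x + rng.gauss(0, noise_sd))
        for x, y in pairs
    ]

rng = random.Random(0)  # seeded so the sketch is repeatable
pairs = [(1, 2.1), (2, 3.9), (3, None), (4, 8.2), (5, 9.8)]  # hypothetical data

# "Multiple": build several completed data sets, then combine the results.
completed_sets = [impute_once(pairs, rng) for _ in range(5)]
pooled_mean_y = mean([mean([y for _, y in s]) for s in completed_sets])
```

Each completed set gets a slightly different imputed value, and the spread between the sets reflects the uncertainty about the missing value.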

There is also Full Information Maximum Likelihood. This one sounds intimidating, but here the missing values aren't replaced; they are handled within the modeling process itself. It requires software that supports this methodology.

You can read more about these here and here.

I personally haven't used those 2 methods in Alteryx yet. I'm just aware that they exist, so this is just for information if you want to dig deeper.


That's how I deal with missing data, given my statistics background, but I'm curious to know how everyone else deals with it.

Hope that helps x)
 
 
 
EDIT: In case you want to remove values, this post is a must: https://community.alteryx.com/t5/Data-Preparation-Blending/Removing-Null-Lines/td-p/17846
 
Vadim
5 - Atom

Thanks for your reply Hakimipous!

 

However, we are looking for a way to stop/halt a flow when a certain condition is met. The Blocking Test should work, but when we use !Null() as the expression, the test does not return errors. It seems the CReW macro "Blocking Test" does not recognize the NOT (!) operator (yet)?

 

Another possibility would be the Message tool (option 1, "When To Send Message" = Before Rows Where Expression is True; option 2, "Message Type" = Error - And Stop Passing Records Through This Tool), but this seems to take only the first column into consideration. The rest of the table/data frame seems to be ignored.

 

At this moment we haven't found an easy way to alert Alteryx users when data input contains NULL values. (Apart from the Field Summary tool, but this is slow on large data sets and requires the user to go check whether there are values missing. Additionally, it does not stop the flow from running.)

 

If anyone knows a smart method for blocking flows and/or alerting users when data sets contain NULL values (or other conditions), please let us know!

 

Thanks,

Vadim

 

Hakimipous
10 - Fireball
KoCo
7 - Meteor
Perhaps one option to explore is the bypass tool: based on a condition on the count of null records, bypass the output and produce an error with the info tool? I've never tried it, though...
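The logic (count the nulls across all fields, and raise an error instead of passing records on) would look roughly like this in plain Python. This is only a sketch of the idea, not of how any Alteryx tool works; `MissingDataError`, `require_no_nulls` and the sample records are hypothetical.

```python
class MissingDataError(Exception):
    """Raised to stop processing when required values are missing."""

def require_no_nulls(records, fields=None):
    """Check every requested field (not just the first one) in every record.
    Raise if any value is None; otherwise pass the records through unchanged."""
    problems = {}
    for index, record in enumerate(records):
        for field in (fields or record.keys()):
            if record.get(field) is None:
                problems.setdefault(field, []).append(index)
    if problems:
        raise MissingDataError(f"null values found (field -> row numbers): {problems}")
    return records

clean = [{"id": 1, "amount": 10}, {"id": 2, "amount": 20}]
dirty = [{"id": 1, "amount": 10}, {"id": 2, "amount": None}]

passed = require_no_nulls(clean)  # returns the records untouched

try:
    require_no_nulls(dirty)       # stops here, before anything is written
    halted = False
except MissingDataError as error:
    halted = True
    message = str(error)
```

The error message lists which fields are affected and in which rows, so the user does not have to go looking for the missing values.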
JoBa
7 - Meteor

Hey all

 

With some delay because of other priorities ...

 

Not sure if this helps, but I tend to use the standard tools "Block Until Done" and "Test" instead of the CReW macros, which I expect come down to the same thing.

 

For example, in a flow doing some complex allocation logic, I make sure that in = out; if not, the Test tool fails the flow before I start writing to the target table.
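Outside Alteryx, the same "check first, write only if the check passes" pattern could be sketched like this in plain Python. I'm assuming here that "in = out" means the totals of a hypothetical `amount` field match before and after the allocation; the function name and the list standing in for the target table are made up too.

```python
def write_if_balanced(rows_in, rows_out, target, key="amount"):
    """Append rows_out to target only if the totals of `key` match; otherwise fail."""
    total_in = sum(r[key] for r in rows_in)
    total_out = sum(r[key] for r in rows_out)
    if total_in != total_out:
        # Fail before anything reaches the target table.
        raise ValueError(f"in != out: {total_in} vs {total_out}")
    target.extend(rows_out)
    return len(rows_out)

source = [{"amount": 100}]                                   # one amount to allocate
allocated = [{"amount": 60}, {"amount": 40}]                 # sums back to 100
broken = [{"amount": 60}, {"amount": 30}]                    # 10 went missing

target_table = []
written = write_if_balanced(source, allocated, target_table)  # writes 2 rows

try:
    write_if_balanced(source, broken, target_table)           # raises, writes nothing
    failed = False
except ValueError:
    failed = True
```

The amounts are whole numbers here so that an exact comparison is safe; with decimals you would compare within a small tolerance instead.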

 

Capture.PNG