I came across something today using the In-DB tools that I had not seen before. I wrote the following logic in an In-DB filter for one of the text fields in my data set:
`TextField` NOT LIKE '%Test%'
Before this filter, my dataset had about 197K rows. After the filter, I had 117K rows out the TRUE side and about 2K out the FALSE side. That only accounts for roughly 119K rows, so this filter somehow made the other 78K rows magically disappear.
Out of curiosity, I updated my logic to this:
`TextField` = `TextField`
Now I had 119K rows for TRUE and 0 for FALSE.
I suspected it had to do with null values, and I was able to verify that the field did in fact have 78K rows with nulls.
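My guess is this comes down to SQL's three-valued logic: any comparison against NULL evaluates to NULL (unknown) rather than TRUE or FALSE, so those rows come out of neither anchor. You can see this with a one-line query run directly against the database (Spark SQL here, though any ANSI SQL engine behaves the same way):
SELECT NULL = NULL
The result is NULL, not true or false, which would match the 78K null rows vanishing from both outputs of the filter.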
I was able to pass all rows through the In-DB filter by using a CASE statement to account for the nulls, but I'm curious whether this is expected behavior for the In-DB Filter tool. I assumed every row would have to evaluate to either TRUE or FALSE, but in this case the null rows just don't exist anymore.
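For reference, the workaround looked roughly like this (a sketch rather than my exact expression; it routes the null rows out the TRUE side, but you could just as easily send them to FALSE):
CASE WHEN `TextField` IS NULL THEN TRUE ELSE `TextField` NOT LIKE '%Test%' END
An equivalent alternative would be `TextField` IS NULL OR `TextField` NOT LIKE '%Test%'.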
I am using the Simba Spark ODBC Driver to connect to Databricks. I have not tried this on any other In-DB connections yet.