The Remove Null Rows feature added to the Data Cleansing tool is really nice. However, it doesn't work for a common use case of ours: we add key metadata field(s) to the data stream, and those fields make otherwise empty rows non-null. We'd like to be able to ignore or exclude one or more fields when Remove Null Rows decides which rows to drop.
Here's a use case starting with an Excel file with multiple tabs where each tab holds the records for a different Province:
Note that the 2nd record in Southern is entirely empty, so this is the record that we'd like to remove using the Data Cleansing tool.
Since the Province name is only in the worksheet name (and not in the data), I'm using a Dynamic Input tool with the "Output File Name as Field" option to include the worksheet name so I can parse it out later. The output of the Dynamic Input looks like this:
With the FileName field populated, the row is no longer entirely Null, so the Remove Null Rows feature of the Data Cleansing tool fails to remove that record:
What we'd like, therefore, is the ability to choose field(s) to ignore or exclude from the evaluation when using the Remove Null Rows feature in the Data Cleansing tool. In the use case above, for example, we might tick the "FileName" checkbox to exclude it, and the 2nd row in Southern would then be removed from the data.
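To make the requested behavior concrete, here is a minimal Python sketch of the evaluation we have in mind. It is not how Alteryx implements Remove Null Rows. The function names `is_blank` and `remove_null_rows`, the sample field names `Town` and `Population`, and the choice to treat empty or whitespace-only strings the same as nulls are all hypothetical and for illustration only.

```python
from typing import Any, Dict, Iterable, List


def is_blank(value: Any) -> bool:
    # Assumption: a value counts as "null" if it is None or an empty/whitespace-only string.
    return value is None or (isinstance(value, str) and value.strip() == "")


def remove_null_rows(
    rows: List[Dict[str, Any]], exclude: Iterable[str] = ()
) -> List[Dict[str, Any]]:
    """Drop rows whose non-excluded fields are all blank (hypothetical helper)."""
    excluded = set(exclude)
    kept = []
    for row in rows:
        # Only look at fields the user has NOT ticked for exclusion.
        checked = [value for field, value in row.items() if field not in excluded]
        # Drop the row only if there is something to check and all of it is blank.
        if checked and all(is_blank(value) for value in checked):
            continue
        kept.append(row)
    return kept


# Hypothetical rows shaped like the Dynamic Input output, with FileName populated.
rows = [
    {"Town": "A", "Population": 100, "FileName": "Southern"},
    {"Town": None, "Population": None, "FileName": "Southern"},
]

print(len(remove_null_rows(rows)))                        # FileName keeps the empty row
print(len(remove_null_rows(rows, exclude=["FileName"])))  # empty row is removed
```

Without an exclusion list, the populated FileName field keeps the empty row, which reproduces the current behavior; excluding FileName lets the empty row be dropped.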
There are workarounds that use a series of other tools (for example, Multi-Field Formula + Filter + Select) to do this, so extending the Data Cleansing tool to support this feature is a nice-to-have.
I've attached the sample packaged workflow used to create this example.
Data Cleansing - Remove Null Rows filename issue.yxzp