Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Why my data has a lot of nulls? What should I do with that?

rively90
8 - Asteroid

Hi,

I have a lot of nulls in: Gender, Job, and Race field. What should I do? They seem a lot. I might do something wrong here.

 

datasets:

please help. I already tried my best to eliminate them. But, they still appear. Not sure what's going on.........

8 REPLIES 8
Peachyco
11 - Bolide

I'm not sure what your workflow is meant to do, but I suspect that the last Union Tool there is not doing as you're expecting.

 

I see that you have a separate stream each for Gender, Job, and Race. To illustrate:

Alteryx - rively90 lots of null a.png

 

If you Union these three, they will stack one on top of the other, and the Gender, Job, and Race values will not align. It will look like so (creating empty cells):

Alteryx - rively90 lots of null b.png

 

If you want them to align, you might consider the Join Tool or the Join Multiple Tool.

rively90
8 - Asteroid

@Peachyco Do you mind sending the correct flow please? If we decide to use join, what's the PK and FK?

What's the join multiple tool? I'm not familiar

rively90
8 - Asteroid

Hello @Peachyco  do you know what's the solution for this one? I'm not familiar with join multiple tool

Peachyco
11 - Bolide

@rively90 

Join Multiple is used similarly to Union, except you have to tell Join Multiple what fields to use to connect the incoming tables.

 

To illustrate, let's use these three tables:

Alteryx - rively90 Join Multiple b.png

 

We can see that we can use [Name] and [DOB] as our matching fields when Joining these three tables together. So, we'll connect the tables to Join Multiple, and then configure Join Multiple to use [Name] and [DOB] for the matching.

Alteryx - rively90 Join Multiple a.png

 

Note that I have two paths outlined here. In the first one, I've configured Join Multiple to "Output Only Records that Join from All Inputs". This will give you a nicely filled dataset in the sense that all the columns are likely to have values in them, but it also discards some records. I don't know if that's acceptable for your purposes - you'll have to decide on that one.

Alteryx - rively90 Join Multiple c.png

 

In the second path, we are taking all of the records, whether they have perfect matches or not. This will require some cleaning along the way because [Name] and [DOB] will duplicate, and there will be null values for those records with no or incomplete matches, but it doesn't drop any records. Again, I don't know if that's useful for your purposes.

Alteryx - rively90 Join Multiple d.png

 

When you use Join Multiple, you have to understand your datasets enough to tell which fields/columns can be used to connect them.

rively90
8 - Asteroid

Hi @Peachyco  thank you,

In my case should I use the top one or bottom one? I'm a bit confused?

2nd one has nulls while the first one does not have nulls right? 

What's the logic behind this: I dont understand? Is it Primary key and foreign key relation?

 

image.png

This is my flow

image.png

rively90
8 - Asteroid

@Peachyco 

I think it does not work in my model as the input #2 is missing the PK from Input #1. Did I do something wrong?
image.png

Peachyco
11 - Bolide

@rively90 

I can see that you've already entered [Sex] under "Join by Specific Fields" when you know that [Sex] is not present in some of the incoming tables.

 

The Join Tool and the Join Multiple Tool operate on the principles of Primary Keys and Foreign Keys - basic principles that we as data analysts need to understand to work with data tables. You must identify the key fields of all these tables that you're bringing together, and those key fields are what you need to specify in the Join Multiple.

rively90
8 - Asteroid

@Peachyco  do you know what's the solution for this? I'm kinda of depressed seeing a lot of nulls despite all of the efforts

Labels