Hi,
I have a lot of nulls in: Gender, Job, and Race field. What should I do? They seem a lot. I might do something wrong here.
datasets:
please help. I already tried my best to eliminate them. But, they still appear. Not sure what's going on.........
I'm not sure what your workflow is meant to do, but I suspect that the last Union Tool there is not doing as you're expecting.
I see that you have a separate stream each for Gender, Job, and Race. To illustrate:
If you Union these three, they will stack one on top of the other, and the Gender, Job, and Race values will not align. It will look like so (creating empty cells):
If you want them to align, you might consider the Join Tool or the Join Multiple Tool.
@Peachyco Do you mind sending the correct flow please? If we decide to use join, what's the PK and FK?
What's the join multiple tool? I'm not familiar
Hello @Peachyco do you know what's the solution for this one? I'm not familiar with join multiple tool
Join Multiple is used similarly to Union, except you have to tell Join Multiple what fields to use to connect the incoming tables.
To illustrate, let's use these three tables:
We can see that we can use [Name] and [DOB] as our matching fields when Joining these three tables together. So, we'll connect the tables to Join Multiple, and then configure Join Multiple to use [Name] and [DOB] for the matching.
Note that I have two paths outlined here. In the first one, I've configured Join Multiple to "Output Only Records that Join from All Inputs". This will give you a nicely filled dataset in the sense that all the columns are likely to have values in them, but it also discards some records. I don't know if that's acceptable for your purposes - you'll have to decide on that one.
In the second path, we are taking all of the records, whether they have perfect matches or not. This will require some cleaning along the way because [Name] and [DOB] will duplicate, and there will be null values for those records with no or incomplete matches, but it doesn't drop any records. Again, I don't know if that's useful for your purposes.
When you use Join Multiple, you have to understand your datasets enough to tell which fields/columns can be used to connect them.
Hi @Peachyco thank you,
In my case should I use the top one or bottom one? I'm a bit confused?
2nd one has nulls while the first one does not have nulls right?
What's the logic behind this: I dont understand? Is it Primary key and foreign key relation?
This is my flow
I think it does not work in my model as the input #2 is missing the PK from Input #1. Did I do something wrong?
I can see that you've already entered [Sex] under "Join by Specific Fields" when you know that [Sex] is not present in some of the incoming tables.
The Join Tool and the Join Multiple Tool operate on the principles of Primary Keys and Foreign Keys - basic principles that we as data analysts need to understand to work with data tables. You must identify the key fields of all these tables that you're bringing together, and those key fields are what you need to specify in the Join Multiple.
@Peachyco do you know what's the solution for this? I'm kinda of depressed seeing a lot of nulls despite all of the efforts