Alteryx Designer Desktop Discussions

Find answers, ask questions, and share expertise about Alteryx Designer Desktop and Intelligence Suite.

Joining two datasets by county and date

rxak
6 - Meteoroid

Hey experts!

I have two datasets, one containing data for weather types (Rain,Fog, Storm etc.) from 2016-2019 in California, and the second dataset containing wildfires in California from 2016-2019. In the second dataset (wildfires) there are 835 unique records, meaning that there were 835 wildfires in California from 2016-2019.

 

The weather dataset is much larger than this, since this shows the different weather activities during a day (it could be both foggy and rainy for one day, hence there are multiple duplicated dates)

 

df1 

TypeDateCounties
Rain2016-01-02Santa Cruz
Rain2016-01-02Santa Cruz
Rain2016-01-02Santa Cruz
Rain2016-01-04Santa Cruz
.....  
Fog2019-02-26Yuba

Contains 200,000+ records

 

df2

Date*CountyLatLonUniqueId
2016-07-22Monterey36.45994-121.89938977ac7ce
2016-06-23Kern35.6115-118.4562891816ff6
....    
2019-06-28Nevada39.40972222-121.00055556b38c0563

Contains 835 unique records

*Date here is when the first fire was observed (i.e., Started)

 

My question is, how would you guys join these two datasets to visualize what kind of weather there was during a wildfire?

 

What I have done until now: 

- Joined Left and Right by Date, and Counties on Left and Right

The issue with this: This gives me duplicated values (because of the duplicated dates in weather dataset), so it now looks like there are 1208 records, or 1208 wildfires (which is not correct) -> if I just take the unique values of the UniqueId column, then it only shows about 200 records

- Joined only on counties (Left and Right)

The issue with this: This obviously doesn't take the different dates into account, why it just

 

I hope my question isn't too broadly asked, and that you guys can come up with a "better" solution that could showcase the weather during a wildfire more precisely 

6 REPLIES 6
gautiergodard
13 - Pulsar

Hey @rxak 

I modified your sample data to make the example work but see attached for one way to join the two data sets in a way that doesn't cause a row explosion or loss of data.

gautiergodard_0-1668870333982.png

 

Hope this helps!

rxak
6 - Meteoroid

Hey @gautiergodard 

This is definitely a better solution than what I came up with, but this still give a misleading number of wildfires in California from 2016-2019

By applying your solution to the dataset, I come up with only 293 records (meaning there have only been 293 wildfires, which is not correct)

 

I appreciate the help, and will most likely use this if there are no better options than this :)

 

gautiergodard
13 - Pulsar

Hey @rxak 

Hard to diagnose the problem without the full dataset, but if you are getting a lower number of records I presume some of them may be getting dropped during your join?

Perhaps there are certain days or counties missing from the weather data set? 

 

If this solution ends up working for you - please accept the post as a solution !

rxak
6 - Meteoroid

@gautiergodard 

Correct, there are a few counties that are missing from the Weather dataset

If you would like to investigate further, I have attached both .csv files :)

gautiergodard
13 - Pulsar

@rxak I get the same 293 result. I suggest trying to obtain a more complete weather dataset.

rxak
6 - Meteoroid

Have tried, and that dataset was the closest I could find that have been filtered via counties 

Thanks once again for your time!

Labels