Hey experts!
I have two datasets, one containing data for weather types (Rain,Fog, Storm etc.) from 2016-2019 in California, and the second dataset containing wildfires in California from 2016-2019. In the second dataset (wildfires) there are 835 unique records, meaning that there were 835 wildfires in California from 2016-2019.
The weather dataset is much larger than this, since this shows the different weather activities during a day (it could be both foggy and rainy for one day, hence there are multiple duplicated dates)
df1
| Type | Date | Counties |
| Rain | 2016-01-02 | Santa Cruz |
| Rain | 2016-01-02 | Santa Cruz |
| Rain | 2016-01-02 | Santa Cruz |
| Rain | 2016-01-04 | Santa Cruz |
| ..... | | |
| Fog | 2019-02-26 | Yuba |
Contains 200,000+ records
df2
| Date* | County | Lat | Lon | UniqueId |
| 2016-07-22 | Monterey | 36.45994 | -121.89938 | 977ac7ce |
| 2016-06-23 | Kern | 35.6115 | -118.45628 | 91816ff6 |
| .... | | | | |
| 2019-06-28 | Nevada | 39.40972222 | -121.00055556 | b38c0563 |
Contains 835 unique records
*Date here is when the first fire was observed (i.e., Started)
My question is, how would you guys join these two datasets to visualize what kind of weather there was during a wildfire?
What I have done until now:
- Joined Left and Right by Date, and Counties on Left and Right
The issue with this: This gives me duplicated values (because of the duplicated dates in weather dataset), so it now looks like there are 1208 records, or 1208 wildfires (which is not correct) -> if I just take the unique values of the UniqueId column, then it only shows about 200 records
- Joined only on counties (Left and Right)
The issue with this: This obviously doesn't take the different dates into account, why it just
I hope my question isn't too broadly asked, and that you guys can come up with a "better" solution that could showcase the weather during a wildfire more precisely