Alteryx Designer Discussions

Find answers, ask questions, and share expertise about Alteryx Designer.
Don't forget to submit your entry for the Excellence Awards by October 30! | Need more information about the program? Check out the blog here
SOLVED

Time Series Data - COVID-19

Highlighted
8 - Asteroid

Hello All:

 

I have three separate CSV files containing data about COVID-19 infections. 1. Confirmed cases 2. Deaths and 3. Recovered cases.  Each one contains country information, long and lat and a series of dates, each with a count for that day.   I have to be able to combine them with totals across the time series data.  This data is from JHU and it is clean, but it is not summed and combined.  I have been trying to create a flow, but it is not creating the correct results. I have included the data here.  

 

I tried creating a separate column for the totals, but that did not work.  I also have the advanced join tool macro, but I need to summarize the data before I can join the files.

 

I hope this makes sense.  The updates will be on-going so I will need to add new daily information as it becomes available.  

 

Thank you so much,

 

Bruce.

Highlighted
Moderator
Moderator

Hi @bcampbell0621,

 

I'm not sure if you were just looking for totals on each row or something more complicated. Attached is a workflow which will calculate the total and append it as a new column at the end of the workflow. I also took the liberty of creating a point from the lat/long. Let me know if this isn't what you were looking for...

 

Highlighted
Moderator
Moderator

Another version with the streams combined in case that's what you were looking for.

Highlighted
8 - Asteroid

Thank you so much for getting back to me.  If I am understanding each of the flows, and please forgive my limited knowledge, this flow took each of the files summed all the dates for each country and then joined it back to the original data set.  Am I correct?  

 

The intention is to have on final file that combines each of the types of infections: confirmed, deaths and recoveries and have those summed by country.  

 

Does that make better sense?

Highlighted
Moderator
Moderator

Hi Bruce,

 

Yes, each stream assigns a unique record id to each row, then transposes the data and updates the column "value" to a numeric value that is summarized for each record id. Those summaries are then joined to the original records.

Highlighted
8 - Asteroid

I guess what I'm not understanding is how we would add a new field called "type of infection" that would allow us to combine all three files into one larger file with the totals by country. 

Highlighted
Moderator
Moderator

Assuming you want something like the image below. If that isn't what you're looking for, a mock-up would be helpful.

 

summary.PNG

Highlighted
8 - Asteroid

You definitely understood where I was going.  I hope this mock up makes it clearer.  Sorry for the delay, and thank you. 

Highlighted
Moderator
Moderator

Hi Bruce,

 

Just a modification to the Formula tool's to write to a new column rather than modify the existing column. A little rearranging in the Select tool to get the columns in the order you provided. Good luck and Happy Alteryxing!

Highlighted
6 - Meteoroid

Currently Thailand has 177 cases confirmed but you're showing 2219 cases confirmed because you're calculating the sum of all the cumulative values.

 

Each value in the data represents cumulative total # of cases. 

 

Taking Thailand as an example, they had:

1/22/20: 2 cases

1/23/20: 3 cases

1/24/20: 5 cases

 

What you really want here is to calculate the difference in the days. 

 

1/22/20: 2 cases

1/23/20: 1 case

1/24/20: 2 cases

 

Attaching a workflow which transposes the dates as rows (not columns) and uses a multi-row formula to calculate the daily value as noted above (among other things).

 

Jay

Labels