Time Series Data - COVID-19
- Subscribe to RSS Feed
- Mark Topic as New
- Mark Topic as Read
- Float this Topic for Current User
- Bookmark
- Subscribe
- Mute
- Printer Friendly Page
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hello All:
I have three separate CSV files containing data about COVID-19 infections. 1. Confirmed cases 2. Deaths and 3. Recovered cases. Each one contains country information, long and lat and a series of dates, each with a count for that day. I have to be able to combine them with totals across the time series data. This data is from JHU and it is clean, but it is not summed and combined. I have been trying to create a flow, but it is not creating the correct results. I have included the data here.
I tried creating a separate column for the totals, but that did not work. I also have the advanced join tool macro, but I need to summarize the data before I can join the files.
I hope this makes sense. The updates will be on-going so I will need to add new daily information as it becomes available.
Thank you so much,
Bruce.
Solved! Go to Solution.
- Labels:
- Preparation
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi @bcampbell0621,
I'm not sure if you were just looking for totals on each row or something more complicated. Attached is a workflow which will calculate the total and append it as a new column at the end of the workflow. I also took the liberty of creating a point from the lat/long. Let me know if this isn't what you were looking for...
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Thank you so much for getting back to me. If I am understanding each of the flows, and please forgive my limited knowledge, this flow took each of the files summed all the dates for each country and then joined it back to the original data set. Am I correct?
The intention is to have on final file that combines each of the types of infections: confirmed, deaths and recoveries and have those summed by country.
Does that make better sense?
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Hi Bruce,
Yes, each stream assigns a unique record id to each row, then transposes the data and updates the column "value" to a numeric value that is summarized for each record id. Those summaries are then joined to the original records.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
I guess what I'm not understanding is how we would add a new field called "type of infection" that would allow us to combine all three files into one larger file with the totals by country.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
You definitely understood where I was going. I hope this mock up makes it clearer. Sorry for the delay, and thank you.
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
- Mark as New
- Bookmark
- Subscribe
- Mute
- Subscribe to RSS Feed
- Permalink
- Notify Moderator
Currently Thailand has 177 cases confirmed but you're showing 2219 cases confirmed because you're calculating the sum of all the cumulative values.
Each value in the data represents cumulative total # of cases.
Taking Thailand as an example, they had:
1/22/20: 2 cases
1/23/20: 3 cases
1/24/20: 5 cases
What you really want here is to calculate the difference in the days.
1/22/20: 2 cases
1/23/20: 1 case
1/24/20: 2 cases
Attaching a workflow which transposes the dates as rows (not columns) and uses a multi-row formula to calculate the daily value as noted above (among other things).
Jay
