Hello All:
I have the attached flow that was modified to pull data from the JHU website and combine the three data sets. I have been trying to add a Summarize feature to sum the totals by country and province for Confirmed, Deaths and Recovered cases. I have not been able to get it to work and maybe one of you wonderful people could provide a solution.
The data is changing daily and it would be helpful for my purposes to add the new data daily as well.
Thank you so much!!
Bruce.
Hi @bcampbell0621, does adding a Summarize tool at the end of the workflow to sum the numerical values by country and province, and then joining it back to the base data, work? Something like the attached solution? Please let me know if I misunderstood the ask.
Hello @AbhilashR and thank you for responding:
I think we are closer to the right solution, but we need to figure out how to sum the totals for each case type for each country. I added a mock-up of what the flow should produce. I hope this helps and makes sense.
Thank you!
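For anyone following along, here is a minimal pandas sketch of the summing step described above. The rows are made up, and the column names (Country_Region, Province_State, Confirmed, Deaths, Recovered) are assumptions about the combined data set, so adjust them to match your file.

```python
import pandas as pd

# Made-up rows standing in for the combined data; column names are assumed.
cases = pd.DataFrame({
    "Country_Region": ["US", "US", "US", "Canada"],
    "Province_State": ["New York", "New York", "Ohio", "Ontario"],
    "Confirmed": [10, 5, 3, 4],
    "Deaths": [1, 0, 0, 1],
    "Recovered": [2, 1, 1, 0],
})

CASE_COLS = ["Confirmed", "Deaths", "Recovered"]

# One row per country and province, with each case type summed.
by_province = cases.groupby(
    ["Country_Region", "Province_State"], as_index=False
)[CASE_COLS].sum()

# One row per country, with each case type summed across its provinces.
by_country = cases.groupby("Country_Region", as_index=False)[CASE_COLS].sum()
```

One thing to watch: by default groupby drops rows whose key is missing, so countries with a blank province would vanish from by_province. Filling the blanks first, or passing dropna=False on a recent pandas version, should keep them.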
Hi @bcampbell0621 - please see my workflow attached. This uses Python pandas to generate a data frame from the daily report CSV file on JHU's GitHub site. To get the most up-to-date data, you'll need to change the date in the URL string in the Python tool (i.e., change 04-04-2020 to 04-05-2020 to get the data that is posted tonight, and so on). Hope this helps, please let me know if you have any questions, thanks.
-Kevin
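As a rough illustration of the date change described above, the URL string could be built from a date instead of edited by hand. This is only a sketch: the base URL is my assumption about where the daily reports live (it is not taken from the attached workflow), and daily_report_url is a hypothetical helper name.

```python
from datetime import date

# Assumed location of the daily report files; verify against the repository.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)


def daily_report_url(day):
    """Return the assumed URL of the daily report for one date (MM-DD-YYYY.csv)."""
    return f"{BASE_URL}{day:%m-%d-%Y}.csv"


# Hypothetical usage: the report for April 5, 2020.
url = daily_report_url(date(2020, 4, 5))
```

The resulting string can be passed to pandas read_csv in the Python tool. A report that has not been posted yet will most likely fail to load, so the date still has to be one that exists in the repository.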
Hi Kevin,
Sorry I'm just seeing this now; have been on a Tableau dashboard for weeks and am coming up for air.
If I wanted to pull all of the daily reports from the GitHub repository folder, i.e. January 21 - May 6, could I use the same pandas logic you provided to create a data frame and then update it daily through Alteryx? The idea would be to union all of the daily reports and have them in one file by date, with cases by type (confirmed, active, deaths and recovered).
Does that seem possible? I hope it makes sense.
Thank you so much for your response.
Bruce.
No worries, Bruce. Yes, you could update the file daily through Alteryx by putting an Output Data tool at the end of the flow that writes an Excel or TDE file. Changing the 'Output Options' to 'Append to Existing Sheet' or 'Append to an Extract file' will let you add the new records as you change the date. It would be a bit of a manual process, but a good start for the specific columns you're looking for. You may also want to look at the time series data on JHU's GitHub, which has daily data:
https://github.com/CSSEGISandData/COVID-19/tree/master/csse_covid_19_data/csse_covid_19_time_series
Best,
-Kevin
The time series data doesn't include active cases, but I guess that would be easy enough to calculate. Does the workflow you sent earlier only work for individual files or multiple? Trying to find a quick hit that will allow me to combine them and visualize increases and decreases.
Also, my original flow did combine the time series data, but the totals seemed to be off. Very much appreciate your help with this.
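On calculating active cases: a small sketch, assuming active means confirmed minus deaths minus recovered. That definition is my assumption rather than something stated in the data, and the rows and column names below are made up.

```python
import pandas as pd

# Made-up totals; column names are assumed, not taken from the JHU files.
totals = pd.DataFrame({
    "Country_Region": ["A", "B"],
    "Confirmed": [100, 40],
    "Deaths": [5, 2],
    "Recovered": [30, 10],
})

# Assumed definition: active = confirmed - deaths - recovered.
totals["Active"] = totals["Confirmed"] - totals["Deaths"] - totals["Recovered"]
```

The same subtraction could be done in Alteryx with a Formula tool once the three case types sit side by side in one row.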
Hi Bruce, I haven't figured out a way to read the multiple CSV files and merge them together. There's a Stack Overflow post trying to do something similar:
The flow I posted earlier is for individual CSV files - you'd have to run that flow multiple times to get the combined dataset from January to May 2020. Lots of files, but I'm not sure of an easier way to do it right now.
-Kevin
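One possible way to read the files in a loop and union them, offered as a sketch rather than a tested replacement for the workflow. The base URL, the header renames and the combine_daily_reports name are all assumptions on my part. The reader is passed in as a parameter so the logic can be tried on local or fake data before it is pointed at GitHub.

```python
from datetime import date, timedelta

import pandas as pd

# Assumed location of the daily report files; verify against the repository.
BASE_URL = (
    "https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/"
    "csse_covid_19_data/csse_covid_19_daily_reports/"
)

# Assumed header variants: some files may use slashes instead of underscores.
RENAMES = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
}


def combine_daily_reports(start, end, read_csv=pd.read_csv):
    """Read one CSV per day from start to end (inclusive) and union them."""
    frames = []
    day = start
    while day <= end:
        url = f"{BASE_URL}{day:%m-%d-%Y}.csv"
        # Normalize the headers so the key columns line up across files.
        frame = read_csv(url).rename(columns=RENAMES)
        # Tag every row with the date of the report it came from.
        frame["Report_Date"] = pd.Timestamp(day)
        frames.append(frame)
        day += timedelta(days=1)
    # Columns missing from some files are filled with NaN in the union.
    return pd.concat(frames, ignore_index=True, sort=False)
```

Calling it with a start and end date returns a single data frame with one block of rows per day, which could then go to the Output Data tool. It does not handle a missing file for a given day, so a gap in the repository would raise an error.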