Johns Hopkins University COVID-19 daily data workflow

DavidW
Alteryx Alumni (Retired)

Johns Hopkins University CSSE has released on GitHub the data set that powers their dashboard (https://github.com/CSSEGISandData/COVID-19). If you want to work with that data easily, I created the attached workflow and macro to import the daily data found within that repository. Install the package to the root of the folder that Git creates. The workflow imports the daily data, parses the date fields that change formats halfway through the time series, fills in null latitude and longitude fields, and performs other general cleansing. With that done, you can experiment with daily worldwide COVID-19 confirmed, death, and recovered case numbers at the country/region and province/state level, with geocoding available for about 99% of records.
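For anyone who wants to see the date handling outside of Alteryx, here is a minimal Python sketch of the "try several formats" idea. It is not the logic inside the attached workflow. The function name `parse_mixed_timestamp` is made up for illustration, and the list of candidate formats is an assumption that you should adjust to whatever the daily files actually contain.

```python
from datetime import datetime
from typing import Optional

# Assumed candidate formats, tried in order. These are illustrative guesses,
# not a confirmed list of the formats used in the repository.
CANDIDATE_FORMATS = (
    "%m/%d/%Y %H:%M",      # e.g. 1/22/2020 17:00
    "%m/%d/%y %H:%M",      # e.g. 1/31/20 23:59
    "%Y-%m-%dT%H:%M:%S",   # e.g. 2020-02-02T23:43:02
    "%Y-%m-%d %H:%M:%S",   # e.g. 2020-03-22 23:45:00
)


def parse_mixed_timestamp(text: str) -> Optional[datetime]:
    """Return the first successful parse of `text`, or None if no format fits."""
    cleaned = text.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(cleaned, fmt)
        except ValueError:
            # This format did not match; fall through to the next candidate.
            continue
    return None
```

Returning `None` instead of raising keeps unparseable rows visible, so they can be reviewed rather than silently dropped.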

If this workflow is useful, please let us know. If you need help or have improvements to the workflow, please share.

We'd also love for you to share what you create or discover by replying to this thread!

EDIT: The workflow has been updated to better clean and regularize the data. A lot of cleanup is now being done on country and state names: duplicates are merged, and a locality field is parsed out of values such as "Chicago, IL". Review the new workflow for details. This should improve the quality of the output data significantly, although JHU is still working on upstream issues on their end.
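As a rough illustration of the locality parsing, here is a small Python sketch that splits a value such as "Chicago, IL" on its last comma. The helper name `split_locality` is hypothetical, and the rule used here is an assumption for illustration; the workflow itself may handle more cases than this.

```python
from typing import Optional, Tuple


def split_locality(value: str) -> Tuple[Optional[str], str]:
    """Split 'Locality, State' into (locality, state).

    Values without a comma are treated as a plain province/state name,
    so the locality part comes back as None.
    """
    head, sep, tail = value.rpartition(",")
    if not sep:
        # No comma found: nothing to parse out.
        return None, value.strip()
    return head.strip(), tail.strip()
```

With this rule, `split_locality("Chicago, IL")` gives `("Chicago", "IL")`, while a value like `"Hubei"` comes back unchanged with no locality.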

David Wilcox
Senior Software Engineer
Alteryx
52 REPLIES
mbarone
16 - Nebula

@kkoenig - am I seeing it correctly that county-level data lists only the number of confirmed cases, not deaths or recovered? Is that what you see as well?

kkoenig
6 - Meteoroid

That's what I see as well, unfortunately!

mbarone
16 - Nebula

Thanks for the confirmation, and hey, better than nothing!  Thanks for the post!

klpenney_Raytheon
6 - Meteoroid

Please go check out this amazing dataset. They are doing it the old-fashioned way, with brains and tenacity.

https://coronavirus.1point3acres.com/zh/faq

https://coronavirus.1point3acres.com/en/about

kkoenig
6 - Meteoroid

@klpenney_Raytheon Thank you for sharing! I see that they have the death details by county as well. That's so helpful!

klpenney_Raytheon
6 - Meteoroid

I have requested data access, but since it is for commercial purposes I am not sure how he will respond. Please let me/us know if you find a way in.

LukeM
Moderator

Hi @DavidW,

Love this - great work.

Is there a reason you haven't set it up to run 'git pull' as an event before the workflow runs, so that it pulls the latest data set automatically?

Luke

DavidW
Alteryx Alumni (Retired)

@LukeM I didn't want to assume that people were using git to retrieve the data. In the discussion forum, another user set up a Download tool instead to retrieve the files. I certainly can make a version of the workflow that handles the `git pull` automatically.

David Wilcox
Senior Software Engineer
Alteryx
LukeM
Moderator

@DavidW I mainly wanted to make people aware that you can create a batch file that runs 'git pull' from the command line, and then use the Events panel to trigger that batch file before the workflow starts.
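If a script is easier to maintain than a batch file, the same pre-run step could be sketched in Python along these lines. This is only an illustration: the function names are made up, the folder name `COVID-19` in the usage note is assumed to be the default clone directory, and it assumes `git` is on the PATH of whatever account runs the workflow.

```python
import subprocess
from typing import List


def build_pull_command(repo_dir: str) -> List[str]:
    """Build the git command that updates the clone located at `repo_dir`."""
    # `-C` tells git to run as if it had been started inside repo_dir.
    # `--ff-only` refuses to create a merge commit if local changes diverged.
    return ["git", "-C", repo_dir, "pull", "--ff-only"]


def pull_latest(repo_dir: str) -> str:
    """Run the pull and return git's standard output.

    Raises subprocess.CalledProcessError if git exits with an error,
    so a failed pull is not silently ignored.
    """
    result = subprocess.run(
        build_pull_command(repo_dir),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling `pull_latest("COVID-19")` from a small script, and pointing the event at that script, would refresh the local copy before the workflow reads the daily files.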
