Community Spring Cleaning week is here! Join your fellow Maveryx in digging through your old posts and marking comments on them as solved. Learn more here!

General Discussions

Discuss any topics that are not product-specific here.

Johns Hopkins University COVID-19 daily data workflow

DavidW
Alteryx Alumni (Retired)

Johns Hopkins University CSSE released the data set that powers their dashboard on GitHub (https://github.com/CSSEGISandData/COVID-19). If you want to work with that data easily, I created the attached workflow and macro to import the daily data found within that repository. Install the package to the root of the folder that Git creates. This will import the daily data, parse the date fields that change formats halfway through the timeseries, update null latitude and longitude fields, and other general cleansing. With that done, you can experiment with daily worldwide COVID-19 confirmed/deaths/recovered case numbers at the country/region and province/state level, with geocoding available for about 99% of records.

 

If this workflow is useful, please let us know. If you need help or have improvements to the workflow, please share.

 

We'd also love for you to share what you create or discover by replying to this thread!

 

EDIT: The workflow has been updated to better clean and regularize the data. A lot of clean up is being now to country and state names, with merging of duplicates being done, and a locality field being parsed out of values such as "Chicago, IL". Review the new workflow for details. This should improve the quality of the output data significantly, although JHU is still working on upstream issues on their end.

David Wilcox
Senior Software Engineer
Alteryx
52 REPLIES 52
mbarone
16 - Nebula
16 - Nebula

Hi David,

JHU stopped publishing county level data on Mar 09.  Starting with Mar 10, you'll only have state level data on both the daily report csvs and time series csvs.  Just a heads up in case anyone is wondering why all the counties suddenly show "0" for all the county values as of Mar 10.  To my knowledge, that only place to get county level stats is to search and find individual state Departments of Health and see if they publish them.

ChrisK
Alteryx
Alteryx

There's a lot of cleansing needed here. Every file from 02-09 to 02-29 has a row:

    "Chicago, IL",US,2020-02-09T19:03:03,2,0,2

 

Probably a unique tool on Province/State, Country/Region, Last Update would be good.

 

DavidW
Alteryx Alumni (Retired)

A new version of the workflow that handles all of the irregularities in country, region, and locality data has been uploaded. The time portion of the updated field was removed and a unique tool added. Also, cruise ships were assigned in various ways previously, and now are under "Cruise Ship" for country, as that seemed to be the general direction JHU was heading. Keep that in mind if you'd prefer to handle the disposition of cruise ship data in a different manner.

 

The country name is now changed to use the ISO-3166-2 name. I've also added the ISO-3166-2 sub-region and intermediate region as new fields. The CSV file containing that data is included in the YXZP if you'd like to bring in additional country-related fields from that.

David Wilcox
Senior Software Engineer
Alteryx
neilgallen
12 - Quasar

probably bad form, but rather than clone the git I just went to the raw file directly with a download tool... felt like it took a step out of the process.

 

 

 

DavidW
Alteryx Alumni (Retired)

Not necessarily bad form at all. I chose git because a) I'm a developer and use to it and b) the data only updates daily, so I chose to download it only daily using that method. Thanks for sharing your workflow!

David Wilcox
Senior Software Engineer
Alteryx
kkoenig
6 - Meteoroid

@mbarone The University of Virginia's Biocomplexity Institute has county level data available on their dashboard. It's in rough shape, but nothing Alteryx can't handle. 

 

I've attached the workflow I've been using to get it into a usable format. It's nothing fancy, but it works! Any suggestions for improvement are always welcome.

 

The workflow will look for "COVID FIPS Lookup Table.yxdb" in C:/COVID_Data and will write the output to the same directory. 

 

EDIT: The first formula was subtracting 5 hours instead of adding them to match the UTC time zone in the data source. It's fixed in the attached version.

mbarone
16 - Nebula
16 - Nebula

oh this looks promising - thank you very much!!!  do you jsut download the CSV?  or have you figured out how to get a URL for it?  that's what i'm trying to do now - look in the code of the page, but i'm not very experienced in that LOL

mbarone
16 - Nebula
16 - Nebula

Thanks to Super ACE @jdunkerley79  and Super-ACE-Become-Super-AYX-Develper @Claje , I found the URL.  It is "https://nssac.bii.virginia.edu/covid-19/dashboard/data/nssac-ncov-sd-03-19-2020.csv". I suspect the date will have to be updated each day, but that's the easy part:)

kkoenig
6 - Meteoroid

@mbarone  Haha I'm not either! I had to dig and dig to figure it out for this one. The first few steps of the workflow grab the .csv data for you. The format of the URL is "https://nssac.bii.virginia.edu/covid-19/dashboard/data/nssac-ncov-sd-" + MM-DD-YYYY + ".csv"

 

They use UTC, so you'll want to include some kind of adjustment for that in your workflow. I'm on central time, so in the first formula tool of the attached workflow adds 5 hours to DateTimeNow(). 

 

P.S. I'm glad you mentioned this. When I went back to copy the URL string, I realized I was subtracting 5 hours instead of adding it. So thank you for saving me from a headache later!!

Labels