General Discussions

Discuss any topics that are not product-specific here.

Johns Hopkins University COVID-19 daily data workflow

DavidW
Alteryx Alumni (Retired)

Johns Hopkins University CSSE has released the data set that powers its dashboard on GitHub (https://github.com/CSSEGISandData/COVID-19). To make that data easy to work with, I created the attached workflow and macro, which import the daily data found in that repository. Install the package to the root of the folder that Git creates when you clone the repository. The workflow imports the daily data, parses the date fields (whose format changes partway through the time series), fills in null latitude and longitude fields, and performs other general cleansing. With that done, you can experiment with daily worldwide COVID-19 confirmed, death, and recovered case counts at the country/region and province/state level, with geocoding available for about 99% of records.
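For anyone reproducing the date-parsing step outside Alteryx, here is a minimal Python sketch of the idea: try each known format in turn until one parses. The format strings below are assumptions based on example values of the "Last Update" field, not a list taken from the workflow, so check them against the actual files. The daily report files are named by date (for example 01-22-2020.csv), which the hypothetical report_date helper reads.

```python
from datetime import datetime
from pathlib import Path

# Assumed formats for the "Last Update" field (illustrative only; verify
# against the files in csse_covid_19_daily_reports before relying on them).
CANDIDATE_FORMATS = [
    "%m/%d/%Y %H:%M",     # e.g. 1/22/2020 17:00
    "%m/%d/%y %H:%M",     # e.g. 3/8/20 5:31
    "%Y-%m-%dT%H:%M:%S",  # e.g. 2020-02-01T19:43:03
    "%Y-%m-%d %H:%M:%S",  # e.g. 2020-03-23 23:19:34
]

def parse_last_update(value):
    """Try each candidate format in order and return the first that parses."""
    text = value.strip()
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue  # not this format; try the next one
    raise ValueError(f"Unrecognized date format: {value!r}")

def report_date(filename):
    """Hypothetical helper: read the report date from a name like 01-22-2020.csv."""
    return datetime.strptime(Path(filename).stem, "%m-%d-%Y").date()
```

Trying formats in a fixed order works here because a four-digit %Y will not match a two-digit year, so "3/8/20" falls through to the %y pattern.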

 

If this workflow is useful, please let us know. If you need help or have improvements to the workflow, please share.

 

We'd also love for you to share what you create or discover by replying to this thread!

 

EDIT: The workflow has been updated to better clean and regularize the data. Much more cleanup is now done on country and state names: duplicates are merged, and a locality field is parsed out of values such as "Chicago, IL". Review the new workflow for details. This should improve the quality of the output data significantly, although JHU is still working on upstream issues on their end.
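As a rough illustration of the locality and name cleanup (not the workflow's actual logic), the sketch below splits a "City, ST" value into a locality and a state and maps alternate country spellings onto one name. Both lookup tables are hypothetical and deliberately incomplete; "Mainland China" is included only as an example of an alias.

```python
# Hypothetical, incomplete lookup tables for illustration only.
US_STATE_ABBREVIATIONS = {"IL": "Illinois", "WA": "Washington", "CA": "California"}
COUNTRY_ALIASES = {"Mainland China": "China"}

def split_locality(province_state):
    """Split 'Chicago, IL' into ('Chicago', 'Illinois'); pass other values through."""
    city, sep, abbrev = province_state.rpartition(",")
    abbrev = abbrev.strip()
    if sep and abbrev in US_STATE_ABBREVIATIONS:
        return city.strip(), US_STATE_ABBREVIATIONS[abbrev]
    # No recognizable ", ST" suffix: no locality, keep the value as the state.
    return None, province_state.strip()

def normalize_country(name):
    """Map an alternate spelling onto a single canonical country name."""
    cleaned = name.strip()
    return COUNTRY_ALIASES.get(cleaned, cleaned)
```

Once names are normalized this way, rows that describe the same place share the same key, which is what makes merging duplicates possible.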

David Wilcox
Senior Software Engineer
Alteryx
sean_bolte_dup_544
8 - Asteroid

Already put this to use, and it's feeding a Tableau dashboard. Same comment as Luke on this; I just added a command-line step before the module kicks off to pull the refresh. Using "git pull" is a new approach for me, so thank you for highlighting another option for obtaining data from the web!!

klaus01
5 - Atom

Hi @DavidW,

Thanks for sharing the knowledge.

I'm curious about @LukeM's comment. May I ask how to create an event to run the 'git pull', like @LukeM suggested?

Thanks!

LukeM
Moderator

AUTOMATE GIT PULL IN ALTERYX BEFORE WORKFLOW RUNS

You need to create a batch file (.bat) that runs the command-line script as part of your Alteryx workflow. This can then be set up as an Event that triggers before the workflow runs.

Steps (these are a continuation of the process outlined above):

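1. Create a text file in Notepad containing the following script:

REM Change to the folder Git created when cloning the JHU repository
cd \Data\COVID-19
REM Download and merge the latest commits, including new daily reports
git pull

This changes the directory and then runs the 'git pull' command.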

 

2. Save this file with the extension .bat rather than .txt to create the batch file.

3. In your 'Daily_Import.yxmd' workflow, head to the 'Events' tab of the workflow configuration panel.

4. Click 'Add...', select 'Run Command', then locate your newly created batch file and configure it to trigger "Before run", as shown:

[Screenshot: Capture.JPG]

 

@klaus01 @sean_bolte_dup_544 @DavidW
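If you would rather not maintain a .bat file (for example, for a scheduled script that runs outside Alteryx), the same two steps can be sketched in Python. This is a hypothetical helper, not part of the posted workflow: passing cwd= to subprocess.run does the job of the cd line, and check=True raises an error if git pull fails. The run parameter exists only so the function can be tested without touching a real repository.

```python
import subprocess
from pathlib import Path

def pull_latest(repo_dir, run=subprocess.run):
    """Run 'git pull' inside repo_dir, mirroring the two-line batch file."""
    repo = Path(repo_dir)
    # cwd=repo replaces the 'cd' line; check=True raises CalledProcessError
    # if git exits with a non-zero status (for example, a merge conflict).
    return run(["git", "pull"], cwd=repo, check=True, capture_output=True, text=True)
```

Usage would be something like pull_latest(r"C:\Data\COVID-19"), where the path is an assumption; point it at your own clone.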

DavidW
Alteryx Alumni (Retired)

@LukeM @klaus01 You can simplify the event process even further. Since the workflow should be placed in the same directory as the data, there's no need for a batch file that changes to that directory; you're already there. In @LukeM's instructions, skip to step 3, and in step 4 set the command to git with the command arguments set to pull.

You will only need the batch file if you have placed the workflow somewhere other than the data directory, in which case you would also need to change several tool parameters from relative to absolute file references anyway.

klaus01
5 - Atom

@LukeM

Thank you! It works perfectly.

 

@DavidW 

I didn't leave the workflow in the same directory as the data, for fear of interfering in some process when updating the data through git pull. 

nickbecks
6 - Meteoroid

As of 3-23-20 the format of the daily reports has changed, so data isn't making it through the workflow. I wish I could figure out how to fix it, but my skills are still at the noob level.

DavidW
Alteryx Alumni (Retired)

@nickbecks I'm working on an updated workflow now. Thanks for the heads up!

DavidW
Alteryx Alumni (Retired)

Thanks to @nickbecks for notifying me that the data format changed on 23 March. I'm working on an updated version of the workflow that will handle the changes. JHU likes to keep us on our toes with their schema changes.

klaus01
5 - Atom

@nickbecks

They changed the date format, added some of the same countries under different names, and added columns. I processed the new file (23/03/2020) separately and then overwrote it in the "\csse_covid_19_daily_reports" folder.

I tried to handle it in the first formula, but I wasn't able to, so I took the detour above.
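Much of the 23 March change is a header rename plus extra columns. Here is a minimal Python sketch of normalizing both layouts into one schema; the header names in RENAMES are my recollection of the old and new files and should be treated as assumptions to verify. Columns that exist only in the new layout (in my recollection FIPS, Admin2, Active and Combined_Key) pass through unchanged.

```python
import csv
import io

# Assumed mapping from the newer header names to the older ones;
# verify against the actual files before relying on it.
RENAMES = {
    "Province_State": "Province/State",
    "Country_Region": "Country/Region",
    "Last_Update": "Last Update",
    "Lat": "Latitude",
    "Long_": "Longitude",
}

def read_daily_report(text):
    """Parse one daily report's CSV text and rename columns to a common schema."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        normalized = {}
        for key, value in row.items():
            # Strip a UTF-8 byte-order mark, in case a file starts with one.
            clean = key.lstrip("\ufeff")
            normalized[RENAMES.get(clean, clean)] = value
        rows.append(normalized)
    return rows
```

Mapping new names onto old ones (rather than the reverse) is an arbitrary choice; either direction works as long as every file goes through the same function.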

shevshenko
7 - Meteor

The data extract and the Alteryx workflow worked perfectly; my only question is whether anyone has worked with the data yet. I sliced it and built a TDE file to read in Tableau, but for some reason the confirmed case counts around the globe are way higher than the numbers reported in the dashboard. I'm not sure if I am doing something wrong, but for example, as of today the dashboard shows around 53k confirmed cases for the US versus the 120k confirmed cases I get from the extracted data. I attached a viz that uses a TDE generated through Alteryx. Any thoughts?

 

[Attachment: Dashboard-COVID19.jpg]
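One possible cause, offered as a guess rather than a diagnosis: each daily report holds cumulative counts as of that date, so a total built by summing across several daily files counts the same cases again every day. The sketch below uses made-up numbers for one location to show the difference between summing cumulative snapshots and taking the latest one.

```python
def latest_total(cumulative_by_date):
    """The newest cumulative snapshot already includes all earlier cases."""
    return cumulative_by_date[max(cumulative_by_date)]

def summed_total(cumulative_by_date):
    """Overcounts: adds the same earlier cases once per day they appear."""
    return sum(cumulative_by_date.values())

# Made-up cumulative counts; ISO date strings sort chronologically.
example = {"2020-03-21": 10, "2020-03-22": 15, "2020-03-23": 20}
print(latest_total(example))  # 20
print(summed_total(example))  # 45
```

In Tableau terms, that usually means filtering to the most recent report date before aggregating.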
