Engine Works

Under the hood of Alteryx: tips, tricks and how-tos.
DavidW
Alteryx Alumni (Retired)

The Johns Hopkins University Center for Systems Science and Engineering (JHU CSSE) has released the data that powers its 2019 Novel Coronavirus Visual Dashboard. I was curious to explore that underlying data, both to better understand the spread of the COVID-19 virus and to see how Alteryx might be used to analyze it for predictive purposes, so I made a workflow to do just that.

 

The data is stored in a GitHub repository at https://github.com/CSSEGISandData/COVID-19. If you're familiar with git, clone that repository somewhere on your system using your favorite git client. For those who are new to git, installing Git for Windows is an easy and free way to get started. Once it is installed, open a command prompt and navigate to the folder where you'd like to store the data (I keep all my data sets in C:\Data\). Once there, run this command to download the data from GitHub to your local system. It creates a new folder called COVID-19.

git clone https://github.com/CSSEGISandData/COVID-19.git

With the folder created, save the attached YXZP package to your system and double-click it to install it with Alteryx Designer. Select "Yes" to continue installing the package. When the dialog below appears, change the destination directory to the COVID-19 folder created in the previous step. For example, since I save my data in C:\Data\, I'd set the destination directory field to C:\Data\COVID-19. Click Import to finish the process.

 

DavidW_0-1584569069342.png

 


The data is read from a set of CSV files in the csse_covid_19_data\csse_covid_19_daily_reports\ subdirectory. A batch macro performs the import because, after a certain date, latitude and longitude fields were appended to the data set. A Formula tool then parses the Last Updated field, because the format of that timestamp changes on a different date. A Sort tool followed by two Multi-Row Formula tools populates null Latitude and Longitude fields with the corresponding values from matching records before them. Finally, a Select, an Auto Field, a Data Cleansing, and a second Sort tool clean up the data: all null strings are replaced with empty strings and all null integer fields are set to 0. The few remaining null Latitude and Longitude fields are left as-is. The workflow generates a YXDB output.
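For readers who want to follow the same logic outside Designer, here is a minimal Python sketch of the steps described above. It is not the workflow itself. The CSV header names (such as "Last Update"), the three timestamp layouts, and the newest-first sort order are my assumptions, and the sample rows are made up for illustration.

```python
import csv
import io
from datetime import datetime

# Assumed timestamp layouts; the article only says the format changed at some point.
DATE_FORMATS = ("%m/%d/%Y %H:%M", "%m/%d/%y %H:%M", "%Y-%m-%dT%H:%M:%S")


def parse_updated(text):
    """Try each assumed layout in turn; return None if none of them match."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime((text or "").strip(), fmt)
        except ValueError:
            continue
    return None


def to_int(text):
    """Empty or missing cells become 0, mirroring the null-integer cleanup."""
    return int(float(text)) if text not in (None, "") else 0


def to_float(text):
    """Empty or missing cells stay None so they can be filled later."""
    return float(text) if text not in (None, "") else None


def read_report(handle):
    """Read one daily report, tolerating files that lack coordinate columns."""
    rows = []
    for row in csv.DictReader(handle):
        rows.append({
            "Country/Region": row.get("Country/Region") or "",
            "Province/State": row.get("Province/State") or "",
            "Updated": parse_updated(row.get("Last Update")),
            "Confirmed": to_int(row.get("Confirmed")),
            "Deaths": to_int(row.get("Deaths")),
            "Recovered": to_int(row.get("Recovered")),
            "Latitude": to_float(row.get("Latitude")),
            "Longitude": to_float(row.get("Longitude")),
        })
    return rows


def fill_coordinates(rows):
    """Copy coordinates from an earlier row for the same place onto rows lacking them."""
    # Assumed order: grouped by location, newest first, so that rows carrying
    # coordinates come before the older rows that lack them.
    rows.sort(key=lambda r: r["Updated"] or datetime.min, reverse=True)
    rows.sort(key=lambda r: (r["Country/Region"], r["Province/State"]))
    last_key, last_coords = None, (None, None)
    for r in rows:
        key = (r["Country/Region"], r["Province/State"])
        if key != last_key:
            last_key, last_coords = key, (None, None)
        if r["Latitude"] is None or r["Longitude"] is None:
            r["Latitude"], r["Longitude"] = last_coords
        else:
            last_coords = (r["Latitude"], r["Longitude"])
    return rows


# Made-up sample reports: an older file without coordinates, a newer one with them.
OLD_REPORT = (
    "Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\n"
    "Example Province,Example Country,1/22/2020 17:00,10,1,\n"
)
NEW_REPORT = (
    "Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered,Latitude,Longitude\n"
    "Example Province,Example Country,2020-03-01T10:13:19,25,2,5,12.5,34.5\n"
)

records = fill_coordinates(
    read_report(io.StringIO(OLD_REPORT)) + read_report(io.StringIO(NEW_REPORT))
)
```

In this sketch the older sample row ends up with the coordinates of the newer one, and its empty Recovered cell becomes 0. The two stable sorts stand in for the single Sort tool; the keys the workflow actually sorts on may differ.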

 

The final data output contains the following fields:

| Name | Type | Description |
| --- | --- | --- |
| Country/Region | V_String | Name of the country or region in the world where the data was reported |
| Province/State | V_String | Name of the province or state in the country where the data was reported. Can be empty if the data is reported only at the country or region level |
| Updated | DateTime | The date and time the record was last updated |
| Confirmed | Int32 | Number of confirmed COVID-19 cases reported |
| Deaths | Int16 | Number of COVID-19 cases resulting in death |
| Recovered | Int32 | Number of COVID-19 cases resulting in recovery |
| Latitude | Double | Latitude of the centroid of the reporting area |
| Longitude | Double | Longitude of the centroid of the reporting area |

 


To use the data with spatial tools, you will have to supply an appropriate replacement for the null values in the Latitude and Longitude fields. Fortunately, fewer than 1% of the values in those fields are null.
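One way to approach those remaining nulls, sketched in Python rather than in Designer, is to measure how many records lack coordinates and fill them from a lookup you maintain yourself. The helper names, the fallback dictionary, and the sample rows below are all hypothetical; the workflow does not ship with such a lookup.

```python
def missing_share(rows):
    """Return the fraction of rows whose Latitude or Longitude is None."""
    if not rows:
        return 0.0
    missing = sum(
        1 for r in rows if r["Latitude"] is None or r["Longitude"] is None
    )
    return missing / len(rows)


def apply_fallback(rows, fallback):
    """Fill missing coordinates from a lookup keyed by (country, province).

    Returns the set of locations that the lookup could not resolve.
    """
    unresolved = set()
    for r in rows:
        if r["Latitude"] is None or r["Longitude"] is None:
            key = (r["Country/Region"], r["Province/State"])
            if key in fallback:
                r["Latitude"], r["Longitude"] = fallback[key]
            else:
                unresolved.add(key)
    return unresolved


# Made-up rows: two with coordinates, two without.
rows = [
    {"Country/Region": "Country A", "Province/State": "Province 1",
     "Latitude": 1.0, "Longitude": 2.0},
    {"Country/Region": "Country A", "Province/State": "Province 2",
     "Latitude": None, "Longitude": None},
    {"Country/Region": "Country B", "Province/State": "",
     "Latitude": None, "Longitude": None},
    {"Country/Region": "Country C", "Province/State": "",
     "Latitude": 3.0, "Longitude": 4.0},
]

share_before = missing_share(rows)
unresolved = apply_fallback(rows, {("Country A", "Province 2"): (5.0, 6.0)})
share_after = missing_share(rows)
```

Returning the unresolved locations rather than silently dropping them makes it easy to see which places still need a coordinate before the data goes into a spatial tool.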

 

The GitHub repository is updated daily by JHU CSSE. To keep your local copy up to date, open a command prompt, navigate to the directory where the data is stored (e.g., C:\Data\COVID-19\), and run the following command:

git pull
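If you would rather script the clone and the daily update together, a small Python wrapper around the same two git commands could look like the sketch below. The function names are my own, and it assumes git is on your PATH; it only picks between `git clone` and `git pull` depending on whether the folder already contains a repository.

```python
import subprocess
from pathlib import Path

REPO_URL = "https://github.com/CSSEGISandData/COVID-19.git"


def sync_command(dest):
    """Build the git command: pull if dest is already a clone, otherwise clone."""
    dest = Path(dest)
    if (dest / ".git").is_dir():
        # -C runs git as if it had been started in the given directory.
        return ["git", "-C", str(dest), "pull"]
    return ["git", "clone", REPO_URL, str(dest)]


def sync(dest):
    """Run the command and raise CalledProcessError if git reports a failure."""
    subprocess.run(sync_command(dest), check=True)
```

For example, calling `sync(r"C:\Data\COVID-19")` clones the repository on the first run and pulls the latest changes on every run after that.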

If you have questions about using the workflow, or want to share what you create or discover with the data, please post in our General Discussion forum thread.

David Wilcox
Senior Software Engineer

David Wilcox is a Senior Software Engineer at Alteryx, with experience ranging from financial analytics to game development. He is now focused on the future of data platforms, machine learning, intelligent algorithms, and visualization, and somehow finding a way to make all of that available and understandable to Alteryx users (and himself).

Comments
AndyMoncla
10 - Fireball

Thanks David!

PhilBalderson
8 - Asteroid

This is fantastic work. Particularly love the workflow runtime event! I have created a batch macro that provides ARIMA TS Forecasts for each country based on this data if anyone's interested.

DavidW
Alteryx Alumni (Retired)

Also, loving all the comments and new ideas here. I recommend we move further discussion to the discussion forum so we can keep it all in one spot. If you haven't been there yet, even more great ideas there, too.