Johns Hopkins University COVID-19 daily data workflow

5 - Atom

@shevshenko

 

Did you use only the most recent day's row for each country?

If you sum all the rows per country, you'll get an inflated number, because the counts are cumulative. For example, with 3 confirmed on day 1 and 5 confirmed on day 2, the correct confirmed total is 5, but summing gives 8.

Check whether that is the issue.
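
To make the difference concrete, here is a minimal Python sketch of the two approaches. The country names, dates, and counts are invented for illustration, and `latest_totals` / `summed_totals` are hypothetical helper names, not part of the workflow.

```python
from datetime import date

# Hypothetical rows: (country, report_date, cumulative_confirmed)
rows = [
    ("Exampleland", date(2020, 3, 1), 3),
    ("Exampleland", date(2020, 3, 2), 5),
    ("Otherland", date(2020, 3, 1), 10),
    ("Otherland", date(2020, 3, 2), 12),
]


def latest_totals(rows):
    """Return {country: confirmed} using only each country's most recent row."""
    latest = {}
    for country, day, confirmed in rows:
        if country not in latest or day > latest[country][0]:
            latest[country] = (day, confirmed)
    return {country: confirmed for country, (day, confirmed) in latest.items()}


def summed_totals(rows):
    """Return {country: confirmed} by summing every row (overcounts running totals)."""
    totals = {}
    for country, _day, confirmed in rows:
        totals[country] = totals.get(country, 0) + confirmed
    return totals


print(latest_totals(rows))  # most recent day per country
print(summed_totals(rows))  # inflated, because each row is already cumulative
```

In Alteryx terms, the first function corresponds roughly to sorting by date and keeping one row per country, rather than using a Sum in a Summarize tool.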

 

Alteryx

@klaus01 That's correct, the data is a running total, so the most recent day is the current total. So, @shevshenko, don't sum them, but take only the day that is relevant for your purpose (i.e., most recent day if you want a current dashboard).

 

When I upload the revised version of the workflow, I'll edit my original post to make this more clear.

David Wilcox
Senior Software Engineer
Alteryx
6 - Meteoroid
You are correct, there are global lags. It is really immaterial as long as you are not overcounting through duplicate records. Be rational.

Keith L. Penney
Data Analyst, Sr.
TEKWISSEN Contractor
Supply Chain Transformation
Space and Airborne Systems (SAS)
keith.penney-nr@raytheon.com
Desk: 972-344-9349
Mobile: 901-603-2906
Alteryx

@nickbecks and others:

 

Here's a new version of the workflow that addresses the radical change in schema that JHU introduced starting with the 22 March data file. They've cleaned up the column names, which is nice, and added three fields: a FIPS field with the numeric code for US counties; an Admin2 field with the name of an administrative region below the province or state level (this is county level for the US); and an Active field, which represents the current active cases as of the Last_Updated value. Active is calculated as Confirmed - Deaths - Recovered.
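
As a small illustration of the Active calculation described above, here is a Python sketch. The row values are made up, and `active_cases` is a hypothetical helper, not something in the workflow.

```python
def active_cases(confirmed, deaths, recovered):
    """Active = Confirmed - Deaths - Recovered, as described in the post."""
    return confirmed - deaths - recovered


# Hypothetical row using the post-22-March field names mentioned above.
row = {"Admin2": "Example County", "Confirmed": 100, "Deaths": 5, "Recovered": 20}
row["Active"] = active_cases(row["Confirmed"], row["Deaths"], row["Recovered"])
print(row["Active"])
```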

 

Because of the number of changes in the schema, the original batch macro is removed. Otherwise, use as you would the original workflow.

David Wilcox
Senior Software Engineer
Alteryx
5 - Atom

@DavidW thanks, this is great. 

 

Is there a way to pass the file name as a field output from the Dynamic Input tool? I tried and failed...

I'm looking to use the file name as the date rather than the last updated date. Using the last updated date results in gaps when a specific region does not have a data point for a given date. For example, many regions (US, Spain, Italy, Germany, France, etc.) don't have a last updated date record for 3/13/20, so if you were to sum total cases on 3/13/20 you would end up with a misleading result.
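
For anyone doing this step outside Alteryx, here is a rough Python sketch of deriving the report date from a file name. It assumes the daily report files are named in MM-DD-YYYY.csv form (for example 03-13-2020.csv); the folder path and the function name `report_date_from_filename` are made up for illustration.

```python
from datetime import date, datetime
from pathlib import PureWindowsPath


def report_date_from_filename(file_name):
    """Parse a report date from a path whose file name looks like MM-DD-YYYY.csv.

    PureWindowsPath accepts both backslash and forward-slash separators,
    so this handles full Windows paths as well as bare file names.
    """
    stem = PureWindowsPath(file_name).stem
    return datetime.strptime(stem, "%m-%d-%Y").date()


# Hypothetical full path, like the one a file-name field might contain.
print(report_date_from_filename(r"C:\data\daily_reports\03-13-2020.csv"))
```

Every row in a file then gets the same report date, which avoids the gaps that come from relying on each region's own last-updated timestamp.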

6 - Meteoroid

 

There are some gaps in the CSV files used as input when you run the Alteryx workflow. The files are in the folder after you run the git pull command, but you will run into issues if a file's column names differ from the ones used in the template. Check the column names first; otherwise the file will be skipped and will not be counted in your totals.

 

Column_Name.JPG
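
A quick way to check the headers outside Alteryx is sketched below in Python. The template column list and the old-to-new rename map are assumptions about the two schemas (slash and space names before 22 March, underscore names after); adjust them to match the template your workflow actually uses. `read_with_template` is a hypothetical helper.

```python
import csv
import io

# Assumed template columns and assumed old-to-new header renames.
TEMPLATE = ["Province_State", "Country_Region", "Last_Update",
            "Confirmed", "Deaths", "Recovered"]
RENAMES = {
    "Province/State": "Province_State",
    "Country/Region": "Country_Region",
    "Last Update": "Last_Update",
}


def normalize(name):
    """Strip a leading byte-order mark and whitespace, then apply the rename map."""
    cleaned = name.lstrip("\ufeff").strip()
    return RENAMES.get(cleaned, cleaned)


def read_with_template(csv_text):
    """Return (rows, missing): rows with normalized headers, plus any
    template columns that the file does not provide."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [
        {normalize(key): value for key, value in raw.items() if key is not None}
        for raw in reader
    ]
    header = [normalize(name) for name in (reader.fieldnames or [])]
    missing = [column for column in TEMPLATE if column not in header]
    return rows, missing


# Hypothetical file content using the older header style.
old_style = (
    "Province/State,Country/Region,Last Update,Confirmed,Deaths,Recovered\n"
    "Example Province,Exampleland,2020-03-01T10:00:00,3,0,0\n"
)
rows, missing = read_with_template(old_style)
print(missing)  # an empty list means every template column was found
```

A non-empty `missing` list flags a file that would otherwise be skipped silently.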

Alteryx

@shevshenko  Are you using the new workflow?

David Wilcox
Senior Software Engineer
Alteryx
Alteryx

@nickbecks Attached is a new version of the workflow that adds the FileName field from the Dynamic Input tools to the output. That should work for what you are trying to do.

David Wilcox
Senior Software Engineer
Alteryx
6 - Meteoroid

Yes, I used the updated version but still got the header warnings. I will double-check, but I made the manual change in the second Select tool:

 

Column_Name_Change.JPG

12 - Quasar

More sources are being made available, so I figured I'd contribute other options.

 

New York Times: In response to a number of requests, the New York Times is releasing the data it has used for its own reporting. Details can be found in their GitHub repository. The data starts with the first reported case in Washington state on 2020-01-21.

 

This workflow uses one Python tool to pull the data in CSV form (simpler to transform into a table using pandas), and then requires a few small modifications to add latitude and longitude where none was provided.
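
The latitude/longitude step can be sketched roughly as follows. This is not the code in the attached workflow: it uses Python's standard csv module instead of pandas, leaves out the download, and assumes a state-level file with date, state, fips, cases, and deaths columns. The sample row and the coordinate lookup (`STATE_COORDS`, approximate values) are illustrative only.

```python
import csv
import io

# Illustrative sample in the assumed state-level layout.
SAMPLE_CSV = (
    "date,state,fips,cases,deaths\n"
    "2020-01-21,Washington,53,1,0\n"
)

# Hypothetical lookup from state FIPS code to an approximate (lat, lon).
STATE_COORDS = {"53": (47.4, -120.5)}


def add_coordinates(csv_text, coords):
    """Parse the CSV text and attach latitude/longitude by FIPS code.

    Rows whose FIPS code is not in the lookup get None for both fields.
    """
    enriched = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        lat, lon = coords.get(row["fips"], (None, None))
        enriched.append({
            **row,
            "cases": int(row["cases"]),
            "deaths": int(row["deaths"]),
            "latitude": lat,
            "longitude": lon,
        })
    return enriched


for record in add_coordinates(SAMPLE_CSV, STATE_COORDS):
    print(record)
```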