BACKGROUND
Like so many others, I’ve been looking at COVID19 data from many different angles. I’ve done this as part of my job, for personal discovery and analysis, and out of just plain old curiosity. I hear a lot about infection rates, confirmed cases, hospitalizations, and just straight up deaths. I wanted to investigate those deaths a bit more. Specifically, I was interested in metropolitan areas.
Interestingly enough, around this same time, I read an article from the CBC, where Ashleigh Tuite, an epidemiologist from the University of Toronto said “I think per capita deaths are a proxy for the extent of disease activity”. That got me to thinking – I had never seen any reporting on deaths per capita for COVID19, nor for deaths by metropolitan areas. I thought it would be interesting to combine the two and see what I get. Even better, I wanted to put it on a map. And so, my exploration began.
I started with the most popular and pervasive public source for COVID19 data, John Hopkins University. Everyone by now is probably familiar with seeing this dashboard on the evening news. Luckily, JHU makes the actual data files behind their dashboard available here in GitHub. And just as luckily, I have Alteryx and their Location + Business Insights datasets, which includes current year population estimates for various geographical areas, including U.S. counties and Metropolitan Core-Based Statistical Areas (Metro CBSAs). I often like Metro CBSAs for metrics, because they tend to allow for looking at data across different population sizes, but in a balanced way (minimum of 50K people each, plus any adjacent territory with integrated commuting ties).
Using the JHU data, and the Alteryx data package, I was on my way.
THE WORKFLOWS
I’m going to layout the process I created into separate workflows in order to “step things out”. I think it makes it easier to understand when you can digest a few (relatively) smaller pieces than one giant piece. But one could combine these into either one large workflow, or use the CReW macros to process them in sequence. And finally, you can use Alteryx Server to schedule the job(s) so you can “set it and forget it”!
GEOGRAPHICAL DATA
The JHU data contains U.S. county level data. Recall that I’m interested in looking at deaths per capita by metro area. The Alteryx Allocate dataset (part of the Location + Business Insights datasets) contains metro level and county level details, including which counties belong to which metro area. I can join the JHU data to the Allocate counties, and then aggregate the Allocate counties up to the Allocate metro areas.
First, I created data files for the Allocate datasets I’ll be using – State, County, and Metro. I used a batch macro to loop through the metros, because some metros crossover multiple states. For these, I’d like the metro to be listed separately in each state it belongs to. I also needed to do some clean-up and standardization of the names. The workflow to do this is attached and labeled “Allocate_CreateGeos.yxmd”. Note that you will need the Allocate dataset to see the input configurations.
PREP FOR MAPPING
The next workflow pulls down the most recent JHU data from their GitHub site. It preps the data and joins it to the Allocate datasets in order to put everything on a map. You’ll notice that the JHU county names don’t always match exactly to the Allocate county names (a frequent occurrence when matching text fields). So, I have a secondary match that relies on spatial matching (yet another part of the Location + Business Insights dataset) to join those unmatched-by-name JHU counties to an Allocate county.
What I do for these unmatched-by-name records is create a centroid based on the lat/lon listed on the JHU record, and then look for that centroid within the Allocate county spatial polygon. After doing this, all the unmatched-by-name counties are accounted for (and I use the county name from the Allocate dataset instead of the JHU dataset). The final outputs for this workflow is a file with the JHU county COVID information, and a separate file to be used for mapping. Also, note that some counties don’t roll up to a metro area. For these, I combine them into their own geographical representation and spatial object called “Non-Metros”. That way they too are represented on the map, but as kind of an “all others” category.
The workflow for this is also attached and is labeled “DPC_AND_MAP_PREP.yxmd”.
FINAL REPORTS
What I did here was create another batch macro, and the result of each iteration results in a heat-map by state, where the deaths per capita (by metro area) is the heat-map theme. The formula for deaths per capita was simply (# of Deaths / Population) x 100,000. I made use of the Map Legend Splitter and Map Legend Builder along with some formulas to get the map display exactly how I wanted it. I then put that macro into a workflow that read in each state. The result was one PDF for each state. I also made use of the Text Reporting tool to create URL links. Attached is the batch macro for this (“DPC_MAP.yxmc”), and the workflow the macro is used in (“DPC_MAP_BATCH.yxmd”). I've reached the allowed limit of 5 attachments, so check the comments for a few examples of the state PDFs (Illinois, Louisiana, and New York).
And again, you can incorporate all the pieces into one larger workflow or sequential workflows, make use of Scheduler, and have these reports updated daily, with no intervention once the initial build is complete. And now we can look at each metropolitan area, of each state, and compare the deaths per capita in that state across their metro areas, which is what I set out to do. Just one more example of how Alteryx puts these types of analyses in the hands of virtually anyone willing to learn!