At Vantage, we have spent some of our time over the last few weeks contributing to the global effort to mitigate the impact of COVID-19. In this use case, we used the breadth of the Alteryx platform to create a real-time simulation of hospital and critical care capacity for each US county. This data, along with many other analyses we are performing, is shared via our platform, Vantage Point. It is available to hospitals, governments, departments of health and any other organization that needs insight into how to allocate resources during the current pandemic.
Describe the business challenge or problem you needed to solve
Early in the pandemic, the team at Vantage began analyzing and visualizing COVID-19 data. We have also tapped into brand new data exchanges on AWS and Snowflake to source a variety of datasets that augment the basic case, recovery and fatality data. We quickly determined that one of the biggest challenges facing governments was how to control the spread of COVID-19 so that hospitals and critical care facilities are not overrun, which would result in a significantly larger health crisis.
To tackle this problem, we sourced detailed case data and found several geographies that publish highly detailed data on hospitalizations, ICU admissions and patient demographics. We combined this with census data and used the logistic regression and scoring tools to build a region- and county-specific model that estimates hospitalization and critical care admission rates. Next, we sourced data on all US hospitals, including their utilization and capacity. Finally, we used the geospatial tools to map counties to individual hospitals. We then combined this mapping with published county-level COVID-19 case data to model current capacity at the facility and county level.
We would like to share our approach and insights with others in the community, who we hope will continue to iterate on it and draw better conclusions and predictions. We are also actively collecting additional data sources to improve our modelling and are turning our attention to other areas of interest, such as global pollution levels, mobility (public transport), and inter-county and commerce indexes. If you are interested in collaborating on data sources, please email me at email@example.com
Describe your working solution
We started with hospital data, which we sourced in JSON format. Using Alteryx, we extracted it into a list of 6,700 institutions, each with its staffed capacity, bed capacity, utilization rate and geo-coordinates.
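For readers working outside Alteryx, the extraction step amounts to flattening nested JSON records into one row per institution. The sketch below is a minimal illustration. The post does not show the source schema, so the field names (`facilities`, `location`, `beds`, `utilization`) and the sample records are hypothetical.

```python
import json

# Hypothetical JSON layout: the real hospital feed's schema is not shown in the
# post, so every field name here is an assumption for illustration only.
SAMPLE = """
{
  "facilities": [
    {"name": "General Hospital", "location": {"lat": 40.71, "lon": -74.0},
     "beds": {"total": 300, "staffed": 250, "icu": 20}, "utilization": 0.65},
    {"name": "County Clinic", "location": {"lat": 41.2, "lon": -73.8},
     "beds": {"total": 40, "staffed": 35, "icu": 0}, "utilization": 0.5},
    {"name": "Unmapped Site", "beds": {"total": 10}}
  ]
}
"""


def flatten_facilities(raw):
    """Turn nested facility records into flat rows, one per institution."""
    doc = json.loads(raw)
    rows = []
    for f in doc["facilities"]:
        loc = f.get("location", {})
        beds = f.get("beds", {})
        rows.append({
            "name": f["name"],
            "lat": loc.get("lat"),
            "lon": loc.get("lon"),
            "bed_capacity": beds.get("total", 0),
            "staffed_beds": beds.get("staffed", 0),
            "icu_beds": beds.get("icu", 0),
            "utilization": f.get("utilization"),
        })
    # Drop records without coordinates: they cannot be mapped to counties later.
    return [r for r in rows if r["lat"] is not None and r["lon"] is not None]


rows = flatten_facilities(SAMPLE)
for r in rows:
    print(r["name"], r["staffed_beds"], r["icu_beds"])
```

Facilities without coordinates are dropped here because the later nearest-facility step needs a location. A real pipeline might instead geocode them from an address.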
To get COVID-19 case data, we use Alteryx to connect directly to the Johns Hopkins University data and convert its time series file into an analytics-ready format. In this case, Vantage uses Snowflake on Azure, and we built a custom bulk-loading macro that loads the data into Snowflake via Azure Blob Storage.
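An "analytics-ready format" here typically means unpivoting a wide file, with one column per date, into long rows of one county per day. The sketch below shows that reshaping with the standard library. It assumes a county identifier in a `FIPS` column and date headers like `1/22/20`. The three-row sample and the `ID_COLUMNS` set are illustrative; the real file has more identifier columns, and they would all need to be listed.

```python
import csv
import io
from datetime import datetime

# Illustrative wide-format sample: identifier columns followed by one
# cumulative-case column per date (assumed M/D/YY headers).
SAMPLE_CSV = """FIPS,Admin2,Province_State,1/22/20,1/23/20,1/24/20
36061,New York,New York,0,2,5
12086,Miami-Dade,Florida,1,1,4
"""

# Every non-date column must be listed here (assumption for this sample).
ID_COLUMNS = {"FIPS", "Admin2", "Province_State"}


def to_long(csv_text):
    """Unpivot to (fips, iso_date, cumulative_cases, new_cases) rows."""
    reader = csv.DictReader(io.StringIO(csv_text))
    date_cols = [c for c in reader.fieldnames if c not in ID_COLUMNS]
    out = []
    for row in reader:
        previous = 0
        for col in date_cols:
            cumulative = int(row[col])
            iso_date = datetime.strptime(col, "%m/%d/%y").date().isoformat()
            # New cases for the day = change in the cumulative total.
            out.append((row["FIPS"], iso_date, cumulative, cumulative - previous))
            previous = cumulative
    return out


for record in to_long(SAMPLE_CSV):
    print(record)
```

Long rows like these are straightforward to bulk-load into a warehouse table and to join against county-level demographics.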
Next, Vantage extracted case details from Italy and from the state of Florida, where patient-level details, including demographics, are available. From this data, Vantage built a model using the Alteryx logistic regression algorithm to estimate the probability of hospitalization and ICU admission by gender and age. Using the Italian data, Vantage was also able to model the required length of hospital stay.
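To make the modelling step concrete, the following sketch fits a logistic regression of hospitalization on age and gender using plain batch gradient descent on the log-loss. It is a pure-Python stand-in for illustration, not the Alteryx tool itself. The patient records are synthetic, and the choices to centre age at 60 and scale by decades are assumptions made for numerical stability.

```python
import math

# Synthetic records (age, is_male, hospitalized): illustration only, not real data.
PATIENTS = [
    (25, 0, 0), (32, 1, 0), (41, 0, 0), (47, 1, 0), (55, 0, 0),
    (58, 1, 1), (63, 0, 0), (67, 1, 1), (70, 1, 0), (72, 0, 1),
    (78, 1, 1), (83, 0, 1), (88, 1, 1),
]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def scale_age(age):
    # Centre at 60 and express in decades so gradient steps stay well-behaved.
    return (age - 60) / 10


def fit_logistic(records, lr=0.5, epochs=5000):
    """Batch gradient descent on the mean log-loss; returns (w_age, w_male, bias)."""
    w_age = w_male = bias = 0.0
    n = len(records)
    for _ in range(epochs):
        g_age = g_male = g_bias = 0.0
        for age, male, y in records:
            x_age = scale_age(age)
            err = sigmoid(bias + w_age * x_age + w_male * male) - y
            g_age += err * x_age
            g_male += err * male
            g_bias += err
        w_age -= lr * g_age / n
        w_male -= lr * g_male / n
        bias -= lr * g_bias / n
    return w_age, w_male, bias


def predict(model, age, male):
    """Estimated probability of hospitalization for one age/gender group."""
    w_age, w_male, bias = model
    return sigmoid(bias + w_age * scale_age(age) + w_male * male)


model = fit_logistic(PATIENTS)
for age in (30, 50, 70, 90):
    print(age, round(predict(model, age, 0), 3), round(predict(model, age, 1), 3))
```

The same pattern, with ICU admission as the outcome, would give the second set of probabilities. A production model would add more age bands or interaction terms and would be validated on held-out data.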
Using US census demographic data for each county, we can then estimate each county's expected hospitalization rate and ICU admission rate per confirmed case.
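One simple way to turn per-group probabilities into a county rate is a population-weighted average. The sketch below makes that simplifying assumption: confirmed cases are distributed across age/gender groups in proportion to the county's census population. The group labels, rates and counts are hypothetical.

```python
def expected_rate(group_rates, population):
    """Population-weighted average of per-group rates for one county.

    Assumes cases fall across groups in proportion to population,
    which is a simplification.
    """
    total = sum(population.values())
    if total == 0:
        raise ValueError("county population is zero")
    return sum(group_rates[g] * count / total for g, count in population.items())


# Hypothetical per-group probabilities, e.g. scored from a fitted model.
HOSP_RATE = {("under_50", "F"): 0.02, ("under_50", "M"): 0.03,
             ("50_plus", "F"): 0.10, ("50_plus", "M"): 0.15}
ICU_RATE = {("under_50", "F"): 0.005, ("under_50", "M"): 0.008,
            ("50_plus", "F"): 0.03, ("50_plus", "M"): 0.05}

# Hypothetical census counts for a single county.
COUNTY_POP = {("under_50", "F"): 300, ("under_50", "M"): 300,
              ("50_plus", "F"): 200, ("50_plus", "M"): 200}

hosp_per_case = expected_rate(HOSP_RATE, COUNTY_POP)  # about 0.065
icu_per_case = expected_rate(ICU_RATE, COUNTY_POP)    # about 0.0199
print(hosp_per_case, icu_per_case)
```

An older county gets a higher expected rate per case, which is the effect the county-specific model is meant to capture.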
Next, we bring in the hospital data: the ICU beds and hospital beds in each US county. This presented an initial challenge, as some US counties have no hospitals and many have no ICU facilities. To account for this, we used the spatial distance tool to assign each county's case count and expected per-case hospitalization rate to the nearest facility (either an ICU or a general hospital).
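The nearest-facility assignment can be sketched with great-circle (haversine) distances from each county to each facility. The function then sums the expected patients that land on each facility. This is a minimal sketch under several assumptions: each county is represented by a single centroid point, and the field names (`fips`, `cases`, `hosp_rate`, `icu_rate`, `icu_beds`) and sample coordinates are hypothetical.

```python
import math

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def assign_to_nearest(counties, facilities, need_icu=False):
    """Map each county centroid to its nearest facility (ICU-capable only if asked)."""
    candidates = [f for f in facilities if not need_icu or f["icu_beds"] > 0]
    assignment = {}
    for c in counties:
        nearest = min(candidates,
                      key=lambda f: haversine_km(c["lat"], c["lon"], f["lat"], f["lon"]))
        assignment[c["fips"]] = nearest["name"]
    return assignment


def demand_by_facility(counties, assignment, rate_key="hosp_rate"):
    """Sum expected patients (cases x per-case rate) onto each assigned facility."""
    totals = {}
    for c in counties:
        name = assignment[c["fips"]]
        totals[name] = totals.get(name, 0.0) + c["cases"] * c[rate_key]
    return totals


# Hypothetical sample data.
FACILITIES = [
    {"name": "A", "lat": 40.0, "lon": -75.0, "icu_beds": 10},
    {"name": "B", "lat": 41.0, "lon": -74.0, "icu_beds": 0},
]
COUNTIES = [
    {"fips": "X", "lat": 40.9, "lon": -74.1, "cases": 200, "hosp_rate": 0.05, "icu_rate": 0.01},
    {"fips": "Y", "lat": 40.1, "lon": -75.1, "cases": 100, "hosp_rate": 0.10, "icu_rate": 0.02},
]

general = assign_to_nearest(COUNTIES, FACILITIES)
icu = assign_to_nearest(COUNTIES, FACILITIES, need_icu=True)
print(demand_by_facility(COUNTIES, general))
print(demand_by_facility(COUNTIES, icu, rate_key="icu_rate"))
```

Running the ICU assignment separately means a county whose nearest hospital has no ICU (county X here) routes its critical care demand to the nearest ICU-capable facility instead.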
Armed with this data, we combine pre-COVID utilization estimates for hospital capacity with live case data and our modelled hospitalization rates for each county (and for surrounding counties, where a county has no ICU or hospital capacity of its own). The result is a live model of critical care shortages and surpluses. We used Tableau as our output tool and created dynamic parameters that let a user change the average length of a hospital stay. Users can also adjust the assumed pre-COVID bed availability to account for the suspension of elective or non-urgent procedures.
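The shortage/surplus calculation can be approximated with a rolling window. Under this approach, patients currently occupying beds are the hospitalized share of cases confirmed within the last `stay_days` days. This is a simplifying assumption, not necessarily Vantage's exact formula: it ignores the lag between confirmation and admission, as well as variation in stay length. The two parameters `stay_days` and `baseline_utilization` play the role of the Tableau parameters described above. All names and numbers are hypothetical.

```python
def bed_balance(daily_new_cases, hosp_rate, beds, baseline_utilization, stay_days):
    """Free beds minus estimated COVID occupancy: positive = surplus, negative = shortage.

    Assumes every hospitalized case occupies a bed for exactly `stay_days`
    days starting on its confirmation date (a simplification).
    """
    if stay_days <= 0:
        raise ValueError("stay_days must be positive")
    recent_cases = sum(daily_new_cases[-stay_days:])
    covid_occupancy = hosp_rate * recent_cases
    # Lowering baseline_utilization models suspended elective/non-urgent procedures.
    free_beds = beds * (1.0 - baseline_utilization)
    return free_beds - covid_occupancy


cases = [10, 20, 30, 40, 50]  # hypothetical daily new cases assigned to one facility
print(bed_balance(cases, hosp_rate=0.1, beds=100, baseline_utilization=0.65, stay_days=3))
print(bed_balance(cases, hosp_rate=0.1, beds=100, baseline_utilization=0.90, stay_days=5))
```

Raising `stay_days` or `baseline_utilization` pushes a facility toward shortage, which mirrors how the dashboard parameters let users stress-test their assumptions.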
Describe the benefits you have achieved
As a result of publishing this study, we have received several inquiries from hospital groups, national healthcare schemes and governments (local and national) interested in extending this analysis to their own countries or geographies. We hope our efforts and ideas can help other citizen data scientists use their creativity to uncover other valuable insights that translate directly into benefits for those on the front line fighting this pandemic.
Alteryx has made complex analysis a reality for many of us who are not expert coders, and I find that empowering. This use case was built end to end in less than a week and involved pulling data from many locations to augment a standard dataset. Doing this analysis without Alteryx would have required significant coding effort across numerous platforms, with extensive use of various libraries for data engineering (JSON extraction), modeling, geospatial work and data manipulation.
The emerging data exchange market is one of the most exciting developments in the analytics space. It is being led by the Snowflake Data Exchange and AWS Marketplace, where datasets are made available under free, paid or limited-use licensing models. These datasets are extremely rich in content, and their availability is growing rapidly. However, the datasets are far from standardized. Because data arrives in all forms, analysts will need to be creative and flexible to get the most out of them. In this use case alone, we received GeoJSON, JSON, Parquet, CSV and Excel files, sourced from FTP sites, APIs, S3 buckets, databases and emails.
Analysts who are prepared to use tools like Alteryx and become experts in bringing together these valuable but varied and disparate data sources and combine them with their own proprietary data will be able to create massive value for their organizations.