The Smart Cities movement has at its core a determination to democratize data. Cities operate in complex environments where they are tasked with addressing problems that typically are passed over by the private sector. The use of innovative means to collect data—sensors and satellites, for example—has captured the imagination of city administrators; but, having data does not necessarily lead to creating the knowledge necessary to make smart decisions about how to allocate scarce resources. Our project employs several of the core functionalities embedded in Alteryx to harness the power of millions of disparate data points to make good inferences. This year we began using Alteryx to teach students data analytics and to use open data to help move the needle on an intractable problem for many cities: abandoned housing and blighted neighborhoods.
Kansas City, like many other cities in the industrialized northern states, has faced decades of urban disinvestment and decline. The City of Kansas City (KCMO) has targeted 10,000 houses for demolition as they are abandoned, severely blighted, and a drag on the communities in which they are located. But, as urban planners have documented in cities such as Detroit, demolishing houses does not always lead to better conditions for neighbors. A second option used by KCMO was the creation of a land bank, where abandoned properties are warehoused and then transferred to private, public, or nonprofit entities to increase access to affordable housing and to eventually move abandoned houses back to productive use. The Land Bank of Kansas City, Missouri (LBKC) approached the project team to discuss how to better use data analytics and the KCMO Open Data portal to help select the properties that are most likely to be acquired by potential home buyers or home rehabbers who would bring them up to code and thus return them to productive use with a secondary impact of exerting a positive influence on the neighborhood. After exploring how to approach the massive data involved in such an undertaking, it became clear that Alteryx was an ideal tool for us since we were management school professors rather than data scientists and thus needed a solution with a moderate learning curve. After acquiring a license under the Alteryx for Good Program in 2016, we began to explore how we could use it to help LBKC’s public leaders make the best use of very limited resources as part of our commitment to community service.
As an added bonus, as teachers we also saw an opportunity to help our students learn to use Alteryx to harness the potential of big data. This year we were able to integrate Alteryx into some of our courses and also work with students interested in urban issues through independent studies.
Describe the business challenge or problem you needed to solve
As a prelude, the key driver of this project is community service to help address complex urban issues in the Kansas City area. Given the pervasive nature of these problems across the country and the growth of Open Data upon which this project is based, we are also interested in scaling the project and extending the results to other markets to test their robustness. We are also interested in introducing more analytics into our respective curricula and thus see this project as a catalyst to advancing our analytical skills in general, and our Alteryx acumen in particular.
Abandoned to Vibrant (A2V) is an informal consortium of volunteers led by full-time academics (Jim DeLisle and Brent Never) and a technologist (Ron House) who is associated with the Code for America Brigade of Kansas City. The efforts benefited from input from several stakeholders including the KCMO Open Data team, the Land Bank of Kansas City, Legal Aid of Western Missouri, and neighborhood groups. The common goal bringing the consortium together was the desire to reverse urban blight, help revitalize neighborhoods, strengthen the local tax base, and provide access to affordable housing. The primary strategy was to use data and spatial analytics to help provide a foundation for decision-support that could be used by policy-makers, planners, and those implementing intervention programs. As such, one of the key elements of the research focused on processing the raw data into a meaningful data warehouse that could be used to support spatial and predictive analytics that benefited from the precision of the actual data points, both the raw and the calculated variables.
A secondary strategy was to turn data into meaningful information which could empower individuals, households, and others interested in investing, acquiring, or disposing of individual properties located within the targeted area. The team used the calculated variables in the data warehouse to create a public-facing website that would provide data that potential buyers would need in purchasing highly-distressed properties.
Over the past 18 months or so, we have found Alteryx to be an indispensable tool when using public-sourced open data at the very granular level necessary to generate insights into the dynamics that affect urban blight and housing abandonment. Even with the power of Alteryx, we found the process of cleaning, blending, and creating inferences from large datasets that had not been subject to intense scrutiny to be both very challenging and time intensive. Without the power and integrated functionality of Alteryx, we would not have been able to build the data warehouse we have assembled. This is particularly true since neither DeLisle nor Never was trained in information sciences. Alteryx’s flexibility and relative ease-of-use (compared to other options) allowed us to make contributions to the Land Bank in the early phases of our analysis. Over time, these contributions have increased as we began to master some of the power of Alteryx and expanded both the breadth and depth of our data far beyond what we initially envisioned, which in turn enabled us to develop a better understanding of the complex issues we were exploring. This has also allowed us to involve our students, who are not technologists, in exploring how the analysis of public datasets can help address urban issues related to blight and abandoned housing.
Describe your working solution
We made extensive use of Alteryx to explore, scrub, correct, code, classify, and assign data to spatial objects. Some of the more common tools included: Preparation (e.g., Data Cleansing, Filter, Formula, Multi-Field Binning, Record ID, Select, Sort, Tile), Join (e.g., Join, Join Multiple, Union, Find/Replace and some Fuzzy Match), Parse (e.g., Text to Columns for dates), Transform (e.g., Cross Tab, Summarize), Document (e.g., Comment, Tool Container), Spatial (e.g., Create Points, Find Nearest, Spatial Match), Data Investigation (e.g., Distribution Analysis, Frequency Table, Histogram, Pearson Correlation), and some exploratory Predictive (e.g., Linear Regression, Stepwise).
Each of the workflows highlighted below is repeatable, with data updates from Open Data and other sources. However, due to the nature and source of the data, additional data scrubbing and preparation are necessary to ensure the integrity of the underlying time-series. We are exporting some of our output to GeoDa, Tableau, and SPSS.
In general, the bulk of the raw data deployed in this project were obtained from the Kansas City MO Open Data portal. Using the APIs developed by the city, the data were extracted and fed into Alteryx for processing. The Kansas City data were supplemented by public data made available by Jackson County MO, as well as proprietary data on housing foreclosures purchased to supplement the analysis.
Exhibit 1 presents a snapshot of the data that were assembled into a data warehouse. As noted, the depth and breadth of data reflected the holistic approach adopted by the primary researchers, as well as their recognition that the data could be used to address urban issues from several disciplinary perspectives (e.g., real estate, policy, planning, health, crime).
The data pulled from the Open Data portal, in general, represented a tremendous challenge for several reasons: (1) in several cases, city officials did not have common codes, which required us to infer and create those codes; (2) where codes existed, data entry lacked drop-down menus that would have reduced coding errors; and (3) records were identified by street address rather than by County APN, leading to innumerable spelling errors. Several of the Alteryx tools allowed the team to successfully clean the data, which then could be fed back to the Open Data portal to facilitate future use by other stakeholders.
Exhibit 1: Data Warehouse
The data warehouse constructed as part of this on-going project contained a mixture of data provided from the Open Data Portal, proprietary data, and calculated fields. As noted in Exhibit 1, these data sources provided 937 independent variables. Overall, some 11.2 million individual records were explored; in most cases, the data consisted of time-series spanning 2010-2017. To provide some indication of how the variables were selected, it is useful to explore each of the datasets and the role that it may play in the project.
The surrounding environs are important indicators of the nature and quality of urban places. The environs comprise several variables, some of which are physical (e.g., address, land use mix, amenities, vacant/improved status) and some of which are dynamic (e.g., absentee ownership, property tax arrears, average market values). While the former were fairly easy to analyze and cross-tab, the latter involved more complicated data development. For example, the determination of absentee owner status was based on a comparison of the official parcel address with the mailing address for the tax bill. Since the addresses were fed from different sources, did not comply with standards, and in many cases had improper suffixes, Alteryx was used to maximize matches. This process helped generate a master address file which contained original and corrected addresses, along with parcel ID numbers, which are more stable.
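As an illustration only, the absentee-owner comparison described above can be sketched in Python. The team performed this step with Alteryx tools; the suffix table, function names, and the assumption that both inputs are street-address lines (no city or state) are hypothetical and not taken from the project.

```python
import re

# Hypothetical suffix table for illustration; a production table would be
# far longer and would also need to handle directionals and unit numbers.
SUFFIXES = {
    "STREET": "ST", "STR": "ST",
    "AVENUE": "AVE", "AV": "AVE",
    "BOULEVARD": "BLVD", "ROAD": "RD",
    "DRIVE": "DR", "TERRACE": "TER",
}

def normalize_address(raw: str) -> str:
    """Uppercase, replace punctuation with spaces, and standardize suffixes."""
    cleaned = re.sub(r"[^A-Z0-9 ]", " ", raw.upper())
    return " ".join(SUFFIXES.get(token, token) for token in cleaned.split())

def is_absentee(parcel_address: str, mailing_address: str) -> bool:
    """Flag a parcel as absentee-owned when the normalized addresses differ."""
    return normalize_address(parcel_address) != normalize_address(mailing_address)
```

Normalizing before comparing is what keeps a record such as "1234 Main Street" versus "1234 MAIN ST." from being counted as an absentee owner merely because the two sources spell the suffix differently.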
In this workflow, the open data were sourced, explored, scrubbed, and revised/corrected to create a stable base. In addition to the raw data, several variables were calculated.
Exhibit 2: Land Use Workflow
In many markets, 311 Call systems have been established to improve access to services for residents, as well as to log complaints related to properties, events, or other activities that are of concern to neighbors and other interested parties. In many jurisdictions, 311 calls are used as a means of co-production of information to supplement the efforts of staff in identifying issues across a broad urban area. As such, the sheer volume of 311 calls may be a positive indicator of community engagement and self-help, or a negative indicator of nuisances or other events that can cause a degradation of a property or area.
In Kansas City, individual 311 call data are available at a parcel level through the Open Data portal. These records go back in time, although they are reported on an annual basis rather than as an integrated time series. In addition to items related to the incident location by address and reporting time, 311 calls are classified into 356 categories, so they can be tracked in terms of trends and then can be routed to the proper governmental agency. Over time, the classification system changed, making it difficult to analyze trends (e.g., in 2015, Graffiti calls were reclassified from a general category to more precise sub-categories). This change was not documented and was discovered through data exploration processes. To reconstruct the data into a meaningful time series, the changes were remapped to a new classification system. This system reduced 356 categories to a more meaningful number. Based on a combination of theory and expert input, the 311 Call data were filtered to remove service and information requests (such as calls for bulky trash pick-up), and then were mapped into two key categories: Visible from the Street, and Not Visible from the Street. These major categories recognized that the perception of the quality of the environs surrounding a property or on a block or other geographic area would be affected by items that could be observed (e.g., broken windows, boarded up entries, roof leaks) and thus would have a more direct impact on perceptions and thus attitudes. Exhibit 4a presents the major subcategories that were explored:
Exhibit 4a: 311 Calls
Exhibit 4b: 311 Call Workflow
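The remapping and filtering logic described for the 311 calls can be sketched roughly as follows. This is a minimal Python illustration, not the Alteryx workflow itself; the raw category labels, harmonized group names, and field names are all hypothetical, since the 356 actual categories are not listed here.

```python
# Hypothetical raw-to-harmonized map; the labels are illustrative only.
CATEGORY_MAP = {
    "Graffiti": "Graffiti",                      # general label used in earlier years
    "Graffiti - Private Property": "Graffiti",   # later, more precise sub-categories
    "Graffiti - Public Property": "Graffiti",
    "Broken Windows": "Structure Exterior",
    "Interior Plumbing": "Structure Interior",
    "Bulky Trash Pick-up": "Service Request",
}
VISIBLE = {"Graffiti", "Structure Exterior"}   # assumed to be observable from the street
EXCLUDED = {"Service Request"}                 # service and information requests

def classify_calls(calls):
    """Harmonize raw 311 categories, drop service requests, tag visibility."""
    classified = []
    for call in calls:
        harmonized = CATEGORY_MAP.get(call["category"], "Unmapped")
        if harmonized in EXCLUDED:
            continue
        if harmonized == "Unmapped":
            visibility = "Unclassified"  # surface unknown labels for manual review
        elif harmonized in VISIBLE:
            visibility = "Visible from the Street"
        else:
            visibility = "Not Visible from the Street"
        classified.append({**call, "harmonized": harmonized, "visibility": visibility})
    return classified
```

Routing unknown labels to an explicit "Unclassified" bucket, rather than silently defaulting them, is one way an undocumented reclassification such as the 2015 Graffiti change would show up during exploration.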
In Kansas City MO, property violations can be generated from a number of sources, including the 311 Call system, which compiles complaints from citizens and then routes them to the appropriate department. Property violations are also generated by public officials, including property inspectors who are assigned to various areas of the city. At a detailed level, property violations, which are reported at a parcel level, are classified into 472 categories for tracking and analysis. Using a combination of theory, literature, and expert recommendations, these categories were clustered into several subgroups which have some common elements and may systematically affect the market. In general, property violations fall into two major chapters of the city code: Chapter 48, which focuses on nuisances, and Chapter 56, which addresses compliance and safety issues. As with 311 calls, property violations can affect the perception of the condition of a block or neighborhood. Thus, the data were also grouped into two clusters, Visible from the Street and Not Visible from the Street, and into the following major categories:
Exhibit 5a: Property Violation Categories
Exhibit 5b: Property Violation Workflow
In many urban centers, safety and security are key concerns of residents, businesses, and visitors. The Kansas City Open Data portal provides access to Crime data provided by the Police Department. To provide some level of confidentiality, the location of crime incidents is reported at the 100-block level, or at street intersections. Thus, it cannot be matched to individual parcels as can many of the other open data records. However, using Alteryx the addresses of the properties in the other datasets could be rounded to the same block level, making it possible to match disparate data sources and obtain a better understanding of the overall environs related to individual city blocks as well as higher levels of aggregation. The individual crime data are released periodically throughout the year and were assembled into annual datasets at the end of each year. As such, they were assembled into a time-series. While generally standard, for two of the years within the temporal frame, the crime data had a slightly different structure (i.e., 22 fields vs. 24) and thus had to be processed to provide a consistent time-series.
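The block-level rounding can be illustrated with a short Python sketch. The project did this in Alteryx; the function name and the assumed address layout (a leading house number followed by the street name) are hypothetical, and the actual format of the police block addresses may differ.

```python
import re

def to_block_address(address: str) -> str:
    """Round a leading house number down to its 100-block (1234 -> 1200).

    Addresses without a leading house number, such as street
    intersections, are returned unchanged apart from case and spacing.
    """
    match = re.match(r"\s*(\d+)\s+(.*)", address)
    if not match:
        return " ".join(address.upper().split())
    number = int(match.group(1))
    street = " ".join(match.group(2).upper().split())
    return f"{(number // 100) * 100} {street}"
```

Applying the same function to the parcel addresses in the other datasets gives every record a common block key on which it can be joined to the crime data.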
As with other indicators, the impact of crime on the safety and security of a property or area varies by the type of crime. To track crime rates across the country, the Federal Bureau of Investigation (FBI) has created a standardized coding system (i.e., IBRS codes). Many local jurisdictions include IBRS codes when reporting criminal incidents, although some types of crime do not fit into the coding system and are recorded under a separate system. In addition to resolving differences in coding systems, an exploration of crime data revealed that dramatically different counts would be generated depending on how incidents were reported. For example, an individual event could result in 15 reports: five assailants committing three separate types of crime each. After exploring various treatments, the data were filtered by unique crime report and IBRS code for type of crime. This reduced the potential 15 incidents to three and provided a more meaningful measure of the nature and severity of crime that occurred.
Exhibit 6a: Selected IBRS Crime Categories
Exhibit 6b: Crime Workflow
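The 15-to-3 reduction described above amounts to keeping one row per unique combination of report number and offense code. A minimal Python sketch follows; the field names (`report_no`, `ibrs`, `offender`) and the report number are hypothetical, and the three offense codes are included only as sample values.

```python
def unique_crime_types(rows):
    """Collapse offender-level rows to one entry per (report number, code)."""
    seen = set()
    unique = []
    for row in rows:
        key = (row["report_no"], row["ibrs"])
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique

# One event: five assailants, three offense types each -> 15 rows.
rows = [
    {"report_no": "R-001", "ibrs": code, "offender": n}
    for n in range(1, 6)
    for code in ("13A", "120", "220")
]
incidents = unique_crime_types(rows)
```

Here `rows` holds 15 records while `incidents` holds three, one for each type of crime in the event.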
As with other externalities, building permits can provide insights into the negative or positive dimensions of a neighborhood or area. On the negative side, permits for demolitions or the lack of permits for additions or new construction can indicate neighborhood decline or stagnation. On the positive side, building permits for new construction, renovations, and additions can be a sign of a healthy neighborhood with normal development and reinvestment, or of a neighborhood undergoing renewal. Thus, building permits are an important indicator of the life cycle or succession forces operating in a neighborhood that can have impacts on surrounding properties or contiguous areas.
Exhibit 7: Building Permit Workflow
As with building permit activity, foreclosures can provide insights into the life cycle phase of a neighborhood or area of interest. This is especially the case with Land Bank Properties which often go through foreclosure prior to being abandoned by owners or their successors. While it is possible to access foreclosure records through public sources, in Kansas City MO the data are not available through the Open Data Portal. Since compiling foreclosure records from individual cases was prohibitive, the foreclosure data were acquired from a third-party vendor. These data were available at the individual property level and, through address-matching, could be linked to the official Parcel Number which allowed the data to be merged with other area and property indicators.
Exhibit 8: Foreclosure Workflow
Transactions and Detailed Market Conditions
To support insights into the nature or character of the areas surrounding individual parcels, detailed parcel level descriptive data were acquired from Jackson County. Using Alteryx, the quality and condition ratings of the properties were converted to indices for the respective spatial clusters. The transaction volume and average values provided additional insights into the overall economic health of the area surrounding individual parcels.
Exhibit 9: Urban Core Transaction Workflow
To help address the abandoned housing problem in Kansas City MO, during 2017 $10m of funding was allocated to begin demolishing unsafe and otherwise targeted buildings. While this helped alleviate some of the concerns related to problems associated with deteriorating abandoned housing, it also contributed to a growing vacant lot problem. The demolition list was acquired through the Open Data Portal at a parcel level. The data were then joined to other datasets including the Land Bank.
Exhibit 10: Demolition Workflow
Land Bank Holdings
As of the end of 2017, the Land Bank of Kansas City had acquired 6,021 properties. These properties were quite diverse, including commercial vacant and improved, industrial vacant and improved, residential vacant and improved, urban renewal vacant, and unusable side lots. The improved properties that were transferred to the Land Bank were in various states of disrepair, since properties in better condition had been acquired through foreclosure sales. In addition to being foreclosed on, properties had to be three years in arrears on taxes and have outstanding property violations to be acquired. The Land Bank properties are scattered throughout the Kansas City area, although the clear majority are in areas that suffer from a number of urban issues including absentee ownership, vacant parcels, and abandoned buildings that have yet to be processed through the courts and/or transferred to the Land Bank. Since its creation in 2013, the Land Bank has had 298 properties on the demolition list, sold 1,932 properties, and holds 2,808 parcels in its inventory. Of the sold properties, 1,329 were non-residential improved, including vacant lots and side-yards. Of the residential improved properties that were sold, 375 were sold to sweat-equity buyers, and 228 were sold to rehabbers.
Extensive analysis was conducted related to Land Bank holdings, as well as sales programs. This analysis was done to provide insights into what sold and what didn’t, to develop predictive models for intakes, to help process incoming properties, to prioritize sales, and to better recruit potential buyers and help them understand the condition of an individual property, as well as its surrounding environs.
Exhibit 11: Land Bank History and Holdings Workflow
Alteryx JOIN_ALL Data
The individual records, most of which were available over the 2010-2017 period, were processed through a series of Alteryx workflows. While sharing some commonalities, each of the workflows differed because of the nature of the data, as well as how it was recorded, logged, and tracked in the reporting system. In addition, variables were created to turn the data into actionable information. For example, to provide insights into trends, acceleration ratios were calculated for selected variables, which provided an indication of whether the trend was increasing or decreasing over time. Exhibit 12 provides a snapshot of the aggregation workflow in which the individual datasets were combined. Since this is an on-going project, the workflow is designed to be updated as new data tables are released for the existing datasets, and as new datasets (e.g., health outcomes, environmental conditions, demographics) become available. Alteryx has helped create a foundational data warehouse that can be updated and expanded upon to create a sustainable foundation upon which additional community outreach activities can be launched.
Exhibit 12: Join All Workflow
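The text does not give the formula used for the acceleration ratios. Purely as an assumption, one plausible form compares the average of the most recent years with the average of the years just before them, so that a value above 1 suggests an increasing trend and a value below 1 a decreasing one. The Python sketch below uses a hypothetical function name and window size and should not be read as the project's actual calculation.

```python
def acceleration_ratio(annual_counts, window=2):
    """Mean of the last `window` years divided by the mean of the prior `window`.

    ASSUMED definition for illustration only. Returns None when the
    prior-period mean is zero, since the ratio is undefined.
    """
    if len(annual_counts) < 2 * window:
        raise ValueError("need at least 2 * window observations")
    recent = annual_counts[-window:]
    prior = annual_counts[-2 * window:-window]
    prior_mean = sum(prior) / window
    if prior_mean == 0:
        return None
    return (sum(recent) / window) / prior_mean
```

For example, annual counts of 10, 10, 20, 20 would yield a ratio of 2.0 under this definition, while 40, 40, 20, 20 would yield 0.5.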
Describe the benefits you have achieved
The Land Bank of Kansas City has adopted a dual mission: improve access to affordable housing in the urban core; and, help stimulate neighborhoods and the city by returning abandoned property to the economic base. As with many of its peers, the Land Bank operates on limited funds and draws on many partners to achieve its mission. In terms of processes, the Land Bank serves three key functions: property intake, inventory management, and dispositions. When this project was initiated in 2016, the Land Bank was taking in more residential improved properties than it was selling, creating an increasing inventory and placing pressure on operating funds. Over time, the situation has changed with the Land Bank inventory of residential properties that are suitable for rehab being liquidated at a pace relatively on par with new properties. Part of that success might be related to efforts of this project, as well as improvement in market conditions, changes in Land Bank strategies, and other forces. Despite this success, the Land Bank continues to hold a significant number of properties, many of which are vacant residential after demolition of dilapidated buildings. Thus, our project helps the leadership of the Land Bank in inventory management and prioritization of sales candidates that are the most likely to be successful. Finally, the project is exploring the impact of Land Bank sales, both in terms of individual properties and the neighborhoods and/or locations in which they are domiciled.
Exhibit 13: Land Bank Benefits
Working with the Land Bank, project leaders have developed a good understanding of the nature of buyers, and their motivations. In terms of buyers, they can be classified into two key categories: buyers seeking to own a home; and buyers seeking to rehab a property either for rent or sale. For many of the “owner-buyers,” homeownership has been an elusive goal due to a combination of factors including economic limitations and lack of access to capital. The Land Bank sales program allows these buyers to invest “sweat equity” in lieu of hard money, selling at a discount with the promise they will bring the properties up to code within 1-2 years. For these buyers, it is imperative they understand the condition of the property, as well as the condition of the immediate environs in which they will live upon completion. Rehabbers have similar needs, although they may have access to more resources, business acumen, and experience to select properties and locations in which to concentrate their efforts.
The project, through its a2V-Lb website (Exhibit 14) and other efforts still in process, will help support both needs. The website not only provides a clearinghouse of Land Bank properties available for purchase, but also an intuitive interface for users to understand the neighborhoods where they may like to purchase. One of the worries of Land Bank officials is that owner-buyers may not understand the nature of the communities in which they are buying (as opposed to rehabbers, who tend to have more sophisticated search behaviors). This leads to the strong possibility that they will not successfully invest in the house and ultimately not bring it into productive use. Using Alteryx’s Tile tool, we can portray neighborhood conditions such as property violations and 311 calls in percentile format, making it easy for citizens to quickly ascertain whether a property is in an area that fits their risk profiles. All the data behind the website interface are generated and updated through Alteryx.
Exhibit 14: a2V-Lb Website for Buyers
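To make the percentile presentation concrete, the sketch below computes a simple percentile rank for each area in Python. It is a stand-in for the idea rather than a reproduction of the Alteryx Tile tool's behavior, and the function name and area labels are hypothetical.

```python
from bisect import bisect_right

def percentile_ranks(counts_by_area):
    """Percent of areas whose count is at or below each area's count (0-100)."""
    ordered = sorted(counts_by_area.values())
    n = len(ordered)
    return {
        area: round(100 * bisect_right(ordered, value) / n)
        for area, value in counts_by_area.items()
    }

# Hypothetical property-violation counts for four areas.
ranks = percentile_ranks({"Area A": 5, "Area B": 20, "Area C": 10, "Area D": 40})
```

With these sample counts, Area A falls at the 25th percentile and Area D at the 100th, which is the kind of relative ranking a prospective buyer could read at a glance.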
At this stage, we are working with formal and informal neighborhood associations to determine how to support their needs. As an introduction, we added a “Neighborhood Search Function” to our interactive website (Exhibit 15). The Area Statistics provide a relative ranking of the selected neighborhoods compared to the urban core average and to other neighborhoods. We are staging a series of meetings with neighborhoods to determine which types of data are of most interest in terms of Area Statistics, as well as trend analysis related to key indicators such as Crimes Against People and Property, Property Violations, 311 Calls Visible from the Street, etc. Since different neighborhoods have different situations, programs, needs, and preferences, we are exploring how to customize the descriptive data presentation to help them address their priorities. On the analytical front, we are exploring the data to develop a classification system for neighborhoods, as well as a life-cycle tracking system. The results of this analysis will be customized solutions to priorities considering situational variables, as well as suggested interventions that may be appropriate to address a need, preempt further erosion, and/or help stimulate revitalization.
Exhibit 15: a2V-Lb Neighborhood-Area Statistics
Extension of Project Benefits
The initial emphasis of this project focused on the Land Bank of Kansas City MO. While that continues to be the key driver, it has become clear that the data warehouse will support exploration of several urban issues including health and wellness, economic vitality, impact assessment, policy formulation, etc. Some of these extensions have already been achieved, including application of some of the spatial data and other indicators to student-led case studies. An example of this is the Entrepreneurial Urban Development course from Spring 2018, an interdisciplinary course in which the project lead is a co-instructor. Briefly, student teams worked on inner-city neighborhood issues including revitalization and repositioning of the historic 18th & Vine District. Another example is the Chestnut House, a non-profit organization focused on “intentional neighboring” as a way of empowering residents of inner-city neighborhoods to understand their environs as well as how to access city services, including the 311 call system, to effect grass-roots change.
Our experience in a School of Management is that students are clamoring for tools that they can employ in the real world both to solve problems and to give them a leg up compared to other job-seekers. While we teach using other tools (R and RStudio, ArcMap, GeoDa, etc.), we have found Alteryx to be that perfect tool for our students. Its flexibility, functionality, and ease-of-use allow it to be a one-stop-shop data analysis tool. Two groups of students have been exposed to it: graduate Real Estate students and Law students. As we have shown through this project, real estate is becoming an increasingly technical field where financial and geographic data must be successfully integrated. We have positioned our students as being comfortable blending, cleaning, and analyzing data to make informed decisions about investing in the built environment. Perhaps the Law students are most illustrative of Alteryx’s ease-of-use; the students, enrolled in “Law, Technology, and Public Policy”, have a clear interest in technology but generally have no training in quantitative data preparation/analysis. Students employ Alteryx in real-world projects throughout Kansas City, adding a skill set that most law graduates do not possess.
Perhaps the most important aspect of using Alteryx in the classroom is helping our students understand that “more data” is not always a positive, and that raw data does not always support valid and reliable inference. Data cleaning and blending is challenging but very necessary to creating valid inference. Alteryx creates a non-threatening data environment to highlight the shortcomings of any dataset and to develop comprehensive strategies to address them. Because we are not formally trained in Alteryx, we also thoroughly enjoy the learning process that happens elbow-to-elbow in a graduate classroom. Hopefully, the result of this on-going project can be viewed holistically, with learning occurring at every stage.
We have developed a website entitled a2V-Lb for potential buyers, neighborhood leaders, community stakeholders, and others (see: http://aagis.net/v2v/).