SusanCS
Alteryx Alumni (Retired)


 

 

What’s the probability of finding a koala in your neighborhood? 

 

Unfortunately, it’s zero in my neighborhood in Oregon. But for the lucky folks in the areas of Australia where these adorable critters reside, there’s a real chance that a koala could be nearby at any given moment. And, thanks to an innovative mapping project, residents of New South Wales in Australia can find out their location’s “koala likelihood” -- vital information for helping these cute animals.

 

I learned about this project after listening to this week’s episode of the Alter Everything podcast, which features the work of One Tree Planted and other groups who are using data for environmental restoration and wildlife protection. I had to check out the koala mapping projects that were mentioned -- cuteness plus data, yes, please! -- and found the Koala Likelihood Map especially interesting. 

 

The Koala Likelihood Map provides information on koala habitat in order to preserve the environments where these unique animals live. These efforts are even more important after the devastating wildfires this year and last year in Australia. 

 

I wanted to see how the mapping team developed their koala likelihood model and to find out what we could learn for other kinds of modeling from their creative thinking. As it turns out, maximum entropy modeling -- in addition to having a cool name -- is relevant to this kind of project, and has lots of uses in other areas, too.

 

 

Koalas Here, Koalas (Not) There, but More Koalas Everywhere

The Koala Likelihood Map is a project of the New South Wales Department of Planning, Industry and Environment. New South Wales (NSW) is Australia’s most populous state, home to the most humans and to about 30,000 to 40,000 wild koalas. The map identifies areas where koalas might be present, which quickly shows where humans need to protect habitat to preserve these unusual animals.

 

 

[Image: A look at “koala likelihood” near Sydney.]

 

 

By cleverly combining data sources, the Koala Likelihood Map displays a percent likelihood that a particular square on the map grid will contain koalas, even including a confidence interval for that prediction. The data include a model for koala-appropriate habitat locations; data on koala-preferred tree species, native plant life, and bodies of water; maps of Areas of Regional Koala Significance, where koala populations and threats exist; and a map of all koala sightings recorded through NSW BioNet, which tracks wildlife sightings submitted by professional and citizen scientists.

 

 

[Image: Some of the koala sightings in 2019.]

 

 

The researchers validated their approach through an independent survey of koalas that supported their model and map. The koala map’s availability has important effects on strategies for koalas’ preservation, keeping these critters’ cuteness around and protecting biodiversity.

 

 

Koalas and … Entropy?

When I dug into a paper distributed by the mapping team as part of their initial 2014 mapping effort, I found the appendix on their modeling choices especially interesting. The map designers decided not to use a popular prediction method for species prevalence, called maximum entropy modeling, because it would likely predict koalas’ presence in areas where there had in fact been little data gathering. Instead, they used a simpler method that reflected koalas’ known presence in an area relative to the numbers of other mammals, as counted in wildlife surveys.
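To make that "koalas relative to other mammals" idea concrete, here is a minimal sketch in Python. It is not the mapping team's actual method: the function name, the made-up grid cell counts, and the choice of a Wilson score interval for the confidence range are all my own assumptions for illustration.

```python
import math

def koala_share(koala_records, mammal_records, z=1.96):
    """Return (share, lower, upper) for one grid cell.

    share is koala records as a fraction of all mammal records.
    The Wilson score interval is an ASSUMED choice for illustration,
    not the method documented by the Koala Likelihood Map team.
    """
    if mammal_records == 0:
        return None  # no survey effort in this cell, so no estimate
    n = mammal_records
    p = koala_records / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return p, center - half, center + half

# Hypothetical counts: (koala records, all mammal records) per grid cell
cells = {"cell_A": (12, 40), "cell_B": (1, 3), "cell_C": (0, 0)}

for name, (koalas, mammals) in cells.items():
    result = koala_share(koalas, mammals)
    if result is None:
        print(f"{name}: no survey data")
    else:
        share, lower, upper = result
        print(f"{name}: {share:.0%} (range {lower:.0%} to {upper:.0%})")
```

In this sketch, a cell with only a few mammal records gets a much wider range than a well-surveyed cell, and a cell with no records gets no estimate at all, rather than a prediction borrowed from elsewhere.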

 

Yet maximum entropy, or “maxent,” modeling is still a useful tool not only in biology, but also in computer vision and natural language processing (NLP) tasks, like sentiment analysis, spam detection and translation. The NLTK package in Python for NLP has a module for maxent modeling. These models classify data by finding the probability distribution over labels that has the maximum entropy while still agreeing with what the training data show.

 

 

[Image: Koala experiencing the absence of entropy]

 

 

Entropy can be thought of as “disorder” or variation. Decision trees, for example, are built by minimizing the entropy in their nodes as they classify data, so each resulting group is as orderly and consistent as possible. There is another stage in decision tree usage where entropy could also apply. When we consider predictions generated by a decision tree, one label might tend to dominate, or we might see that the chance of classification into any one label was pretty even across all our labels. In the former situation, we could say that entropy was low in those data and in that system of labels we applied; one bucket tidily caught most of our data. In the latter, entropy was high, with data getting distributed all over into different bins.
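The two situations described above can be put into numbers with Shannon entropy. This is a small sketch using made-up predicted labels (the animal names and counts are hypothetical), just to show that a dominated set of predictions scores lower than an evenly spread one.

```python
import math
from collections import Counter

def shannon_entropy(labels):
    """Return the Shannon entropy, in bits, of a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())

# Hypothetical predictions where one label dominates (low entropy)
dominated = ["koala"] * 9 + ["wombat"]

# Hypothetical predictions spread evenly over four labels (high entropy)
even = ["koala", "wombat", "possum", "glider"] * 3

low = shannon_entropy(dominated)
high = shannon_entropy(even)
print(f"dominated: {low:.3f} bits, even: {high:.3f} bits")
# prints "dominated: 0.469 bits, even: 2.000 bits"
```

Four equally likely labels give exactly 2 bits, the maximum possible for four bins; the lopsided case lands well below that.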

 

What if you designed a model based on that state of higher entropy? Maxent modeling chooses the option that demonstrates the most entropy and that is consistent with “what we already know” about the data, i.e., its existing distribution. The model selects the most “uniform” distribution among the features you want to analyze -- like the “different bins” scenario above. The algorithm uses iterative optimization to figure out this maximum-entropy solution, and it can take a long time to train a model with this method. (Here’s a more complete explanation of how maxent models work.) These models, despite what superficially seems like an embrace of chaos, often perform well. And although they might not have been ideal for the koala map, maxent models still have lots of utility in other areas. 
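A classic toy version of this idea uses a six-sided die. The sketch below is my own illustration, not NLTK's implementation or anything from the koala project: suppose the only thing "we already know" is the die's average roll, and we want the most uniform set of face probabilities consistent with that average. The function names and the bisection search are assumptions made for the example; a real maxent classifier handles many features at once with more elaborate iterative optimization.

```python
import math

def maxent_distribution(values, target_mean, tol=1e-10):
    """Maximum-entropy distribution over `values` with a fixed mean.

    The solution has the form p_i proportional to exp(lam * value_i).
    We search for lam by bisection, since the mean rises as lam rises.
    """
    def probs(lam):
        weights = [math.exp(lam * v) for v in values]
        z = sum(weights)
        return [w / z for w in weights]

    def mean(lam):
        return sum(p * v for p, v in zip(probs(lam), values))

    lo, hi = -50.0, 50.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean(mid) < target_mean:
            lo = mid
        else:
            hi = mid
    return probs((lo + hi) / 2)

def entropy(p):
    """Shannon entropy in nats."""
    return -sum(q * math.log(q) for q in p if q > 0)

faces = [1, 2, 3, 4, 5, 6]
uniform = maxent_distribution(faces, 3.5)  # what we know matches a fair die
skewed = maxent_distribution(faces, 4.5)   # what we know says rolls run high

print([round(p, 3) for p in uniform])
print([round(p, 3) for p in skewed])
```

When the known average is 3.5, the answer is the plain uniform distribution. When the known average is 4.5, the probabilities tilt toward the higher faces, but only as much as needed to match that average; the model assumes nothing beyond it.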

 

 


 

 

Fortunately, as the Alter Everything episode shows, there are creative folks like the Koala Likelihood Map creators and many others working to help koalas and other wildlife with all kinds of sophisticated data strategies. For example, drone imagery and convolutional neural networks might play a part. We can also try to predict where a koala might cross the road -- but we’re probably never going to know exactly why. (Sorry not sorry.) 

 

Check out the podcast episode for more on Australian critters, conservation and data analytics.

 

 

Susan Currie Sivek
Senior Data Science Journalist

Susan Currie Sivek, Ph.D., is the data science journalist for the Alteryx Community. She explores data science concepts with a global audience through blog posts and the Data Science Mixer podcast. Her background in academia and social science informs her approach to investigating data and communicating complex ideas — with a dash of creativity from her training in journalism. Susan also loves getting outdoors with her dog and relaxing with some good science fiction. Twitter: @susansivek