Many, if not most, supervised-classification problems involve some degree of class imbalance, where at least one class occurs more frequently than the others. The imbalanced-classification problem illustrates the value of treating data-science problems as empirical (as well as formal) optimization problems, using techniques known as cost-sensitive learning. This post shows you how to do cost-sensitive binary classification.
Most real-world data-science design patterns combine several models to solve a single business problem. This post surveys the most common and effective techniques for combining models. Once you make it through this post (and its predecessors), you'll be ready to take on the design patterns we'll begin learning in 2017.
Cross validation (CV) is a difficult topic. There are many ways to do CV, and articles on the subject can be very technical. This blog post is a gentle introduction to CV. Read it and you'll find it much easier to understand later posts describing data-science design patterns that use CV.