PyCon 2016 in Portland, OR

High Quality High Performance Clustering

Leland McInnes, John Healy

Audience level: Novice
Category: Python Libraries

Description

Clustering data is a common problem in data science. In the absence of labelled data, having confidence in the results of clustering can be challenging. We present a high-performance library for scalable, high-quality clustering.

Abstract

Clustering data is a common problem in data science. In the absence of labelled data, having confidence in the results of clustering can be challenging. We present [hdbscan](https://github.com/lmcinnes/hdbscan), a high-performance library for scalable, high-quality clustering. We will discuss the requirements for high-quality clustering and compare our implementation with other clustering algorithms available in Python, in terms of both clustering quality and performance.