Change the future

Spatial Clustering in Python

Shane Grigsby

Audience level:


Density-based clustering allows the identification of objects from unstructured data. The DBSCAN and OPTICS algorithms allow clustering and classification of remotely-sensed points into objects; however, current implementations have been unable to handle the data volume produced by LiDAR (Light Detection And Ranging). Using modified kd-trees as a spatial index allows for increased scalability.


This project presents an implementation of the OPTICS and DBSCAN density-based clustering algorithms programmed in python. The goal is to enable efficient processing and segmentation of unstructured LiDAR points, specifically for point clouds of over 10 million points. Spatially-indexed tree structures from the SciPy library allow the development of spatially aware processing queues, which subsequently allow the application of the above algorithms to arbitrarily large point clouds.

The clustering of points into objects allows for interesting practical applications. In this case, density-based segmentation is applied to laser scans of living Ficus and Eucalyptus trees, identifying individual leaves and allowing additional modeling such as estimating the number of leaves per tree or leaf size distribution.