When putting a topic model into production, you typically encounter two hurdles. First, topics change continually, and document tags become stale as soon as they are created. Second, while unsupervised topic models do a good job of clustering topics, creating robust, human-interpretable labels is challenging. Framing topic modeling as a search problem helps overcome these challenges and makes it easier to use supervised or unsupervised topic models in real-time applications.
Python packages and extensions to commonly used databases now support dense vectors, which makes this approach viable (and relatively easy to implement) in most production environments.
In this talk, attendees will learn how to repurpose embeddings generated by a topic model as dense vectors that can be used in search to find current documents similar to the topic.
Key concepts
- Embeddings - quick introduction (or refresher, for those familiar)
- Convert sentence embeddings into document embeddings and ultimately topic embeddings.
- Generate topic embeddings (using LLMs, supervised models or unsupervised models)
- Store topic embeddings in popular databases to enable search
- Retrieve current documents related to a topic using search queries and cosine similarity, without tagging documents
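The concepts above can be sketched end to end in a few lines of plain Python. This is a minimal illustration, not the pipeline presented in the talk: it assumes mean pooling as the way to turn sentence embeddings into document embeddings and document embeddings into a topic embedding, and it uses made-up 3-dimensional vectors in place of real model output. The function names (`mean_pool`, `cosine_similarity`, `search`) and the document ids are hypothetical, and an in-memory dictionary stands in for a database with dense-vector support.

```python
import math


def mean_pool(vectors):
    """Average equal-length vectors component-wise (assumed pooling strategy)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def search(topic_embedding, document_embeddings, top_k=2):
    """Rank documents by similarity to the topic; no document tags involved."""
    scored = [
        (doc_id, cosine_similarity(topic_embedding, embedding))
        for doc_id, embedding in document_embeddings.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy sentence embeddings for documents already known to belong to a topic.
# Real values would come from an embedding model; these are made up.
seed_sentences = {
    "seed_1": [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]],
    "seed_2": [[0.9, 0.1, 0.0]],
}

# Sentence embeddings -> document embeddings -> one topic embedding.
seed_documents = [mean_pool(sentences) for sentences in seed_sentences.values()]
topic_embedding = mean_pool(seed_documents)

# Current documents, embedded the same way and never tagged.
current_documents = {
    "new_x": mean_pool([[0.7, 0.3, 0.0]]),
    "new_y": mean_pool([[0.0, 0.1, 1.0]]),
    "new_z": mean_pool([[0.5, 0.5, 0.2]]),
}

results = search(topic_embedding, current_documents, top_k=2)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

Because the topic is itself just a vector, a new or user-defined topic only requires a new query vector; the stored documents do not need to be relabeled.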
Benefits of this approach
- Simplify and improve performance for storing and retrieving documents related to a topic
- Capture fast-moving, evolving conversations
- Allow user-generated topics (personalization)
- Anticipate topics that don’t yet exist
- Eliminate the need to maintain document "tags"
Audience
- Intermediate Python developers interested in NLP and search technologies
- Practitioners building search, recommendation, or semantic discovery systems
- Anyone curious about combining classic topic modeling with modern vector search architectures
Workshop Outcomes
- Run a complete topic-vector search pipeline
- Gain a reusable mental model for combining:
  - Topic modeling
  - Embeddings
  - Vector search
No prior experience with vector search is required. Familiarity with Python and basic NLP concepts is helpful.