Deep Chat JS, semantic search, and OpenAI integration

This is part 2 of the article series "From zero to chatbot: Building a website answer engine".

Building on the foundation laid in the previous article, this post dives into integrating the Deep Chat JS component for real-time, context-aware conversations. By running a semantic search against your vector database, you can retrieve the information most relevant to each question.

Building a robust chat component from the ground up can be both time-consuming and technically challenging, especially if you want to handle real-time interactions, dynamic content, and a smooth user experience. Rather than reinventing the wheel, I decided to explore existing solutions to save development time and reduce complexity. That's when I discovered Deep Chat JS, a feature-rich component that provides the functionality and flexibility needed to create a seamless chat interface without starting from scratch.

If we revisit the image from the first article in the series, where we discussed data vectorization using the OpenAI Embedding API, we can see that this article corresponds to the part of the diagram labeled with the number 2.

[Image: architecture diagram from part 1, with step 2 highlighted]

Given that our data has been converted into vectors (in my case, the pages of this website were vectorized and stored in a Qdrant vector database), we can now perform semantic searches. As shown in the diagram above, the user input—typically a question in a chatbot scenario—also needs to be converted into vectors. In my case, I accomplished this using the OpenAI Embeddings API and a model called text-embedding-3-large, but technically, this could be done with any other closed-source or open-source model.
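To make this concrete, here is a minimal sketch of that step using the official openai Node SDK. The model name matches the one mentioned above; the function and variable names are just illustrative.

```ts
import OpenAI from "openai";

const openai = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Convert the user's question into the same vector space
// that the website pages were embedded into.
async function embedQuestion(question: string): Promise<number[]> {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-large",
    input: question,
  });
  return response.data[0].embedding;
}
```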

Once we convert the user input into vectors, we then need to query the vector database. Qdrant provides an API where we pass in the vectorized query and specify how many similar results we want to retrieve. In the world of RAG (Retrieval-Augmented Generation) applications, this parameter is commonly referred to as the Top K parameter.
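A sketch of that query with the official @qdrant/js-client-rest client could look like this; the collection name and the default Top K of 5 are assumptions for illustration.

```ts
import { QdrantClient } from "@qdrant/js-client-rest";

const qdrant = new QdrantClient({ url: "http://localhost:6333" });

// Retrieve the Top K points closest to the query vector.
async function searchPages(queryVector: number[], topK = 5) {
  return qdrant.search("website_pages", { // hypothetical collection name
    vector: queryVector,
    limit: topK,        // the "Top K" parameter
    with_payload: true, // include the stored page text and metadata
  });
}
```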

The results Qdrant returns are the documents closest to the user's query. Here, the search is not based on keywords but on proximity within the vector space: the closer two vectors are, the more similar their meanings. Now that we have these results, we bundle them together with the user's query and any system instructions, then send them to the OpenAI Chat API. This way, the AI can formulate its answer using the specific context we've provided, rather than relying solely on its internal knowledge. Essentially, we supply the context from which we expect the LLM to extract and present a well-structured answer.
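Reusing the embedQuestion and searchPages helpers sketched above, the whole retrieve-then-generate step might look as follows. The system prompt, the gpt-4o model choice, and the text payload field are assumptions for illustration, not necessarily the exact setup used here.

```ts
async function answer(question: string): Promise<string> {
  const vector = await embedQuestion(question);
  const hits = await searchPages(vector);

  // Bundle the retrieved documents into a single context block.
  const context = hits
    .map((hit) => (hit.payload as { text?: string } | null)?.text ?? "")
    .join("\n---\n");

  const completion = await openai.chat.completions.create({
    model: "gpt-4o", // illustrative model choice
    messages: [
      {
        role: "system",
        content: `Answer using only the context below.\n\nContext:\n${context}`,
      },
      { role: "user", content: question },
    ],
  });

  return completion.choices[0].message.content ?? "";
}
```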

The OpenAI Chat API will return a response to the user's question, which we then pass to the Deep Chat JS component. It looks like this:

[Image: the finished chat interface rendered by the Deep Chat JS component]
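On the front end, wiring the component to such a backend can be as simple as pointing it at an endpoint. Below is a minimal sketch using the deep-chat-react wrapper and Deep Chat's connect property; the /api/chat endpoint and the {text: string} response shape are assumptions about the backend, and the intro message is just an example.

```tsx
import { DeepChat } from "deep-chat-react";

export function ChatWidget() {
  return (
    <DeepChat
      // Deep Chat POSTs {messages: [{role, text}, ...]} to this URL
      // and renders the {text: string} it gets back as the answer.
      connect={{ url: "/api/chat", method: "POST" }}
      introMessage={{ text: "Ask me anything about this website." }}
    />
  );
}
```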

The chat has a built-in rate limiter to prevent bots from exploiting it. It uses a rolling-window limit of n requests per m seconds per user IP address.
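For illustration, a minimal in-memory version of such a rolling-window limiter might look like this, assuming an Express backend; the limits of 10 requests per 60 seconds are placeholder values, not the ones actually used here.

```ts
import type { Request, Response, NextFunction } from "express";

const WINDOW_MS = 60_000; // rolling window length (placeholder)
const MAX_REQUESTS = 10;  // allowed requests per window per IP (placeholder)

const hits = new Map<string, number[]>(); // IP -> recent request timestamps

export function rateLimit(req: Request, res: Response, next: NextFunction) {
  const ip = req.ip ?? "unknown";
  const now = Date.now();

  // Drop timestamps that have fallen out of the rolling window.
  const recent = (hits.get(ip) ?? []).filter((t) => now - t < WINDOW_MS);

  if (recent.length >= MAX_REQUESTS) {
    res.status(429).json({ error: "Too many requests, please slow down." });
    return;
  }

  recent.push(now);
  hits.set(ip, recent);
  next();
}
```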

About the Author

Goran Nikolovski is a web and AI developer with over 10 years of expertise in PHP, Drupal, Python, JavaScript, React, and React Native. He founded this website and enjoys sharing his knowledge.
