Hybrid Search
In most text search scenarios, it's crucial to ensure that the most relevant results appear in the candidates. Vector and keyword searches each have their strengths in the search domain. Hybrid Search combines the advantages of both techniques while compensating for their respective shortcomings.
In Hybrid Search, vector and keyword indices are pre-established in the database. Upon user query input, the system searches for the most relevant text in documents using both search methods.
"Hybrid Search" doesn't have a definitive definition; this article exemplifies it as a combination of Vector Search and Keyword Search. However, the term can also apply to other combinations of search algorithms. For instance, we could combine knowledge graph technology, used for retrieving entity relationships, with Vector Search.
Different search systems each excel at uncovering various subtle connections within texts (paragraphs, sentences, words), including precise relationships, semantic relationships, thematic relationships, structural relationships, entity relationships, temporal relationships, and event relationships. It's safe to say that no single search mode is suitable for all scenarios. Hybrid Search, by integrating multiple search systems, achieves a complementarity among various search technologies.
Vector Search
Definition: Vector Search involves generating query embeddings and then searching for text chunks that most closely match these embeddings in terms of vector representation.
TopK: This setting is used to filter text chunks that have the highest similarity to the user's query. The system also dynamically adjusts the number of chunks based on the context window size of the selected model. The default value for this setting is 3.
Score Threshold: This setting is used to establish a similarity threshold for the selection of text chunks. It means that only text chunks exceeding the set score are recalled. By default, this setting is turned off, meaning that the system does not filter the similarity values of the recalled text chunks. When activated, the default value is set to 0.5.
Rerank Model: After configuring the Rerank model's API key on the "Model Provider" page, you can enable the "Rerank Model" in the search settings. The system then performs a semantic re-ranking of the document results that have been recalled after semantic search, optimizing the order of these results. Once the Rerank model is set up, the TopK and Score threshold settings are only effective in the Rerank step.
Full-Text Search
Definition: Full-Text Search involves indexing all the words in a document, enabling users to query any term and retrieve text chunks that contain these terms.
TopK: This setting is utilized to select text chunks that most closely match the user's query in terms of similarity. The system also dynamically adjusts the number of chunks based on the context window size of the chosen model. The default value for TopK is set at 3.
Rerank Model: After configuring the API key for the Rerank model on the "Model Provider" page, you can activate the "Rerank Model" in the search settings. The system will then perform a semantic re-ranking of the document results retrieved through full-text search, optimizing the order of these results. Once the Rerank model is configured, the TopK and any Score threshold settings will only be effective during the Rerank step.
Hybrid Search
Hybrid Search operates by concurrently executing Full-Text Search and Vector Search. It then applies a re-ranking step to choose the best results that match the user's query from both types of search results. To effectively use this feature, it is necessary to configure the Rerank Model API.
TopK: This setting is used for filtering text chunks that have the highest similarity to the user's query. The system will dynamically adjust the number of chunks based on the context window size of the model in use. The default value for TopK is set at 3.
Rerank Model: After configuring the Rerank model's API key on the "Model Supplier" page, you can enable the "Rerank Model" in the search settings. The system will perform a semantic re-ranking of the document results retrieved through hybrid search, thereby optimizing the order of these results. Once the Rerank model is set up, the TopK and any Score threshold settings are only applicable during the Rerank step.
Setting the Search Mode When Creating a Knowledge
To set the search mode when creating a knowledge base, navigate to the "Knowledge -> Create Knowledge" page. There, you can configure different search modes in the retrieval settings section.
Modifying the Search Mode in Prompt Engineering
You can modify the search mode during application creation by navigating to the "Prompt Engineering -> Context -> Select Knowledge -> Settings" page. This allows for adjustments to different search modes within the prompt arrangement phase.
最后更新于