We’ve introduced the Elasticsearch Relevance Engine (ESRE), new capabilities for creating highly relevant AI search applications. ESRE builds on Elastic’s leadership in search and over two years of machine learning research and development. The Elasticsearch Relevance Engine combines the best of AI with Elastic’s text search. ESRE gives developers a full suite of sophisticated retrieval algorithms and the ability to integrate with large language models (LLMs). Even better, it’s accessible via a simple, unified API that Elastic’s community already trusts, so developers around the world can start using it immediately to elevate search relevance.
The Elasticsearch Relevance Engine’s configurable capabilities can be used to help improve relevance by:
- Applying advanced relevance ranking features, including BM25F, a critical component of hybrid search
- Creating, storing, and searching dense embeddings using Elastic’s vector database
- Processing text using a wide range of natural language processing (NLP) tasks and models
- Letting developers manage and use their own transformer models in Elastic for business-specific context
- Integrating with third-party transformer models such as OpenAI’s GPT-3 and GPT-4 via API to retrieve intuitive summarization of content based on the customer’s data stores consolidated within Elasticsearch deployments
- Enabling ML-powered search without training or maintaining a model using Elastic’s out-of-the-box Learned Sparse Encoder model to deliver highly relevant, semantic search across a variety of domains
- Easily combining sparse and dense retrieval using Reciprocal Rank Fusion (RRF), a hybrid ranking method that gives developers control to optimize their AI search engine to their unique mix of natural language and keyword query types
- Integrating with third-party tooling such as LangChain to help build sophisticated data pipelines and generative AI applications
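To make the RRF bullet above concrete, here is a minimal sketch of Reciprocal Rank Fusion in plain Python. It is not Elastic’s implementation: the function name, the document IDs, and the two input rankings are hypothetical, and the constant k = 60 is simply a commonly used default. Each document receives 1 / (k + rank) from every result list it appears in, and the contributions are summed.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into one ranking.

    Each document scores the sum of 1 / (k + rank) over the lists
    that contain it, where rank starts at 1.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical result lists from two retrievers over the same corpus.
keyword_hits = ["doc_a", "doc_b", "doc_c"]  # e.g. a BM25 keyword query
vector_hits = ["doc_c", "doc_a", "doc_d"]   # e.g. a dense vector query

fused = reciprocal_rank_fusion([keyword_hits, vector_hits])
print(fused)
```

Because RRF uses only rank positions, the keyword and vector scores never need to be normalized against each other, which is a large part of why it is convenient for hybrid search.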
The evolution of search has been driven by a constant need to improve relevance and the ways in which we interact with search applications. Highly relevant search results can lead to increased user engagement on search apps with significant downstream impacts on both revenue and productivity. In the new world of LLMs and generative AI, search can go even further — understanding user intent to provide a level of specificity in responses that’s never been seen before.
Notably, every search advancement delivers better relevance while addressing new challenges posed by emerging technologies and changing user behaviors. Whether expanding on keyword search to offer semantic search or enabling new search modalities for video and images, new technology requires unique tools to deliver better experiences for search users. By the same token, today’s world of artificial intelligence calls for a new, highly scalable developer toolkit that’s been built on a tech stack with proven, customer-tested capabilities.
With generative AI’s momentum and increased adoption of technologies like ChatGPT, as well as growing awareness of large language model capabilities, developers are hungry to experiment with technology to improve their applications. The Elasticsearch Relevance Engine ushers in a new age of capabilities in the world of generative AI and meets the moment with powerful tools that any developer team can use right away.
The Elasticsearch Relevance Engine is available on Elastic Cloud — the only hosted Elasticsearch offering to include all of the new features in this latest release. You can also download the Elastic Stack and our cloud orchestration products, Elastic Cloud Enterprise and Elastic Cloud for Kubernetes, for a self-managed experience.
Overcoming the limitations of generative AI models
The Elasticsearch Relevance Engine is well positioned to help developers evolve quickly and address the following challenges of natural language search and generative AI.
- Enterprise data/context aware: The model might not have sufficient internal knowledge relevant to a particular domain. This stems from the data set that the model is trained on. In order to tailor the data and content that LLMs generate, enterprises need a way to feed models proprietary data so they can learn to furnish more relevant, business-specific information.
- Superior relevance: The Elasticsearch Relevance Engine makes integrating data from private sources as simple as generating and storing vector embeddings to retrieve context using semantic search. Vector embeddings are numerical representations of words, phrases, or documents that help LLMs understand the meanings of words and their relationships. These embeddings enhance transformer model output at speed and scale. ESRE also lets developers bring their own transformer models into Elastic or integrate with third-party models.
We also realized that the emergence of late interaction models allows us to provide this out of the box, without the need for extensive training or fine-tuning on third-party data sets. Since not every development team has the resources or expertise to train and maintain machine learning models, or to weigh the trade-offs among scale, performance, and speed, the Elasticsearch Relevance Engine also includes Elastic Learned Sparse Encoder, a retrieval model built for semantic search across diverse domains. ESRE pairs the model’s sparse embeddings with traditional, keyword-based BM25 search through an easy-to-use Reciprocal Rank Fusion (RRF) scorer for hybrid search. ESRE gives developers machine learning-powered relevance and hybrid search techniques on day one.
- Privacy and security: Data privacy is central to how enterprises use and securely pass proprietary data over a network and between components, even when building innovative search experiences.
Elastic includes native support for role-based and attribute-based access control to ensure that only those roles with access to data can see it, even for chat and question answering applications. Elasticsearch can support your organization’s need to keep certain documents accessible to privileged individuals, helping your organization to maintain universal privacy and access controls across all of your search applications.
When privacy is of the utmost concern, keeping all data within your organization’s network can be not only paramount, but obligatory. From allowing your organization to implement deployments that are in an air-gapped environment to supporting access to secure networks, ESRE provides the tools you need to help your organization keep your data safe.
- Size and cost: Using large language models can be prohibitive for many enterprises due to data volumes and required computing power and memory. Yet businesses that want to build their own generative AI apps like chatbots need to marry LLMs with their private data.
The Elasticsearch Relevance Engine gives enterprises the engine to deliver relevance efficiently with precision context windows that help reduce the data footprint without hassle and expense.
- Out of date: The model is frozen in time at the point when training data is collected. So the content and data that generative AI models create is only as fresh as the data they’re trained on. Integrating corporate data is an inherent need to power timely results from LLMs.
- Hallucinations: When answering questions or holding a conversation, the model may invent facts that sound trustworthy and convincing but are in fact fabricated. This is another reason that grounding LLMs with contextual, customized knowledge is so critical to making models useful in a business context.
The Elasticsearch Relevance Engine lets developers link to their own data stores via a context window in generative AI models. The search results added can provide up-to-date information that’s from a private source or specialized domain, and therefore can return more factual information when prompted instead of relying solely on a model’s so-called "parametric" knowledge.
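As a rough illustration of supplying search results through a context window, the sketch below packs top-ranked passages into a prompt under a size budget. Everything here is hypothetical: the function name, the hit structure, the prompt wording, and the use of a character count as a crude stand-in for a token limit. Sending the prompt to a model is left out, since that depends on the LLM you choose.

```python
def build_grounded_prompt(question, hits, max_context_chars=500):
    """Pack relevance-ordered passages into a prompt without exceeding a budget.

    `hits` is assumed to be sorted best-first, for example by a search engine.
    Characters are used as a simple proxy for tokens in this sketch.
    """
    context_parts = []
    used = 0
    for hit in hits:
        passage = f"[{hit['id']}] {hit['text']}"
        if used + len(passage) > max_context_chars:
            break  # stop once the next passage would exceed the budget
        context_parts.append(passage)
        used += len(passage)
    context = "\n".join(context_parts)
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )


# Hypothetical search hits from a private knowledge base.
hits = [
    {"id": "kb-1", "text": "Refunds are processed within 5 business days."},
    {"id": "kb-2", "text": "Refund requests require an order number."},
    {"id": "kb-3", "text": "x" * 1000},  # too large for the remaining budget
]
prompt = build_grounded_prompt("How long do refunds take?", hits)
print(prompt)
```

Keeping the passage IDs in the prompt makes it easier to ask the model to cite its sources, which helps when checking an answer against the retrieved content.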
Supercharged by a vector database
The Elasticsearch Relevance Engine includes a resilient, production-grade vector database by design. It gives developers a foundation on which to build rich, semantic search applications. Using Elastic’s platform, development teams can use dense vector retrieval to create more intuitive question answering that’s not constrained to keywords or synonyms. They can build multimodal search using unstructured data like images, and even model user profiles and create matches to personalize search results in product and discovery, job search, or matchmaking applications. These NLP transformer models also enable machine learning tasks like sentiment analysis, named entity recognition, and text classification. Elastic’s vector database lets developers create, store, and query embeddings that are highly scalable and performant for real production applications.
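The create, store, and query cycle for embeddings can be sketched with a toy in-memory store. This is not Elastic’s vector database: the class name, document IDs, and three-dimensional vectors are invented for illustration, and real embeddings come from a model and have hundreds of dimensions. The sketch scans every vector exhaustively, whereas production vector databases typically rely on approximate nearest-neighbor indexes to stay fast at scale.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


class TinyVectorStore:
    """Toy store: keeps embeddings in a dict and scans all of them on search."""

    def __init__(self):
        self._vectors = {}

    def add(self, doc_id, embedding):
        self._vectors[doc_id] = embedding

    def search(self, query_embedding, k=2):
        scored = [
            (cosine_similarity(query_embedding, vector), doc_id)
            for doc_id, vector in self._vectors.items()
        ]
        scored.sort(reverse=True)  # most similar first
        return [doc_id for _, doc_id in scored[:k]]


store = TinyVectorStore()
store.add("returns-policy", [0.9, 0.1, 0.0])
store.add("shipping-times", [0.1, 0.9, 0.1])
store.add("careers-page", [0.0, 0.1, 0.9])

# A query embedding that points in roughly the same direction as "returns-policy".
top = store.search([0.8, 0.2, 0.0], k=2)
print(top)
```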
Elasticsearch excels at high-relevance search retrieval. With ESRE, Elasticsearch provides context windows for generative AI linked to an enterprise’s proprietary data, allowing developers to build engaging, more accurate search experiences. Search results are returned according to a user’s original query, and developers can pass that data on to the language model of their choice to provide an answer with added context. Elastic supercharges question-answer and personalization capabilities with relevant contextual data from your enterprise content store that’s private and tailored to your business.
Delivering superior relevance out-of-the-box for all developers
With the release of the Elasticsearch Relevance Engine, we’re making Elastic’s proprietary retrieval model readily available. The model is easy to download and works with our entire catalog of ingestion mechanisms like the Elastic web crawler, connectors or API. Developers can use it out of the box with their searchable corpus, and it’s small enough to fit within a laptop’s memory. Elastic’s Learned Sparse Encoder provides semantic search across domains for search use cases such as knowledge bases, academic journals, legal discovery, and patent databases to deliver highly relevant search results without the need to adapt or train it.
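To give a feel for how sparse retrieval can match on meaning, the sketch below scores documents with a dot product over weighted tokens. It does not reproduce Elastic Learned Sparse Encoder: the tokens, weights, and document IDs are made up, and a real learned sparse model produces the weights itself. The idea shown is only that a document can carry weights for related terms it never literally contains, so a query for "notebook computer" can still match a page about laptops.

```python
def sparse_dot(query_weights, doc_weights):
    """Dot product over the tokens that both sparse vectors share."""
    return sum(
        weight * doc_weights[token]
        for token, weight in query_weights.items()
        if token in doc_weights
    )


# Hypothetical token-to-weight expansions; a learned model would produce these.
docs = {
    "doc_laptop": {"laptop": 1.4, "notebook": 0.9, "computer": 0.7},
    "doc_recipe": {"recipe": 1.5, "cooking": 1.0, "kitchen": 0.6},
}
query = {"notebook": 1.2, "computer": 0.5}

ranked = sorted(docs, key=lambda doc_id: sparse_dot(query, docs[doc_id]), reverse=True)
print(ranked)
```

Because each vector holds only a handful of non-zero token weights, this kind of representation fits naturally into an inverted index, the same structure keyword search already uses.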
Most real-world testing shows hybrid ranking techniques are producing the most relevant search result sets. Until now, we've been missing a key component — RRF. We're now including RRF for your application searching needs so you can pair vector and textual search capabilities.
Machine learning is on the leading edge of enhancing search result relevance with semantic context, but too often its cost, complexity, and resource demands make it impractical for developers to implement effectively. Developers commonly need the support of specialized machine learning or data science teams to build highly relevant AI-powered search. These teams spend considerable time selecting the right models, training them on domain-specific data sets, and maintaining models as they evolve due to changes in data and its relationships.
Developers who don’t have the support of specialized teams can implement semantic search and benefit from AI-powered search relevance from the start without the effort and expertise required for alternatives. Starting today, all customers have the building blocks to help achieve better relevance and modern, smarter search.
Try it out
Read about these capabilities and more.
Existing Elastic Cloud customers can access many of these features directly from the Elastic Cloud console. Not taking advantage of Elastic on cloud? See how to use Elasticsearch with LLMs and generative AI.