Mistral AI embedding models now available via Elasticsearch Open Inference API

Following our close collaboration with the Mistral AI team, today we’re excited to announce that the Elasticsearch vector database now stores and automatically chunks embeddings from the mistral-embed model, with native integration into the open inference API and the semantic_text field. For developers building RAG applications, this integration eliminates the need to architect bespoke chunking strategies and makes chunking with vector storage as simple as adding an API key.

Mistral AI provides popular open-source and optimized LLMs, ready for enterprise use cases. The Elasticsearch open inference API enables developers to create inference endpoints and use machine learning models of leading LLM providers. As two companies rooted in openness and community, it was only natural for us to collaborate!

In this blog, we will use Mistral AI’s mistral-embed model in a retrieval augmented generation (RAG) setup.

Get Started

To get started, you’ll need a Mistral account on La Plateforme and an API key generated for your account.


Generate a usage key on Mistral AI’s La Plateforme.

Now, open your Elasticsearch Kibana UI and expand the dev console for the next step.


Replace the highlighted API key with the key generated in the Mistral Platform.

You’ll create an inference endpoint by using the create inference API, providing your API key and the name of the Mistral embedding model. In the configuration below, we specify the mistral-embed model to create a “text_embedding” endpoint named “mistral_embeddings”:

PUT _inference/text_embedding/mistral_embeddings 
{
    "service": "mistral",
    "service_settings": {
        "api_key": "<api_key>", 
        "model": "mistral-embed" 
    }
}

Elasticsearch will return a response confirming that the endpoint was created successfully:

{
    "model_id": "mistral_embeddings",
    "task_type": "text_embedding",
    "service": "mistral",
    "service_settings": {
        "model": "mistral-embed",
        "dimensions": 1024,
        "similarity": "dot_product",
        "rate_limit": {
            "requests_per_minute": 240
        }
    },
    "task_settings": {}
}

Note that no additional settings are needed here. Elasticsearch will automatically connect to the Mistral platform to validate your credentials and the model, and will fill in the number of dimensions and the similarity measure for you.
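
If you want to double-check what was stored, you can retrieve the endpoint configuration at any time with the get inference API:

GET _inference/text_embedding/mistral_embeddings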

Next, let’s test our endpoint to ensure everything is set up correctly. To do this, we’ll call the perform inference API:

POST _inference/text_embedding/mistral_embeddings
{
  "input": "Show me some text embedding vectors"
}

The API call will return the generated embeddings for the provided input, which will look something like this:

{
    "text_embedding": [
        {
            "embedding": [
                0.016098022,
                0.047546387,
                … (additional values) …
                0.030654907,
                -0.044067383
            ]
        }
    ]
}
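
The input field also accepts an array of strings, so if you want to embed several texts in a single call you can pass them together and get one embedding back per input:

POST _inference/text_embedding/mistral_embeddings
{
  "input": [
    "Show me some text embedding vectors",
    "And here is a second piece of text"
  ]
}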

Automated chunking for vectorized data

Now that we have our inference endpoint set up and have verified that it works, we can use Elasticsearch’s semantic_text mapping in an index to automatically create embedding vectors when we index documents. To do this, we’ll start by creating an index named “mistral-semantic-index” with a single field to hold our text, named content_embeddings:

PUT mistral-semantic-index
{
  "mappings": {
    "properties": {
      "content_embeddings": {
        "type": "semantic_text",
        "inference_id": "mistral_embeddings"
      }
    }
  }
}

Once that is set up, we can immediately start using our inference endpoint simply by indexing a document. For example:

PUT mistral-semantic-index/_doc/doc1
{
  "content_embeddings": "These are not the droids you're looking for. He's free to go around"
}

Now if we retrieve our document, we’ll see that when the text was indexed, our inference endpoint was called and text embeddings were automatically added to the document, along with some metadata about the model used for the inference. Under the hood, the semantic_text field type automatically chunks larger input text into manageable pieces and performs inference on each chunk.

GET mistral-semantic-index/_search

{
  …
  "hits": {
    …
    "hits": [
      {
        "_index": "mistral-semantic-index",
        "_id": "doc1",
        "_score": 1,
        "_source": {
          "content_embeddings": {
            "text": "These are not the droids you're looking for. He's free to go around",
            "inference": {
              "inference_id": "mistral_embeddings",
              "model_settings": {
                "task_type": "text_embedding",
                "dimensions": 1024,
                "similarity": "dot_product",
                "element_type": "float"
              },
              "chunks": [
                {
                  "text": "These are not the droids you're looking for. He's free to go around",
                  "embeddings": [
                    -0.039367676,
                    0.022644043,
                    -0.0076675415,
                    -0.020507812,
                    0.039489746,
                    0.0340271,
        …

Great! Now that we have that set up and we’ve seen it in action, we can load our data into the index. You can use whatever method you like; we prefer the bulk indexing API, sketched below. While your data is being indexed, Elasticsearch will use our inference endpoint to generate embeddings for the content_embeddings field by calling out to the Mistral API.
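
As a minimal example of a bulk request against our index (the document texts here are just placeholders for your own data):

POST mistral-semantic-index/_bulk
{ "index": { "_id": "doc2" } }
{ "content_embeddings": "You don't need to see his identification." }
{ "index": { "_id": "doc3" } }
{ "content_embeddings": "Move along, move along." }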

Semantic Search powered by Mistral AI

Finally, let’s run a semantic search using our data. Using the semantic query type, we’ll search for the phrase “robots you’re searching for”.

GET mistral-semantic-index/_search
{
  "query": {
    "semantic": {
      "field": "content_embeddings",
      "query": "robots you're searching for"
    }
  }
}

That’s it! When the query is run, the semantic query will use our Mistral inference endpoint to generate a query vector and use it under the hood to search across our data in the “content_embeddings” field.

We hope you enjoy using this integration, available today with Elastic Serverless, or soon on Elastic Cloud Hosted with version 8.15. Head over to the Mistral AI page on Search Labs to get started.

Happy searching!

Ready to try this out on your own? Start a free trial.
Elasticsearch has integrations for tools from LangChain, Cohere and more. Join our advanced semantic search webinar to build your next GenAI app!