Automatically updating your Elasticsearch index using Node.js and an Azure Function App

Maintaining an up-to-date Elasticsearch index is crucial, especially when your data changes frequently. This blog post will guide you through automatically updating your Elasticsearch index using Node.js and an Azure Function App.

First, we'll load the data using Node.js and ensure it remains current through regular updates. Then, we'll leverage the capabilities of Azure Function Apps to automate these updates, thereby ensuring your index is always fresh and reliable.

For this blog post, we will be using the Near Earth Object Web Service (NeoWs), a RESTful web service offering detailed information about near-earth asteroids. By running Node.js code that calls NeoWs as an Azure serverless function, this example gives you a framework for managing dynamic data. This approach helps you reduce the risk of working with outdated information and keep your data accurate and useful.

Prerequisites

Setting up locally

Before you begin indexing and loading your data locally, setting up your environment is essential. First, create a directory and initialize it. Then, download the necessary packages and create a .env file to store your configuration settings. This preliminary setup ensures your local environment is prepared to handle the data efficiently.

mkdir Introduction-to-Data-Loading-in-Elasticsearch-with-Nodejs
cd Introduction-to-Data-Loading-in-Elasticsearch-with-Nodejs
npm init

You will be using the Elasticsearch Node.js client to connect to Elastic, Axios to connect to the NASA APIs, and dotenv to parse your secrets. Install the required packages by running the following command:

npm install @elastic/elasticsearch axios dotenv

After downloading the required packages, you can create a .env file at the root of the project directory. The .env file allows you to keep your credentials secure locally. Check out the example .env file for reference. To learn more about connecting to Elasticsearch, take a look at the documentation on the subject.

To create a .env file, you can use this command at the root of your project:

touch .env

In your .env file, add the following entries. Be sure to add your complete endpoint:

ELASTICSEARCH_ENDPOINT="https://...."
ELASTICSEARCH_API_KEY="YOUR_ELASTICSEARCH_API_KEY"
NASA_API_KEY="YOUR_NASA_API_KEY"

You will also want to create a new JavaScript file:

touch loading_data_into_a_index.js

Creating your index and loading your data in

Now that you have set up the proper file structure and downloaded the required packages, you are ready to create a script that creates an index and loads data into it. If you get stuck along the way, be sure to check out the full version of the file you are creating in this section.

In the file loading_data_into_a_index.js, configure the dotenv package to use the keys and tokens stored in your .env file. You should also import the Elasticsearch client to connect to Elasticsearch and Axios to make HTTP requests.

require('dotenv').config();

const { Client } = require('@elastic/elasticsearch');
const axios = require('axios');

Since your keys and tokens are currently stored as environment variables, you will want to retrieve them and create a client to authenticate to Elasticsearch.

const elasticsearchEndpoint = process.env.ELASTICSEARCH_ENDPOINT;
const elasticsearchApiKey = process.env.ELASTICSEARCH_API_KEY;
const nasaApiKey = process.env.NASA_API_KEY;

const client = new Client({
  node: elasticsearchEndpoint,
  auth: {
    apiKey: elasticsearchApiKey
  }
});
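A missing or misspelled entry in .env tends to surface later as an opaque connection or authentication error. As a minimal sketch, a small guard could fail fast instead. The helper name `requireEnv` is hypothetical (it is not part of dotenv or the Elasticsearch client), and the demonstration runs against a stand-in object with placeholder values rather than the real `process.env`.

```javascript
// Hypothetical helper: returns the requested variables or throws, listing every missing one.
function requireEnv(names, env = process.env) {
  const missing = names.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(', ')}`);
  }
  return Object.fromEntries(names.map((name) => [name, env[name]]));
}

// Stand-in for process.env with placeholder values; NASA_API_KEY is left out on purpose.
const sampleEnv = {
  ELASTICSEARCH_ENDPOINT: 'https://example.invalid:443',
  ELASTICSEARCH_API_KEY: 'placeholder-key',
};

let message = '';
try {
  requireEnv(['ELASTICSEARCH_ENDPOINT', 'ELASTICSEARCH_API_KEY', 'NASA_API_KEY'], sampleEnv);
} catch (error) {
  message = error.message;
}
console.log(message); // Missing environment variables: NASA_API_KEY
```

In the real script, a call such as `requireEnv(['ELASTICSEARCH_ENDPOINT', 'ELASTICSEARCH_API_KEY', 'NASA_API_KEY'])` could sit right after `require('dotenv').config()`.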

You can develop a function to retrieve data from NASA's NEO (Near Earth Object) Web Service asynchronously. You will first configure the base URL for the NASA API request and create date objects for today and the previous week to establish the query period. After you format these dates in the YYYY-MM-DD format required for the API request, set up the dates as query parameters and execute the GET request to the NASA API. Additionally, the function includes error-handling mechanisms to aid debugging should any issues arise.

async function fetchNasaData() {
  const url = "https://api.nasa.gov/neo/rest/v1/feed";
  const today = new Date();
  const lastWeek = new Date(today);
  lastWeek.setDate(today.getDate() - 7);

  const startDate = lastWeek.toISOString().split('T')[0];
  const endDate = today.toISOString().split('T')[0];
  const params = {
    api_key: nasaApiKey,
    start_date: startDate,
    end_date: endDate,
  };

  try {
    const response = await axios.get(url, { params });
    return response.data;
  } catch (error) {
    console.error('Error fetching data from NASA:', error);
    return null;
  }
}

Now, you can create a function to transform the raw data from the NASA API into a structured format. The data you get back is nested in a complex JSON response, so flattening it into a more straightforward array of objects makes it easier to handle.

function createStructuredData(response) {
  const allObjects = [];
  const nearEarthObjects = response.near_earth_objects;

  Object.keys(nearEarthObjects).forEach(date => {
    nearEarthObjects[date].forEach(obj => {
      const simplifiedObject = {
        close_approach_date: date,
        name: obj.name,
        id: obj.id,
        miss_distance_km: obj.close_approach_data.length > 0 ? obj.close_approach_data[0].miss_distance.kilometers : null,
      };

      allObjects.push(simplifiedObject);
    });
  });

  return allObjects;
}
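To make the transformation concrete, the sketch below runs the same function against a tiny hand-written response. The sample object is invented and only mirrors the fields the function reads; it is not real NASA data. The distance is written as a string on the assumption that the API returns it as text, in which case Elasticsearch's default coercion for numeric fields would convert it when indexing into a float field.

```javascript
// Same transformation as above, repeated so the snippet runs on its own.
function createStructuredData(response) {
  const allObjects = [];
  const nearEarthObjects = response.near_earth_objects;

  Object.keys(nearEarthObjects).forEach((date) => {
    nearEarthObjects[date].forEach((obj) => {
      allObjects.push({
        close_approach_date: date,
        name: obj.name,
        id: obj.id,
        miss_distance_km: obj.close_approach_data.length > 0
          ? obj.close_approach_data[0].miss_distance.kilometers
          : null,
      });
    });
  });

  return allObjects;
}

// Invented sample shaped like the fields the function reads; not real NASA data.
const sampleResponse = {
  near_earth_objects: {
    '2024-01-01': [
      {
        id: '1001',
        name: '(Sample A)',
        close_approach_data: [{ miss_distance: { kilometers: '1234567.89' } }],
      },
    ],
    '2024-01-02': [
      { id: '1002', name: '(Sample B)', close_approach_data: [] },
    ],
  },
};

const structured = createStructuredData(sampleResponse);
console.log(structured.length); // 2
console.log(structured[1].miss_distance_km); // null, because close_approach_data is empty
```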

You will want to create an index to store the data from the API. An index inside Elasticsearch is where you can store your data in documents. In this function, you will check to see if an index exists and create a new one if needed. You will also specify the proper mapping of fields for your index. This function also loads the data into the index as documents and maps the id field from the NASA data to the _id field in Elasticsearch.

async function indexDataIntoElasticsearch(data) {
  // With the 8.x client, indices.exists() resolves to a boolean rather than a response object.
  const indexExists = await client.indices.exists({ index: 'nasa-node-js' });
  if (!indexExists) {
    await client.indices.create({
      index: 'nasa-node-js',
      body: {
        mappings: {
          properties: {
            close_approach_date: { type: 'date' },
            name: { type: 'text' },
            miss_distance_km: { type: 'float' },
          },
        },
      },
    });
  }

  const body = data.flatMap(doc => [{ index: { _index: 'nasa-node-js', _id: doc.id } }, doc]);
  await client.bulk({ refresh: false, body });
}
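The bulk request body alternates between an action line and the document it applies to. The sketch below shows that shape for two invented documents, using a hypothetical helper `buildBulkBody` that mirrors the flatMap call above.

```javascript
// Hypothetical helper mirroring the flatMap call above: one action entry, then the document.
function buildBulkBody(docs, indexName) {
  return docs.flatMap((doc) => [{ index: { _index: indexName, _id: doc.id } }, doc]);
}

// Invented documents for illustration only.
const docs = [
  { id: '1001', name: '(Sample A)', close_approach_date: '2024-01-01', miss_distance_km: 1234.5 },
  { id: '1002', name: '(Sample B)', close_approach_date: '2024-01-02', miss_distance_km: null },
];

const bulkBody = buildBulkBody(docs, 'nasa-node-js');
console.log(bulkBody.length); // 4: two entries per document
console.log(bulkBody[0]); // { index: { _index: 'nasa-node-js', _id: '1001' } }
```

Because `_id` is taken from the NASA `id`, indexing the same object again replaces the existing document instead of creating a duplicate. That keeps repeated runs safe, but it also means a later close approach by the same object overwrites the earlier record.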

You will want to create a main function to fetch, structure, and index the data. This function will also print out the number of records being uploaded and log whether the data is indexed, whether there is no data to index, or whether it failed to get data back from the NASA API. After creating the run function, you will want to call the function and catch any errors that may come up.

async function run() {
  const rawData = await fetchNasaData();
  if (rawData) {
    const structuredData = createStructuredData(rawData);
    console.log(`Number of records being uploaded: ${structuredData.length}`);
    if (structuredData.length > 0) {
      await indexDataIntoElasticsearch(structuredData);
      console.log('Data indexed successfully.');
    } else {
      console.log('No data to index.');
    }
  } else {
    console.log('Failed to fetch data from NASA.');
  }
}

run().catch(console.error);

You can now run the file from your command line by running the following:

node loading_data_into_a_index.js

To confirm that your index has been successfully loaded, you can check in the Elastic Dev Tools by executing the following API call:

GET /nasa-node-js/_search

Keeping your index updated with an Azure Function App

You've now loaded your data into your index locally, but this data can quickly become outdated. To ensure your information remains current, you can set up an Azure Function App to automatically fetch new data daily and upload it to your Elasticsearch index.

The first step is to configure your Function App in the Azure Portal. A helpful resource for getting started is the Azure quick start guide.

After you've set up your function, ensure that you have environment variables set up for ELASTICSEARCH_ENDPOINT, ELASTICSEARCH_API_KEY, and NASA_API_KEY. In Function Apps, environment variables are called Application settings. Inside your function app, click on the "Configuration" option in the left panel under "Settings." Under the "Application settings" tab, click on "+ New application setting."

You will want to make sure the required libraries are installed as well. If you go to your terminal on the Azure Portal, you can install the necessary packages by entering the following:

npm install @elastic/elasticsearch axios

The packages you are installing are the same as in the previous install, except that you no longer need dotenv to load an env file since you just set your secrets as Application settings.

Click "Create" to add a new function inside your Function App and select the template entitled “Timer trigger”. This generates a file called function.json for you. Adjust it to look as follows to run this application every day at 10 am (the schedule is evaluated in UTC by default).

{
  "bindings": [
    {
      "name": "myTimer",
      "type": "timerTrigger",
      "direction": "in",
      "schedule": "0 0 10 * * *"
    }
  ]
}
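The schedule value is a six-field NCRONTAB expression, read as second, minute, hour, day, month, and day of week. As a small sketch, the hypothetical helper `labelNcrontab` below names each field of the expression used here. It only splits and labels the fields; it does not validate ranges or evaluate the schedule.

```javascript
// Hypothetical helper: labels the six fields of an NCRONTAB expression.
// It only splits and names fields; it does not validate ranges or compute run times.
function labelNcrontab(expression) {
  const fields = expression.trim().split(/\s+/);
  if (fields.length !== 6) {
    throw new Error(`Expected 6 fields, got ${fields.length}`);
  }
  const [second, minute, hour, day, month, dayOfWeek] = fields;
  return { second, minute, hour, day, month, dayOfWeek };
}

const schedule = labelNcrontab('0 0 10 * * *');
console.log(schedule.hour); // 10
console.log(schedule.day); // *
```

Read this way, "0 0 10 * * *" fires at second 0, minute 0 of hour 10 on every day. Changing the hour field to `*/6` would instead run the function every six hours.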

You'll also want to upload your package.json file and ensure it appears as follows:

{
  "name": "introduction-to-data-loading-in-elasticsearch-with-nodejs",
  "version": "1.0.0",
  "description": "A simple script for loading data in Elasticsearch",
  "main": "loading_data_into_a_index.js",
  "scripts": {
    "test": "echo \"Error: no test specified\" && exit 1"
  },
  "repository": {
    "type": "git",
    "url": "git+https://github.com/JessicaGarson/Introduction-to-Data-Loading-in-Elasticsearch-with-Nodejs.git"
  },
  "author": "Jessica Garson",
  "license": "Apache-2.0",
  "bugs": {
    "url": "https://github.com/JessicaGarson/Introduction-to-Data-Loading-in-Elasticsearch-with-Nodejs/issues"
  },
  "homepage": "https://github.com/JessicaGarson/Introduction-to-Data-Loading-in-Elasticsearch-with-Nodejs#readme",
  "dependencies": {
    "@elastic/elasticsearch": "^8.12.0",
    "axios": "^0.21.1"
  }
}

The next step is to create an index.js file. This script is designed to update the data automatically: each day it fetches and parses new data, then updates the index accordingly. You can use the same method to ingest time series or immutable data, such as webhook responses, into Elasticsearch. This keeps the information current and accurate, reflecting the latest available data. You can check out the full code as well.

The main differences between the script you run locally and this one are as follows:

  • You will no longer need to load a .env file, since you have already set your environment variables
  • The logging is different and geared toward a script that runs unattended
  • You keep your index updated based on the most recent close approach date
  • There is an entry point for an Azure Function App

You will first want to import your libraries and authenticate to Elasticsearch as follows:

const { Client } = require('@elastic/elasticsearch');
const axios = require('axios');

const elasticsearchEndpoint = process.env.ELASTICSEARCH_ENDPOINT;
const elasticsearchApiKey = process.env.ELASTICSEARCH_API_KEY;
const nasaApiKey = process.env.NASA_API_KEY;

const client = new Client({
  node: elasticsearchEndpoint,
  auth: {
    apiKey: elasticsearchApiKey
  }
});

Afterward, you will want to obtain the last update date from Elasticsearch and configure a fallback that fetches data from the past day if no records are found.

async function getLastUpdateDate() {
  try {
    const response = await client.search({
      index: 'nasa-node-js',
      body: {
        size: 1,
        sort: [{ close_approach_date: { order: 'desc' } }],
        _source: ['close_approach_date']
      }
    });

    // With the 8.x client, the search response is the body itself, so hits live at response.hits.
    const hits = response.hits && response.hits.hits ? response.hits.hits : [];
    if (hits.length > 0) {
      return hits[0]._source.close_approach_date;
    } else {
      // Default to one day ago if no records found
      const today = new Date();
      const yesterday = new Date(today);
      yesterday.setDate(today.getDate() - 1);
      return yesterday.toISOString().split('T')[0];
    }
  } catch (error) {
    console.error('Error fetching last update date from Elasticsearch:', error);
    throw error;
  }
}

The following function connects to NASA's NEO (Near Earth Object) Web Service to get the data to keep your index updated. There is also some additional error handling that can capture any API errors that might come up.

async function fetchNasaData(startDate) {

  const url = "https://api.nasa.gov/neo/rest/v1/feed";
  const today = new Date();

  const endDate = today.toISOString().split('T')[0];

  const params = {
    api_key: nasaApiKey,
    start_date: startDate,
    end_date: endDate,
  };

  try {
    // Perform the GET request to the NASA API with query parameters
    const response = await axios.get(url, { params });
    return response.data;
  } catch (error) {
    // Log any errors encountered during the request
    console.error('Error fetching data from NASA:', error);
    return null;
  }
}
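One thing to watch: if the function does not run for a while, the gap between the last update date and today can grow. The sketch below assumes the feed endpoint accepts at most a seven-day window, which is worth verifying against the NeoWs documentation, and clamps the start date accordingly. The helper `clampStartDate` and the constant `MAX_RANGE_DAYS` are hypothetical names introduced for illustration.

```javascript
// Assumed limit: the feed endpoint accepts at most a 7-day window (verify against the API docs).
const MAX_RANGE_DAYS = 7;
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Hypothetical helper: returns a start date no more than MAX_RANGE_DAYS before endDate.
// Both arguments and the return value are 'YYYY-MM-DD' strings, interpreted as UTC.
function clampStartDate(startDate, endDate) {
  const start = Date.parse(`${startDate}T00:00:00Z`);
  const end = Date.parse(`${endDate}T00:00:00Z`);
  const earliest = end - MAX_RANGE_DAYS * MS_PER_DAY;
  return new Date(Math.max(start, earliest)).toISOString().split('T')[0];
}

console.log(clampStartDate('2024-01-01', '2024-01-31')); // 2024-01-24
console.log(clampStartDate('2024-01-29', '2024-01-31')); // 2024-01-29
```

Clamping keeps a single request valid, but it skips anything older than the window. Backfilling a longer gap would take several requests, one per window.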

Now, you will want to create a function to organize your data by iterating over the objects of each date.

function createStructuredData(response) {
  const allObjects = [];
  const nearEarthObjects = response.near_earth_objects;

  Object.keys(nearEarthObjects).forEach(date => {
    nearEarthObjects[date].forEach(obj => {
      const simplifiedObject = {
        close_approach_date: date,
        name: obj.name,
        id: obj.id,
        miss_distance_km: obj.close_approach_data.length > 0 ? obj.close_approach_data[0].miss_distance.kilometers : null,
      };

      allObjects.push(simplifiedObject);
    });
  });

  return allObjects;
}

Now, you will want to load your data into Elasticsearch using the bulk indexing operation. This function should look similar to the one in the previous section.

async function indexDataIntoElasticsearch(data) {
  const body = data.flatMap(doc => [{ index: { _index: 'nasa-node-js', _id: doc.id } }, doc]);
  await client.bulk({ refresh: false, body });
}
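A bulk request can succeed as a whole while individual documents fail. In that case the response sets a top-level `errors` flag and attaches an `error` object to each failed item. As a minimal sketch, the hypothetical helper `collectBulkFailures` below pulls those failures out of a response. The sample response is invented, and the sketch assumes the 8.x client, where the value returned by `client.bulk()` is the response body itself.

```javascript
// Hypothetical helper: collects per-document failures from a bulk API response.
function collectBulkFailures(bulkResponse) {
  if (!bulkResponse.errors) return [];
  return bulkResponse.items
    .map((item) => Object.values(item)[0]) // each item is keyed by its action, e.g. "index"
    .filter((result) => result.error)
    .map((result) => ({
      id: result._id,
      status: result.status,
      reason: result.error.reason,
    }));
}

// Invented response in the bulk API's shape, for illustration only.
const sampleBulkResponse = {
  errors: true,
  items: [
    { index: { _id: '1001', status: 201 } },
    {
      index: {
        _id: '1002',
        status: 400,
        error: {
          type: 'mapper_parsing_exception',
          reason: 'failed to parse field [miss_distance_km]',
        },
      },
    },
  ],
};

const failures = collectBulkFailures(sampleBulkResponse);
console.log(failures.length); // 1
console.log(failures[0].id); // 1002
```

Inside `indexDataIntoElasticsearch`, the result of `await client.bulk(...)` could be passed to a helper like this and any failures logged before reporting success.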

Finally, you will want to create an entry point for the function that will run according to the timer you set. This function is similar to a main function, as it calls the functions created previously in the file. There is also some additional logging, such as printing the number of records and informing you if the data was indexed correctly.

module.exports = async function (context, myTimer) {
  try {
    const lastUpdateDate = await getLastUpdateDate();
    context.log(`Last update date from Elasticsearch: ${lastUpdateDate}`);

    const rawData = await fetchNasaData(lastUpdateDate);
    if (rawData) {
      const structuredData = createStructuredData(rawData);
      context.log(`Number of records being uploaded: ${structuredData.length}`);
      
      if (structuredData.length > 0) {

        const flatFileData = JSON.stringify(structuredData, null, 2);
        context.log('Flat file data:', flatFileData);

        await indexDataIntoElasticsearch(structuredData);
        context.log('Data indexed successfully.');
      } else {
        context.log('No data to index.');
      }
    } else {
      context.log('Failed to fetch data from NASA.');
    }
  } catch (error) {
    context.log('Error in run process:', error);
  }
};

Conclusion

Using Node.js and an Azure Function App, you can keep your Elasticsearch index updated on a regular schedule. This combination gives you an automated process that reduces the manual effort of keeping your index current. Full code for this example can be found on Search Labs GitHub. Let us know if you built anything based on this blog or if you have questions on our forums and the community Slack channel.
