Running a containerized MongoDB application using an attached EBS volume data store

Over the past few months, I’ve been learning how to build ML pipelines consisting of containerized microservices. A common task in such pipelines is to send data over a message queue to be stored in a database. The message queue insulates the data producer from the latency and delays associated with writing data to the database by introducing a layer of indirection. Message queues also provide a degree of fault tolerance against data loss when the database server is unavailable, by caching messages that couldn’t be stored. I may cover message queues in another post. In this post, I’ll focus on running a MongoDB database as a containerized application on an AWS EC2 compute instance that uses an attached EBS volume as the data store. Running Mongo in a container makes it easy to deploy on any compute instance that supports containers, and using an attached EBS volume to store data makes it easy to migrate the data and to increase the size of the data store if more capacity is needed.

The main steps in the process are:

  1. Creating an EBS volume
  2. Attaching it to an EC2 compute instance as a mounted disk
  3. Installing and running MongoDB in a container and configuring it to use the mounted EBS volume as the datastore.

While instructions for all these steps are described in various technical documentation, blog posts, and Stack Overflow questions, it still took me a while to set all of this up. Therefore, I thought it would be helpful to write an article that consolidates the information already available, along with my own notes. Let’s go through these steps in detail.

Creating an EBS volume

The steps to create an EBS volume on AWS are described in detail here. I’ll outline some points I found a bit tricky.

  • Availability Zone: The EBS volume will be available only to instances in the same availability zone, so choose the same availability zone as the one your database instances are located in.

  • GiB vs. GB: 1 GiB = 1024^3 bytes; 1 GB = 1000^3 bytes
  • MiB vs. MB: 1 MiB = 1024^2 bytes; 1 MB = 1000^2 bytes
  • IOPS: Input/output operations per second
  • Throughput: Number of bytes read or written per second. Throughput = IOPS × (bytes per I/O operation); for example, 3,000 IOPS at 16 KiB per operation is roughly 47 MiB/sec. See this Stack Overflow post for more details.

EBS volume types

AWS documentation provides detailed information about EBS volume types, but the information is scattered across many pages. Below, I’ve consolidated the relevant information for each volume type. As the summary shows, the general purpose SSD should suffice for most applications. For more pricing information and examples, see this, and for more details about EBS volumes, see this. I recommend just picking the general purpose SSD and following the rest of the instructions. Once you have everything working, you can experiment with different EBS volume types.

General Purpose SSD
  • Performance: Baseline performance is 3 IOPS per GiB, with a minimum of 100 IOPS and a maximum of 16,000 IOPS. This means if you provision 10 GiB, you’ll get at least 100 IOPS; if you provision 100 GiB, you’ll get at least 300 IOPS. General Purpose (SSD) volumes under 1000 GiB can burst up to 3000 IOPS. With 1 I/O = 16 KiB and a maximum of 16,000 IOPS per volume, the maximum throughput is 250 MiB/sec (≈ 16,000 × 16 KiB). You cannot provision a specific IOPS performance; for that you need Provisioned IOPS.
  • Size (GiB): 1 – 16384
  • Cost: $0.10 per GB-month of provisioned storage
  • Use case: General purpose SSD volume that balances price and performance for a wide variety of workloads

Provisioned IOPS
  • Performance: Provisioned IOPS allows you to provision a specific IOPS performance. There are two relevant settings: the amount of storage and the IOPS performance. The maximum IOPS you can provision depends on the amount of storage requested: up to 50 IOPS per GiB, with a maximum of 64,000 IOPS. You pay for both the storage and the provisioned IOPS (see the cost below). With 1 I/O = 16 KiB and a maximum of 64,000 IOPS per volume, the maximum throughput is 1,000 MiB/sec (≈ 64,000 × 16 KiB).
  • Size (GiB): 4 – 16384
  • Cost: $0.125 per GB-month of provisioned storage AND $0.065 per provisioned IOPS-month
  • Use case: Highest-performance SSD volume for mission-critical, low-latency or high-throughput workloads

Throughput Optimized HDD
  • Performance: Maximum IOPS per volume = 500. At 1 MiB per I/O, maximum throughput = 500 MiB/sec.
  • Size (GiB): 500 – 16384
  • Cost: $0.045 per GB-month of provisioned storage
  • Use case: Low-cost HDD volume designed for frequently accessed, throughput-intensive workloads

Cold HDD
  • Performance: Maximum IOPS per volume = 250. At 1 MiB per I/O, maximum throughput = 250 MiB/sec.
  • Size (GiB): 500 – 16384
  • Cost: $0.025 per GB-month of provisioned storage
  • Use case: Lowest-cost HDD volume designed for less frequently accessed workloads

Magnetic
  • Performance: N/A
  • Size (GiB): 1 – 1024
  • Cost: N/A
  • Use case: N/A

Attaching an EBS volume to an instance

Now that you have created the EBS volume, the next step is to attach it to your EC2 instance. The steps to create an EC2 instance are listed here. It is a good idea to attach an appropriate tag to your instance so you know that it is a database instance. For example, I use the key-value pair (instance-type, db-node) for my EC2 database instance. When your instance is up and running, it will need to be able to access the EC2 and S3 AWS services to finish the steps outlined below. This access can be provided in two ways:

  1. You can create an IAM user with the appropriate access, then SSH into your instance and use aws configure to configure the settings (AccessKey and SecretAccessKey) that the AWS Command Line Interface (CLI) uses to interact with AWS.
  2. A better approach is to create an IAM Role and attach the AmazonEC2FullAccess and AmazonS3FullAccess policies. You can then specify this role when creating your EC2 instance. This way your instance will launch with the required access to AWS services and no further configuration steps are needed (a CLI sketch of these steps appears after the figures below). For better security, you could create a targeted policy that provides access only to the specific resources (e.g. a specific S3 bucket) you need. Later in this article, we’ll see why access to S3 is needed.
[Figure: Attaching the AmazonEC2FullAccess and AmazonS3FullAccess policies to an IAM role]
[Figure: Specifying the role when launching your EC2 instance (Step 3: Configure Instance Details)]
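If you prefer the CLI, here is a sketch of creating such a role and instance profile. The role name, profile name, and trust-policy file below are placeholders of my own, not values from the console screenshots above:

    # trust policy allowing EC2 to assume the role (save as ec2-trust.json):
    # { "Version": "2012-10-17",
    #   "Statement": [ { "Effect": "Allow",
    #                    "Principal": { "Service": "ec2.amazonaws.com" },
    #                    "Action": "sts:AssumeRole" } ] }
    aws iam create-role --role-name db-node-role \
        --assume-role-policy-document file://ec2-trust.json
    aws iam attach-role-policy --role-name db-node-role \
        --policy-arn arn:aws:iam::aws:policy/AmazonEC2FullAccess
    aws iam attach-role-policy --role-name db-node-role \
        --policy-arn arn:aws:iam::aws:policy/AmazonS3FullAccess
    # an instance profile is what you actually select when launching the instance
    aws iam create-instance-profile --instance-profile-name db-node-profile
    aws iam add-role-to-instance-profile --instance-profile-name db-node-profile \
        --role-name db-node-role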

Make sure to launch your instance in the same VPC as the one where you created the EBS volume. The VPC can be specified in Step 3: Configure Instance Details while launching your EC2 instance. MongoDB listens on port 27017 by default. To make your instance accessible over this port, you should either open this port in your default security group or create a new security group that allows access to this port (which I would recommend). Instructions on creating a new security group are here.
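The same can be done from the AWS CLI; a sketch is shown below. The VPC ID and CIDR range are placeholders, and you should restrict the source range to your application’s network rather than opening the port to the whole internet:

    # create a security group for the database instance and open port 27017
    SG_ID=$(aws ec2 create-security-group --group-name mongodb-sg \
        --description "MongoDB access" --vpc-id vpc-0123456789abcdef0 \
        --query 'GroupId' --output text)
    aws ec2 authorize-security-group-ingress --group-id "$SG_ID" \
        --protocol tcp --port 27017 --cidr 10.0.0.0/16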

Once your instance is up and running, SSH into it and run the following bash script; each step is commented in detail. These instructions assume that Ubuntu 18.04 (Bionic) is installed on your EC2 instance. They install the AWS CLI, attach the EBS volume you just created, and map the directory /mongo-data to the attached EBS volume.
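The original script isn’t reproduced here; the sketch below shows the general shape of these steps. The volume ID, region, and device name are placeholders to replace with your own values, and the mkfs step should be skipped if the volume already contains data:

    #!/bin/bash
    # Sketch of the setup steps described above. The volume ID, region and the
    # device name /dev/xvdf are placeholders; replace them with your own values.

    # install the AWS CLI
    sudo apt-get update
    sudo apt-get install -y awscli

    # attach the EBS volume created earlier to this instance
    INSTANCE_ID=$(curl -s http://169.254.169.254/latest/meta-data/instance-id)
    aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
        --instance-id "$INSTANCE_ID" --device /dev/xvdf --region us-east-1

    # give the device a moment to appear, create a filesystem on it
    # (first time only; skip mkfs if the volume already holds data),
    # then mount it at /mongo-data
    sleep 10
    sudo mkfs -t ext4 /dev/xvdf
    sudo mkdir -p /mongo-data
    sudo mount /dev/xvdf /mongo-data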

Installing MongoDB and running it as a containerized application

The final step is to install and run MongoDB as a containerized application and open the port it listens on to the internet so the rest of your application can communicate with it. First, we need a Dockerfile that builds our MongoDB image.
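The original Dockerfile isn’t reproduced here; below is a minimal sketch of one way to build such an image. The ubuntu:bionic base image and the apt repository setup are my assumptions, and the real Dockerfile in the official Mongo docker-library also installs helpers (such as gosu and numactl) that its entrypoint expects, so treat this purely as an illustration:

    # Dockerfile (sketch): build a MongoDB 4.2 image on Ubuntu 18.04
    FROM ubuntu:bionic

    # add the official MongoDB 4.2 apt repository and install the server
    RUN apt-get update && apt-get install -y wget gnupg && \
        wget -qO - https://www.mongodb.org/static/pgp/server-4.2.asc | apt-key add - && \
        echo "deb [ arch=amd64 ] https://repo.mongodb.org/apt/ubuntu bionic/mongodb-org/4.2 multiverse" \
            > /etc/apt/sources.list.d/mongodb-org-4.2.list && \
        apt-get update && apt-get install -y mongodb-org && \
        rm -rf /var/lib/apt/lists/*

    # default data directory; we map the EBS-backed /mongo-data here at run time
    RUN mkdir -p /data/db

    # entrypoint script taken from the official Mongo docker-library
    COPY docker-entrypoint.sh /usr/local/bin/
    RUN chmod +x /usr/local/bin/docker-entrypoint.sh
    ENTRYPOINT ["docker-entrypoint.sh"]

    # mongod listens on 27017 by default
    EXPOSE 27017
    CMD ["mongod"]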

This file builds MongoDB version 4.2. For another version, use the corresponding files from the official Mongo docker-library.

A few things to note about this file:

  1. It copies docker-entrypoint.sh to /usr/local/bin and uses the script as the docker container entrypoint. You can find this file in the official Mongo docker-library listed above. Make sure the user you run docker as has read/write permission on this file, otherwise docker may not be able to copy it into the container image. The entrypoint file sets up a number of things:
    1. It uses the MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD environment variables to set up a root user and run mongo with --auth (in authorization mode). This mode enables authorization, so clients that connect to the MongoDB instance must authenticate themselves as a MongoDB user and can only perform actions as determined by their assigned roles. See this for more details about MongoDB access controls. The relevant logic from docker-entrypoint.sh is sketched after this list.

      You can set MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD either using the --env option of docker run or in a docker-compose.yml file. I used a docker-compose.yml file, whose contents are shown below.
    2. If shouldPerformInitdb is set (see the entrypoint sketch after this list), it loops over the .sh and .js files in the docker-entrypoint-initdb.d directory inside your container, sourcing the shell scripts and running the .js files in a mongo shell. You can use this mechanism to perform any other initialization actions. I use it to run a .js file that creates an admin user with access to my databases. This file is also shown below.
  2. It exposes port 27017 and runs mongod, so when you start your container, mongod will be running and accepting connections on port 27017.
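For reference, here is a simplified paraphrase of that part of docker-entrypoint.sh. It is not a verbatim quote of the official script (which handles many more cases); it only shows the shape of the logic:

    # paraphrased from docker-entrypoint.sh (simplified, not verbatim)
    if [ "$MONGO_INITDB_ROOT_USERNAME" ] && [ "$MONGO_INITDB_ROOT_PASSWORD" ]; then
        # a temporary mongod is started, the root user is created in the admin
        # database, and --auth is added to the final mongod arguments
        shouldPerformInitdb=true
    fi

    if [ -n "$shouldPerformInitdb" ]; then
        # run any user-supplied initialization scripts
        for f in /docker-entrypoint-initdb.d/*; do
            case "$f" in
                *.sh) . "$f" ;;                                       # source shell scripts
                *.js) mongo "${MONGO_INITDB_DATABASE:-test}" "$f" ;;  # run JS files in a mongo shell
            esac
        done
    fi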

With that said, here are the contents of the mongo-init.js and docker-compose.yml files.
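The originals aren’t reproduced here; the sketches below are consistent with the description above. The user names, passwords, and roles are placeholders, and mounting mongo-init.js into /docker-entrypoint-initdb.d via a compose volume is just one way to get it into the container (a COPY in the Dockerfile works too):

    // mongo-init.js (sketch): create an admin user for the application.
    // The user name, password and roles below are placeholders.
    db = db.getSiblingDB('admin');
    db.createUser({
        user: 'dbadmin',
        pwd: 'change-me',
        roles: [ { role: 'readWriteAnyDatabase', db: 'admin' } ]
    });

    # docker-compose.yml (sketch)
    version: '3'
    services:
      mongodb:
        build: .
        ports:
          - "27017:27017"          # expose mongod to the host
        environment:
          - MONGO_INITDB_ROOT_USERNAME=root
          - MONGO_INITDB_ROOT_PASSWORD=change-me
        volumes:
          - /mongo-data:/data/db   # EBS-backed host directory as Mongo's data dir
          - ./mongo-init.js:/docker-entrypoint-initdb.d/mongo-init.js:ro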

Note that we set the MONGO_INITDB_ROOT_USERNAME and MONGO_INITDB_ROOT_PASSWORD environment variables in docker-compose.yml, and we also map the /mongo-data directory (where we mounted our EBS volume) to /data/db, the default data directory used by Mongo. Now Mongo will use our attached EBS volume to store its data! If you want to migrate your data, all you need to do is detach the volume and attach it to another instance.

To make things simple, I zipped up the four files (mongo-init.js, docker-compose.yml, the Dockerfile that installs and runs Mongo, and docker-entrypoint.sh) and placed the zip in an AWS S3 bucket. I then use the AWS CLI to copy the file to the EC2 instance and run docker-compose. Remember how we attached the AmazonS3FullAccess policy to the IAM role we launched our instance with? Now you see why that was necessary.

The bash script to carry out all these steps is shown below. It copies the zip file from S3, unzips it, installs Docker and docker-compose, and brings up the services defined in docker-compose.yml. You can append its contents to the end of the previous script and run everything as a single script.
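Again, the original script isn’t reproduced here; the sketch below follows the description above. The bucket and zip file names are placeholders, and it assumes the Ubuntu instance from earlier:

    #!/bin/bash
    # Sketch: fetch the deployment bundle from S3 and start MongoDB.
    # The bucket and zip names are placeholders; replace them with your own.
    aws s3 cp s3://my-mongo-bucket/mongo-docker.zip .
    sudo apt-get install -y unzip
    unzip -o mongo-docker.zip -d mongo-docker
    cd mongo-docker

    # install docker and docker-compose
    sudo apt-get update
    sudo apt-get install -y docker.io
    sudo systemctl enable --now docker
    sudo usermod -aG docker $USER
    sudo curl -L "https://github.com/docker/compose/releases/download/1.25.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose

    # build the image and start MongoDB in the background
    sudo docker-compose up --build -d mongodb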

That’s it! Your client should now be able to authenticate using the username/password specified in mongo-init.js and connect to your MongoDB database. The easiest way to test this is with the MongoDB Compass application. Hope you found the info helpful. Please leave a comment if you did.

1 Comment

  1. Thanks a lot for this! A few notes (hope this all makes sense):

    The volume was already attached by the instance setup process – there’s a place to do that. It was named /dev/sdb and all I had to do was initialize the fs, create a mount point and mount.

    The docker install process for Amazon Linux 2 instances appears to be different as well (https://docs.aws.amazon.com/AmazonECS/latest/developerguide/ecs-agent-install.html).

    Specifically:

    # Different
    # install docker
    sudo amazon-linux-extras install -y ecs; sudo systemctl enable --now ecs
    #Ensure that the agent is running
    curl -s http://localhost:51678/v1/metadata | python -mjson.tool

    # Similar
    sudo usermod -aG docker $USER
    sudo chkconfig docker on

    # Same
    # install docker compose:
    sudo curl -L "https://github.com/docker/compose/releases/download/1.25.0/docker-compose-$(uname -s)-$(uname -m)" -o /usr/local/bin/docker-compose
    sudo chmod +x /usr/local/bin/docker-compose

    # Run the docker container
    docker-compose up --build -d mongodb
