I have to start by saying that you should not use EMR as a persistent Hadoop cluster. The power of EMR lies in its elasticity. You should launch an EMR cluster, process the data, write the results to S3 buckets, and terminate the cluster. However, we see a lot of AWS customers use EMR as a persistent cluster. So I was not surprised when a customer told me that they needed to resize the EBS volumes automatically on new core nodes of their EMR cluster. The core nodes are configured with 200 GB disks, but now they want 400 GB disks. It’s not possible to change the instance type or EBS volume configuration of the core nodes, so a custom solution was needed. I explained to the customer how to do it with some sample Python code, but in the end they gave up on this method (thank God).
I wanted to see if it could be done anyway. So, for fun and out of curiosity, I wrote a Lambda function in Java. It should be scheduled to run every 5 or 10 minutes. On every run, it checks whether there’s an ongoing resizing operation. If the resizing is done, it connects to the node and runs the “growpart” and “xfs_growfs” commands to grow the partition and the filesystem. If there’s no resizing operation in progress, it checks all volumes of a specific cluster and starts a resizing operation on any volume that is smaller than a specific size.
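The SSH step is delegated to a MySSH helper class, which I won’t reproduce in full here. Below is a minimal sketch of how such a helper could look, using the JSch library. The SSH user (hadoop), the key location under /tmp, the KEY_FILE environment variable name, and the device/partition/mount names are my assumptions; adapt them to your own cluster layout.

package com.gokhanatil.volumeresizer;

import com.jcraft.jsch.ChannelExec;
import com.jcraft.jsch.JSch;
import com.jcraft.jsch.Session;

public class MySSH {

    // private key downloaded from S3 into /tmp (see the environment variables section below);
    // the KEY_FILE variable name is a hypothetical placeholder
    private static final String KEY_PATH = "/tmp/" + System.getenv("KEY_FILE");
    private static final String SSH_USER = "hadoop";

    public static void runShellCommands(String host) {
        try {
            JSch jsch = new JSch();
            jsch.addIdentity(KEY_PATH);
            Session session = jsch.getSession(SSH_USER, host, 22);
            session.setConfig("StrictHostKeyChecking", "no");
            session.connect();
            // grow the partition first, then the XFS filesystem -- the device, partition
            // number and mount point are assumptions, adjust them to your cluster
            runCommand(session, "sudo growpart /dev/xvdb 1");
            runCommand(session, "sudo xfs_growfs /mnt");
            session.disconnect();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    private static void runCommand(Session session, String command) throws Exception {
        ChannelExec channel = (ChannelExec) session.openChannel("exec");
        channel.setCommand(command);
        channel.connect();
        // wait until the remote command finishes
        while (!channel.isClosed()) {
            Thread.sleep(500);
        }
        channel.disconnect();
    }
}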
Here’s the main class which will be used by the Lambda function:
package com.gokhanatil.volumeresizer;

import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;

import java.util.Map;

public class Resizer implements RequestHandler<Map<String, Object>, String> {

    public String handleRequest(Map<String, Object> input, Context context) {
        String result = "{'result': 'success'}";
        Item volumeInfo = MyDynamoDB.getVolumeInfo();
        if (volumeInfo != null) {
            String targetVolume = volumeInfo.getString("volid");
            String targetInstance = volumeInfo.getString("pip");
            if (VolumeChecker.isResized(targetVolume)) {
                MySSH.runShellCommands(targetInstance);
                MyDynamoDB.deleteVolumeInfo();
                result = "{'result': 'resized " + targetVolume + "'}";
            } else
                result = "{'result': 'waiting for " + targetVolume + "'}";
        } else
            VolumeChecker.checkVolumes();
        return result;
    }
}
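The main class delegates the EC2 work to a VolumeChecker helper, which is not shown above. Here’s a minimal sketch of how it could be implemented with the AWS SDK for Java: isResized() polls the volume modification state, and checkVolumes() lists the core nodes of the cluster, looks for an undersized attached volume, starts the modification, and hands the volume ID and node IP to a MyDynamoDB.putVolumeInfo() helper (sketched in the next section). The environment variable names are my assumptions, not necessarily what the original project uses.

package com.gokhanatil.volumeresizer;

import com.amazonaws.services.ec2.AmazonEC2;
import com.amazonaws.services.ec2.AmazonEC2ClientBuilder;
import com.amazonaws.services.ec2.model.*;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduce;
import com.amazonaws.services.elasticmapreduce.AmazonElasticMapReduceClientBuilder;
import com.amazonaws.services.elasticmapreduce.model.Instance;
import com.amazonaws.services.elasticmapreduce.model.InstanceGroupType;
import com.amazonaws.services.elasticmapreduce.model.ListInstancesRequest;

public class VolumeChecker {

    // hypothetical environment variable names -- use whatever you defined for your function
    private static final String CLUSTER_ID = System.getenv("CLUSTER_ID");
    private static final int TARGET_SIZE = Integer.parseInt(System.getenv("VOLUME_SIZE"));

    private static final AmazonEC2 ec2 = AmazonEC2ClientBuilder.defaultClient();
    private static final AmazonElasticMapReduce emr = AmazonElasticMapReduceClientBuilder.defaultClient();

    // returns true when the pending modification of the volume has completed
    public static boolean isResized(String volumeId) {
        DescribeVolumesModificationsResult result = ec2.describeVolumesModifications(
                new DescribeVolumesModificationsRequest().withVolumeIds(volumeId));
        for (VolumeModification modification : result.getVolumesModifications()) {
            if (!"completed".equals(modification.getModificationState()))
                return false;
        }
        return true;
    }

    // finds an undersized volume on a core node, starts resizing it and records it in DynamoDB
    public static void checkVolumes() {
        for (Instance node : emr.listInstances(new ListInstancesRequest()
                .withClusterId(CLUSTER_ID)
                .withInstanceGroupTypes(InstanceGroupType.CORE)).getInstances()) {
            // note: this filter matches every EBS volume attached to the node,
            // including the root volume -- refine it if that's not what you want
            DescribeVolumesResult volumes = ec2.describeVolumes(new DescribeVolumesRequest()
                    .withFilters(new Filter("attachment.instance-id")
                            .withValues(node.getEc2InstanceId())));
            for (Volume volume : volumes.getVolumes()) {
                if (volume.getSize() < TARGET_SIZE) {
                    ec2.modifyVolume(new ModifyVolumeRequest()
                            .withVolumeId(volume.getVolumeId())
                            .withSize(TARGET_SIZE));
                    MyDynamoDB.putVolumeInfo(volume.getVolumeId(), node.getPrivateIpAddress());
                    return; // resize one volume per run
                }
            }
        }
    }
}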
As you can see, it also uses DynamoDB to keep track of modified volumes. I created a table called “resizedvolumes”; its primary partition key is defined as “clusterid” (Number). As mentioned, the function should be scheduled to run every 5 or 10 minutes. On every run, it checks whether any volume is being resized or requires repartitioning. If no volume requires repartitioning, it checks whether any volume is undersized. If it finds one, it starts resizing it and stores its information in DynamoDB.
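For completeness, here’s a matching sketch of the MyDynamoDB helper, built on the same DynamoDB Document API that the main class imports Item from. The DYNAMODB_TABLE variable name is my assumption, and since the partition key is numeric, this sketch simply hashes the EMR cluster ID to get a number; that is an assumption, not necessarily how the original handles it.

package com.gokhanatil.volumeresizer;

import com.amazonaws.services.dynamodbv2.AmazonDynamoDBClientBuilder;
import com.amazonaws.services.dynamodbv2.document.DynamoDB;
import com.amazonaws.services.dynamodbv2.document.Item;
import com.amazonaws.services.dynamodbv2.document.Table;

public class MyDynamoDB {

    // hypothetical environment variable holding the table name ("resizedvolumes")
    private static final Table table = new DynamoDB(AmazonDynamoDBClientBuilder.defaultClient())
            .getTable(System.getenv("DYNAMODB_TABLE"));

    // the partition key "clusterid" is a Number; hashing the cluster ID is just one
    // way to derive a numeric key (an assumption for this sketch)
    private static final int CLUSTER_KEY = System.getenv("CLUSTER_ID").hashCode();

    public static Item getVolumeInfo() {
        return table.getItem("clusterid", CLUSTER_KEY);
    }

    public static void putVolumeInfo(String volumeId, String privateIp) {
        table.putItem(new Item().withPrimaryKey("clusterid", CLUSTER_KEY)
                .withString("volid", volumeId)
                .withString("pip", privateIp));
    }

    public static void deleteVolumeInfo() {
        table.deleteItem("clusterid", CLUSTER_KEY);
    }
}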
The name of the Lambda handler is “com.gokhanatil.volumeresizer.Resizer”. After you build the JAR, you upload it to an S3 bucket and create the Lambda function. The Lambda function expects you to define some environment variables. To be able to connect to the EMR nodes, it needs access to your private key. You can upload your private key to an S3 bucket and give the bucket name and file name as parameters. You also need to give the name of the DynamoDB table. The other required variables are the cluster ID, the AWS region, and the target volume size. I think their names explain what they are used for.
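To give a rough idea of how the function can read those variables and fetch the private key, here’s a small sketch that downloads the key from S3 into /tmp (the only writable path inside Lambda) before the SSH step. The KEY_BUCKET and KEY_FILE variable names, and the class itself, are assumptions for illustration.

package com.gokhanatil.volumeresizer;

import com.amazonaws.services.s3.AmazonS3;
import com.amazonaws.services.s3.AmazonS3ClientBuilder;
import com.amazonaws.services.s3.model.GetObjectRequest;

import java.io.File;

public class KeyDownloader {

    // hypothetical environment variable names for the key's bucket and file name
    private static final String KEY_BUCKET = System.getenv("KEY_BUCKET");
    private static final String KEY_FILE = System.getenv("KEY_FILE");

    // downloads the private key to /tmp and returns its local path
    public static String downloadKey() {
        AmazonS3 s3 = AmazonS3ClientBuilder.defaultClient();
        File target = new File("/tmp/" + KEY_FILE);
        if (!target.exists()) {
            s3.getObject(new GetObjectRequest(KEY_BUCKET, KEY_FILE), target);
        }
        return target.getAbsolutePath();
    }
}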
When creating the Lambda function, you need to specify a VPC, a subnet, and a security group, so you can configure the security groups of the EMR nodes to accept connections from your Lambda function. You can select the VPC, subnet, and security group used by the EMR master node.
As I said, you need to schedule it to run periodically. You can use a “CloudWatch Events” trigger and make it run every 5 or 10 minutes (whatever you want).
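If you prefer to set the trigger up programmatically instead of through the console, something along these lines should work with the AWS SDK for Java; the rule name and the Lambda ARN below are placeholders. Keep in mind that the function’s resource policy also has to allow events.amazonaws.com to invoke it (the console adds that permission for you when you attach the trigger).

import com.amazonaws.services.cloudwatchevents.AmazonCloudWatchEvents;
import com.amazonaws.services.cloudwatchevents.AmazonCloudWatchEventsClientBuilder;
import com.amazonaws.services.cloudwatchevents.model.PutRuleRequest;
import com.amazonaws.services.cloudwatchevents.model.PutTargetsRequest;
import com.amazonaws.services.cloudwatchevents.model.Target;

public class ScheduleTrigger {

    public static void main(String[] args) {
        AmazonCloudWatchEvents events = AmazonCloudWatchEventsClientBuilder.defaultClient();

        // create (or update) a rule that fires every 10 minutes
        events.putRule(new PutRuleRequest()
                .withName("volume-resizer-schedule")
                .withScheduleExpression("rate(10 minutes)"));

        // point the rule at the Lambda function -- replace the ARN with your own
        events.putTargets(new PutTargetsRequest()
                .withRule("volume-resizer-schedule")
                .withTargets(new Target()
                        .withId("volume-resizer")
                        .withArn("arn:aws:lambda:eu-west-1:123456789012:function:volumeresizer")));
    }
}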
Here’s the policy of the IAM role I assigned to my Lambda function:
{ "Version": "2012-10-17", "Statement": [ { "Sid": "VisualEditor0", "Effect": "Allow", "Action": [ "ec2:ModifyVolume", "elasticmapreduce:ListInstances", "dynamodb:PutItem", "dynamodb:DeleteItem", "ec2:DescribeVolumes", "dynamodb:GetItem", "ec2:DescribeVolumesModifications", "logs:*" ], "Resource": "*" }, { "Sid": "VisualEditor1", "Effect": "Allow", "Action": "s3:GetObject", "Resource": "arn:aws:s3:::yourbucketname/*" } ] }
I hope it helps. If you have any questions about the sample application, let me know so I can explain it in more detail.