HADOOP: “How to share Limited Storage of Datanode to the Namenode in Hadoop Distributed Storage Cluster?”

Shobhit Sharma
9 min read · Oct 16, 2020


Hadoop is an open-source framework that provides a distributed storage cluster to solve Big Data problems such as Volume and Velocity. Broadly, the term Big Data describes a very large amount, or high volume, of data.

In the Hadoop architecture, there is one Namenode (Master), one or more Datanodes (Slaves), and possibly multiple clients. The Datanode(s) share their storage with the Namenode, forming a powerful distributed storage cluster in which large amounts of data can be stored easily and quickly. Hadoop stripes the data into blocks and, at the same time, sends those blocks to different nodes. As a result, it addresses the two major Big Data problems: Volume and Velocity.

By default, a Datanode shares all of its available storage with the Master. For example, if a Datanode has a 100 GB hard disk, then apart from the storage reserved for the operating system and user data files, it will share all of its remaining space with the Namenode. If the OS reserves 25 GB out of 100 GB, roughly 75 GB is shared with the Master. There is a way around this: instead of sharing all 75 GB, we can share a limited, custom-sized amount of space with the Namenode.

How to share limited storage to Master (Namenode) in Hadoop?

On the operating system where the Datanode (Slave) is running, we need to do some configuration to solve this. The solution to this challenge is a hard disk partition.

If we want to share an amount "n1" of space with the Namenode, out of "n" of free space available, we need to create a partition of size "n1".

In the initial configuration of a Hadoop Datanode, we edit a file called "hdfs-site.xml". In this file, we add a property that tells Hadoop that this machine is a Datanode and which directory it should use for Hadoop data storage, to serve files later. In short, the purpose of this property is to configure the folder (directory) that will be shared with the Hadoop distributed storage cluster.

In “hdfs-site.xml” the property looks like

<configuration>
  <property>
    <name>dfs.data.dir</name>
    <value>/shobhitDataNode</value>
  </property>
</configuration>

Inside the <name></name> tag, "dfs.data.dir" indicates that this is the distributed file system (dfs) setting for the Datanode (data) and that it uses the configured directory (dir).

Inside the <value></value> tag, "/shobhitDataNode" is the directory that the "dfs.data.dir" property refers to.

<property></property> uses a name-value pair to describe a property under <configuration></configuration>.

The directory "/shobhitDataNode" is a folder created in the root directory of the operating system. Initially, this directory uses the complete available space of the system where the Datanode is running.

The conclusion: if the system has 80 GB free out of 100 GB, then the folder "/shobhitDataNode" will share all of that free space with the Namenode.
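For reference, newer Hadoop releases (2.x and later) renamed this property to "dfs.datanode.data.dir", keeping the old name as a deprecated alias. A complete "hdfs-site.xml" for the Datanode might look like this (a minimal sketch, using the directory from this example):

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Directory the Datanode uses for block storage; all free space
       under this path is offered to the cluster. -->
  <property>
    <name>dfs.data.dir</name>
    <value>/shobhitDataNode</value>
  </property>
</configuration>
```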

Creating Partition to share limited storage instead of complete available free space

In short, partitions divide a hard disk into separate blocks that can be used as required: for example, installing different operating systems on a single drive, or storing sensitive files in a separate partition rather than the OS partition, so that they are not lost if the OS becomes corrupted.

CREATING PARTITION

I have Red Hat Enterprise Linux 8 running on the AWS EC2 service. This OS has two hard drives: the first is 10 GB in size and the second is 1 GB. For this operation, I will use the second hard drive, which is 1 GB in size. (You can use any hard drive; there are no restrictions.)

Steps to create partition

1. In terminal or console, we need to first run a command to check available hard drive(s) or partition(s).

Note: Make sure that the current user is “root”

fdisk -l

The output will be

Disk /dev/xvda: 10 GiB, 10737418240 bytes, 20971520 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disklabel type: gpt
Disk identifier: 66B3909F-969E-4FD1-901C-CEE3A9974A83

Device       Start      End  Sectors Size Type
/dev/xvda1    4096 20971486 20967391  10G Linux filesystem
/dev/xvda128  2048     4095     2048   1M BIOS boot

Partition table entries are not in disk order.

Disk /dev/xvdf: 1 GiB, 1073741824 bytes, 2097152 sectors
Units: sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes

In this output, two disks are available. I will use the disk "/dev/xvdf", which is 1 GB in size.

2. Now, we need to select the second disk using the following command

fdisk /dev/xvdf

The output will be

Welcome to fdisk (util-linux 2.30.2).
Changes will remain in memory only, until you decide to write them.
Be careful before using the write command.

Device does not contain a recognized partition table.
Created a new DOS disklabel with disk identifier 0x6d9d5e0e.

Command (m for help): n
Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p
Partition number (1-4, default 1): 1
First sector (2048-2097151, default 2048): 2048
Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151, default 2097151): +500M

Created a new partition 1 of type 'Linux' and of size 500 MiB.

Command (m for help): w
The partition table has been altered.
Calling ioctl() to re-read partition table.
Syncing disks.

This output contains some important steps, which I will walk through one by one.

First is

Command (m for help): n

Here I entered "n" to create a new partition.

After that

Partition type
   p   primary (0 primary, 0 extended, 4 free)
   e   extended (container for logical partitions)
Select (default p): p

Here I entered "p" to choose the partition type "primary".

After that

Partition number (1-4, default 1): 1

In this command, I’ve entered “1” to create first partition of this disk.

After that

First sector (2048-2097151, default 2048): 2048

Here I entered "2048" as the first (starting) sector; this is discussed below.

After that

Last sector, +sectors or +size{K,M,G,T,P} (2048-2097151, default 2097151): +500M

Here I entered "+500M" to create a partition of size 500 MB.

And finally

Command (m for help): w

I’ve entered “w” to write or apply all the changes I’ve made in this Hard disk operation.

The first sector defines where the partition starts. In this case it is 2048 because sectors 0 to 2047 are reserved by the disk for the partition table and alignment. The last sector is given as "+500M", which means the first partition extends 500 MB from sector 2048.
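The sector arithmetic above can be checked directly in the shell. With 512-byte sectors, "+500M" spans 1,024,000 sectors, so a partition starting at sector 2048 ends at sector 1026047 (a small sketch using shell arithmetic):

```shell
# 512-byte sectors; "+500M" means 500 MiB measured from the first sector.
first=2048
size_bytes=$((500 * 1024 * 1024))      # 524288000 bytes
size_sectors=$((size_bytes / 512))     # 1024000 sectors
last=$((first + size_sectors - 1))     # inclusive end sector
echo "sectors=$size_sectors last=$last"  # prints: sectors=1024000 last=1026047
```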

FORMATTING THE PARTITION

Before formatting, we need to check the partition name. The following command shows the partition sizes in human-readable form.

lsblk

This will show

NAME    MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
xvda    202:0    0   10G  0 disk
└─xvda1 202:1    0   10G  0 part /
xvdf    202:80   0    1G  0 disk
└─xvdf1 202:81   0  500M  0 part

This output tells us that xvdf is a 1 GB disk containing an empty partition named xvdf1 of size 500M, i.e., 500 MB.

To format the partition, we need to enter the following command

mkfs.ext4 /dev/xvdf1

This command formats the partition "/dev/xvdf1" of the hard drive /dev/xvdf. Here, ext4 is the filesystem type.

The output will be

mke2fs 1.42.9 (28-Dec-2013)
Filesystem label=
OS type: Linux
Block size=1024 (log=0)
Fragment size=1024 (log=0)
Stride=0 blocks, Stripe width=0 blocks
128016 inodes, 512000 blocks
25600 blocks (5.00%) reserved for the super user
First data block=1
Maximum filesystem blocks=34078720
63 block groups
8192 blocks per group, 8192 fragments per group
2032 inodes per group
Superblock backups stored on blocks:
        8193, 24577, 40961, 57345, 73729, 204801, 221185, 401409

Allocating group tables: done
Writing inode tables: done
Creating journal (8192 blocks): done
Writing superblocks and filesystem accounting information: done

THE FINAL STEP : MOUNT

Now the real work starts. We already know that the Hadoop Datanode uses a folder (directory) and shares all of that folder's available storage with the Namenode, i.e., the Hadoop cluster.

After creating and formatting the partition, the third and final step is mounting it. To use a partition for any purpose, we must mount it on some folder. In a storage system, no file operation can happen without a directory, and a mounted filesystem is accessed through a folder.

For example, if I create a 10 GB partition named /dev/xabc1 and format it, I can mount it on a folder called "/shobhit10GB". This is just an example folder.

According to this example, this operation will be completed by the following commands.

To create a folder, we need to run this command

mkdir /shobhit10GB

and after creating it, the command to mount the 10 GB partition is

mount /dev/xabc1 /shobhit10GB

This command mounts the 10 GB partition on the folder "/shobhit10GB".

The conclusion: the folder /shobhit10GB now has a size of 10 GB.

Now for the real operation: we need to mount the 500 MB partition on the folder "/shobhitDataNode". The important thing to remember is that this same folder is used by the Hadoop Datanode. After mounting the partition on this folder, the Datanode will share only 500 MB with the Namenode, the Master of the Hadoop cluster.

The command to mount the following partition to the following folder is

mount /dev/xvdf1 /shobhitDataNode

That's all. The partition is successfully mounted on the Hadoop Datanode directory, which means the overall capacity of the Datanode directory is now 500 MB.

To check the folder is successfully mounted, we need to run this command

df -h

The output will be

Filesystem      Size  Used Avail Use% Mounted on
devtmpfs        474M     0  474M   0% /dev
tmpfs           492M     0  492M   0% /dev/shm
tmpfs           492M  436K  492M   1% /run
tmpfs           492M     0  492M   0% /sys/fs/cgroup
/dev/xvda1       10G  2.2G  7.9G  22% /
tmpfs            99M     0   99M   0% /run/user/1000
/dev/xvdf1      477M  2.3M  445M   1% /shobhitDataNode

Great, This partition is successfully mounted.
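One caveat worth noting: a mount made with the mount command does not survive a reboot. To make it permanent, the partition can be added to /etc/fstab (a sketch, assuming the device and mount point from this example):

```shell
# Append a persistent mount entry for this example's partition (run as root).
echo '/dev/xvdf1 /shobhitDataNode ext4 defaults 0 0' >> /etc/fstab
mount -a   # mounts everything in /etc/fstab that is not already mounted
```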

VERIFYING THE OPERATION

After all these operations, we finally need to verify whether the Datanode shares 500 MB of storage. For this we need to start the Datanode and connect it to the Namenode.

To start datanode, we need to run the following command

hadoop-daemon.sh start datanode

The output will be

starting datanode, logging to /var/log/hadoop/root/hadoop-root-datanode-ip-172-31-20-10.us-east-2.compute.internal.out

After that, there is one more optional but recommended step: check whether the Datanode is running and configured successfully. To do this, run a single command

jps

The output will be

3636 Jps
1961 DataNode

If the DataNode service is listed, the Datanode is working fine.

Now, we need to go to the next step

CHECKING THE HADOOP CLUSTER STATUS

After starting the Datanode, we need to check the cluster status by running this command on the Namenode

hadoop dfsadmin -report

The output will be

Configured Capacity: 0 (0 KB)
Present Capacity: 0 (0 KB)
DFS Remaining: 0 (0 KB)
DFS Used: 0 (0 KB)
DFS Used%: ?%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 0 (0 total, 0 dead)

[root@ip-172-31-10-143 ~]# hadoop dfsadmin -report
Configured Capacity: 499337216 (476.21 MB)
Present Capacity: 466568192 (444.95 MB)
DFS Remaining: 466548736 (444.94 MB)
DFS Used: 19456 (19 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 1 (1 total, 0 dead)

Name: 18.216.147.60:50010
Decommission Status : Normal
Configured Capacity: 499337216 (476.21 MB)
DFS Used: 19456 (19 KB)
Non DFS Used: 32769024 (31.25 MB)
DFS Remaining: 466548736 (444.94 MB)
DFS Used%: 0%
DFS Remaining%: 93.43%
Last contact: Thu Oct 15 11:46:38 UTC 2020

Great, it works. We can see that one Datanode is connected to the Namenode, and the size of the shared storage is 476 MB (out of the 500 MB partition).
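The reported figure is consistent: 499337216 bytes is about 476.21 MiB, slightly less than the raw 500 MiB partition because the ext4 filesystem keeps some space for its own metadata. A quick check:

```shell
# "Configured Capacity" from the report, converted bytes -> MiB.
configured=499337216
awk -v b="$configured" 'BEGIN { printf "%.2f MiB\n", b / (1024 * 1024) }'
# prints: 476.21 MiB
```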

We can also check the WebUI of this Namenode to confirm again

The Address should be “<public_ip_of_namenode>:50070”

For example “34.3X.XX.1XX:50070”


Finally, using the partitioning method, we can share limited, customized storage with the Namenode instead of sharing the complete available space.

This Article is originally written, edited and published by Shobhit Sharma.
