Memory Consumption of Hadoop NameNode

Each file, directory, or block occupies about 150 bytes of NameNode memory. Each file also takes up at least one block, so a file effectively consumes about 300 bytes; assuming 3x replication, that comes to roughly 900 bytes per file. A NameNode with 32 GB of RAM can therefore support a maximum of about 38 million files (assuming the NameNode is the bottleneck).
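As a rough illustration (a back-of-envelope sketch using the approximate figures above, not a sizing tool), the 38 million estimate can be reproduced in a few lines of Python:

```python
# Back-of-envelope estimate of how many files a 32 GB NameNode heap can track,
# using the simplified accounting from this post: ~150 bytes per namespace
# object, one block per file, and the 300 bytes counted once per replica.
BYTES_PER_OBJECT = 150                # approx. metadata cost of a file, directory, or block
replication = 3                       # assumed HDFS replication factor
bytes_per_file = 2 * BYTES_PER_OBJECT * replication   # file entry + block entry, x3 replicas = 900 bytes

heap_bytes = 32 * 1024**3             # 32 GB of NameNode heap
max_files = heap_bytes // bytes_per_file
print(f"~{max_files / 1e6:.0f} million files")        # ~38 million files
```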

In practice, however, the number will be much lower, because not all of the 32 GB will be available to the NameNode for holding this mapping. You can raise the limit by allocating more heap space to the NameNode process on that machine (typically via the NameNode JVM options in hadoop-env.sh).

Replication also affects this, but to a lesser degree: each additional replica adds only about 16 bytes to the memory requirement for a block.

(File metadata = 150 bytes) + (block metadata for that file = 150 bytes) = 300 bytes per file. So 1 million files, each with 1 block, consume 300 × 1,000,000 = 300,000,000 bytes = 300 MB at a replication factor of 1. With a replication factor of 3, this grows to roughly 900 MB.

So, as a rule of thumb, every 1 GB of NameNode memory lets you store roughly 1 million files.
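The same accounting can be wrapped in a small helper; the function name and defaults below are only illustrative, but the arithmetic matches the 300 MB / 900 MB figures above:

```python
# Memory needed to track a given number of files, using the same simplified
# per-object figures (150 bytes per file entry and per block entry).
def namenode_memory_bytes(num_files: int, replication: int, bytes_per_object: int = 150) -> int:
    # One file entry + one block entry per file, counted once per replica.
    return num_files * 2 * bytes_per_object * replication

print(namenode_memory_bytes(1_000_000, replication=1) / 1e6, "MB")  # 300.0 MB
print(namenode_memory_bytes(1_000_000, replication=3) / 1e6, "MB")  # 900.0 MB
```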

There are several technical limits on the NameNode (NN), and hitting any of them will limit your scalability.

  1. Memory — the NameNode consumes about 150 bytes per block.
  2. IO — the NN performs one IO operation for each change to the filesystem (create file, delete block, etc.), so its local disks must be able to keep up. How much IO capacity you need is harder to estimate, but since memory already limits the number of blocks, you will not hit this limit unless your cluster is very big.
  3. CPU — the NameNode carries a considerable load keeping track of the health of all blocks on all DataNodes. Each DataNode periodically reports the state of all of its blocks. Again, unless the cluster is very big, this should not be a problem (a rough estimate of this load is sketched after this list).
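To get a feel for the CPU point, here is a hedged sketch of the block-report rate the NameNode has to absorb. The cluster size, blocks per DataNode, and report interval are illustrative assumptions (the interval roughly matches the common dfs.blockreport.intervalMsec default of six hours), not measurements:

```python
# Rough, illustrative estimate of the block-report load on the NameNode.
datanodes = 200                        # assumed number of DataNodes
blocks_per_datanode = 500_000          # assumed blocks stored on each DataNode
report_interval_s = 6 * 3600           # one full block report every 6 hours (assumed)

blocks_per_second = datanodes * blocks_per_datanode / report_interval_s
print(f"~{blocks_per_second:,.0f} block records per second")   # ~4,630 with these numbers
```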

--

Naveen - (Founder & Trainer @ NPN Training)

A software training institute which believes that technology has to be learnt under experienced practitioners — www.npntraining.com