Naveen - (Founder & Trainer @ NPN Training) | Memory Consumption of Hadoop NameNode | Each file, directory, or block occupies about 150 bytes of NameNode memory. So a cluster whose NameNode has 32G RAM can support a… | Aug 8, 2022
Naveen - (Founder & Trainer @ NPN Training) | Various Entry Points for Apache Spark | In **Data Engineering**, Apache Spark is probably one of the most popular frameworks for processing huge volumes of data. In this blog post I am… | Sep 27, 2021
Naveen - (Founder & Trainer @ NPN Training) | Apache Spark Architecture | Learn the fundamentals of Apache Spark architecture and its various components: Master Node, Worker Node, Executor, Task, etc. | Apr 2, 2021
Naveen - (Founder & Trainer @ NPN Training) | SparkContext & SparkSession | Learn the difference between SparkContext and SparkSession, the entry points used in Spark 1.x and 2.x respectively, and how to create both of them. | Mar 19, 2021
Naveen - (Founder & Trainer @ NPN Training) | Getting Started with Containerization and Docker | In this blog post we will learn the fundamentals of Containerization and Docker. | Mar 4, 2021
Naveen - (Founder & Trainer @ NPN Training) | Data Skew Problem in Spark | Learn how to solve one of the biggest problems in improving Spark performance, i.e. data skew. | Feb 25, 2021
Naveen - (Founder & Trainer @ NPN Training) | Flight delay dataset Analysis using Hive | Hive is a data warehouse infrastructure built on top of the Hadoop ecosystem to query and analyze structured data. It gives SQL like… | Feb 25, 2021
Naveen - (Founder & Trainer @ NPN Training) | Getting Started with Sqoop — Part I | Most organizations store their operational data in relational databases. So, there was a need for a tool which can import and… | Feb 24, 2021
Naveen - (Founder & Trainer @ NPN Training) | Multi Module Project with Maven | In this blog post I will explain Multi Module Project with Apache Maven. First we will understand what a Multi Module Project in Maven is. | Feb 19, 2021