Big Data

Learning Apache Spark

Apache Spark is an open-source cluster-computing framework. Apart from Spark Core, we will also learn about its components such as Spark SQL, Spark Streaming, MLlib and GraphX. Though Spark can be used with Hadoop, it can also run without it, so knowledge of Hadoop is not required for most notes.

Getting Started With Apache Hadoop And Ecosystem

Get started learning about Hadoop and its ecosystem components through simple theory and hands-on exercises.

Big Data and Data Science Notes

Here I will include notes on Big Data and Data Science concepts in general. There will be separate books on specific technologies like Hadoop.  

Apache Zookeeper Notes

Apache ZooKeeper is a software project of the Apache Software Foundation that provides an open-source distributed configuration service, synchronization service, and naming registry for large distributed systems.

Apache Kafka Notes

Apache Kafka is an open-source, publish-subscribe-based distributed messaging system. From an architectural perspective, Kafka is closer to traditional messaging systems such as ActiveMQ or RabbitMQ. However, from a Big Data and Hadoop perspective, Kafka can be compared with Scribe or Flume, as it is useful for processing activity stream data.
