Flume @ Goibibo

05 May 2015


We, at Goibibo,  rely heavily on open source technologies to manage our systems. One such technology is Apache Flume (https://flume.apache.org/).  Flume is a distributed system for collecting, aggregating and moving large amounts of data from different sources to a centralized location.  Flume can be used for transferring log data, social media data (e.g. tweets) and pretty much any data that we have.


Without going much into details about Flume, I will talk about how we use it at Goibibo and how it plays an integral part in our log aggregation, system health monitoring and critical alerting.

We have approximately 35 different application servers from which data is transferred to our 13 hadoop machines and finally to various locations. A high level diagram would be something like this:


 We stream various forms of log data from our application servers. We aggregate the following logs using flume:

  1. Apache logs
  2. uwsgi Logs
  3. Celery Logs
  4. Custom Application Logs


 As explained in the flow diagram above, we stream all these logs from our application servers to the data nodes of our hadoop cluster. To handle fail over, we use fail over sink in Flume and transfer data to multiple data nodes at a time with a set priority. If, in case, a particular data node fails, the other one automatically becomes active and starts collecting data from application servers. One thing to note here is that one application server pushes data to only one data node at any point of time. This data is collected pretty much real time across all the machines.


 Once the data is collected at the data nodes, we finally route it to various places as per different use cases.


Here are some stats regarding the number of logs we transfer and index using flume. These numbers are not as great as what Google or Linkedin deals with, but provides a good insight into the scale we are managing with a mere 13 node hadoop cluster.

Total Logs Transferred in a day: ~150 million
Logs inserted into MongoDB in a second: ~500
Total Logs Indexed in a day: ~50 million

What next 

In the next article I will share further details about our flume agents and how they are connected to each other. I will also share some workarounds which we had to do to make the systems work in a smooth and failure resistant way.