Data Analytics with Elasticsearch, Logstash and Kibana (ELK)

Jun 21, 2018

The ELK stack is a combination of three open source projects that scale nicely and work together seamlessly:

  • Elasticsearch: first released in 2010 (the company behind it was founded in 2012), commercially supported open source, built on top of Lucene, uses JSON and has a rich API
  • Logstash: around since 2009, as a way to stash logs
  • Kibana: around since 2011, to visualize event data

ELK is mostly used for log analysis and end-to-end Big Data analytics. This mini tutorial shows how to set up the ELK stack so that you can build your own solution on top of it.

ELK Stack Installation steps

  1. Go to the official website https://www.elastic.co/downloads and download the three products (Elasticsearch, Logstash and Kibana) into a separate directory.
  2. Extract all three downloads. This tutorial uses Windows 10 as the host OS.
  3. To start Elasticsearch

    • Go to <<Elasticsearch>>/bin and run elasticsearch.bat as an administrator.
    • After the Elasticsearch server has started, open http://localhost:9200 in a browser to confirm that it is running.
  4. To start Kibana

    • Go to <<Kibana>>/bin and run kibana.bat as an administrator.
    • After the Kibana server has started, open http://localhost:5601 in a web browser to confirm that it is running.
  5. To start Logstash

    • Go to the bin directory of Logstash, open a command prompt as an administrator and run the following (double quotes are used because the Windows command prompt does not treat single quotes as quoting characters):

      logstash -e "input { stdin { } } output { stdout {} }"
    • When the main pipeline starts (“Pipeline main started”), type any message in the command prompt.
    • If everything is working, Logstash echoes your message back with a timestamp and the host name added.
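
The two browser checks in steps 3 and 4 can also be scripted. The following is a minimal Python sketch, not part of the Elastic tooling: it assumes the default ports used in this tutorial (9200 and 5601), and the helper names `is_up` and `check_all` are made up for illustration.

```python
import http.client
import urllib.request

# Default local endpoints used in this tutorial.
SERVICES = {
    "elasticsearch": "http://localhost:9200",
    "kibana": "http://localhost:5601",
}


def is_up(url, timeout=3):
    """Return True if an HTTP GET to url ends with a 2xx/3xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 200 <= response.status < 400
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, HTTP error status, malformed reply, ...
        return False


def check_all(probe=is_up):
    """Map each service name to True/False using the given probe function."""
    return {name: probe(url) for name, url in SERVICES.items()}


if __name__ == "__main__":
    for name, ok in check_all().items():
        print(f"{name}: {'running' if ok else 'not reachable'}")
```

A service that is still starting up is reported as not reachable, so it may be worth running the script again after a few seconds.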

Architectural description of ELK stack

ELK Stack Architecture

As the architecture above shows, Logstash collects raw data from various sources such as HDFS, logs (system logs, HTTP logs, proxy logs, etc.), Twitter streams and MySQL, and sends it on for further processing. Let’s look at each component of the ELK stack in turn.

  1. Elasticsearch– Elasticsearch is a highly scalable, distributed, near real-time search engine that is mostly used for indexing and analysing data.

    • It uses the Lucene engine for fast searching and indexing.
    • It supports full-text search.
    • Elasticsearch is a document-oriented store: data is kept as JSON documents rather than as rows in tables.
    • Elasticsearch runs in cluster mode, and the data is distributed across the nodes.

      Comparison between Elasticsearch and a relational database (RDBMS):

      Elasticsearch    RDBMS
      -------------    --------
      Index            Database
      Shard            Shard
      Mapping          Table
      Field            Field
      JSON Object      Tuple

    • An “index” in Elasticsearch is a collection of documents and document properties. When data is pushed to Elasticsearch, it is stored in Lucene indexes, which Elasticsearch then uses for read and write operations.
    • To create an index, send a PUT request to http://localhost:9200/index_name
    • You can search your data with a GET request to http://localhost:9200/index_name/_search, as shown in the screenshot below. [screenshot]
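
The index-creation and search calls described above can be sketched in Python with only the standard library. This is an illustrative sketch rather than an official client: it assumes Elasticsearch is listening on the default http://localhost:9200, the field name `message` is hypothetical, and the search is sent as a POST with a JSON body (the `_search` endpoint accepts a request body as well as a plain GET).

```python
import json
import urllib.request

BASE_URL = "http://localhost:9200"  # default local Elasticsearch address


def create_index_request(index_name):
    """Build (but do not send) the PUT request that creates an index."""
    return urllib.request.Request(f"{BASE_URL}/{index_name}", method="PUT")


def search_request(index_name, field, text):
    """Build (but do not send) a full-text `match` search on one field."""
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        f"{BASE_URL}/{index_name}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON reply."""
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Requires a running Elasticsearch; "message" is a hypothetical field.
    print(send(create_index_request("index_name")))
    print(send(search_request("index_name", "message", "error")))
```

Building the request separately from sending it makes it easy to inspect the URL, method and body first. Note that creating an index that already exists makes Elasticsearch answer with an error status, which `urlopen` raises as an `HTTPError`.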
  2. Logstash– As shown in the architectural diagram above:

    • Logstash collects logs and events from various sources such as HDFS, MySQL, logs (system logs, application logs, network logs) and Twitter.
    • It transforms the data and sends it to Elasticsearch.
    • To do this, Logstash uses a number of input, filter and output plugins. It transforms the raw data according to the filters specified in its configuration file.
    • Here is an example of a Logstash configuration file: [screenshot_2]
    • The file above specifies the input location, the output location and the filter that is applied to the data being processed.
    • The following command starts Logstash with the configuration file: [screenshot_3]

    As shown above, Logstash has started the pipeline to Elasticsearch and has begun sending the parsed data to it. To visualize the data, we use Kibana, the visualization tool.
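
Since the configuration in the screenshot is not reproduced as text, here is a hedged Python sketch that writes out a pipeline file with the same three parts (input, filter, output). The specifics are assumptions, not the author's actual file: it uses the `file` input, a `grok` filter with the stock `COMBINEDAPACHELOG` pattern and the `elasticsearch` output, and the log path and file name `pipeline.conf` are hypothetical.

```python
from pathlib import Path


def render_pipeline_config(log_path, es_host="localhost:9200",
                           index_name="index_name"):
    """Return Logstash pipeline text with one input, one filter, one output."""
    lines = [
        "input {",
        "  file {",
        f'    path => "{log_path}"',           # where the raw logs are read from
        '    start_position => "beginning"',
        "  }",
        "}",
        "filter {",
        "  grok {",
        # Split each raw line into named fields using a stock grok pattern.
        '    match => { "message" => "%{COMBINEDAPACHELOG}" }',
        "  }",
        "}",
        "output {",
        "  elasticsearch {",
        f'    hosts => ["{es_host}"]',         # where the events are sent
        f'    index => "{index_name}"',
        "  }",
        "}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical log location; forward slashes are used in the path.
    config = render_pipeline_config("C:/logs/access.log")
    Path("pipeline.conf").write_text(config, encoding="utf-8")
    print("Start Logstash with: logstash -f pipeline.conf")
```

The `-f` option tells Logstash which configuration file to load, so the generated file can be passed to it directly from the Logstash bin directory.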

  3. Kibana– Kibana is an open source visualization tool that provides a web interface for visualizing Elasticsearch data.

    • Kibana lets us create real-time dashboards in a browser-based interface.
    • Kibana offers different visualization types such as bar charts, graphs, pie charts, maps and tables.
    • It lets you save, edit, delete and share dashboards.
    • After running kibana.bat, open http://localhost:5601 in a browser and go to the Management view, as in the screenshot below: [screenshot_4]
    • There, select your “Index_name” to start working with that index.
    • The Discover option lets you see the data, as shown in the screenshot below: [screenshot_5]
    • The Dashboard option lets you create your own dashboard, which can contain multiple visuals, as in the screenshot below: [graph]
    • The Kibana “Dev Tools” option helps you interact with Elasticsearch data, for example to search the records of an index, as the screenshot below shows: [screenshot_6]
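
The Dev Tools console takes a request as a method and path on the first line, followed by an optional JSON body. The Python sketch below formats such a request so it can be pasted into the console. It is only an illustration: the helper names are invented, the field `response.keyword` is hypothetical, and the `terms` aggregation is used as an example of the kind of per-value count that typically sits behind a bar or pie chart.

```python
import json


def terms_aggregation(field, size=10):
    """Search body that counts documents per distinct value of field."""
    return {
        "size": 0,  # return only the aggregation, no individual hits
        "aggs": {"by_value": {"terms": {"field": field, "size": size}}},
    }


def console_request(method, path, body=None):
    """Format a request as a method/path line plus an optional JSON body."""
    first_line = f"{method} {path}"
    if body is None:
        return first_line
    return first_line + "\n" + json.dumps(body, indent=2)


if __name__ == "__main__":
    # "response.keyword" is a hypothetical field name.
    print(console_request("GET", "index_name/_search",
                          terms_aggregation("response.keyword")))
```

Keeping the body as a Python dictionary until the last step means the same query can be reused outside the console, for example in an HTTP request to the `_search` endpoint.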
  4. Elasticsearch-Hadoop– ES-Hadoop is a new distribution recently introduced by Elasticsearch, which allows people to work with Big Data and the Hadoop ecosystem seamlessly. It offers native support for Apache Spark, Streaming, Kafka, Hive, Storm, MapReduce, etc. ES-Hadoop has also been certified with partners such as CDH, MapR and HDP.

Use Cases or Examples of ELK Implementations

  1. Dell – Powering the search to put the customer first
  2. Facebook – Delivering a better help experience for over a billion users
  3. Microsoft – Providing search on Azure and powering Social Dynamics
  4. IBM – Providing the operational log analysis engine for Bluemix Apps
  5. Salesforce – Empowering businesses with log analysis for usage trends
  6. Accenture – Powering the search for the best client service
  7. Sprint – Analyzing 200 dashboards to search for better retail operations insight
  8. Symantec – Successfully switched from Solr to Elasticsearch with Elastic Support
  9. SunHotels – Scaling anomaly detection across 1000+ bookings a day with Elastic machine learning
  10. BBC – Unlocking yesterday's content for the future of media search

TatvaSoft is a software development company that has worked on a wide range of projects over time and offers Big Data analytics services and consultancy to clients from various industries. We have also delivered a project for the media and entertainment industry that used Elasticsearch functionality to support its goals and processes.

To learn more about that project, see: Digital Distribution Platform
