What is log analysis? Why do we need it?
In this article, we give an overview of the ELK stack.
We look at what the ELK stack and its components are, and what log analysis means.
Suppose you support hundreds of servers in production. When an issue occurs, it is hard to find out which server is affected, and narrowing down the actual error message among hundreds of servers is a tedious task. To solve this problem, we have the ELK Stack, which comprises Elasticsearch, Logstash and Kibana.
Normally, when you have hundreds of web servers, you want to see the web service logs, which give you specific information about those servers, but it is impractical to log in to each server and check its logs. A centralized mechanism for viewing the logs helps here. That is one scenario. The second scenario is performance analysis: if your application is getting slow and you want to find out what is going on, you need to analyse the logs.
These are some of the cases where log analysis is needed, and this is where the ELK stack fits in. Logs come in different formats. For example, an environment may contain different types of servers, such as database servers and application servers. Each type of server produces different kinds of logs, and from all these log sources we need to produce a structured and meaningful report. We need a centralized place to look at all the logs at once.
Efficiency is the key here. The process therefore involves collecting the data, cleaning it, converting it into a structured format, and analysing it.
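To make these four steps concrete, here is a minimal Python sketch. It is not part of the ELK stack itself. The sample log lines, the "timestamp level host message" layout and the function names are all assumptions made up for illustration.

```python
import re
from collections import Counter

# Collection: hypothetical raw lines, as if gathered from several servers.
RAW_LOGS = [
    "2024-01-05 10:00:01 ERROR web-01 Connection timed out",
    "2024-01-05 10:00:02 INFO web-02 Request served in 120ms",
    "",
    "2024-01-05 10:00:03 ERROR web-01 Connection timed out",
    "garbage line that does not match",
    "2024-01-05 10:00:04 WARN db-01 Slow query detected",
]

# Assumed line layout: "<date> <time> <LEVEL> <host> <message>".
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) "
    r"(?P<level>[A-Z]+) (?P<host>\S+) (?P<message>.*)$"
)

def parse_logs(lines):
    """Clean the raw lines and convert them into structured records."""
    records = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # cleaning: skip empty lines
        match = LINE_PATTERN.match(line)
        if match is None:
            continue  # cleaning: skip lines that do not fit the layout
        records.append(match.groupdict())  # structuring: one dict per line
    return records

def errors_per_host(records):
    """Analysis: count ERROR records for each host."""
    return Counter(r["host"] for r in records if r["level"] == "ERROR")

records = parse_logs(RAW_LOGS)
error_counts = errors_per_host(records)
print(error_counts)
```

With the sample data above, the count points to web-01 as the server producing errors, which is the kind of answer that is tedious to get by logging in to each machine.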
Elasticsearch is a search engine (or search server) and a NoSQL database. It uses indexes to search, which makes its search functionality very powerful.
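To give a feel for why an index makes searching fast, here is a much-simplified Python sketch of an inverted index, the kind of structure used by Apache Lucene, on which Elasticsearch is built. It is only an illustration and not how Elasticsearch is actually implemented. The sample documents and function names are made up.

```python
from collections import defaultdict

def build_inverted_index(documents):
    """Map each lowercase word to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in text.lower().split():
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the ids of documents that contain every word of the query."""
    tokens = query.lower().split()
    if not tokens:
        return set()
    result = set(index.get(tokens[0], set()))
    for token in tokens[1:]:
        result &= index.get(token, set())
    return result

# Hypothetical log messages keyed by document id.
DOCS = {
    1: "Connection timed out on web-01",
    2: "Disk usage high on db-01",
    3: "Connection refused on web-02",
}

index = build_inverted_index(DOCS)
print(search(index, "connection"))
print(search(index, "connection web-01"))
```

Because the index already knows which documents contain each word, a query only needs to look up its words instead of scanning every log line.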
Then we have Kibana, which visualizes the data stored in Elasticsearch.
You can create and manage dashboards using Kibana and customise graphs according to your business requirements. It is designed to show the health of the whole environment in one place.
Another component, Logstash, lets you ingest unstructured data from a variety of data sources, including system logs, website logs and application server logs. It also offers pre-built filters, so you can readily transform common data types, index them in Elasticsearch, and start querying without having to build custom data transformation pipelines. In short, Logstash makes it easy to load unstructured data from various sources and to apply the needed transformations with pre-built filters.
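The idea of chaining filters can be sketched in Python as below. This is only loosely analogous to Logstash filters and does not use Logstash's configuration syntax or API. The filter names, the key=value message layout and the 500 ms threshold are assumptions chosen for illustration.

```python
def parse_key_values(event):
    """Split 'key=value' pairs in the raw message into separate fields."""
    pairs = (p.split("=", 1) for p in event["message"].split() if "=" in p)
    return {**event, **dict(pairs)}

def convert_types(event):
    """Turn the duration from a string into an integer so it can be compared."""
    if "duration_ms" in event:
        event = {**event, "duration_ms": int(event["duration_ms"])}
    return event

def tag_slow_requests(event, threshold_ms=500):
    """Add a 'slow' tag when the request took longer than the threshold."""
    tags = list(event.get("tags", []))
    if event.get("duration_ms", 0) > threshold_ms:
        tags.append("slow")
    return {**event, "tags": tags}

# The filters run in order; each one receives the previous filter's output.
FILTERS = [parse_key_values, convert_types, tag_slow_requests]

def apply_filters(event, filters=FILTERS):
    for apply_filter in filters:
        event = apply_filter(event)
    return event

# Hypothetical unstructured web server log line.
event = apply_filters({"message": "path=/checkout status=200 duration_ms=730"})
print(event)
```

Once events carry fields such as duration_ms and tags, the performance scenario described earlier becomes a simple query for events tagged as slow.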
In addition, there is one more component used with the ELK stack: Filebeat. It is one of the best log file shippers available today. It is lightweight, supports SSL and TLS encryption, supports back pressure with a good built-in recovery mechanism, and is extremely reliable.
Filebeat is commonly used in live environments. Logstash then aggregates the logs, pulling data from various sources before pushing it down the pipeline, usually into Elasticsearch.
So keep in mind that Filebeat and Logstash are typically used in conjunction.
So let us look at the typical pipeline.
On the left side is Filebeat, which runs on each server and ships the logs to a specific location, namely Logstash. Filebeat and other sources ship data to Logstash, where it is processed and transformed in the pipeline and then forwarded to Elasticsearch. Elasticsearch receives the data from Logstash, indexes it for faster searching, and stores it in its NoSQL database. Once the data is indexed and stored in Elasticsearch, we use Kibana to visualize it in dashboards.
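The hand-off between these stages can be imitated in a few lines of Python. This is a toy model of the flow only: it does not talk to Filebeat, Logstash or Elasticsearch, and every name in it (ship, process, TinyStore, the sample hosts and log lines) is hypothetical.

```python
from collections import defaultdict

def ship(host, lines):
    """Shipper stage (the Filebeat role): tag each raw line with its host."""
    for line in lines:
        yield {"host": host, "raw": line}

def process(events):
    """Processing stage (the Logstash role): split raw text into fields."""
    for event in events:
        level, _, message = event["raw"].partition(" ")
        yield {"host": event["host"], "level": level, "message": message}

class TinyStore:
    """Storage stage (the Elasticsearch role): store and index documents."""

    def __init__(self):
        self.docs = []
        self.by_field = defaultdict(list)

    def index(self, doc):
        doc_id = len(self.docs)
        self.docs.append(doc)
        # Remember which documents hold each (field, value) pair.
        for field in ("host", "level"):
            self.by_field[(field, doc[field])].append(doc_id)

    def find(self, field, value):
        """Lookup used by the visualization stage (the Kibana role)."""
        return [self.docs[i] for i in self.by_field.get((field, value), [])]

# Hypothetical logs sitting on two servers.
SERVER_LOGS = {
    "web-01": ["ERROR Connection timed out", "INFO Request served"],
    "web-02": ["INFO Request served"],
}

store = TinyStore()
for host, lines in SERVER_LOGS.items():
    for doc in process(ship(host, lines)):
        store.index(doc)

errors = store.find("level", "ERROR")
print(errors)
```

Each stage only knows about the output of the stage before it, which is why the shipper can stay lightweight on every server while the heavier parsing and indexing happen centrally.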
Below is a sample Kibana dashboard. You can create your own dashboards similar to this one according to your business requirements.
In it, you can see the logs of many servers, with the IP addresses that have issues categorized in a pie chart.