Splunk: Bringing Big Data Analysis to the Rest of Us

Today’s IT departments must deal with enormous amounts of machine data. Splunk collects, indexes, and harnesses all the fast-moving machine data generated by enterprise applications and devices. This is a Big Data challenge that many organizations find intimidating, involving terabytes of information in many formats and from many sources. Many large enterprises now generate hundreds of gigabytes of such data daily without an IT staff large enough to match, which can leave the IT department swamped and the machine data under-utilized. Splunk aims to simplify the collection, sorting, and analysis of this Big Data so that it yields insightful results quickly, allowing ordinary companies and even non-technical employees to make the most of their data.

Splunk sometimes describes itself as a “search engine for machine data.” Like a search engine, Splunk sorts and indexes millions of items, retrieves results quickly, and is simple and intuitive to use. It collects machine data from applications, servers, and devices, whether physical, virtual, or in the cloud, without requiring adapters, connectors, or back-end databases. By cutting costs and expanding what is possible, Splunk already helps bring Big Data to the masses, but the platform goes further by simplifying the process and letting each user decide how much technical detail to work with.

Splunk was originally conceived to pinpoint system problems by indexing and searching log files. Problems that would otherwise be hidden in millions of log entries can be searched by timestamp, error code, user IP address, or any other available information. Searches can then be narrowed by clicking and toggling options in the results, so users don’t even need to know precise search syntax to refine them: they can sort by fields that Splunk generates automatically from the data and click to toggle those fields on and off. Error messages are also decoded and rendered in plain English. This simplified interface allows almost anybody to work with the company’s Big Data, while more advanced users can take deeper dives into more technical views. Splunk also simplifies monitoring and lets enterprises be proactive rather than reactive: dashboards display relevant values and trends as soon as users log on, and alerts fire when those values exceed defined thresholds.
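To make field-based searching and threshold alerts concrete, here is a minimal Python sketch of the general idea. It is not Splunk’s search language or API; the `key=value` log format, the field names (`ip`, `status`), the helper functions, and the threshold of one server error per minute are all hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical log format (an assumption): "<ISO timestamp> key=value key=value ..."
LOG_LINES = [
    "2012-05-01T10:00:01 ip=10.0.0.5 status=200 msg=ok",
    "2012-05-01T10:00:07 ip=10.0.0.9 status=500 msg=timeout",
    "2012-05-01T10:00:42 ip=10.0.0.9 status=500 msg=timeout",
    "2012-05-01T10:01:15 ip=10.0.0.5 status=404 msg=not_found",
    "2012-05-01T10:01:30 ip=10.0.0.9 status=500 msg=timeout",
]

FIELD_RE = re.compile(r"(\w+)=(\S+)")


def extract_fields(line):
    """Split a line into its timestamp plus any key=value pairs found in it."""
    timestamp, _, rest = line.partition(" ")
    fields = dict(FIELD_RE.findall(rest))
    fields["_time"] = timestamp
    return fields


def search(events, **criteria):
    """Return events whose fields match every key=value criterion given."""
    return [e for e in events if all(e.get(k) == v for k, v in criteria.items())]


def errors_per_minute(events):
    """Count server errors (5xx status) per minute, keyed by 'YYYY-MM-DDTHH:MM'."""
    return Counter(
        e["_time"][:16] for e in events if e.get("status", "").startswith("5")
    )


def check_alerts(counts, threshold):
    """Return the minutes whose error count exceeds the threshold."""
    return sorted(minute for minute, n in counts.items() if n > threshold)


events = [extract_fields(line) for line in LOG_LINES]
print(len(search(events, ip="10.0.0.9", status="500")))  # 3
print(check_alerts(errors_per_minute(events), threshold=1))  # ['2012-05-01T10:00']
```

Extracting fields once, up front, is what lets later searches filter on any field by name instead of matching raw text. The alert check is then just a comparison against a fixed boundary.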

By making Big Data analysis fast, inexpensive, and simple, Splunk has branched out beyond finding system errors to solve a variety of data-intensive problems, attracting close to 3,000 enterprise customers from all sectors. NPR, for example, uses Splunk to get a more accurate count of listeners for its online streams and downloads. With conventional analytics, NPR couldn’t tell when a single listener had started and then restarted a stream after a pause or connectivity problem, so each restart could look like a new listener. Now it can derive listener information from server logs and combine separate related entries into one visit. Splunk has also been adopted by US federal agencies such as the Department of Homeland Security, the Department of Energy, and NASA to help with FISMA Continuous Monitoring, for which Splunk has developed an app. With Splunk, these organizations can combine their operational and security data into an easy-to-read, real-time dashboard to respond rapidly to threats and perform forensic analysis. Splunk also complements other Big Data tools such as Hadoop in a variety of ways. For example, you can send log data from a Hadoop cluster to Splunk for troubleshooting, alerting, and monitoring, or analyze log data processed by Splunk with Hadoop.
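Combining related log entries into one visit, as in the NPR example, is a general technique often called sessionization. The article does not describe NPR’s or Splunk’s exact method, so the following is only a minimal Python sketch under stated assumptions. Requests are keyed by listener IP address, and a gap of more than 30 minutes between requests starts a new visit. The record layout, the `sessionize` helper, and the 30-minute cutoff are all hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical stream-request records (an assumption): (listener_ip, ISO timestamp).
REQUESTS = [
    ("203.0.113.7", "2012-05-01T08:00:00"),
    ("203.0.113.7", "2012-05-01T08:04:30"),  # restart after a short pause: same visit
    ("198.51.100.2", "2012-05-01T08:05:00"),
    ("203.0.113.7", "2012-05-01T09:30:00"),  # long gap: counted as a new visit
]


def sessionize(requests, max_gap):
    """Group requests into visits per listener.

    A gap longer than max_gap between consecutive requests from the same
    listener closes the current visit and starts a new one. Returns a list
    of (listener_ip, visit_start, visit_end) tuples.
    """
    by_listener = {}
    for ip, ts in requests:
        by_listener.setdefault(ip, []).append(datetime.fromisoformat(ts))

    visits = []
    for ip, times in by_listener.items():
        times.sort()
        start = last = times[0]
        for t in times[1:]:
            if t - last > max_gap:
                visits.append((ip, start, last))  # close the current visit
                start = t
            last = t
        visits.append((ip, start, last))  # close the final visit
    return visits


visits = sessionize(REQUESTS, max_gap=timedelta(minutes=30))
print(len(REQUESTS), "raw requests ->", len(visits), "visits")  # 4 raw requests -> 3 visits
```

The gap cutoff is the key design choice. If it is too short, a listener who reconnects after a dropout is counted twice. If it is too long, separate listening sessions merge into one.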

If you’ve been curious about Big Data analysis, you can try it out yourself. Splunk is easy to set up and available as a free download. The folks at Splunk are so confident in their software that even the Enterprise version is free for 60 days. It’s worth a try, and for the first time, that advice isn’t just for IT people.