Big Data Analytics Application to Solve Known Security Issues

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to identify two advantages of applying Big Data Analytics to solve known security issues such as malware detection, network hacking, spam, and so forth.  The discussion and the analysis include the reasons and rationale for utilizing Big Data Analytics to solve security issues.  The discussion begins with a brief overview of Big Data Analytics and the Security Threats.

Big Data Analytics

Big Data (BD) is a major topic across domains such as management and marketing, scientific research, national security, and government (Vivekanand & Vidyavathi, 2015). BD supports informed decision-making because it shifts reasoning from logical, causality-based inference to the acknowledgment of correlations between events (De Mauro, Greco, & Grimaldi, 2015). The public and private sectors are increasing their use of Big Data Analytics (BDA) in different areas (Vivekanand & Vidyavathi, 2015). The ability to process very large amounts of data is the main benefit of BDA (Emani, Cullot, & Nicolle, 2015). Emani et al. (2015) define BDA as the use of advanced analytics techniques on Big Data. As elaborated by Gupta and Jyoti (2014), BDA is the process of analyzing Big Data to find hidden patterns, unknown correlations, and other useful information that can be extracted to make sound decisions. CSA (2013) describes BDA as the process of analyzing and mining Big Data, which can produce operational and business knowledge at an unprecedented scale and specificity. Massive volumes of semi-structured and unstructured data can be mined using BDA (Gandomi & Haider, 2015; Gupta & Jyoti, 2014). The need to analyze and leverage the trend data that organizations collect is one of the main drivers for BDA tools (CSA, 2013). The value of BDA increases as cumulative cash flow increases; Figure 1 plots this value against time and cumulative cash flow. Thus, BDA can provide substantial benefits to organizations.

Figure 1.  The Value of Big Data Analytics. Adapted from (Gupta & Jyoti, 2014).

Big Data Analytics for Security

BD is changing the analytics landscape (CSA, 2013). BDA can be leveraged to enhance information security and situational awareness (CSA, 2013). For instance, BDA can be used to analyze financial transactions, log files, and network traffic to identify anomalies and suspicious activities, and to correlate multiple sources of information into a coherent view (CSA, 2013). Malicious attacks have been increasing lately, and these growing security threats accompany the increasing use of BD, BDA, and Cloud Computing technologies. Malicious attacks have become a major concern for governments, organizations, and industry (Gupta & Jyoti, 2014). Big Data Security Analytics refers to the growing practice of organizations gathering and analyzing security data to detect vulnerabilities and intrusions by attackers (Gupta & Jyoti, 2014). Advanced Persistent Threats (APTs) are a subset of malicious attacks carried out by well-resourced and trained attackers who conduct multi-year intrusion campaigns targeting highly sensitive economic, proprietary, or national security information (Gupta & Jyoti, 2014). The aim of an APT is to maintain a persistent presence inside the target environment without being detected (Gupta & Jyoti, 2014).

Thus, the main purpose of using BD techniques is to analyze the data and apply the results to implement enhanced data security techniques (Gupta & Jyoti, 2014). Big Data technologies enable a wide range of industries to develop affordable infrastructures for security monitoring (Cardenas, Manadhata, & Rajan, 2013). Organizations can use various systems with a range of Security Analytics Sources (SAS). These systems can generate messages or alerts and transmit them to a trusted server for analysis and action (Gupta & Jyoti, 2014). Such a system can be a Host-based Intrusion Detection System (HIDS), an antivirus engine that writes to Syslog, or an interface that reports events to a remote service such as a Security Information and Event Management (SIEM) system (Gupta & Jyoti, 2014).

There are good reasons for BD to enter the security domain. Gupta and Jyoti (2014) give three main reasons for BD to enter the enterprise security mainstream. The first reason is the continuing problem with the detection of and response to threats: existing security analytics tools are inadequate to handle advanced viruses, malware, stealthy attack techniques, and the growing number of well-organized global cyber attacks. The second reason is Moore’s Law and open source. Security vendors are speeding up their development cycles by customizing open source tools such as Cassandra, Hadoop, MapReduce, and Mahout for security analytics purposes, which can help accelerate innovation to protect systems from threats. The third reason is the considerable activity on the supply side (Gupta & Jyoti, 2014). Organizations want security alerts from new vendors in addition to HP, IBM, McAfee, and RSA Security. Some vendors such as Hexis Cyber Solutions, Leidos, Narus, and Palantir will move beyond the government and extend into the private sector. Others like Click Security, Forescale, and Netskope have intelligence backgrounds to deal with malicious attacks (Gupta & Jyoti, 2014).

Fraud detection is one of the most visible uses of BDA (Cardenas et al., 2013; CSA, 2013). Although credit card companies have conducted fraud detection for decades, the custom-built infrastructure to mine BD for fraud detection was not economical to adapt for other fraud detection uses (CSA, 2013). Off-the-shelf BD tools and techniques are now drawing attention to analytics for fraud detection in healthcare, insurance, and other fields (CSA, 2013). Examples of using BD for security purposes include (1) Network Security, (2) Enterprise Event Analytics, (3) Netflow Monitoring to Identify Botnets, and (4) Advanced Persistent Threat (APT) Detection. APT detection has two categories: (1) Beehive: Behavior Profiling for APT Detection, and (2) Using Large-Scale Distributed Computing to Unveil APTs. For this discussion, Network Security and Netflow Monitoring to Identify Botnets are the two examples of taking advantage of BDA for security purposes (CSA, 2013).
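As a rough illustration of the kind of anomaly flagging that fraud detection relies on, the following minimal Python sketch marks transaction amounts that lie far from the median. The modified z-score rule, the 3.5 threshold, the function name, and the sample amounts are all assumptions chosen for illustration; none of them comes from the cited sources, and production fraud systems use far richer features and models.

```python
from statistics import median


def flag_outliers(amounts, threshold=3.5):
    """Return the indices of amounts whose modified z-score exceeds threshold.

    The modified z-score uses the median and the median absolute deviation
    (MAD), so a single extreme value does not distort the baseline.
    """
    med = median(amounts)
    mad = median(abs(a - med) for a in amounts)
    if mad == 0:
        return []  # no spread in the data, so this rule flags nothing
    return [
        i for i, a in enumerate(amounts)
        if 0.6745 * abs(a - med) / mad > threshold
    ]


# Hypothetical transaction amounts; only the 980.0 entry is unusual.
amounts = [42.0, 38.5, 45.1, 40.0, 39.9, 41.3, 980.0, 43.7]
suspicious = flag_outliers(amounts)
print(suspicious)
```

The same per-record rule could be applied independently to each account or merchant, which is what makes this style of check easy to distribute across a cluster.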

Network Security

The case study by Zions Bancorporation is a good example of using BD for security purposes (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014). The traditional SIEM could not handle the volume of data generated for security purposes (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014). Zions Bancorporation reported that using Hadoop clusters and business intelligence tools led to parsing more data faster than the traditional SIEM tools (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014). While the traditional SIEM system took between twenty minutes and one hour, the Hadoop system provided the result in about a minute (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014). The system enables users to mine meaningful security information from sources such as firewalls and security devices, website traffic, business processes, and other transactions (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014). The incorporation of unstructured data and multiple disparate datasets into a single analytical framework is one of the most promising features of BD (CSA, 2013; Raja & Rabbani, 2014).
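To make the idea of mining firewall logs with a MapReduce-style job more concrete, the sketch below counts denied connections per source address in plain Python. It is not the Zions Bancorporation implementation, which the cited sources do not describe at this level; the log format, the field order, and the function names are hypothetical, and the map and reduce phases run in a single process here rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical firewall log lines: timestamp, action, source IP, dest IP, port.
LOG_LINES = [
    "2013-05-01T10:00:01 DENY 10.0.0.5 192.168.1.10 22",
    "2013-05-01T10:00:02 ALLOW 10.0.0.7 192.168.1.10 443",
    "2013-05-01T10:00:03 DENY 10.0.0.5 192.168.1.11 22",
    "2013-05-01T10:00:04 DENY 10.0.0.9 192.168.1.10 3389",
    "malformed line",
]


def map_phase(line):
    """Emit (source_ip, 1) for each denied connection; skip everything else."""
    fields = line.split()
    if len(fields) == 5 and fields[1] == "DENY":
        yield fields[2], 1


def reduce_phase(pairs):
    """Sum the emitted counts per key, as a reducer would after the shuffle."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def denied_by_source(lines):
    """Run the map phase over every line, then reduce the emitted pairs."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return reduce_phase(pairs)


result = denied_by_source(LOG_LINES)
print(result)
```

Because each line is mapped independently and the reducer only needs the pairs that share a key, the same logic can be split across many nodes, which is the property that lets a Hadoop cluster parse large log volumes quickly.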

Netflow Monitoring to Identify Botnets

Botnets are a major threat to the current Internet (Francois, Wang, Bronzi, State, & Engel, 2011). Botnet traffic is mixed with a large volume of benign traffic because of ubiquitous high-speed networks (Francois et al., 2011). These networks can be monitored using IP flow records. However, the forensic analysis of these records forms the major computational bottleneck (Francois et al., 2011). The BotCloud research project by Francois et al. (2011), which leverages Hadoop and MapReduce technology, is a good example of taking advantage of BDA for security purposes. In this project, Francois et al. (2011) proposed a distributed computing framework that leverages a host dependency model and an adapted PageRank algorithm. Moreover, a Hadoop cluster with MapReduce was used to analyze and detect densely interconnected hosts, which are potential botnet members. The large volume of Netflow data collected for analysis was the reason for using the MapReduce framework (CSA, 2013; Francois et al., 2011). The project showed good detection accuracy and good efficiency on the Hadoop cluster.
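The general idea of ranking hosts in a dependency graph built from flow records can be sketched as follows. This is standard PageRank computed by power iteration in a single process, not the adapted algorithm or the MapReduce implementation from BotCloud; the flow tuples, host names, damping factor, and iteration count are assumptions made for illustration.

```python
def build_graph(flows):
    """Build a directed host graph from (source, destination) flow pairs."""
    graph = {}
    for src, dst in flows:
        graph.setdefault(src, set()).add(dst)
        graph.setdefault(dst, set())  # make sure every host is a node
    return graph


def pagerank(graph, damping=0.85, iterations=50):
    """Standard PageRank by power iteration over a dict-of-sets graph."""
    if not graph:
        return {}
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by hosts with no outgoing flows is spread over all hosts.
        dangling = sum(rank[node] for node in nodes if not graph[node])
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {node: base for node in nodes}
        for node in nodes:
            targets = graph[node]
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical flows: hosts a, b, c talk to each other densely; d and e do not.
flows = [
    ("a", "b"), ("b", "a"), ("a", "c"), ("c", "a"),
    ("b", "c"), ("c", "b"), ("d", "a"), ("e", "d"),
]
ranks = pagerank(build_graph(flows))
for host in sorted(ranks, key=ranks.get, reverse=True):
    print(host, round(ranks[host], 4))
```

In this toy graph the densely interconnected hosts receive the highest scores, which mirrors the intuition of looking for tightly linked hosts as botnet candidates. Each iteration only needs per-node contributions summed by destination, which is why the computation maps naturally onto MapReduce.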

Conclusion

Big Data means Big Value for organizations at various levels, including security. BD is changing the analytics landscape. BDA can be leveraged to enhance information security and situational awareness in order to detect abnormal activities. For instance, BDA can be used to analyze financial transactions, log files, and network traffic to identify anomalies and suspicious activities, and to correlate multiple sources of information into a coherent view. Organizations can benefit greatly from BDA tools such as Hadoop and MapReduce for security purposes. This discussion presented several reasons for using BD and BDA for security, with Network Security and Netflow Monitoring to Identify Botnets as the two examples of taking advantage of BDA for security purposes.

References

Cardenas, A. A., Manadhata, P. K., & Rajan, S. P. (2013). Big data analytics for security. IEEE Security & Privacy, 11(6), 74-76.

Cloud Security Alliance [CSA]. (2013). Big Data Analytics for Security Intelligence. Big Data Working Group.

De Mauro, A., Greco, M., & Grimaldi, M. (2015). What is big data? A consensual definition and a review of key research topics. Paper presented at the AIP Conference Proceedings.

Emani, C. K., Cullot, N., & Nicolle, C. (2015). Understandable big data: A survey. Computer Science Review, 17, 70-81.

Francois, J., Wang, S., Bronzi, W., State, R., & Engel, T. (2011). Botcloud: Detecting botnets using MapReduce. Paper presented at the 2011 IEEE International Workshop on Information Forensics and Security (WIFS).

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Gupta, B., & Jyoti, K. (2014). Big data analytics with Hadoop to analyze targeted attacks on enterprise data.

McDaniel, P., & Smith, S. (2013). Big Data Analytics for Security. The University of Texas at Dallas.

Raja, M. C., & Rabbani, M. A. (2014). Big Data analytics security issues in a data-driven information system.

Vivekanand, M., & Vidyavathi, B. M. (2015). Security Challenges in Big Data: Review. International Journal of Advanced Research in Computer Science, 6(6).