Scalable and Intelligent Security Analytics Tools

Dr. Aly, O.
Computer Science

Abstract

The purpose of this project is to discuss and analyze scalable and intelligent security analytics tools from different vendors.  The discussion begins with an overview of SIEM and the essential capabilities of an analytics-driven SIEM.  The first generation of SIEM has been overwhelmed with the volume of the data and the rise of the advanced threats.  The second generation of SIEM was introduced because Big Data tools have the potential to provide significant enhancement in the action of the security intelligence by minimizing the time for correlating, consolidating, and contextualizing diverse security event information and also for correlating long-term historical data for forensic purposes.  The discussion and the analysis are limited to only four scalable and intelligent security analytics; AlienVault, QRadar, Splunk, and LogRhythm.  The Unified Security Management (USM) of AlienVault includes SIEM, vulnerability assessment, asset discovery, flow and packet capture, network and host detection, and file integrity monitoring.  The USM is utilized to improve the security visibility throughout the organization and to detect security incidents in the real-time.  IBM Security QRadar provides management for log and events, behavioral and reporting analysis for the network and applications.  Splunk Enterprise Security enables the search of the data and the implementation of visual correlation to identify malicious events and collect data about the context of those events.  LogRhythm supports log management and has the capabilities of the network forensic.  They can be deployed in smaller environments. The discussion and the analysis include the key design considerations for scalability of each tool.  The project also discusses the advantages and the disadvantages of these four intelligent security analytic tools.

Keywords: SIEM, Security Analytics Tools, QRadar, AlienVault, LogRhythm, Splunk.

Introduction

The advanced threats which often called advanced persistent threats or APTs are the primary reason which drives the organization to collect and analyze information.   The APTs are sophisticated attacks (Oprea, Li, Yen, Chin, & Alrwais, 2015), and can involve multiple events occurring across the organization which would otherwise not be connected without the utilization of advanced intelligence (SANS, 2013).  APTs cause severe risks and damages to organizations and governments because they target the confidential propriety information (SANS, 2013). 

Advanced threats require advanced intelligent and analytic tools.  Recent years have witnessed the rise of more sophisticated attacks including the APTs (IBM, 2013; Oprea et al., 2015; SANS, 2013).  In a survey conducted by (SANS, 2013), the APTs affected two-thirds of the respondents in the past two years.  The result showed that one respondent is seeing almost 500 attacks in a nine-month period, and many seeing somewhere between one and twenty.  Thus, the number of the APTs is increasing.   As indicated in (Radford, 2014) “Frequently, a victim of these attacks [APTs] do not even know that their perimeter security has been penetrated for an average of 243 days. They all have up-to-date anti-virus software, and 100% of breaches involve stolen credentials.”  The APTs is in every organization’s mind (Radford, 2014).  The Cloud Service Providers (CSP) is the prime target for cyber-attacks. Organizations who utilize the Cloud technology must consider the myriad access points to their data when hosted in a Cloud environment and should consider at length the solutions available to them, and pertinently that all data access points are covered (Radford, 2014).  The PAT is one of various security threats against which organizations must protect their information.

Various security techniques are offered to deal with security threats including the APTs.   This project discusses and analyzes four scalable and intelligent security tools from different vendors.  The analysis covers the main functions such as anomaly detection capability, event correlation capability, real-time analytics capability, and so forth of each tool.  The analysis includes an assessment if the tool is suited or used in the Cloud Computing, and an identification of the targeted applications.   The analysis also includes the pros and cons of each tool.   The project begins with an overview of Security Information and Event Management, followed by the scalable and intelligent security tools from different vendors.

Security Information and Event Management (SIEM)

The traditional and conventional model for protecting the data often focused on network-centric and perimeter security, using devices such as firewalls and intrusion detection systems (Oprea et al., 2015; Shamsolmoali & Zareapoor, 2016).  However, this conventional approach is not adequate when dealing with Big Data and Cloud Computing technologies.  The conventional approach does not provide enough protection against the advanced persistent threats (APT), privileged users or any other malicious security attacks (Oltsik, 2013; Oprea et al., 2015; Shamsolmoali & Zareapoor, 2016). 

Thus, many organizations deploy other techniques such as the database audit and protection (DAP), and security information and event management (SIEM) to collect information about the network activities (Shamsolmoali & Zareapoor, 2016).  Examples of the SIEM techniques include RSA envision and HP ArchSight using a standardized approach to collecting information and events, storing and querying and providing correlation degrees driven by rules (Pearson, 2013).   However, SIEM provides inputs which need to be properly analyzed and translated into a certain format to be utilized by senior risk evaluators and strategic policymakers. This manual process is not adequate when dealing with security issues.   Moreover, the standards of the risk assessment such as ISO2700x and NIST operate at a macro-level and usually do not fully leverage information coming from logging and auditing activities carried out by the IT operations (Pearson, 2013). Thus, SIEM lacks a solution for the business audit and strategic risk assessment (Pearson, 2013). 

In (Splunk, 2017), SIEM which is analytic-driven has six essential capabilities, (1) Real-Time Monitoring, (2) Incident Response, (3) User Monitoring, (4) Threat Intelligence, (5) Advanced Analytics, and (6) Advanced Threat Detection.  Table 1 summarizes these six essential capabilities of an analytics-driven SIEM (Splunk, 2017).

Table 1.  Six Essential Capabilities of an Analytics-Driven SIEM. Adapted from (Splunk, 2017).

      In a survey implemented by (SANS, 2013), the result showed 58% of the respondents were using dedicated log management and 37% were using a SIEM system.  The result also showed that nearly half of the respondents have dedicated log management platforms, SIEM, and scripted searches a part of their data collection and analysis processes.  The result showed that less than 10% utilizes unstructured data repositories and specific Big Data framework for analysis and search. Figure 1 illustrates the types of Security Data Analysis Tools in Use.

Figure 1. Types of Security Data Analysis Tool in Use. Adapted from (SANS, 2013).

      As indicated in (SANS, 2013), the dedicated log management platform does not meet the needs created by Big Data collection.  SIEM is anticipated to step up and meet the needs created by Big Data collection (SANS, 2013).  The survey results also showed that 26% of the respondents rely on analytics tools to do the heavy lifting of threat intelligence, and others leverage SIEM platforms and manual techniques as illustrated in Figure 2.

Figure 2.  Threat Intelligence Capabilities. Adapted from (SANS, 2013).

            In the same survey, the result showed that 51% of the respondents indicated they are currently using third-party intelligence services as illustrated in Figure 3.

Figure 3.  The Use of Third Party Intelligence Tools.  Adapted from (SANS, 2013).

The result of the survey also showed that the most organizations are still focused on the fundamentals such as better SIEM, more training to detect patterns of malicious activity, vulnerability management, and network protection tools, and endpoints visibility for future investments in security analytics and Big Data platform as illustrated in Figure 4.

Figure 4.  Future Investments in Analytics/Intelligence.  Adapted from (SANS, 2013).

Scalable and Intelligent Security Analytic Tools

            The management of alerts from different intrusion detection sensors and rules was a big challenge in the settings of the organizations.  The first generation of SIEM was able to aggregate and filter alarms from many sources and present actionable information to security analysis (CSA, 2013).  However, the first generation of SIEM has been overwhelmed with the volume of the data (Glick, 2014), and cannot keep up with the rate and complexity of the current wave of the cyber attacks and advanced threats (Splunk, 2017).  The second generation of SIEM was introduced because Big Data tools have the potential to provide a significant enhancement in the action of the security intelligence by minimizing the time for correlating, consolidating, and contextualizing diverse security event information, and also for correlating long-term historical data for forensic purposes (CSA, 2013).   

Most of the current SIEM systems provide the same basic features, except for those features proposed by vendors.  However, the basic feature of the SIEM systems contains Server, Database, FrontEnd, Probes, and Agents (Di Mauro & Di Sarno, 2018).  The Server is the core component of the whole deployment for collecting and processing the log coming from the external sources on behalf of the correlation engine.  The Database stores the data for analysis and runtime configuration of SIEM.  The FrontEnd is the user interface to the Server, while the Probe is the collection of sensors deployed within the monitored infrastructure.  The typical examples of the Probes include perimeter defense systems such as firewalls and intrusion prevention systems, host sensors such as Host IDSs, or security applications such as web firewalls and authentication systems (Di Mauro & Di Sarno, 2018).  The Agent represents the counterparts of probes embedded in the server and can convert heterogenous logs generated by different probes, in logs with the same syntax and a specific semantic (Di Mauro & Di Sarno, 2018).  Figure 5 illustrates the classical framework of the SIEM systems.

Figure 5.  A Classical Framework of a SIEM System.  Adapted from (Di Mauro & Di Sarno, 2018).

The security analytics market is rapidly evolving as vendors are merging, developers are adding new features, and tools once deployed exclusively on-premises which are also offered as a Cloud service (Dan Sullivan, 2015).   As indicated in (Dan Sullivan, 2015), there are three reasons for organizations to deploy security analytics software.  These reasons are the compliance, security event detection and remediation, and forensics.  The compliance is regarded to be the key driver of security requirement for more organizations.  It is imperative to verify the compliance because the government and industry regulations, organizations implement their security policies and procedures (Dan Sullivan, 2015).   The security analytics tools should alert organizations to significant events which are defined by rules such as trigger (Dan Sullivan, 2015).  The tools can help in minimizing the time and effort required to collect, filter, and analyze event data (Dan Sullivan, 2015).  The attacks can occur at high speed; these tools should also be at high speed to detect the malicious attacks.  In case the attack took place, the organization should be able to block any future attacks through the forensics, as the forensic analysis can reveal vulnerabilities in the network of the organization or desktop security controls which were not known prior the attack (Dan Sullivan, 2015). 

Thus, organizations must consider deploying security analytics software.  Various security analytics tools are introduced with the aim to detect and block any malicious attacks.  Examples of these security analytics tools include AlienVault, QRadar, Splunk, LogRhythm, FireEye, McAfee Enterprise Security Manager, and so forth.  This project discusses and analyzes four of these security analytic tools, their features, the pros, and cons. 

  1. AlienVault

AlienVault started in 2007.  During 2010 the company received an initial round of venture capital funding and relocated the headquarters from Spain to United States (Nicolett & Kavanagh, 2011).  The AlienVault’s Unified Security Management (USM) provides SIEM, vulnerability assessment, network and host intrusion detection, file integrity monitoring functions via software or appliance options (Mello, 2016; Nicolett & Kavanagh, 2011), asset discovery, and the capture of the flow and packets (Mello, 2016).  The AlienVault Unified SIEM contains the proprietary and the Open Source Security Information Management (OSSIM).  OSSIM has been available since 2003 and is an open source security management platform. AlienVault integrates OSSIM into the SIEM solution to offer enhanced performance, consolidated reporting and administration, and multi-tenanting for most managed security service providers (Nicolett & Kavanagh, 2011).    AlienVault added the real-time feature in 2010.   AlienVault’s long-term plan features to solve existing competitive gaps in areas such as application, monitoring feature for data and users, while the short-term plan includes dynamic monitor system to rule-based correlation (Nicolett & Kavanagh, 2011).

AlienVault’s USM offers three major advantages.  AlienVault offers SIEM solution, monitoring system for file integrity, assessment system for vulnerability, control for the endpoint, and intrusion detection system.   AlienVault is based on open source.  It is regarded to be less expensive than the corresponding product sets from most competitors in the SIEM domain (Nicolett & Kavanagh, 2011).  A more recent review of AlienVault Unified SIEM by (Vanhees, 2018), indicates that the AlienVault Unified Security System incorporates various technologies such as vulnerability scanning, NetFlow, host intrusion detection system, and network intrusion detection system. Moreover, it is easy to scale up and down, and it scores very high in that aspect (Vanhees, 2018). In another review by (Morrissey, 2015), AlienVault can identify risks and vulnerabilities on systems, to provide log consolidation and analysis, and to correlate threats between different systems.  As cited in (Mello, 2016), Gartner recommends AlienVault’ USM for organizations which require a broad set of integrated security capabilities, either on-premises or in AWS environment.

AlienVault has disadvantages and limitations. AlienVault lacks support for Database Activity Monitoring (DAM).  Moreover, there is no feature to integrate third-party DAM technologies (Mello, 2016; Nicolett & Kavanagh, 2011).   AlienVault lack supports the integration of Identity and Access Management (IAM) beyond Active Directory Monitoring (Mello, 2016; Nicolett & Kavanagh, 2011).   Report on AlienVault Unified Security Management by (Morrissey, 2015; Vanhees, 2018) indicates that it is not easy to develop custom plugins compared to other products, and to setup correlation rules.  Moreover, it is difficult to deal with static data as it deals only with dynamic data like syslogs, NetFlow, data captures, and so forth.  The custom reporting is very limited, and the task of creating a bar chart to visualize most common attacked ports is not possible.  Organizations which requires high-end reporting, advanced correlation rules or complex use case scenarios, should not consider AlienVault (Vanhees, 2018). 

Although AlienVault’s USM has this limitation, in a review by (Vanhees, 2018), AlienVault’s USM is described as a “huge value” as it does not require any additional setup and is baked into the tool nicely and smoothly compared with other vendors.  It helps detect suspicious traffic (Morrissey, 2015; Vanhees, 2018), and it automatically syncs with other intelligence feeds which are regarded to be handy.  The correlation rules are used to spot unwanted behavior proactively (Vanhees, 2018). 

  • QRadar

QRadar is IBM’s SIEM platform (Mello, 2016).  It is composed of QRadara Log Manager, Data Node SIEM, Risk Manager, Vulnerability Manager, QFlow, and VFlow Collectors, and Incident Forensics (Mello, 2016).  It uses the capabilities of Big Data to keep up with the advanced threats and prevent attacks before they occur (IBM, 2013).   QRadar has the capabilities to reveal hidden relationships within massive amount of security data using proven analytics to minimize billions of security events to a manageable set of prioritized incidents (IBM, 2013). 

The platform of IBM Security QRadar can be deployed as a physical or virtual or as a cloud service solution (Mello, 2016).   It can be installed using various options such as “all-in-one” implementation option or scaled.  QRadar provides capabilities such as collection and processing of the event and log data, NetFlow, deep-packet inspection of network traffic, and full-packet capture and behavior analysis (McKelvey, Curran, Gordon, Devlin, & Johnston, 2015; Mello, 2016).   IBM added more enhanced features to QRadar to support IBM X-Force Exchange for sharing threat intelligence and IBM Security App Exchange for sharing applications, and security app extensions (Mello, 2016).  After the acquisition of Resilient Systems in 2016 (Rowinski, 2016), IBM developed an integrated end-to-end security operations and response platform offering a quick response to cyber incidents.  It also enhanced the multi-tenant feature, and the capabilities of the search and system administration (Mello, 2016).

The IBM’s QRadar utilizes a distributed data management system providing horizontal scaling of data storage (D. Sullivan, 2016).  While organizations can utilize distributed USIM to access local data in some cases, they may also require searching across the distributed platform in some other scenarios.  QRadar incorporates a search engine which enables searching locally as well as across platforms (D. Sullivan, 2016).  QRadar is a big data SIEM utilizing data nodes instead of storage area network (SAN), which help in reducing the associated cost and the complexity of the management (D. Sullivan, 2016).  QRadar is a distributed storage model based on data nodes and can scale to petabytes of storage and can meet the requirement of organizations for a large volume of long-term storage (D. Sullivan, 2016).  QRadar has a vulnerability management component which is designed to integrate data from various vulnerability scanners and enhance that data with context-relevant information about network usage (D. Sullivan, 2016).   It has been used to process a large volume of events per second in real-world applications (D. Sullivan, 2016).   QRadar can be deployed in the Cloud to reduce the infrastructure management (D. Sullivan, 2016)).   The Security QRadar Risk Manager add-on offers the capability of the automated monitoring,  provides support for multiple vendor product audits, and assessment of compliance policy, and threat modeling (D. Sullivan, 2016).  QRadar platform can meet the requirement of mid-size and large organization with general SIEM needs.  It is also a good fit for mid-size organizations which require a solution with flexible implementation, hosting, and monitoring options.  QRadar is also good for organizations which look for a single security event and response platform for their security operation centers (Mello, 2016).  In recent reviews by (Verified-User2, 2017), IBM QRadar was the preferred option for the clients of the organizations across all departments for fast deployment and instant log visibility to meet security and compliance requirements. 

QRadar has various areas of strength and advantages.  It provides an integrated view of log and event data and the correlation of network traffic behavior across NetFlow and event logs.  It also supports security event and monitoring feature for the log in IaaS Cloud Service Model, including the monitoring for AWS CloudTrail and SoftLayer (Mello, 2016).   The security platform of QRadar is straightforward to implement and maintain (Mello, 2016).  In more recent reviews of IBM QRadar by (Verified-User2, 2017), IBB QRadar was described as simple, flexible framework, easy deployment and out of the box content good enough to have quick wins.  In another review by (Verified-User1, 2017), the creation of rules is intuitive and fast helping in emergency scenarios.  The maintenance of IBM QRadar is light, and the appliance has nearly flawless uptime (Verified-User1, 2017).  The generation of the reports is very functional and efficient (Verified-User1, 2017).  It was described as a positive return on investment in a recent review by (Verified-User2, 2017).  Moreover, third-party capabilities can be plugged into the framework through the Security App Exchange (Mello, 2016).  This capability of third-party support is useful because QRadar has limitation for the endpoint monitoring for threat detection and response and the integrity of the basic file (Mello, 2016). 

QRadar has additional limitations besides the endpoint monitoring.  In recent reviews by (Verified-User1, 2017), the limitations of IBM QRadar includes the steep learning curve compared to other platforms.  QRadar does require training and homework.  Moreover, there is a lack of threat feed utilization of STIX (Structured Threat Information Expression)/TAXII (Trusted Automated Exchange of Indicator Information), which remains very limited (Verified-User1, 2017).  It may require a considerable amount of tuning during the deployment with very little “out of the box” offense information (Verified-User1, 2017).  In another recent review by (Verified-User2, 2017), IBM QRadar is limited in event log parsing, and the correlation engine needs to be more flexible and dynamic (Verified-User2, 2017).

  • Splunk

The Splunk Enterprise (SE) is the core product of the company (Mello, 2016).  It provides log and event collection, searches and visualization with the query language of Splunk (Mello, 2016).  The Splunk Enterprise Security (SES) provides security features including correlation rules, reports and pre-defined dashboards (Mello, 2016).  SES supports the real-time monitoring and alerts, incident response and compliance reporting (Mello, 2016).  SE and SES can be deployed locally on-premise, or in public, private or hybrid Cloud, or as a service using the Cloud Service Models (Mello, 2016).  Splunk acquired Caspida in 2015 (Mello, 2016; Tinsley, 2015).  After the acquisition, Splunk added the native behavioral analytics to its repertoire (Mello, 2016).  Moreover, it provided support to third-party UEBA products (Mello, 2016).  Additional features have been added to SES to integrate with other behavioral products.  More capabilities such as improved incident management and workflow capabilities, lower data storage requirements, better visualizations have been implemented.  The expansion of monitoring system to additional infrastructure and software-as-a-service provider (Mello, 2016).

            Splunk has the capabilities of broad data ingestion, offering connectors to the data sources and allowing custom connectors (D. Sullivan, 2016).   Splunk systems stores data in the schema-less database and uses indexes based on ingestion enabling various data types with rapid query response (D. Sullivan, 2016).  Splunk provides flexible SIEM platform which can handle various data sources and has the analytic capabilities or a single data analysis platform (Mello, 2016).  Splunk was found gaining “significant” visibility across the client base of Gartner (Mello, 2016).  Splunk has strong advanced security analytics for combating advanced threat detection and insider threats (Mello, 2016). 

            Splunk has various advantages.  In a review by (Taitingfong, 2015), Splunk was found to be flexible and extensible.  It can ingest logs from disparate systems using disparate formats and disparate file types (Murke, 2015; Taitingfong, 2015).  Splunk was found to be flexible in parsing, formatting, and enhancing the data (Taitingfong, 2015).  Splunk was found to scale very well in large environments adding more indexers as needed with the expanded environment (Kasliwal, 2015; Murke, 2015; Taitingfong, 2015).  Splunk can do multi-site clustering and search head clustering providing load balancing and redundancy (Taitingfong, 2015).   In another review by (Murke, 2015), Splunk has the capability of the real-time analysis.   Splunk provided the best results amongst its competitors as indicated in the review of (Murke, 2015).   It provided fast results on large datasets and is easy to manage as indicated in the review by (Murke, 2015).

            Splunk system has limitations.  SES product provides only the basic pre-defined correlations for user monitoring and reporting (Mello, 2016).  The licensing model for high volume data costs more than other SIEM products, although Splunk offers a new licensing scheme for high-volume data users (Mello, 2016).  In recent reviews by (Kasliwal, 2015; Taitingfong, 2015), the search language of Splunk, the more advanced formatting or statistical analysis were found very deep and required a learning curve.  The dashboard of Splunk may require more visualization which requires development using simple XML, Java Scripts and CSS (Taitingfong, 2015).  Moreover, Splunk releases minor revisions very quickly due to the increased number of bugs (Taitingfong, 2015).  In another review by (Murke, 2015), Splunk was found limited in providing optimized results with the smaller size of data.  Moreover, it was found by (Kasliwal, 2015; Murke, 2015) as costly.

  • LogRhythm

The SIEM of the LogRhythm supports n-tier-scalable, decentralized architecture (Mello, 2016).  LogRhythm’s SIEM is composed of various tools such as Platform Manager, AI Engine, Data Processors, Data Indexers, and Data Collectors (Mello, 2016).   The deployment of LogRhythm can be consolidated as an all-in-one.  The implementation options can be appliance, software or virtual instance format (Mello, 2016).  The behavioral analytics of the user and entity, an integrated incident response workflow and automated response capabilities can be combined with other capabilities such as event, endpoint and network monitoring (Mello, 2016).   The log processing and indexing capabilities of the LogRhythm’s SIEM are divided into two components of the systems.  Moreover, more features to the system are added such as the capabilities of the unstructured search through a new storage backend based on Elasticsearch (Mello, 2016). More features are added such as the clustered full data replication; more parsers for applications and protocols, an improved risk-based prioritization (RBP), support for the Cloud services such as AWS, Box, and Okta.  The integration with the Cloud Access Security Broker solutions such as Microsoft’s Cloud App Security and Zscaler are also added to enhance the LogRhythm’s SIEM system (Mello, 2016).  

LogRhythm has various advantages.  LogRhythm can integrate the capabilities of the advanced threat monitoring with SIEM (Mello, 2016).   It offers effective out-of-the-box content and workflows which can help in the automation process (Mello, 2016).  It provides highly interactive and customizable and automated response capabilities for performing actions on remote devices (Mello, 2016). It is praised for offering straightforward SIEM systems implementation and maintenance (Mello, 2016).  LogRhythm is also described to be very visible in the SIEM evaluations of the Gartner’s clients (Mello, 2016).  In a review for LogRhythm by (Eng, 2016), it was described as a great SIEM which is easy to implement.  Building blocks using LogRhythm is intuitive and easy using drag and drop building block technique which can be easily manipulated (Eng, 2016).  It offers statistical building blocks with powerful anomaly detection capabilities which are found to be more difficult or not possible in other SIEMs products (Eng, 2016).   LogRhythm provides better event classification than any other SIEM products (Eng, 2016).   In another review by (Ilbery, 2016), LogRhythm has the capabilities to import log files from hundreds of devices into one, and it is easy to search the database (Ilbery, 2016).  It also has the capabilities to send alert messages to the network activities using emails.  It provides a good view of the network equipment, traffic, and the servers. 

            LogRhythm some limitations.  More enhancement and improvement are required for the custom report engine included (Mello, 2016).  In a review by (Eng, 2016), there is a requirement and need for LogRhythm to provide back-end support for threat intelligence lists.  There is a proposal by (Eng, 2016) for log rhythm to replace the code with hash tables to avoid the excessive cost associated with referencing lists in the rule.   The reporting of LogRhythm was described by (Eng, 2016) as the worst of all SIEM systems because it is not intuitive and needs improvement.  In another review for LogRhythm by (Ilbery, 2016),  the upgrade process was described as a not easy process.    

Conclusion

This project discussed and analyzed scalable and intelligent security analytics tools from different vendors.  The discussion began with an overview of SIEM and the essential capabilities of an analytics-driven SIEM.  The first generation of SIEM has been overwhelmed with the volume of the data and the rise of the advanced threats.  The second generation of SIEM was introduced because Big Data tools have the potential to provide significant enhancement in the action of the security intelligence by minimizing the time for correlating, consolidating, and contextualizing diverse security event information and also for correlating long-term historical data for forensic purposes.

Examples of these security analytics tools include AlienVault, QRadar, Splunk, LogRhythm, FireEye, McAfee Enterprise Security Manager, and so forth.  The discussion and the analysis were limited to only four scalable and intelligent security analytics; AlienVault, QRadar, Splunk, and LogRhythm.  The Unified Security Management (USM) of AlienVault includes SIEM, vulnerability assessment, asset discovery, flow and packet capture, network and host detection, and file integrity monitoring.  The USM is utilized to improve the security visibility throughout the organization and to detect security incidents in the real-time.  IBM Security QRadar provides management for log and events, behavioral and reporting analysis for the network and applications.  Splunk Enterprise Security enables the search of the data and the implementation of visual correlation to identify malicious events and collect data about the context of those events.  LogRhythm supports log management and has the capabilities of the network forensic.  They can be deployed in smaller environments. The discussion and the analysis included the key design considerations for scalability of each tool.  The project also discussed the advantages and the disadvantages of these four intelligent security analytic tools.

References

CSA. (2013). Expanded Top Ten Big Data Security and Privacy Challenges. Cloud Security Alliance, Big Data Working Group.

Di Mauro, M., & Di Sarno, C. (2018). Improving SIEM capabilities through an enhanced probe for encrypted Skype traffic detection. Journal of Information Security and Applications, 38, 85-95.

Eng, J. (2016). LogRhythm Review: “So you want to know which SIEM to buy”. Retrieved  from https://www.trustradius.com/reviews/logrhythm-2016-06-08-06-30-33.

Glick, B. (2014). Information Security is a Big Data Issue. Retrieved from http://www.computerweekly.com/feature/Information-security-is-a-bigdata(05.05).

IBM. (2013). Extending Security Intelligence with Big Data Solutions. Retrieved from http://www.ndm.net/siem/pdf/Extending%20security%20intelligence%20with%20big%20data%20solutions.PDF.

Ilbery, S. (2016). User Review: “LogRhythm does what it promises.”. Retrieved from https://www.trustradius.com/reviews/logrhythm-2016-06-07-13-22-49.

Kasliwal, G. (2015). Splunk Enterprise Review: “Splunk: Dynamic and Fast compliance tool”. Retrieved from https://www.trustradius.com/reviews/splunk-2015-12-07-15-43-44.

McKelvey, N., Curran, K., Gordon, B., Devlin, E., & Johnston, K. (2015). Cloud Computing and Security in the Future Guide to Security Assurance for Cloud Computing (pp. 95-108): Springer.

Mello, J. P. J. (2016). Gartner Magic Quadrant for SIEM 2016: Not just for compliance anymore. Retrieved from https://techbeacon.com/highlights-gartner-magic-quadrant-siem-2016.

Morrissey, M. (2015). AlienVault USM Review: “Making sense of the logging overload.”. Retrieved from https://www.trustradius.com/reviews/alienvault-unified-security-management-2015-09-29-23-21-40.

Murke, S. (2015). Splunk Enterprise Review: “For Real-Time Data Analyzing Get Splunk.”. Retrieved from https://www.trustradius.com/reviews/splunk-2015-12-07-23-21-33.

Nicolett, M., & Kavanagh, K. M. (2011). Magic Quadrant for Security Information and Event Management.

Oltsik, J. (2013). The Big Data Security Analytics Era Is Here.

Oprea, A., Li, Z., Yen, T.-F., Chin, S. H., & Alrwais, S. (2015). Detection of early-stage enterprise infection by mining large-scale log data. Paper presented at the Dependable Systems and Networks (DSN), 2015 45th Annual IEEE/IFIP International Conference on.

Pearson, S. (2013). Privacy, security and trust in cloud computing Privacy and Security for Cloud Computing (pp. 3-42): Springer.

Radford, C. J. (2014). Challenges and Solutions Protecting Data within Amazon Web Services. Network Security, 2014(6), 5-8. doi:10.1016/S1353-4858(14)70058-3

Rowinski, M. (2016). IBM Security Closes Acquisition of Resilient Systems. Retrieved from http://www-03.ibm.com/press/us/en/pressrelease/49472.wss, IBM News Release.

SANS. (2013). SANS Security Analytic Survey Retrieved from https://www.sans.org/reading-room/whitepapers/analyst/security-analytics-survey-34980, White Paper.

Shamsolmoali, P., & Zareapoor, M. (2016). Data Security Model In Cloud Computing. Proceedings of 2nd International Conference on Computer Science Networks and Information Technology, Held on 27th – 28th Aug 2016, in Montreal, Canada.

Splunk. (2017). The Six Essential Capabilities of an Analytics-Driven SIEM. Retrieved from https://www.splunk.com/en_us/form/the-six-essential-capabilities-of-analytics-driven-siem.html, White Paper.

Sullivan, D. (2015). Three reasons to deploy security analytics software in the enterprise. Retrieved from http://searchsecurity.techtarget.com/feature/Three-reasons-to-deploy-security-analytics-software-in-the-enterprise.

Sullivan, D. (2016). Comparing the top big data security analytics tools. Retrieved from http://searchsecurity.techtarget.com/feature/Comparing-the-top-big-data-security-analytics-tools.

Taitingfong, K. (2015). Splunk Enterprise Review: “Splunk – the most flexible SIEM tool on the market.”. Retrieved from https://www.trustradius.com/reviews/splunk-2015-12-01-16-51-52.

Tinsley, K. (2015). Splunk Acquires Caspida. Retrieved from https://www.splunk.com/en_us/newsroom/press-releases/2015/splunk-acquires-caspida.html, Splunk Press Release.

Vanhees, K. (2018). AlienVault USM Review: “Nothing is what it SIEMs”. Retrieved from https://www.trustradius.com/reviews/alienvault-unified-security-management-2015-09-29-01-55-26.

Verified-User1. (2017). IBM Security QRadar Review: “Qradar – Big League SIEM Solution”. Retrieved from https://www.trustradius.com/products/ibm-security-qradar/reviews. Verified-User2. (2017). IBM Security QRadar Review: “IBM QRadar – A go-to SIEM product”. Retrieved from https://www.trustradius.com/reviews/ibm-security-qradar-2017-06-21-04-06-13, Trust Radius Reviews.

Big Data Analytics Application to Solve Known Security Issues

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to identify two advantages of applying Big Data Analytics to solve known security issues such as malware detection, network hacking, spam, and so forth.  The discussion and the analysis include the reasons and rationale for utilizing Big Data Analytics to solve security issues.  The discussion begins with a brief overview of Big Data Analytics and the Security Threats.

Big Data Analytics

Big Data (BD) is the major topic across some domains and fields such as management and marketing, scientific research, national security and government (Vivekanand & Vidyavathi, 2015).  BD enables making an informed decision as it shifts the reasoning from logical and causality-based to the acknowledgment of correlation links between events (De Mauro, Greco, & Grimaldi, 2015).   The public and private sectors are increasing their use of the Big Data Analytics (BDA) in different areas (Vivekanand & Vidyavathi, 2015).  The process of very large amounts of data is the main benefit of Big Data Analytics (Emani, Cullot, & Nicolle, 2015).  Big Data Analytics is defined in (Emani et al., 2015) as the use of advanced analytics techniques on Big Data. As elaborated by (Gupta & Jyoti, 2014), BDA is the process of analyzing Big Data to find hidden patterns, unknown correlations and other useful information which can be extracted to make a sound decision.  In (CSA, 2013), BDA is described as the process of analyzing and mining Big Data and can produce operational and business knowledge on an unprecedented scale and specificity.  The massive volume of semi-structured, unstructured data can be mined using the BDA (Gandomi & Haider, 2015; Gupta & Jyoti, 2014).  The need and the requirement to analyze and leverage trend data which are collected by organizations is one of the main drivers for BDA tools (CSA, 2013).  The value of BDA is increasing as the cash flow is increasing.  Figure 1 illustrates the graph for the value of BDA with dimensions of time and cumulative cash flow.  Thus, there is no doubt that BDA provides great benefits to organizations.  

Figure 1.  The Value of Big Data Analytics. Adapted from (Gupta & Jyoti, 2014).

Big Data Analytics for Security

BD is changing the analytics landscape (CSA, 2013).  BDA can be leveraged to enhance the information security and situational awareness (CSA, 2013).  For instance, BDA can be utilized to analyze financial transactions, log files, and network traffic to identify anomalies and suspicious activities, and to accelerate multiple sources of information into a coherent view (CSA, 2013).  The malicious attacks have been increasing lately.  Thus, the increasing security threats come along with increasing use of BD, BDA, and Cloud Computing technologies. The malicious attacks have become the major topic of government, organization, and industry (Gupta & Jyoti, 2014).  Big Data Security Analytics is used for the increasing practice of organization to gather and analyze security data to detect vulnerabilities and intrusions by attackers (Gupta & Jyoti, 2014).   The Advanced Persistent Threats (APT) is a subset of the malicious attacks and threats which are well-resourced and trained attacks which conduct multi-year intrusion campaigns targeting highly sensitive economic, proprietary or national security information (Gupta & Jyoti, 2014).  The aim APT is to maintain the persistent attack without getting detected inside their target environment (Gupta & Jyoti, 2014).

Thus, the main purpose of using BD techniques to analyze the data and apply same to implement enhanced data security techniques (Gupta & Jyoti, 2014).  Big Data technologies facilitate a wide range of industries to develop affordable infrastructures for security monitoring (Cardenas, Manadhata, & Rajan, 2013). Organizations can use various systems with a range of Security Analytics Sources (SAS).  These systems can generate messages or alerts and transmits them to the trusted server for analysis and action (Gupta & Jyoti, 2014).  The system can be Host-based Intrusion Detection System (HIDS), an antivirus engine which writes a Syslog or interfaces reporting events to remove service such as Security and Information Event Monitoring (SIEM) system (Gupta & Jyoti, 2014).  

There are very good reasons for BD to enter the security domain.  In (Gupta & Jyoti, 2014) three main reasons for BD to enter the enterprise security mainstream.  The first reason is the continuing problems with detection and response of threats because the existing security analytics tools are found inadequate to handle advanced virus, malware, stealthy attack techniques, and the growing army of well-organized global cyber attacks.  The second reason is the Moore’s Law and Open Source.  Security vendors are increasing the development cycles by customizing open source tools like Cassandra, Hadoop, MapReduce and Mahout for security analytics purposes which can help accelerate innovation to protect systems from threats.  The third reason is the tons of activity on the supply side (Gupta & Jyoti, 2014).  Organizations want security alerts from new vendors aside from HP, IBM, McAfee, and RSA Security.   Some vendors such as Hexis Cyber Solutions, Leidos, Narus, and Palantir will move beyond the government and extend into the private sector.  Others like Click Security, Forescale, and Netskope have intelligence backgrounds to deal with the malicious attacks (Gupta & Jyoti, 2014).

Fraud Detection is one of the most visible utilization of BDA (Cardenas et al., 2013; CSA, 2013).  Although Credit Card companies have conducted fraud detection for decades, the custom-built infrastructure to mine BD for fraud detection was not economical to adapt for other fraud detection uses (CSA, 2013).  The off-the-shelf BD tools and techniques are now attracting the attention to analytics for fraud detection in healthcare, insurance and other fields (CSA, 2013).  Examples of using BD for Security purposes include (1) Network Security, (2) Enterprise Event Analytics, (3) Netflow Monitoring to Identify Botnets, and (4) Advanced Persistent Threats Detection (APT).  The APT has two categories of (1) Beehive: Behavior Profiling for APT Detection, and (2) Using Large-Scale Distributed Computing to Unveil APTs.  For this discussion, the Network Security and Netflow Monitoring to Identify Botnets are the two examples for taking advantages of BDA for the security purposes (CSA, 2013).

Network Security

The case study by Zions Bancorporation is a good example for using BD for security purposes (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).   The traditional SIEM could not handle the volume of the data generated for security purposes (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).  Zions Bancorporation reported that using Hadoop clusters and business intelligence tools lead to parsing more data faster than the traditional SIEM tools (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).   While the traditional SIEM system takes between twenty and one hours, the Hadoop system provides the result in a minute (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).  The system enables users to mine meaningful security information from sources such as firewalls and security devices, website traffic, business processes and other transactions (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).  The incorporation of unstructured data and multiple disparate datasets into a single analytical framework is one of the main promising features of BD (CSA, 2013; Raja & Rabbani, 2014).

Netflow Monitoring to Identify Botnets

Botnets are a major threat to the current Internet (Francois, Wang, Bronzi, State, & Engel, 2011). The traffic of botnet is mixed with a large volume of benign traffic due to ubiquitous high-speed networks (Francois et al., 2011).  These networks can be monitored using IP flow records.  However, their forensic analysis forms the major computational bottleneck (Francois et al., 2011).  The BotCloud research project by (Francois et al., 2011) leveraging Hadoop and MapReduce technology is a good example of taking advantage of BDA for security purpose.  In this project of (Francois et al., 2011), a distributed computing framework leveraging a host dependency model and adapted PageRank algorithm were proposed.  Moreover, the Hadoop cluster including MapReduce was utilized to analyze and detect densely interconnected hosts which are potential botnet members.  The large volume of Netflow data collected for data analysis was the reason for using MapReduce framework (CSA, 2013; Francois et al., 2011).  The project showed a good detection accuracy and a good efficiency based on Hadoop cluster. 

Conclusion

Big Data means Big Value for organizations at various levels including the security. BD is changing the analytics landscape.  BDA can be leveraged to enhance the information security and situational awareness to detect any abnormal activities.  For instance, BDA can be utilized to analyze financial transactions, log files, and network traffic to identify anomalies and suspicious activities, and to accelerate multiple sources of information into a coherent view.  Organizations can benefit greatly from BDA tools such as Hadoop and MapReduce for security purposes.  There are various reasons for using BD and BDA for security discussed in this DB.  In this discussion, the Network Security and Netflow Monitoring to Identify Botnets are the two examples for taking advantages of BDA for the security purposes.

References

Cardenas, A. A., Manadhata, P. K., & Rajan, S. P. (2013). Big data analytics for security. IEEE Security & Privacy, 11(6), 74-76.

CSA, C. S. A. (2013). Big Data Analytics for Security Intelligence. Big Data Working Group.

De Mauro, A., Greco, M., & Grimaldi, M. (2015). What is big data? A consensual definition and a review of key research topics. Paper presented at the AIP Conference Proceedings.

Emani, C. K., Cullot, N., & Nicolle, C. (2015). Understandable big data: A survey. Computer science review, 17, 70-81.

Francois, J., Wang, S., Bronzi, W., State, R., & Engel, T. (2011). Botcloud: Detecting botnets using MapReduce. Paper presented at the Information Forensics and Security (WIFS), 2011 IEEE International Workshop on.

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Gupta, B., & Jyoti, K. (2014). Big data analytics with Hadoop to analyze targeted attacks on enterprise data.

McDaniel, P., & Smith, S. (2013). Big Data Analytics for Security. The University of Texas at Dallas.

Raja, M. C., & Rabbani, M. A. (2014). Big Data analytics security issues in a data-driven information system.

Vivekanand, M., & Vidyavathi, B. M. (2015). Security Challenges in Big Data: Review. International Journal of Advanced Research in Computer Science, 6(6).

Cloud Computing Security Issues

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to discuss and analyze two security issues associated with the Cloud Computing system.  The analysis includes the causes for these two security issues and the solutions.  The discussion begins with an overview of the Security Issues when dealing with Cloud Computing.

Security Issues Associated with Cloud Computing

Cloud Computing and Big Data are the current buzz words in IT industry. Cloud Computing does not only solve the challenges of Big Data, but also offers benefits for businesses, organizations, and individuals such as:

  • Cost saving,
  • Access data from anywhere anytime,
  • Pay per use like any utility,
  • Data Storage,
  • Data Processing,
  • Elasticity,
  • Energy Efficiency,
  • Enhanced Productivity, and more (Botta, de Donato, Persico, & Pescapé, 2016; Carutasu, Botezatu, Botezatu, & Pirnau, 2016; El-Gazzar, 2014).

Despite the tremendous benefits of the Cloud Computing, the emerging technology of the Cloud Computing is confronted with many challenges. The top challenge is the Security, which is expressed by executives as number one concern for adopting Cloud Computing  (Avram, 2014; Awadhi, Salah, & Martin, 2013; Chaturvedi & Zarger, 2015; Hashizume, Rosado, Fernández-medina, & Fernandez, 2013; Pearson, 2013).

The security issues in Cloud Computing environment are distinguished from the security issues of the traditional distributed systems (Sakr & Gaber, 2014).   Various research studies, in an attempt, to justify this security challenge in the Cloud Computing environment, provide various reasons such as the underlying technologies of Cloud Computing have security issues, such as virtualization, and SOA (Service Oriented Architecture) (Inukollu, Arsi, & Ravuri, 2014).  Thus, the security issues that are associated with these technologies come along with the Cloud Computing (Inukollu et al., 2014).  The Cloud Computing Service Model of PaaS (Platform as a Service) is a good example because it is based on SOA (Service-Oriented Architecture) Model.  Thus, the Cloud Computing Service Model PaaS inherits all of the security issues that are associated with SOA technology (Almorsy, Grundy, & Müller, 2016).   In (Sakr & Gaber, 2014), factors such as multi-tenancy, trust asymmetry, global reach and insider threats contribute to the security issues associated with the Cloud Computing environment.  In (Tripathi & Mishra, 2011), eleven security issues and threats associated with the Cloud environment are identified (1) VM-Level attacks, (2) Abuse and Nefarious Use of Cloud Computing,  (3) Loss of Governances, (4) Lock-IN, (5) Insecure Interfaces and APIs, (6) Isolation Failure, (7) Data Loss or Leakage, (8) Account or Service Hijacking, (9) Management Interface Compromise, (10) Compliance Risks, and (11) Malicious Insiders. In the more recent report of (CSA, 2016), twelve critical issues to the Cloud security are identified and ranked in the order of severity.  Data Breaches is ranked at the top and regarded as the most severe security issue of the Cloud Computing environment.  The Weak Identity, Credential, and Access Management is the second severe security issue.  The Insecure APIs, System and Application Vulnerabilities, and Account Hijacking are the next ranked security issues.  Table 1 lists the twelve security issues associated with the Cloud Computing as reported by (CSA, 2016).

Table 1.  Top Twelve Security Issues of Cloud Computing in Order of Severity. Adapted from (CSA, 2016).

The discussion and analysis are limited to the top two security issues, which are the Data Breaches, and the Weak Identity, Credential and Access Management. The discussion and analysis cover the causes and the proposed solutions.

Data Breaches

The data breach occurs when the sensitive and confidential information or any private data not intended for the public is released, viewed, stolen or used by unauthorized users (CSA, 2016).   The data breach issue is not unique to the Cloud Computing environment (CSA, 2016).  However, it is consistently ranking as the top issue and concern for the Cloud users.  The Cloud environment is subject to the same threats as the traditional corporate network and new attack techniques due to the shared resources.  The sensitivity degree of the data determines the extent of the damage.  The impact of the Data Breach on users and organization is devastated.  For instance, in a single incident of a data breach in the USA, 40 million credit card numbers and about 70 million addresses, phone numbers and other private and personal information details were compromised (Soomro, Shah, & Ahmed, 2016).  The firm spent $61 million in less than one year of the breach for the damages and the recovery, besides the cash loss, the profit which dropped by 46% in one quarter of the year (Soomro et al., 2016).  The “BitDefender,” the anti-virus firm, and the British telecom provider “TalkTalk” are other good examples of the Data Breaches.  The private information such as username and passwords of the customers of “BitDefender” was stolen in mid-2015, and the hacker demanded a ransom of $15,000 (CSA, 2016; Fox-Brewster, 2015).  Multiple security incidents in 2014 and 2015 were reported by “TalkTalk” resulting in the theft of four million users’ private information (CSA, 2016; Gibbs, 2015). 

The organization is obliged to exercise certain security standards of care to ensure that sensitive information is not released to unauthorized users.  The Cloud providers have certain responsibilities in certain aspects of the Cloud Computing, and they usually provide the security measures for these aspects. However, the Cloud users also have certain aspects when using the Cloud Computing, and they are responsible for these aspects to protect their data in the Cloud.   The multi-factor authentication and encryptions are the two techniques that are proposed to secure the Cloud environment.  

Insufficient Identity, Credential, and Access Management

Data Breaches and the malicious attacks happen due to various reasons.  The lack of scalable Identity Access Management Systems can cause Data Breach (CSA, 2016).  The failure to use Multi-Factor Authentication, weak password use and a lack of ongoing automated rotation of cryptographic keys, passwords, and certificates can cause Data Breach (CSA, 2016).  Malicious attackers, who can masquerade as legitimate users or developers, can modify and delete data, issue control, and management functions, and snoop on data in transit or release malicious software which appears to originate from a legitimate source.   The insufficient identity, credential or key management can allow these malicious attackers or non-authorized users to access private and sensitive data and cause catastrophic damage to the users and the organizations as well.   The GitHub attack and the Dell root certificate are good examples of this security issues.  The GitHub is a good example of this security issue as the attackers scrape GitHub for Cloud service credentials, hijacked account to mine virtual currency (Sandvik 2014). Dell is another example which releases a fix for root certificate failure because all dell systems used the same secret key and the certificate which enables creating a certificate for any domain, which is trusted by Dell (Schwartz, 2015). 

The security issues require Cloud Computing systems to be protected so that unauthorized users should not have access to the private and sensitive information. Various solutions are proposed to solve this security issue of insufficient identity and access management.  A security framework in a distributed system to consider public key cryptography, software agents and XML binding technologies was proposed as indicated in (Prakash & Darbari).  The credential and cryptographic keys should not be embedded in source code or distributed in public repositories such as GitHub.  The keys should be properly secured using well-secured public key infrastructure (PKI) to ensure key-management (CSA, 2016).  The Identity Management Systems (IMS) should scale to handle the lifecycle management for millions of users and cloud service providers (CSP).  The IMS should support immediate de-provisioning of access to resources when events such as job termination or role change.   The Multi-Factor Authentication System (MAS) such as a smart card, phone authentication, should be required for user and operator of the Cloud service (CSA, 2016).  

References

Almorsy, M., Grundy, J., & Müller, I. (2016). An analysis of the cloud computing security problem. arXiv preprint arXiv:1609.01107.

Avram, M. G. (2014). Advantages and Challenges of Adopting Cloud Computing from an Enterprise Perspective. Procedia Technology, 12, 529-534. doi:10.1016/j.protcy.2013.12.525

Awadhi, E. A., Salah, K., & Martin, T. (2013, 17-20 Nov. 2013). Assessing the Security of the Cloud Environment. Paper presented at the GCC Conference and Exhibition (GCC), 2013 7th IEEE.

Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud Computing and Internet Of Things: a Survey. Future Generation computer systems, 56, 684-700.

Carutasu, G., Botezatu, M., Botezatu, C., & Pirnau, M. (2016). Cloud Computing and Windows Azure.

Chaturvedi, D. A., & Zarger, S. A. (2015). A review of security models in cloud computing and an Innovative approach. International Journal of Computer Trends and Technology (IJCTT), 30(2), 87-92.

CSA. (2016). The Treacherous 12: Cloud Computing Top Threats in 2016. Cloud Security Alliance

Top Threats Working Group.

El-Gazzar, R. F. (2014). A literature review on cloud computing adoption issues in enterprises. Paper presented at the International Working Conference on Transfer and Diffusion of IT.

Fox-Brewster, T. (2015). Anti-Virus Firm BitDefender Admits Breach, Hacker Claims Stolen Passwords Are Unencrypted. Retrieved from https://www.forbes.com/sites/thomasbrewster/2015/07/31/bitdefender-hacked/#5a5f5b125ab2.

Gibbs, S. (2015). TalkTalk criticised for poor security and handling of hack attack. Retrieved from http://www.theguardian.com/technology/2015/oct/23/talktalk-criticised-for-poor-security-and-handling-of-hack-attack.

Hashizume, K., Rosado, D. G., Fernández-medina, E., & Fernandez, E. B. (2013). An analysis of security issues for cloud computing. Journal of internet services and applications, 4(1), 1-13. doi:10.1186/1869-0238-4-5

Inukollu, V. N., Arsi, S., & Ravuri, S. R. (2014). Security issues associated with big data in cloud computing. International Journal of Network Security & Its Applications, 6(3), 45.

Pearson, S. (2013). Privacy, security and trust in cloud computing Privacy and Security for Cloud Computing (pp. 3-42): Springer.

Prakash, V., & Darbari, M. A Review on Security Issues in Distributed Systems.

Sakr, S., & Gaber, M. (2014). Large Scale and big data: Processing and Management: CRC Press.

Sandvik , R. A. (2014). Attackers Scrape GitHub for Cloud Service Credentials, Hijack Account to Mine Virtual Currency. Retrieved from https://www.forbes.com/sites/runasandvik/2014/01/14/attackers-scrape-github-for-cloud-service-credentials-hijack-account-to-mine-virtual-currency/#71fe913c3196.

Schwartz, M. (2015). Dell Releases Fix for Root Certificate Fail. Retreived from http://www.bankinfosecurity.com/dell-releases-fix-for-root-certificate-fail-a-8701/op-1.

Soomro, Z. A., Shah, M. H., & Ahmed, J. (2016). Information security management needs more holistic approach: A literature review. International Journal of Information Management, 36(2), 215-225.

Tripathi, A., & Mishra, A. (2011, 14-16 Sept. 2011). Cloud computing security considerations. Paper presented at the 2011 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC).

Big Data Security Issues

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to discuss and analyze two security issues associated with Big Data.  The analysis includes the causes for these two security issues and the solutions.  The discussion begins with an overview of the Security Issues when dealing with Big Data.

Security Issues Associated with Big Data

As indicated in (CSA & Big-Data-Working-Group, 2013), the velocity, volume and variety characteristics of Big Data magnify the security and privacy issues.  These security and privacy issues include issues such as the large-scale infrastructures in the Cloud, various data sources and formats, the acquisition of the data using streaming techniques, and the high-volume migration inside the Cloud (CSA & Big-Data-Working-Group, 2013).  Thus, the traditional security techniques which tended to be for small-scale, static data are found inadequate when dealing with Big Data (CSA & Big-Data-Working-Group, 2013).  Storing the organizations’ information, customers and patients in a secure manner is not a trivial process, and it gets more complicated in a Big Data environment (Al-Kahtani, 2017).   CSA identified the top ten Big Data Security and Privacy challenges illustrated in the Big Data Ecosystem in Figure 1, adapted from CSA.  These ten security challenges are categorized in four main categories in Big Data Ecosystems: (1) The infrastructure Security, (2) Data Privacy, (3) Data Management, and (4) Integrity and Reactive Security.

Figure 1.  Top Ten Security Challenges in Big Data Ecosystem, adapted from (Al-Kahtani, 2017).

Tremendous efforts from the researchers, practitioners and the industry are exerted to address the security issues associated with Big Data.  As indicated in (Arora & Bahuguna), Security of the Big Data is challenging due to two main vulnerabilities.  The first vulnerability includes the information leakage which gets increased by Big Data because of its characteristics of high volume and velocity (Arora & Bahuguna).  The second vulnerability reflects the privacy and prediction of people’s behavior risk get increased by the development of intelligent terminals (Arora & Bahuguna).  In (Al-Kahtani, 2017), the general security risks associated with Big Data environments are identified to include six security risk elements. The first security risk is associated with the implementation of a new technology, which can lead to new vulnerability discovery. The second security risk can be associated with the open source tools which can contain undocumented vulnerabilities and lack of update options such as backdoors.  The third security risk reflects the large cluster node attack surfaces which organizations are not prepared to monitor them.  The fourth security risk reflects the poor authentication of users and the weak remote access policies. The fifth security risk is associated with the organizations is unable to handle large processing of audit and access logs.  The sixth element includes the lack of data validation looking for malicious data input which can become lost in the large volume of the Big Data (Al-Kahtani, 2017).  With regard to the infrastructure, the common attacks can include false data injections, Denial of Service (DoS), worm and malware propagation, and botnet attacks (Al-Kahtani, 2017).   In (Khan et al., 2014), the security issues associated with the Big Data are categorized into privacy, integrity, availability, confidentiality, and governance.  Data leakage is a major privacy concern in Big Data.  The data integrity is a particular challenge for large-scale collaborative analysis, where data frequently changes (Khan et al., 2014).  The availability is critical when dealing with Big Data in the cloud.  It involves threats to data availability such as Denial of Service (DoS) and Mitigation of DoS attacks.  The confidentiality security issue refers to the distorted data from theft (Khan et al., 2014).  In (Mehta, 2017), the security issues associated with Big Date involves granular access, monitoring in real-time, granular audits, preserve privacy in data mining and analytics, encrypted data-centric security, data provenance and verification, and integrity and reactive security.  These security issues are similar to the ones discussed in (CSA & Big-Data-Working-Group, 2013; Sahafizadeh & Nematbakhsh, 2015; Yadav, 2016).

For this discussion, only two security issues associated with Big Data are discussed and analyzed with the proposed solutions to overcome them.  These two security issues are categorized under the Integrity and Reactive Security category of (CSA & Big-Data-Working-Group, 2013), which involves (1) End-point validation and filtering, and (2) Real-time Security Monitoring.  The End-point validation and filtering are categorized in (Demchenko, Ngo, de Laat, Membrey, & Gordijenko, 2013-15) under the Infrastructure Security category, while the Real-time Security Monitoring is categorized under the Data Management. 

End-Point Validation and Filtering Security Issue and Proposed Solutions

The end-points are the main components for Big Data collection (Yadav, 2016).  They provide input data for storage and processing.  Security is very important to ensure the use of the only authentic end-points, where the network is free from other end-points including the malicious ones (Yadav, 2016).   The data collected from various sources including end-point devices is required when dealing with Bid Data (CSA & Big-Data-Working-Group, 2013). The security information and event management system (SIEM) is an example of collecting logs from millions of software applications and hardware devices in an enterprise or organization network.  The input validation and filtering process during this data collection are very challenging and critical to the integrity and the trust of the data due to threats of the untrusted sources especially with the “bring-your-own-device” (BYOD) model which allows employees to bring their own devices to the workplace (CSA & Big-Data-Working-Group, 2013).  There are four threat models when dealing with validation and filtering security issue.  The malicious attacker may tamper with any of these devices such as the smartphone from where data is collected and retrieved with the aim of providing malicious input to a central data collection system is the first threat model.  The malicious attacker may perform ID cloning attacks such as Sybil attacks on the collected data with the aim of providing malicious input to a central data collection using the faked identities.  The malicious attacker can manipulate the input sources of sensed data is the third threat model when dealing with the validation and filtering security issue.  The last threat model for this security issue involves the malicious attacker compromising data in transmission from a benign source to the central collection system such as by performing a man-in-the-middle attack or a replay attack (CSA & Big-Data-Working-Group, 2013).

A use case scenario for this issue is the data which gets retrieved from weather sensors and feedback votes and are sent by a smartphone such as iPhone or Android applications have the similar validation and filtering problem (CSA & Big-Data-Working-Group, 2013).  The security issue of the validation and filtering of this example gets further complicated when the volume of the data collected gets increased (CSA & Big-Data-Working-Group, 2013).  The algorithm is required to validate the input for large data sets to validate and filter the data from any malicious and untrusting data (CSA & Big-Data-Working-Group, 2013). 

The solutions to the validation security issue are categorized into two categories.  The first category is to prevent the malicious attacker from generating and sending malicious input to the central collection system (CSA & Big-Data-Working-Group, 2013).  The second category is to detect and filter malicious input at the central system in case the malicious attacker was successful sending the malicious data to the central collection system (CSA & Big-Data-Working-Group, 2013).  

The first solution to prevent malicious attacks requires tamper-proof software and defense against the “Sybil” attacks.   The researchers and industry have exerted tremendous efforts to design and implement tamper-proof secure software and tools.  The security for PC-based platforms and applications have been widely studied.  However, the mobile devices and the application security still an active area for research (CSA & Big-Data-Working-Group, 2013).  Thus, the determined malicious attacker may succeed in tamping the mobile devices.  Trusted Platform Module (TPM) was proposed to ensure the integrity of raw sensor data, and data derived from raw data (CSA & Big-Data-Working-Group, 2013).  However, the TPM solution is not found in mobile devices universally.  Thus, the malicious attacker can manipulate the sensor input such as GPS signals (CSA & Big-Data-Working-Group, 2013).  Various defense techniques against the fake ID using the ID cloning attacks and Sybil attacks have been proposed such as P2P (Peer-To-Peer) systems, Recommender Systems (RS), Vehicular Networks, and Wireless Sensor Network (CSA & Big-Data-Working-Group, 2013).  Many of these defense techniques propose the Trusted Certificates and Trusted Devices to prevent Sybil attacks.  However, in large enterprise settings and organizations with millions of entities, the management of certificates become an additional challenge.  Thus, additional solutions for resource testing are proposed to provide minimal defense against the Sybil attacks by discouraging Sybil attacks instead of preventing it (CSA & Big-Data-Working-Group, 2013).   The Big Data analytical techniques can be used to detect and filter malicious input at the central collection system.  Malicious input from the malicious attacker may appear as outliers.  Thus, statistical analysis and outlier detection techniques can be used to detect and filter out the malicious output (CSA & Big-Data-Working-Group, 2013).

Real-time Security Monitoring

The Real-Time Security Monitoring is described as one of the most challenging Big Data Analytics issues (CSA & Big-Data-Working-Group, 2013; Sakr & Gaber, 2014).  This challenging issue is a two-dimensional issue including the monitoring of the Big Data infrastructure itself, and the use of the same infrastructure for Big Data Analytics.  The performance monitoring and the health of the nodes in the Big Data infrastructure is an example of the first side of this issue. A good example of the other side of this issue is the health care provider using monitoring tools to look for fraudulent claims to get a better real-time alert and compliance monitoring (CSA & Big-Data-Working-Group, 2013).   The Real-Time Security Monitoring is challenging because the security devices send some alerts which can lead to a massive number of false positives, which are often ignored due to the limited human capacity for analysis.  This problem becomes further challenging with Big Data due to the characteristics of the Big Data volume, the velocity of the data streams.   However, the technologies of the Big Data can provide an opportunity to process and analyze different types of data rapidly, and real-time anomaly detection based on scalable security analytics (CSA & Big-Data-Working-Group, 2013). 

A use case scenario for this issue is the health industry by reducing the fraud related to claims (CSA & Big-Data-Working-Group, 2013).  Moreover, the stored data are extremely sensitive and must comply with the patient privacy regulations such as HIPAA, and it must be carefully protected.   The Real-Time detection of the anomalous retrieval of private information of the patients enables the healthcare provider to rapidly repair damage and prevent further misuse (CSA & Big-Data-Working-Group, 2013).

The security of the Big Data infrastructure and platform must be secured which is a requirement for the Real-Time Security Monitoring.  Big Data infrastructure threats include (1) rogue admin access to applications or nodes, (2) web application threats, and (3) eavesdropping on the line.  Thus, the security of Big Data ecosystem and infrastructure must include each component and the integration of these components.  For instance, when using Hadoop cluster in a Public Cloud, the security for the Big Data should include the Security of the Public Cloud, the Security of Hadoop clusters and all nodes in the cluster, the Security of the Monitoring Applications, and the Security of the Input Sources such as devices and sensors (CSA & Big-Data-Working-Group, 2013).  The threats also include the attack on the Big Data Analytics tools which are used to identify the malicious attacks.  For instance, evasion attacks can be used to prevent from being detected; the data poisoning attacks can be used to minimize the trustworthiness and integrity of the datasets which are used to train Big Data analytics algorithms (CSA & Big-Data-Working-Group, 2013).  Moreover, barriers such as legal regulations become important when dealing with the Real-Time Security Monitoring challenges of the Big Data.  Big Data Analytics (BDA) can be employed to monitor anomalous connection to the cluster environment and to mine the logging events to identify any suspicious activities (CSA & Big-Data-Working-Group, 2013).  When dealing with the Real-Time Security Monitoring, various factors such as technical, legal and ethical must be taken into consideration (CSA & Big-Data-Working-Group, 2013).

Conclusion

This discussion focused on two security issues associated with Big Data.  The discussion and the analysis included the cause of these two security issues and the solutions.  The discussion began with an overview of the Security Issues when dealing with Big Data.  The categories of the threats are described not only by (CSA & Big-Data-Working-Group, 2013) but also by researchers.  CSA identified the top ten challenges when dealing with Big Data.  Various researchers also identified the threats and security challenges associated with Big Data.  Some of these security challenges and threats include the secure computations in distributed programming framework. The security of data storage and transactions logs, the End-Point input validation and filtering, and Real-Time Security Monitoring.  The two security issues chosen for this discussion are the End-Point input validation and filtering, and the Real-Time Security Monitoring.   Various solutions are proposed to reduce and prevent the attacks and threats when dealing with Big Data. However, there is no perfect solution yet for the threats and security issues associated with Big Data due to the nature of the Big Data and the fact that the mobile devices are still active research areas for security. 

References

Al-Kahtani, M. S. (2017). Security and Privacy in Big Data. International Journal of Computer Engineering and Information Technology, 9(2).

Arora, M., & Bahuguna, H. Big Data Security–The Big Challenge.

CSA, & Big-Data-Working-Group. (2013). Expanded Top Ten Big Data Security and Privacy Challenges. Cloud Security Alliance.

Demchenko, Y., Ngo, C., de Laat, C., Membrey, P., & Gordijenko, D. (2013-15). Big security for big data: addressing security challenges for the big data infrastructure. Paper presented at the Workshop on Secure Data Management.

Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z., Mahmoud Ali, W. K., Alam, M., . . . Gani, A. (2014). Big Data: Survey, Technologies, Opportunities, and Challenges. The Scientific World Journal, 2014.

Mehta, T. M. P. M. P. (2017). Security and Privacy–A Big Concern in Big Data A Case Study on Tracking and Monitoring System.

Sahafizadeh, E., & Nematbakhsh, M. A. (2015). A Survey on Security Issues in Big Data and NoSQL. Int’l J. Advances in Computer Science, 4(4), 2322-5157.

Sakr, S., & Gaber, M. (2014). Large Scale and big data: Processing and Management: CRC Press.

Yadav, N. (2016). Top Ten Big Data Security and Privacy Challenge. Retrieved from https://www.infosecurity-magazine.com/opinions/big-data-security-privacy/.