Dr. Aly, O.
Computer Science
Introduction
This discussion analyzes two security issues associated with Big Data, covering the causes of each issue and the proposed solutions. It begins with an overview of the security issues that arise when dealing with Big Data.
Security Issues Associated with Big Data
As indicated in (CSA & Big-Data-Working-Group, 2013), the velocity, volume, and variety of Big Data magnify security and privacy issues. Contributing factors include large-scale Cloud infrastructures, the diversity of data sources and formats, the acquisition of data through streaming techniques, and high-volume migration inside the Cloud (CSA & Big-Data-Working-Group, 2013). Thus, traditional security techniques, which tend to target small-scale, static data, are inadequate for Big Data (CSA & Big-Data-Working-Group, 2013). Storing an organization’s information, including customer and patient data, in a secure manner is not a trivial process, and it becomes more complicated in a Big Data environment (Al-Kahtani, 2017). CSA identified the top ten Big Data security and privacy challenges, illustrated in the Big Data Ecosystem in Figure 1, adapted from CSA. These ten challenges fall into four main categories in the Big Data Ecosystem: (1) Infrastructure Security, (2) Data Privacy, (3) Data Management, and (4) Integrity and Reactive Security.

Figure 1. Top Ten Security Challenges in Big Data Ecosystem, adapted from (Al-Kahtani, 2017).
Researchers, practitioners, and industry have made considerable efforts to address the security issues associated with Big Data. As indicated in (Arora & Bahuguna), securing Big Data is challenging because of two main vulnerabilities. The first is information leakage, which Big Data amplifies through its high volume and velocity (Arora & Bahuguna). The second is the risk to privacy and the risk that people’s behavior can be predicted, both of which grow with the development of intelligent terminals (Arora & Bahuguna).
In (Al-Kahtani, 2017), the general security risks associated with Big Data environments comprise six elements. The first risk is associated with implementing a new technology, which can expose new vulnerabilities. The second is associated with open source tools, which can contain undocumented vulnerabilities such as backdoors and can lack update options. The third is the large attack surface of cluster nodes, which organizations are not prepared to monitor. The fourth is poor user authentication and weak remote access policies. The fifth is that organizations are unable to process large volumes of audit and access logs. The sixth is the lack of data validation to catch malicious input, which can become lost in the large volume of Big Data (Al-Kahtani, 2017). With regard to infrastructure, common attacks include false data injection, Denial of Service (DoS), worm and malware propagation, and botnet attacks (Al-Kahtani, 2017).
In (Khan et al., 2014), the security issues associated with Big Data are categorized into privacy, integrity, availability, confidentiality, and governance. Data leakage is a major privacy concern in Big Data. Data integrity is a particular challenge for large-scale collaborative analysis, where data changes frequently (Khan et al., 2014). Availability is critical when dealing with Big Data in the cloud; it involves threats to data availability such as Denial of Service (DoS) attacks, as well as the mitigation of those attacks. The confidentiality issue concerns data being distorted as a result of theft (Khan et al., 2014). In (Mehta, 2017), the security issues associated with Big Data involve granular access control, real-time monitoring, granular audits, privacy-preserving data mining and analytics, encrypted data-centric security, data provenance and verification, and integrity and reactive security. These security issues are similar to the ones discussed in (CSA & Big-Data-Working-Group, 2013; Sahafizadeh & Nematbakhsh, 2015; Yadav, 2016).
This discussion analyzes only two of these security issues, together with the solutions proposed to overcome them. Both fall under the Integrity and Reactive Security category of (CSA & Big-Data-Working-Group, 2013), which comprises (1) End-point validation and filtering and (2) Real-time Security Monitoring. In (Demchenko, Ngo, de Laat, Membrey, & Gordijenko, 2013-15), by contrast, End-point validation and filtering is categorized under Infrastructure Security, while Real-time Security Monitoring is categorized under Data Management.
End-Point Validation and Filtering Security Issue and Proposed Solutions
End-points are the main components for Big Data collection (Yadav, 2016). They provide the input data for storage and processing. Security measures must ensure that only authentic end-points are used and that the network is free of other end-points, including malicious ones (Yadav, 2016). Big Data use cases require data collected from various sources, including end-point devices (CSA & Big-Data-Working-Group, 2013). A security information and event management (SIEM) system, for example, collects logs from millions of software applications and hardware devices in an enterprise or organization network. Input validation and filtering during this data collection are challenging and critical to the integrity and trustworthiness of the data because of threats from untrusted sources, especially under the “bring-your-own-device” (BYOD) model, which allows employees to bring their own devices to the workplace (CSA & Big-Data-Working-Group, 2013). There are four threat models for the validation and filtering security issue. In the first, the attacker tampers with a device from which data is collected and retrieved, such as a smartphone, with the aim of providing malicious input to a central data collection system. In the second, the attacker performs ID cloning attacks, such as Sybil attacks, and uses the faked identities to provide malicious input to the central data collection system. In the third, the attacker manipulates the input sources of sensed data. In the fourth, the attacker compromises data in transmission from a benign source to the central collection system, for example by performing a man-in-the-middle attack or a replay attack (CSA & Big-Data-Working-Group, 2013).
A use case for this issue is data retrieved from weather sensors and feedback votes sent by smartphone applications, such as iPhone or Android apps, which face a similar validation and filtering problem (CSA & Big-Data-Working-Group, 2013). The problem in this example becomes more complicated as the volume of collected data increases (CSA & Big-Data-Working-Group, 2013). An algorithm is required to validate input for large data sets and to filter out malicious and untrusted data (CSA & Big-Data-Working-Group, 2013).
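As a minimal sketch of what validating and filtering end-point input can look like for the weather sensor use case, the Python example below accepts a reading only if it comes from a registered device and carries a physically plausible value. The device identifiers, the temperature bounds, and the names `Reading`, `is_valid`, and `filter_readings` are assumptions made for illustration; they do not come from the cited sources, and a production system would need far stronger end-point authentication than an allow-list.

```python
from dataclasses import dataclass

# Hypothetical allow-list of registered end-point IDs (assumption for illustration).
REGISTERED_DEVICES = {"sensor-001", "sensor-002"}

# Hypothetical plausibility bounds for an air temperature reading, in Celsius.
TEMP_MIN_C, TEMP_MAX_C = -90.0, 60.0


@dataclass(frozen=True)
class Reading:
    device_id: str
    temperature_c: float


def is_valid(reading: Reading) -> bool:
    """Accept a reading only from a registered device with a plausible value."""
    if reading.device_id not in REGISTERED_DEVICES:
        return False
    # The chained comparison is False for NaN, so NaN readings are rejected too.
    return TEMP_MIN_C <= reading.temperature_c <= TEMP_MAX_C


def filter_readings(readings):
    """Split readings into (accepted, rejected) lists."""
    accepted, rejected = [], []
    for r in readings:
        (accepted if is_valid(r) else rejected).append(r)
    return accepted, rejected


readings = [
    Reading("sensor-001", 21.5),
    Reading("sensor-999", 20.0),   # unknown device
    Reading("sensor-002", 400.0),  # implausible value
]
accepted, rejected = filter_readings(readings)
print(len(accepted), "accepted,", len(rejected), "rejected")
```

Keeping the rejected readings rather than discarding them is a deliberate choice in this sketch, since rejected input can itself be evidence of a tampered or cloned end-point.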
The solutions to the validation security issue fall into two categories. The first is to prevent the malicious attacker from generating and sending malicious input to the central collection system (CSA & Big-Data-Working-Group, 2013). The second is to detect and filter malicious input at the central system in case the attacker succeeds in sending malicious data to it (CSA & Big-Data-Working-Group, 2013).
The first solution, preventing malicious input, requires tamper-proof software and defenses against Sybil attacks. Researchers and industry have made considerable efforts to design and implement tamper-proof secure software and tools. Security for PC-based platforms and applications has been widely studied. However, the security of mobile devices and their applications is still an active area of research (CSA & Big-Data-Working-Group, 2013). Thus, a determined attacker may succeed in tampering with mobile devices. The Trusted Platform Module (TPM) was proposed to ensure the integrity of raw sensor data and of data derived from raw data (CSA & Big-Data-Working-Group, 2013). However, TPM is not universally available in mobile devices. Thus, a malicious attacker can still manipulate sensor input such as GPS signals (CSA & Big-Data-Working-Group, 2013). Various defense techniques against fake IDs created through ID cloning and Sybil attacks have been proposed in areas such as P2P (Peer-To-Peer) systems, Recommender Systems (RS), Vehicular Networks, and Wireless Sensor Networks (CSA & Big-Data-Working-Group, 2013). Many of these techniques rely on Trusted Certificates and Trusted Devices to prevent Sybil attacks. However, in large enterprises and organizations with millions of entities, the management of certificates becomes an additional challenge. Thus, additional solutions based on resource testing have been proposed; they provide a minimal defense that discourages Sybil attacks rather than preventing them (CSA & Big-Data-Working-Group, 2013).
The second solution uses Big Data analytical techniques to detect and filter malicious input at the central collection system. Malicious input may appear as outliers. Thus, statistical analysis and outlier detection techniques can be used to detect and filter out the malicious input (CSA & Big-Data-Working-Group, 2013).
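The outlier-based filtering described above can be illustrated with a short Python sketch. The cited sources do not name a specific statistical technique, so the choice of the modified z-score based on the median absolute deviation (MAD), the threshold of 3.5, and the function name `mad_outliers` are assumptions made here for illustration. The median and MAD are used instead of the mean and standard deviation because a single extreme value distorts them far less.

```python
from statistics import median


def mad_outliers(values, threshold=3.5):
    """Return the indices of values whose modified z-score exceeds threshold."""
    med = median(values)
    abs_dev = [abs(v - med) for v in values]
    mad = median(abs_dev)
    if mad == 0:
        # At least half of the values equal the median; flag anything that differs.
        return [i for i, d in enumerate(abs_dev) if d > 0]
    # 0.6745 scales the MAD so the score is comparable to a z-score for normal data.
    return [i for i, d in enumerate(abs_dev) if 0.6745 * d / mad > threshold]


# Hypothetical temperature readings with one injected value.
readings = [20.1, 19.8, 20.4, 20.0, 95.0, 19.9, 20.2]
flagged = mad_outliers(readings)
clean = [v for i, v in enumerate(readings) if i not in flagged]
print("flagged indices:", flagged)
```

A filter of this kind only catches malicious input that is statistically unusual. An attacker who injects plausible-looking values would pass it, which is why the sources treat detection as a complement to prevention rather than a replacement for it.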
Real-time Security Monitoring
Real-time Security Monitoring is described as one of the most challenging Big Data Analytics issues (CSA & Big-Data-Working-Group, 2013; Sakr & Gaber, 2014). The issue has two dimensions: monitoring the Big Data infrastructure itself, and using that same infrastructure for Big Data Analytics. Monitoring the performance and health of the nodes in the Big Data infrastructure is an example of the first dimension. An example of the second is a health care provider using monitoring tools to look for fraudulent claims in order to obtain better real-time alerting and compliance monitoring (CSA & Big-Data-Working-Group, 2013). Real-time Security Monitoring is challenging because security devices generate a massive number of alerts, many of them false positives, which are often ignored because of the limited human capacity for analysis. The problem becomes harder with Big Data because of the volume and velocity of the data streams. However, Big Data technologies also provide an opportunity to process and analyze different types of data rapidly and to perform real-time anomaly detection based on scalable security analytics (CSA & Big-Data-Working-Group, 2013).
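To make the idea of real-time anomaly detection on a data stream more concrete, the Python sketch below keeps a running mean and variance with Welford's online algorithm and flags values that lie far from the baseline. It is an illustrative assumption rather than a method taken from the cited sources: the class name `StreamingAnomalyDetector`, the z-score threshold, the warm-up length, and the sample stream are all hypothetical. The detector processes one value at a time in constant memory, which is the property that matters for high-velocity streams.

```python
import math


class StreamingAnomalyDetector:
    """Flags values far from the running mean, using Welford's online algorithm."""

    def __init__(self, z_threshold=3.0, warmup=10):
        self.z_threshold = z_threshold
        self.warmup = warmup  # number of values to learn from before alerting
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean

    def observe(self, value):
        """Return True if value is anomalous relative to the values seen so far."""
        anomalous = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.z_threshold:
                anomalous = True
        if not anomalous:
            # Only normal values update the baseline, so a burst of
            # anomalous values cannot shift it.
            self.count += 1
            delta = value - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (value - self.mean)
        return anomalous


# Hypothetical stream of requests per second with one spike.
detector = StreamingAnomalyDetector()
stream = [100, 102, 98, 101, 99, 100, 103, 97, 100, 101, 99, 500, 100]
alerts = [x for x in stream if detector.observe(x)]
print("alerts:", alerts)
```

Excluding flagged values from the baseline is a design trade-off: it resists poisoning by a burst of anomalies, but a legitimate long-term shift in the stream would keep raising alerts until the detector is reset.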
A use case for this issue is the health industry, where Real-time Security Monitoring helps reduce fraud related to claims (CSA & Big-Data-Working-Group, 2013). Moreover, the stored data is extremely sensitive, must comply with patient privacy regulations such as HIPAA, and must be carefully protected. Real-time detection of anomalous retrieval of patients’ private information enables the healthcare provider to rapidly repair damage and prevent further misuse (CSA & Big-Data-Working-Group, 2013).
Securing the Big Data infrastructure and platform is a prerequisite for Real-time Security Monitoring. Big Data infrastructure threats include (1) rogue admin access to applications or nodes, (2) web application threats, and (3) eavesdropping on the line. Thus, the security of the Big Data ecosystem and infrastructure must cover each component and the integration of these components. For instance, when a Hadoop cluster runs in a Public Cloud, Big Data security should include the security of the Public Cloud, the security of the Hadoop cluster and all of its nodes, the security of the monitoring applications, and the security of the input sources such as devices and sensors (CSA & Big-Data-Working-Group, 2013). The threats also include attacks on the Big Data Analytics tools that are used to identify malicious attacks. For instance, evasion attacks can be used to avoid detection, and data poisoning attacks can be used to undermine the trustworthiness and integrity of the datasets used to train Big Data analytics algorithms (CSA & Big-Data-Working-Group, 2013). Moreover, barriers such as legal regulations become important when dealing with the Real-time Security Monitoring challenges of Big Data. Big Data Analytics (BDA) can be employed to monitor anomalous connections to the cluster environment and to mine logging events to identify suspicious activities (CSA & Big-Data-Working-Group, 2013). When dealing with Real-time Security Monitoring, technical, legal, and ethical factors must all be taken into consideration (CSA & Big-Data-Working-Group, 2013).
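Mining logging events for suspicious connections to the cluster can be sketched in a few lines of Python. The example is a simplified assumption, not a description of any particular monitoring product: the event format (a source address paired with an `"OK"` or `"FAIL"` outcome), the failure threshold, and the function name `suspicious_sources` are hypothetical and chosen only for illustration.

```python
from collections import Counter


def suspicious_sources(events, max_failures=3):
    """Return source addresses with more than max_failures failed connections."""
    failures = Counter(src for src, outcome in events if outcome == "FAIL")
    return sorted(src for src, n in failures.items() if n > max_failures)


# Hypothetical connection log: (source address, outcome) pairs.
events = [
    ("10.0.0.5", "OK"),
    ("10.0.0.9", "FAIL"),
    ("10.0.0.9", "FAIL"),
    ("10.0.0.7", "FAIL"),
    ("10.0.0.9", "FAIL"),
    ("10.0.0.9", "FAIL"),
    ("10.0.0.5", "OK"),
]
flagged = suspicious_sources(events)
print("suspicious sources:", flagged)
```

In a real deployment the same counting would run over a distributed log store and a sliding time window rather than an in-memory list, and a fixed threshold like this one would contribute to the false positive problem unless it is tuned to the environment.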
Conclusion
This discussion focused on two security issues associated with Big Data and analyzed their causes and the proposed solutions. It began with an overview of the security issues that arise when dealing with Big Data. The categories of threats are described not only by (CSA & Big-Data-Working-Group, 2013) but also by other researchers. CSA identified the top ten challenges when dealing with Big Data, and various researchers have also identified threats and security challenges associated with it. These challenges and threats include secure computations in distributed programming frameworks, the security of data storage and transaction logs, End-Point input validation and filtering, and Real-time Security Monitoring. The two security issues chosen for this discussion are End-Point input validation and filtering and Real-time Security Monitoring. Various solutions have been proposed to reduce and prevent attacks and threats when dealing with Big Data. However, there is no perfect solution yet, both because of the nature of Big Data and because mobile device security is still an active research area.
References
Al-Kahtani, M. S. (2017). Security and Privacy in Big Data. International Journal of Computer Engineering and Information Technology, 9(2).
Arora, M., & Bahuguna, H. Big Data Security–The Big Challenge.
CSA, & Big-Data-Working-Group. (2013). Expanded Top Ten Big Data Security and Privacy Challenges. Cloud Security Alliance.
Demchenko, Y., Ngo, C., de Laat, C., Membrey, P., & Gordijenko, D. (2013-15). Big security for big data: addressing security challenges for the big data infrastructure. Paper presented at the Workshop on Secure Data Management.
Khan, N., Yaqoob, I., Hashem, I. A. T., Inayat, Z., Mahmoud Ali, W. K., Alam, M., . . . Gani, A. (2014). Big Data: Survey, Technologies, Opportunities, and Challenges. The Scientific World Journal, 2014.
Mehta, T. M. P. M. P. (2017). Security and Privacy–A Big Concern in Big Data A Case Study on Tracking and Monitoring System.
Sahafizadeh, E., & Nematbakhsh, M. A. (2015). A Survey on Security Issues in Big Data and NoSQL. Int’l J. Advances in Computer Science, 4(4), 2322-5157.
Sakr, S., & Gaber, M. (2014). Large Scale and Big Data: Processing and Management. CRC Press.
Yadav, N. (2016). Top Ten Big Data Security and Privacy Challenge. Retrieved from https://www.infosecurity-magazine.com/opinions/big-data-security-privacy/.