Information Technology Requirements in Healthcare

Dr. O. Aly
Computer Science

The purpose of this discussion is to address a sector with unique information technology (IT) requirements. The selected sector is healthcare. The discussion addresses the sector's IT needs based on a case study. It begins with the key role of information technology in business, followed by the healthcare industry case study.

Information Technology Key Role in Business

Information technology (IT) is a critical resource for businesses in the age of Big Data and Big Data Analytics (Dewett & Jones, 2001; Pearlson & Saunders, 2001). IT supports, and consumes, a significant amount of enterprise resources. It must be managed as wisely as other significant business resources such as people, money, and machines, and it must return value to the business. Thus, enterprises must carefully evaluate their resources, including IT, so that they are used efficiently and effectively.

Information systems and technology are now integrated with almost every aspect of every business. IT and IS play significant roles in business because they simplify organizational activities and processes. Enterprises can gain competitive advantages when they utilize appropriate information technology. An inadequate information system can cause a breakdown in providing services to customers or developing products, which can harm sales and eventually the business (Bhatt & Grover, 2005; Brynjolfsson & Hitt, 2000; Pearlson & Saunders, 2001). The same applies when inefficient business processes are sustained by ill-fitting information systems and technology, as they increase costs without any return on investment or value. A lag in implementation or poor process adaptation reduces profits and growth and can place the business behind its competitors. The failure of information systems and technology in business is caused primarily by ignoring them during the planning of the business strategy and organizational strategy. IT that is not considered in the business and organizational strategy will fail to support business goals and organizational systems. When the business strategy is misaligned with the organizational strategy, IT is subject to failure (Pearlson & Saunders, 2001).

IT Support to Business Goals

Enterprises should invest in IT resources that will benefit them. They should invest in systems that support their business goals, including gaining competitive advantages (Bhatt & Grover, 2005). Although IT represents a significant investment, a poorly chosen information system can become an obstacle to achieving business goals (Dewett & Jones, 2001; Henderson & Venkatraman, 1999; Pearlson & Saunders, 2001). When IT does not allow the business to achieve its goals, or lacks the capacity required to collect, store, and transfer critical information, the results can be disastrous, leading to dissatisfied customers or excessive production costs. The Toys R Us store is an excellent example of such an issue (Pearlson & Saunders, 2001). Its well-publicized website was not designed to process and fulfill orders fast enough. The site had to be redesigned at an additional cost that could have been avoided if the IT strategy and business goals had been discussed and aligned together.

IT Support to Organizational Systems

Organizational systems, including people, work processes, and structure, represent the core elements of the business. Enterprises should plan to enable these systems to work together efficiently to achieve business goals (Henderson & Venkatraman, 1999; Pearlson & Saunders, 2001; Ryssel, Ritter, & Georg Gemünden, 2004). When the IT of a business fails to support its organizational systems, the result is a misalignment of the resources needed to achieve business goals. For instance, when organizations decide to use an Enterprise Resource Planning (ERP) system, the system often dictates how business processes are executed. When enterprises deploy a technology, they should think through various aspects such as how the technology will be used in the organization, who will use it, how they will use it, and how to make sure the chosen application accomplishes what is intended. For instance, an organization that plans to institute a wide-scale telecommuting program needs an information system strategy compatible with its organizational strategy (Pearlson & Saunders, 2001). Desktop PCs located within the corporate office are not the right solution for a telecommuting organization. Laptop computers and applications that are accessible online anywhere and anytime are a more appropriate solution. If a business only allows the purchase of desktop PCs and only builds systems accessible from desks within the office, the telecommuting program is subject to failure. Thus, information systems implementation should support the organizational systems and should be aligned with business goals.

Advantages of IT in Business

Businesses are able to transform local operations into international ones with the advent of information systems and the internet (Bhatt & Grover, 2005; Zimmer, 2018). Organizations are under pressure to take advantage of information technology to gain competitive advantages. They are turning to information technology to streamline services and enhance performance. IT has become an essential feature of the business landscape that helps businesses decrease costs, improve communication, develop recognition, and release more innovative and attractive products.

IT streamlines communication, and effective communication is critical to an organization's success (Bhatt & Grover, 2005; Zimmer, 2018). A key advantage of information systems lies in their ability to streamline communication both internally and externally. For instance, online meeting and video conferencing platforms such as Skype and WebEx give businesses the opportunity to collaborate virtually in real time, reducing the costs associated with bringing clients on-site or communicating with staff who work remotely. IT enables enterprises to connect almost effortlessly with international suppliers and consumers.

IT can enhance a business's competitive advantage in the marketplace by facilitating strategic thinking and knowledge transfer (Bhatt & Grover, 2005; Zimmer, 2018). When IT is used as a strategic investment rather than merely a means to an end, it provides businesses with the tools they need to properly evaluate the market and implement the strategies needed for a competitive edge.

IT stores and safeguards information, as information management is another domain of IT (Bhatt & Grover, 2005; Zimmer, 2018). IT is essential to any business that must store and safeguard sensitive information, such as financial data, for long periods. Various security techniques can be applied to ensure the data is stored in a secure place. Organizations should evaluate the storage options available to them, such as a local data center or cloud-based storage.

IT cuts costs and eliminates waste (Bhatt & Grover, 2005; Zimmer, 2018). Although IT implementation is expensive at the outset, in the long run it becomes incredibly cost-effective by streamlining the operational and managerial processes of the business. Thus, investing in the appropriate IT is key for a business to gain a return on investment. For instance, online training programs are a classic example of IT improving internal processes by reducing training costs, the time employees spend away from work, and travel costs. Information technology enables organizations to accomplish more with less investment without sacrificing quality or value.

Healthcare Industry Case Study

The healthcare industry generates extensive data driven by keeping patients' records, complying with regulations and policies, and caring for patients (Raghupathi & Raghupathi, 2014). The current trend is to digitize this explosively growing data in the age of Big Data (BD) and Big Data Analytics (BDA) (Raghupathi & Raghupathi, 2014). BDA has revolutionized healthcare by transforming data into the valuable information and knowledge needed to predict epidemics, cure diseases, improve quality of life, and avoid preventable deaths (Van-Dai, Chuan-Ming, & Nkabinde, 2016). Applications of BDA in healthcare include pervasive health, fraud detection, pharmaceutical discoveries, clinical decision support systems, computer-aided diagnosis, and biomedical applications.

Healthcare Big Data Benefits and Challenges

The healthcare sector employs BDA in various aspects of care, such as detecting diseases at early stages, providing evidence-based medicine, minimizing doses of medication to avoid side effects, and delivering useful medicine based on genetic analysis. The use of BD and BDA can reduce the re-admission rate and thereby reduce healthcare-related costs for patients. Healthcare BDA can use real-time analytics to detect diseases before they spread (Archenaa & Anita, 2015; Raghupathi & Raghupathi, 2014; Wang, Kung, & Byrd, 2018). An example of the application of BDA in the healthcare system is Kaiser Permanente's implementation of HealthConnect to ensure data exchange across all medical facilities and promote the use of electronic health records (Fox & Vaidyanathan, 2016).
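
To make the readmission measure concrete, the following minimal sketch computes a 30-day readmission rate with pandas. The table layout, column names, and values are hypothetical, invented for illustration rather than drawn from the case study.

```python
# Illustrative sketch only: computes a 30-day readmission rate from a
# hypothetical admissions table. Column names and values are assumptions.
import pandas as pd

admissions = pd.DataFrame({
    "patient_id": [1, 1, 2, 3, 3],
    "admit_date": pd.to_datetime(
        ["2018-01-02", "2018-01-20", "2018-02-05", "2018-03-01", "2018-06-15"]),
    "discharge_date": pd.to_datetime(
        ["2018-01-05", "2018-01-25", "2018-02-08", "2018-03-04", "2018-06-18"]),
})

admissions = admissions.sort_values(["patient_id", "admit_date"])
# Days between each discharge and the same patient's next admission.
next_admit = admissions.groupby("patient_id")["admit_date"].shift(-1)
gap_days = (next_admit - admissions["discharge_date"]).dt.days
readmit_30 = gap_days.le(30)  # True when readmitted within 30 days

rate = readmit_30.sum() / len(admissions)
print(f"30-day readmission rate: {rate:.1%}")
```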

Despite the various benefits of BD and BDA in the healthcare sector, various challenges and issues are emerging from their application. The nature of the healthcare industry poses challenges to BDA (Groves, Kayyali, Knott, & Kuiken, 2016). The episodic culture, the data puddles, and IT leadership are the three most significant challenges the healthcare industry faces in applying BDA. The episodic culture refers to the conservative culture of healthcare and the lack of an IT mindset, which together create a rigid culture; few providers have overcome this rigidity and started to use BDA technology. The data puddles reflect the siloed nature of healthcare. Silos are described as one of the most significant flaws in the healthcare sector (Wicklund, 2014), and the industry's failure to use technology properly has left it behind other industries. Each silo uses its own methods to collect data from labs, diagnosis, radiology, emergency, case management, and so forth. IT leadership is another challenge caused by the rigid culture of the healthcare industry: the lack of familiarity with the latest technologies among healthcare IT leadership is a severe problem.

Healthcare Data Sources for Data Analytics

Current healthcare data is collected from clinical and non-clinical sources (InformationBuilders, 2018; Van-Dai et al., 2016; Zia & Khan, 2017). Electronic healthcare records are digital copies of patients' medical histories. They contain a variety of data relevant to patient care, such as demographics, medical problems, medications, body mass index, medical history, laboratory test data, radiology reports, clinical notes, and payment information. These electronic healthcare records are the most critical data in healthcare data analytics because they provide effective and efficient methods for providers and organizations to share data (Botta, de Donato, Persico, & Pescapé, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016; Wang et al., 2018).

Biomedical imaging data plays a crucial role in healthcare, aiding disease monitoring, treatment planning, and prognosis. This data can be used to generate quantitative information and make inferences from images that provide insights into a medical condition. Image analytics is more complicated due to the noise associated with the images, which is one of the significant limitations of biomedical analysis (Ji, Ganchev, O'Droma, Zhang, & Zhang, 2014; Malik & Sangwan, 2015; Van-Dai et al., 2016).

Sensing data is ubiquitous in the medical domain for both real-time and historical data analysis. Sensing data comes from several forms of medical data collection instruments, such as the electrocardiogram (ECG) and electroencephalogram (EEG), which are vital sensors that collect signals from various parts of the human body. Sensing data plays a significant role in intensive care units (ICU) and in real-time remote monitoring of patients with specific conditions such as diabetes or high blood pressure. The real-time and long-term analysis of trends and treatments in remote monitoring programs can help providers monitor the state of those patients (Van-Dai et al., 2016).

Biomedical signals are collected from many sources, such as the heart, blood pressure, oxygen saturation levels, blood glucose, nerve conduction, and brain activity. Examples of biomedical signals include the electroneurogram (ENG), electromyogram (EMG), electrocardiogram (ECG), electroencephalogram (EEG), electrogastrogram (EGG), and phonocardiogram (PCG). Real-time analytics of biomedical signals will provide better management of chronic diseases, earlier detection of adverse events such as heart attacks and strokes, and earlier diagnosis of disease. These signals can be discrete or continuous depending on the kind of care or the severity of a particular pathological condition (Malik & Sangwan, 2015; Van-Dai et al., 2016).

Genomic data analysis helps in better understanding the relationships among genes, mutations, and disease conditions. It has great potential for the development of gene therapies to cure certain conditions. Furthermore, genomic data analytics can assist in translating genetic discoveries into personalized medicine practice (Liang & Kelemen, 2016; Luo, Wu, Gopukumar, & Zhao, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016).

Clinical text data analytics uses data mining to transform information from clinical notes, stored in unstructured format, into useful patterns. The manual coding of clinical notes is costly and time-consuming because of their unstructured nature, heterogeneity, and differing formats and contexts across patients and practitioners. Methods such as natural language processing (NLP) and information retrieval can be used to extract useful knowledge from large volumes of clinical text and to automatically encode clinical information in a timely manner (Ghani, Zheng, Wei, & Friedman, 2014; Sun & Reddy, 2013; Van-Dai et al., 2016).
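
As a hedged illustration of such extraction, the sketch below pulls simple medication-and-dose mentions out of a free-text note with a regular expression. The note text and pattern are assumptions; a production pipeline would add tokenization, negation handling, and coding to a clinical vocabulary such as SNOMED CT.

```python
# Minimal sketch of rule-based information extraction from an unstructured
# clinical note. The pattern and note text are illustrative assumptions.
import re

note = ("Patient reports chest pain. Started metformin 500 mg twice daily. "
        "Continue lisinopril 10 mg daily. Denies shortness of breath.")

# Capture "<drug> <dose> mg" mentions from free text.
medication_pattern = re.compile(r"\b([A-Za-z]+)\s+(\d+)\s*mg\b")

for drug, dose in medication_pattern.findall(note):
    print(f"medication={drug.lower()}, dose_mg={dose}")
```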

Social network healthcare data analytics draws on various social media sources, such as social networking sites (e.g., Facebook, Twitter) and web logs, to discover new patterns and knowledge that can be leveraged to model and predict global health trends such as outbreaks of infectious epidemics (InformationBuilders, 2018; Luo et al., 2016; Van-Dai et al., 2016; Zia & Khan, 2017).

IT Requirements for Healthcare Sector

The basic requirements for the implementation of this proposal include not only the required tools and software but also training at all levels, from staff to nurses, clinicians, and patients. The list of requirements is divided into system requirements, implementation requirements, and training requirements.

Cloud Computing Technology Adoption Requirement

Volume is one of the significant characteristics of BD, especially in the healthcare industry (Manyika et al., 2011). Based on the challenges addressed earlier when dealing with BD and BDA in healthcare, the system requirements cannot be met using a traditional on-premise data center, as it cannot handle the intensive computation requirements of BD or the storage requirements for all the medical information from the hospitals in the four states (Hu, Wen, Chua, & Li, 2014). Thus, the cloud computing environment is found to be the more appropriate solution for the implementation of this proposal. Cloud computing plays a significant role in BDA (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015). The massive computation and storage requirements of BDA create a critical need for the emerging technology of cloud computing (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016). Cloud computing offers various benefits such as cost reduction, elasticity, pay-per-use, availability, reliability, and maintainability (Gupta, Gupta, & Mohania, 2012; Kritikos, Kirkham, Kryza, & Massonet, 2017). However, it also has security and privacy issues under the standard deployment models of public, private, hybrid, and community cloud. Thus, one of the major requirements is to adopt the Virtual Private Cloud (VPC), as it has been regarded as the most prominent approach to trusted computing technology (Abdul, Jena, Prasad, & Balraju, 2014).
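
For illustration, a minimal sketch of provisioning such an isolated VPC with the AWS SDK for Python (boto3) follows. The region, CIDR ranges, and the idea of one subnet per hospital site are assumptions for this sketch, not details from the proposal.

```python
# Hedged sketch of provisioning an isolated Virtual Private Cloud with
# boto3. CIDR ranges and region are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc = ec2.create_vpc(CidrBlock="10.0.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# A private subnet per hospital site could be carved out of the VPC range.
subnet = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24")
print(vpc_id, subnet["Subnet"]["SubnetId"])
```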

Security Requirement

Cloud computing has been facing various threats (Cloud Security Alliance, 2013, 2016, 2017). Records show that over the three years from 2015 to 2017, the number of breaches, lost medical records, and fine settlements was staggering (Thompson, 2017). The Office for Civil Rights (OCR) issued 22 resolution agreements requiring monetary settlements approaching $36 million (Thompson, 2017). Table 1 shows the data categories and the total for each year.

Table 1.  Approximation of Records Lost by Category Disclosed on HHS.gov (Thompson, 2017)

Furthermore, a recent report published by the HIPAA Journal showed that in the first three months of 2018, 77 healthcare data breaches were reported to the OCR (HIPAA, 2018d). In the second quarter of 2018, at least 3.14 million healthcare records were exposed (HIPAA, 2018a), and in the third quarter, 4.39 million records were exposed in 117 breaches (HIPAA, 2018c).

Thus, the protection of patients' private information requires technology to extract, analyze, and correlate potentially sensitive datasets (HIPAA, 2018b). The implementation of BDA requires security measures and safeguards to protect patients' privacy in the healthcare industry (HIPAA, 2018b). Sensitive data should be encrypted to prevent its exposure in the event of theft (Abernathy & McMillan, 2016). The security requirements involve security at the VPC cloud deployment model as well as at the local hospitals in each state (Regola & Chawla, 2013). Security at the VPC level should involve the implementation of security groups and network access control lists so that the right individuals have access to the right applications and patient records. A security group in a VPC acts as the first line of defense, a firewall for the associated instances of the VPC (McKelvey, Curran, Gordon, Devlin, & Johnston, 2015). Network access control lists act as the second layer of defense, a firewall for the associated subnets that controls inbound and outbound traffic at the subnet level (McKelvey et al., 2015).
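
The sketch below illustrates these two firewall layers with boto3, assuming a hypothetical VPC ID, port, and address range.

```python
# Hedged sketch of the two firewall layers described above, using boto3.
# Group names, ports, and CIDR blocks are illustrative assumptions.
import boto3

ec2 = boto3.client("ec2")
vpc_id = "vpc-0abc12345"  # hypothetical VPC from the earlier step

# First line of defense: an instance-level security group allowing only
# HTTPS from the hospitals' address range.
sg = ec2.create_security_group(
    GroupName="ehr-app-sg", Description="EHR application access",
    VpcId=vpc_id)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{"IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
                    "IpRanges": [{"CidrIp": "203.0.113.0/24"}]}])

# Second line of defense: a subnet-level network ACL rule for inbound HTTPS.
nacl = ec2.create_network_acl(VpcId=vpc_id)
ec2.create_network_acl_entry(
    NetworkAclId=nacl["NetworkAcl"]["NetworkAclId"],
    RuleNumber=100, Protocol="6", RuleAction="allow", Egress=False,
    CidrBlock="203.0.113.0/24", PortRange={"From": 443, "To": 443})
```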

Security at the local hospital level in each state is mandatory to protect patients' records and comply with HIPAA regulations (Regola & Chawla, 2013). Medical equipment must be secured with authentication and authorization techniques so that only medical staff, nurses, and clinicians have access to medical devices based on their roles. General access should be prohibited, as every member of the hospital has a different role with different responsibilities. Encryption should be used to hide the meaning or intent of communication from unintended users (Stewart, Chapple, & Gibson, 2015). Encryption is an essential security control, especially for data in transit (Stewart et al., 2015). The hospitals in all four states should implement the same types of encryption controls, such as PKI, cryptographic applications, and symmetric-key algorithms (Stewart et al., 2015).
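
As a minimal sketch of symmetric encryption applied to a record field, the example below uses the Fernet recipe from the Python cryptography package. Key management, on which the HIPAA safeguards ultimately depend, is deliberately out of scope here.

```python
# Minimal sketch of symmetric encryption for a patient record field using
# the `cryptography` package's Fernet recipe (AES-based).
from cryptography.fernet import Fernet

key = Fernet.generate_key()        # in practice, kept in a key vault/HSM
cipher = Fernet(key)

record = b"patient_id=1234;diagnosis=E11.9"
token = cipher.encrypt(record)     # safe to store or transmit
assert cipher.decrypt(token) == record
```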

The system requirements should also include identity management systems that can correspond with the hospitals in each state. An identity management system provides authentication and authorization techniques, allowing access to patients' medical records only to those who should have it. The proposal requires the implementation of encryption protocols such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Internet Protocol Security (IPSec) to protect information transferred over public networks (Zhang & Liu, 2010).
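
A brief sketch of protecting data in transit with TLS, using Python's standard ssl module against a placeholder hostname, might look like this:

```python
# Hedged sketch of TLS for data in transit using the standard library.
# The hostname is a placeholder, not a real endpoint.
import socket
import ssl

context = ssl.create_default_context()  # validates certificates by default

with socket.create_connection(("ehr.example.org", 443)) as raw_sock:
    with context.wrap_socket(raw_sock,
                             server_hostname="ehr.example.org") as tls:
        print("negotiated protocol:", tls.version())  # e.g., 'TLSv1.2'
```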

Hadoop Implementation for Data Stream Processing Requirement

While the velocity of BD refers to the speed at which large volumes of data are generated and requires speed in data processing (Hu et al., 2014), the variety of the data requires specific technology capabilities to handle various types of datasets, such as structured, semi-structured, and unstructured data (Bansal, Deshpande, Ghare, Dhikale, & Bodkhe, 2014; Hu et al., 2014). The Hadoop ecosystem is found to be the most appropriate system for implementing BDA (Bansal et al., 2014; Dhotre, Shimpi, Suryawanshi, & Sanghati, 2015). The implementation requirements include various technologies and tools. This section covers the components required when implementing Hadoop technology in the four states for the healthcare BDA system.

Hadoop has three significant limitations, which must be addressed in this design. The first limitation is the lack of technical support and documentation for open-source Hadoop (Guo, 2013). Thus, this design requires an Enterprise Edition of Hadoop, from Cloudera, Hortonworks, or MapR, to get around this limitation (Guo, 2013); the final product decision will be determined by the cost analysis team. The second limitation is that Hadoop is not optimal for real-time data processing (Guo, 2013). The solution for this limitation requires the integration of a real-time streaming framework such as Spark, Storm, or Kafka (Guo, 2013; Palanisamy & Thirunavukarasu, 2017). The requirement to integrate Spark is discussed below in a separate requirement for this design (Guo, 2013). The third limitation is that Hadoop is not a good fit for large graph datasets (Guo, 2013). The solution for this limitation requires the integration of GraphLab, which is also discussed below in a separate requirement for this design.

Conclusion

Information technology (IT) plays a significant role in various industries, including the healthcare sector. This project discussed the role of IT in businesses and the requirement that it be aligned with the strategic goals and organizational systems of the business. If IT systems are not included during the planning of the business strategy and organizational strategy, IT integration into the business at a later stage is very likely to be set up for failure. IT offers various advantages to business, including competitive advantages in the marketplace. The healthcare industry is no exception in integrating IT systems. The healthcare sector has been suffering from various challenges, including the high cost of services and inefficient service to patients. The case study showed the need for IT system requirements that can give the industry competitive advantages by offering better care to patients at lower cost. Various IT integrations have been used lately in the healthcare industry, including Big Data Analytics, Hadoop technology, security systems, and cloud computing. Kaiser Permanente, for instance, applied Big Data Analytics using HealthConnect to provide better care to patients at lower cost, aligned with the strategic goals of its business.

References

Abdul, A. M., Jena, S., Prasad, S. D., & Balraju, M. (2014). Trusted Environment In Virtual Cloud. International Journal of Advanced Research in Computer Science, 5(4).

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

Archenaa, J., & Anita, E. M. (2015). A survey of big data analytics in healthcare and government. Procedia Computer Science, 50, 408-413.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Bansal, A., Deshpande, A., Ghare, P., Dhikale, S., & Bodkhe, B. (2014). Healthcare data analysis using dynamic slot allocation in Hadoop. International Journal of Recent Technology and Engineering, 3(5), 15-18.

Bhatt, G. D., & Grover, V. (2005). Types of information technology capabilities and their role in competitive advantage: An empirical study. Journal of management information systems, 22(2), 253-277.

Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud Computing and Internet Of Things: a Survey. Future Generation computer systems, 56, 684-700.

Brynjolfsson, E., & Hitt, L. M. (2000). Beyond computation: Information technology, organizational transformation and business performance. Journal of Economic perspectives, 14(4), 23-48.

Cloud Security Alliance. (2013). The Notorious Nine: Cloud Computing Top Threats in 2013. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2016). The Treacherous 12: Cloud Computing Top Threats in 2016. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2017). The Treacherous 12 Top Threats to Cloud Computing. Cloud Security Alliance: Top Threats Working Group. 

Dewett, T., & Jones, G. R. (2001). The role of information technology in the organization: a review, model, and assessment. Journal of Management, 27(3), 313-346.

Dhotre, P., Shimpi, S., Suryawanshi, P., & Sanghati, M. (2015). Health Care Analysis Using Hadoop. International Journal of Scientific & Technology Research, 4(12), 279-281.

Fox, M., & Vaidyanathan, G. (2016). Impacts of Healthcare Big Data: A Framework With Legal and Ethical Insights. Issues in Information Systems, 17(3).

Ghani, K. R., Zheng, K., Wei, J. T., & Friedman, C. P. (2014). Harnessing big data for health care and research: are urologists ready? European urology, 66(6), 975-977.

Groves, P., Kayyali, B., Knott, D., & Kuiken, S. V. (2016). The 'Big Data' Revolution in Healthcare: Accelerating Value and Innovation. McKinsey & Company.

Guo, S. (2013). Hadoop operations and cluster management cookbook: Packt Publishing Ltd.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud Computing and Big Data Analytics: What is New From Databases Perspective? Paper presented at the International Conference on Big Data Analytics, Springer-Verlag Berlin Heidelberg.

Henderson, J. C., & Venkatraman, H. (1999). Strategic alignment: Leveraging information technology for transforming organizations. IBM systems journal, 38(2.3), 472-484.

HIPAA. (2018a). At Least 3.14 Million Healthcare Records Were Exposed in Q2, 2018. Retrieved 11/22/2018 from https://www.hipaajournal.com/q2-2018-healthcare-data-breach-report/. 

HIPAA. (2018b). How to Defend Against Insider Threats in Healthcare. Retrieved 8/22/2018 from https://www.hipaajournal.com/category/healthcare-cybersecurity/. 

HIPAA. (2018c). Q3 Healthcare Data Breach Report: 4.39 Million Records Exposed in 117 Breaches. Retrieved 11/22/2018 from https://www.hipaajournal.com/q3-healthcare-data-breach-report-4-39-million-records-exposed-in-117-breaches/. 

HIPAA. (2018d). Report: Healthcare Data Breaches in Q1, 2018. Retrieved 5/15/2018 from https://www.hipaajournal.com/report-healthcare-data-breaches-in-q1-2018/. 

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. IEEE Access, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

InformationBuilders. (2018). Data In Motion – Big Data Analytics in Healthcare [White paper]. Retrieved from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/datainmotionbigdataanalytics.pdf

Ji, Z., Ganchev, I., O’Droma, M., Zhang, X., & Zhang, X. (2014). A cloud-based X73 ubiquitous mobile healthcare system: design and implementation. The Scientific World Journal, 2014.

Kritikos, K., Kirkham, T., Kryza, B., & Massonet, P. (2017). Towards a Security-Enhanced PaaS Platform for Multi-Cloud Applications. Future Generation computer systems, 67, 206-226. doi:10.1016/j.future.2016.10.008

Liang, Y., & Kelemen, A. (2016). Big Data Science and its Applications in Health and Medical Research: Challenges and Opportunities. Austin Journal of Biometrics & Biostatistics, 7(3).

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: a literature review. Biomedical Informatics Insights, 8, BII.S31559.

Malik, L., & Sangwan, S. (2015). MapReduce Framework Implementation on the Prescriptive Analytics of Health Industry. International Journal of Computer Science and Mobile Computing, 675-688.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

McKelvey, N., Curran, K., Gordon, B., Devlin, E., & Johnston, K. (2015). Cloud Computing and Security in the Future. In Guide to Security Assurance for Cloud Computing (pp. 95-108): Springer.

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. IEEE Access, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Palanisamy, V., & Thirunavukarasu, R. (2017). Implications of Big Data Analytics in developing Healthcare Frameworks–A review. Journal of King Saud University-Computer and Information Sciences.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1.

Regola, N., & Chawla, N. (2013). Storing and Using Health Data in a Virtual Private Cloud. Journal of medical Internet research, 15(3), 1-12. doi:10.2196/jmir.2076

Ryssel, R., Ritter, T., & Georg Gemünden, H. (2004). The impact of information technology deployment on trust, commitment and value creation in business relationships. Journal of business & industrial marketing, 19(3), 197-207.

Stewart, J., Chapple, M., & Gibson, D. (2015). CISSP (ISC)² Certified Information Systems Security Professional Official Study Guide (7th ed.): Wiley.

Sun, J., & Reddy, C. (2013). Big Data Analytics for Healthcare. Retrieved from https://www.siam.org/meetings/sdm13/sun.pdf.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program, Using NIST 800-30 and CSF to Secure Protected Health Information.

Van-Dai, T., Chuan-Ming, L., & Nkabinde, G. W. (2016, July). Big data stream computing in healthcare real-time analytics. Paper presented at the 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Wicklund, E. (2014). ‘Silo’ one of healthcare’s biggest flaws. Retrieved from http://www.healthcareitnews.com/news/silo-one-healthcares-biggest-flaws.

Zhang, R., & Liu, L. (2010). Security models and requirements for healthcare application clouds. Paper presented at the 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD).

Zia, U. A., & Khan, N. (2017). An Analysis of Big Data Approaches in Healthcare Sector. International Journal of Technical Research & Science, 2(4), 254-264.

Zimmer, T. (2018). What Are the Advantages of Information Technology in Business?

Critical Information Technology Solutions Used to Gain Competitive Advantages

Dr. O. Aly
Computer Science

Abstract

The purpose of this project is to discuss critical information technology solutions used to gain competitive advantages. The discussion begins with Big Data and Big Data Analytics, addressing essential topics such as the Hadoop ecosystem, NoSQL databases, Spark integration for real-time data processing, and Big Data visualization. Cloud computing is an emerging technology for solving Big Data challenges such as storage for large volumes of data and the high-speed processing needed to extract value from data. Enterprise Resource Planning (ERP) is a system that can help organizations gain competitive advantages if implemented correctly. The project discusses various success factors for ERP systems. Big Data plays a significant role in ERP, which is also discussed in this project. The last technology addressed is Customer Relationship Management (CRM), its building blocks, and its integration. The project addresses the challenges and costs associated with CRM, along with the best practices that can assist in its successful implementation. In summary, enterprises should evaluate the various information technology systems that have been developed to help them gain competitive advantages.

Keywords: Big Data Analytics; Cloud Computing; ERP; CRM.

Introduction

Enterprises should evaluate various information technologies to gain competitive advantages in the market. Big Data and Big Data Analytics are among the most significant topics in information technology and computer science. Cloud computing is another critical topic in the same domains, as it emerged to solve the challenges of Big Data. Thus, this project begins with these top information technologies. The discussion covers major Big Data topics such as the Hadoop ecosystem and Spark for real-time processing. The discussion of cloud computing covers the service models and deployment models that cloud computing offers.

The most common business areas that require information technology support include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Product Life Cycle Management (PLM), Supply Chain Management (SCM), and Supplier Relationship Management (SRM) (DuttaRoy, 2016). Thus, this project discusses ERP and CRM as additional critical information technology systems that help enterprises gain competitive advantages.

Big Data and Big Data Analytics

Big Data is now a buzzword in the fields of computer science and information technology. Big Data has attracted the attention of various sectors, researchers, academia, government, and even the media (Géczy, 2014; Kaisler, Armour, Espinosa, & Money, 2013). In its 2011 report, the International Data Corporation (IDC) estimated that the amount of information created and replicated in 2011 would exceed 1.8 zettabytes (1.8 trillion gigabytes), growing by a factor of nine in just five years (Gantz & Reinsel, 2011).

BD and BDA are terms that have been used interchangeably and described as the next frontier for innovation, competition, and productivity (Maltby, 2011; Manyika et al., 2011). BD has a multi-V model with unique characteristics: volume, referring to large datasets; velocity, referring to the speed of computation as well as data generation; and variety, referring to the various data types such as semi-structured and unstructured data (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015; Hu, Wen, Chua, & Li, 2014). Various industries have taken this opportunity and applied BD and BDA in their business models (Manyika et al., 2011). Many technologies, such as cloud computing and Hadoop MapReduce and Hive, have emerged to deal with the Big Data phenomenon. Data without analysis has no value to organizations.

Hadoop Ecosystem

While the velocity of BD refers to the speed at which large volumes of data are generated and requires speed in data processing (Hu et al., 2014), the variety of the data requires specific technology capabilities to handle various types of datasets, such as structured, semi-structured, and unstructured data (Bansal, Deshpande, Ghare, Dhikale, & Bodkhe, 2014; Hu et al., 2014). The Hadoop ecosystem is found to be the most appropriate system for implementing BDA (Bansal et al., 2014; Dhotre, Shimpi, Suryawanshi, & Sanghati, 2015), and Hadoop technologies have been front-runners for Big Data applications (Bansal et al., 2014; Chrimes, Zamani, Moa, & Kuo, 2018). The Hadoop ecosystem will be part of the implementation requirement, as it has proven to serve intensive computation on large datasets well (Raghupathi & Raghupathi, 2014; Wang, Kung, & Byrd, 2018). The required Hadoop version is 2.x, which includes YARN for resource management (Karanth, 2014). Hadoop 2.x also includes HDFS snapshots, which provide a read-only image of an entire filesystem or a particular subset of it to protect against user errors and to support backup and disaster recovery (Karanth, 2014). The Hadoop platform can be implemented to gain more insight into various areas (Raghupathi & Raghupathi, 2014; Wang et al., 2018). The Hadoop ecosystem involves the Hadoop Distributed File System (HDFS), MapReduce, NoSQL storage such as HBase, and Hive for querying, handling large volumes of data with various algorithms and machine learning to extract value from medical records that are structured, semi-structured, and unstructured (Raghupathi & Raghupathi, 2014; Wang et al., 2018). Other components that support the Hadoop ecosystem include Oozie for workflow, Pig for scripting, and Mahout for machine learning, which is part of artificial intelligence (AI) (Ankam, 2016; Karanth, 2014). The ecosystem includes other tools such as Flume for log collection, Sqoop for data exchange, and ZooKeeper for coordination (Ankam, 2016; Karanth, 2014). HCatalog is a required component to manage metadata in Hadoop (Ankam, 2016; Karanth, 2014). Figure 1 shows the Hadoop ecosystem before integrating Spark for real-time analytics; a minimal MapReduce sketch follows the figure.


Figure 1.  Hadoop Architecture Overview (Alguliyev & Imamverdiyev, 2014).
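
To ground the MapReduce component named above, here is a hedged Hadoop Streaming sketch in Python that counts diagnosis codes across claim records. The input layout (CSV with the diagnosis code in the third column) and the HDFS paths are assumptions for illustration; in practice the two functions would live in separate mapper and reducer scripts.

```python
# Hedged Hadoop Streaming sketch: count occurrences of each diagnosis code.
# Submitted roughly as:
#   hadoop jar $HADOOP_HOME/share/hadoop/tools/lib/hadoop-streaming-*.jar \
#       -mapper mapper.py -reducer reducer.py \
#       -input /ehr/claims -output /ehr/dx_counts
import sys

def mapper(stdin=sys.stdin):
    # Emit one (diagnosis_code, 1) pair per input record.
    for line in stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) >= 3:
            print(f"{fields[2]}\t1")

def reducer(stdin=sys.stdin):
    # Hadoop Streaming delivers keys sorted, so counts can be summed per key.
    current_key, total = None, 0
    for line in stdin:
        key, value = line.rstrip("\n").split("\t")
        if key != current_key and current_key is not None:
            print(f"{current_key}\t{total}")
            total = 0
        current_key = key
        total += int(value)
    if current_key is not None:
        print(f"{current_key}\t{total}")
```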

NoSQL Databases

In the age of BD and BDA, traditional data stores have been found inadequate to handle not only the large volume of data but also the various data formats, such as unstructured and semi-structured (Hu et al., 2014). Thus, Not Only SQL (NoSQL) databases emerged to meet the requirements of BDA. These NoSQL data stores are used for modern, scalable databases (Sahafizadeh & Nematbakhsh, 2015). The scalability of NoSQL data stores enables systems to increase throughput when demand increases during data processing (Sahafizadeh & Nematbakhsh, 2015). A platform can incorporate two scalability types to support large volumes of data: horizontal and vertical scalability. Horizontal scaling distributes the workload across many servers and nodes to increase throughput, while vertical scaling requires more processors, more memory, and faster hardware installed on a single server (Sahafizadeh & Nematbakhsh, 2015).

NoSQL data stores come in many varieties, such as MongoDB, CouchDB, Redis, Voldemort, Cassandra, Bigtable, Riak, HBase, Hypertable, ZooKeeper, Vertica, Neo4j, db4o, and DynamoDB. These data stores are categorized into four types: document-oriented, column-oriented (or column-family) stores, graph databases, and key-value stores (EMC, 2015; Hashem et al., 2015). A document-oriented data store can store and retrieve collections of data and documents using complex data forms in various formats such as XML and JSON, as well as PDF and MS Word (EMC, 2015; Hashem et al., 2015). MongoDB and CouchDB are examples of document-oriented data stores (EMC, 2015; Hashem et al., 2015). A column-oriented data store stores content in columns rather than rows, with the attributes of the columns stored contiguously (Hashem et al., 2015). This type of data store can store and render blog entries, tags, and feedback (Hashem et al., 2015). Cassandra, DynamoDB, and HBase are examples of column-oriented data stores (EMC, 2015; Hashem et al., 2015). A key-value store can store and scale large volumes of data and contains a value and a key to access that value (EMC, 2015; Hashem et al., 2015). The value can be complicated, but this type of data store can be useful for storing a user's login ID as the key referencing patient values. Redis and Riak are examples of key-value NoSQL data stores (Alexandru, Alexandru, Coardos, & Tudora, 2016). Each of these NoSQL data stores has its limitations and advantages. A graph NoSQL database can store and represent data using graph models with nodes, edges, and properties related to one another through relations, which is useful for unstructured medical data such as images and lab results; Neo4j is an example of this type (Hashem et al., 2015). Figure 2 summarizes these NoSQL data store types, the data they store, and examples; a document-store sketch follows the figure.

Figure 2.  Big Data Analytics NoSQL Data Store Types.
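
As a concrete taste of the document-oriented category, the sketch below stores and retrieves a semi-structured patient record with MongoDB via pymongo. The server address, database name, and field names are illustrative assumptions.

```python
# Hedged sketch of a document-oriented store (MongoDB via pymongo),
# matching the document category in Figure 2.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
records = client.hospital_db.ehr_records

# Documents tolerate heterogeneous, semi-structured content per patient.
records.insert_one({
    "patient_id": 1234,
    "notes": "Follow-up in 2 weeks.",
    "labs": [{"test": "HbA1c", "value": 7.1}],
})
print(records.find_one({"patient_id": 1234}))
```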

Spark Integration for Real-Time Data Processing

While the Hadoop ecosystem architecture has been designed for various scenarios of data storage, data management, statistical analysis, statistical association between data sources, distributed computing, and batch processing, businesses require real-time data processing to gain competitive advantages. However, real-time data processing cannot be met by Hadoop alone (Basu, 2014). Real-time analytics will bring tremendous value to the proposed healthcare system. Thus, Apache Spark is another component required for real-time data processing. Spark allows in-memory processing for fast response times, bypassing MapReduce operations (Basu, 2014). With Spark integrated with Hadoop, stream processing, machine learning, interactive analytics, and data integration become possible (Scott, 2015). Spark runs on top of Hadoop to benefit from YARN and the underlying storage of HDFS, HBase, and other Hadoop ecosystem building blocks (Scott, 2015). Figure 3 shows the core engines of Spark; a streaming sketch follows the figure.


Figure 3. Spark Core Engines (Scott, 2015).
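
The following hedged sketch shows Spark Structured Streaming in this role, counting vital-sign events arriving on a socket source. The host, port, and the assumed line format "patient_id,signal_type,value" are illustrative, not details from the design.

```python
# Hedged sketch of Spark Structured Streaming on top of the Hadoop stack:
# counts events per vital-sign type from a socket source.
from pyspark.sql import SparkSession
from pyspark.sql.functions import split, col

spark = SparkSession.builder.appName("VitalsStream").getOrCreate()

lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

# Assumed line format: "patient_id,signal_type,value".
parsed = lines.select(
    split(col("value"), ",").getItem(1).alias("signal_type"))

counts = parsed.groupBy("signal_type").count()

query = (counts.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```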

Big Data Visualization

Visualization is one of the most powerful ways to present data (Jayasingh, Patra, & Mahesh, 2016). It helps in viewing the data in a more meaningful way, in the form of graphs, images, and pie charts that can be understood easily. It helps in synthesizing large volumes of data, such as healthcare data, to get at the core of raw big data and convey its key points for insight (Meyer, 2018). Commercial visualization tools include Tableau, Spotfire, QlikView, and Adobe Illustrator; the tools most commonly used in healthcare include Tableau, PowerBI, and QlikView.
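
For a self-contained illustration, the sketch below draws a simple bar chart with matplotlib, standing in for the commercial tools named above. The department names and counts are hypothetical values invented for the example, not real data.

```python
# Minimal visualization sketch; the values are illustrative only.
import matplotlib.pyplot as plt

departments = ["Cardiology", "Oncology", "Emergency", "Radiology"]
admissions = [120, 85, 240, 60]  # hypothetical monthly counts

plt.bar(departments, admissions)
plt.ylabel("Admissions per month")
plt.title("Hypothetical admissions by department")
plt.tight_layout()
plt.show()
```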

Cloud Computing Technology

Numerous studies have discussed the definition of cloud computing, as it was not initially well defined (Foster, Zhao, Raicu, & Lu, 2008). In an effort to define the term precisely, IT practitioners and the academic and research communities came up with various definitions; Vaquero, Rodero-Merino, Caceres, and Lindner (2008) collected twenty-two definitions of cloud computing from different research studies. The underlying concepts of cloud computing rely heavily on providing computing power, storage services, software services, and platform services on demand to customers over the internet (Lewis, 2010). Access to cloud computing services can scale up or down as needed, and consumers use a pay-per-use or pay-as-you-go model (Armbrust et al., 2009; Lewis, 2010).

The National Institute of Standards and Technology (NIST) proposed an official definition of cloud computing: cloud computing enables ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources such as networks, servers, storage, applications, and services. Organizations can quickly provision and release these resources with minimal management effort or service provider interaction (Mell & Grance, 2011).

Cloud Computing Essential Characteristics

The essential characteristics of cloud computing identified by NIST include on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service (Mell & Grance, 2011). On-demand self-service provides cloud consumers with computing capabilities such as server time and network storage as needed, automatically, eliminating the need for human interaction with the service provider. Broad network access provides capabilities over the network through various devices such as mobile phones and tablets, from anywhere, enabling heterogeneous client platforms. Resource pooling provides a multi-tenant model that serves multiple consumers who share the pool of resources. This feature provides location independence, where consumers do not know the exact location of the provided resources but may be able to specify the location at a higher level of abstraction such as country, state, or datacenter (Mell & Grance, 2011). Rapid elasticity provides capabilities to scale horizontally and vertically to meet demand. Measured service enables the measurement of resource consumption such as processing, storage, and bandwidth; resource utilization can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized services (Mell & Grance, 2011).

Cloud Computing Three Essential Service Models

Cloud computing offers three essential service models: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) (Mell & Grance, 2011). The IaaS layer provides consumers the capability to provision storage, processing, networks, and other fundamental computing resources. Using IaaS, the consumer can deploy and run arbitrary software, including operating systems and applications. IaaS users do not manage or control the underlying cloud infrastructure, but they have control over storage, operating systems, and deployed applications, with limited control over some networking components such as host firewalls. PaaS allows consumers to deploy applications created using programming languages, libraries, services, and tools supported by the provider. PaaS consumers do not manage or control the underlying cloud infrastructure, including the network, servers, operating systems, or storage, but they have control over the deployed applications and possibly the configuration settings for the application-hosting environment. SaaS allows consumers to use the provider's applications running on the cloud infrastructure, accessed from various client devices through either a thin client interface, such as web-based email in a web browser, or a program interface. SaaS consumers do not control or manage the underlying cloud infrastructure such as the network, operating systems, or storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings (Mell & Grance, 2011).

Cloud Computing Four Essential Deployment Models

Cloud computing offers four essential deployment models: public cloud, private cloud, community cloud, and hybrid cloud (Mell & Grance, 2011). The public cloud is cloud infrastructure available to the general public. It can be managed, owned, and operated by businesses, academic entities, government entities, or a combination of them, and it resides on the premises of the cloud provider. The private cloud is cloud infrastructure designed exclusively for a single organization. It can be managed, owned, and operated by the organization, a third party, or a combination of both, and it may reside on-premises or off-premises. The community cloud is cloud infrastructure designed exclusively for a specific community of consumers from organizations that share concerns such as security requirements, compliance considerations, and policy. One or more of the organizations in the community, a third party, or some combination of them can manage, own, and operate the community cloud, which can reside on-premises or off-premises. The hybrid cloud combines two or more cloud infrastructures such as private, public, or community (Mell & Grance, 2011). Figure 4 presents the full representation of cloud computing technology per NIST, including the standard service models, deployment models, and essential characteristics.

Figure 4.  Overview of Cloud Computing based on NIST’s Definitions.

Cloud Computing Role in Big Data and Big Data Analytics

Cloud computing plays a significant role in BDA (Assunção et al., 2015). The massive computation and storage requirements of BDA create a critical need for the emerging technology of cloud computing (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016). Cloud computing offers various benefits such as cost reduction, elasticity, pay-per-use, availability, reliability, and maintainability (Gupta, Gupta, & Mohania, 2012; Kritikos, Kirkham, Kryza, & Massonet, 2017). However, although cloud computing offers various benefits, it has security and privacy issues under the standard deployment models of public, private, hybrid, and community cloud.

Enterprise Resource Planning (ERP)

The American Production and Inventory Control Society (2001), as cited in Madanhire and Mbohwa (2016), defined ERP as a method for the effective planning and control of all the resources needed to take, make, ship, and account for customer orders in a manufacturing, distribution, or service organization. This integration of functions can be achieved through a software package offered by vendors to support the seamless flow of all information through the enterprise, such as financial, accounting, and human resources data. ERP is business management software designed to integrate the data sources and processes of an entire organization into a unified system (Bahssas, AlBar, & Hoque, 2015).

An ERP system is a popular solution used by organizations to integrate and automate processes, improve performance, and reduce costs. ERP provides a business with a real-time view of its core business processes such as production, planning, manufacturing, inventory management, and development (Bahssas et al., 2015). ERP software is a multi-module application that integrates activities across functional departments such as production planning, purchasing, inventory control, product distribution, and order tracking. It allows the automation and integration of business processes by enabling data and information sharing, helping the business reach best practices in managing its processes.

ERP involves various modules such as accounting, finance, supply chain, human resources, customer information, and others (Bahssas et al., 2015; Madanhire & Mbohwa, 2016). The production planning module is used to optimize the utilization of manufacturing capacity, parts, components, and material resources. The purchasing module streamlines the procurement of required raw materials, automating the processes of identifying potential suppliers, negotiating prices, placing orders, and related billing. The inventory control module facilitates maintaining an appropriate level of stock in the warehouse by identifying inventory requirements, setting targets, providing replenishment techniques and options, monitoring item usage, reconciling inventory balances, and reporting inventory status. The sales module is used for order placement, order scheduling, shipping, and invoicing. The marketing module supports lead generation and direct mailing campaigns. The financial module gathers financial data from various departments and generates reports such as the balance sheet, general ledger, and trial balance. The human resources (HR) module maintains a complete employee database including contact information, salary details, attendance, and so forth (Madanhire & Mbohwa, 2016).

Innovations in technology trends have forced ERP designers to establish new developments. Thus, new ERP system designs are implemented to satisfy organizations and customers by evolving new ERP business models. Furthermore, one of the biggest challenges for ERP is to keep pace with the manufacturing sector, which has been moving rapidly from a product-centric to a customer-centric focus (Bahssas et al., 2015). Most ERP vendors have been required to add a variety of functions and modules to their core systems.

Critical Factors for Successful ERP Implementation

The implementation of ERP systems is costly, and organizations should be careful when implementing them to ensure success. Some believe that ERP systems could hurt their business because of ERP's potential problems (Umble, Haft, & Umble, 2003). Various studies have identified success factors for ERP; Umble et al. (2003) addressed the most prominent factors for successful implementation. The first critical success factor is that organizations should have a clear understanding of their strategic goals. Commitment by top management is another success factor. Successful ERP implementation also requires excellent project management. The existing organizational structures and processes found in most enterprises are not compatible with the structure, tools, and types of information provided by ERP systems; thus, organizational change management is required to ensure successful implementation. ERP implementation teams should be composed of highly skilled professionals chosen for their skills, past accomplishments, reputation, and flexibility. Data accuracy is another success factor, as are education and training. Bahssas et al. (2015) indicated that reserving 10-15% of the total ERP implementation budget for training gives an organization an 80% chance of successful implementation. Finally, focused performance measures must be included from the beginning of the implementation, because if the system is not associated with compensation, it will not be successful.

Big Data and Big Data Analytics Role in ERP

Big Data Analytics plays a significant role in ERP applications (Carlton, 2014; ERP Solutions, 2018; Woodie, 2016). Enterprise data spans various departments such as HR, finance, CRM, and other essential business functions, and this data can be leveraged to improve ERP functionality. When Big Data tools are brought together with an ERP system, they can unfold valuable insights that help businesses make smarter decisions (Carlton, 2014; Cornell University, 2017; Wailgum, 2018). Many ERP systems fail to make use of real-time inventory and supply chain data because they lack the intelligence to make predictions about product demand (Carlton, 2014; ERP Solutions, 2018). Big Data tools can predict demand and help determine what the company needs going forward (ERP Solutions, 2018). Infor co-president Duncan Angove established Dynamic Science Labs (DSL), aiming to use data science techniques to solve a particular class of business problems for its customers; employees with big data, math, and coding skills were hired at the Cambridge, Massachusetts-based organization to develop proofs of concept (POC) (Woodie, 2016). Big Data systems such as Apache Hadoop are creating node-level operating transparency that affects nearly every current ERP module in real time (Carlton, 2014). Managers will be able to quickly leverage ERP Big Data capabilities, thereby enhancing information density and speeding up overall decision-making. In brief, Big Data and Big Data Analytics affect business at all levels, and ERP is no exception.

Customer Relationship Management (CRM)

Customer Relationship Management (CRM) systems assist organizations in managing customer interactions and customer data, automating marketing, sales, and customer support, assessing business information, and managing partner, vendor, and employee relationships.  A quality CRM system can be scaled to serve the needs of small, medium, or large businesses (Financesonline, 2018).  CRM systems can be customized to allow a business to derive actionable customer insights using back-end analytics, identify opportunities with predictive analytics, personalize customer support, and streamline operations based on the history of the customers' interactions with the business.  Organizations must be aware of the CRM software available to select the most appropriate CRM system that can best serve their needs.

Various reports identified leading CRM systems.  The best CRM systems include Salesforce CRM, Hubspot CRM, Freshsales, Pipedrive, Insightly, Zoho CRM, Nimble, PipelineDeals, Nutshell CRM, Microsoft Dynamics CRM, SalesforceIQ, Spiro, and ExxpertApps.  Table 1 shows the best CRM systems available in the market.


Table 1.  CRM Systems  (Financesonline, 2018).

Customer satisfaction is a critical element in the success of the business (Bygstad, 2003; Pearlson & Saunders, 2001).  Businesses need to continuously satisfy customers, understand their needs and expectations, and provide high-quality products or services at a competitive price to maintain success.  These interactions need to be tracked by the business and analyzed in an organized way to foster long-lasting customer relationships, which translate into long-term success.

CRM can help a business increase sales efficiency, drive customer satisfaction, streamline business processes and make them more efficient, and identify and resolve bottlenecks in any of the operational processes from marketing and sales to product development (Ahearne, Rapp, Mariadoss, & Ganesan, 2012; Bygstad, 2003).  The development of customer relationships is not a trivial or straightforward task.  When it is done right, it gives the business a competitive edge.  However, the implementation of CRM is challenging.

CRM Challenges and Costs

The implementation of CRM demonstrates the value of customers to the business and places customer service as a top priority (Pearlson & Saunders, 2001).  CRM plays a significant role in coordinating the efforts of customer service, marketing, and sales in an organization.  However, the implementation of CRM is challenging, especially for small businesses and startups.  Various reports addressed the challenges of implementing CRM.  Cost is the most significant challenge organizations confront when implementing a CRM solution (Sage Software, 2015).  Developing a clear objective to achieve with the CRM system is another challenge.  Organizations must also decide on the type of deployment, whether on-premise or cloud-based CRM.  Other challenges involve employee training, selecting the right CRM solution provider, and planning the integration in advance (Sage Software, 2015).

The cost of CRM systems varies from one vendor to another based on features and deployment options such as data importing, analytics, email integration, mobile accessibility, email marketing, multi-channel support, and platform type (SaaS, on-premise, or both).  Some vendors offer CRM for small and medium businesses, or small only, while others offer CRM systems for small, medium, and large businesses.  In a report by (Business-Software, 2019), the cost is categorized from most expensive to least expensive using dollar signs: $$$$ for most expensive, $$$ for expensive, $$ for less expensive, and $ for least expensive.  Each vendor's CRM system has certain features which must be examined by organizations before deciding to adopt it.  Table 2 provides an idea of the cost from most expensive to least expensive.


Table 2.  CRM System Costs based on the Report by (Business-Software, 2019).

 

The Building Blocks of CRM Systems and Their Integration

Understanding the building blocks of the CRM system can assist in the implementation and integration of CRM systems.  CRM involves four core building blocks (Meyer & Kolbe, 2005).  The first is the acquisition and continuous updating of a knowledge base on customers' needs, motivations, and behavior over the lifetime of the relationship.  The application of customer knowledge to continuously improve performance through a process of learning from successes and failures is the second building block.  The integration of marketing, sales, and service activities to achieve a common goal is another building block.  The last building block involves the implementation of appropriate systems to support customer knowledge acquisition and sharing, and the measurement of CRM effectiveness.

CRM integration is a critical building block for CRM success (Meyer, 2005).  The process of integrating CRM involves various organizational and operational functions of the business such as marketing, sales, and service activities.  CRM requires detailed business processes which can be categorized into three core elements: the CRM delivery process, the CRM support process, and the CRM analysis process.  The delivery process involves direct contact with customers to cover part of the customer process, such as campaign management, sales management, service management, and complaint management.  The support process also involves direct contact with customers but is designed to fulfill supporting functions within the CRM context, such as market research and loyalty management.  The analysis process consolidates and analyzes the knowledge of customers collected in the other CRM processes.  The results of the analysis process are passed to the delivery process, the support process, and the service innovation and service production processes to enhance their effectiveness; examples include customer scoring and lead management, customer profiling and segmentation, and feedback and knowledge management.

Best Practices in Implementing These CRM Systems

Various studies and reports addressed best practices in the implementation and integration of CRM systems into the business (Salesforce, 2018; Schiff, 2018).  Organizations must choose a CRM that fits their needs.  Not every CRM is created equally, and if organizations choose a CRM system without properly researching its features, capabilities, and weaknesses, they could end up committed to a system that is not appropriate for the business and, as a result, lose money.  Organizations should decide whether the CRM should be cloud-based or on-premise (Salesforce, 2018; Schiff, 2018; Wailgum, 2008).  Organizations should also decide whether the CRM should be a service contract or one that costs more upfront to install.  A business should further decide whether it needs in-depth, highly customizable features, or whether basic functionality will be sufficient to serve its needs.  Organizations should analyze the options and decide on the CRM system that is most appropriate for the business, one which can serve its needs to build strong customer relationships and gain a competitive edge in the market.

A well-trained workforce will help organizations achieve their strategic CRM goals.  If organizations do not invest in training the workforce on how to utilize the CRM system, CRM tools will become useless.  CRM systems are only as effective as organizations allow them to be.  When the workforce is not using the CRM system to its full potential, or if the workforce is misusing it, the CRM will not perform its functions properly and will not serve the needs of the business as expected (Salesforce, 2018; Schiff, 2018).

Automation is another critical factor for best practice when implementing CRM systems.  Tasks associated with data entry can be automated so that CRM systems stay up to date.  Automation will increase the efficiency of the CRM systems as well as the business overall (Salesforce, 2018; Schiff, 2018).  One of the significant benefits of CRM is its potential to improve and enhance cooperative efforts across departments of the business.  When the same information is accessible across various departments, CRM systems eliminate confusion that can be caused by using different terms and different information.

Data without analysis is meaningless.  Organizations should consider mining the data to get the value that can aid in making sound business decisions.  CRM systems are designed to capture and organize massive amounts of data.  If organizations do not take advantage of this massive amount of data by turning it into actionable information, the implementation of CRM will be of limited value.  The best CRM systems are those that come with built-in analytics features which use advanced programming to mine all captured data and produce valuable conclusions for future business decisions.  When organizations take advantage of the CRM built-in analytics feature and analyze the data that the CRM system procures, the resulting information can provide insight for business decisions (Salesforce, 2018).  The last element of best practice in the implementation of CRM is for organizations to keep it simple.  The best CRM system is the one that best fits the needs and requirements of the business.  Simplicity is a crucial element when implementing CRM.  Organizations should implement a CRM that is useful and provides everything the business needs without unnecessary complexity.  Organizations should also consider making changes to the CRM policies where necessary.  The effectiveness of day-to-day operations will be the best indicator of whether the CRM performs as expected, and if it does not, changes must be made until it does (Salesforce, 2018; Wailgum, 2008).

Conclusion

This project discussed critical information technology solutions used to gain competitive advantages.  The discussion began with Big Data and Big Data Analytics, addressing essential topics such as the Hadoop ecosystem, NoSQL databases, Spark integration for real-time data processing, and Big Data visualization.  Cloud computing is an emerging technology to solve Big Data challenges such as storage for the large volume of data and the high-speed processing needed to extract value from data.  Enterprise Resource Planning (ERP) is a system that can aid organizations in gaining competitive advantages if implemented right.  The project discussed various success factors for the ERP system.  Big Data plays a significant role in ERP, which is also discussed in this project.  The last technology addressed in this project is Customer Relationship Management (CRM), its building blocks, and its integration.  The project addressed the challenges and costs associated with CRM.  Best practices for CRM are addressed, which can assist in its successful implementation.  In summary, enterprises should evaluate the various information technology systems that are developed to aid them in gaining competitive advantages.

References

Ahearne, M., Rapp, A., Mariadoss, B. J., & Ganesan, S. (2012). Challenges of CRM implementation in business-to-business markets: A contingency perspective. Journal of Personal Selling & Sales Management, 32(1), 117-129.

Alexandru, A., Alexandru, C., Coardos, D., & Tudora, E. (2016). Healthcare, Big Data and Cloud Computing. management, 1, 2.

Alguliyev, R., & Imamverdiyev, Y. (2014). Big data: big promises for information security. Paper presented at the 2014 IEEE 8th International Conference on Application of Information and Communication Technologies (AICT).

Ankam, V. (2016). Big Data Analytics: Packt Publishing Ltd.

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., . . . Stoica, I. (2009). Above The Clouds: A Berkeley View of Cloud Computing. Electrical Engineering and Computer Sciences University of California at Berkeley.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Bahssas, D. M., AlBar, A. M., & Hoque, M. R. (2015). Enterprise resource planning (ERP) systems: design, trends and deployment. The International Technology Management Review, 5(2), 72-81.

Bansal, A., Deshpande, A., Ghare, P., Dhikale, S., & Bodkhe, B. (2014). Healthcare data analysis using dynamic slot allocation in Hadoop. International Journal of Recent Technology and Engineering, 3(5), 15-18.

Basu, A. (2014). Real-Time Healthcare Analytics on Apache Hadoop* using Spark* and Shark. Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/big-data-real-time-healthcare-analytics-whitepaper.pdf.

Business-Software. (2019). Top 40 CRM Software Report.  

Bygstad, B. (2003). The implementation puzzle of CRM systems in knowledge based organizations. Information Resources Management Journal (IRMJ), 16(4), 33-45.

Carlton, R. (2014). 5 Ways Big Data is Changing ERP Software. Retrieved from https://www.erpfocus.com/five-ways-big-data-is-changing-erp-software-2733.html.

Chrimes, D., Zamani, H., Moa, B., & Kuo, A. (2018). Simulations of Hadoop/MapReduce-Based Platform to Support its Usability of Big Data Analytics in Healthcare.

Cornell University. (2017). Enterprise Information Systems. Retrieved from https://it.cornell.edu/strategic-plan/enterprise-information-systems. 

Dhotre, P., Shimpi, S., Suryawanshi, P., & Sanghati, M. (2015). Health Care Analysis Using Hadoop. International Journal of Scientific & Technology Research, 4(12), 279-281.

DuttaRoy, S. (2016). SAP Business Analytics: A Best Practices Guide for Implementing Business Analytics Using SAP: Springer.

EMC. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. (1st ed.): Wiley.

ERP Solutions. (2018). The Role of Big Data Analytics in ERP Applications. Retrieved from https://erpsolutions.oodles.io/big-data-analytics-in-erp/. 

Financesonline. (2018). 15 Best CRM Systems for Your Business. Retrieved from https://financesonline.com/15-best-crm-software-systems-business/. 

Foster, I., Zhao, Y., Raicu, I., & Lu, S. (2008). Cloud Computing and Grid Computing 360-Degree Compared. Paper presented at the 2008 Grid Computing Environments Workshop.

Gantz, J., & Reinsel, D. (2011). Extracting Value From Chaos. International Data Corporation, 1142, 1-12.

Géczy, P. (2014). Big data characteristics. The Macrotheme Review, 3(6), 94-104.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud Computing and Big Data Analytics: What is New From Databases Perspective? Paper presented at the International Conference on Big Data Analytics, Springer-Verlag Berlin Heidelberg.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. Practical Innovation, Open Solution, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

Jayasingh, B. B., Patra, M. R., & Mahesh, D. B. (2016, December 14-17). Security issues and challenges of big data analytics and visualization. Paper presented at the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I).

Kaisler, S., Armour, F., Espinosa, J. A., & Money, W. (2013). Big Data: Issues and Challenges Moving Forward. Paper presented at the Hawaii International Conference on System Sciences

Karanth, S. (2014). Mastering Hadoop: Packt Publishing Ltd.

Kritikos, K., Kirkham, T., Kryza, B., & Massonet, P. (2017). Towards a Security-Enhanced PaaS Platform for Multi-Cloud Applications. Future Generation computer systems, 67, 206-226. doi:10.1016/j.future.2016.10.008

Lewis, G. (2010). Basics About Cloud Computing. Software Engineering Institute Carnegie Mellon University, Pittsburgh.

Madanhire, I., & Mbohwa, C. (2016). Enterprise resource planning (ERP) in improving operational efficiency: Case study. Procedia Cirp, 40, 225-229.

Maltby, D. (2011). Big Data Analytics. Paper presented at the Annual Meeting of the Association for Information Science and Technology.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. Institute of Electrical and Electronic Engineers, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Mell, P., & Grance, T. (2011). The NIST Definition of Cloud Computing. National Institute of Standards and Technology (NIST), 800-145, 1-7.

Meyer, M. (2005). Multidisciplinarity of CRM Integration and its Implications. Paper presented at the Proceedings of the 38th Annual Hawaii International Conference on System Sciences (HICSS'05).

Meyer, M. (2018). The Rise of Healthcare Data Visualization.

Meyer, M., & Kolbe, L. M. (2005). Integration of customer relationship management: status quo and implications for research and practice. Journal of strategic marketing, 13(3), 175-198.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1.

Sage Software. (2015). Top Challenges in CRM Implementation.  

Sahafizadeh, E., & Nematbakhsh, M. A. (2015). A Survey on Security Issues in Big Data and NoSQL. Int’l J. Advances in Computer Science, 4(4), 2322-5157.

Salesforce. (2018). 7 CRM Best Practices to Get the Most out of your CRM. Retrieved from https://www.salesforce.com/crm/best-practices/. 

Schiff, J. L. (2018). 8 CRM implementation best practices.

Scott, J. A. (2015). Getting Started with Spark: MapR Technologies, Inc.

Umble, E. J., Haft, R. R., & Umble, M. M. (2003). Enterprise resource planning: Implementation procedures and critical success factors. European Journal of Operational Research, 146(2), 241-257.

Vaquero, L. M., Rodero-Merino, L., Caceres, J., & Lindner, M. (2008). A Break in the Clouds: Towards a Cloud Definition. Association for Computing Machinery: Computer Communication Review, 39(1), 50-55.

Wailgum, T. (2008). Five Best Practices for Implementing SaaS CRM. Retrieved from https://www.cio.com/article/2435928/customer-relationship-management/five-best-practices-for-implementing-saas-crm.html.

Wailgum, T. (2018). What is CRM? Software for Managing Customer Data. Retrieved from https://www.cio.com/article/2439505/customer-relationship-management/customer-relationship-management-crm-definition-and-solutions.html.

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Woodie, A. (2016). Making ERP Better with Big Data. Retrieved from https://www.datanami.com/2016/07/08/making-erp-better-big-data/.

The Impact of Cloud Computing Technology on Information Security Governance Decisions

Dr. O. Aly
Computer Science

Information security plays a significant role in the context of information technology (IT) governance.  The critical governance decisions for information security lie in the areas of information security strategy, policies, infrastructure, training, and investments in tools.  Cloud computing is an emerging technology that provides a new business model for accessing computing infrastructure on a virtualized, scalable, and lower-cost basis.  The purpose of this discussion is to address the impact of cloud computing on decisions related to information security governance.

Cloud Computing Technology

“Cloud computing and big data are conjoined” (Hashem et al., 2015).  This statement raises the question of the reason for such a relationship.  Big Data has been characterized by what is often referred to as a multi-V model: variety, velocity, volume, veracity, and value (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015).  While variety represents the data types, velocity reflects the rate at which the data is produced and processed (Assunção et al., 2015).  Volume defines the amount of data, and veracity reflects how much the data can be trusted given the reliability of its source.  Value, on the other hand, represents the monetary worth which organizations can derive from adopting Big Data computing.  Along with these characteristics of Big Data, including its explosive growth rate, came challenges and issues (Jagadish et al., 2014; Meeker & Hong, 2014; Misra, Sharma, Gulia, & Bana, 2014; Nasser & Tariq, 2015; Zhou, Chawla, Jin, & Williams, 2014).  The growth rate is regarded as a significant challenge for IT researchers and practitioners in designing appropriate systems that handle the data effectively and analyze it to extract relevant meaning for decision-making (Kaisler, Armour, Espinosa, & Money, 2013).  Other challenges include data storage, data management, and data processing (Fernández et al., 2014; Kaisler et al., 2013); Big Data variety, Big Data integration and cleaning, Big Data reduction, Big Data query and indexing, and Big Data analysis and mining (Chen et al., 2013).

Traditional systems could not meet all these challenges of Big Data (BD).  Cloud computing technology emerged to address them and is regarded as the solution and the answer to BD challenges and issues (Fernández et al., 2014).  Organizations and businesses are under pressure to quickly adopt and implement technologies such as cloud computing to address the challenges of Big Data storage and processing demands (Hashem et al., 2015).  Besides, given the increasing demands of Big Data on networks, storage, and servers, outsourcing the data to the cloud may seem a practical and useful approach when dealing with Big Data (Katal, Wazid, & Goudar, 2013).  During the last two decades, the demand for data storage and data security has been growing at a fast pace (Gupta, 2015).  Such demand led to the emergence of cloud computing technology (Gupta, 2015).  Issues such as the scalability of Big Data have also pointed toward cloud computing technology, which can aggregate multiple disparate workloads with varying performance goals into significant clusters in the cloud (Katal et al., 2013).

Various studies provide different definitions of cloud computing.  However, the National Institute of Standards and Technology (NIST) proposed an official definition.  NIST defined cloud computing as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., network, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (page 2) (Mell & Grance, 2011).

Cloud computing technology offers various deployment models: public cloud, private cloud, hybrid cloud, and community cloud.  The public cloud is the least secure cloud model (Puthal, Sahoo, Mishra, & Swain, 2015).  The private cloud has also been referred to by (Armbrust et al., 2009) as internal datacenters, which are not available to the general public.  The community cloud supports a specific community with particular concerns such as security requirements, policy and compliance considerations, and mission (Yang & Tate, 2012; Zissis & Lekkas, 2012).  Cloud computing also offers three major service models: Infrastructure-as-a-Service (IaaS), Software-as-a-Service (SaaS), and Platform-as-a-Service (PaaS) (Mell & Grance, 2011).

Cloud computing offers various benefits, from technological benefits such as data and storage, APIs, metering, and tools, to economic benefits such as pay per use, cost reduction, and return on investment, to non-functional benefits such as elasticity, reliability, and availability (Chang, 2015).  Despite these benefits and the increasing trend in adoption, cloud computing is still not widely used.  Security concerns related to virtualization, hardware, network, data, and service providers act as significant obstacles to adopting cloud computing in the IT industry (Balasubramanian & Mala, 2015; Kazim & Zhu, 2015).  Security and privacy concerns have been among the major obstacles preventing the full adoption of the technology (Shahzad, 2014).  (Purcell, 2014) stated that “The advantages of cloud computing are tempered by two major concerns – security and loss of control.”  The uncertainty about security has led executives to state that security is their number one concern for deploying cloud computing (Hashizume, Rosado, Fernández-medina, & Fernandez, 2013).

Cloud Computing Governance and Data Governance

The enforcement of regulatory laws such as the Health Insurance Portability and Accountability Act (HIPAA) and Sarbanes-Oxley becomes an issue especially when adopting cloud computing (Ali, Khan, & Vasilakos, 2015).  Cloud computing raises security concerns that hamper its rapid adoption.  Thus, cloud governance and data governance are highly recommended when adopting cloud computing.

Cloud governance is defined as the controls and processes that make sure policies are enforced (Saidah & Abdelbaki, 2014).  It is a framework applied to all related parties and business processes to ensure that the cloud securely supports the goals of the organization and complies with all required regulations and rules.  A cloud governance model should be aligned with corporate governance and IT governance.  It has to comply with the strategy of the organization to accomplish the business goals.  Various studies proposed cloud governance models.

(Saidah & Abdelbaki, 2014) proposed a cloud governance model that comprises three models: a policy model, an operational model, and a management model.  The policy model involves data policy, service policy, business process management policy, and exit policy.  The operational model includes authentication, authorization, audit, monitoring, adaptations, metadata repository, and asset management.  The management model includes policy management, security management, and service management.  Figure 1 illustrates the proposed cloud governance model.


Figure 1.  The Proposed Cloud Governance Model (Saidah & Abdelbaki, 2014).

(Rebollo, Mellado, & Fernández-Medina, 2013) proposed a security governance framework for the cloud computing environment (ISGcloud).  The proposed governance framework is founded upon two main standards: it implements the core governance principles of the ISO/IEC 38500 governance standard, and it proposes a cloud service lifecycle based on the ISO/IEC 27036 outsourcing security draft.

When organizations decide to adopt cloud computing technology, careful consideration must be given to the deployment model as well as the service model to understand the security requirements and the governance strategies (Al-Ruithe, Benkhelifa, & Hameed, 2016).  Data governance for cloud computing is not a nice-to-have but is required by rules and regulations to protect the privacy of users and employees.

The loss of control over the data is the most significant issue when adopting cloud computing because the data is stored on a computer belonging to the cloud provider.  This loss of governance and control could have a potentially severe impact on the strategy of the organization and its capacity to meet its mission and goals (Al-Ruithe et al., 2016).  The loss of control and governance of the data can lead to the impossibility of complying with security requirements; a lack of confidentiality, integrity, and availability of data; and a deterioration of performance and quality of services, not to mention the introduction of compliance challenges.  Thus, organizations must be aware of best practices for safeguarding, governing, and operating data when adopting cloud computing technology.  NIST offers many recommendations for adopting cloud computing technology (Al-Ruithe et al., 2016).  Organizations should consider a data governance strategy before adopting cloud computing.  This recommendation demonstrates the importance of data governance for organizations which intend to move their data and services to a cloud computing environment, as policies, rules, and the distribution of responsibilities between cloud actors will have to be set.  The development of policies and data governance will assist organizations in monitoring compliance with current regulations and rules.  The primary benefit of data governance when using a cloud environment is to ensure security measures, privacy protection, and quality of data.

The implementation of data governance for cloud computing changes based on the roles and responsibilities in the internal processes of the organization (Al-Ruithe et al., 2016).  Thus, organizations are expected to face many issues.  The lack of understanding of data governance is one of the major issues.  The lack of a training plan and the lack of a communication plan are additional issues which organizations will face.  The lack of support is another obstacle, which includes lack of top management support, lack of compliance enforcement, and lack of cloud regulation.  A lack of policies, processes, and defined roles in the organization is one of the main obstacles to implementing data governance in the cloud.  The lack of resources, including funding, technology, people, and skills, is considered another data governance obstacle.

Conclusion

This discussion addressed cloud computing technology and its relationship with Big Data (BD) and Big Data Analytics (BDA).  Cloud computing technology emerged as a solution to the challenges that BD and BDA faced.  However, cloud computing is confronted with security and privacy challenges.  Executives expressed security as the number one concern for cloud computing adoption.  The governance of cloud computing will provide a secure environment to protect data from loss or malicious attacks.  Organizations are required to comply with various security and privacy regulations and rules.  Organizations are under pressure to protect data, especially when using cloud computing technology.  Thus, they are required to implement data governance and cloud governance frameworks to ensure such compliance.

References

Al-Ruithe, M., Benkhelifa, E., & Hameed, K. (2016). A Conceptual Framework for Designing Data Governance for Cloud Computing. Procedia Computer Science, 94, 160-167. doi:10.1016/j.procs.2016.08.025

Ali, M., Khan, S. U., & Vasilakos, A. V. (2015). Security in cloud computing: Opportunities and challenges. Information Sciences, 305, 357-383. doi:10.1016/j.ins.2015.01.025

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., . . . Stoica, I. (2009). Above The Clouds: A Berkeley View of Cloud Computing. Electrical Engineering and Computer Sciences University of California at Berkeley.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Balasubramanian, V., & Mala, T. (2015). A Review On Various Data Security Issues In Cloud Computing Environment And Its Solutions. Journal of Engineering and Applied Sciences, 10(2).

Chang, V. (2015). A Proposed Framework for Cloud Computing Adoption. International Journal of Organizational and Collective Intelligence, 6(3).

Chen, J., Chen, Y., Du, X., Li, C., Lu, J., Zhao, S., & Zhou, X. (2013). Big Data Challenge: a Data Management Perspective. Frontiers of Computer Science, 7(2), 157-164. doi:10.1007/s11704-013-3903-7

Fernández, A., Del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M., & Herrera, F. (2014). Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(5), 380-409. doi:10.1002/widm.1134

Gupta, U. (2015). Survey on Security Issues in File Management in Cloud Computing Environment. Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hashizume, K., Rosado, D. G., Fernández-medina, E., & Fernandez, E. B. (2013). An analysis of security issues for cloud computing. Journal of internet services and applications, 4(1), 1-13. doi:10.1186/1869-0238-4-5

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., & Shahabi, C. (2014). Big Data and Its Technical Challenges. Communications of the Association for Computing Machinery, 57(7), 86-94. doi:10.1145/2611567

Kaisler, S., Armour, F., Espinosa, J. A., & Money, W. (2013). Big Data: Issues and Challenges Moving Forward. Paper presented at the Hawaii International Conference on System Sciences

Katal, A., Wazid, M., & Goudar, R. H. (2013). Big Data: Issues, Challenges, Tools and Good Practices. Paper presented at the International Conference on Contemporary Computing.

Kazim, M., & Zhu, S. Y. (2015). A Survey on Top Security Threats in Cloud Computing. International Journal Advanced Computer Science and Application, 6(3), 109-113.

Meeker, W., & Hong, Y. (2014). Reliability Meets Big Data: Opportunities and Challenges. Quality Engineering, 26(1), 102-116. doi:10.1080/08982112.2014.846119

Mell, P., & Grance, T. (2011). The NIST Definition of Cloud Computing. National Institute of Standards and Technology (NIST), 800-145, 1-7.

Misra, A., Sharma, A., Gulia, P., & Bana, A. (2014). Big Data: Challenges and Opportunities. International Journal of Innovative Technology and Exploring Engineering, 4(2).

Nasser, T., & Tariq, R. S. (2015). Big Data Challenges. Journal of Computer Engineering & Information Technology, 9307, 1-10. doi:10.4172/2324

Purcell, B. M. (2014). Big Data Using Cloud Computing. Journal of Technology Research, 5, 1-9.

Puthal, D., Sahoo, B., Mishra, S., & Swain, S. (2015). Cloud Computing Features, Issues, and Challenges: a Big Picture. Paper presented at the 2015 International Conference on Computational Intelligence and Networks (CINE).

Rebollo, O., Mellado, D., & Fernández-Medina, E. (2013). Introducing a security governance framework for cloud computing. Paper presented at the Proceedings of the 10th International Workshop on Security in Information Systems (WOSIS), Angers, France.

Saidah, A. S., & Abdelbaki, N. (2014). A New Cloud Computing Governance Framework.

Shahzad, F. (2014). State-of-the-art Survey on Cloud Computing Security Challenges, Approaches and Solutions. Procedia Computer Science, 37, 357-362. doi:10.1016/j.procs.2014.08.053

Yang, H., & Tate, M. (2012). A Descriptive Literature Review and Classification of Cloud Computing Research. Communications of the Association for Information Systems, 31(2), 35-60.

Zhou, Z., Chawla, N., Jin, Y., & Williams, G. (2014). Big Data Opportunities and Challenges: Discussions from Data Analytics Perspectives. Institute of Electrical and Electronic Engineers: Computational Intelligence Magazine, 9(4), 62-74.

Zissis, D., & Lekkas, D. (2012). Is Cloud Computing Finally Beginning to Mature? International Journal of Cloud Computing and Services Science, 1(4), 172. doi:10.11591/closer.v1i4.1248

Installation and Configuration of Openstack and AWS

Dr. Aly, O.
Computer Science

Abstract

The purpose of this project was to articulate all the steps for the installation and configuration of OpenStack and Amazon Web Services (AWS).  The project begins with an overview of OpenStack.  It is divided into three main phases.  Phase 1 discusses and analyzes the differences between the networking techniques in AWS and OpenStack.  Phase 2 discusses the configurations required to deploy the OpenStack Controller; it also discusses and analyzes the expansion of OpenStack to include an additional node as the Compute Node.  Phase 3 discusses the issues encountered during the installation and configuration of OpenStack and AWS services.  A virtual bridge for the provider network was configured where all VM traffic reaches the Internet through the external bridge.  The floating IP also must be disabled to avoid packets being dropped when they reach AWS.  In this project, OpenStack with the Controller Node and an additional Compute Node is deployed and accessed successfully using the Horizon dashboard.  Elastic Compute Cloud (EC2) is also installed and configured successfully using the default VPC, the default Security Group, and Access Control List.

Keywords: OpenStack, Amazon Web Services (AWS).

Introduction

            OpenStack is a result of initiatives from Rackspace and NASA in 2010 because NASA could not store its data in the Public Cloud for security reasons.  OpenStack is an open source project which can be utilized by leading vendors to bring AWS-like ability and agility to the private cloud.  OpenStack has been growing since its inception in 2010 to include 500 member companies as part of the OpenStack Foundation with platinum and gold members from the largest IT vendors globally.  Examples of these platinum members include RedHat, Suse, IBM, Hewlett Packard Enterprise, Ubuntu, AT&T and Rackspace (Armstrong, 2016).

            OpenStack primarily provides an Infrastructure-as-a-Service (IaaS) function within the private cloud, where it makes centralized storage, commodity compute, and networking features available to end users to self-service their needs through the Horizon dashboard or a set of common APIs.  Many organizations are deploying OpenStack in-house to develop their own data centers.  The implementation of OpenStack is less likely to fail when utilizing professional service support from known vendors and can create alternative solutions to Microsoft Azure and AWS.  Examples of these professional service vendors include RedHat, Suse, HP, Canonical, Mirantis, and so forth.  They provide different methods of installing the platform (Armstrong, 2016).

            OpenStack follows a six-month release cycle, during which an upstream release is created.  The OpenStack Foundation creates the upstream release and governs it.  Examples of public cloud deployments of OpenStack include AT&T, RackSpace, and GoDaddy; thus, OpenStack is not exclusively used for private cloud.  However, OpenStack has been increasingly popular as a private cloud alternative to the AWS public cloud.  OpenStack is now also widely used for Network Function Virtualization (NFV) (Armstrong, 2016).

OpenStack and AWS utilize different approaches to Networking.  This section begins with AWS Networking, followed by OpenStack Networking.

Phase 1:  OpenStack Networking vs. AWS Networking

1.1       AWS Networking

Virtual Private Cloud (VPC) supports a hybrid cloud comprising public and private clouds.  The VPC is the default setting for new AWS users.  The VPC can also be connected to a network of users or the private data center of the organization.  The underlying concept of connecting the VPC to the private data center of the organization is the use of a gateway and a virtual private gateway (VPG).  The VPG consists of two redundant VPN tunnels, which are instantiated from the private network of the user or the organization.  The gateway of the organization exposes a set of external static addresses from the site of the organization, which use Network Address Translation-Traversal (NAT-T) to hide the addresses.  The organization can use one gateway device to access multiple VPCs.  The VPC provides an isolated view of all provisioned instances.  Identity and Access Management (IAM) of AWS is used to set up user accounts to access the VPC.  Figure 1 illustrates an example of the AWS VPC with virtual machines or instances mapped to one or more security groups and connected to different subnets attached to the VPC router (Armstrong, 2016; AWS, 2017).

Figure 1.  VPC of AWS showing multiple instances using Security Group.

            VPC simplifies networking in software, allowing users and organizations to perform a set of networking operations such as subnet mapping, Domain Name System (DNS) usage, public and private IP address assignment, and security group and access control list application.  When organizations create a virtual machine or instance, a default VPC is assigned to it automatically.  Every VPC comes with a default router, which can take additional custom routes and routing priorities to forward traffic to specific subnets based on the requirements of the organizations and users.  Figure 2 illustrates a VPC using private IPs, public IPs, and the main route table, adapted from (Armstrong, 2016; AWS, 2017).

Figure 2.  AWS VPC Configuration Example (AWS, 2017).
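
As an illustration of these software-defined operations, the following hedged AWS CLI sketch creates a VPC, a subnet, and a default route through an Internet gateway.  All CIDR blocks and resource IDs (vpc-..., igw-..., rtb-...) are placeholders to be replaced with the values each command returns.

    # Create a VPC and a subnet inside it (CIDR ranges are illustrative)
    $ aws ec2 create-vpc --cidr-block 10.0.0.0/16
    $ aws ec2 create-subnet --vpc-id vpc-1a2b3c4d --cidr-block 10.0.1.0/24

    # Attach an Internet gateway and add a default route to the main route table
    $ aws ec2 create-internet-gateway
    $ aws ec2 attach-internet-gateway --internet-gateway-id igw-1a2b3c4d --vpc-id vpc-1a2b3c4d
    $ aws ec2 create-route --route-table-id rtb-1a2b3c4d --destination-cidr-block 0.0.0.0/0 --gateway-id igw-1a2b3c4d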

            With respect to IP addressing in AWS, a mandatory private IP is assigned automatically to every virtual machine or instance, along with a public IP and DNS entry unless the instance is a dedicated instance.  The private IP is used to route traffic among instances when a virtual machine needs to communicate with another virtual machine close to it on the same subnet.  The public IP, on the other hand, is accessible through the Internet.  If a persistent public IP address is needed for a virtual machine, AWS provides the Elastic IP address feature, which is limited to five addresses per VPC account.  When using Elastic IP addresses, the IP address can be remapped quickly to another instance in case of an instance failure.  When using AWS, it can take up to 24 hours for the DNS Time to Live (TTL) of a public IP address to propagate.  Moreover, AWS supports a Maximum Transmission Unit (MTU) of 1,500 bytes for traffic passed to an instance, which the organization must consider for application performance (Armstrong, 2016; AWS, 2017).
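
The Elastic IP workflow described above can be sketched with the AWS CLI as follows; the instance and allocation IDs are placeholders.

    # Allocate an Elastic IP in the VPC
    $ aws ec2 allocate-address --domain vpc
    # Map it to an instance; the same call can re-map it to a replacement instance after a failure
    $ aws ec2 associate-address --instance-id i-0123456789abcdef0 --allocation-id eipalloc-1a2b3c4d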

            AWS uses Security Groups and Access Control Lists.  The Security Group (SG) in AWS is used to group a collection of access control rules with implicit denies.  An SG in AWS can be associated with one or more network interfaces of instances and acts as the firewall for those instances.  A default SG is applied automatically if no other security group is specified with the instantiated instance.  The default SG allows all outbound traffic and allows inbound traffic only from other instances within the same VPC; it cannot be deleted.  With a custom SG, no inbound traffic is allowed, but all outbound traffic is allowed.  The user can add Access Control List (ACL) rules associated with the SG to govern inbound traffic using the AWS console (Armstrong, 2016; AWS, 2017).
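
A hedged sketch of a custom SG with an inbound SSH rule, consistent with the behavior described above (outbound open, inbound closed until rules are added); the names, IDs, and administrative address are illustrative.

    # Create a custom security group in the VPC (no inbound traffic by default)
    $ aws ec2 create-security-group --group-name admin-sg --description "Example group" --vpc-id vpc-1a2b3c4d
    # Allow inbound SSH from a single administrative address only
    $ aws ec2 authorize-security-group-ingress --group-id sg-1a2b3c4d --protocol tcp --port 22 --cidr 203.0.113.10/32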

            The VPC of AWS has access to different regions and availability zones of shared compute, dictating the data center in which the instance or virtual machine will be deployed.  An availability zone (AZ) is an isolated location residing in a region, which is a geographic area isolated by design.  Thus, an AZ is a subset of a region.  Organizations and users can place resources in different locations for redundancy and recovery considerations.  AWS supports the use of more than one AZ when deploying production workloads on AWS.  Moreover, organizations and users can replicate instances and data across regions (Armstrong, 2016; AWS, 2017).

            The Elastic Load Balancing (ELB) feature is also offered by AWS and can be configured within a VPC.  The ELB can be external or internal.  When the ELB is external, it allows the creation of an Internet-facing entry point into the VPC using an associated DNS entry and balances load among the instances in the VPC.  An SG is assigned to the ELB to control access to the ports which need to be used (Armstrong, 2016; AWS, 2017).
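
The external ELB configuration described above can be sketched with the classic ELB CLI as follows; the load balancer name, subnet, security group, and instance IDs are placeholders.

    # Create an Internet-facing classic load balancer listening on HTTP
    $ aws elb create-load-balancer --load-balancer-name web-elb --listeners "Protocol=HTTP,LoadBalancerPort=80,InstanceProtocol=HTTP,InstancePort=80" --subnets subnet-1a2b3c4d --security-groups sg-1a2b3c4d
    # Register the VPC instances that should receive balanced traffic
    $ aws elb register-instances-with-load-balancer --load-balancer-name web-elb --instances i-0123456789abcdef0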

1.2       OpenStack Networking

            OpenStack is deployed in a data center on multiple controllers which contain all the OpenStack services.  These controllers can be installed on virtual machines, bare metal physical servers, or containers.  When these controllers are deployed in a production environment, they host all OpenStack services in a highly available and redundant platform.  Different OpenStack vendors offer different installers for OpenStack.  Examples of these installers include RedHat Director, Mirantis Fuel, HP's HPE installer, and Juju for Canonical.  All these installers install controllers and are also used to scale out compute nodes in the OpenStack cloud (Armstrong, 2016; OpenStack, 2018b).

            With respect to the services of OpenStack, there are eleven core services which are installed on the OpenStack controller.  These core services include Keystone, Heat, Glance, Cinder, Nova, Horizon, Rabbitmq, Galera, Swift, Ironic, and Neutron.  Figure 3 summarizes each core service of OpenStack (OpenStack, 2018a).  The Neutron networking services are similar in their constructs to AWS networking (Armstrong, 2016; OpenStack, 2018b).

Figure 3.  Summary of OpenStack Core Services (OpenStack, 2018a)

In OpenStack, a Project, also referred to as a Tenant, provides an isolated view of everything which a team has provisioned in the OpenStack cloud.  Using the Keystone identity service, different users can be set up for a Project (Tenant).  These accounts can be integrated with LDAP directories such as Active Directory to support a customizable permission model (Armstrong, 2016; OpenStack, 2018b).

The Neutron service of OpenStack performs all networking-related tasks and functions.  These functions and tasks include seven major steps.  The first step is the creation of instances or virtual machines mapped to networks.  The second step is the assignment of IP addresses using the built-in DHCP service.  The third step is the application of DNS entries to instances from named servers.  The fourth step is the assignment of private and floating IP addresses.  The fifth step is the creation or association of the network subnet, followed by creating the routers.  The last step is the application of the Security Groups (Armstrong, 2016; OpenStack, 2018b).

The compute nodes of OpenStack are deployed using a hypervisor which uses Open vSwitch.  Most vendor distributions of OpenStack provide the KVM hypervisor by default, which is deployed and configured on each compute node by the OpenStack installer.  The compute nodes in OpenStack are connected to the access layer of the STP 3-tier model; in modern networks, they are connected to the Leaf switches, with VLANs connected to each compute node in the OpenStack cloud.  Tenant networks are used to provide isolation among tenants and use VXLAN and GRE tunneling to connect the layer 2 networks (Armstrong, 2016; OpenStack, 2018b).

The configuration and setup of simple networking using Neutron in a Project (Tenant) requires two different networks: an internal network and an external network.  The internal network is used for traffic among instances in the Project, where the subnet name and range are specified in the subnet.  The external network is used to make the internal network accessible from outside of OpenStack.  A router is also used in OpenStack to route packets between the networks with which it is associated, and the external network needs to be set as the router's gateway.  The last step in the network configuration connects the router to the internal and external networks.  Instances are provisioned in OpenStack onto the internal private network by selecting the private network NIC during the deployment of the instance.  OpenStack assigns pools of public IPs, known as floating IP addresses, from an external network for instances which need to be externally routable outside of OpenStack (Armstrong, 2016; OpenStack, 2018b).
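
A hedged sketch of this Neutron setup using the openstack CLI; the network names, subnet range, and provider network label are illustrative.

    # Internal (tenant) network and subnet for traffic among instances
    $ openstack network create private
    $ openstack subnet create --network private --subnet-range 172.16.10.0/24 private-subnet

    # External (provider) network used to reach outside of OpenStack
    $ openstack network create --external --provider-network-type flat --provider-physical-network provider public

    # Router with the external network as gateway and the internal subnet attached
    $ openstack router create router1
    $ openstack router set --external-gateway public router1
    $ openstack router add subnet router1 private-subnet

    # Allocate a floating IP from the external network for an externally routable instance
    $ openstack floating ip create public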

OpenStack uses SGs like AWS to set up firewall rules between instances.  However, OpenStack, unlike AWS, supports both ingress and egress ACL rules, whereas AWS allows all outbound communications.  SSH access must be configured as an ACL rule against the parent SG in OpenStack, which is pushed down to Open vSwitch into kernel space on each hypervisor.  When the internal and external networks are set up and configured for the Project (Tenant), instances are ready to be launched on the private network.  Users can access the instances from the Horizon dashboard (Armstrong, 2016; OpenStack, 2018b).
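
The SSH rule and the ingress/egress distinction can be sketched as follows, assuming the default security group name used by most distributions; the remote IP ranges are illustrative.

    # Allow inbound SSH to instances in the default security group
    $ openstack security group rule create --ingress --protocol tcp --dst-port 22 --remote-ip 0.0.0.0/0 default
    # Unlike AWS, an egress rule can also be stated explicitly
    $ openstack security group rule create --egress --protocol tcp --dst-port 443 --remote-ip 0.0.0.0/0 default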

With respect to regions and availability zones, OpenStack, like AWS, uses regions and AZs.  The compute nodes in OpenStack (hypervisors) can be assigned to different AZs, which provide a virtual separation of compute resources.  An AZ in OpenStack can be segmented into host aggregates.  However, a compute node can be assigned to only one AZ in OpenStack, while it can be part of multiple host aggregates in the same AZ (Armstrong, 2016; OpenStack, 2018b).

OpenStack offers Load-Balancer-as-a-Service (LBaaS), which allows incoming requests to be distributed evenly among the designated instances using a Virtual IP (VIP).  Examples of popular LBaaS plugins in OpenStack include Citrix NetScaler, F5, HaProxy, and Avi Networks.  The underlying concept of LBaaS on OpenStack is to let organizations and users treat LBaaS as a broker to the load balancing solutions, using the OpenStack APIs or the Horizon dashboard to configure the load balancer (Armstrong, 2016; OpenStack, 2018b).
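
A hedged sketch using the Octavia-based LBaaS v2 CLI, assuming the openstack loadbalancer client plugin is installed; the names, subnet, and member address are illustrative.

    # Create a load balancer with a VIP on the tenant subnet
    $ openstack loadbalancer create --name lb1 --vip-subnet-id private-subnet
    # Listener and pool distributing HTTP requests round-robin
    $ openstack loadbalancer listener create --name listener1 --protocol HTTP --protocol-port 80 lb1
    $ openstack loadbalancer pool create --name pool1 --lb-algorithm ROUND_ROBIN --listener listener1 --protocol HTTP
    # Add a designated instance as a pool member
    $ openstack loadbalancer member create --subnet-id private-subnet --address 172.16.10.5 --protocol-port 80 pool1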

Phase 2:  AWS and OpenStack Setup and Configuration

            This project deployed OpenStack on AWS, limited to the configuration of the Controller Node.  In the same project, the OpenStack cloud is expanded to add a Compute Node.  The topology for this project is illustrated in Figure 4.  Port 9000 will be configured to be accessed from the browser on the client.  The Compute Node VM will use a different IP address from that of the OpenStack Controller Node.  A private network will be configured using the Vagrant software.  A NAT interface will be configured and mapped to the Compute Node and the OpenStack Controller Node, as illustrated in Figure 4.

Figure 4.  This Project’s Topology.

The Controller Node is configured with one processor, 4 GB memory, and 5 GB storage.  The Compute Node is configured with one processor, 2 GB memory, and 10 GB storage.  The installation must be performed on a 64-bit distribution on each node.  VirtualBox is used in this project, along with the Vagrant software.  A text editor, Sublime Text, is used to edit the Vagrantfile and avoid control characters at the end of lines, which can cause problems.  The project uses the Pike release.  A minimal sketch of such a Vagrantfile is shown below.
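
This sketch assumes the VirtualBox provider and an Ubuntu base box; the box name, private IP addresses, and the port forwarded to Horizon are illustrative and should match the topology in Figure 4.

    $ cat > Vagrantfile <<'EOF'
    Vagrant.configure("2") do |config|
      config.vm.box = "ubuntu/xenial64"                          # assumed base box

      config.vm.define "controller" do |node|
        node.vm.hostname = "controller"
        node.vm.network "private_network", ip: "172.16.10.10"    # illustrative private IP
        node.vm.network "forwarded_port", guest: 80, host: 9000  # reach Horizon via port 9000
        node.vm.provider "virtualbox" do |vb|
          vb.cpus   = 1
          vb.memory = 4096                                       # 4 GB per the project specification
        end
      end

      config.vm.define "node1" do |node|
        node.vm.hostname = "node1"
        node.vm.network "private_network", ip: "172.16.10.11"    # illustrative private IP
        node.vm.provider "virtualbox" do |vb|
          vb.cpus   = 1
          vb.memory = 2048                                       # 2 GB per the project specification
        end
      end
    end
    EOF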

2.1 Amazon Machine Image (AMI) and Elastic Compute Cloud (EC2) Configuration

The project requires an AWS account to select the image which can be used for the OpenStack deployment.  Multi-Factor Authentication is implemented to access the account.  An Amazon Machine Image (AMI) for Elastic Compute Cloud (EC2) is selected from the pool of AMIs for this project.  The Free Tier EC2 instance is configured with the default Security Group (SG) and Access Control List (ACL) rules as discussed earlier.  An EC2 AMI is a template which contains the software configuration, such as the operating system, application server, and applications, required to launch and instantiate the instance.  The EC2 AMI is configured to use the default VPC.
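
A hedged sketch of launching the Free Tier instance from the chosen AMI into the default VPC with the default security group; the AMI ID and key pair name are placeholders.

    # Launch a Free Tier instance from the selected AMI into the default VPC
    $ aws ec2 run-instances --image-id ami-1a2b3c4d --instance-type t2.micro --key-name my-keypair --count 1
    # Confirm the instance is running and note its addresses
    $ aws ec2 describe-instances --filters "Name=instance-state-name,Values=running"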

2.2 OpenStack Controller Node Configuration

The Controller Node is configured first to use the IP address identified in the topology.  This configuration is implemented using the Vagrant software and the Vagrantfile.

  • Connect to the controller using the Vagrant software.  To start the Controller from Vagrant, execute:
    • $vagrant up controller
  • Verify the Controller is running successfully.
    • $vagrant status
  • Verify the NAT address using eth0. 
    • $ifconfig -a
  • Verify the Private IP Address using eth1.  The IP address matches the one configured in the Vagrantfile.

Access the OpenStack Controller Node from the browser using port 9000.

  • Verify the Hypervisors from Horizon interface. 

2.3 OpenStack Compute Node Configuration

The OpenStack Cloud is expanded by adding a Compute Node.  The configuration of the compute node is performed using the Vagrant file.

  • Connect to the Compute Node using the Vagrant command.  The Compute Node uses node1 as the hostname.  To start the Compute Node from Vagrant, execute the following command:
    • $vagrant up node1
  • Verify the Compute Node is running successfully.
    • $vagrant status
  • Access node1 using SSH.
  • Check OpenStack Services:
    • $sudo systemctl list-units devstack@*
  • Verify the NAT address using eth0. 
    • $ifconfig -a
  • Verify the Private IP Address using eth1.  The IP address matches the one configured in the Vagrantfile.

Access the OpenStack Controller Node from the browser using port 9000.  Verify the Hypervisors from the Horizon interface.

Phase 3:  Issues Deploying OpenStack on AWS

Some issues were encountered during the deployment of OpenStack on AWS.  The issue which impacted the EC2 AMI involved the MAC address, which must be registered in the AWS network environment.  Moreover, the MAC address and the IP address must be mapped together, because packets will not be allowed to flow if the MAC address and the IP address do not match.

3.1 Neutron Networking

                During the configuration of the OpenStack Neutron networking, a virtual bridge for the provider network is configured so that all VM traffic reaches the Internet through the external bridge, which is backed by the actual physical NIC, eth1.  Thus, a NIC with a special type of configuration is configured as the external interface, as shown in the topology for this project (Figure 4).
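
A minimal sketch of wiring the external bridge to the physical NIC with Open vSwitch, assuming the conventional bridge name br-ex and the eth1 interface from Figure 4.

    # Create the external bridge and attach the physical provider NIC
    $ sudo ovs-vsctl add-br br-ex
    $ sudo ovs-vsctl add-port br-ex eth1
    # Verify the bridge and port configuration
    $ sudo ovs-vsctl show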

 3.2 Disable Floating IP

            The floating IP must be disabled because packets would otherwise leave through the router's gateway with a floating IP as the source address and be dropped once they reach AWS, since they would arrive at the switch with an unregistered IP and MAC address.  In this project, NAT is configured instead to access the public address externally, as shown in the topology in Figure 4.

Conclusion

The purpose of this project was to articulate all the steps for the installation and configuration of OpenStack and Amazon Web Services.  The project began with an overview of OpenStack and was divided into three main phases.  Phase 1 discussed and analyzed the differences between the networking techniques in AWS and OpenStack.  Phase 2 discussed the configurations required to deploy the OpenStack Controller and analyzed the expansion of OpenStack to include an additional node as the Compute Node.  Phase 3 discussed the issues encountered during the installation and configuration of OpenStack and AWS services.  A virtual bridge for the provider network was configured where all VM traffic reaches the Internet through the external bridge.  The floating IP also had to be disabled to avoid packets being dropped when they reach AWS.  In this project, OpenStack with the Controller Node and an additional Compute Node was deployed and accessed successfully using the Horizon dashboard.  Elastic Compute Cloud (EC2) was also installed and configured successfully using the default VPC, the default Security Group, and Access Control List.

References

Armstrong, S. (2016). DevOps for Networking: Packt Publishing Ltd.

AWS. (2017). Virtual Private Cloud:  User Guide. Retrieved from: http://docs.aws.amazon.com/AmazonVPC/latest/UserGuide/vpc-ug.pdf.

OpenStack. (2018a). Introduction to OpenStack. Retrieved from https://docs.openstack.org/security-guide/introduction/introduction-to-openstack.html.

OpenStack. (2018b). OpenStack Overview. Retrieved from https://docs.openstack.org/install-guide/overview.html.

The Use of Cloud Computing Technology in Healthcare Industry

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to analyze the use of Cloud Computing technology in the healthcare industry. It also examines the present issues related to healthcare data in the Cloud and the advantages and disadvantages of placing the data in the Public and the Private Cloud.  The discussion also provides a use case scenario.

Healthcare in Cloud Computing

As indicated in (Chen & Hoang, 2011), the healthcare industry is moving slowly toward Cloud Computing technology due to the sensitive nature of healthcare data.  Healthcare organizations fear employing Cloud Computing because of privacy and security issues that can leak data from the Cloud to unauthorized users.  Researchers have exerted tremendous effort to propose cloud frameworks that ensure data protection for the healthcare industry.

In (Chen & Hoang, 2011), the researchers proposed a robust data protection framework surrounded by a chain of protection schemes, from Access Control and Monitoring to Active Auditing.  The proposed framework includes three major models for this chain of protection schemes: the Cloud-based, Privacy-aware, Role-based Access Control (CPRBAC) model; the Triggerable Data File Structure (TDFS) model; and the Active Auditing Scheme (AAS).  In (Regola & Chawla, 2013), the researchers presented a prototype infrastructure in Amazon's Virtual Private Cloud to allow researchers and practitioners to utilize the data in a HIPAA-compliant environment.  In (Yu, Kollipara, Penmetsa, & Elliadka, 2013), the researchers provided an approach for a distributed storage system using a combination of RDBMS and NoSQL databases to ensure optimal system performance and scalability.  These three research studies are examples of the tremendous effort exerted by researchers in the healthcare domain to ensure security.

Healthcare Use Case

The Healthcare Information System supports clinical and medical activities related to patient care.  The system is an integration of several components where each component serves a specific need of a medical system. These components include Radiology Information System (RIS), Picture Archiving and Communication System (PACS), Laboratory Information System (LIS), and Policy and Procedure Management System (PPMS) (Yu et al., 2013).

In (Yu et al., 2013), the researchers focused on the RIS, which is software used to manage patients and their radiology data such as ultrasound scans, X-rays, CT scans, audio, and video.  Patient activity management includes examination scheduling, patient data processing and monitoring, and statistical analysis of patient records.  Radiology data management includes the processing of file records, formatting and storing radiology data with a digital signature, and tracking the film records.  The RIS deals with very large volumes of unstructured and structured data.  The RIS is often used with the PACS and requires very large storage space.

The researchers examined two NoSQL databases for this project: MongoDB and Cassandra. They found that MongoDB is more apt for Healthcare Information Systems. Table 1 summarizes the comparison between MongoDB and Cassandra, adapted from (Yu et al., 2013).

Table 1. Comparison between MongoDB and Cassandra for Healthcare data (Yu et al., 2013).

The RIS framework in this project included the System Architecture and the Cloud Architecture. The System Architecture was deployed in AWS using EC2 (Elastic Compute Cloud), which can be accessed by a request from a browser using HTML or from a mobile client application.  The application server was placed in the Public Cloud, and the database was placed in the Private Cloud.  When the system requires communication with the database in the Private Cloud, the request must go through various security measures and pass through the security of the Private Cloud and the firewall to connect to the storage server. The request talks to either a SQL or a NoSQL database, based on the data management logic model.  The Cloud Architecture involved a Public Cloud and a Private Cloud.  The Private Cloud was used to store all sensitive data. The storage server controls the SQL and NoSQL databases along with the security and backup capabilities and functionalities.  A NAS server was used as the storage solution to deal with the large volume of healthcare data (Yu et al., 2013).
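
To make the storage design concrete, the following is a minimal Python sketch of storing a radiology record in MongoDB (the NoSQL database selected above), using GridFS for the large binary image and a regular document for the structured metadata.  The hostname, collection, and field names are illustrative assumptions, not details from (Yu et al., 2013).

    import gridfs
    from pymongo import MongoClient

    # Connect to the MongoDB instance hosted in the Private Cloud
    # (hypothetical hostname; in practice the request must first pass
    # the Private Cloud security measures and firewall).
    client = MongoClient("mongodb://private-db.example.internal:27017")
    db = client["ris"]
    fs = gridfs.GridFS(db)  # GridFS stores the large binary radiology files

    # Store an X-ray image as an unstructured binary file.
    with open("xray_001.dcm", "rb") as image:
        file_id = fs.put(image, filename="xray_001.dcm")

    # Store the structured patient/exam metadata as a document that
    # references the binary file.
    db.radiology_records.insert_one({
        "patient_id": "P-1001",
        "modality": "X-ray",
        "exam_date": "2018-03-14",
        "image_file": file_id,
    })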

Advantages and Disadvantages of Healthcare Data in the Cloud

Cloud Computing offers various advantages to several industries, including the healthcare industry.  The major benefits of using Cloud Computing technology for healthcare include scalability, data storage, data sharing and data availability, reliability and efficiency, and cost reduction (Pullarao & Thirupathi Rao, 2013).  The major challenge when using Cloud Computing in the healthcare industry is security. However, as demonstrated in the above use case, the risk of leaking data from the Cloud to unauthorized users can be mitigated, and even eliminated, by using the Private Cloud, which has additional security measures. The Public Cloud should never be used for storing sensitive data. In the above use case, the Public Cloud was used only for the application layer, with security measures such as access control and SSL for accessing the data from the browser.

References

Chen, L., & Hoang, D. B. (2011, 16-18 Nov. 2011). Towards Scalable, Fine-Grained, Intrusion-Tolerant Data Protection Models for Healthcare Cloud. Paper presented at the 2011 IEEE 10th International Conference on Trust, Security, and Privacy in Computing and Communications.

Pullarao, K., & Thirupathi Rao, K. (2013). A secure approach for storing and using health data in a private cloud computing environment. International Journal of Advanced Research in Computer Science, 4(9).

Regola, N., & Chawla, N. (2013). Storing and using health data in a virtual private cloud. Journal of Medical Internet Research, 15(3), e63.

Yu, W. D., Kollipara, M., Penmetsa, R., & Elliadka, S. (2013, 9-12 Oct. 2013). A distributed storage solution for cloud-based e-Healthcare Information System. Paper presented at the 2013 IEEE 15th International Conference on e-Health Networking, Applications, and Services (Healthcom 2013).

Current State of Data Storage for Big Data

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to analyze the current state of data storage for Big Data. The discussion also examines the impact of Big Data storage on organizational processes.

Big Data and Big Data Analytics Brief Overview

The term Big Data refers to the explosive growth in the volume of data, which is difficult to store, process, and analyze.  Volume, however, is only one feature of Big Data; the major 3Vs that characterize Big Data are volume, variety, and velocity.  The variety of the data is reflected in the different types of data collected from sensors, smartphones, or social networks.  Thus, the collected data forms additional types, such as unstructured and semi-structured data, besides the structured type.  The velocity characteristic of Big Data reflects the speed of data transfer, where the content of the data is continuously changing.  These three major features characterize the nature of Big Data. Big Data is classified even further to include Data Sources, Content Format, Data Stores, Data Staging, and Data Processing.  Figure 1 summarizes the Big Data classifications, adapted from (Hashem et al., 2015).

Figure 1.  Big Data Classification.  Adapted from (Hashem et al., 2015).

Big Data without Analytics has no value.  Big Data Analytics (BDA) is the process of examining large datasets containing a variety of data types, such as unstructured, semi-structured, and structured data. The purpose of BDA is to uncover hidden patterns, market trends, unknown correlations, customer preferences, and other useful business information that can help the organization (Arora & Bahuguna, 2016).  BDA has been used in various industries, such as healthcare.

Big Data Storage

The explosive growth of data has challenged the capabilities of existing storage technologies to store and manage data.  Organizations have traditionally stored data in structured relational databases.  However, Big Data and BDA require distributed storage technology based on Cloud Computing instead of local storage attached to a computer or electronic device. Cloud Computing technologies provide a powerful framework that performs complex large-scale computing tasks and spans a range of IT functions, from storage and computation to database and application services. Organizations and users adopt Cloud Computing technologies because of the need and requirement to store, process, and analyze large amounts of data (Hashem et al., 2015).

Various storage technologies have emerged to meet the requirements of dealing with large volumes of data. These storage technologies include Direct Attached Storage (DAS), Network Attached Storage (NAS), and Storage Area Network (SAN).  When using DAS, various hard disk drives are directly connected to the servers, and each hard disk drive receives a certain amount of I/O resources managed by the application. The DAS technology is a good fit for servers that are interconnected on a small scale.  The NAS technology provides a storage device that serves a network through a switch or hub via TCP/IP protocols.  When using NAS, data is transferred as files.  The I/O burden in NAS is lighter than in DAS because the NAS server can indirectly access a storage device through the network.  NAS technology suits scalable and bandwidth-intensive networks, including high-speed networks with optical-fiber connections.   The SAN system of data storage is independent of the local area network.  Data management and sharing are maximized by multipath data switching conducted among internal nodes.  The organizational data storage systems of DAS, NAS, and SAN can be divided into three categories: the disk array, the connection and network sub-systems, and the storage management software. The disk array provides the storage system.  The connection and network sub-systems provide the connections to one or more disk arrays and servers.  The storage management software handles data sharing, storage management, and disaster recovery tasks for multiple servers (Hashem et al., 2015).

When dealing with Big Data and BDA, the storage system is not physically separated from the processing system.  There are various storage types, such as hard drives, solid-state memory, object storage, optical storage, and cloud storage. Each type has advantages as well as limitations.  Thus, organizations must examine the goals and objectives of their data storage prior to selecting any of these storage media.  Table 1 shows a comparison of storage media, adapted from (Hashem et al., 2015).

Table 1.  Comparison of Storage Media.  Adapted from (Hashem et al., 2015).

The Hadoop Distributed File System (HDFS) is a primary component of the Hadoop technology, which emerged to deal with Big Data and BDA; the other major component of the Hadoop technology is MapReduce.  The Hadoop framework is described as the de facto standard for Big Data storage and processing (Jinquan, Jie, Shengsheng, Yan, & Yuanhao, 2012).  HDFS is a distributed file system designed to run on top of the local file systems of the cluster nodes. It stores extremely large files for streaming access.  HDFS is highly fault-tolerant and can scale from a single server to thousands of nodes, where each node offers local computation and storage.
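
As a brief illustration of HDFS as a storage layer, the sketch below uses the third-party Python package hdfs (a WebHDFS client); the NameNode address, user, and paths are assumptions for illustration only.

    from hdfs import InsecureClient

    # Connect to the NameNode's WebHDFS endpoint
    # (hypothetical address; 9870 is the Hadoop 3 default port).
    client = InsecureClient("http://namenode.example.internal:9870",
                            user="analyst")

    # Write a large file into the distributed file system; HDFS splits it
    # into blocks that are replicated across the cluster nodes.
    with open("sensor_readings.csv", "rb") as local_file:
        client.write("/bigdata/sensor_readings.csv", local_file,
                     overwrite=True)

    # Stream the file back for processing.
    with client.read("/bigdata/sensor_readings.csv") as reader:
        data = reader.read()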

The Cloud Computing technology can meet the requirements of Big Data and BDA, offering an effective framework and platform for computation as well as storage.  Thus, organizations that intend to take advantage of Big Data and BDA utilize Cloud Computing technology.  However, the use of Cloud Computing does not come without a price: security and privacy have been major concerns for Cloud Computing users and organizations.  Although Cloud Computing offers several benefits to organizations, from scalability and fault tolerance to data storage, it is curbed by security and privacy concerns.  Organizations must take the appropriate security measures for data in storage, in transit, and in processing, such as SSL, encryption, access control, multi-factor authentication, and so forth.
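
As a simple illustration of one such measure, encryption for data at rest, the following Python sketch uses the cryptography package's Fernet recipe (symmetric, authenticated encryption).  In practice the key itself must be protected, for example in a key management service; the record content here is made up.

    from cryptography.fernet import Fernet

    # Generate a symmetric key; in production this would live in a key
    # management service, never alongside the data it protects.
    key = Fernet.generate_key()
    cipher = Fernet(key)

    record = b"customer_id=1001;balance=250.00"
    token = cipher.encrypt(record)  # ciphertext is safe to store in the cloud

    # Only holders of the key can recover the plaintext.
    assert cipher.decrypt(token) == record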

In summary, Big Data comes with a big storage requirement.  Organizations have been facing various challenges when dealing with Big Data, such as data storage and data processing.  The data storage issue is partially solved by Cloud Computing technology.  However, until the security and privacy issues are resolved in the Cloud Computing platform, organizations must apply robust security measures to mitigate and alleviate the security risks.

References

Arora, M., & Bahuguna, H. (2016). Big Data Security–The Big Challenge.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The rise of “big data” on cloud computing: Review and open research issues. Information Systems, 47, 98-115.

Jinquan, D., Jie, H., Shengsheng, H., Yan, L., & Yuanhao, S. (2012). The Hadoop Stack: New Paradigm for Big Data Storage and Processing. Intel Technology Journal, 16(4), 92-110.

Building Blocks of a System for Healthcare Big Data Analytics

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to create the building blocks of a system for healthcare Big Data Analytics and to compare the building block design to a DNA networked cluster currently used by an organization in the market.

The discussion begins with the Cloud Computing Building Blocks, followed by Big Data Analytics Building Blocks, and DNA Sequencing. The discussion also addresses the building blocks for the health analytics and the building blocks for DNA Sequencing System, and the comparison between both systems.

Cloud Computing Building Blocks

The Cloud Computing model contains two elements: the front end and the back end.  Both elements are connected to the network. The user interacts with the system using the front end, while the cloud itself is the back end. The front end is the client that the user uses to access the cloud through a device such as a smartphone, tablet, or laptop.  The back end, represented by the Cloud, provides the applications, computers, servers, and data storage that create the services (IBM, 2012).

As indicated in (Macias & Thomas, 2011), three building blocks are required to enable Cloud Computing. The first block is the “Infrastructure,” where the organization can optimize data center consolidation, enhance network performance, connect anyone, anywhere seamlessly, and implement pre-configured solutions.  The second block is the “Applications,” where the organization can identify applications for rapid deployment, and utilize automation and orchestration features.  The third block is the “Services,” where the organization can determine the right implementation model, and create a phased cloud migration plan.

In (Mousannif, Khalil, & Kotsis, 2013-14), the building blocks of Cloud Computing involve the physical layer, the virtualization layer, and the service layer.  Virtualization is a basic building block in Cloud Computing: it is the technology that hides the physical characteristics of the computing platform from the front-end users and provides an abstract, emulated computing platform.  Clusters and grids are features of Cloud Computing for high-performance computing applications such as simulations. Other building blocks of Cloud Computing include Service-Oriented Architectures (SOA) and Web Services (Mousannif et al., 2013-14).

Big Data Building Blocks

As indicated in (Verhaeghe, n.d.), there are four major building blocks for Big Data Analytics.  The first building block is Big Data Management, to enable the organization to capture, store, and protect the data. The second building block is Big Data Analytics, to extract value from the data.  Big Data Integration is the third building block, to ensure the application of governance over the data.  The last building block is Big Data Applications, for the organization to apply the first three building blocks using the Big Data technologies.

DNA Sequencing

DNA stands for Deoxyribonucleic Acid, which represents the smallest building block of life (Matthews, 2016).  As indicated in (Salzberg, 1999), advances in biotechnology have produced enormous volumes of DNA-related information; however, the rate of data generation is outpacing the scientists' ability to analyze the data.  DNA Sequencing is a technique used to determine the order of the four chemical building blocks, called "bases," that make up the DNA molecule (genome.gov, 2015).  The sequence conveys the kind of genetic information carried in a particular DNA segment.  DNA sequencing can provide valuable information about the role of inheritance in susceptibility to disease and in response to environmental influences.  Moreover, DNA sequencing enables rapid and cost-effective diagnosis and treatment.  Markov chains and hidden Markov models are probabilistic techniques that can be used to analyze the results of DNA sequencing (Han, Pei, & Kamber, 2011).  An example of a DNA Sequencing application is discussed and analyzed in (Leung et al., 2011), where the researchers applied Data Mining to DNA sequence data sets for the Hepatitis B Virus.
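
As a toy illustration of the Markov-chain technique, the pure-Python sketch below estimates first-order transition probabilities from one DNA sequence and scores another against them; real analyses train such models on large curated corpora, and the sequences here are made up.

    from collections import Counter
    from math import log

    def transition_probs(sequence):
        """Estimate first-order transition probabilities P(next | current)."""
        pair_counts = Counter(zip(sequence, sequence[1:]))
        base_counts = Counter(sequence[:-1])
        return {(a, b): n / base_counts[a] for (a, b), n in pair_counts.items()}

    def log_likelihood(sequence, probs, floor=1e-6):
        """Score a sequence under the model; unseen transitions get a small floor."""
        return sum(log(probs.get(pair, floor))
                   for pair in zip(sequence, sequence[1:]))

    model = transition_probs("ATGCGATACGCTTGAATGCGATACGATT")
    print(log_likelihood("ATGCGA", model))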

DNA Sequencing used to be performed on non-networked computers, using a limited subset of data due to limited computer processing speed (Matthews, 2016).  However, DNA Sequencing has been benefiting from various advanced technologies and techniques.  Predictive Analytics is an example of these techniques; applied to DNA Sequencing, it results in Predictive Genomics.  Cloud Computing plays a significant role in the success of Predictive Genomics for two major reasons: the first is the volume of the genomic data, and the second is the low cost (Matthews, 2016).  Cloud Computing is becoming a valuable tool for various domains, including DNA Sequencing.   As cited in (Blaisdell, 2017), a Transparency Market Research study showed that the healthcare Cloud Computing market is going to evolve further, reaching up to $6.8 billion by 2018.

Building Block for Healthcare System

Healthcare data requires protection due to security and privacy concerns.  Thus, a Private Cloud will be used in this use case.  To build a Private Cloud, the virtualization layer, the physical layer, and the service layer are required.  The virtualization layer consists of a hypervisor that allows multiple operating systems to share a single hardware system.  The hypervisor is a program that controls the host processors and resources by allocating the resources to each operating system.  There are two types of hypervisors: native (also called bare-metal, or Type 1) and hosted (also called Type 2).  Type 1 runs directly on the physical hardware, while Type 2 runs on a host operating system that runs on the physical hardware.  Examples of native hypervisors include VMware's ESXi and Microsoft's Hyper-V; examples of hosted hypervisors include Oracle VirtualBox and VMware Workstation.  The physical layer can consist of two computer pools, one for PCs and the other for servers (Mousannif et al., 2013-14).

In (Archenaa & Anita, 2015), the researchers illustrated a secure Healthcare Analytics System.  The electronic health record is a heterogeneous dataset that is given as input to HDFS through Flume and Sqoop. The analysis of the data is performed using MapReduce and Hive by implementing a Machine Learning algorithm to analyze similar patterns in the data and to predict the risk to a patient's health condition at an early stage.  The HBase database is used for storing the multi-structured data. STORM is used to perform live streaming and to catch emergency conditions, such as a patient's temperature falling outside the expected range. A Lambda function is also used in this healthcare system.  The final building block of the Healthcare system involves the reports generated by the top-layer tools such as "Hunk."  Figure 1 illustrates the Healthcare System, adapted from (Archenaa & Anita, 2015).

Figure 1.  Healthcare Analytics System. Adapted from (Archenaa & Anita, 2015)
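
At its core, the STORM component above performs a streaming threshold check.  The pure-Python sketch below mimics that logic on a stream of vital-sign readings; the thresholds and record format are illustrative assumptions, not details from (Archenaa & Anita, 2015).

    # Minimal stand-in for the streaming alert logic (a real deployment
    # would run this as a STORM bolt over a live feed).
    TEMP_LOW, TEMP_HIGH = 35.0, 39.0  # illustrative thresholds in Celsius

    def check_reading(reading):
        """Emit an alert when a temperature leaves the expected range."""
        temp = reading["temperature_c"]
        if temp < TEMP_LOW or temp > TEMP_HIGH:
            return f"ALERT patient {reading['patient_id']}: temperature {temp}"
        return None

    stream = [
        {"patient_id": "P-1001", "temperature_c": 36.8},
        {"patient_id": "P-1002", "temperature_c": 39.6},
    ]
    for reading in stream:
        alert = check_reading(reading)
        if alert:
            print(alert)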

Building Block for DNA and Next Generation Sequencing System

Besides DNA Sequencing, there is next-generation sequencing (NGS), which has been growing exponentially since 2007 (Bhuvaneshwar et al., 2015).  In (Bhuvaneshwar et al., 2015), the Globus Genomics System is proposed as an enhanced Galaxy workflow system made available as a service, offering users the capability to process and transfer data easily, reliably, and quickly.  This system addresses the end-to-end NGS analysis requirements and is implemented on the Amazon Cloud Computing infrastructure.  Figure 2 illustrates the framework of the Globus Genomics System, taking into account the security measures for protecting the data.  Examples of healthcare organizations that are using genomic sequencing include Kaiser Permanente in Northern California and Geisinger Health System in Pennsylvania (Khoury & Feero, 2017).

Figure 2. Globus Genomics System for Next Generation Sequencing (NGS). Adapted from (Bhuvaneshwar et al., 2015).

In summary, Cloud Computing has reshaped the healthcare industry in many aspects.  Healthcare Cloud Computing and Analytics provide many benefits, from easy access to electronic patient records to DNA Sequencing and NGS.  The building blocks of Cloud Computing must be implemented with care for security and privacy considerations to protect the patients' data from unauthorized users.  The building blocks of the Healthcare Analytics system involve advanced technologies such as Hadoop, MapReduce, STORM, and Flume, as illustrated in Figure 1.  The building blocks of the DNA Sequencing and NGS system involve the Dynamic Worker Pool, HTCondor, the Shared File System, the Elastic Provisioner, Globus Transfer and Nexus, and Galaxy, as illustrated in Figure 2.  Each system has the required building blocks to perform its analytics tasks.

References

Archenaa, J., & Anita, E. M. (2015). A survey of big data analytics in healthcare and government. Procedia Computer Science, 50, 408-413.

Bhuvaneshwar, K., Sulakhe, D., Gauba, R., Rodriguez, A., Madduri, R., Dave, U., . . . Madhavan, S. (2015). A case study for cloud-based high throughput analysis of NGS data using the globus genomics system. Computational and structural biotechnology journal, 13, 64-74.

Blaisdell, R. (2017). DNA Sequencing in the Cloud. Retrieved from https://rickscloud.com/dna-sequencing-in-the-cloud/.

genome.gov. (2015). DNA Sequencing. Retrieved from https://www.genome.gov/10001177/dna-sequencing-fact-sheet/.

Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques: Elsevier.

IBM. (2012). Cloud computing fundamentals: A different way to deliver computer resources. Retrieved from https://www.ibm.com/developerworks/cloud/library/cl-cloudintro/cl-cloudintro-pdf.pdf.

Khoury, M. J., & Feero, G. (2017). Genome Sequencing for Healthy Individuals? Think Big and Act Small! Retrieved from https://blogs.cdc.gov/genomics/2017/05/17/genome-sequencing-2/.

Leung, K., Lee, K., Wang, J., Ng, E. Y., Chan, H. L., Tsui, S. K., . . . Sung, J. J. (2011). Data mining on dna sequences of hepatitis b virus. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8(2), 428-440.

Macias, F., & Thomas, G. (2011). Three Building Blocks to Enable the Cloud. Retrieved from https://www.cisco.com/c/dam/en_us/solutions/industries/docs/gov/white_paper_c11-675835.pdf.

Matthews, K. (2016). DNA Sequencing. Retrieved from https://cloudtweaks.com/2016/11/cloud-dna-sequencing/.

Mousannif, H., Khalil, I., & Kotsis, G. (2013-14). Collaborative learning in the clouds. Information Systems Frontiers, 15(2), 159-165. doi:10.1007/s10796-012-9364-y

Salzberg, S. L. (1999). Gene discovery in DNA sequences. IEEE Intelligent Systems and their Applications, 14(6), 44-48.

Verhaeghe, X. (n.d.). The Building Blocks of a Big Data Strategy. Retrieved from https://www.oracle.com/uk/big-data/features/bigdata-strategy/index.html.

Security Measures for Virtual and Cloud Environment

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to analyze security measures for virtual and cloud environments. It also examines the current security models and the possibility of additional enhancements to increase the protection of these virtual and cloud environments.

Virtualization

Virtualization is a core technology in Cloud Computing.  The purpose of Virtualization in Cloud Computing is to virtualize resources for the Cloud Computing service models, such as Software-as-a-Service (SaaS), Infrastructure-as-a-Service (IaaS), and Platform-as-a-Service (PaaS) (Gupta, Srivastava, & Chauhan, 2016).   Virtualization allows creating many instances of Virtual Machines (VMs) on a single physical operating system.  The utilization of these VMs provides flexibility, agility, and scalability for Cloud Computing resources.  The VM is provided to the client to access resources at a remote location using the virtualization computing technique.  Key features of Virtualization include resource utilization with isolation among hardware, operating systems, and software.  Another key feature of Virtualization is multi-tenancy, the simultaneous access of VMs residing on a single physical machine. After a VM is created, it can be copied and migrated.  These features of Virtualization are double-edged: they provide flexibility, scalability, and agility, while causing security challenges and concerns.  The security concerns are one of the biggest obstacles to the widespread adoption of Cloud Computing (Ali, Khan, & Vasilakos, 2015).

Hardware Virtualization on the physical machine is implemented using a hypervisor.  The hypervisor has two types: Type 1 and Type 2. Type 1 is called the "Bare Metal Hypervisor," as illustrated in Figure 1.  Type 2 is called the "Hosted Hypervisor," as illustrated in Figure 2.   The "Bare Metal Hypervisor" provides a layer between the physical system and the VMs, while the "Hosted Hypervisor" is deployed on top of the Operating System.

Figure 1.  Hypervisor Type 1: Bare Metal Hypervisor. Adapted from (Gupta et al., 2016).

Figure 2.  Hypervisor Type 2: Hosted Hypervisor. Adapted from (Gupta et al., 2016).

Virtualization exposes many security flaws to intruders.  The traditional security measures that protect physical systems are inadequate or ineffective when dealing with virtualized data centers and hybrid or private Cloud environments (Gupta et al., 2016).  Moreover, the default configuration of the hypervisor does not always include security measures that can protect the virtual and cloud environment.

One of the roles of the hypervisor is to manage the interaction between the VMs and the physical resources.  With the Type 1 "Bare Metal Hypervisor," the hypervisor is a single point of failure, so a breach can compromise the whole virtualized environment on the physical system.  The Type 2 "Hosted Hypervisor" configuration exposes more threats than the "Bare Metal Hypervisor": the VMs hosted on the physical system communicate with each other, which can open loopholes for intruders.

Virtualization is exposed to various types of threats and vulnerabilities.  These Virtualization security vulnerabilities include VM Escape, VM Hopping, VM Theft, VM Sprawl, insecure VM migration, sniffing, and spoofing.  Figure 3 illustrates the vulnerabilities of Virtualization.

Figure 3.  Vulnerabilities of Virtualization. Adapted from (Gupta et al., 2016).

As indicated in (Gupta et al., 2016), the hypervisor should have built-in firewall security and should disable console access (USB, NIC) to prevent unauthorized access.   Role-Based Access Control (RBAC) is effective for controlling hyperjacking of VMs.  Roles and responsibilities should be defined for the users of the VMs to check access authorization.
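
A minimal Python sketch of the RBAC idea follows, with hypothetical roles and permissions: each VM operation is allowed only if the user's role grants it.

    # Hypothetical role-to-permission mapping for VM management.
    ROLE_PERMISSIONS = {
        "vm_operator": {"start_vm", "stop_vm"},
        "vm_admin": {"start_vm", "stop_vm", "migrate_vm", "delete_vm"},
        "auditor": {"view_logs"},
    }

    def is_authorized(role, action):
        """Check whether a role grants the requested VM action."""
        return action in ROLE_PERMISSIONS.get(role, set())

    assert is_authorized("vm_admin", "migrate_vm")
    assert not is_authorized("vm_operator", "delete_vm")  # right not granted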

Security Principles, Security Modes, Security Models, and Security Implementation

As indicated in (Abernathy & McMillan, 2016), the primary goal of all security measures is to provide protection and ensure that the measure is successful.  The three major principles of security are confidentiality, integrity, and availability (CIA), known as the CIA triad.  Confidentiality is provided when the data cannot be read by unauthorized parties, whether through access control and encryption for data as it sits on the hard drive or through encryption for data in transit.   Confidentiality is the opposite of "disclosure" (Abernathy & McMillan, 2016).  Integrity is provided when the data is not changed in any way by unauthorized users; it is typically enforced through a hashing algorithm or a checksum.  Availability refers to the time that resources or data are available; it is measured as a percentage of "up" time, with 99.9% uptime representing more availability than 99% uptime.   The availability principle ensures the availability of, and access to, the data whenever it is needed and is described as a prime goal of security.  Most attacks result in a violation of one of these security principles of confidentiality, integrity, or availability.  Thus, the defense-in-depth technique is highly recommended as an additional layer of security: for instance, even if the firewall is configured for protection, access control lists should still be applied to resources to help prevent access to sensitive data in case the firewall is breached.
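
As a concrete example of the integrity principle, the Python sketch below computes a SHA-256 checksum before and after a transfer; any unauthorized change to the data changes the digest, so the comparison fails.

    import hashlib

    def sha256_digest(data: bytes) -> str:
        """Return the SHA-256 checksum used to verify integrity."""
        return hashlib.sha256(data).hexdigest()

    original = b"transfer 100.00 to account 42"
    digest_before = sha256_digest(original)

    received = original  # replace with the data actually received
    if sha256_digest(received) == digest_before:
        print("integrity verified")
    else:
        print("data was modified in transit")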

Security has four major Security Modes, which are typically used with Mandatory Access Control (MAC).  These four Security Modes are Dedicated Security Mode, System High-Security Mode, Compartmented Security Mode, and Multi-Level Security Mode.  The MAC operates in different security modes at different times, based on variables such as the sensitivity of the data, the clearance level of the user, and the actions users are authorized to take.  In all four security modes, a non-disclosure agreement (NDA) must be signed, and access to certain information depends on the mode.

Security Models provide a mapping technique from the security policymakers' goals to the rules that a computer system must follow.  Various types of Security Models provide various approaches to implement such a mapping (Abernathy & McMillan, 2016):

  • State Machine Model,
  • Multi-Level Lattice Models, 
  • Matrix-Based Models,
  • Non-Interference Models, and
  • Information Flow Models.

Moreover, there are formal Security Models that incorporate security concepts and principles to guide the security design of systems. These formal Security Models include the following seven models (Abernathy & McMillan, 2016); the detail of each model is beyond the scope of this discussion.

  • Bell-LaPadula Model.
  • Biba Model.
  • Clark-Wilson Integrity Model.
  • Lipner Model.
  • Brewer-Nash Model.
  • Graham-Denning Model.
  • Harrison-Ruzzo-Ullman Model.

With respect to the Security Implementation, there are standards that must be followed when implementing security measures for protection.  These standards include ISO/IEC 27001, ISO/IEC 27002, and PCI-DSS.   ISO/IEC 27001 is the most popular standard and is used by organizations to obtain certification for information security.  It guides the organization in ensuring that its information security management system (ISMS) is properly built, administered, maintained, and progressed.  The ISO/IEC 27002 standard provides a code of practice for information security management, including security measures such as access control, cryptography, and compliance.  PCI-DSS v3.1 is specific to the payment card industry.

Security Models in Cloud Computing

The service model is one of the main models in Cloud Computing.  The services are offered to cloud users through a service provider known as the Cloud Service Provider.  Security and privacy are the main challenges and concerns when using a Cloud Computing environment.  Although there is a demand to leverage the resources of Cloud Computing to provide services to clients, there is also a requirement that the Cloud servers and resources not learn any sensitive information about the data being managed, stored, or queried (Chaturvedi & Zarger, 2015).   Effort should be exerted to improve users' control over their data in the public environment.  Cloud Computing Security Models include the Multi-Tenancy Model, the Cloud Cube Security Model, the Mapping Model of Cloud, Security, and Compliance, and the Cloud Risk Accumulation Model of the CSA (Chaturvedi & Zarger, 2015).

The Multi-Tenancy Model is described as the major functional characteristic of Cloud Computing, allowing multiple applications to provide cloud services to the clients.  Tenants are separated by virtual partitions, and each partition holds a client tenant's data, customized settings, and configuration settings.  Virtualization on a physical machine allows users to share computing resources such as memory, processor, I/O, and storage among different users' applications and improves the utilization of Cloud resources.  SaaS is a good example of the Multi-Tenant Model, providing the scalability to serve a large number of clients based on Web services.  Security experts describe this Multi-Tenancy Model as vulnerable because it exposes confidentiality, one of the core Security Principles, to risk between the tenants.  The side-channel attack is a significant risk in the Multi-Tenancy Model; this kind of attack is based on information obtained from bandwidth monitoring.   Another risk of the Multi-Tenancy Model is the assignment of resources to clients with unknown identities and intentions.  A further security risk associated with Multi-Tenancy involves the storage of multiple tenants' data in the same database tablespaces or on the same backup tapes.

The Cloud Cube Security Model is characterized by four main dimensions: Internal/External, Proprietary/Open, Perimeterized/De-perimeterized, and Insourced/Outsourced.  The Mapping Model of Cloud, Security, and Compliance is another model that provides a better method to analyze the gaps between the cloud architecture and the compliance framework, and the corresponding security control strategies provided by the Cloud Service Provider or third parties.  The Cloud Risk Accumulation Model of the CSA is the last of the Cloud Computing Security Models.  The three Cloud service models of IaaS, PaaS, and SaaS have different security requirements due to their layer dependencies.

Security Implementation: Virtual Private Cloud (VPC)

The VPC Deployment Model provides more security than the Public Deployment Model.  In this model, the user can apply Access Control at the instance level as well as at the network level.  Policies are configured and assigned to groups based on the access role.   The VPC, as a Deployment Model of Cloud Computing, solved problems such as the loss of authentication, loss of confidentiality, loss of availability, and loss and corruption of data (Abdul, Jena, Prasad, & Balraju, 2014).  The VPC is logically isolated from other virtual networks in the cloud.  As indicated in (Abdul et al., 2014), the VPC is regarded as the most prominent approach to Trusted Computing technology.  However, organizations must implement the security measures based on the requirements of the business.  For instance, organizations and users have the control to select the IP address range and to create subnets, route tables, network gateways, and security settings, as illustrated in Figure 4.

Figure 4.  Virtual Private Cloud Security Implementation.
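
The configuration choices above can also be scripted.  The sketch below uses the boto3 AWS SDK to create a VPC, a subnet, and an Internet gateway with a default route; the CIDR ranges are illustrative, and credentials and region are assumed to be configured in the environment.

    import boto3

    ec2 = boto3.client("ec2")

    # Select the IP address range for the logically isolated network.
    vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

    # Create a subnet inside the VPC.
    subnet_id = ec2.create_subnet(
        VpcId=vpc_id, CidrBlock="10.0.1.0/24")["Subnet"]["SubnetId"]

    # Attach an Internet gateway and route outbound traffic through it.
    igw = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
    ec2.attach_internet_gateway(InternetGatewayId=igw, VpcId=vpc_id)

    rtb = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
    ec2.create_route(RouteTableId=rtb,
                     DestinationCidrBlock="0.0.0.0/0", GatewayId=igw)
    ec2.associate_route_table(RouteTableId=rtb, SubnetId=subnet_id)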

In summary, security measures must be implemented to protect the cloud environment.  Virtualization introduces threats to the Cloud environment, and the hypervisor is a major component of Virtualization.  It is recommended that the hypervisor have built-in firewall security and that console access (USB, NIC) be disabled to prevent unauthorized access.   Role-Based Access Control (RBAC) should be used to control hyperjacking of VMs, and roles and responsibilities should be defined for the users of the VMs to check access authorization.  The Virtual Private Cloud, as a trusted deployment model of Cloud Computing, provides a more secure cloud environment than the Public Cloud. The Security Implementation must follow certain standards, and the organization must comply with these standards to protect organizations and users.

References

Abdul, A. M., Jena, S., Prasad, S. D., & Balraju, M. (2014). Trusted Environment In Virtual Cloud. International Journal of Advanced Research in Computer Science, 5(4).

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

Ali, M., Khan, S. U., & Vasilakos, A. V. (2015). Security in cloud computing: Opportunities and challenges. Information Sciences, 305, 357-383. doi:10.1016/j.ins.2015.01.025

Chaturvedi, D. A., & Zarger, S. A. (2015). A review of security models in cloud computing and an Innovative approach. International Journal of Computer Trends and Technology (IJCTT), 30(2), 87-92.

Gupta, M., Srivastava, D. K., & Chauhan, D. S. (2016). Security Challenges of Virtualization in Cloud Computing. Paper presented at the Proceedings of the Second International Conference on Information and Communication Technology for Competitive Strategies, Udaipur, India.

Big Data Analytics Application to Solve Known Security Issues

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to identify two advantages of applying Big Data Analytics to solve known security issues such as malware detection, network hacking, spam, and so forth.  The discussion and analysis include the reasons and rationale for utilizing Big Data Analytics to solve security issues, beginning with a brief overview of Big Data Analytics and the security threats.

Big Data Analytics

Big Data (BD) is a major topic across domains and fields such as management and marketing, scientific research, national security, and government (Vivekanand & Vidyavathi, 2015).  BD enables making informed decisions, as it shifts the reasoning from logical and causality-based to the acknowledgment of correlation links between events (De Mauro, Greco, & Grimaldi, 2015).   The public and private sectors are increasing their use of Big Data Analytics (BDA) in different areas (Vivekanand & Vidyavathi, 2015).  The processing of very large amounts of data is the main benefit of Big Data Analytics (Emani, Cullot, & Nicolle, 2015).  Big Data Analytics is defined in (Emani et al., 2015) as the use of advanced analytics techniques on Big Data. As elaborated by (Gupta & Jyoti, 2014), BDA is the process of analyzing Big Data to find hidden patterns, unknown correlations, and other useful information that can be extracted to make sound decisions.  In (CSA, 2013), BDA is described as the process of analyzing and mining Big Data, which can produce operational and business knowledge at an unprecedented scale and specificity.  Massive volumes of semi-structured and unstructured data can be mined using BDA (Gandomi & Haider, 2015; Gupta & Jyoti, 2014).  The need and requirement to analyze and leverage the trend data collected by organizations is one of the main drivers of BDA tools (CSA, 2013).  The value of BDA increases over time as the cumulative cash flow increases.  Figure 1 illustrates the value of BDA along the dimensions of time and cumulative cash flow.  Thus, there is no doubt that BDA provides great benefits to organizations.

Figure 1.  The Value of Big Data Analytics. Adapted from (Gupta & Jyoti, 2014).

Big Data Analytics for Security

BD is changing the analytics landscape (CSA, 2013).  BDA can be leveraged to enhance information security and situational awareness (CSA, 2013).  For instance, BDA can be utilized to analyze financial transactions, log files, and network traffic to identify anomalies and suspicious activities, and to aggregate multiple sources of information into a coherent view (CSA, 2013).  Malicious attacks have been increasing lately; thus, increasing security threats come along with the increasing use of BD, BDA, and Cloud Computing technologies. Malicious attacks have become a major topic for government, organizations, and industry (Gupta & Jyoti, 2014).  Big Data Security Analytics refers to the increasing practice of organizations gathering and analyzing security data to detect vulnerabilities and intrusions by attackers (Gupta & Jyoti, 2014).   Advanced Persistent Threats (APTs) are a subset of malicious attacks: well-resourced and well-trained attackers conducting multi-year intrusion campaigns targeting highly sensitive economic, proprietary, or national security information (Gupta & Jyoti, 2014).  The aim of an APT is to maintain the persistent attack without being detected inside the target environment (Gupta & Jyoti, 2014).

Thus, the main purpose is to use BD techniques to analyze the data and to apply the results to implement enhanced data security techniques (Gupta & Jyoti, 2014).  Big Data technologies enable a wide range of industries to develop affordable infrastructures for security monitoring (Cardenas, Manadhata, & Rajan, 2013). Organizations can use various systems with a range of Security Analytics Sources (SAS).  These systems can generate messages or alerts and transmit them to a trusted server for analysis and action (Gupta & Jyoti, 2014).  Such a system can be a Host-based Intrusion Detection System (HIDS) or an antivirus engine that writes to a Syslog, or it can interface reporting events to a remote service such as a Security Information and Event Monitoring (SIEM) system (Gupta & Jyoti, 2014).

There are very good reasons for BD to enter the security domain.  In (Gupta & Jyoti, 2014), three main reasons are given for BD to enter the enterprise security mainstream.  The first reason is the continuing problem with the detection of and response to threats, because the existing security analytics tools have proven inadequate to handle advanced viruses, malware, stealthy attack techniques, and the growing army of well-organized global cyber attacks.  The second reason is Moore's Law and open source: security vendors are accelerating their development cycles by customizing open source tools like Cassandra, Hadoop, MapReduce, and Mahout for security analytics purposes, which can help accelerate innovation to protect systems from threats.  The third reason is the substantial activity on the supply side (Gupta & Jyoti, 2014).  Organizations want security alerts from new vendors aside from HP, IBM, McAfee, and RSA Security.   Some vendors, such as Hexis Cyber Solutions, Leidos, Narus, and Palantir, will move beyond the government and extend into the private sector.  Others, like Click Security, Forescale, and Netskope, have intelligence backgrounds to deal with the malicious attacks (Gupta & Jyoti, 2014).

Fraud detection is one of the most visible uses of BDA (Cardenas et al., 2013; CSA, 2013).  Although credit card companies have conducted fraud detection for decades, their custom-built infrastructure for mining BD was not economical to adapt to other fraud detection uses (CSA, 2013).  Off-the-shelf BD tools and techniques are now attracting attention for fraud detection analytics in healthcare, insurance, and other fields (CSA, 2013).  Examples of using BD for security purposes include (1) Network Security, (2) Enterprise Event Analytics, (3) NetFlow Monitoring to Identify Botnets, and (4) Advanced Persistent Threat (APT) Detection.  APT detection has two categories: (1) Beehive, behavior profiling for APT detection, and (2) using large-scale distributed computing to unveil APTs.  For this discussion, Network Security and NetFlow Monitoring to Identify Botnets are the two examples of taking advantage of BDA for security purposes (CSA, 2013).

Network Security

The case study by Zions Bancorporation is a good example of using BD for security purposes (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).   The traditional SIEM could not handle the volume of the data generated for security purposes (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).  Zions Bancorporation reported that using Hadoop clusters and business intelligence tools led to parsing more data faster than the traditional SIEM tools (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).   While a query in the traditional SIEM system took between twenty minutes and an hour, the Hadoop-based system provides the result in about a minute (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).  The system enables users to mine meaningful security information from sources such as firewalls and security devices, website traffic, business processes, and other transactions (Cardenas et al., 2013; CSA, 2013; McDaniel & Smith, 2013; Raja & Rabbani, 2014).  The incorporation of unstructured data and multiple disparate datasets into a single analytical framework is one of the main promising features of BD (CSA, 2013; Raja & Rabbani, 2014).
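
As a toy example of this style of analysis, the following MapReduce job, written with the mrjob Python library, counts failed-login events per source IP across large log volumes; the log format is a made-up assumption, and Zions' actual pipeline is not public.

    from mrjob.job import MRJob

    class FailedLoginsPerIP(MRJob):
        """Count failed-login events per source IP across large log files."""

        def mapper(self, _, line):
            # Assumed log format: "<timestamp> <event> <source_ip>"
            parts = line.split()
            if len(parts) >= 3 and parts[1] == "LOGIN_FAILED":
                yield parts[2], 1

        def reducer(self, source_ip, counts):
            yield source_ip, sum(counts)

    if __name__ == "__main__":
        # Runs locally by default; "-r hadoop" submits to a Hadoop cluster.
        FailedLoginsPerIP.run()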

Netflow Monitoring to Identify Botnets

Botnets are a major threat to the current Internet (Francois, Wang, Bronzi, State, & Engel, 2011). Botnet traffic is mixed with a large volume of benign traffic due to ubiquitous high-speed networks (Francois et al., 2011).  These networks can be monitored using IP flow records; however, their forensic analysis forms the major computational bottleneck (Francois et al., 2011).  The BotCloud research project by (Francois et al., 2011), leveraging Hadoop and MapReduce technology, is a good example of taking advantage of BDA for security purposes.  In this project, a distributed computing framework leveraging a host dependency model and an adapted PageRank algorithm was proposed.  Moreover, a Hadoop cluster with MapReduce was utilized to analyze and detect densely interconnected hosts, which are potential botnet members.  The large volume of NetFlow data collected for analysis was the reason for using the MapReduce framework (CSA, 2013; Francois et al., 2011).  The project showed good detection accuracy and good efficiency on the Hadoop cluster.
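
To illustrate the adapted-PageRank idea, the sketch below runs plain power-iteration PageRank over a tiny, made-up host-dependency graph in pure Python; BotCloud performs the equivalent computation as MapReduce jobs over massive NetFlow-derived graphs.

    # Tiny host-dependency graph: edges point from a host to the hosts it
    # communicates with (derived from NetFlow records in BotCloud).
    graph = {
        "hostA": ["hostB", "hostC"],
        "hostB": ["hostC"],
        "hostC": ["hostA"],
        "hostD": ["hostC"],
    }

    def pagerank(graph, damping=0.85, iterations=20):
        """Power-iteration PageRank; high scores flag densely
        interconnected hosts that are potential botnet members."""
        n = len(graph)
        rank = {node: 1.0 / n for node in graph}
        for _ in range(iterations):
            new_rank = {node: (1.0 - damping) / n for node in graph}
            for node, neighbors in graph.items():
                share = damping * rank[node] / len(neighbors)
                for neighbor in neighbors:
                    new_rank[neighbor] += share
            rank = new_rank
        return rank

    print(sorted(pagerank(graph).items(), key=lambda kv: -kv[1]))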

Conclusion

Big Data means big value for organizations at various levels, including security. BD is changing the analytics landscape, and BDA can be leveraged to enhance information security and situational awareness to detect abnormal activities.  For instance, BDA can be utilized to analyze financial transactions, log files, and network traffic to identify anomalies and suspicious activities, and to aggregate multiple sources of information into a coherent view.  Organizations can benefit greatly from BDA tools such as Hadoop and MapReduce for security purposes.  Various reasons for using BD and BDA for security were discussed above.  In this discussion, Network Security and NetFlow Monitoring to Identify Botnets were the two examples of taking advantage of BDA for security purposes.

References

Cardenas, A. A., Manadhata, P. K., & Rajan, S. P. (2013). Big data analytics for security. IEEE Security & Privacy, 11(6), 74-76.

CSA. (2013). Big Data Analytics for Security Intelligence. Cloud Security Alliance, Big Data Working Group.

De Mauro, A., Greco, M., & Grimaldi, M. (2015). What is big data? A consensual definition and a review of key research topics. Paper presented at the AIP Conference Proceedings.

Emani, C. K., Cullot, N., & Nicolle, C. (2015). Understandable big data: A survey. Computer science review, 17, 70-81.

Francois, J., Wang, S., Bronzi, W., State, R., & Engel, T. (2011). Botcloud: Detecting botnets using MapReduce. Paper presented at the Information Forensics and Security (WIFS), 2011 IEEE International Workshop on.

Gandomi, A., & Haider, M. (2015). Beyond the hype: Big data concepts, methods, and analytics. International Journal of Information Management, 35(2), 137-144.

Gupta, B., & Jyoti, K. (2014). Big data analytics with Hadoop to analyze targeted attacks on enterprise data.

McDaniel, P., & Smith, S. (2013). Big Data Analytics for Security. The University of Texas at Dallas.

Raja, M. C., & Rabbani, M. A. (2014). Big Data analytics security issues in a data-driven information system.

Vivekanand, M., & Vidyavathi, B. M. (2015). Security Challenges in Big Data: Review. International Journal of Advanced Research in Computer Science, 6(6).

Cloud Computing Security Issues

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to analyze two security issues associated with Cloud Computing systems.  The analysis includes the causes of these two security issues and their solutions.  The discussion begins with an overview of the security issues when dealing with Cloud Computing.

Security Issues Associated with Cloud Computing

Cloud Computing and Big Data are the current buzzwords in the IT industry. Cloud Computing not only solves the challenges of Big Data but also offers benefits for businesses, organizations, and individuals, such as:

  • Cost saving,
  • Access data from anywhere anytime,
  • Pay per use like any utility,
  • Data Storage,
  • Data Processing,
  • Elasticity,
  • Energy Efficiency,
  • Enhanced Productivity, and more (Botta, de Donato, Persico, & Pescapé, 2016; Carutasu, Botezatu, Botezatu, & Pirnau, 2016; El-Gazzar, 2014).

Despite the tremendous benefits of Cloud Computing, this emerging technology is confronted with many challenges. The top challenge is security, which executives cite as the number one concern for adopting Cloud Computing (Avram, 2014; Awadhi, Salah, & Martin, 2013; Chaturvedi & Zarger, 2015; Hashizume, Rosado, Fernández-medina, & Fernandez, 2013; Pearson, 2013).

The security issues in the Cloud Computing environment are distinguished from the security issues of traditional distributed systems (Sakr & Gaber, 2014).   Various research studies, in an attempt to explain this security challenge, point out that the underlying technologies of Cloud Computing, such as virtualization and SOA (Service-Oriented Architecture), have security issues of their own (Inukollu, Arsi, & Ravuri, 2014).  Thus, the security issues associated with these technologies come along with Cloud Computing (Inukollu et al., 2014).  The Cloud Computing service model PaaS (Platform as a Service) is a good example: because it is based on the SOA model, it inherits all of the security issues associated with SOA technology (Almorsy, Grundy, & Müller, 2016).   In (Sakr & Gaber, 2014), factors such as multi-tenancy, trust asymmetry, global reach, and insider threats contribute to the security issues associated with the Cloud Computing environment.  In (Tripathi & Mishra, 2011), eleven security issues and threats associated with the Cloud environment are identified: (1) VM-level attacks, (2) abuse and nefarious use of Cloud Computing, (3) loss of governance, (4) lock-in, (5) insecure interfaces and APIs, (6) isolation failure, (7) data loss or leakage, (8) account or service hijacking, (9) management interface compromise, (10) compliance risks, and (11) malicious insiders. In the more recent report of (CSA, 2016), twelve critical issues for Cloud security are identified and ranked in order of severity.  Data Breaches is ranked at the top and regarded as the most severe security issue of the Cloud Computing environment.  Weak Identity, Credential, and Access Management is the second most severe security issue, followed by Insecure APIs, System and Application Vulnerabilities, and Account Hijacking.  Table 1 lists the twelve security issues associated with Cloud Computing as reported by (CSA, 2016).

Table 1.  Top Twelve Security Issues of Cloud Computing in Order of Severity. Adapted from (CSA, 2016).

The discussion and analysis are limited to the top two security issues: Data Breaches, and Weak Identity, Credential, and Access Management. The discussion and analysis cover the causes and the proposed solutions.

Data Breaches

A data breach occurs when sensitive or confidential information, or any private data not intended for the public, is released, viewed, stolen, or used by unauthorized users (CSA, 2016).   The data breach issue is not unique to the Cloud Computing environment (CSA, 2016); however, it consistently ranks as the top issue and concern for Cloud users.  The Cloud environment is subject to the same threats as the traditional corporate network, plus new attack techniques due to the shared resources.  The sensitivity of the data determines the extent of the damage.  The impact of a data breach on users and organizations is devastating.  For instance, in a single data breach incident in the USA, 40 million credit card numbers and about 70 million addresses, phone numbers, and other private and personal details were compromised (Soomro, Shah, & Ahmed, 2016).  The firm spent $61 million within one year of the breach on damages and recovery, besides the cash loss and the profit, which dropped by 46% in one quarter of the year (Soomro et al., 2016).  The anti-virus firm BitDefender and the British telecom provider TalkTalk are other good examples of data breaches.  Private information such as usernames and passwords of BitDefender customers was stolen in mid-2015, and the hacker demanded a ransom of $15,000 (CSA, 2016; Fox-Brewster, 2015).  Multiple security incidents in 2014 and 2015 were reported by TalkTalk, resulting in the theft of four million users' private information (CSA, 2016; Gibbs, 2015).

The organization is obliged to exercise certain security standards of care to ensure that sensitive information is not released to unauthorized users.  The Cloud providers are responsible for certain aspects of the Cloud service, and they usually provide the security measures for those aspects. However, the Cloud users are also responsible for other aspects of using the Cloud, and they must secure them to protect their data in the Cloud.   Multi-factor authentication and encryption are the two techniques proposed to secure the Cloud environment.

Insufficient Identity, Credential, and Access Management

Data breaches and malicious attacks happen for various reasons.  The lack of scalable identity access management systems can cause a data breach (CSA, 2016), as can the failure to use multi-factor authentication, weak password use, and a lack of ongoing automated rotation of cryptographic keys, passwords, and certificates (CSA, 2016).  Malicious attackers masquerading as legitimate users or developers can modify and delete data, issue control and management functions, snoop on data in transit, or release malicious software that appears to originate from a legitimate source.   Insufficient identity, credential, or key management can allow these malicious attackers or non-authorized users to access private and sensitive data and cause catastrophic damage to users and organizations alike.   The GitHub attack and the Dell root certificate are good examples of this security issue.  In the GitHub case, attackers scraped GitHub for Cloud service credentials and hijacked accounts to mine virtual currency (Sandvik, 2014). Dell is another example: it released a fix for a root certificate failure because all Dell systems used the same secret key and certificate, which enabled the creation of a certificate for any domain that would be trusted by Dell systems (Schwartz, 2015).

These security issues require Cloud Computing systems to be protected so that unauthorized users cannot access private and sensitive information, and various solutions have been proposed to address insufficient identity and access management.  A security framework for distributed systems combining public key cryptography, software agents, and XML binding technologies was proposed in (Prakash & Darbari, n.d.).  Credentials and cryptographic keys should not be embedded in source code or distributed in public repositories such as GitHub; instead, keys should be secured through a well-managed public key infrastructure (PKI) (CSA, 2016).  Identity Management Systems (IMS) should scale to handle lifecycle management for millions of users and cloud service providers (CSPs), and they should support immediate de-provisioning of access to resources when events such as job termination or role changes occur.  Finally, a Multi-Factor Authentication System (MAS), such as a smart card or phone authentication, should be required for every user and operator of the Cloud service (CSA, 2016).
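As a concrete illustration of the multi-factor requirement, the sketch below generates and verifies time-based one-time passwords (TOTP), the mechanism behind common phone authenticator apps.  It is a minimal example assuming the third-party pyotp package, not a description of any particular provider's authentication system.

    # Minimal sketch of TOTP-based multi-factor authentication.
    # Assumes the third-party "pyotp" package (pip install pyotp).
    import pyotp

    # Enrollment: the service generates a per-user secret and shares it
    # once with the user's authenticator app (typically via a QR code).
    secret = pyotp.random_base32()
    totp = pyotp.TOTP(secret)

    # Login: the user must supply the current 6-digit code in addition
    # to the password, so a stolen password alone is not enough.
    code = totp.now()                  # in practice, typed by the user
    print("second factor accepted:", totp.verify(code))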

References

Almorsy, M., Grundy, J., & Müller, I. (2016). An analysis of the cloud computing security problem. arXiv preprint arXiv:1609.01107.

Avram, M. G. (2014). Advantages and Challenges of Adopting Cloud Computing from an Enterprise Perspective. Procedia Technology, 12, 529-534. doi:10.1016/j.protcy.2013.12.525

Awadhi, E. A., Salah, K., & Martin, T. (2013, November). Assessing the Security of the Cloud Environment. Paper presented at the GCC Conference and Exhibition (GCC), 2013 7th IEEE.

Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud Computing and Internet of Things: A Survey. Future Generation Computer Systems, 56, 684-700.

Carutasu, G., Botezatu, M., Botezatu, C., & Pirnau, M. (2016). Cloud Computing and Windows Azure.

Chaturvedi, D. A., & Zarger, S. A. (2015). A review of security models in cloud computing and an innovative approach. International Journal of Computer Trends and Technology (IJCTT), 30(2), 87-92.

CSA. (2016). The Treacherous 12: Cloud Computing Top Threats in 2016. Cloud Security Alliance Top Threats Working Group.

El-Gazzar, R. F. (2014). A literature review on cloud computing adoption issues in enterprises. Paper presented at the International Working Conference on Transfer and Diffusion of IT.

Fox-Brewster, T. (2015). Anti-Virus Firm BitDefender Admits Breach, Hacker Claims Stolen Passwords Are Unencrypted. Retrieved from https://www.forbes.com/sites/thomasbrewster/2015/07/31/bitdefender-hacked/#5a5f5b125ab2.

Gibbs, S. (2015). TalkTalk criticised for poor security and handling of hack attack. Retrieved from http://www.theguardian.com/technology/2015/oct/23/talktalk-criticised-for-poor-security-and-handling-of-hack-attack.

Hashizume, K., Rosado, D. G., Fernández-Medina, E., & Fernandez, E. B. (2013). An analysis of security issues for cloud computing. Journal of Internet Services and Applications, 4(1), 1-13. doi:10.1186/1869-0238-4-5

Inukollu, V. N., Arsi, S., & Ravuri, S. R. (2014). Security issues associated with big data in cloud computing. International Journal of Network Security & Its Applications, 6(3), 45.

Pearson, S. (2013). Privacy, security and trust in cloud computing. In Privacy and Security for Cloud Computing (pp. 3-42). Springer.

Prakash, V., & Darbari, M. (n.d.). A Review on Security Issues in Distributed Systems.

Sakr, S., & Gaber, M. (2014). Large Scale and Big Data: Processing and Management. CRC Press.

Sandvik, R. A. (2014). Attackers Scrape GitHub for Cloud Service Credentials, Hijack Account to Mine Virtual Currency. Retrieved from https://www.forbes.com/sites/runasandvik/2014/01/14/attackers-scrape-github-for-cloud-service-credentials-hijack-account-to-mine-virtual-currency/#71fe913c3196.

Schwartz, M. (2015). Dell Releases Fix for Root Certificate Fail. Retrieved from http://www.bankinfosecurity.com/dell-releases-fix-for-root-certificate-fail-a-8701/op-1.

Soomro, Z. A., Shah, M. H., & Ahmed, J. (2016). Information security management needs more holistic approach: A literature review. International Journal of Information Management, 36(2), 215-225.

Tripathi, A., & Mishra, A. (2011, September). Cloud computing security considerations. Paper presented at the 2011 IEEE International Conference on Signal Processing, Communications and Computing (ICSPCC).