Information Technology Requirements in Healthcare

Dr. O. Aly
Computer Science

The purpose of this discussion is to address one sector with unique information technology (IT) requirements: healthcare. The discussion addresses the sector's IT needs based on a case study. It begins with Information Technology's Key Role in Business, followed by the Healthcare Industry Case Study.

Information Technology Key Role in Business

Information technology (IT) is a critical resource for businesses in the age of Big Data and Big Data Analytics (Dewett & Jones, 2001; Pearlson & Saunders, 2001). IT both supports and consumes a significant share of enterprise resources. Like other significant business resources such as people, money, and machines, IT must be managed wisely, and these resources must return value to the business. Thus, enterprises must carefully evaluate their resources, including IT resources, so that they can be used efficiently and effectively.

Information systems and technology are now integrated with almost every aspect of every business. IT and IS play significant roles because they simplify organizational activities and processes, and enterprises can gain competitive advantages by utilizing appropriate information technology. An inadequate information system can cause a breakdown in providing services to customers or developing products, which harms sales and eventually the business (Bhatt & Grover, 2005; Brynjolfsson & Hitt, 2000; Pearlson & Saunders, 2001). The same applies when inefficient business processes are sustained by ill-fitting information systems and technology, as they increase costs without any return on investment or value. Lags in implementation or poor process adaptation reduce profits and growth and can place the business behind its competitors. The failure of information systems and technology in business is caused primarily by ignoring them during the planning of the business strategy and organizational strategy: IT that was not considered in those strategies will fail to support business goals and organizational systems. When the business strategy is misaligned with the organizational strategy, IT is subject to failure (Pearlson & Saunders, 2001).

IT Support to Business Goals

Enterprises should invest in IT resources that will benefit them, making investments in systems that support their business goals, including gaining competitive advantages (Bhatt & Grover, 2005). Although IT represents a significant investment, a poorly chosen information system can become an obstacle to achieving the business goals (Dewett & Jones, 2001; Henderson & Venkatraman, 1999; Pearlson & Saunders, 2001). When IT does not allow the business to achieve its goals, or lacks the capacity required to collect, store, and transfer critical information, the results can be disastrous, leading to dissatisfied customers or excessive production costs. The Toys "R" Us store is an excellent example of such an issue (Pearlson & Saunders, 2001). Its well-publicized website was not designed to process and fulfill orders fast enough. The site had to be redesigned at an additional cost, which could have been avoided if the IT strategy and business goals had been discussed and aligned together.

IT Support to Organizational Systems

Organizational systems, including people, work processes, and structure, represent the core elements of the business. Enterprises should plan to enable these systems to work together efficiently to achieve the business goals (Henderson & Venkatraman, 1999; Pearlson & Saunders, 2001; Ryssel, Ritter, & Georg Gemünden, 2004). When a business's IT fails to support its organizational systems, the result is a misalignment of the resources needed to achieve the business goals. For instance, when organizations decide to use an Enterprise Resource Planning (ERP) system, the system often dictates how business processes are executed. When enterprises deploy a technology, they should think through various aspects, such as how the technology will be used in the organization, who will use it, how they will use it, and how to make sure the chosen application accomplishes what is intended. For instance, an organization that plans to institute a wide-scale telecommuting program needs an information system strategy that is compatible with its organizational strategy (Pearlson & Saunders, 2001). Desktop PCs located within the corporate office are not the right solution for a telecommuting organization; laptop computers and applications accessible online anywhere and anytime are a more appropriate solution. If a business only allows the purchase of desktop PCs and only builds systems accessible from desks within the office, the telecommuting program is subject to failure. Thus, information systems implementation should support the organizational systems and should be aligned with the business goals.

Advantages of IT in Business

With the advent of information systems and the internet, a business can transform from a local business into an international one (Bhatt & Grover, 2005; Zimmer, 2018). Organizations are under pressure to take advantage of information technology to gain competitive advantages, and they are turning to it to streamline services and enhance performance. IT has become an essential feature of the business landscape that helps businesses decrease costs, improve communication, develop recognition, and release more innovative and attractive products.

IT streamlines communication, and effective communication is critical to an organization's success (Bhatt & Grover, 2005; Zimmer, 2018). A key advantage of information systems lies in their ability to streamline communication both internally and externally. For instance, online meeting and video conferencing platforms such as Skype and WebEx give businesses the opportunity to collaborate virtually in real time, reducing the costs associated with bringing clients on-site or communicating with staff who work remotely. IT enables enterprises to connect almost effortlessly with international suppliers and consumers.

IT can enhance a business's competitive advantage in the marketplace by facilitating strategic thinking and knowledge transfer (Bhatt & Grover, 2005; Zimmer, 2018). When used as a strategic investment rather than merely a means to an end, IT provides businesses with the tools they need to properly evaluate the market and implement the strategies needed for a competitive edge.

IT stores and safeguards information, as information management is another domain of IT (Bhatt & Grover, 2005; Zimmer, 2018). IT is essential to any business that must store and safeguard sensitive information, such as financial data, for long periods. Various security techniques can be applied to ensure the data is stored in a secure place. Organizations should evaluate the storage options available to them, such as a local data center or cloud-based storage.

IT cuts costs and eliminates waste (Bhatt & Grover, 2005; Zimmer, 2018). Although IT implementation is expensive at the outset, in the long run it becomes incredibly cost-effective by streamlining the operational and managerial processes of the business. Thus, investing in the appropriate IT is key for a business to gain a return on investment. For instance, the implementation of online training programs is a classic example of IT improving the internal processes of a business by reducing costs, employees' time spent away from work, and travel expenses. Information technology enables organizations to accomplish more with less investment without sacrificing quality or value.

Healthcare Industry Case Study

The healthcare industry generates extensive data driven by keeping patients' records, complying with regulations and policies, and caring for patients (Raghupathi & Raghupathi, 2014). The current trend is to digitalize this explosively growing data in the age of Big Data (BD) and Big Data Analytics (BDA) (Raghupathi & Raghupathi, 2014). BDA has revolutionized healthcare by transforming data into valuable information and knowledge used to predict epidemics, cure diseases, improve quality of life, and avoid preventable deaths (Van-Dai, Chuan-Ming, & Nkabinde, 2016). Applications of BDA in healthcare include pervasive health, fraud detection, pharmaceutical discoveries, clinical decision support systems, computer-aided diagnosis, and biomedical applications.

Healthcare Big Data Benefits and Challenges

            The healthcare sector employs BDA in various aspects of care, such as detecting diseases at early stages, providing evidence-based medicine, minimizing doses of medication to avoid side effects, and delivering useful medicine based on genetic analysis. The use of BD and BDA can reduce the re-admission rate and thereby reduce healthcare-related costs for patients. Healthcare BDA can use real-time analytics to detect diseases before they spread (Archenaa & Anita, 2015; Raghupathi & Raghupathi, 2014; Wang, Kung, & Byrd, 2018). An example of BDA applied in a healthcare system is Kaiser Permanente's implementation of HealthConnect to ensure data exchange across all medical facilities and promote the use of electronic health records (Fox & Vaidyanathan, 2016).

            Despite the various benefits of BD and BDA in the healthcare sector, various challenges and issues are emerging from their application. The nature of the healthcare industry itself poses challenges to BDA (Groves, Kayyali, Knott, & Kuiken, 2016). The episodic culture, the data puddles, and the IT leadership are the three significant challenges the healthcare industry faces in applying BDA. The episodic culture refers to the conservative culture of healthcare and the lack of an IT mindset, which together create a rigid culture; few providers have overcome this rigidity and started to use BDA technology. The data puddles reflect the siloed nature of healthcare. Silos are described as one of the most significant flaws in the healthcare sector (Wicklund, 2014). Proper use of technology is lacking in the healthcare sector, making the industry fall behind other industries, as each silo uses its own methods to collect data from labs, diagnosis, radiology, emergency, case management, and so forth. IT leadership is another challenge caused by the rigid culture of the healthcare industry: the lack of familiarity with the latest technologies among healthcare IT leadership is a severe problem.

Healthcare Data Sources for Data Analytics

            Current healthcare data is collected from clinical and non-clinical sources (InformationBuilders, 2018; Van-Dai et al., 2016; Zia & Khan, 2017). Electronic healthcare records are digital copies of patients' medical histories. They contain a variety of data relevant to patient care, such as demographics, medical problems, medications, body mass index, medical history, laboratory test data, radiology reports, clinical notes, and payment information. These electronic healthcare records are the most critical data in healthcare data analytics because they provide effective and efficient methods for providers and organizations to share data (Botta, de Donato, Persico, & Pescapé, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016; Wang et al., 2018).
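
To make the variety of fields concrete, the following is a minimal, hypothetical sketch of a single electronic healthcare record entry expressed as a Python dictionary. The field names and values are invented for illustration and do not follow any particular EHR standard such as HL7 FHIR.

```python
# A hypothetical, simplified EHR entry illustrating the kinds of fields
# described above; all identifiers and values are invented.
ehr_record = {
    "patient_id": "P-000123",                  # pseudonymized identifier
    "demographics": {"age": 64, "sex": "F"},
    "problems": ["type 2 diabetes", "hypertension"],
    "medications": [{"name": "metformin", "dose_mg": 500, "frequency": "BID"}],
    "vitals": {"bmi": 29.4, "bp_systolic": 142, "bp_diastolic": 88},
    "labs": [{"test": "HbA1c", "value": 7.8, "unit": "%"}],
    "radiology_reports": ["chest_xray_2018_03_14.pdf"],
    "clinical_notes": "Patient reports improved glucose control.",
    "payment": {"payer": "private", "claim_ids": ["C-77821"]},
}
```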

Biomedical imaging data plays a crucial role in healthcare data, aiding disease monitoring, treatment planning, and prognosis. This data can be used to generate quantitative information and draw inferences from images that provide insights into a medical condition. Image analytics is more complicated due to the noise in the data associated with the images, which is one of the significant limitations of biomedical analysis (Ji, Ganchev, O'Droma, Zhang, & Zhang, 2014; Malik & Sangwan, 2015; Van-Dai et al., 2016).

Sensing data is ubiquitous in the medical domain, both for real-time and for historical data analysis. It comes from several forms of medical data collection instruments, such as the electrocardiogram (ECG) and electroencephalogram (EEG), which are vital sensors for collecting signals from various parts of the human body. Sensing data plays a significant role in intensive care units (ICU) and in real-time remote monitoring of patients with specific conditions such as diabetes or high blood pressure. The real-time and long-term analysis of various trends and treatments in remote monitoring programs can help providers monitor the state of patients with such conditions (Van-Dai et al., 2016).

Biomedical signals are collected from many sources such as the heart, blood pressure, oxygen saturation levels, blood glucose, nerve conduction, and brain activity. Examples include the electroneurogram (ENG), electromyogram (EMG), electrocardiogram (ECG), electroencephalogram (EEG), electrogastrogram (EGG), and phonocardiogram (PCG). Real-time analytics over biomedical signals will provide better management of chronic diseases, earlier detection of adverse events such as heart attacks and strokes, and earlier diagnosis of disease. These signals can be discrete or continuous, depending on the kind of care or the severity of a particular pathological condition (Malik & Sangwan, 2015; Van-Dai et al., 2016).
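
As a toy illustration of real-time analysis over such signals, the following Python sketch applies a simple sliding-window rule to a stream of heart-rate samples. The window size, threshold, and readings are invented for illustration; real clinical alerting relies on validated models rather than a rule this simple.

```python
from collections import deque

def detect_tachycardia(samples, window=10, threshold_bpm=120):
    """Yield alerts for sustained high heart rate in (timestamp, bpm) samples.

    A toy sliding-window rule: alert when the mean of the last `window`
    readings exceeds `threshold_bpm`.
    """
    recent = deque(maxlen=window)
    for ts, bpm in samples:
        recent.append(bpm)
        if len(recent) == window:
            mean_bpm = sum(recent) / window
            if mean_bpm > threshold_bpm:
                yield ts, mean_bpm  # sustained tachycardia detected

# Simulated readings, one per second, rising steadily.
readings = [(t, 80 + t * 3) for t in range(30)]
for ts, avg in detect_tachycardia(readings):
    print(f"ALERT at t={ts}s: mean HR {avg:.0f} bpm over last 10 samples")
    break  # report only the first alert in this demo
```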

Genomic data analysis helps researchers better understand the relationships among genetic variants, mutations, and disease conditions. It has great potential in the development of gene therapies to cure certain conditions. Furthermore, genomic data analytics can assist in translating genetic discoveries into personalized medicine practice (Liang & Kelemen, 2016; Luo, Wu, Gopukumar, & Zhao, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016).

Clinical text data analytics uses data mining to transform information from clinical notes stored in unstructured formats into useful patterns. Manual coding of clinical notes is costly and time-consuming because of their unstructured nature, heterogeneity, and differing formats and contexts across patients and practitioners. Methods such as natural language processing (NLP) and information retrieval can be used to extract useful knowledge from large volumes of clinical text and automatically encode clinical information in a timely manner (Ghani, Zheng, Wei, & Friedman, 2014; Sun & Reddy, 2013; Van-Dai et al., 2016).
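
The following Python sketch illustrates, in a deliberately simplified form, the idea of turning free-text clinical notes into structured records. The note text and the regular expression are illustrative stand-ins for a full clinical NLP pipeline (e.g., cTAKES or comparable tools).

```python
import re

# Illustrative clinical note; drug-name pattern is a crude toy, not a
# clinically complete vocabulary.
note = ("Pt started on metformin 500 mg BID; lisinopril 10 mg daily "
        "continued. Denies chest pain.")

DRUG_DOSE = re.compile(
    r"\b([a-z]+in(?:e|ol)?|[a-z]+pril)\s+(\d+)\s*(mg|mcg)",
    re.IGNORECASE,
)

# Extract (drug, dose, unit) tuples from the unstructured text.
for drug, dose, unit in DRUG_DOSE.findall(note):
    print({"drug": drug.lower(), "dose": int(dose), "unit": unit})
# -> {'drug': 'metformin', 'dose': 500, 'unit': 'mg'}
#    {'drug': 'lisinopril', 'dose': 10, 'unit': 'mg'}
```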

Social network healthcare data analytics draws on various social media sources, such as social networking sites (e.g., Facebook, Twitter, and web logs), to discover new patterns and knowledge that can be leveraged to model and predict global health trends such as outbreaks of infectious epidemics (InformationBuilders, 2018; Luo et al., 2016; Van-Dai et al., 2016; Zia & Khan, 2017).

IT Requirements for Healthcare Sector

The basic requirements for the implementation of this proposal include not only the tools and required software but also training at all levels, from staff to nurses, clinicians, and patients. The list of requirements is divided into system requirements, implementation requirements, and training requirements.

Cloud Computing Technology Adoption Requirement

Volume is one of the significant characteristics of BD, especially in the healthcare industry (Manyika et al., 2011). Given the challenges addressed earlier when dealing with BD and BDA in healthcare, the system requirements cannot be met using a traditional on-premise technology center, as it cannot handle the intensive computation requirements of BD or the storage requirements for all the medical information from the hospitals across the four States (Hu, Wen, Chua, & Li, 2014). Thus, the cloud computing environment is a more appropriate solution for the implementation of this proposal. Cloud computing plays a significant role in BDA (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015), and the massive computation and storage requirements of BDA create a critical need for this emerging technology (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016). Cloud computing offers various benefits such as cost reduction, elasticity, pay-per-use, availability, reliability, and maintainability (Gupta, Gupta, & Mohania, 2012; Kritikos, Kirkham, Kryza, & Massonet, 2017). However, it also has security and privacy issues under the standard deployment models of public cloud, private cloud, hybrid cloud, and community cloud. Thus, one of the major requirements is to adopt the Virtual Private Cloud (VPC), which has been regarded as the most prominent approach to trusted computing technology (Abdul, Jena, Prasad, & Balraju, 2014).

Security Requirement

Cloud computing has been facing various threats (Cloud Security Alliance, 2013, 2016, 2017). Records show that over the three years from 2015 to 2017, the numbers of breaches, lost medical records, and fine settlements were staggering (Thompson, 2017). The Office for Civil Rights (OCR) issued 22 resolution agreements requiring monetary settlements approaching $36 million (Thompson, 2017). Table 1 shows the data categories and the total for each year.

Table 1.  Approximation of Records Lost by Category Disclosed on HHS.gov (Thompson, 2017)

Furthermore, recent HIPAA Journal reports showed that in the first three months of 2018, 77 healthcare data breaches were reported to the OCR (HIPAA, 2018d). In the second quarter of 2018, at least 3.14 million healthcare records were exposed (HIPAA, 2018a), and in the third quarter, 4.39 million records were exposed in 117 breaches (HIPAA, 2018c).

Thus, the protection of patients' private information requires technology to extract, analyze, and correlate potentially sensitive datasets (HIPAA, 2018b). The implementation of BDA requires security measures and safeguards to protect the privacy of patients in the healthcare industry (HIPAA, 2018b), and sensitive data should be encrypted to prevent exposure in the event of theft (Abernathy & McMillan, 2016). The security requirements involve security at the VPC cloud deployment model as well as at the local hospitals in each State (Regola & Chawla, 2013). Security at the VPC level should involve the implementation of security groups and network access control lists so that the right individuals have access to the right applications and patients' records. A security group in a VPC acts as the first line of defense, a firewall for the associated instances of the VPC (McKelvey, Curran, Gordon, Devlin, & Johnston, 2015). Network access control lists act as the second layer of defense, a firewall for the associated subnets that controls inbound and outbound traffic at the subnet level (McKelvey et al., 2015).
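
As a minimal sketch of these two layers, the snippet below uses AWS's boto3 SDK, assuming AWS as the VPC provider. The VPC and network ACL identifiers, group name, and CIDR range are hypothetical.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# First line of defense: a security group that only admits HTTPS from a
# hypothetical hospital network range to the records-application instances.
sg = ec2.create_security_group(
    GroupName="ehr-app-sg",
    Description="HTTPS-only access to patient records application",
    VpcId="vpc-0123456789abcdef0",  # hypothetical VPC id
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.20.0.0/16",
                      "Description": "hospital network only"}],
    }],
)

# Second layer of defense: a network ACL rule at the subnet level allowing
# the same inbound HTTPS traffic; everything else is implicitly denied.
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",  # hypothetical NACL id
    RuleNumber=100,
    Protocol="6",                          # protocol number 6 = TCP
    RuleAction="allow",
    Egress=False,
    CidrBlock="10.20.0.0/16",
    PortRange={"From": 443, "To": 443},
)
```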

Security at the local hospital level in each State is mandatory to protect patients' records and comply with HIPAA regulations (Regola & Chawla, 2013). Medical equipment must be secured with authentication and authorization techniques so that only medical staff, nurses, and clinicians have access to the medical devices, based on their role; general access should be prohibited, as every member of the hospital has a different role with different responsibilities. Encryption should be used to hide the meaning or intent of communication from unintended users (Stewart, Chapple, & Gibson, 2015), and it is an essential security control, especially for data in transit (Stewart et al., 2015). The hospitals in all four States should implement the same types of encryption controls across hospitals, such as PKI, cryptographic applications, and symmetric-key algorithms (Stewart et al., 2015).
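
A minimal sketch of symmetric encryption for a record at rest, using the Python `cryptography` package's Fernet recipe, is shown below. In a real deployment the key would come from a managed key store shared across the hospitals, with rotation policies, rather than being generated in place.

```python
from cryptography.fernet import Fernet

# In production the key would be retrieved from a KMS/HSM, never hard-coded
# or generated ad hoc as done here for illustration.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b'{"patient_id": "P-000123", "diagnosis": "type 2 diabetes"}'
token = cipher.encrypt(record)        # authenticated ciphertext, safe to store
assert cipher.decrypt(token) == record
```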

The system requirements should also include identity management systems that can interoperate with the hospitals in each State. An identity management system provides authentication and authorization techniques, allowing access to patients' medical records only to those who should have it. The proposal requires the implementation of various encryption protocols such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Internet Protocol Security (IPSec) to protect information transferred over public networks (Zhang & Liu, 2010).
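
The following sketch shows TLS protection for data in transit using only Python's standard library. The endpoint name is hypothetical; a production client would additionally pin the hospital exchange endpoint and its certificates.

```python
import socket
import ssl

host = "records.example-hospital.org"  # hypothetical exchange endpoint

# The default context verifies server certificates against system trust roots.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols

with socket.create_connection((host, 443)) as sock:
    with context.wrap_socket(sock, server_hostname=host) as tls:
        print("Negotiated protocol:", tls.version())  # e.g. 'TLSv1.3'
        tls.sendall(b"GET /health HTTP/1.1\r\nHost: " + host.encode() + b"\r\n\r\n")
```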

Hadoop Implementation for Data Stream Processing Requirement

While the velocity of BD refers to the speed at which large volumes of data are generated and requires speed in data processing (Hu et al., 2014), the variety of the data requires specific technology capabilities to handle various types of datasets, such as structured, semi-structured, and unstructured data (Bansal, Deshpande, Ghare, Dhikale, & Bodkhe, 2014; Hu et al., 2014). The Hadoop ecosystem is found to be the most appropriate system for implementing BDA (Bansal et al., 2014; Dhotre, Shimpi, Suryawanshi, & Sanghati, 2015). The implementation requirements include various technologies and tools. This section covers the components required when implementing Hadoop technology across the four States for the healthcare BDA system.
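
As a minimal sketch of Hadoop's processing model, the following Hadoop Streaming mapper and reducer, written in Python, count diagnosis codes in de-identified visit records. The record layout (one `patient_id<TAB>icd10_code` pair per line) and file names are illustrative, not taken from the case study.

```python
#!/usr/bin/env python3
# mapper.py -- emit (diagnosis_code, 1) for each input record on stdin.
import sys

for line in sys.stdin:
    parts = line.rstrip("\n").split("\t")
    if len(parts) == 2:
        _, code = parts
        print(f"{code}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sum counts per diagnosis code. Hadoop Streaming delivers
# mapper output sorted by key, so equal codes arrive contiguously.
import sys

current, count = None, 0
for line in sys.stdin:
    code, n = line.rstrip("\n").split("\t")
    if code != current:
        if current is not None:
            print(f"{current}\t{count}")
        current, count = code, 0
    count += int(n)
if current is not None:
    print(f"{current}\t{count}")
```

Under Hadoop Streaming, these two scripts would be submitted with the streaming jar (hadoop jar hadoop-streaming.jar -mapper mapper.py -reducer reducer.py -input ... -output ...), with Hadoop handling the intermediate shuffle and sort.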

Hadoop has three significant limitations, which must be addressed in this design. The first limitation is the lack of technical support and documentation for open-source Hadoop (Guo, 2013). Thus, this design requires an Enterprise Edition of Hadoop, available from Cloudera, Hortonworks, and MapR, to get around this limitation (Guo, 2013); the final product decision will be made by the cost analysis team. The second limitation is that Hadoop is not optimal for real-time data processing (Guo, 2013). The solution will require the integration of a real-time streaming program such as Spark, Storm, or Kafka (Guo, 2013; Palanisamy & Thirunavukarasu, 2017); the requirement to integrate Spark is discussed below as a separate requirement for this design (Guo, 2013). The third limitation is that Hadoop is not a good fit for large graph datasets (Guo, 2013). The solution requires the integration of GraphLab, which is also discussed below as a separate requirement for this design.

Conclusion

Information technology (IT) plays a significant role in various industries, including the healthcare sector. This project discussed IT's role in businesses and the requirement that it be aligned with the strategic goals and organizational systems of the business. If IT systems are not included during the planning of the business strategy and organizational strategy, IT integration into the business at a later stage is very likely set up for failure. IT offers various advantages to business, including competitive advantages in the marketplace. The healthcare industry is no exception in integrating IT systems. The healthcare sector has been suffering from various challenges, including the high cost of services and inefficient service to patients. The case study showed the need for IT system requirements that can give the industry a competitive advantage by offering better care to patients at lower cost. Various IT integrations have been used lately in the healthcare industry, including Big Data Analytics, Hadoop technology, security systems, and cloud computing. Kaiser Permanente, for instance, applied Big Data Analytics using HealthConnect to provide better care to patients at lower cost, aligned with the strategic goals of its business.

References

Abdul, A. M., Jena, S., Prasad, S. D., & Balraju, M. (2014). Trusted Environment In Virtual Cloud. International Journal of Advanced Research in Computer Science, 5(4).

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

Archenaa, J., & Anita, E. M. (2015). A survey of big data analytics in healthcare and government. Procedia Computer Science, 50, 408-413.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Bansal, A., Deshpande, A., Ghare, P., Dhikale, S., & Bodkhe, B. (2014). Healthcare data analysis using dynamic slot allocation in Hadoop. International Journal of Recent Technology and Engineering, 3(5), 15-18.

Bhatt, G. D., & Grover, V. (2005). Types of information technology capabilities and their role in competitive advantage: An empirical study. Journal of management information systems, 22(2), 253-277.

Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud Computing and Internet Of Things: a Survey. Future Generation computer systems, 56, 684-700.

Brynjolfsson, E., & Hitt, L. M. (2000). Beyond computation: Information technology, organizational transformation and business performance. Journal of Economic perspectives, 14(4), 23-48.

Cloud Security Alliance. (2013). The Notorious Nine: Cloud Computing Top Threats in 2013. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2016). The Treacherous 12: Cloud Computing Top Threats in 2016. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2017). The Treacherous 12 Top Threats to Cloud Computing. Cloud Security Alliance: Top Threats Working Group. 

Dewett, T., & Jones, G. R. (2001). The role of information technology in the organization: a review, model, and assessment. Journal of Management, 27(3), 313-346.

Dhotre, P., Shimpi, S., Suryawanshi, P., & Sanghati, M. (2015). Health Care Analysis Using Hadoop. International Journal of Scientific & Technology Research, 4(12), 279-281.

Fox, M., & Vaidyanathan, G. (2016). Impacts of Healthcare Big Data: A Framework With Legal and Ethical Insights. Issues in Information Systems, 17(3).

Ghani, K. R., Zheng, K., Wei, J. T., & Friedman, C. P. (2014). Harnessing big data for health care and research: are urologists ready? European urology, 66(6), 975-977.

Groves, P., Kayyali, B., Knott, D., & Kuiken, S. V. (2016). The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation. McKinsey & Company.

Guo, S. (2013). Hadoop operations and cluster management cookbook: Packt Publishing Ltd.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud Computing and Big Data Analytics: What is New From Databases Perspective? Paper presented at the International Conference on Big Data Analytics, Springer-Verlag Berlin Heidelberg.

Henderson, J. C., & Venkatraman, H. (1999). Strategic alignment: Leveraging information technology for transforming organizations. IBM systems journal, 38(2.3), 472-484.

HIPAA. (2018a). At Least 3.14 Million Healthcare Records Were Exposed in Q2, 2018. Retrieved 11/22/2018 from https://www.hipaajournal.com/q2-2018-healthcare-data-breach-report/. 

HIPAA. (2018b). How to Defend Against Insider Threats in Healthcare. Retrieved 8/22/2018 from https://www.hipaajournal.com/category/healthcare-cybersecurity/. 

HIPAA. (2018c). Q3 Healthcare Data Breach Report: 4.39 Million Records Exposed in 117 Breaches. Retrieved 11/22/2018 from https://www.hipaajournal.com/q3-healthcare-data-breach-report-4-39-million-records-exposed-in-117-breaches/. 

HIPAA. (2018d). Report: Healthcare Data Breaches in Q1, 2018. Retrieved 5/15/2018 from https://www.hipaajournal.com/report-healthcare-data-breaches-in-q1-2018/. 

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. IEEE Access, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

InformationBuilders. (2018). Data In Motion – Big Data Analytics in Healthcare. Retrieved from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/datainmotionbigdataanalytics.pdf, White Paper.

Ji, Z., Ganchev, I., O’Droma, M., Zhang, X., & Zhang, X. (2014). A cloud-based X73 ubiquitous mobile healthcare system: design and implementation. The Scientific World Journal, 2014.

Kritikos, K., Kirkham, T., Kryza, B., & Massonet, P. (2017). Towards a Security-Enhanced PaaS Platform for Multi-Cloud Applications. Future Generation computer systems, 67, 206-226. doi:10.1016/j.future.2016.10.008

Liang, Y., & Kelemen, A. (2016). Big Data Science and its Applications in Health and Medical Research: Challenges and Opportunities. Austin Journal of Biometrics & Biostatistics, 7(3).

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: a literature review. Biomedical Informatics Insights, 8, BII.S31559.

Malik, L., & Sangwan, S. (2015). MapReduce Framework Implementation on the Prescriptive Analytics of Health Industry. International Journal of Computer Science and Mobile Computing, 675-688.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

McKelvey, N., Curran, K., Gordon, B., Devlin, E., & Johnston, K. (2015). Cloud Computing and Security in the Future. In Guide to Security Assurance for Cloud Computing (pp. 95-108). Springer.

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. IEEE Access, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Palanisamy, V., & Thirunavukarasu, R. (2017). Implications of Big Data Analytics in developing Healthcare Frameworks–A review. Journal of King Saud University-Computer and Information Sciences.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1.

Regola, N., & Chawla, N. (2013). Storing and Using Health Data in a Virtual Private Cloud. Journal of medical Internet research, 15(3), 1-12. doi:10.2196/jmir.2076

Ryssel, R., Ritter, T., & Georg Gemünden, H. (2004). The impact of information technology deployment on trust, commitment and value creation in business relationships. Journal of business & industrial marketing, 19(3), 197-207.

Stewart, J., Chapple, M., & Gibson, D. (2015). (ISC)² CISSP Certified Information Systems Security Professional Official Study Guide (7th ed.). Wiley.

Sun, J., & Reddy, C. (2013). Big Data Analytics for Healthcare. Retrieved from https://www.siam.org/meetings/sdm13/sun.pdf.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program: Using NIST 800-30 and CSF to Secure Protected Health Information. Apress.

Van-Dai, T., Chuan-Ming, L., & Nkabinde, G. W. (2016, 5-7 July 2016). Big data stream computing in healthcare real-time analytics. Paper presented at the 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Wicklund, E. (2014). ‘Silo’ one of healthcare’s biggest flaws. Retrieved from http://www.healthcareitnews.com/news/silo-one-healthcares-biggest-flaws.

Zhang, R., & Liu, L. (2010). Security models and requirements for healthcare application clouds. Paper presented at the 2010 IEEE 3rd International Conference on Cloud Computing (CLOUD).

Zia, U. A., & Khan, N. (2017). An Analysis of Big Data Approaches in Healthcare Sector. International Journal of Technical Research & Science, 2(4), 254-264.

Zimmer, T. (2018). What Are the Advantages of Information Technology in Business?

Critical Information Technology Solutions Used to Gain Competitive Advantages

Dr. O. Aly
Computer Science

Abstract

The purpose of this project is to discuss critical information technology solutions used to gain competitive advantages. The discussion begins with Big Data and Big Data Analytics, addressing essential topics such as the Hadoop ecosystem, NoSQL databases, Spark integration for real-time data processing, and Big Data visualization. Cloud computing is an emerging technology that addresses Big Data challenges such as storing large volumes of data and processing it at high speed to extract value. Enterprise Resource Planning (ERP) is a system that can help organizations gain competitive advantages if implemented correctly, and the project discusses various success factors for ERP systems. Big Data plays a significant role in ERP, which is also discussed in this project. The last technology addressed is Customer Relationship Management (CRM): its building blocks and integration, the challenges and costs associated with it, and the best practices that can assist in its successful implementation. In summary, enterprises should evaluate the various information technology systems developed to aid them in gaining competitive advantages.

Keywords: Big Data Analytics; Cloud Computing; ERP; CRM.

Introduction

            Enterprises should evaluate various information technologies to gain competitive advantages in the market. Big Data and Big Data Analytics are among the most significant topics in information technology and computer science. Cloud computing is another critical topic in the same domains, as it emerged to solve the challenges of Big Data. Thus, this project begins with these top information technologies. The discussion covers major Big Data topics such as the Hadoop ecosystem and Spark for real-time processing, and the treatment of cloud computing covers the service models and deployment models it offers.

The most common business areas that require information technology support include Enterprise Resource Planning (ERP), Customer Relationship Management (CRM), Product Life Cycle Management (PLM), Supply Chain Management (SCM), and Supplier Relationship Management (SRM) (DuttaRoy, 2016). Thus, this project discusses ERP and CRM as additional critical information technology systems that help enterprises gain competitive advantages.

Big Data and Big Data Analytics

Big Data is now a buzzword in the fields of computer science and information technology. Big Data has attracted the attention of various sectors, researchers, academia, government, and even the media (Géczy, 2014; Kaisler, Armour, Espinosa, & Money, 2013). The 2011 report of the International Data Corporation (IDC) estimated that the amount of information created and replicated would exceed 1.8 zettabytes (1.8 trillion gigabytes) in 2011, growing by a factor of nine in just five years (Gantz & Reinsel, 2011).

BD and BDA are terms that have been used interchangeably and described as the next frontier for innovation, competition, and productivity (Maltby, 2011; Manyika et al., 2011). BD has a multi-V model with unique characteristics: volume refers to the large size of datasets, velocity refers to the speed of computation as well as data generation, and variety refers to the various data types, such as semi-structured and unstructured data (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015; Hu, Wen, Chua, & Li, 2014). Various industries have taken this opportunity and applied BD and BDA in their business models (Manyika et al., 2011). Many technologies, such as cloud computing, Hadoop MapReduce, Hive, and others, have emerged to deal with the Big Data phenomenon, because data without analysis has no value to organizations.

Hadoop Ecosystem

While the velocity of BD refers to the speed at which large volumes of data are generated and requires speed in data processing (Hu et al., 2014), the variety of the data requires specific technology capabilities to handle various types of datasets, such as structured, semi-structured, and unstructured data (Bansal, Deshpande, Ghare, Dhikale, & Bodkhe, 2014; Hu et al., 2014). The Hadoop ecosystem is found to be the most appropriate system for implementing BDA (Bansal et al., 2014; Dhotre, Shimpi, Suryawanshi, & Sanghati, 2015), and Hadoop technologies have been at the forefront of Big Data applications (Bansal et al., 2014; Chrimes, Zamani, Moa, & Kuo, 2018). The Hadoop ecosystem will be part of the implementation requirement, as it is proven to serve intensive computation over large datasets well (Raghupathi & Raghupathi, 2014; Wang, Kung, & Byrd, 2018). The required Hadoop version is 2.x, which includes YARN for resource management (Karanth, 2014). Hadoop 2.x also includes HDFS snapshots, which provide a read-only image of an entire filesystem or a particular subset of it to protect against user errors and to support backup and disaster recovery (Karanth, 2014). The Hadoop platform can be implemented to gain more insight into various areas (Raghupathi & Raghupathi, 2014; Wang et al., 2018). The Hadoop ecosystem involves the Hadoop Distributed File System (HDFS), MapReduce, and NoSQL databases such as HBase and Hive to handle large volumes of data, using various algorithms and machine learning to extract value from medical records that are structured, semi-structured, and unstructured (Raghupathi & Raghupathi, 2014; Wang et al., 2018). Other supporting components include Oozie for workflow, Pig for scripting, and Mahout for machine learning, which is part of artificial intelligence (AI) (Ankam, 2016; Karanth, 2014). The ecosystem includes other tools such as Flume for log collection, Sqoop for data exchange, and ZooKeeper for coordination (Ankam, 2016; Karanth, 2014). HCatalog is a required component to manage the metadata in Hadoop (Ankam, 2016; Karanth, 2014). Figure 1 shows the Hadoop ecosystem before integrating Spark for real-time analytics.


Figure 1.  Hadoop Architecture Overview (Alguliyev & Imamverdiyev, 2014).

NoSQL Databases

In the age of BD and BDA, traditional data stores are inadequate to handle not only the large volume of datasets but also the various data formats, such as unstructured and semi-structured data (Hu et al., 2014). Thus, Not Only SQL (NoSQL) databases emerged to meet the requirements of BDA. These NoSQL data stores are used for modern, scalable databases (Sahafizadeh & Nematbakhsh, 2015). Their scalability enables systems to increase throughput when demand increases during data processing (Sahafizadeh & Nematbakhsh, 2015). A platform can incorporate two scalability types to support large volumes of data: horizontal and vertical. Horizontal scaling distributes the workload across many servers and nodes to increase throughput, while vertical scaling requires more processors, more memory, and faster hardware installed on a single server (Sahafizadeh & Nematbakhsh, 2015).

NoSQL data stores come in many varieties, such as MongoDB, CouchDB, Redis, Voldemort, Cassandra, Big Table, Riak, HBase, Hypertable, ZooKeeper, Vertica, Neo4j, db4o, and DynamoDB. These data stores are categorized into four types: document-oriented, column-oriented (or column-family), graph, and key-value (EMC, 2015; Hashem et al., 2015). A document-oriented data store can store and retrieve collections of data and documents using complex data forms in various formats such as XML and JSON, as well as PDF and MS Word (EMC, 2015; Hashem et al., 2015); MongoDB and CouchDB are examples (EMC, 2015; Hashem et al., 2015). A column-oriented data store keeps content in columns rather than rows, with the attributes of the columns stored contiguously (Hashem et al., 2015); this type can store and render blog entries, tags, and feedback (Hashem et al., 2015), and Cassandra, DynamoDB, and HBase are examples (EMC, 2015; Hashem et al., 2015). A key-value store can store and scale large volumes of data and contains a value and a key used to access that value (EMC, 2015; Hashem et al., 2015). The value can be complicated, but this type of data store can be useful for storing a user's login ID as the key referencing patient values; Redis and Riak are examples (Alexandru, Alexandru, Coardos, & Tudora, 2016). Each of these NoSQL data stores has its limitations and advantages. A graph NoSQL database can store and represent data using graph models with nodes, edges, and properties related to one another through relations, which is useful for unstructured medical data such as images and lab results; Neo4j is an example (Hashem et al., 2015). Figure 2 summarizes these NoSQL data store types, the data they store, and examples; a brief illustrative sketch of the document-oriented type follows the figure.

Figure 2.  Big Data Analytics NoSQL Data Store Types.
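
As a concrete illustration of the document-oriented category, the following minimal sketch uses MongoDB through the pymongo driver. The connection string, database, and field names are illustrative; a real deployment would add authentication and replication for horizontal scaling.

```python
from pymongo import MongoClient

# Illustrative local connection; production would use an authenticated
# replica-set URI.
client = MongoClient("mongodb://localhost:27017/")
visits = client["hospital"]["visits"]

# Documents may carry heterogeneous, nested fields -- no fixed schema needed.
visits.insert_one({
    "patient_id": "P-000123",
    "notes": "Follow-up for hypertension.",
    "labs": [{"test": "HbA1c", "value": 7.8}],
})

# Query by a nested attribute and project only the fields of interest.
for doc in visits.find({"labs.test": "HbA1c"}, {"_id": 0, "patient_id": 1}):
    print(doc)
```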

Spark Integration for Real-Time Data Processing

While the architecture of the Hadoop ecosystem has been designed for scenarios such as data storage, data management, statistical analysis, statistical association between various data sources, distributed computing, and batch processing, businesses require real-time data processing to gain competitive advantages. However, real-time processing requirements cannot be met by Hadoop alone (Basu, 2014). Real-time analytics would add tremendous value to the proposed healthcare system. Thus, Apache Spark is another required component for real-time data processing. Spark allows in-memory processing for fast response times, bypassing MapReduce operations (Basu, 2014). With Spark integrated with Hadoop, stream processing, machine learning, interactive analytics, and data integration become possible (Scott, 2015). Spark runs on top of Hadoop to benefit from YARN and the underlying storage of HDFS, HBase, and other Hadoop ecosystem building blocks (Scott, 2015). Figure 3 shows the core engines of Spark; a brief usage sketch follows the figure.


Figure 3. Spark Core Engines (Scott, 2015).
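
The following minimal PySpark sketch illustrates the kind of stream processing Spark adds on top of Hadoop, computing per-minute averages over a hypothetical stream of vital-sign events. The HDFS path, schema, and field names are assumptions for illustration; production code would define schemas explicitly and write to durable sinks rather than the console.

```python
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("vitals-stream").getOrCreate()

# Read a stream of JSON vital-sign events landing in a (hypothetical) HDFS
# directory; Structured Streaming treats new files as new micro-batches.
events = (spark.readStream
          .schema("patient_id STRING, bpm DOUBLE, ts TIMESTAMP")
          .json("hdfs:///streams/vitals/"))

# Average heart rate per patient per one-minute window.
per_minute = (events
              .groupBy(F.window("ts", "1 minute"), "patient_id")
              .agg(F.avg("bpm").alias("avg_bpm")))

query = (per_minute.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```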

Big Data Visualization

Visualization is one of the most powerful ways of presenting data (Jayasingh, Patra, & Mahesh, 2016). It helps in viewing data in a more meaningful way, in the form of graphs, images, and pie charts that can be understood easily. It helps in synthesizing large volumes of data, such as healthcare data, to get at the core of raw big data and convey its key points for insight (Meyer, 2018). Commercial visualization tools include Tableau, Spotfire, QlikView, and Adobe Illustrator; the most commonly used visualization tools in healthcare are Tableau, PowerBI, and QlikView.
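
As a minimal illustration of the kind of summary chart such tools produce, the following Python sketch uses matplotlib. The readmission figures are invented for illustration only.

```python
import matplotlib.pyplot as plt

# Hypothetical 30-day readmission rates by condition (illustrative data).
conditions = ["Diabetes", "Hypertension", "Asthma", "CHF"]
readmission_rate = [14.2, 11.7, 8.3, 21.5]

fig, ax = plt.subplots(figsize=(6, 4))
ax.bar(conditions, readmission_rate)
ax.set_ylabel("30-day readmission rate (%)")
ax.set_title("Readmission by condition (illustrative data)")
plt.tight_layout()
plt.show()
```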

Cloud Computing Technology

Numerous studies have discussed the definition of cloud computing, as the term was not initially well defined (Foster, Zhao, Raicu, & Lu, 2008). In an effort to define the term precisely, IT practitioners and the academic and research communities came up with various definitions; Vaquero, Rodero-Merino, Caceres, and Lindner (2008) collected twenty-two definitions of cloud computing from different research studies. The underlying concept of cloud computing relies heavily on providing computing power, storage services, software services, and platform services on demand to customers over the internet (Lewis, 2010). Access to cloud computing services can scale up or down as needed, and consumers use a pay-per-use or pay-as-you-go model (Armbrust et al., 2009; Lewis, 2010).

The National Institute of Standards and Technology (NIST) proposed an official definition of cloud computing: cloud computing enables ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources, such as networks, servers, storage, applications, and services, which can be rapidly provisioned and released with minimal management effort or service provider interaction (Mell & Grance, 2011).

Cloud Computing Essential Characteristics

The essential characteristics of cloud computing identified by NIST include on-demand self-service, broad network access, resource pooling, rapid elasticity, and measured service (Mell & Grance, 2011). The on-demand self-service feature provides cloud consumers with computing capabilities, such as server time and network storage, automatically as needed, eliminating the need for human interaction with the service provider. The broad network access feature makes capabilities available over the network from anywhere, through various devices such as mobile phones and tablets, enabling heterogeneous client platforms. The resource pooling feature provides a multi-tenant model that serves multiple consumers sharing a pool of resources. This feature provides location independence, in that consumers do not know the exact location of the provided resources, although they may be able to specify the location at a higher level of abstraction, such as country, state, or datacenter (Mell & Grance, 2011). The rapid elasticity feature provides capabilities that scale horizontally and vertically to meet demand. The measured service feature enables measurement of the consumption of resources such as processing, storage, and bandwidth; resource utilization can be monitored, controlled, and reported, providing transparency for both the provider and the consumer of the utilized services (Mell & Grance, 2011).

Cloud Computing Three Essential Service Models

Cloud computing offers three essential service models: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) (Mell & Grance, 2011). The IaaS layer provides consumers the capability to provision storage, processing, networks, and other fundamental computing resources. Using IaaS, the consumer can deploy and run arbitrary software, which can include operating systems and applications. IaaS consumers do not manage or control the underlying cloud infrastructure, but they have control over storage, operating systems, and the deployed applications, with limited control over some networking components such as host firewalls. PaaS allows consumers to deploy applications created using programming languages, libraries, services, and tools supported by the provider. PaaS consumers do not manage or control the underlying cloud infrastructure, including the network, servers, operating systems, or storage, but they have control over the deployed applications and possibly configuration settings for the application-hosting environment. SaaS allows consumers to use the provider's applications running on the cloud infrastructure. SaaS consumers can access the applications from various client devices through either a thin client interface, such as web-based email in a web browser, or a program interface. SaaS consumers do not control or manage the underlying cloud infrastructure, such as the network, operating systems, or storage, or even individual application capabilities, with the possible exception of limited user-specific application configuration settings (Mell & Grance, 2011).

Cloud Computing Four Essential Deployment Models

Cloud computing offers four essential deployment models, known as public cloud, private cloud, community cloud, and hybrid cloud (Mell & Grance, 2011). The public cloud is cloud infrastructure available to the general public. It can be managed, owned, and operated by businesses, academic entities, government entities, or a combination of them, and it resides on the premises of the cloud provider. The private cloud is cloud infrastructure designed exclusively for a single organization. It can be managed, owned, and operated by the organization, a third party, or a combination of both, and it may reside either on-premises or off-premises. The community cloud is cloud infrastructure designed exclusively for a specific community of consumers from organizations with shared concerns, such as security requirements, compliance considerations, and policies. One or more of the organizations in the community, a third party, or some combination of them can manage, own, and operate the community cloud, which can reside on-premises or off-premises. The hybrid cloud combines two or more cloud infrastructures such as private, public, or community (Mell & Grance, 2011). Figure 4 presents the full representation of cloud computing technology per NIST, including the standard service models, deployment models, and essential characteristics.

Figure 4.  Overview of Cloud Computing based on NIST’s Definitions.

Cloud Computing Role in Big Data and Big Data Analytics

Cloud computing plays a significant role in BDA (Assunção et al., 2015). The massive computation and storage requirements of BDA create a critical need for this emerging technology (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016). Cloud computing offers various benefits such as cost reduction, elasticity, pay-per-use, availability, reliability, and maintainability (Gupta, Gupta, & Mohania, 2012; Kritikos, Kirkham, Kryza, & Massonet, 2017). However, although cloud computing offers various benefits, it has security and privacy issues under the standard deployment models of public cloud, private cloud, hybrid cloud, and community cloud.

Enterprise Resource Planning (ERP)

            The American Production and Inventory Control Society (2001), as cited in Madanhire and Mbohwa (2016), defined ERP as a method for the effective planning and control of all resources needed to take, make, ship, and account for customer orders in a manufacturing, distribution, or service organization. This integration of functions can be achieved through a software package solution offered by vendors to support the seamless flow of all information through the enterprise, such as financial, accounting, and human resources information. ERP is business management software designed to integrate the data sources and processes of an entire organization into a combined system (Bahssas, AlBar, & Hoque, 2015).

An ERP system is a popular solution used by organizations to integrate and automate various processes, improve performance, and reduce costs. ERP provides a business with a real-time view of its core business processes such as production, planning, manufacturing, inventory management, and development (Bahssas et al., 2015). ERP software is a multi-module application that integrates activities across functional departments such as production planning, purchasing, inventory control, product distribution, and order tracking. It allows the automation and integration of business processes by enabling data and information sharing, helping the business reach best practices in managing its processes.

ERP involves various modules such as accounting, finance, supply chain, human resources, customer information, and others (Bahssas et al., 2015; Madanhire & Mbohwa, 2016). The production planning module is used to optimize the utilization of manufacturing capacity, parts, components, and material resources. The purchasing module streamlines procurement of required raw materials, automating the processes of identifying potential suppliers, negotiating prices, placing orders with suppliers, and related billing. The inventory control module facilitates maintaining an appropriate level of stock in the warehouse by identifying inventory requirements, setting targets, providing replenishment techniques and options, monitoring item usage, reconciling inventory balances, and reporting inventory status. The sales module is used for order placement, order scheduling, shipping, and invoicing. The marketing module supports lead generation and direct mailing campaigns. The financial module gathers financial data from various departments and generates reports such as the balance sheet, general ledger, and trial balance. The human resources (HR) module maintains a complete employee database including contact information, salary details, attendance, and so forth (Madanhire & Mbohwa, 2016).

Innovations and technology trends have forced ERP designers to pursue new developments. Thus, new ERP system designs are implemented to satisfy organizations and customers by evolving new ERP business models. Furthermore, one of the biggest challenges for ERP is keeping pace with the manufacturing sector, which has been moving rapidly from a product-centric to a customer-centric focus (Bahssas et al., 2015). Most ERP vendors have been required to add a variety of functions and modules to their core systems.

Critical Factors for Successful ERP Implementation

            The implementation of ERP systems is costly, and organizations should be careful when implementing them to ensure success. Some believe that ERP systems could hurt their business because of their potential problems (Umble, Haft, & Umble, 2003). Various studies have identified success factors for ERP; Umble et al. (2003) addressed the most prominent factors for successful implementation. The first critical success factor is that organizations should have a clear understanding of their strategic goals. Commitment by top management is another success factor, and successful ERP implementation requires excellent project management. The existing organizational structure and processes found in most enterprises are not compatible with the structure, tools, and types of information provided by ERP systems; thus, organizational change management is required to ensure successful implementation. ERP implementation teams should be composed of highly skilled professionals chosen for their skills, past accomplishments, reputation, and flexibility. Data accuracy is another success factor, as are education and training. Bahssas et al. (2015) indicated that reserving 10-15% of the total ERP implementation budget for training gives an organization an 80% chance of successful implementation. Focused performance measures must be included from the beginning of the implementation, because if the system is not associated with compensation, it will not be successful.

Big Data and Big Data Analytics Role in ERP

Big Data Analytics plays a significant role in ERP applications (Carlton, 2014; ERP Solutions, 2018; Woodie, 2016). Enterprise data spans various departments such as HR, finance, CRM, and other essential business functions, and this data can be leveraged to improve ERP functionality. When Big Data tools are brought together with an ERP system, they can unfold valuable insights that help businesses make smarter decisions (Carlton, 2014; Cornell University, 2017; Wailgum, 2018). Many ERP systems fail to make use of real-time inventory and supply chain data because they lack the intelligence to make predictions about product demand (Carlton, 2014; ERP Solutions, 2018); Big Data tools can predict demand and help determine what a company needs going forward (ERP Solutions, 2018). Infor co-president Duncan Angove established Dynamic Science Labs (DSL), aiming to use data science techniques to solve a particular class of business problems for its customers; employees with big data, math, and coding skills were hired at the Cambridge, Massachusetts-based organization to develop proofs of concept (POC) (Woodie, 2016). Big Data systems such as Apache Hadoop are creating node-level operating transparency that affects nearly every current ERP module in real time (Carlton, 2014). Managers will be able to quickly leverage ERP Big Data capabilities, thereby enhancing information density and speeding up overall decision-making. In brief, Big Data and Big Data Analytics impact business at all levels, and ERP is no exception.

Customer Relationship Management (CRM)

Customer Relationship Management (CRM) systems help organizations manage customer interactions and customer data, automate marketing, sales, and customer support, assess business information, and manage partner, vendor, and employee relationships.  A quality CRM system can scale to serve the needs of a small, medium, or large business (Financesonline, 2018).  CRM systems can be customized to allow a business to act on customer insights using back-end analytics, identify opportunities with predictive analytics, personalize customer support, and streamline operations based on the history of customers' interactions with the business.  Organizations must be aware of the CRM software available in order to select the system that best serves their needs.

Various reports have surveyed the available CRM systems.  The best CRM systems include Salesforce CRM, HubSpot CRM, Freshsales, Pipedrive, Insightly, Zoho CRM, Nimble, PipelineDeals, Nutshell CRM, Microsoft Dynamics CRM, SalesforceIQ, Spiro, and ExxpertApps.  Table 1 shows the best CRM systems available in the market.


Table 1.  CRM Systems  (Financesonline, 2018).

Customer satisfaction is a critical element of business success (Bygstad, 2003; Pearlson & Saunders, 2001).  To remain successful, businesses need to continuously satisfy customers, understand their needs and expectations, and provide high-quality products or services at a competitive price.  These interactions need to be tracked and analyzed in an organized way to foster long-lasting customer relationships, which are then transformed into long-term success.

CRM can help businesses increase sales efficiency, drive customer satisfaction, streamline business processes to make them more efficient, and identify and resolve bottlenecks in operational processes from marketing and sales to product development (Ahearne, Rapp, Mariadoss, & Ganesan, 2012; Bygstad, 2003).  Developing customer relationships is not a trivial or straightforward task.  When it is done right, it gives the business a competitive edge.  However, the implementation of CRM is challenging.

CRM Challenges and Costs

The implementation of CRM demonstrates the value of customers to the business and places customer service as a top priority (Pearlson & Saunders, 2001).  CRM plays a significant role in coordinating the efforts of customer service, marketing, and sales within an organization.  However, implementing CRM is challenging, especially for small businesses and startups.  Various reports have addressed the challenges of implementing CRM.  Cost is the most significant challenge organizations confront when implementing a CRM solution (Sage Software, 2015).  Developing a clear objective to achieve with the CRM system is another challenge.  Organizations must also decide on the type of deployment, whether on-premise or cloud-based.  Other challenges involve employee training, choosing the right CRM solution provider, and planning the integration in advance (Sage Software, 2015).

The cost of CRM systems varies from one vendor to another based on features and deployment model, such as data importing, analytics, email integration, mobile accessibility, email marketing, multi-channel support, and SaaS, on-premise, or hybrid SaaS/on-premise platforms.  Some vendors offer CRM for small and medium businesses, or small only, while others offer CRM systems for small, medium, and large businesses.  In a report by (Business-Software, 2019), cost is categorized from most expensive to least expensive using dollar signs: $$$$ for most expensive, $$$ for expensive, $$ for less expensive, and $ for least expensive.  Each vendor's CRM system has certain features which organizations must examine before deciding to adopt the system.  Table 2 provides an overview of costs from most expensive to least expensive.


Table 2.  CRM System Costs based on the Report by (Business-Software, 2019).

 

The Building Blocks of CRM Systems and Their Integration

Understanding the building blocks of a CRM system can assist in its implementation and integration.  CRM involves four core building blocks (Meyer & Kolbe, 2005).  The first is the acquisition and continuous update of a knowledge base on customers' needs, motivations, and behavior over the lifetime of the relationship.  The application of this customer knowledge to continuously improve performance, through a process of learning from successes and failures, is the second building block.  The integration of marketing, sales, and service activities to achieve a common goal is the third.  The last building block involves the implementation of appropriate systems to support customer knowledge acquisition and sharing, and the measurement of CRM effectiveness.

CRM integration is a critical building block for CRM success (Meyer, 2005).  The process of integrating CRM involves various organizational and operational functions of the business, such as marketing, sales, and service activities.  CRM requires detailed business processes, which can be categorized into three core elements: the CRM delivery process, the CRM support process, and the CRM analysis process.  The delivery process involves direct contact with customers to cover part of the customer process, such as campaign management, sales management, service management, and complaint management.  The support process covers activities that are not part of direct customer delivery but fulfill supporting functions within the CRM context, such as market research and loyalty management.  The analysis process consolidates and analyzes the customer knowledge collected in the other CRM processes.  The results of this analysis are passed to the delivery process, the support process, and the service innovation and production processes to enhance their effectiveness, supporting activities such as customer scoring and lead management, customer profiling and segmentation, and feedback and knowledge management.

Best Practices in Implementing These CRM Systems

Various studies and reports have addressed best practices for implementing and integrating CRM systems into the business (Salesforce, 2018; Schiff, 2018).  Organizations must choose a CRM that fits their needs.  Not every CRM is created equal, and if organizations choose a CRM system without properly researching its features, capabilities, and weaknesses, they could end up committed to a system that is not appropriate for the business and, as a result, lose money.  Organizations should decide whether the CRM should be cloud-based or on-premise (Salesforce, 2018; Schiff, 2018; Wailgum, 2008).  Organizations should also decide whether the CRM should be a service contract or one that costs more upfront to install, and whether the business needs in-depth, highly customizable features or whether basic functionality will suffice.  Organizations should analyze the options and decide on the CRM system that is most appropriate for the business, one that can serve its needs to build strong customer relationships and gain a competitive edge in the market.

Well-trained personnel will help an organization achieve its strategic CRM goals.  If organizations do not invest in training the workforce on how to utilize the CRM system, CRM tools become useless.  CRM systems are only as effective as organizations allow them to be.  When the workforce does not use the CRM system to its full potential, or misuses it, the CRM will not perform its functions properly and will not serve the needs of the business as expected (Salesforce, 2018; Schiff, 2018).

Automation is another critical factor in CRM best practice.  Tasks associated with data entry can be automated so that CRM systems stay up to date; a minimal sketch of such automation follows this discussion.  Automation increases the efficiency of the CRM systems as well as of the business overall (Salesforce, 2018; Schiff, 2018).  One of the significant benefits of CRM is its potential to improve and enhance cooperative efforts across the departments of the business: when the same information is accessible across various departments, CRM systems eliminate the confusion that can be caused by using different terms and different information.

Data without analysis is meaningless.  Organizations should consider mining the data to extract value that can aid in making sound business decisions.  CRM systems are designed to capture and organize massive amounts of data; if organizations do not take advantage of this data by turning it into actionable information, the implementation of CRM will deliver limited value.  The best CRM systems come with built-in analytics features which use advanced programming to mine all captured data and produce valuable conclusions for future business decisions.  When organizations take advantage of the built-in analytical features and analyze the data that the CRM system procures, the resulting information can provide insight for business decisions (Salesforce, 2018).

The last element of best practice in CRM implementation is for organizations to keep it simple.  The best CRM system is the one that best fits the needs and requirements of the business.  Simplicity is a crucial element when implementing CRM; organizations should implement a CRM that is not complex while still being useful and providing everything the business needs.  Organizations should also consider making changes to the CRM policies where necessary.  The effectiveness of day-to-day operations will be the best indicator of whether the CRM performs as expected, and if it does not, changes must be made until it does (Salesforce, 2018; Wailgum, 2008).
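
As a concrete illustration of the automation point above, the following minimal Python sketch pushes a contact record into a CRM over HTTP so that data entry requires no manual typing.  The endpoint URL, token, and payload fields are hypothetical; a real deployment would use the chosen vendor's documented API.

# Minimal sketch: automated CRM data entry via a hypothetical REST endpoint.
# The URL, token, and field names are assumptions for illustration only.
import requests

CRM_API = "https://crm.example.com/api/contacts"   # hypothetical endpoint
TOKEN = "replace-with-real-token"                  # hypothetical credential

def upsert_contact(email: str, name: str, last_touch: str) -> None:
    """Create or update a contact from an automated source such as a web form."""
    payload = {"email": email, "name": name, "last_touch": last_touch}
    resp = requests.post(
        CRM_API,
        json=payload,
        headers={"Authorization": f"Bearer {TOKEN}"},
        timeout=10,
    )
    resp.raise_for_status()  # fail loudly so bad records are not silently lost

upsert_contact("jane@example.com", "Jane Doe", "2018-11-02")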

Conclusion

This project discussed critical information technology solutions used to gain competitive advantages.  The discussion began with Big Data and Big Data Analytics, addressing essential topics such as the Hadoop ecosystem, NoSQL databases, Spark integration for real-time data processing, and Big Data visualization.  Cloud computing is an emerging technology for solving Big Data challenges such as storage for large volumes of data and the high-speed processing needed to extract value from data.  Enterprise Resource Planning (ERP) is a system that can help organizations gain competitive advantages if implemented correctly, and the project discussed various success factors for ERP systems.  Big Data plays a significant role in ERP, which was also discussed.  The last technology addressed was Customer Relationship Management (CRM), including its building blocks and integration, the challenges and costs associated with it, and best practices that can assist in its successful implementation.  In summary, enterprises should evaluate the various information technology systems that have been developed to help them gain competitive advantages.

References

Ahearne, M., Rapp, A., Mariadoss, B. J., & Ganesan, S. (2012). Challenges of CRM implementation in business-to-business markets: A contingency perspective. Journal of Personal Selling & Sales Management, 32(1), 117-129.

Alexandru, A., Alexandru, C., Coardos, D., & Tudora, E. (2016). Healthcare, Big Data and Cloud Computing. management, 1, 2.

Alguliyev, R., & Imamverdiyev, Y. (2014). Big data: big promises for information security. Paper presented at the Application of Information and Communication Technologies (AICT), 2014 IEEE 8th International Conference on.

Ankam, V. (2016). Big Data Analytics: Packt Publishing Ltd.

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., . . . Stoica, I. (2009). Above The Clouds: A Berkeley View of Cloud Computing. Electrical Engineering and Computer Sciences University of California at Berkeley.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Bahssas, D. M., AlBar, A. M., & Hoque, M. R. (2015). Enterprise resource planning (ERP) systems: design, trends and deployment. The International Technology Management Review, 5(2), 72-81.

Bansal, A., Deshpande, A., Ghare, P., Dhikale, S., & Bodkhe, B. (2014). Healthcare data analysis using dynamic slot allocation in Hadoop. International Journal of Recent Technology and Engineering, 3(5), 15-18.

Basu, A. (2014). Real-Time Healthcare Analytics on Apache Hadoop* using Spark* and Shark. Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/big-data-real-time-healthcare-analytics-whitepaper.pdf.

Business-Software. (2019). Top 40 CRM Software Report.  

Bygstad, B. (2003). The implementation puzzle of CRM systems in knowledge based organizations. Information Resources Management Journal (IRMJ), 16(4), 33-45.

Carlton, R. (2014). 5 Ways Big Data is Changing ERP Software. Retrieved from https://www.erpfocus.com/five-ways-big-data-is-changing-erp-software-2733.html.

Chrimes, D., Zamani, H., Moa, B., & Kuo, A. (2018). Simulations of Hadoop/MapReduce-Based Platform to Support its Usability of Big Data Analytics in Healthcare.

Cornell University. (2017). Enterprise Information Systems. Retrieved from https://it.cornell.edu/strategic-plan/enterprise-information-systems. 

Dhotre, P., Shimpi, S., Suryawanshi, P., & Sanghati, M. (2015). Health Care Analysis Using Hadoop. International Journal of Scientific & Technology Research, 4(12), 279-281.

DuttaRoy, S. (2016). SAP Business Analytics: A Best Practices Guide for Implementing Business Analytics Using SAP: Springer.

EMC. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. (1st ed.): Wiley.

ERP Solutions. (2018). The Role of Big Data Analytics in ERP Applications. Retrieved from https://erpsolutions.oodles.io/big-data-analytics-in-erp/. 

Financesonline. (2018). 15 Best CRM Systems for Your Business. Retrieved from https://financesonline.com/15-best-crm-software-systems-business/. 

Foster, I., Zhao, Y., Raicu, I., & Lu, S. (2008). Cloud Computing and Grid Computing 360-Degree Compared. Paper presented at the 2008 Grid Computing Environments Workshop.

Gantz, J., & Reinsel, D. (2011). Extracting Value From Chaos. International Data Corporation, 1142, 1-12.

Géczy, P. (2014). Big data characteristics. The Macrotheme Review, 3(6), 94-104.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud Computing and Big Data Analytics: What is New From Databases Perspective? Paper presented at the International Conference on Big Data Analytics, Springer-Verlag Berlin Heidelberg.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. Practical Innovation, Open Solution, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

Jayasingh, B. B., Patra, M. R., & Mahesh, D. B. (2016, 14-17 Dec. 2016). Security issues and challenges of big data analytics and visualization. Paper presented at the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I).

Kaisler, S., Armour, F., Espinosa, J. A., & Money, W. (2013). Big Data: Issues and Challenges Moving Forward. Paper presented at the Hawaii International Conference on System Sciences

Karanth, S. (2014). Mastering Hadoop: Packt Publishing Ltd.

Kritikos, K., Kirkham, T., Kryza, B., & Massonet, P. (2017). Towards a Security-Enhanced PaaS Platform for Multi-Cloud Applications. Future Generation computer systems, 67, 206-226. doi:10.1016/j.future.2016.10.008

Lewis, G. (2010). Basics About Cloud Computing. Software Engineering Institute Carnegie Mellon University, Pittsburgh.

Madanhire, I., & Mbohwa, C. (2016). Enterprise resource planning (ERP) in improving operational efficiency: Case study. Procedia Cirp, 40, 225-229.

Maltby, D. (2011). Big Data Analytics. Paper presented at the Annual Meeting of the Association for Information Science and Technology.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. Institute of Electrical and Electronic Engineers, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Mell, P., & Grance, T. (2011). The NIST Definition of Cloud Computing. National Institute of Standards and Technology (NIST), 800-145, 1-7.

Meyer, M. (2005). Multidisciplinarity of CRM Integration and its Implications. Paper presented at the System Sciences, 2005. HICSS’05. Proceedings of the 38th Annual Hawaii International Conference on.

Meyer, M. (2018). The Rise of Healthcare Data Visualization.

Meyer, M., & Kolbe, L. M. (2005). Integration of customer relationship management: status quo and implications for research and practice. Journal of strategic marketing, 13(3), 175-198.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1.

Sage Software. (2015). Top Challenges in CRM Implementation.  

Sahafizadeh, E., & Nematbakhsh, M. A. (2015). A Survey on Security Issues in Big Data and NoSQL. Int’l J. Advances in Computer Science, 4(4), 2322-5157.

Salesforce. (2018). 7 CRM Best Practices to Get the Most out of your CRM. Retrieved from https://www.salesforce.com/crm/best-practices/. 

Schiff, J. L. (2018). 8 CRM implementation best practices.

Scott, J. A. (2015). Getting Started with Spark: MapR Technologies, Inc.

Umble, E. J., Haft, R. R., & Umble, M. M. (2003). Enterprise resource planning: Implementation procedures and critical success factors. European Journal of Operational Research, 146(2), 241-257.

Vaquero, L. M., Rodero-Merino, L., Caceres, J., & Lindner, M. (2008). A Break in the Clouds: Towards a Cloud Definition. Association for Computing Machinery: Computer Communication Review, 39(1), 50-55.

Wailgum, T. (2008). Five Best Practices for Implementing SaaS CRM. Retrieved from https://www.cio.com/article/2435928/customer-relationship-management/five-best-practices-for-implementing-saas-crm.html.

Wailgum, T. (2018). What is CRM? Software for Managing Customer Data. Retrieved from https://www.cio.com/article/2439505/customer-relationship-management/customer-relationship-management-crm-definition-and-solutions.html.

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Woodie, A. (2016). Making ERP Better with Big Data. Retrieved from https://www.datanami.com/2016/07/08/making-erp-better-big-data/.

Two Good-Quality Research Papers on Customer Relationship Management (CRM)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address two good-quality research papers on customer relationship management (CRM).  The chosen articles are (Ngai, Xiu, & Chau, 2009; Rygielski, Wang, & Yen, 2002).  These two papers were selected because they discuss CRM in the context of business intelligence and data mining.

The first article (Rygielski et al., 2002) is about data mining techniques for CRM.  The authors discussed various aspects of CRM as well as data mining.  They also discussed the importance of understanding the customer lifecycle and the data mining techniques that can be used to extract value from customer data.  Various data mining techniques are discussed along with their application to CRM.

The second article (Ngai et al., 2009) is a literature review and classification of the application of data mining techniques in CRM.  The authors identified nine hundred articles related to the application of data mining techniques to CRM.  Seven data mining techniques are identified: association, classification, clustering, forecasting, regression, sequence discovery, and visualization.  The authors indicated that classification and association models are the two most commonly used models for data mining in CRM.  Four CRM dimensions are identified: customer identification, customer attraction, customer retention, and customer development.

Customer Relationship Management (CRM)

(Rygielski et al., 2002) defined CRM using the four elements of a simple framework: know, target, sell, and service.  CRM includes a set of processes and enabling systems to support the enterprise strategy of developing long-term, profitable relationships with specific customers (Ngai et al., 2009).  The foundation for a successful CRM strategy involves customer data and information technology tools.  The rapid growth of the Internet and emerging technologies has increased the opportunities for marketing and transformed the way relationships between businesses and customers are managed (Ngai et al., 2009).

Enterprises are required to know and understand their markets and customers, which involves detailed customer intelligence to select the most profitable customers and identify those no longer worth targeting (Ngai et al., 2009; Rygielski et al., 2002).  Targeting entails deciding which products will be sold to which customers through which channels.  The selling element of CRM requires enterprises to use campaign management to increase the effectiveness of the marketing department.  Finally, enterprises seek to retain their customers through services such as call centers and help desks.

CRM Old Model and Relationship Marketing

Technology plays a significant role in marketing.  Relationship marketing has become a reality due to the application and advancement of technology (Ngai et al., 2009; Rygielski et al., 2002).  Various enterprises have gained competitive advantages through technologies such as business intelligence, data mining, and data warehousing.  Data mining techniques assist organizations in extracting value from data.  When organizations apply data mining techniques, they can identify valuable customers and predict hidden behaviors, allowing businesses to make proactive, knowledge-driven decisions.  Data mining provides automated and future-oriented analysis, which goes beyond the analysis of past events based on historical data (Rygielski et al., 2002).

The old model of 'design-build-sell,' a product-oriented view, is being replaced by 'sell-build-redesign,' a customer-oriented view (Rygielski et al., 2002).  The new approach of one-to-one marketing has challenged the traditional process of mass marketing, whose goal was to reach more customers and expand the customer base.

Two-Stage CRM Concepts

Customer Focus: The first stage is to master the basics of building and developing customer focus.  This concept shifts the focus from product orientation to customer orientation and defines market strategy from the outside in rather than from the inside out.  The focus should be on the needs of customers, not on product features (Rygielski et al., 2002).

CRM Integration: The second stage goes beyond the basics by integrating CRM across the entire customer experience chain, leveraging technology to achieve real-time customer management, and continuously innovating the value proposition to customers (Rygielski et al., 2002).

CRM Components

Customer Data: CRM involves several components.  Enterprises must first process customer information before the CRM process begins.  Customer data can be collected from internal or external sources.  Internal data sources include summary tables that describe customers via billing records, surveys of a subset of customers who answer detailed questions, and behavioral data contained in transaction systems such as weblogs, credit card records, and so forth (Rygielski et al., 2002).

Data Warehouse: A data warehouse is a critical component of a successful CRM strategy.  The data required for CRM can be limited to a marketing data mart with limited feeds from other corporate systems.  External data sources can be a key source for gaining customer knowledge advantage; these include lookups for current address and phone, household hierarchies, Fair Isaac Corp. (FICO) credit scores, and webpage viewing profiles (Rygielski et al., 2002).

Analytical Tools: A CRM system must analyze the data using statistical tools, OLAP, and data mining.  Marketing professionals are required to understand the customer data and the business imperative, whether the enterprise uses traditional statistical techniques or one of the data mining software tools.  Enterprises should employ data mining analysts who will carry out the analysis and make sure the business does not lose sight of the original reason for implementing the data mining technique.  Market segmentation is the result, and decisions are made regarding which segments are attractive (Rygielski et al., 2002).
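
To illustrate the segmentation step, the following minimal Python sketch clusters customers on two illustrative attributes.  The data and the choice of three clusters are assumptions for this example, not a prescription.

# Minimal sketch: customer segmentation with k-means clustering.
# Columns (annual spend in dollars, purchases per year) are illustrative.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

customers = np.array([
    [200, 2], [250, 3], [1200, 15], [1100, 14],
    [5000, 40], [4800, 38], [300, 4], [1150, 16],
])

scaled = StandardScaler().fit_transform(customers)
segments = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scaled)
print(segments)  # one segment label per customer, for analysts to profile

Analysts would then profile each segment (for example, high-spend frequent buyers) and decide which segments are attractive targets.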

Campaign Execution and Tracking: Enterprises should execute campaigns and track the results.  Campaign management software manages and monitors the communications of customers across multiple touchpoints such as direct mail, telemarketing, customer service, point-of-sale, email, and the web.  People and processes contribute to facilitating the interaction between marketing, information technology and sales channels (Rygielski et al., 2002).

Data Mining and Knowledge Discovery

Data mining is defined as a sophisticated data search capability that uses statistical algorithms to discover correlations and patterns in data (Rygielski et al., 2002).  The term is an analogy to gold or coal mining, indicating that data nuggets are buried in large corporate data warehouses, or in information dropped on a website, much of which can lead to a better understanding and use of the data.  The data mining approach is complementary to other analysis techniques such as statistics, online analytical processing (OLAP), spreadsheets, and basic data access.  In summary, data mining is another approach to finding meaning and value in data that can help enterprises make better strategic and tactical decisions (Ngai et al., 2009; Rygielski et al., 2002).

When organizations apply data mining techniques, they can discover patterns and relationships hidden in the data.  This process is part of a more extensive process known as 'knowledge discovery' (Rygielski et al., 2002).  The knowledge discovery process describes the steps required to ensure meaningful output.  Data mining does not eliminate the need for organizations to understand the data and basic statistical methods, and it does not find patterns or relationships that can be trusted blindly: the results must be verified.  Data mining assists in generating hypotheses; it does not validate them.
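
A standard way to operationalize that verification is to hold out part of the data and check the mined model against it.  The following minimal Python sketch, using a synthetic dataset as an illustrative assumption, shows the pattern.

# Minimal sketch: a mined pattern is verified on held-out data, not trusted
# blindly. The synthetic dataset and the decision tree are illustrative.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=500, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)

# High training accuracy alone proves nothing; the held-out score is the check.
print("train accuracy:", accuracy_score(y_train, model.predict(X_train)))
print("test accuracy: ", accuracy_score(y_test, model.predict(X_test)))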

Data Mining Evolution and Building Blocks

Data mining evolved through four significant phases, from the 1960s to the 1980s, 1990s, and 2000s (Rygielski et al., 2002).  Data mining began with data collection in the 1960s for simple calculations such as summations and averages.  The information at this phase answered business questions about figures derived from data collection sites, such as total revenue or average total revenue over a specified period, and specific application programs were created for collecting data and performing calculations.  Data access is the second phase, in the 1980s, when databases were used to store data in a structured format and organizations were able to query the database for specific data over a specific period.  In the 1990s, the data navigation phase began as a logical step after data access, as organizations could obtain either a global view or drill down to a particular site for comparison with its peers.  In the 2000s, the data mining phase began with online analytic tools providing real-time feedback and information exchange with collaborating business units.

The primary building blocks of data mining have been developing for decades.  These building blocks include statistics, artificial intelligence, and machine learning (Rygielski et al., 2002), and these core components are mature.  When these building blocks are integrated with a relational database, they create a business environment which can capitalize on knowledge previously buried within the systems.  Figure 1 shows the core components of data mining.


Figure 1.  Core Components of Data Mining.

Data Mining Core Process

When using data mining, the data is formed and constructed into a model that describes the patterns and relationships derived from the data.  The implementation of data mining involves three general processes.  Discovery is the process of looking in the database to find hidden patterns without pre-determined hypotheses about what they might be.  Predictive modeling is the process of taking the discovered patterns and using them for future prediction.  Forensic analysis is the process of applying the extracted patterns to find anomalous or unusual data elements (Rygielski et al., 2002).  Figure 2 illustrates these three essential processes.


Figure 2.  Data Mining Three Core Processes (Rygielski et al., 2002).
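
As a small illustration of the forensic-analysis process, the following Python sketch flags transactions that deviate sharply from the learned pattern.  The amounts and the two-standard-deviation threshold are illustrative assumptions.

# Minimal sketch: forensic analysis as simple anomaly detection.
# Transaction amounts and the z-score threshold are illustrative.
import numpy as np

amounts = np.array([52, 48, 55, 50, 49, 51, 53, 940, 47, 50])

z_scores = (amounts - amounts.mean()) / amounts.std()

# Anything more than two standard deviations from the mean is flagged.
print("flagged as anomalous:", amounts[np.abs(z_scores) > 2])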

Data Mining Models and Benefits

Data mining has six types of data models for solving various types of business problems: classification, regression, association analysis, sequence discovery, clustering (Ngai et al., 2009; Rygielski et al., 2002), time series (Rygielski et al., 2002), and visualization (Ngai et al., 2009).  Classification and regression are used to make predictions, while association and sequence discovery are used to describe behavior.  The clustering model can be used for either prediction or description.  Predictive and descriptive data mining are used in retail, banking, telecommunications, and other applications.

In the retail sector, retailers can keep detailed records of every shopping transaction via store-branded credit cards and point-of-sale systems, allowing them to better understand their various customer segments.  Retail applications include basket analysis, sales forecasting, database marketing, and merchandise planning and allocation (Rygielski et al., 2002).  The banking sector can deploy knowledge discovery for applications such as card marketing, cardholder pricing and profitability, fraud detection, and predictive life-cycle management.  The telecommunications sector can utilize knowledge discovery for applications such as call detail record analysis and customer loyalty.  Other knowledge discovery applications are emerging in a variety of areas such as customer segmentation, manufacturing, warranties, and frequent-flier incentives.  Banks and financial entities can use forensic analysis for fraud detection, analyzing abnormalities in the data.
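
To make basket analysis concrete, the following minimal Python sketch counts which item pairs co-occur across transactions; the baskets are illustrative, and production systems would mine point-of-sale data at far larger scale.

# Minimal sketch: retail basket analysis by counting co-occurring item pairs.
from itertools import combinations
from collections import Counter

baskets = [
    {"bread", "milk", "eggs"},
    {"bread", "milk"},
    {"milk", "diapers", "beer"},
    {"bread", "milk", "diapers"},
]

pair_counts = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        pair_counts[pair] += 1

# Pairs bought together most often inform placement and promotion decisions.
for pair, count in pair_counts.most_common(3):
    print(pair, count)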

Enterprises can integrate data mining into the decision-making process.  However, data mining implementation requires skill sets and technology.  While data mining is frequently implemented at the regional or central organization, front-line management and operations should have access to the knowledge gained through it.  This knowledge can be communicated through a scoring algorithm, a score, or a recommended action associated with a particular customer, employee, or transaction (Rygielski et al., 2002).

Data Mining Techniques

Data mining techniques involve the retention-based technique and the distillation-based technique (Rygielski et al., 2002).  The retention-based technique applies to predictive modeling and forensic analysis, but not to knowledge discovery, because it does not distill any patterns.  The distillation-based technique has three categories: logical, cross-tabulation, and equational.  These three methods extract patterns from a dataset and use the patterns for various purposes.  The logical approach handles numeric and non-numeric data, while the equational approach requires all data to be numeric, and cross-tabulation works only with non-numeric data.  Figure 3 shows the data mining techniques.


Figure 3.  Data Mining Techniques (Rygielski et al., 2002).

Data Mining and CRM

CRM is a broad topic with many layers, one of which is data mining, a method that can aid enterprises in their quest to become more customer-oriented.  (Rygielski et al., 2002) discussed the customer lifecycle, how data mining can help organizations gain competitive advantages, and customer privacy.

Customer’s Lifecycle and Data Mining: The CRM lifecycle involves the stages in the relationship between a customer and the business.  Enterprises can increase a customer’s value by increasing their use or purchase of products they already have, selling them more or higher-margin products, and keeping them as customers for a more extended period.  The customer relationship changes over time, evolving as the business and customer learn more about each other.  The customer lifecycle involves four stages: prospects, responders, active customers, and former customers.  Prospects are not yet customers but are in the target market.  Responders are prospects who show interest in the product.  Active customers are those currently using the product or service.  Former customers fall into various categories, such as bad customers who did not pay their bills, customers who moved their business to competing products, customers who incurred high costs, or customers who are no longer in the target market (Rygielski et al., 2002).

Marketing Data Intelligence (MDI): Marketing data intelligence (MDI) is defined as “combining data-driven marketing and technology to increase the knowledge and understanding of customers, products, and transactional data to improve strategic decision making and tactical marketing activity, delivering the CRM challenge” (Rygielski et al., 2002).  Enterprises should understand the customer lifecycle because it provides a good framework for applying data mining to CRM: the lifecycle tells what information is available on the input side of the data mining and what is likely to be interesting on the output side.  Data mining can be used over time to predict changes in detail.  Enterprises can predict the behavior surrounding a particular lifecycle event, such as retirement, then find other people at a similar life stage and determine which customers are following similar behavior patterns.  Marketing data intelligence is the outcome of this process.

Marketing Data Intelligence (MDI) Components: MDI involves two critical components: customer data transformation and customer knowledge discovery.  Raw data are extracted and transformed from a wide range of internal and external databases, marts, or warehouses.  The collected data are stored in a centralized location where they can be accessed and explored.  The process continues through customer knowledge discovery, where data mining is implemented and useful patterns and inferences can be drawn from the data.  The process must be measured and tracked to ensure results are pushed to campaign management software.  Data mining plays a significant role in the process of CRM (Rygielski et al., 2002).  The data mining process interacts with the data mart or warehouse in one direction and with the campaign management software in the other.  The link between data mining and campaign management was mostly manual; the trend today is to integrate the two to gain a competitive advantage.  Enterprises can gain such an advantage by ensuring that the data mining software and the campaign management software share the same definition of the customer segment, to avoid modeling the entire database.  For instance, if the ideal segment is high-income males aged 25-35 living in the northeast, the analysis should be limited to this segment.
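
The following minimal Python sketch shows one way such a shared segment definition could be applied before any modeling; the column names and thresholds are illustrative assumptions.

# Minimal sketch: applying a shared segment definition before modeling,
# using the example segment from the text. Columns and thresholds are
# illustrative assumptions.
import pandas as pd

customers = pd.DataFrame({
    "gender": ["M", "F", "M", "M"],
    "age":    [28, 31, 52, 33],
    "income": [95000, 120000, 88000, 101000],
    "region": ["northeast", "northeast", "west", "northeast"],
})

# One segment definition shared by mining and campaign tools, so the
# analysis never touches the rest of the database.
segment = customers[
    (customers["gender"] == "M")
    & customers["age"].between(25, 35)
    & (customers["income"] >= 90000)
    & (customers["region"] == "northeast")
]
print(segment)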

Data Mining and Customers’ Privacy: Data mining provides various benefits to businesses; however, it can invade the privacy of customers.  (Rygielski et al., 2002) argued that the personalization of CRM is far from an invasion of privacy.  Personal information can be classified into two categories: data provided by and accessible to users, and data generated and analyzed by businesses.  Before data mining techniques became popular, customer data was collected on a self-provided or transactional basis.  Customers provide general descriptive data containing demographic information about themselves, while transactional data refers to data obtained when a transaction takes place, such as product name, quantity, location, and time of purchase.  Data mining helps turn customer data into customer profiling information, which belongs to the second category; it includes customer value, targeting information, customer rating, and behavior tracking.  When this information is abused, people may suffer certain forms of discrimination, such as in insurance or their careers.  The central issue of privacy is to find a balance between privacy rights for consumer protection and benefits for businesses.

(Rygielski et al., 2002) argued that privacy is more of a policy issue than a technology issue.  One basic principle for enterprises using personalization technology is to disclose to their customers the kinds of information they are seeking and how that information will be used.  While some organizations list objectives for ethical information and privacy management, others develop a Privacy Bill of Rights that includes fair access by individuals to their personal information.  The privacy of customers can be protected when customers do not have to reveal their identities and can remain anonymous even after data mining is implemented.  Various security measures such as encryption and firewalls should also be implemented.
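
One simple anonymization measure consistent with the argument above is to replace direct identifiers with salted hashes before mining, so that analysis proceeds without revealing identities.  The following Python sketch is illustrative and simplified; real deployments require proper key management and stronger guarantees.

# Minimal sketch: pseudonymizing a customer identifier before data mining.
# The salt handling is simplified for illustration only.
import hashlib

SALT = b"replace-with-secret-salt"  # hypothetical; must be kept secret

def pseudonymize(identifier: str) -> str:
    """Return a stable, non-reversible token for a customer identifier."""
    return hashlib.sha256(SALT + identifier.encode("utf-8")).hexdigest()[:16]

record = {"customer": pseudonymize("jane@example.com"), "purchases": 14}
print(record)  # the mined record carries no direct identity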

Conclusion

This discussion involved two main articles on data mining applications and CRM.  The application of data mining techniques in CRM is an emerging trend in the industry.  Relationships between businesses and customers are taking a different path in the presence of the Internet and Big Data Analytics techniques such as data mining.  Enterprises are under pressure to gain a competitive advantage by using data mining techniques to extract value from customer data, and they are also under pressure to ensure the protection of customers' private information.  Various data mining techniques are available, drawing on statistics and machine learning.  Enterprises should apply the appropriate data mining technique to their CRM strategy to gain competitive advantages by not only acquiring customers but also retaining them.

References

Ngai, E. W., Xiu, L., & Chau, D. C. (2009). Application of data mining techniques in customer relationship management: A literature review and classification. Expert Systems with Applications, 36(2), 2592-2602.

Rygielski, C., Wang, J.-C., & Yen, D. C. (2002). Data mining techniques for customer relationship management. Technology in society, 24(4), 483-502.

The Challenges and Benefits of Data Warehousing and Data Mining Techniques

Dr. O. Aly
Computer Science

In the age of big data, a considerable variety, volume, and velocity of data are being generated. The data are being generated by people, machines, the Web, and information systems. Harnessing these data and making sense of them in real time or near real time to develop actionable intelligence is one of the big challenges facing organizations.  Data are stored in warehouses, and they are then mined to generate insights. Analytical techniques that are used include statistical techniques, machine learning, and others.  The purpose of this discussion is to address the challenges and benefits of data warehousing and data mining techniques.

Data Warehousing

Data warehousing is defined as a subject-oriented, integrated, time-variant, and non-volatile collection of data in support of the decision-making process (Connolly & Begg, 2015).  Since the 1970s, enterprises have mostly focused their investment on new information systems that automate business processes.  Businesses gained competitive advantages through these systems, which provided more efficient and cost-effective services to customers.  Organizations have stored their data in operational databases; however, operational databases are designed for daily operations, not to be part of the decision-making process.  Enterprises faced the challenge of turning the archived data into a source of knowledge.  The concept of the data warehouse emerged as the solution to this requirement: a system supporting decision making and receiving data from various operational sources (Connolly & Begg, 2015; Coronel & Morris, 2016).

The concept of the data warehouse (DW) was devised by IBM as the "information warehouse," a solution for accessing data held in non-relational systems (Connolly & Begg, 2015).  It was proposed to allow businesses to use their archived data to gain a business advantage.  However, due to the complexity of implementation, the early attempts at creating an information warehouse were mostly rejected.  The concept of data warehousing has been raised several times since then, and in recent years its potential has been viewed as a valuable and viable solution for businesses.  Bill Inmon is regarded as the father of the DW, as he was one of the earliest promoters of data warehousing (Connolly & Begg, 2015; Guohong, Lijun, Junhui, & Peixin, 2010).

Data Warehouse Characteristics

The database for a data warehouse (DW) is another type of database in a management information system, acting as "one-stop shopping" and focusing on supporting informed and actionable decision making (Ally & Khan, 2016; Coronel & Morris, 2016).  It is a central location for knowledge creation that mitigates the challenge of many independent sources of data.  This type of database is distinguished from others such as the transactional or operational database (Ally & Khan, 2016; Coronel & Morris, 2016): unlike the operational database, the DW collects consolidated and summarized data used in the decision-making process.  The DW has four significant characteristics proposed by two DW icons, Kimball and Inmon.  Integration, subject orientation, time variance, and non-volatility are the four primary characteristics of the DW (Ally & Khan, 2016; Connolly & Begg, 2015; Coronel & Morris, 2016).

Data Warehouse Architecture

Various studies have proposed various architectures for the data warehouse.  The architecture selected for this discussion integrates CRM and ERP (Guohong et al., 2010).  CRM integrates the scattered, isolated data in the enterprise to build a comprehensive and complete understanding of customers.  Online analytical processing (OLAP) is a software technology allowing analysts and managers to access the data quickly, consistently, and interactively.  Figure 1 shows the holistic view of the data warehouse framework.


Figure 1.  A Holistic View of DW Framework (Guohong et al., 2010)
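
To illustrate the kind of fast, consistent view OLAP serves to analysts, the following minimal Python sketch rolls warehouse facts up by region and quarter; the fact-table schema is an illustrative assumption.

# Minimal sketch: an OLAP-style rollup of sales by region and quarter.
# The fact-table schema is an illustrative assumption.
import pandas as pd

facts = pd.DataFrame({
    "region":  ["east", "east", "west", "west", "east", "west"],
    "quarter": ["Q1", "Q2", "Q1", "Q2", "Q1", "Q1"],
    "sales":   [100, 120, 90, 110, 80, 70],
})

cube = facts.pivot_table(index="region", columns="quarter",
                         values="sales", aggfunc="sum", margins=True)
print(cube)  # totals per region, per quarter, and overall ('All')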

Benefits and Challenges of Data Warehousing

The successful implementation of a data warehouse can bring significant advantages to a business.  Enterprises can gain potentially high returns on investment, competitive advantage, and increased productivity for corporate decision makers.  As cited in (Connolly & Begg, 2015), data warehouse projects delivered an average three-year return on investment of 401%.  This high ROI places enterprises that successfully implement data warehousing projects at a competitive advantage.  Businesses gain competitive advantages when decision makers can access data that reveals previously unavailable, unknown, and untapped information on customers, products, trends, and demands.  The successful implementation of data warehousing improves the productivity of enterprise decision makers by creating an integrated database of consistent, subject-oriented, historical data.  The data warehouse can integrate data from various independent sources and transform it into meaningful information, providing decision makers with substantive, accurate, and consistent analysis (Connolly & Begg, 2015; Coronel & Morris, 2016).

Data warehousing is confronted with various challenges.  Underestimation of the resources required for the ETL (extract, transform, and load) process is one of the most significant (Connolly & Begg, 2015; Coronel & Morris, 2016).  Hidden problems with source systems and required data that is never captured are other challenges the data warehouse faces.  Further challenges include increased end-user demands, data homogenization, high demand for resources, data ownership, high maintenance, long-duration projects, and the complexity of integration.  In the era of Big Data and Big Data Analytics, the data warehouse is confronted with the additional challenges of new technologies such as Hadoop, MapReduce, and cloud computing.  The data warehouse was initially designed for historical data; however, with Big Data Analytics, real-time (RT) and near-real-time (NRT) data warehousing is required.  Thus, demand has increased for DW designs that enable RT/NRT extraction, modeling of RT fact tables, and scalability under query contention (Connolly & Begg, 2015; Coronel & Morris, 2016).
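
Because the ETL step is where resources are most often underestimated, a minimal Python sketch of the three stages follows; the input data, schema, and SQLite target are illustrative assumptions standing in for real operational sources and a real warehouse.

# Minimal sketch: extract, transform, load (ETL) into a warehouse table.
# The input data, schema, and SQLite target are illustrative assumptions.
import sqlite3
import pandas as pd

# Extract: a raw export from a hypothetical operational system.
raw = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "amount":   ["10.5", "20.0", "20.0", None],
})

# Transform: drop duplicates, coerce types, discard unusable rows.
clean = (raw.drop_duplicates()
            .assign(amount=lambda d: pd.to_numeric(d["amount"], errors="coerce"))
            .dropna(subset=["amount"]))

# Load: append into the warehouse (SQLite standing in for a real DW).
with sqlite3.connect("warehouse.db") as conn:
    clean.to_sql("orders_fact", conn, if_exists="append", index=False)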

Data Mining

Data warehouse, OLAP and data mining are essential technologies forming critical components of the Business Intelligence implementation (Connolly & Begg, 2015).  The value of the data warehouse is determined by providing the data to end users using the appropriate analytical tools such as data mining and OLAP (Connolly & Begg, 2015).  Because OLAP and data mining analytical tools are distinguished in what they offer to the end users, they are regarded as complementary technologies (Connolly & Begg, 2015).  While OLAP employs advanced data analysis and presentation tools including the multi-dimensional data analysis, data mining provides advanced statistical tools not only to provide analysis of the large data available through the data warehouses and other sources but also to identify the possible relationships and anomalies (Connolly & Begg, 2015). 

Data mining is “the process of discovering meaningful new correlations, patterns, and trends by mining large amounts of data using statistical, mathematical, and AI techniques.  Data mining has the potential to supersede the capabilities of OLAP tools, as the major attraction of data mining is its ability to build predictive rather than retrospective models” (Connolly & Begg, 2015).  While the traditional BI tools are “reactive,” data mining is regarded to be “proactive” as the end users do not have to identify the problem, and select the data to be analyzed by the traditional BI tools, but rather data mining tools identify the problem by automatically searching the data for anomalies and possible relationship (Coronel & Morris, 2016).   Thus, data mining involves four tasks: (1) analyzing the data, (2) discovering the problems or opportunities that might be hidden in the relationship of the data, (3) formulating a model that is based on the findings, (4) utilizing the model to predict behavior of the business, which requires minimal intervention from the end users (Coronel & Morris, 2016).  As a result of these activities, the business can use the findings to obtain knowledge that can lead to competitive advantages (Coronel & Morris, 2016).  In summary, data mining is described as the analytical tool that “initiate analyses to create knowledge” (Coronel & Morris, 2016).  This knowledge represents very specialized information (Coronel & Morris, 2016).

Data Mining Techniques

Data mining techniques involve four essential operations: (1) "Predictive Modeling," (2) "Database Segmentation," (3) "Link Analysis," and (4) "Deviation Detection" (Connolly & Begg, 2015).  The "Predictive Modeling" operation implements classification and prediction techniques.  The "Database Segmentation" operation implements demographic clustering and neural clustering techniques (Connolly & Begg, 2015).  The "Link Analysis" operation implements association discovery, sequential pattern discovery, and similar time sequence discovery techniques (Connolly & Begg, 2015).  The "Deviation Detection" operation implements statistics and visualization techniques (Connolly & Begg, 2015).  Although a business can implement any of these four operations, there are certain associations between business applications and data mining techniques (Connolly & Begg, 2015).  For instance, "Retail/Marketing" applies the "Database Segmentation" operation, while "Fraud Detection" can apply any of the four operations (Connolly & Begg, 2015).

Machine Learning Algorithms: "Supervised" and "unsupervised" learning techniques are the most common machine learning algorithms implemented in various domains, particularly data mining (Hall, Dean, Kabul, & Silva, 2014).  A supervised learning (SL) algorithm is a technique that uses labeled data to train a model (Hall et al., 2014).  It comprises "prediction" ("regression") algorithms and "classification" algorithms: the regression or prediction algorithm is used for interval labels, while the classification algorithm is used for class labels (Hall et al., 2014).  In the SL algorithm, the training data, represented as observations, measurements, and so forth, are associated with labels reflecting the class of the observations (Han, Pei, & Kamber, 2011), and new data is classified based on the training set (Han et al., 2011).  Unsupervised learning (UL) occurs when a model is trained on unlabeled data (Hall et al., 2014).  A UL algorithm typically segments data into "groups of examples," called clusters, or "groups of features," called feature extraction (Hall et al., 2014).  The UL technique can be either the end goal of a machine learning task, as in market segmentation, or a preliminary or pre-processing step in a supervised learning task (Hall et al., 2014).  When using a UL algorithm, the class labels of the training data are unknown (Han et al., 2011); the UL algorithm is used to establish the existence of classes or clusters in the data, given a set of measurements and observations (Han et al., 2011).
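
To illustrate the distinction, the following minimal Python sketch applies both modes to the same toy data: supervised classification uses the known labels, while unsupervised clustering must find structure without them.  The data points and labels are illustrative assumptions.

# Minimal sketch: supervised classification vs. unsupervised clustering
# on the same toy data. Points and labels are illustrative.
import numpy as np
from sklearn.neighbors import KNeighborsClassifier
from sklearn.cluster import KMeans

X = np.array([[1.0, 1.0], [1.2, 0.9], [0.8, 1.1],
              [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]])
y = np.array([0, 0, 0, 1, 1, 1])  # class labels, available only in the SL case

# Supervised: train on labeled examples, then classify a new observation.
clf = KNeighborsClassifier(n_neighbors=3).fit(X, y)
print("supervised prediction:", clf.predict([[1.1, 1.0]]))

# Unsupervised: no labels given; the algorithm proposes clusters itself.
print("unsupervised clusters:",
      KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X))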

Benefits and Challenges

The goal of data mining is to extract value from data.  Enterprises can utilize this information to make sound decisions and gain competitive advantages (Che, Safran, & Peng, 2013).  Organizations can benefit from data mining through concept/class description, association and correlation discovery, classification, prediction, clustering, trend analysis, and outlier and deviation analysis when making strategic and tactical decisions (Hand, Mannila, & Smyth, 2001; Linoff & Berry, 2011; Rygielski, Wang, & Yen, 2002).  However, data mining is confronted with various challenges, including the development of parallel or high-performance algorithms, theoretical models, and data mining techniques (Dubitzky, 2008).  Distributed data mining algorithms should support the complete data mining process, from pre-processing to mining to post-processing.  The design of new data mining systems and architectures that make efficient use of computing resources is another challenging area.  Further development challenges arise in several areas, such as the high complexity of many data mining applications, the variety of data sources with different data models, and the volume of the data (Dubitzky, 2008).

Conclusion

This discussion addressed the two significant topics of the data warehouse and data mining.  It began with the data warehouse and its evolution from IBM's "information warehouse."  Due to the complexity of implementation, the concept disappeared for a while but surfaced again, and Bill Inmon is regarded as the father of the data warehouse.  The benefits of the data warehouse to businesses are tremendous.  However, data warehouse project implementation is confronted with various challenges, especially in the age of Big Data Analytics and emerging technologies such as Hadoop.  Data mining is another technique organizations embrace to extract value from data, with various techniques including supervised and unsupervised algorithms.  Like the data warehouse, data mining helps organizations gain a competitive edge, but it too is confronted with various challenges.  Organizations should analyze each technique before embracing the technology in order to understand the benefits as well as the challenges.

References

Ally, S. S., & Khan, N. (2016, 15-17 Dec. 2016). Data Warehouse and BI to Catalize Information Use in Health Sector for Decision Making: A Case Study. Paper presented at the 2016 International Conference on Computational Science and Computational Intelligence (CSCI).

Che, D., Safran, M., & Peng, Z. (2013). From Big Data to Big Data Mining: Challenges, Issues, and Opportunities. Paper presented at the International Conference on Database Systems for Advanced Applications.

Connolly, T., & Begg, C. (2015). Database Systems: A Practical Approach to Design, Implementation, and Management (6th ed.): Pearson.

Coronel, C., & Morris, S. (2016). Database systems: design, implementation, & management: Cengage Learning.

Dubitzky, W. (2008). Data Mining in Grid Computing Environments: John Wiley & Sons.

Guohong, G., Lijun, X., Junhui, F., & Peixin, Q. (2010). The building of Customer Relationship Management system based on OLAP. Paper presented at the Industrial Mechatronics and Automation (ICIMA), 2010 2nd International Conference on.

Hall, P., Dean, J., Kabul, I. K., & Silva, J. (2014). An Overview of Machine Learning with SAS® Enterprise Miner™. SAS Institute Inc.

Han, J., Pei, J., & Kamber, M. (2011). Data mining: concepts and techniques: Elsevier.

Hand, D. J., Mannila, H., & Smyth, P. (2001). Principles of data mining.

Linoff, G. S., & Berry, M. J. (2011). Data mining techniques: for marketing, sales, and customer relationship management: John Wiley & Sons.

Rygielski, C., Wang, J.-C., & Yen, D. C. (2002). Data mining techniques for customer relationship management. Technology in society, 24(4), 483-502.

The Importance of Information Control for Ethical Reasons

Dr. O. Aly
Computer Science

There are several areas of information ethics in which the control of information is crucial.  Four such areas are privacy, accuracy, property, and accessibility (PAPA).  The purpose of this discussion is to address these critical areas in the context of the importance of controlling information for ethical reasons.  The discussion begins with the four building blocks of ethical issues (PAPA), followed by the control of information and the security measures enterprises must follow to protect the privacy of users, which has been a significant concern in the IS domain.

Ethical Issues Building Blocks

In the 1990s, computer ethics became a favorite topic in the research community.  One virus and worm attack, "ILoveYou," dramatically amplified the computer ethics dilemma (Harris, 2000).  The estimated damage of this virus reached $10 billion worldwide, mostly in lost work time, and the FBI estimates billions of dollars are lost to computer crimes.  This virus raised a red flag for serious ethical issues faced by computer users and IT professionals.  The Internet has increased the seriousness of the ethical issues involved in using information systems and computers (Harris, 2000).

The information system (IS) is becoming boundless as organizations attempt to reduce costs, increase efficiency, and develop strategic competitive advantages (Pearlson & Saunders, 2001).  However, these advantages exist in a business domain that lacks moral clarity.  Enterprises are under pressure to evaluate their current information systems with more focus on ethical issues, yet the building blocks of ethical computing issues have not been clear to many computer and information system users.

In the age of information systems, computers, the Internet, and the digital world, (Mason, 2015) indicated that many unique challenges exist stemming from the nature of information. Although many ethical issues exist, Mason focused on four major ones: privacy, accuracy, property, and accessibility (PAPA).  Figure 1 shows these four building blocks with their related critical questions.

Privacy is defined in today's information-oriented world as the ability of the individual to personally control information about self (Pearlson & Saunders, 2001).  Privacy has been a significant issue around the globe, as users are concerned about revealing and disclosing information that they do not want made public or shared with other entities (Mason, 2015; Pearlson & Saunders, 2001).

Accuracy represents the correctness of the information. When the information presented does not reflect reality, it can cause serious issues. (Mason, 2015; Pearlson & Saunders, 2001) referred to a bank case where a customer made a payment on a mortgage that was not recorded in the bank's system, and the bank eventually foreclosed on the house.  This example shows how much harm inaccurate information can cause individuals.

Property concerns the ownership of the data.  The question of intellectual property rights is one of the most complex issues.  Organizations collect information about customers, users, and employees, and the data gets stored either internally or in the cloud.  Who owns the data is the central question of the property ethical issue (Mason, 2015; Pearlson & Saunders, 2001).

Accessibility raises the question of what information a person or organization has the right to access and obtain, under what conditions, and with what safeguards (Mason, 2015; Pearlson & Saunders, 2001).


Figure 1.  PAPA Ethical Issues Model based on (Mason, 2015; Pearlson & Saunders, 2001).

Control of Information and Security Measures

(Abernathy & McMillan, 2016) defined personally identifiable information (PII) as information that can be used alone or with other information to identify a single person. PII includes full name, identification numbers such as a driver's license or social security number, date of birth, and so forth. Enterprises must ensure that they understand international, national, state, and local regulations and laws regarding PII. Figure 2 shows the magnitude of personal data.


Figure 2.  PII Complex List of Personal Data (Abernathy & McMillan, 2016).

Various regulations and policies have been established around the world to protect the privacy of individuals (Abernathy & McMillan, 2016; Pearlson & Saunders, 2001).  In the U.S., privacy legislation includes the 1974 Privacy Act, which regulates the government's collection and use of personal information, and the 1998 Children's Online Privacy Protection Act, which regulates the online collection and use of children's personal information. Other regulations are industry-based legislation to protect the privacy of individuals, such as the Gramm-Leach-Bliley Act of 1999 and the Health Insurance Portability and Accountability Act (HIPAA) of 1996.  The Gramm-Leach-Bliley Act of 1999 was issued because banks were selling sensitive customer information, such as social security numbers and credit card purchase histories, to telemarketing companies.  This law has mitigated the sharing of such sensitive information with other entities.  HIPAA was issued to safeguard the privacy and security of electronically exchanged information in the healthcare industry. Patients' records must be protected from unauthorized access, manipulation, and transmission (Abernathy & McMillan, 2016; Pearlson & Saunders, 2001).

Various studies have discussed the ethical issues in the information system domain (Harris, 2000; Kuzu, 2009; Ponelis, 2013).  Organizations are under pressure to protect the privacy of users in the age of information systems, computers, and the Internet.  They should limit inappropriate access to customers' information to respect the privacy of their customers, users, and employees (Pearlson & Saunders, 2001).  Security measures must be implemented to ensure appropriate data protection so that unauthorized access and malicious attacks can be prevented and mitigated.  These security measures include firewalls, authentication, authorization, access control, and encryption.  At the network level, when data is moving from one system to another, security measures include the Secure Sockets Layer (SSL) protocol, the Transport Layer Security (TLS) protocol, secure IP (IPSec), secure HTTP (HTTPS), and secure email (S/MIME) (Kuzu, 2009). When using cloud computing, the security measures to protect the privacy and integrity of the data are more complicated, as cloud computing has different service models and different deployment models (Kumar, Ranjan, & Gangwar, 2012). Organizations must evaluate the options for selecting the appropriate security measures not only to protect themselves from outrageous fines and penalties but also to protect the privacy of users.
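To illustrate one of these network-level measures, the following minimal Python sketch opens a TLS-protected connection before any application data is sent. The host name is a hypothetical placeholder, and a production deployment would add whatever certificate policy the organization requires; this is a sketch of the technique, not a prescribed implementation.

```python
import socket
import ssl

# Hypothetical endpoint used purely for illustration.
HOST, PORT = "records.example-hospital.org", 443

# A default client context validates the server certificate against the
# system trust store and negotiates a modern protocol version.
context = ssl.create_default_context()
context.minimum_version = ssl.TLSVersion.TLSv1_2  # refuse legacy protocols

with socket.create_connection((HOST, PORT)) as raw_sock:
    with context.wrap_socket(raw_sock, server_hostname=HOST) as tls_sock:
        print("Negotiated protocol:", tls_sock.version())
        # Application data sent through tls_sock is encrypted in transit.
        tls_sock.sendall(b"GET / HTTP/1.1\r\nHost: " + HOST.encode() + b"\r\n\r\n")
```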

Conclusion

This discussion addressed critical ethical issues using the privacy, accuracy, property, and accessibility (PAPA) model of (Mason, 2015). These ethical issues raise the flag to protect data from unauthorized access, from the sharing of private information, and from any malicious attacks that can cause data loss or a data breach.  Enterprises are under pressure to ensure the protection of users' information. Various security measures can be implemented at various levels of the information system for data at rest or data in motion.  For data in motion, security measures such as SSL, HTTPS, and IPSec can be implemented.  For data at rest, security measures can include encryption and access control.  Organizations should take additional security measures into consideration to control access to information, especially when using cloud computing.

References

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

Harris, A. L. (2000). IS ethical attitudes among college students: A comparative study.

Kumar, A., Ranjan, A., & Gangwar, U. (2012). An understanding approach towards cloud computing. International Journal of Emerging Technology and Advanced Engineering, 2(9).

Kuzu, A. (2009). Problems Related to Computer Ethics: Origins of the Problems and Suggested Solutions. Online Submission, 8(2).

Mason, R. O. (2015). Four ethical issues of the information age. In Computer Ethics (pp. 41-48). Routledge.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Ponelis, S. (2013). Ethical risks of social media use by academic libraries. Innovation: journal of appropriate librarianship and information work in Southern Africa, 2013(47), 231-244.

Business Analytics: Big Data Challenges

Dr. O. Aly
Computer Science

The purpose of this discussion is to address Big Data (BD) and the challenges associated with BD in the context of business analytics. The discussion begins with a brief overview of Big Data and Big Data Analytics, followed by the challenges. The cloud computing solution is also discussed, as well as the role of BD in ERP.

Big Data Brief Overview

Big Data is now the buzzword in the fields of computer science and information technology.  Big Data has attracted the attention of various sectors, researchers, academia, government, and even the media (Géczy, 2014; Kaisler, Armour, Espinosa, & Money, 2013).   The 2011 report of the International Data Corporation (IDC) estimated that the amount of information created and replicated in 2011 would exceed 1.8 zettabytes (1.8 trillion gigabytes), an amount growing by a factor of nine in just five years (Gantz & Reinsel, 2011).  Big Data and Big Data Analytics are terms that have been used interchangeably (Maltby, 2011).  Big Data has unique characteristics that make it challenging to handle with traditional technology.

Big Data (BD) has been characterized by what is often referred to as a multi-V model comprising variety, velocity, volume, veracity, and value (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015). While variety represents the data types, velocity reflects the rate at which the data is produced and processed (Assunção et al., 2015).  Volume defines the amount of data, and veracity reflects how much the data can be trusted given the reliability of its source. Value, on the other hand, represents the monetary worth that organizations can derive from adopting Big Data computing. Figure 1 summarizes these characteristics.

Figure 1.  Big Data Multi-V Model (Assunção et al., 2015).

The variety characteristic of Big Data reflects the data types (Assunção et al., 2015). The data types are further categorized into structured, unstructured, semi-structured, and mixed. Structured data follows a formal schema and data model, unstructured data has no pre-defined data model, semi-structured data lacks a strict data model, and mixed data, as the term indicates, combines various types (Assunção et al., 2015). Figure 2 summarizes these data types in Big Data.

Figure 2.  Variety Characteristic of Big Data (Assunção et al., 2015).

The velocity characteristic of Big Data represents the speed of arrival and processing of the data, which has been categorized into batch, near-time, real-time, and streams (Assunção et al., 2015). Batch reflects processing at time intervals, while near-time refers to small time intervals.  Real-time, on the other hand, represents continuous input, processing, and output, while streams refer to continuous data flows (Assunção et al., 2015). Figure 3 summarizes these categories of the velocity feature of Big Data.

Figure 3.  Velocity Characteristic of Big Data (Assunção et al., 2015).

Big Data Challenges

With these characteristics of Big Data, including the growth rate, challenges and issues have come along (Jagadish et al., 2014; Meeker & Hong, 2014; Misra, Sharma, Gulia, & Bana, 2014; Nasser & Tariq, 2015; Zhou, Chawla, Jin, & Williams, 2014). The growth rate in the amount of data is regarded as a significant challenge for IT researchers and practitioners designing systems that handle the data effectively and analyze it to extract relevant meaning for decision-making (Kaisler et al., 2013). Various challenges and issues of Big Data have been discussed and analyzed in multiple research studies, such as data storage, data management, and data processing (Fernández et al., 2014; Kaisler et al., 2013); Big Data variety, Big Data integration and cleaning, Big Data reduction, Big Data query and indexing, and Big Data analysis and mining (J. Chen et al., 2013).

Extracting meaningful value from Big Data is a significant challenge (Fernández et al., 2014; Sagiroglu & Sinanc, 2013).  Three factors must be taken into consideration to create value from Big Data (Chopra & Madan, 2015): user control over the data, taking security issues seriously, and examining safety points on a yearly basis.  (Chopra & Madan, 2015) suggested that businesses and organizations that follow those factors will distinguish themselves by gaining market initiative.  Other research studies, such as (Labrinidis & Jagadish, 2012), suggested that the value obtained from the analysis of the data is broadly recognized, but the analysis itself is regarded as difficult due to the challenging characteristics of Big Data. Still other studies, such as (Assunção et al., 2015; Chopra & Madan, 2015), have indicated that the complexity of Big Data prevents organizations from realizing its benefits and causes businesses to step back from Big Data deployment and implementation.

Big Data Analytics and Cloud Computing Solution

The challenges of BD and BDA, such as data storage, data management, data processing, and data-intensive computational requirements, required solutions, as traditional technology was found inadequate (Fernández et al., 2014; Hu, Wen, Chua, & Li, 2014).  As indicated above, one of the significant challenges is extracting meaningful value from BD.  BD and BDA require advanced and unique data storage, management, analysis, intensive computing, and visualization technologies (H. Chen, Chiang, & Storey, 2012; J. Chen et al., 2013).   The emerging technology of cloud computing has been meeting these requirements and serving as a solution and platform for BD and BDA challenges.

Cloud computing plays a significant role in Big Data Analytics (Assunção et al., 2015).  The massive computation and storage requirements of BD and BDA create a critical need for cloud computing (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016). Cloud computing is currently the biggest buzz in information technology, the computer science industry, and the distributed computing community (Dhanani, 2014; Saini & Sharma, 2014). It is being positioned as the “next wave of computing” (Mvelase, Dlodlo, Makitla, Sibiya, & Adigun, 2012, p. 214).   The use of cloud computing technology in conjunction with data has been the more recent trend for BDA (Wang, Kung, & Byrd, 2018).  Organizations have increasingly adopted BD and BDA in the cloud, particularly the Software-as-a-Service (SaaS) cloud service model, which offers an attractive, lower-cost alternative (Wang et al., 2018).  Cloud computing technology for BDA systems, supporting real-time analytic capability and cost-effective storage, is becoming a preferred information technology solution (Wang et al., 2018).  Cloud computing technology is the solution and the answer to the challenges of BD and BDA (Fernández et al., 2014).  Organizations and businesses are under pressure to quickly adopt and implement technologies such as cloud computing to address the challenges of Big Data (Hashem et al., 2015).

Big Data Analytics Role in ERP

Big Data Analytics plays a significant role in ERP applications (Carlton, 2014; ERP Solutions, 2018; Woodie, 2016).  Enterprise data spans departments such as HR, finance, CRM, and other essential business functions.  This data can be leveraged to make ERP functionality better.  When Big Data tools are brought together with the ERP system, they can unfold valuable insights that help businesses make smarter decisions (Carlton, 2014; Cornell University, 2017; Wailgum, 2018). Many ERP systems fail to make use of real-time inventory and supply chain data because these systems lack the intelligence to make predictions about product demand (Carlton, 2014; ERP Solutions, 2018). Big Data tools can predict demand and help determine the needs of the organization going forward (ERP Solutions, 2018).  Infor co-president Duncan Angove established Dynamic Science Labs (DSL) aiming to use data science techniques to solve particular business problems for its customers. Employees with big data, math, and coding skills were hired at the Cambridge, Massachusetts-based organization to develop proofs of concept (POC) (Woodie, 2016).  Big Data systems such as Apache’s Hadoop are creating node-level operating transparency that affects nearly every current ERP module in real time (Carlton, 2014).  Managers will be able to quickly leverage ERP Big Data capabilities, thereby enhancing information density and speeding up overall decision-making. In brief, Big Data and Big Data Analytics impact business at all levels, and ERP is no exception.
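As a deliberately simplified illustration of demand prediction over ERP data, the following Python sketch applies a naive moving-average forecast to a toy demand series; the numbers and the pandas-based approach are illustrative assumptions, not a description of any particular ERP product's predictive engine.

```python
import pandas as pd

# Hypothetical daily demand history for one stock item; in a real ERP
# integration these rows would come from inventory/transaction modules.
demand = pd.Series(
    [120, 135, 128, 150, 160, 155, 170],
    index=pd.date_range("2018-01-01", periods=7, freq="D"),
)

# A naive 3-day moving average stands in for a predictive model: the last
# window's mean becomes the forecast for the next day.
forecast = demand.rolling(window=3).mean().iloc[-1]
print(f"Forecast for next day: {forecast:.1f} units")
```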

Conclusion

Big Data (BD) and Big Data Analytics (BDA) have been buzzwords across various communities, from academia and research to practitioners, the media, and government.  BD has been characterized by features such as volume, variety, and velocity, which formed the first V-model of BD.  Traditional technology and systems were found inadequate to handle BD.  The explosive growth of data in various forms, such as structured, unstructured, and semi-structured, together with the speed of that growth and the required speed of processing, demanded technologies that can deal with these unique characteristics.  The emerging technology of cloud computing was found to provide a solution for storage and computation when applying BD and BDA.  Other technologies include Hadoop, MapReduce, Spark, and so forth.  BD and BDA play a crucial role in Enterprise Resource Planning (ERP). Organizations are under pressure to take advantage of BD and BDA to become and stay competitive in the digital world and the era of Big Data and Big Data Analytics.

References

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Carlton, R. (2014). 5 Ways Big Data is Changing ERP Software. Retrieved from https://www.erpfocus.com/five-ways-big-data-is-changing-erp-software-2733.html.

Chen, H., Chiang, R. H. L., & Storey, V. C. (2012). Business Intelligence and Analytics: From Big Data to Big Impact. MIS Quarterly, 36(4), 1165-1188.

Chen, J., Chen, Y., Du, X., Li, C., Lu, J., Zhao, S., & Zhou, X. (2013). Big Data Challenge: a Data Management Perspective. Frontiers of Computer Science, 7(2), 157-164. doi:10.1007/s11704-013-3903-7

Chopra, A., & Madan, S. (2015). Big Data: A Trouble or A Real Solution? International Journal of Computer Science Issues, 12(2), 221.

Cornell University. (2017). Enterprise Information Systems. Retrieved from https://it.cornell.edu/strategic-plan/enterprise-information-systems. 

Dhanani, M. (2014). Cloud Security: Privacy and Data Protection. Department of Computer Science and Software Engineering, University of Canterbury, New Zealand.

ERP Solutions. (2018). The Role of Big Data Analytics in ERP Applications. Retrieved from https://erpsolutions.oodles.io/big-data-analytics-in-erp/. 

Fernández, A., Del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M., & Herrera, F. (2014). Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(5), 380-409. doi:10.1002/widm.1134

Gantz, J., & Reinsel, D. (2011). Extracting Value From Chaos. International Data Corporation, 1142, 1-12.

Géczy, P. (2014). Big data characteristics. The Macrotheme Review, 3(6), 94-104.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. Practical Innovation, Open Solution, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., & Shahabi, C. (2014). Big Data and Its Technical Challenges. Communications of the Association for Computing Machinery, 57(7), 86-94. doi:10.1145/2611567

Kaisler, S., Armour, F., Espinosa, J. A., & Money, W. (2013). Big Data: Issues and Challenges Moving Forward. Paper presented at the Hawaii International Conference on System Sciences.

Labrinidis, A., & Jagadish, H. V. (2012). Challenges and Opportunities with Big Data. International Conference on Very Large Data Bases, 5(12), 2032-2033.

Maltby, D. (2011). Big Data Analytics. Paper presented at the Annual Meeting of the Association for Information Science and Technology.

Meeker, W., & Hong, Y. (2014). Reliability Meets Big Data: Opportunities and Challenges. Quality Engineering, 26(1), 102-116. doi:10.1080/08982112.2014.846119

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. Institute of Electrical and Electronic Engineers, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Misra, A., Sharma, A., Gulia, P., & Bana, A. (2014). Big Data: Challenges and Opportunities. International Journal of Innovative Technology and Exploring Engineering, 4(2).

Mvelase, P., Dlodlo, N., Makitla, I., Sibiya, G., & Adigun, M. (2012). An Architecture Based on SOA and Virtual Enterprise Principles: OpenNebula for Cloud Deployment, Reading.

Nasser, T., & Tariq, R. S. (2015). Big Data Challenges. Journal of Computer Engineering & Information Technology, 9307, 1-10. doi:10.4172/2324

Sagiroglu, S., & Sinanc, D. (2013). Big Data: A Review. Paper presented at the International Conference: Collaboration Technologies and Systems.

Saini, G., & Sharma, N. (2014). Triple Security of Data in Cloud Computing. International Journal of Computer Science and Information Technologies, 5(4), 5825-5827.

Wailgum, T. (2018). What is CRM? Software for Managing Customer Data. Retrieved from https://www.cio.com/article/2439505/customer-relationship-management/customer-relationship-management-crm-definition-and-solutions.html.

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Woodie, A. (2016). Making ERP Better with Big Data. Retrieved from https://www.datanami.com/2016/07/08/making-erp-better-big-data/.

Zhou, Z., Chawla, N., Jin, Y., & Williams, G. (2014). Big Data Opportunities and Challenges: Discussions from Data Analytics Perspectives. Institute of Electrical and Electronic Engineers: Computational Intelligence Magazine, 9(4), 62-74.

Proposal: State-of-the-Art Healthcare System in Four States

Dr. O. Aly
Computer Science

Abstract

The purpose of this proposal is to design a state-of-the-art healthcare system in the four states of Colorado, Utah, Arizona, and New Mexico.   Big Data (BD) and Big Data Analytics (BDA) have played significant roles in various industries, including the healthcare industry.  The value driven by BDA can save lives and minimize costs for patients.  The project proposes a design to apply BD and BDA in the healthcare system across these four states.  Cloud computing is the most appropriate technology for handling the large volume of healthcare data at both the storage and data processing levels.  Due to the security issues of cloud computing, a Virtual Private Cloud (VPC) will be used.  The VPC provides a secure cloud environment by controlling network traffic with security groups and network access control lists.   The project requires other components to be fully implemented using the latest technology, such as Hadoop and MapReduce for data stream processing and machine learning for artificial intelligence, which will also support the Internet of Things (IoT).  The NoSQL databases HBase and MongoDB will be used to handle semi-structured data such as XML and unstructured data such as logs and images.  Spark will be used for real-time data processing, which can be vital for urgent care and emergency services.  This proposal addresses the assumptions and limitations as well as the justification for selecting these specific components.  All stakeholders in the healthcare sector, including providers, insurers, pharmaceutical companies, and practitioners, should cooperate and coordinate to facilitate the implementation process.  The rigid culture and silo pattern need to change for better healthcare, which can save millions of dollars for the healthcare industry while providing excellent care to patients.

Keywords: Big Data Analytics; Hadoop; Healthcare Big Data System; Spark.

Introduction

            In the age of Big Data (BD), information technology plays a significant role in the healthcare industry (HIMSS, 2018).  The healthcare sector generates a massive amount of data every day to conform to standards and regulations (Alexandru, Alexandru, Coardos, & Tudora, 2016).  The generated Big Data has the potential to support many medical and healthcare operations, including clinical decision support, disease surveillance, and population health management (Alexandru et al., 2016). This project proposes a state-of-the-art integrated system for hospitals located in Arizona, Colorado, New Mexico, and Utah.  The system is based on the Hadoop ecosystem to help the hospitals maintain and improve human health via diagnosis, treatment, and disease prevention.

The proposal begins with a Big Data Analytics in Healthcare Overview, which covers the benefits and challenges of BD and BDA in the healthcare industry.  The overview also covers the various healthcare data sources for data analytics, in different formats such as semi-structured (e.g., XML and JSON) and unstructured (e.g., images and X-rays).  The second section addresses the Healthcare BDA Design Proposal Using Hadoop. This section covers various components, the first of which discusses the requirements for the design.  These requirements include state-of-the-art technology such as Hadoop/MapReduce, Spark, NoSQL databases, Artificial Intelligence (AI), and the Internet of Things (IoT).  The project also provides various diagrams, including the data flow diagram, a communication flow chart, and the overall system diagram.  The healthcare system design is bounded by regulations, policies, and governance, such as HIPAA, which are also covered in this project.  The justification, limitations, and assumptions are discussed as well.

Big Data Analytics in Healthcare Overview

BD and BDA are terms that have been used interchangeably and described as the next frontier for innovation, competition, and productivity (Maltby, 2011; Manyika et al., 2011).  BD has a multi-V model with unique characteristics: volume refers to the large datasets, velocity to the speed of computation as well as data generation, and variety to the various data types, such as semi-structured and unstructured (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015; Hu, Wen, Chua, & Li, 2014).  Various industries, including healthcare, have taken this opportunity and applied BD and BDA in their business models (Manyika et al., 2011).  The McKinsey Institute predicted $300 billion as a potential annual value to US healthcare (Manyika et al., 2011).

The healthcare industry generates extensive data, driven by keeping patients’ records, complying with regulations and policies, and caring for patients (Raghupathi & Raghupathi, 2014).  The current trend is digitalizing this explosively growing data in the age of Big Data (BD) and Big Data Analytics (BDA) (Raghupathi & Raghupathi, 2014).  BDA has revolutionized healthcare by transforming information into the knowledge needed to predict epidemics, cure diseases, improve quality of life, and avoid preventable deaths (Van-Dai, Chuan-Ming, & Nkabinde, 2016).  Applications of BDA in healthcare include pervasive health, fraud detection, pharmaceutical discovery, clinical decision support systems, computer-aided diagnosis, and biomedical applications.

Healthcare Big Data Benefits and Challenges

            The healthcare sector employs BDA in various aspects of care, such as detecting diseases at early stages, providing evidence-based medicine, minimizing doses of medication to avoid side effects, and delivering useful medicine based on genetic analysis.  The use of BD and BDA can reduce the re-admission rate, thereby reducing healthcare-related costs for patients.  Healthcare BDA can detect spreading diseases early, before they spread widely, using real-time analytics (Archenaa & Anita, 2015; Raghupathi & Raghupathi, 2014; Wang, Kung, & Byrd, 2018).   An example of the application of BDA in the healthcare system is Kaiser Permanente’s implementation of HealthConnect to ensure data exchange across all medical facilities and promote the use of electronic health records (Fox & Vaidyanathan, 2016).

            Despite the various benefits of BD and BDA in the healthcare sector, various challenges and issues are emerging from the application of BDA in healthcare.  The nature of the healthcare industry poses challenges to BDA (Groves, Kayyali, Knott, & Kuiken, 2016).  The episodic culture, the data puddles, and IT leadership are the three significant challenges the healthcare industry faces in applying BDA.  The episodic culture refers to the conservative culture of healthcare and the lack of an IT technology mindset, creating a rigid culture.  Few providers have overcome this rigid culture and started to use BDA technology. The data puddles reflect the silo nature of healthcare.  The silo is described as one of the most significant flaws in the healthcare sector (Wicklund, 2014).  Proper use of technology is lacking in the healthcare sector, causing the industry to fall behind other industries. Each silo uses its own methods to collect data from labs, diagnosis, radiology, emergency, case management, and so forth.  IT leadership is another challenge, caused by the rigid culture of the healthcare industry.  The lack of familiarity with the latest technologies among IT leadership in the healthcare industry is a severe problem.

Healthcare Data Sources for Data Analytics

            The current healthcare data is collected from clinical and non-clinical sources (InformationBuilders, 2018; Van-Dai et al., 2016; Zia & Khan, 2017).  Electronic healthcare records are digital copies of patients’ medical histories.  They contain a variety of data relevant to patient care, such as demographics, medical problems, medications, body mass index, medical history, laboratory test data, radiology reports, clinical notes, and payment information. These electronic healthcare records are the most important data in healthcare data analytics because they provide effective and efficient methods for providers and organizations to share data (Botta, de Donato, Persico, & Pescapé, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016; Wang et al., 2018).

Biomedical imaging data plays a crucial role in healthcare, aiding disease monitoring, treatment planning, and prognosis.  This data can be used to generate quantitative information and make inferences from the images that provide insights into a medical condition.  Image analytics is more complicated due to the noise associated with the image data, which is one of the significant limitations of biomedical analysis (Ji, Ganchev, O’Droma, Zhang, & Zhang, 2014; Malik & Sangwan, 2015; Van-Dai et al., 2016).

Sensing data is ubiquitous in the medical domain, both for real-time and historical data analysis.  Sensing data comes from several forms of medical data collection instruments, such as the electrocardiogram (ECG) and electroencephalogram (EEG), which are vital sensors for collecting signals from various parts of the human body.  Sensing data plays a significant role in intensive care units (ICU) and in real-time remote monitoring of patients with specific conditions such as diabetes or high blood pressure.  The real-time and long-term analysis of trends and treatment in remote monitoring programs can help providers monitor the state of patients with such conditions (Van-Dai et al., 2016).

Biomedical signals are collected from many sources, such as the heart, blood pressure, oxygen saturation levels, blood glucose, nerve conduction, and brain activity.  Examples of biomedical signals include the electroneurogram (ENG), electromyogram (EMG), electrocardiogram (ECG), electroencephalogram (EEG), electrogastrogram (EGG), and phonocardiogram (PCG).  Real-time analytics of biomedical signals will provide better management of chronic diseases, earlier detection of adverse events such as heart attacks and strokes, and earlier diagnosis of disease.   These biomedical signals can be discrete or continuous based on the kind of care or the severity of a particular pathological condition (Malik & Sangwan, 2015; Van-Dai et al., 2016).

Genomic data analysis helps in better understanding the relationships among various genes, mutations, and disease conditions. It has great potential in the development of gene therapies to cure certain conditions.  Furthermore, genomic data analytics can assist in translating genetic discoveries into personalized medicine practice (Liang & Kelemen, 2016; Luo, Wu, Gopukumar, & Zhao, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016).

Clinical text data analytics using data mining is the process of transforming information from clinical notes, stored in unstructured formats, into useful patterns.  The manual coding of clinical notes is costly and time-consuming because of their unstructured nature, heterogeneity, and differing formats and contexts across patients and practitioners.  Methods such as natural language processing (NLP) and information retrieval can be used to extract useful knowledge from large volumes of clinical text and automatically encode clinical information in a timely manner (Ghani, Zheng, Wei, & Friedman, 2014; Sun & Reddy, 2013; Van-Dai et al., 2016).
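As a deliberately simplified illustration of extracting structure from clinical text, the Python sketch below pulls drug-dose mentions out of a note with a single regular expression. Real clinical NLP pipelines rely on trained models and curated vocabularies (e.g., RxNorm), so both the pattern and the sample note here are toy assumptions.

```python
import re

note = ("Patient reports improved glucose control. "
        "Continue metformin 500 mg twice daily; start lisinopril 10 mg daily.")

# Toy pattern for drug-dose mentions: a word followed by a numeric dose
# and a unit. A production system would use a trained NLP model instead.
pattern = re.compile(r"([A-Za-z]+)\s+(\d+)\s*(mg|mcg|g)\b", re.IGNORECASE)

for drug, dose, unit in pattern.findall(note):
    print(f"medication={drug}, dose={dose} {unit}")
```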

Social network healthcare data analytics is based on various kinds of collected social media sources, such as social networking sites (e.g., Facebook, Twitter, and web logs), to discover new patterns and knowledge that can be leveraged to model and predict global health trends, such as outbreaks of infectious epidemics (InformationBuilders, 2018; Luo et al., 2016; Van-Dai et al., 2016; Zia & Khan, 2017). Figure 1 shows a summary of these healthcare data sources.


Figure 1.  Healthcare Data Sources.

Healthcare Big Data Analytics Design Proposal Using Hadoop

            The implementation of BDA in the hospitals within the four states aims to improve patient safety and clinical outcomes while promoting wellness and disease management (Alexandru et al., 2016; HIMSS, 2018).  The BDA system will take advantage of the large healthcare-generated data to provide various applied analytical disciplines spanning the statistical, contextual, quantitative, predictive, and cognitive spectrums (Alexandru et al., 2016; HIMSS, 2018).  These applied analytical disciplines will drive fact-based decision-making for planning, management, and learning in hospitals (Alexandru et al., 2016; HIMSS, 2018).

            The proposal begins with the requirements, followed by the data flow diagram, the communication flowcharts, and the overall system diagram.  The proposal addresses the regulations, policies, and governance for the medical system.  The limitation and assumptions are also addressed in this proposal, followed by the justification for the overall design.

1.      Basic Design Requirements

The basic requirements for the implementation of this proposal include not only the tools and required software but also training at all levels, from staff to nurses to clinicians to patients.  The list of requirements is divided into system requirements, implementation requirements, and training requirements.

1.1 Cloud Computing Technology Adoption Requirement

Volume is one of the significant characteristics of BD, especially in the healthcare industry (Manyika et al., 2011).  Based on the challenges addressed earlier when dealing with BD and BDA in healthcare, the system requirements cannot be met using a traditional on-premise data center, as it cannot handle the intensive computation requirements of BD or the storage requirements for all the medical information from the hospitals in the four states (Hu et al., 2014). Thus, the cloud computing environment is the more appropriate solution for the implementation of this proposal.  Cloud computing plays a significant role in BDA (Assunção et al., 2015).  The massive computation and storage requirements of BDA create a critical need for the emerging technology of cloud computing (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016).  Cloud computing offers various benefits such as cost reduction, elasticity, pay per use, availability, reliability, and maintainability (Gupta, Gupta, & Mohania, 2012; Kritikos, Kirkham, Kryza, & Massonet, 2017).  However, although cloud computing offers various benefits, it has security and privacy issues under the standard deployment models of public, private, hybrid, and community clouds.  Thus, one of the major requirements is to adopt the Virtual Private Cloud, as it has been regarded as the most prominent approach to trusted computing technology (Abdul, Jena, Prasad, & Balraju, 2014).

1.2 Security Requirement

Cloud computing has been facing various threats (Cloud Security Alliance, 2013, 2016, 2017).   Records show that over the three years from 2015 through 2017, the numbers of breaches, lost medical records, and settlements of fines were staggering (Thompson, 2017).  The Office of Civil Rights (OCR) issued 22 resolution agreements, requiring monetary settlements approaching $36 million (Thompson, 2017).  Table 1 shows the data categories and the total for each year.

Table 1.  Approximation of Records Lost by Category Disclosed on HHS.gov (Thompson, 2017)

Furthermore, a recent report published by HIPAA showed that the first three months of 2018 saw 77 healthcare data breaches reported to the OCR (HIPAA, 2018d).  In the second quarter of 2018, at least 3.14 million healthcare records were exposed (HIPAA, 2018a).  In the third quarter of 2018, 4.39 million records were exposed in 117 breaches (HIPAA, 2018c).

Thus, the protection of patients’ private information requires technology to extract, analyze, and correlate potentially sensitive datasets (HIPAA, 2018b).  The implementation of BDA requires security measures and safeguards to protect patients’ privacy in the healthcare industry (HIPAA, 2018b).  Sensitive data should be encrypted to prevent its exposure in the event of theft (Abernathy & McMillan, 2016).  The security requirements involve security at the VPC cloud deployment model as well as at the local hospitals in each state (Regola & Chawla, 2013).  Security at the VPC level should involve the implementation of security groups and network access control lists to grant the right individuals access to the right applications and patient records.  A security group in the VPC acts as the first line of defense, a firewall for the associated instances of the VPC (McKelvey, Curran, Gordon, Devlin, & Johnston, 2015).  The network access control lists act as the second layer of defense, a firewall for the associated subnets, controlling the inbound and outbound traffic at the subnet level (McKelvey et al., 2015).
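The following Python sketch shows what these two layers of defense might look like when the VPC is provisioned on an AWS-style cloud through boto3. The VPC and network ACL IDs, the group name, and the CIDR range are placeholders, and the single HTTPS rule only illustrates the pattern; it is not the proposal's final rule set.

```python
import boto3

# Assumes AWS credentials and region are configured in the environment.
ec2 = boto3.client("ec2")

# First line of defense: a security group allowing HTTPS only.
sg = ec2.create_security_group(
    GroupName="ehr-app-sg",
    Description="Allow HTTPS to the EHR application tier",
    VpcId="vpc-0123456789abcdef0",            # placeholder VPC ID
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/16"}],  # internal subnets only
    }],
)

# Second line of defense: a network ACL rule at the subnet boundary.
ec2.create_network_acl_entry(
    NetworkAclId="acl-0123456789abcdef0",     # placeholder ACL ID
    RuleNumber=100,
    Protocol="6",                              # TCP
    RuleAction="allow",
    Egress=False,                              # inbound rule
    CidrBlock="10.0.0.0/16",
    PortRange={"From": 443, "To": 443},
)
```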

Security at the local hospital level in each state is mandatory to protect patient records and comply with HIPAA regulations (Regola & Chawla, 2013).  Medical equipment must be secured with authentication and authorization techniques so that only medical staff, nurses, and clinicians have access to the medical devices based on their roles.  General access should be prohibited, as every member of the hospital has a different role with different responsibilities.  Encryption should be used to hide the meaning or intent of communication from unintended users (Stewart, Chapple, & Gibson, 2015).   Encryption is an essential element of security control, especially for data in transit (Stewart et al., 2015).  The hospitals in all four states should implement the same encryption security controls, such as PKI, cryptographic applications, and symmetric key algorithms (Stewart et al., 2015).
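As a minimal sketch of symmetric-key encryption for a record at rest, assuming the cryptography Python package, the example below encrypts and decrypts a single record with Fernet. In practice the key would live in a key management service shared across the four states, never beside the data; the record contents here are fabricated.

```python
from cryptography.fernet import Fernet

# Generate a symmetric key; in production this comes from a KMS, not code.
key = Fernet.generate_key()
cipher = Fernet(key)

record = b"MRN:000123|glucose:5.8 mmol/L|bp:120/80"   # fabricated record
token = cipher.encrypt(record)        # ciphertext safe to store on disk
restored = cipher.decrypt(token)      # only key holders can recover it
assert restored == record
```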

The system requirements should also include identity management systems that correspond with the hospitals in each state. The identity management system provides authentication and authorization techniques, allowing access to patients’ medical records only to those who should have it.  The proposal requires the implementation of various encryption techniques, such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Internet Protocol Security (IPSec), to protect information transferred over public networks (Zhang & Liu, 2010).

1.3 Hadoop Implementation for Data Stream Processing Requirement

While the velocity of BD leads to the generation of large volumes of data and requires speed in data processing (Hu et al., 2014), the variety of the data requires specific technology capabilities to handle various types of datasets, such as structured, semi-structured, and unstructured data (Bansal, Deshpande, Ghare, Dhikale, & Bodkhe, 2014; Hu et al., 2014).  The Hadoop ecosystem has been found to be the most appropriate system for implementing BDA (Bansal et al., 2014; Dhotre, Shimpi, Suryawanshi, & Sanghati, 2015).  The implementation requirements include various technologies and tools.  This section covers the components that are required when implementing Hadoop technology in the four states for the healthcare BDA system.

Hadoop has three significant limitations, which must be addressed in this design.  The first limitation is the lack of technical support and documentation for open source Hadoop (Guo, 2013).   Thus, this design requires an Enterprise Edition of Hadoop to get around this limitation, using Cloudera, Hortonworks, or MapR (Guo, 2013). The final product decision will be determined by the cost analysis team.  The second limitation is that Hadoop is not optimal for real-time data processing (Guo, 2013). The solution for this limitation is the integration of a real-time streaming program such as Spark, Storm, or Kafka (Guo, 2013; Palanisamy & Thirunavukarasu, 2017). The requirement of integrating Spark is discussed below in a separate requirement for this design (Guo, 2013). The third limitation is that Hadoop is not a good fit for large graph datasets (Guo, 2013). The solution for this limitation requires the integration of GraphLab, which is also discussed below in a separate requirement for this design.

1.3.1 Hadoop Ecosystem for Data Processing

Hadoop technologies have been front-runners for Big Data applications (Bansal et al., 2014; Chrimes, Zamani, Moa, & Kuo, 2018).  The Hadoop ecosystem will be part of the implementation requirement, as it has proven to serve intensive computation using large datasets well (Raghupathi & Raghupathi, 2014; Wang et al., 2018).   The implementation of Hadoop technology will be performed in the VPC deployment model.  The required Hadoop version is 2.x, which includes YARN for resource management (Karanth, 2014).  Hadoop 2.x also includes HDFS snapshots, which provide a read-only image of the entire filesystem or a particular subset of it to protect against user errors and to support backup and disaster recovery (Karanth, 2014). The Hadoop platform can be implemented to gain more insight into various areas (Raghupathi & Raghupathi, 2014; Wang et al., 2018). The Hadoop ecosystem involves the Hadoop Distributed File System (HDFS), MapReduce, and NoSQL databases such as HBase and Hive to handle large volumes of datasets, using various algorithms and machine learning to extract value from medical records that are structured, semi-structured, and unstructured (Raghupathi & Raghupathi, 2014; Wang et al., 2018).  Other components supporting the Hadoop ecosystem include Oozie for workflow, Pig for scripting, and Mahout for machine learning, which is part of artificial intelligence (AI) (Ankam, 2016; Karanth, 2014).  The Hadoop ecosystem will also include Flume as a log collector, Sqoop for data exchange, and ZooKeeper for coordination (Ankam, 2016; Karanth, 2014).  HCatalog is a required component to manage the metadata in Hadoop (Ankam, 2016; Karanth, 2014).   Figure 2 shows the Hadoop ecosystem before integrating Spark for real-time analytics.


Figure 2.  Hadoop Architecture Overview (Alguliyev & Imamverdiyev, 2014).
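To make the MapReduce layer concrete, the sketch below shows a Hadoop Streaming job in Python that counts admissions per diagnosis code. The comma-separated input layout, the file names, and the HDFS paths are illustrative assumptions rather than part of the proposal.

```python
#!/usr/bin/env python3
# mapper.py -- emits "diagnosis_code<TAB>1" for each admission record.
# Assumed input layout per line: patient_id,admit_date,diagnosis_code
import sys

for line in sys.stdin:
    fields = line.rstrip("\n").split(",")
    if len(fields) >= 3:
        print(f"{fields[2]}\t1")
```

```python
#!/usr/bin/env python3
# reducer.py -- sums the counts per diagnosis code; Hadoop Streaming
# delivers mapper output sorted by key, so codes arrive grouped.
import sys

current_code, count = None, 0
for line in sys.stdin:
    code, value = line.rstrip("\n").split("\t")
    if code != current_code and current_code is not None:
        print(f"{current_code}\t{count}")
        count = 0
    current_code = code
    count += int(value)
if current_code is not None:
    print(f"{current_code}\t{count}")
```

The job would then be submitted with the streaming jar, for example: hadoop jar hadoop-streaming.jar -input /ehr/admissions -output /ehr/dx_counts -mapper mapper.py -reducer reducer.py (paths assumed for illustration).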

1.3.2 Hadoop-specific File Format for Splittable and Agnostic Compression

The ability to split files plays a significant role during data processing (Grover, Malaska, Seidman, & Shapira, 2015).  Therefore, Hadoop-specific file formats such as SequenceFile, serialization formats like Avro, and columnar formats such as RCFile and Parquet should be used, because these formats share two characteristics that are essential for Hadoop applications: splittable compression and agnostic compression (Grover et al., 2015).  Hadoop allows large files to be split for input to MapReduce and other types of jobs, which is required for parallel processing and is a key to leveraging Hadoop's data locality feature (Grover et al., 2015). Agnostic compression means the data can be compressed with any compression codec without readers having to know the codec in advance, because the codec is stored in the header metadata of the file format (Grover et al., 2015).  Figure 3 summarizes the three Hadoop file types with the two common characteristics.


Figure 3. Three Hadoop File Types with the Two Common Characteristics.  
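For instance, a columnar file with block-level compression can be produced with pyarrow as sketched below; the lab-result columns and the Snappy codec are illustrative, the point being that the codec is recorded in the file's metadata, so readers need no prior knowledge of it.

```python
import pyarrow as pa
import pyarrow.parquet as pq

# Illustrative lab results written as a Parquet columnar file. Row groups
# keep the file splittable; the codec lives in file metadata (agnostic
# compression), so any reader can decompress without configuration.
table = pa.table({
    "patient_id": [1001, 1002, 1003],
    "test": ["HbA1c", "LDL", "HbA1c"],
    "value": [6.1, 3.4, 5.7],
})
pq.write_table(table, "lab_results.parquet", compression="snappy")

print(pq.read_table("lab_results.parquet").to_pydict())
```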

1.3.3 XML and JSON Use in Hadoop

Clinical data includes semi-structured formats such as XML and JSON.  The split process for XML and JSON is not straightforward and can present unique challenges in Hadoop, since Hadoop does not provide a built-in InputFormat for either format (Grover et al., 2015).  Furthermore, JSON presents more challenges to Hadoop than XML because no token is available to mark the beginning or end of a record (Grover et al., 2015). When using these file formats, two primary considerations must be taken into account.  A container format such as Avro should be used, because Avro provides a compact and efficient method to store and process the data once it is transformed into Avro (Grover et al., 2015).  A library for processing XML or JSON should also be used (Grover et al., 2015).  XMLLoader in the PiggyBank library for Pig is an example for the XML data type, and the Elephant Bird project is an example for the JSON data type (Grover et al., 2015).
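A minimal sketch of the Avro-container approach, assuming the fastavro Python package: JSON records are parsed and rewritten into an Avro file whose embedded schema makes it compact and splittable for downstream jobs. The schema and sample records are fabricated for illustration.

```python
import json
from fastavro import writer, reader

# Toy schema for a clinical-note reference; real schemas would be richer.
schema = {
    "type": "record",
    "name": "ClinicalNoteRef",
    "fields": [
        {"name": "patient_id", "type": "long"},
        {"name": "note", "type": "string"},
    ],
}

json_lines = ['{"patient_id": 1, "note": "stable"}',
              '{"patient_id": 2, "note": "follow-up in 2 weeks"}']
records = [json.loads(line) for line in json_lines]

# Write the records into an Avro container with the schema embedded.
with open("notes.avro", "wb") as out:
    writer(out, schema, records)

# Any reader recovers the records without external schema knowledge.
with open("notes.avro", "rb") as fo:
    for rec in reader(fo):
        print(rec)
```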

1.4 HBase and MongoDB NoSQL Database Integration Requirement

In the age of BD and BDA, the traditional data store has been found inadequate to handle not only the large volume of datasets but also the various data formats, such as unstructured and semi-structured (Hu et al., 2014).   Thus, Not Only SQL (NoSQL) databases emerged to meet the requirements of BDA.  These NoSQL data stores are used for modern, scalable databases (Sahafizadeh & Nematbakhsh, 2015).  The scalability feature of NoSQL data stores enables systems to increase throughput when demand increases during data processing (Sahafizadeh & Nematbakhsh, 2015).  The platform can incorporate two scalability types to support large volumes of data: horizontal and vertical scalability.  Horizontal scaling distributes the workload across many servers and nodes to increase throughput, while vertical scaling requires more processors, more memory, and faster hardware installed on a single server (Sahafizadeh & Nematbakhsh, 2015).

NoSQL data stores come in many varieties, such as MongoDB, CouchDB, Redis, Voldemort, Cassandra, BigTable, Riak, HBase, Hypertable, ZooKeeper, Vertica, Neo4j, db4o, and DynamoDB.  These data stores are categorized into four types: document-oriented, column-oriented (column-family) stores, graph databases, and key-value stores (EMC, 2015; Hashem et al., 2015). The document-oriented data store can store and retrieve collections of data and documents using complex data forms in various formats, such as XML and JSON as well as PDF and MS Word (EMC, 2015; Hashem et al., 2015).  MongoDB and CouchDB are examples of document-oriented data stores (EMC, 2015; Hashem et al., 2015).  The column-oriented data store holds content in columns rather than rows, with the attributes of the columns stored contiguously (Hashem et al., 2015).  This type of data store can store and render blog entries, tags, and feedback (Hashem et al., 2015).  Cassandra, DynamoDB, and HBase are examples of column-oriented data stores (EMC, 2015; Hashem et al., 2015).  The key-value store can hold and scale large volumes of data and contains a value and a key to access that value (EMC, 2015; Hashem et al., 2015).  The value can be complicated, but this type of data store can be useful for storing a user’s login ID as the key referencing the value for patients.  Redis and Riak are examples of key-value NoSQL data stores (Alexandru et al., 2016).  Each of these NoSQL data stores has its limitations and advantages.  The graph NoSQL database can store and represent data using graph models with nodes, edges, and properties related to one another through relations, which will be useful for unstructured medical data such as images and lab results. Neo4j is an example of this type of graph NoSQL database (Hashem et al., 2015).  Figure 4 summarizes these NoSQL data store types, the data they store, and examples.

Figure 4.  Big Data Analytics NoSQL Data Store Types.

The proposed design requires one or more NoSQL data stores to meet the requirements of BDA in the Hadoop environment for this healthcare BDA system.  Healthcare big data has unique characteristics that must be addressed when selecting the data store, and consideration must be given to the various types of data.   HBase and HDFS are the commonly used storage managers in the Hadoop environment (Grover et al., 2015).  HBase is a column-oriented data store that will be used to store multi-structured data (Archenaa & Anita, 2015).  HBase sits on top of HDFS in the Hadoop ecosystem framework (Raghupathi & Raghupathi, 2014).
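A minimal sketch of how an application tier might write to and read from such an HBase table, assuming the happybase client over an HBase Thrift gateway; the host name, table name, row key, and column families are illustrative assumptions.

```python
import happybase

# Connect to an assumed HBase Thrift gateway.
connection = happybase.Connection("hbase-thrift.example.org")
table = connection.table("patient_records")

# Cells are addressed as column-family:qualifier; values are bytes.
table.put(b"patient#1001", {
    b"demo:name": b"Jane Doe",
    b"labs:hba1c": b"6.1",
})

row = table.row(b"patient#1001")
print(row[b"labs:hba1c"])
connection.close()
```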

MongoDB will also be used to store semi-structured datasets such as XML and JSON, as well as metadata for the HBase data schema, to improve the accessibility and readability of that schema (Luo et al., 2016).  Riak will be used for key-value datasets, which can serve dictionaries, hash tables, and associative arrays holding login and user ID information for patients as well as for providers and clinicians (Klein et al., 2015).  Neo4j will be used to store images with nodes and edges, such as lab images and X-rays (Alexandru et al., 2016).
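As a sketch of the MongoDB role described above, the following pymongo snippet stores a small schema-metadata document and reads it back; the connection string, database, and collection names are assumptions for illustration only.

```python
from pymongo import MongoClient

# Assumed local MongoDB instance and naming; adjust for the real cluster.
client = MongoClient("mongodb://localhost:27017/")
collection = client["healthcare"]["hbase_schema_metadata"]

# Semi-structured (JSON-like) document describing an HBase table layout.
doc = {
    "table": "patient_records",
    "column_families": {
        "demo": ["name", "dob"],
        "labs": ["hba1c", "ldl"],
    },
}
inserted = collection.insert_one(doc)
print(collection.find_one({"_id": inserted.inserted_id}))
```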

The proposed healthcare system has a logical data model and query patterns that need to be supported by the NoSQL databases (Klein et al., 2015). In the data model, reading patients' medical test results is a core function used to populate the user interface. The model also requires strong replica consistency when a new medical result is written for a patient, since providers make patient care decisions using these records.  All providers will be able to see the same information within the hospital systems in the four states, whether they are at the same site as the patients or providing telemedicine support from another location.

The logical data model includes mapping the application-specific model onto the particular data model, indexing, and query language capabilities of each database.  The HL7 Fast Healthcare Interoperability Resources (FHIR) standard is used as the logical data model for records analysis.  Patient data such as demographic information (names, addresses, and telephone numbers) will be modeled using the FHIR Patient resource, and test results will carry attributes such as result quantity and result units (Klein et al., 2015).
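To show what such a mapping looks like, the sketch below builds a minimal FHIR-style Patient resource as a Python dictionary. The field names follow the HL7 FHIR Patient resource, but the values are fabricated examples, and the snippet is not a validated FHIR document.

```python
# Minimal FHIR-style Patient resource; values are fabricated examples.
patient = {
    "resourceType": "Patient",
    "id": "example-1001",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "telecom": [{"system": "phone", "value": "555-0100"}],
    "address": [{"city": "Denver", "state": "CO"}],
    "birthDate": "1970-05-01",
}

# A test result would be a separate Observation resource referencing this
# patient, carrying result quantity and units, e.g.:
observation_value = {"value": 6.1, "unit": "%"}  # illustrative valueQuantity

print(patient["name"][0]["family"], observation_value)
```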

1.5 Spark Integration for Real-Time Data Processing Requirement

While the architecture of the Hadoop ecosystem has been designed for scenarios such as data storage, data management, statistical analysis, statistical association between various data sources, distributed computing, and batch processing, this proposal requires real-time data processing, which cannot be met by Hadoop alone (Basu, 2014).  Real-time analytics will add tremendous value to the proposed healthcare system.  Thus, Apache Spark is another component required to implement this proposal (Basu, 2014).  Spark allows in-memory processing for fast response times, bypassing MapReduce operations (Basu, 2014).  With Spark integrated with Hadoop, stream processing, machine learning, interactive analytics, and data integration become possible (Scott, 2015).  Spark will run on top of Hadoop to benefit from YARN and the underlying storage of HDFS, HBase, and other Hadoop ecosystem building blocks (Scott, 2015).  Figure 5 shows the core engines of Spark.


Figure 5. Spark Core Engines (Scott, 2015).
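A minimal PySpark Structured Streaming sketch of the kind of real-time processing intended here: it averages heart-rate readings per patient over one-minute windows. The socket source, port, and comma-separated line format (patient_id,heart_rate,timestamp) are illustrative assumptions; a production pipeline would more likely read from Kafka.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, col, split, window

spark = SparkSession.builder.appName("VitalsStream").getOrCreate()

# Assumed test source: newline-delimited "patient_id,heart_rate,timestamp".
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

parts = split(lines["value"], ",")
vitals = lines.select(
    parts.getItem(0).alias("patient_id"),
    parts.getItem(1).cast("double").alias("heart_rate"),
    parts.getItem(2).cast("timestamp").alias("ts"),
)

# Average heart rate per patient over tumbling 1-minute event-time windows.
per_minute = (vitals
              .groupBy(window(col("ts"), "1 minute"), col("patient_id"))
              .agg(avg("heart_rate").alias("avg_hr")))

query = (per_minute.writeStream
         .outputMode("complete")
         .format("console")
         .start())
query.awaitTermination()
```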

1.6 Big Healthcare Data Visualization Requirement

Visualization is one of the most powerful ways to present data (Jayasingh, Patra, & Mahesh, 2016).  It helps in viewing the data in a more meaningful way, in the form of graphs, images, and pie charts that can be understood easily.  It also helps in synthesizing large volumes of data, such as healthcare data, to get at the core of the raw big data and convey its key points for insight (Meyer, 2018).  Commercial visualization tools include Tableau, Spotfire, QlikView, and Adobe Illustrator.  However, the most commonly used visualization tools in healthcare are Tableau, PowerBI, and QlikView. This healthcare design proposal will utilize Tableau.

Healthcare providers are successfully transforming data from information to insight using Tableau software.  Healthcare organizations can utilize three approaches to get more from healthcare datasets.  The first approach is to break down barriers to data access by empowering healthcare departments to explore their own data.  The second approach is to uncover answers with data from multiple systems to reveal trends and outliers.  The third approach is to share insights with executives, providers, and others to drive collaboration (Tableau, 2011).  Tableau has several advantages, including interactive visualization using drag-and-drop techniques, handling large amounts of data and millions of rows with ease, and integration with scripting languages such as Python (absentdata.com, 2018).  It also provides mobile support and responsive dashboards.  The limitation of Tableau is that it requires substantial training to fully master the platform, among other limitations including the lack of automatic refreshing, limited conditional formatting, and a 16-column table limit (absentdata.com, 2018).   Figure 6 shows a Patient Cycle Time data visualization using Tableau software.


Figure 6. Patient Cycle Time Data Visualization Example (Tableau, 2011).

1.7 Artificial Intelligence Integration Requirement

Artificial Intelligence (AI) is a computational technique that allows machines to perform cognitive functions, such as acting or reacting to input, similar to the way humans do (Patrizio, 2018).  Traditional computing applications react to data, and their reactions and responses must be hand-coded with human intervention (Patrizio, 2018).  AI systems are continuously in flux, changing their behavior to accommodate changes in results and modifying their reactions accordingly (Patrizio, 2018). AI techniques can include video recognition, natural language processing, speech recognition, machine learning engines, and automation (Mills, 2018).

The healthcare system can benefit from the integration of BDA with Artificial Intelligence (AI) (Bresnick, 2018).  Since AI can play a significant role in BDA in the healthcare system, this proposal suggests the implementation of machine learning, which is part of AI, to deploy more precise and impactful interventions at the right time in patient care (Bresnick, 2018).  The application of AI in the proposed design requires machine learning (Patrizio, 2018).  Since the data used in AI and machine learning has already been cleaned of duplicates and unnecessary data, AI can take advantage of this filtered data, leading to many healthcare breakthroughs, such as genomic and proteomic experiments that enable personalized medicine (Kersting & Meyer, 2018).

The healthcare industry has been utilizing AI, machine learning (ML), and data mining (DM) to extract value from BD by transforming large medical datasets into actionable knowledge through predictive and prescriptive analytics (Palanisamy & Thirunavukarasu, 2017).   ML will be used to develop sophisticated algorithms that process massive medical datasets, including structured, unstructured, and semi-structured data, and perform advanced analytics (Palanisamy & Thirunavukarasu, 2017).  Apache Mahout, an open-source ML library, will be integrated with Hadoop to facilitate the execution of scalable machine learning algorithms, offering techniques such as recommendation, classification, and clustering (Palanisamy & Thirunavukarasu, 2017).
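
Mahout itself runs on the JVM; to keep this document's examples in a single language, the following PySpark MLlib sketch illustrates the same kind of scalable classification workflow rather than Mahout's own API.  The readmission-risk framing, feature columns, and toy labels are invented for illustration.

```python
from pyspark.ml.classification import LogisticRegression
from pyspark.ml.feature import VectorAssembler
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("readmission-risk").getOrCreate()

# Hypothetical structured features extracted from medical records.
records = spark.createDataFrame(
    [(72, 150.0, 1, 1.0), (34, 110.0, 0, 0.0),
     (65, 160.0, 1, 1.0), (25, 105.0, 0, 0.0)],
    ["age", "systolic_bp", "diabetic", "readmitted"])

# Assemble raw columns into the single feature vector MLlib expects.
assembler = VectorAssembler(
    inputCols=["age", "systolic_bp", "diabetic"], outputCol="features")
train = assembler.transform(records)

# Train a classifier that scales out across the cluster, analogous to
# Mahout's distributed classification algorithms.
model = LogisticRegression(labelCol="readmitted").fit(train)
model.transform(train).select("readmitted", "prediction").show()
```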

1.8 Internet of Things (IoT) Integration Requirement

The Internet of Things (IoT) refers to the growing number of connected devices with IP addresses, which were not common years ago (Anand & Clarice, 2015; Thompson, 2017).  These connected devices collect data and use their IP addresses to transmit information (Thompson, 2017).  Providers in healthcare take advantage of the collected information to find new treatment methods and increase efficiency (Thompson, 2017).

The implementation of IoT will involve various technologies, including radio frequency identification (RFID), near field communication (NFC), machine to machine (M2M), wireless sensor networks (WSN), and addressing schemes (AS) (IPv6 addresses) (Anand & Clarice, 2015; Kumari, 2017).  The implementation of IoT also requires machine learning algorithms to find patterns, correlations, and anomalies that have the potential of enabling healthcare improvements (O’Brien, 2016).  Machine learning is a critical component of artificial intelligence; thus, the success of IoT depends on AI implementation. 
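
As a minimal sketch of that machine-learning step, the following trains an anomaly detector on simulated remote-monitoring vitals using scikit-learn; the feature set, values, and contamination rate are illustrative assumptions, not part of the cited IoT designs.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)

# Simulated IoT readings: [heart_rate, blood_glucose] from remote monitors.
normal = rng.normal(loc=[75, 100], scale=[8, 12], size=(500, 2))
model = IsolationForest(contamination=0.01, random_state=0).fit(normal)

# New readings arriving from connected devices; the last one is abnormal.
incoming = np.array([[78, 104], [71, 96], [140, 240]])
flags = model.predict(incoming)  # -1 marks an anomaly
for reading, flag in zip(incoming, flags):
    if flag == -1:
        print("alert: anomalous vitals", reading)
```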

1.9 Training Requirement

This design proposal requires training for IT professionals, providers, clinicians, and others who will use this healthcare ecosystem, depending on their roles (Alexandru et al., 2016; Archenaa & Anita, 2015). Each component of the ecosystem should have its own training, such as training for Hadoop/MapReduce, Spark, and security.  Training will play a significant role in the success of this design implementation to apply BD and BDA in the healthcare system in the four States of Colorado, Utah, Arizona, and New Mexico.   Patients should be included in training for remote monitoring programs such as blood sugar and blood pressure monitoring applications.  The senior generation might face some challenges; however, with technical support, these challenges can be alleviated.

2.      Data Flow Diagram

            This section discusses the data flow of the proposed healthcare ecosystem design for the application of BDA. 

2.1 HBase Cluster and HDFS Data Flow

HBase stores data in table schemas with specified column families (Yang, Liu, Hsu, Lu, & Chu, 2013).  The table schema must be predefined, and the column families must be specified; however, new columns can be added to families as required, making the schema flexible and able to adapt to changing application requirements (Yang et al., 2013).   HBase is structured similarly to HDFS, with a NameNode and slave nodes, and to MapReduce, with a JobTracker and TaskTracker slaves (Yang et al., 2013).  HBase will play a vital role in the Hadoop cluster environment.  In HBase, a master node called the HMaster manages the cluster, while region servers store portions of the tables and perform the work on the data. The HMaster acts as the Master Server, is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.  This master executes on the NameNode in the distributed Hadoop cluster environment.  The HRegionServer represents the RegionServer and is responsible for serving and managing regions.  The RegionServer runs on a DataNode in the distributed Hadoop cluster environment.   ZooKeeper helps elect another machine within the cluster as HMaster in case of a failure, unlike the HDFS framework, where the NameNode is a single point of failure.  Thus, the data flow between the DataNodes and the NameNodes when integrating HBase on top of HDFS is shown in Figure 7.  


Figure 7.  HBase Cluster Data Flow (Yang et al., 2013).
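
A brief sketch of this schema style, using the happybase Python client over HBase's Thrift gateway (Thrift being one of the interfaces HBase supports); the host, table, and column names are hypothetical.  The row key combines a patient identifier and a timestamp so that a prefix scan returns one patient's readings in time order.

```python
import happybase

# Connect through the HBase Thrift gateway (hostname is hypothetical).
connection = happybase.Connection("hbase-master.example.org")

# One column family 'v' for physiological values; new columns can be added
# later without changing the schema, as described above.
connection.create_table("patient_vitals", {"v": dict(max_versions=1)})
table = connection.table("patient_vitals")

# Row key = patient id + timestamp.
table.put(b"patient42:20181201T101500",
          {b"v:heart_rate": b"78", b"v:spo2": b"97"})

# Scan one patient's readings in time order.
for key, data in table.scan(row_prefix=b"patient42:"):
    print(key, data)
```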

2.2 HBase and MongoDB with Hadoop/MapReduce and HDFS Data Flow

The healthcare system integrates four significant components: HBase, MongoDB, MapReduce, and visualization.  HBase is used for data storage, MongoDB for metadata, Hadoop’s MapReduce for computation, and a data visualization tool for presentation.  The signal data will be stored in HBase, while the metadata and other clinical data will be stored in MongoDB.  The data stored in both HBase and MongoDB will be accessible from the Hadoop/MapReduce environment for processing and from the data visualization layer as well.   The cluster will consist of one master node, eight slave nodes, and several supporting servers.   The data will be imported into Hadoop and processed via MapReduce.  The result of the computational process will be viewed through a data visualization tool such as Tableau.  Figure 8 shows the data flow between these four components of the proposed healthcare ecosystem.


Figure 8.  The Proposed Data Flow Between Hadoop/MapReduce and Other Databases.

2.3 XML Design Flow Using ETL Process with MongoDB 

Healthcare records contain various types of data, from structured and semi-structured to unstructured (Luo et al., 2016).   Some of these healthcare records are XML-based records in a semi-structured, tag-based format.  XML stands for eXtensible Markup Language (Fawcett, Ayers, & Quin, 2012).  The healthcare sector can derive value from these XML documents, which reflect semi-structured data (Aravind & Agrawal, 2014).  An example of these XML-based patient records is shown in Figure 9.


Figure 9.  Example of the Patient’s Electronic Health Record (HL7, 2011)

XML-based records need to be ingested into the Hadoop system for analytical purposes to derive value from this semi-structured XML-based data.   However, although XML is one of the standard file formats used with MapReduce, Hadoop does not offer a standard XML “RecordReader” (Lublinsky, Smith, & Yakubovich, 2013).  Various approaches can be used to process XML semi-structured data.  The ETL (Extract, Transform, and Load) process can be used to process XML data in Hadoop.  MongoDB, a NoSQL database required in this design proposal, handles document-oriented data such as XML. 

The ETL process in MongoDB starts with extract and transform.  The MongoDB application provides the ability to map the XML elements within a document to the downstream data structure.  The application supports the ability to unwind simple arrays or present embedded documents using appropriate data relationships such as one-to-one (1:1), one-to-many (1:M), or many-to-many (M:M) (MongoDB, 2018).  The application infers the schema by examining a subset of documents within the target collections.  Organizations can add fields to the discovered data model that may not have been present within the subset of documents used for schema inference.  The application infers information about the existing indexes for collections to be queried, and it prompts or warns about queries that do not use any indexed fields.  The application can return a subset of fields from documents using query projections.  For queries against MongoDB Replica Sets, the application supports specifying custom MongoDB Read Preferences for individual query operations.  The application also infers information about the sharded cluster deployment and notes the shard key fields for each sharded collection.  For queries against MongoDB Sharded Clusters, the application warns against queries that do not use proper query isolation, since broadcast queries in a sharded cluster can have a negative impact on database performance (MongoDB, 2018). 
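
Two of these behaviors, query projection and per-operation read preferences against a replica set, can be sketched directly with the pymongo driver; the hosts, database, and field names below are hypothetical.

```python
from pymongo import MongoClient, ReadPreference

client = MongoClient(
    "mongodb://mongo0.example.org,mongo1.example.org/?replicaSet=rs0")
db = client["healthcare"]

# Route this analytical query to a secondary so the primary keeps serving
# clinical writes -- a per-operation read preference, as described above.
records = db.get_collection(
    "patients", read_preference=ReadPreference.SECONDARY_PREFERRED)

# A projection returns only the needed fields instead of whole documents.
for doc in records.find({"state": "CO"},
                        {"_id": 0, "patient_id": 1, "last_visit": 1}):
    print(doc)
```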

The load process in MongoDB is performed after the extract and transform process.  The application supports writing data to any MongoDB deployment, whether a single node, a replica set, or a sharded cluster.  For writes to a MongoDB Sharded Cluster, the application informs the user or displays an error message if XML documents do not contain a shard key.  A custom WriteConcern can be used for any write operations to a running MongoDB deployment.  For bulk loading, documents can be written in batches using the insert() method with MongoDB version 2.6 or above, which supports the bulk update database command. For bulk loading into a MongoDB sharded deployment, bulk insert into a sharded collection is supported, including pre-splitting the collection’s shard key and inserting via multiple mongos processes.   Figure 10 shows this ETL process for XML-based patient records using MongoDB.


Figure 10.  The Proposed XML ETL Process in MongoDB.
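
The following is a minimal end-to-end sketch of this ETL flow in Python: extract with the standard-library XML parser, transform elements into documents, and bulk load in one batch with pymongo's insert_many.  The XML tags, field names, and host are hypothetical, not taken from the HL7 example above.

```python
import xml.etree.ElementTree as ET
from pymongo import MongoClient

# Extract: parse a (hypothetical) batch file of XML patient records.
root = ET.parse("patients.xml").getroot()

# Transform: map XML elements onto the downstream document structure.
docs = []
for patient in root.iter("patient"):
    docs.append({
        "patient_id": patient.findtext("id"),
        "name": patient.findtext("name"),
        # Unwind a simple repeated element into an embedded array.
        "diagnoses": [d.text for d in patient.iter("diagnosis")],
    })

# Load: bulk insert in one batch; ordered=False lets the load continue
# past individual failures. If the target collection is sharded, each
# document must carry the shard key, as noted above.
client = MongoClient("mongodb://mongo0.example.org")
client["healthcare"]["patients_xml"].insert_many(docs, ordered=False)
```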

2.4 Real-Time Streaming Spark Data Flow

Real-time streaming can be implemented using a real-time streaming framework such as Spark, Kafka, or Storm.  This healthcare design proposal will integrate the open-source Spark framework for real-time streaming data, such as sensing data from intensive care units, remote monitoring programs, and biomedical signals. The data from these sources will flow into Spark for analytics and then be imported into the data storage systems.  Figure 11 illustrates the data flow for real-time streaming analytics.

Figure 11.  The Proposed Spark Data Flow.
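
As a sketch of this flow, the following uses Spark's newer Structured Streaming API (rather than the original DStream API) to compute a sliding one-minute average heart rate per patient from JSON sensor files landing in an ingest directory; the schema, HDFS path, and window sizes are illustrative assumptions.

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import avg, window
from pyspark.sql.types import (StructType, StringType, DoubleType,
                               TimestampType)

spark = SparkSession.builder.appName("vitals-stream").getOrCreate()

schema = (StructType()
          .add("patient_id", StringType())
          .add("heart_rate", DoubleType())
          .add("event_time", TimestampType()))

# Stream JSON sensor readings as files land in the ingest directory.
vitals = spark.readStream.schema(schema).json(
    "hdfs://namenode:8020/ingest/vitals")

# One-minute average per patient -- the kind of real-time signal an ICU
# dashboard would watch; the watermark bounds how late data may arrive.
alerts = (vitals
          .withWatermark("event_time", "2 minutes")
          .groupBy(window("event_time", "1 minute"), "patient_id")
          .agg(avg("heart_rate").alias("avg_hr")))

query = alerts.writeStream.outputMode("update").format("console").start()
query.awaitTermination()
```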

3.      Communication Workflow

The communication flow involves the stakeholders in the healthcare system. These stakeholders include providers, insurers, pharmaceutical vendors, and IT professionals and practitioners.  The communication flow is centered on the patient-centric healthcare system, which uses cloud computing technology for the four States of Colorado, Utah, Arizona, and New Mexico, where these stakeholders are located.  The patient-centric healthcare system is the central point of communication.  Patients communicate with the central system through the web-based platform and clinical forums as needed.  Providers communicate with the patient-centric healthcare system through resource usage, patient feedback, hospital visits, and service details.  Insurers communicate with the central system through claims databases and census and societal data. Pharmaceutical vendors communicate with the central system through prescription and drug reports, which providers can retrieve from anywhere in the four states. IT professionals and practitioners communicate with the central system for data streaming, medical records, genomics, and all-omics data analysis and reporting.  Figure 12 shows the communication flow between these stakeholders and the central system in the cloud, which can be accessed from any of the four identified States.

Figure 12.  The Proposed Patient-Centric Healthcare System Communication Flow.

4.      Overall System Diagram

The overall system represents a state-of-the-art healthcare ecosystem that utilizes the latest technology for healthcare Big Data Analytics. The system is bounded by regulations and policies such as HIPAA to ensure the protection of patients’ privacy across its various layers.  The integrated components include the latest Hadoop technology with MapReduce and HDFS.  The data governance layer is the bottom layer, containing three major building blocks:  master data management (MDM), data life-cycle management (DLM), and data security and privacy management.  The MDM component is responsible for data completeness, accuracy, and availability, while the DLM component is responsible for archiving data, maintaining the data warehouse, and handling data deletion and disposal.   The data security and privacy management building block is responsible for sensitive data discovery, vulnerability and configuration assessment, application of security policies, auditing and compliance reporting, activity monitoring, identity and access management, and data protection.  The upper layers include the data layer, the data aggregation layer, the data analytics layer, and the information exploration layer.  The data layer is responsible for data sources and content formats, while the data aggregation layer involves the data acquisition process, transformation engines, and the data storage area using Hadoop, HDFS, and NoSQL databases such as MongoDB and HBase.  The data analytics layer involves the Hadoop/MapReduce mapping process, stream computing, real-time streaming, and database analytics; AI and IoT are part of this layer.  The information exploration layer involves data visualization, visualization reporting, real-time monitoring via a healthcare dashboard, and clinical decision support. Figure 13 illustrates the overall system diagram with these layers.


Figure 13.  The Proposed Healthcare Overall System Diagram.

5.      Regulations, Policies, and Governance for the Medical Industry

Healthcare data must be stored in a secure storage area to protect the information and the privacy of patients (Liveri, Sarri, & Skouloudi, 2015).  When healthcare organizations fail to comply with regulations and policies, the resulting fines and costs can cause financial stress on the industry (Thompson, 2017).  Records show that the healthcare industry has paid millions of dollars in fines.  Advocate Health Care in suburban Chicago agreed to the largest settlement as of August 2016, a total of $5.55 million (Thompson, 2017), and Memorial Health System in southern Florida became the second entity to exceed $5 million (Thompson, 2017). Table 2 shows the five most substantial fines posted to the Office of Civil Rights (OCR) site. 

Table 2.  Five Largest Fines Posted to OCR Web Site (Thompson, 2017)

Hospitals must adhere carefully to data privacy regulations and legislative rules such as HIPAA to protect patients’ medical records from data breaches.  Proper security policy and risk management must be implemented to ensure the protection of private information and to minimize the impact of confidential data loss or theft (HIPAA, 2018a, 2018c; Salido, 2010).  The healthcare system design proposal requires a compliance-monitoring system, with an escalation path, for hospitals or providers that are not compliant with the regulations and policies (Salido, 2010).  This design proposal implements four major principles as best practice to comply with required policies and regulations and to protect the confidential data assets of patients and users (Salido, 2010).  The first principle is to honor policies throughout the life of private data (Salido, 2010).  The second principle is to minimize the risk of unauthorized access or misuse of confidential data (Salido, 2010).  The third principle is to minimize the impact of confidential data loss, while the fourth is to document appropriate controls and demonstrate their effectiveness (Salido, 2010).  Figure 14 shows the four principles this healthcare design proposal adheres to in order to protect healthcare data from unauthorized users and comply with the required regulations and policies. 


Figure 14.  Healthcare Design Proposal Four Principles.

6.      Assumptions and Limitations

This design proposal assumes that the healthcare sector in the four States will support the application of BD and BDA across all four States.  The support includes investment in the proper technology, tools, and training based on the requirements of this design proposal.  The proposal also assumes that the stakeholders, including providers, patients, insurers, pharmaceutical vendors, and practitioners, will welcome the application of BDA and take advantage of it to provide efficient healthcare services, increase productivity, decrease costs for the healthcare sector as well as for patients, and provide better care to patients.

            The limitation of this proposal is the timeframe required to implement it.  With the support of the healthcare sector in these four States, the implementation can be expedited.  However, the siloed and rigid culture of healthcare may interfere with the implementation, which could then take longer than expected.   The initial implementation might face unexpected challenges, most likely stemming from the lack of IT professionals and managers experienced in the BD and BDA domain.  This design proposal will be enhanced based on observations from the first few months of the implementation. 

7.      Justification for the Overall Design

            Traditional database and analytical systems are inadequate for dealing with healthcare data in the age of BDA.  The characteristics of healthcare datasets, including the large volume of medical records, the variety of the data from structured to semi-structured to unstructured, and the velocity of data generation and processing, require technology such as cloud computing (Fernández et al., 2014). Cloud computing has been found to be the best solution for BD and BDA, addressing the challenges of BD storage and compute-intensive processing demands (Alexandru et al., 2016; Hashem et al., 2015).  The healthcare system in the four States will shift communication technology and services for applications across the hospitals and providers (Hashem et al., 2015).  Advantages of cloud computing adoption include virtualized resources, parallel processing, security, and data service integration with scalable data storage (Hashem et al., 2015).  With cloud computing technology, the healthcare sector in the four States will reduce costs and increase efficiency (Hashem et al., 2015).  When quick access to critical patient-care data is required, the mobility of accessing the data from anywhere is one of the most significant advantages of cloud computing adoption as recommended by this proposed design (Carutasu, Botezatu, Botezatu, & Pirnau, 2016). The benefits of cloud computing include technological benefits such as virtualization, multi-tenancy, data and storage, and security and privacy compliance (Chang, 2015).  Cloud computing also offers economic benefits such as pay-per-use, cost reduction, and return on investment (Chang, 2015).  The non-functional benefits of cloud computing cover elasticity, quality of service, reliability, and availability (Chang, 2015).  Thus, the proposed design justifies the use of cloud computing for these several benefits, as cloud computing has proven to be well suited to BDA, especially for healthcare data analytics.

            Although cloud computing offers several benefits to the proposed healthcare system, it has suffered from security and privacy concerns (Balasubramanian & Mala, 2015; Kazim & Zhu, 2015).  The security concerns involve risk areas such as external data storage, dependency on the public internet, lack of control, multi-tenancy, and integration with internal security (Hashizume, Rosado, Fernández-medina, & Fernandez, 2013). Traditional security techniques such as identity, authentication, and authorization are not sufficient for cloud computing environments in their current forms under the standard public and private cloud deployment models (Hashizume et al., 2013).  The increasing trend in security threats and data breaches, and the failure of the current private and public deployment models to meet these security challenges, have triggered the need for another deployment model to ensure security and privacy protection.  Thus, this proposal adopts the VPC, a newer deployment model of cloud computing technology (Botta et al., 2016; Sultan, 2010; Venkatesan, 2012; Zhang, Q., Cheng, & Boutaba, 2010).  The VPC takes advantage of technologies such as the virtual private network (VPN), which will allow hospitals and providers to set up their required network settings, including security (Botta et al., 2016; Sultan, 2010; Venkatesan, 2012; Zhang, Q. et al., 2010).  The VPC deployment model will have dedicated resources with the VPN to provide the isolation required to protect the patients’ information (Botta et al., 2016; Sultan, 2010; Venkatesan, 2012; Zhang, Q. et al., 2010). Thus, this proposed design will use the VPC cloud computing deployment model to store and use healthcare data in a secure and isolated environment that protects the patients’ medical records (Regola & Chawla, 2013).
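
On an AWS-style cloud, the isolation described above is typically expressed as a dedicated VPC plus restrictive security groups.  The following boto3 sketch shows the general shape of such a setup under assumed CIDR ranges and names; a real deployment would also require subnets, network ACLs, and a VPN gateway.

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-west-2")

# An isolated address space dedicated to the healthcare workloads.
vpc = ec2.create_vpc(CidrBlock="10.20.0.0/16")
vpc_id = vpc["Vpc"]["VpcId"]

# A security group that only admits HTTPS from the hospitals' VPN range.
sg = ec2.create_security_group(
    GroupName="phi-tier", Description="PHI access tier", VpcId=vpc_id)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.30.0.0/16"}],  # hypothetical VPN CIDR
    }])
```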

The Hadoop ecosystem is a required component in this proposed design for several reasons.  Hadoop is a commonly used computing paradigm for massive-volume data processing in cloud computing (Bansal et al., 2014; Chrimes et al., 2018; Dhotre et al., 2015).  Hadoop is the only technology that enables large volumes of healthcare data to be stored in their native forms (Dezyre, 2016).  Hadoop has been used to develop better treatments for diseases such as cancer by accelerating the design and testing of effective treatments tailored to patients, expanding genetically based clinical cancer trials, and establishing a national cancer knowledge network to guide treatment decisions (Dezyre, 2016).  With the Hadoop system, hospitals in the four States will be able to monitor patient vitals (Dezyre, 2016).  Children’s Healthcare of Atlanta, which uses the Hadoop ecosystem to treat over six thousand children in its ICU units, is one example (Dezyre, 2016).

The proposed design requires the integration of a NoSQL database because it offers benefits such as mass storage support, fast read and write operations, and easy, low-cost expansion (Sahafizadeh & Nematbakhsh, 2015). HBase is proposed as the required NoSQL database because it is faster when reading more than six million variants, which is required when analyzing large healthcare datasets (Luo et al., 2016).  Besides, a query engine such as SeqWare can be integrated with HBase as needed to help bioinformatics researchers access large-scale whole-genome datasets (Luo et al., 2016).  HBase can store clinical sensor data, where the row key serves as the time stamp of a single value and the column stores the patients’ physiological values corresponding to the row key time stamp (Luo et al., 2016). HBase is a scalable, high-performance, low-cost NoSQL data store that can be integrated with Hadoop, sitting on top of HDFS (Yang et al., 2013). As a column-oriented NoSQL data store that runs on top of HDFS in the Hadoop ecosystem, HBase is well suited to parsing large healthcare datasets (Yang et al., 2013). HBase supports applications written in Avro, REST, and Thrift (Yang et al., 2013).  MongoDB is another NoSQL data store, which will be used to store metadata to improve the accessibility and readability of the HBase data schema (Luo et al., 2016).

The integration of Spark is required to overcome Hadoop’s limitation in real-time data processing, for which Hadoop is not optimal (Guo, 2013).  Thus, Apache Spark is a required component of this proposal so that the healthcare BDA system can process data at rest using batch techniques as well as data in motion using real-time processing techniques (Liang & Kelemen, 2016).  Spark allows in-memory processing for fast response time, bypassing MapReduce operations (Liang & Kelemen, 2016).   Spark integrates tightly with recent Hadoop cluster deployments (Scott, 2015).  While Spark is a powerful tool on its own for processing large volumes of medical and healthcare datasets, it is not well suited for production workloads on its own, largely because it lacks its own distributed storage and resource management.  Thus, the integration of Spark with the Hadoop ecosystem provides many capabilities that neither Spark nor Hadoop can offer alone.

The integration of AI in this proposal is justified by a Harvard Business Review (HBR) examination that identified ten promising AI applications in healthcare (Kalis, Collier, & Fu, 2018). The findings of HBR’s examination showed that the application of AI could create up to $150 billion in annual savings for U.S. healthcare by 2026 (Kalis et al., 2018).  The results also showed that AI currently creates the most value in helping frontline clinicians be more productive and in making back-end processes more efficient (Kalis et al., 2018).   Furthermore, IBM invested $1 billion in AI through the IBM Watson Group, and the healthcare industry is the most significant application area for Watson (Power, 2015).

Conclusion

Big Data and Big Data Analytics have played significant roles in various industries, including healthcare.  The value driven by BDA can save lives and minimize costs for patients.  This project proposes a design to apply BDA in the healthcare system across the four States of Colorado, Utah, Arizona, and New Mexico.  Cloud computing is the most appropriate technology for dealing with the large volume of healthcare data.  Due to the security issues of cloud computing, the Virtual Private Cloud (VPC) will be used.  The VPC provides a secure cloud environment, with network traffic secured using security groups and network access control lists. 

The project requires other components to be fully implemented using the latest technology: Hadoop and MapReduce for batch data processing, and machine learning as the artificial intelligence component that will also underpin the Internet of Things (IoT).  The NoSQL databases HBase and MongoDB will be used to handle semi-structured data such as XML and unstructured data such as logs and images.  Spark will be used for real-time data processing, which can be vital for urgent care and emergency services.  This project has also addressed the assumptions and limitations, plus the justification for selecting these specific components. 

In summary, all stakeholders in the healthcare sector, including providers, insurers, pharmaceutical vendors, and practitioners, should cooperate and coordinate to facilitate the implementation process.  All stakeholders are responsible for facilitating the integration of BD and BDA into the healthcare system.  The rigid culture and silo pattern need to change for a better healthcare system, which can save millions of dollars for the healthcare industry and provide excellent care to patients at the same time.

References

Abdul, A. M., Jena, S., Prasad, S. D., & Balraju, M. (2014). Trusted Environment In Virtual Cloud. International Journal of Advanced Research in Computer Science, 5(4).

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

absentdata.com. (2018). Tableau Advantages and Disadvantages. Retrieved from https://www.absentdata.com/advantages-and-disadvantages-of-tableau/.

Alexandru, A., Alexandru, C., Coardos, D., & Tudora, E. (2016). Healthcare, Big Data and Cloud Computing. management, 1, 2.

Alguliyev, R., & Imamverdiyev, Y. (2014). Big data: big promises for information security. Paper presented at the Application of Information and Communication Technologies (AICT), 2014 IEEE 8th International Conference on.

Anand, M., & Clarice, S. (2015). Artificial Intelligence Meets Internet of Things. Retrieved from http://www.ijcset.net/docs/Volumes/volume5issue6/ijcset2015050604.pdf.

Ankam, V. (2016). Big Data Analytics: Packt Publishing Ltd.

Aravind, P. S., & Agrawal, V. (2014). Processing XML data in BigInsights 3.0. Retrieved from https://developer.ibm.com/hadoop/2014/10/31/processing-xml-data-biginsights-3-0/.

Archenaa, J., & Anita, E. M. (2015). A survey of big data analytics in healthcare and government. Procedia Computer Science, 50, 408-413.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Balasubramanian, V., & Mala, T. (2015). A Review On Various Data Security Issues In Cloud Computing Environment And Its Solutions. Journal of Engineering and Applied Sciences, 10(2).

Bansal, A., Deshpande, A., Ghare, P., Dhikale, S., & Bodkhe, B. (2014). Healthcare data analysis using dynamic slot allocation in Hadoop. International Journal of Recent Technology and Engineering, 3(5), 15-18.

Basu, A. (2014). Real-Time Healthcare Analytics on Apache Hadoop* using Spark* and Shark. Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/big-data-real-time-healthcare-analytics-whitepaper.pdf.

Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud Computing and Internet Of Things: a Survey. Future Generation computer systems, 56, 684-700.

Bresnick, J. (2018). Top 12 Ways Artificial Intelligence Will Impact Healthcare. Retrieved from https://healthitanalytics.com/news/top-12-ways-artificial-intelligence-will-impact-healthcare.

Carutasu, G., Botezatu, M., Botezatu, C., & Pirnau, M. (2016). Cloud Computing and Windows Azure. Electronics, Computers and Artificial Intelligence.

Chang, V. (2015). A Proposed Framework for Cloud Computing Adoption. International Journal of Organizational and Collective Intelligence, 6(3).

Chrimes, D., Zamani, H., Moa, B., & Kuo, A. (2018). Simulations of Hadoop/MapReduce-Based Platform to Support its Usability of Big Data Analytics in Healthcare.

Cloud Security Alliance. (2013). The Notorious Nine: Cloud Computing Top Threats in 2013. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2016). The Treacherous 12: Cloud Computing Top Threats in 2016. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2017). The Treacherous 12 Top Threats to Cloud Computing. Cloud Security Alliance: Top Threats Working Group. 

Dezyre. (2016). 5 Healthcare Applications of Hadoop and Big Data Retrieved from https://www.dezyre.com/article/5-healthcare-applications-of-hadoop-and-big-data/85.

Dhotre, P., Shimpi, S., Suryawanshi, P., & Sanghati, M. (2015). Health Care Analysis Using Hadoop. International Journal of Scientific & Technology Research, 4(12), 279-281.

EMC. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. (1st ed.): Wiley.

Fawcett, J., Ayers, D., & Quin, L. R. (2012). Beginning XML: John Wiley & Sons.

Fernández, A., del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M., & Herrera, F. (2014). Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(5), 380-409. doi:10.1002/widm.1134

Fox, M., & Vaidyanathan, G. (2016). Impacts of Healthcare Big Data:  A Framwork With Legal and Ethical Insights. Issues in Information Systems, 17(3).

Ghani, K. R., Zheng, K., Wei, J. T., & Friedman, C. P. (2014). Harnessing big data for health care and research: are urologists ready? European urology, 66(6), 975-977.

Grover, M., Malaska, T., Seidman, J., & Shapira, G. (2015). Hadoop Application Architectures: Designing Real-World Big Data Applications: O’Reilly Media, Inc.

Groves, P., Kayyali, B., Knott, D., & Kuiken, S. V. (2016). The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation.

Guo, S. (2013). Hadoop operations and cluster management cookbook: Packt Publishing Ltd.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud Computing and Big Data Analytics: What is New From Databases Perspective? Paper presented at the International Conference on Big Data Analytics, Springer-Verlag Berlin Heidelberg.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hashizume, K., Rosado, D. G., Fernández-medina, E., & Fernandez, E. B. (2013). An analysis of security issues for cloud computing. Journal of internet services and applications, 4(1), 1-13. doi:10.1186/1869-0238-4-5

HIMSS. (2018). 2017 Security Metrics: Guide to HIPAA Compliance: What Healthcare Entities and Business Associates Need to Know. Retrieved on 12/1/2018 from http://www.himss.org/file/1318331/download?token=h9cBvnl2. 

HIPAA. (2018a). At Least 3.14 Million Healthcare Records Were Exposed in Q2, 2018. Retrieved 11/22/2018 from https://www.hipaajournal.com/q2-2018-healthcare-data-breach-report/. 

HIPAA. (2018b). How to Defend Against Insider Threats in Healthcare. Retrieved 8/22/2018 from https://www.hipaajournal.com/category/healthcare-cybersecurity/. 

HIPAA. (2018c). Q3 Healthcare Data Breach Report: 4.39 Million Records Exposed in 117 Breaches. Retrieved 11/22/2018 from https://www.hipaajournal.com/q3-healthcare-data-breach-report-4-39-million-records-exposed-in-117-breaches/. 

HIPAA. (2018d). Report: Healthcare Data Breaches in Q1, 2018. Retrieved 5/15/2018 from https://www.hipaajournal.com/report-healthcare-data-breaches-in-q1-2018/. 

HL7. (2011). Patient Example Instance in XML.  

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. Practical Innovation, Open Solution, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

InformationBuilders. (2018). Data In Motion – Big Data Analytics in Healthcare. Retrieved from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/datainmotionbigdataanalytics.pdf, White Paper.

Jayasingh, B. B., Patra, M. R., & Mahesh, D. B. (2016, 14-17 Dec. 2016). Security issues and challenges of big data analytics and visualization. Paper presented at the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I).

Ji, Z., Ganchev, I., O’Droma, M., Zhang, X., & Zhang, X. (2014). A cloud-based X73 ubiquitous mobile healthcare system: design and implementation. The Scientific World Journal, 2014.

Kalis, B., Collier, M., & Fu, R. (2018). 10 Promising AI Applications in Health Care. Retrieved from https://hbr.org/2018/05/10-promising-ai-applications-in-health-care, Harvard Business Review.

Karanth, S. (2014). Mastering Hadoop: Packt Publishing Ltd.

Kazim, M., & Zhu, S. Y. (2015). A Survey on Top Security Threats in Cloud Computing. International Journal Advanced Computer Science and Application, 6(3), 109-113.

Kersting, K., & Meyer, U. (2018). From Big Data to Big Artificial Intelligence? : Springer.

Klein, J., Gorton, I., Ernst, N., Donohoe, P., Pham, K., & Matser, C. (2015, June 27 2015-July 2 2015). Application-Specific Evaluation of NoSQL Databases. Paper presented at the 2015 IEEE International Congress on Big Data.

Kritikos, K., Kirkham, T., Kryza, B., & Massonet, P. (2017). Towards a Security-Enhanced PaaS Platform for Multi-Cloud Applications. Future Generation computer systems, 67, 206-226. doi:10.1016/j.future.2016.10.008

Kumari, W. M. P. (2017). Artificial Intelligence Meets Internet of Things.

Liang, Y., & Kelemen, A. (2016). Big Data Science and its Applications in Health and Medical Research: Challenges and Opportunities. Austin Journal of Biometrics & Biostatistics, 7(3).

Liveri, D., Sarri, A., & Skouloudi, C. (2015). Security and Resilience in eHealth: Security Challenges and Risks. European Union Agency For Network And Information Security.

Lublinsky, B., Smith, K. T., & Yakubovich, A. (2013). Professional hadoop solutions: John Wiley & Sons.

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: a literature review. Biomedical informatics insights, 8, BII. S31559.

Malik, L., & Sangwan, S. (2015). MapReduce Framework Implementation on the Prescriptive Analytics of Health Industry. International Journal of Computer Science and Mobile Computing, 675-688.

Maltby, D. (2011). Big Data Analytics. Paper presented at the Annual Meeting of the Association for Information Science and Technology.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

McKelvey, N., Curran, K., Gordon, B., Devlin, E., & Johnston, K. (2015). Cloud Computing and Security in the Future Guide to Security Assurance for Cloud Computing (pp. 95-108): Springer.

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. Institute of Electrical and Electronic Engineers, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Meyer, M. (2018). The Rise of Healthcare Data Visualization.

Mills, T. (2018). Eight Ways Big Data And AI Are Changing The Business World.

MongoDB. (2018). ETL Best Practice.  

O’Brien, B. (2016). Why The IoT Needs Artificial Intelligence to Succeed.

Palanisamy, V., & Thirunavukarasu, R. (2017). Implications of Big Data Analytics in developing Healthcare Frameworks–A review. Journal of King Saud University-Computer and Information Sciences.

Patrizio, A. (2018). Big Data vs. Artificial Intelligence.

Power, B. (2015). Artificial Intelligence Is Almost Ready for Business.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1.

Regola, N., & Chawla, N. (2013). Storing and Using Health Data in a Virtual Private Cloud. Journal of medical Internet research, 15(3), 1-12. doi:10.2196/jmir.2076

Sahafizadeh, E., & Nematbakhsh, M. A. (2015). A Survey on Security Issues in Big Data and NoSQL. Int’l J. Advances in Computer Science, 4(4), 2322-5157.

Salido, J. (2010). Data Governance for Privacy, Confidentiality and Compliance: A Holistic Approach. ISACA Journal, 6, 17.

Scott, J. A. (2015). Getting Started with Spark: MapR Technologies, Inc.

Stewart, J., Chapple, M., & Gibson, D. (2015). CISSP (ISC)² Certified Information Systems Security Professional Official Study Guide (7th ed.): Wiley.

Sultan, N. (2010). Cloud Computing for Education: A New Dawn? International Journal of Information Management, 30(2), 109-116. doi:10.1016/j.ijinfomgt.2009.09.004

Sun, J., & Reddy, C. (2013). Big Data Analytics for Healthcare. Retrieved from https://www.siam.org/meetings/sdm13/sun.pdf.

Tableau. (2011). Three Ways Healthcare Providers are Transforming Data from Information to Insight. White Paper.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program, Using NIST 800-30 and CSF to Secure Protected Health Information.

Van-Dai, T., Chuan-Ming, L., & Nkabinde, G. W. (2016, 5-7 July 2016). Big data stream computing in healthcare real-time analytics. Paper presented at the 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).

Venkatesan, T. (2012). A Literature Survey on Cloud Computing. i-Manager’s Journal on Information Technology, 1(1), 44-49.

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Wicklund, E. (2014). ‘Silo’ one of healthcare’s biggest flaws. Retrieved from http://www.healthcareitnews.com/news/silo-one-healthcares-biggest-flaws.

Yang, C. T., Liu, J. C., Hsu, W. H., Lu, H. W., & Chu, W. C. C. (2013, 16-18 Dec. 2013). Implementation of Data Transform Method into NoSQL Database for Healthcare Data. Paper presented at the 2013 International Conference on Parallel and Distributed Computing, Applications and Technologies.

Zhang, Q., Cheng, L., & Boutaba, R. (2010). Cloud Computing: State-of-the-Art and Research Challenges. Journal of internet services and applications, 1(1), 7-18. doi:10.1007/s13174-010-0007-6

Zhang, R., & Liu, L. (2010). Security models and requirements for healthcare application clouds. Paper presented at the Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on.

Zia, U. A., & Khan, N. (2017). An Analysis of Big Data Approaches in Healthcare Sector. International Journal of Technical Research & Science, 2(4), 254-264.

 

The Relationship Between Internet of Things (IoT) and Artificial Intelligence (AI)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address the relationship between the Internet of Things (IoT) and Artificial Intelligence (AI), and whether one can be used efficiently without help from the other.  The discussion begins with an overview of the Internet of Things (IoT) and artificial intelligence (AI), followed by the relationship between them. 

Internet of Things (IoT) and Artificial Intelligence Overview

The Internet of Things (IoT) refers to the growing number of connected devices with IP addresses, which years ago were not common (Anand & Clarice, 2015; Thompson, 2017).  The connected devices collect data and use these IP addresses to transmit information (Thompson, 2017).  Organizations take advantage of the collected information for innovation, enhancing customer service, and optimizing processes (Thompson, 2017). Providers in healthcare take advantage of the collected information to find new treatment methods and increase efficiency (Thompson, 2017).

IoT implementation involves various technologies, such as radio frequency identification (RFID), near field communication (NFC), machine to machine (M2M), wireless sensor networks (WSN), and addressing schemes (AS) (IPv6 addresses) (Anand & Clarice, 2015; Kumari, 2017).   RFID uses electromagnetic fields to identify and track tags attached to objects.  NFC is a set of concepts and technologies through which smartphones and other objects communicate within the IoT.  M2M is often used for remote monitoring. A WSN is a large set of sensors used to monitor environmental conditions.  The AS is the primary tool used in IoT, giving an IP address to each object that needs to communicate (Anand & Clarice, 2015; Kumari, 2017).

Machine learning (ML) is a subset of AI and involves supervised and unsupervised learning (Thompson, 2017).  In the AI domain, advances in computer science have resulted in intelligent machines that resemble humans in their functions (NMC, 2018).  Access to the categories, properties, and relationships among various datasets helps develop knowledge engineering, allowing computers to simulate human perception, learning, and decision making (NMC, 2018).  ML enables computers to learn without being explicitly programmed (NMC, 2018).  Unsupervised ML and AI enable security tools such as behavior-based analytics and anomaly detection (Thompson, 2017).  Neural networks in AI model the biological function of the human brain to interpret and react to specific inputs such as words and tone of voice (NMC, 2018).  Neural networks have been used for voice recognition and natural language processing (NLP), enabling humans to interact with machines.

The Relationship Between IoT and AI

Various reports and studies have discussed the relationship between IoT and AI.  O’Brien (2016) reported that IoT needs AI in order to succeed.  Jaffe (2014) similarly suggested that IoT will not work without AI.  The future of IoT depends on ML to find patterns, correlations, and anomalies that have the potential of enabling improvement in almost every facet of daily life (Jaffe, 2014).

Thus, the success of IoT depends on AI.  IoT follows five necessary steps: sense, transmit, store, analyze, and act (O’Brien, 2016). AI plays a significant role in the analyze step, where ML, the subset of AI, gets involved.  When ML is applied in the analysis step, it can change the subsequent “act” step, which dictates whether the action has high value or no value to the consumer (O’Brien, 2016).   
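
A toy Python sketch of these five steps follows, with a scikit-learn model standing in for the analyze step and gating the act step; the sensor readings, features, and value labels are invented for illustration.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Hypothetical history: past sensor readings and whether acting on them
# delivered value (1) or not (0).
X = np.array([[70, 21.0], [72, 21.5], [95, 27.0],
              [98, 28.5], [71, 20.5], [96, 27.5]])
y = np.array([0, 0, 1, 1, 0, 1])
model = LogisticRegression().fit(X, y)  # the "analyze" step's brain

def sense():
    return [97.0, 28.0]  # a new (simulated) device reading

def act(reading):
    print("high-value action triggered for reading", reading)

# sense -> transmit/store (elided) -> analyze -> act
reading = sense()
if model.predict([reading])[0] == 1:  # ML decides whether acting has value
    act(reading)
```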

Schatsky, Kumar, and Bumb (2018) suggested that AI can unlock the potential of IoT. As cited in Schatsky et al. (2018), Gartner predicts that by 2022 more than 80% of enterprise IoT projects will include AI components, up from only 10% in 2018.  International Data Corp (IDC) predicts that by 2019 AI will support “all effective” IoT efforts, and that without AI, data from the deployments will have limited value (Schatsky et al., 2018).

Various companies are crafting IoT strategies that include AI (Schatsky et al., 2018).  Venture capital funding of AI-focused IoT start-ups is growing, while vendors of IoT platforms such as Amazon, GE, IBM, Microsoft, Oracle, and Salesforce are integrating AI capabilities (Schatsky et al., 2018).  The value of AI is the ability to extract insight from data quickly. ML, a subset of AI, enables the automatic identification of patterns and detection of anomalies in the data that smart sensors and devices generate (Schatsky et al., 2018).  IoT is expected to combine with the power of AI, blockchain, and other emerging technologies to create the “smart hospitals” of the future (Bresnick, 2018).  Examples of AI-powered IoT devices include automated vacuum cleaners such as the iRobot Roomba, smart thermostat solutions such as those of Nest Labs, and self-driving cars such as those of Tesla Motors (Faggella, 2018; Kumari, 2017).   

Conclusion

This discussion has addressed artificial intelligence (AI) and the Internet of Things (IoT) and the relationship between them.  Machine learning, a subset of AI, is required for IoT at the analysis phase.  Without this analysis phase, IoT will not provide the value-added insight organizations anticipate.  Various studies and reports have indicated that the success and the future of IoT depend on AI. 

References

Anand, M., & Clarice, S. (2015). Artificial Intelligence Meets Internet of Things. Retrieved from http://www.ijcset.net/docs/Volumes/volume5issue6/ijcset2015050604.pdf.

Bresnick, J. (2018). Internet of Things, AI to Play Key Role in Future Smart Hospitals.

Faggella, D. (2018). Artificial Intelligence Plus the Internet of Things (IoT) – 3 Examples Worth Learning From.

Jaffe, M. (2014). IoT Won’t Work Without Artificial Intelligence.

Kumari, W. M. P. (2017). Artificial Intelligence Meets Internet of Things.

NMC, H. P. (2018). NMC Horizon Report: 2017 Higher Education Edition. Retrieved from https://www.nmc.org/publication/nmc-horizon-report-2017-higher-education-edition/.

O’Brien, B. (2016). Why The IoT Needs Artificial Intelligence to Succeed.

Schatsky, D., Kumar, N., & Bumb, S. (2018). Bringing the power of AI to the Internet of Things.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program, Using NIST 800-30 and CSF to Secure Protected Health Information.

The Impact of Artificial Intelligence (AI) on Big Data Analytics (BDA)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address the influence of artificial intelligence on big data analytics. As discussed previously, Big Data empowers artificial intelligence; this discussion concerns the impact of artificial intelligence on the Big Data Analytics domain.  The discussion begins with the building blocks of artificial intelligence and big data, followed by the impact of artificial intelligence on BDA. 

Artificial Intelligence Building Blocks and Their Impact on BDA

Understanding the building blocks of AI can help in understanding its impact on BDA.  Various reports and studies have identified different building blocks for AI.  Chibuk (2018) identified four building blocks that are expected to shape the next stage of AI.  The computation methodology is the first building block: it is structured to help computers move from binary processing to infinite connections.  The storage of information is the second building block, improving how data is stored and accessed in more efficient forms.  The brain-computer interface is the third building block, through which human minds would speak silently with a computer and thoughts would turn into actions.  Mathematics and algorithms form the last building block, including advanced mathematics such as capsule networks and having networks teach each other based on defined rules (Chibuk, 2018). 

Rao (2017) identified five fundamental building blocks for AI in the banking sector, although they are easily applicable to other sectors. Machine learning (ML) is the first component of AI in banking, where the software can learn on its own without being programmed and can adjust its algorithms to respond to new insights. Data mining algorithms hand their findings over to a human for further work, while machine learning can act on its own (Rao, 2017).  The financial and banking industry can benefit from machine learning for fraud detection, security settlement, and the like (Rao, 2017).  Deep learning (DL) is the second building block of AI in the banking industry (Rao, 2017).  DL leverages a hierarchy of artificial neural networks, similar to the human brain, and mimics the human brain to perform non-linear deductions, unlike linear traditional programs (Rao, 2017).  DL can produce better decisions by factoring in learning from previous transactions or interactions (Rao, 2017).  An example of DL is information collected about customers and their behaviors from social networks, from which their likes and preferences can be inferred; financial institutions can use this insight to make contextual, relevant offers to those customers in real time (Rao, 2017).   Natural language processing (NLP) is the third building block of AI in banking (Rao, 2017).  NLP helps computers learn, analyze, and understand human language (Rao, 2017).  NLP can be used to organize and structure knowledge in order to answer queries, translate content from one language to another, recognize people by their speech, mine text, and perform sentiment analysis (Rao, 2017). Natural language generation (NLG) is the fourth building block, which helps computers converse and interact intelligently with humans (Rao, 2017).  NLG can transform raw data into a narrative, which banks such as Credit Suisse are using to generate portfolio reviews (Rao, 2017).  Visual recognition is the last component of AI, which helps recognize images and their content (Rao, 2017). It uses DL to find faces, tag images, identify the components of visuals, and pick out similar images from a large dataset (Rao, 2017). Banks such as Australia’s Westpac are using this technology to allow customers to activate a new card from their smartphone camera, and Bank of America, Citibank, Wells Fargo, and TD Bank are using visual recognition to allow customers to deposit checks remotely via mobile app (Rao, 2017).

Gerbert, Hecker, Steinhäuser, and Ruwolt (2017) identified ten building blocks for AI.  They suggested that the simplest AI use cases often consist of a single building block but often evolve to combine two or more blocks over time (Gerbert et al., 2017).  Machine vision is the classification and tracking of real-world objects based on visual, x-ray, laser, or other signals; its quality depends on large numbers of reference images labeled by humans (Gerbert et al., 2017).  Video-based computer vision is anticipated to recognize actions and predict motions within the next five years (Gerbert et al., 2017).  Speech recognition involves the transformation of auditory signals into text (Gerbert et al., 2017).  Siri and Alexa can identify most words in a general vocabulary, but as vocabulary becomes more specialized, tailored programs such as Nuance’s PowerScribe for radiologists are needed (Gerbert et al., 2017).  Information processing involves searching billions of documents or constructing basic knowledge graphs that identify relationships in text; this building block is closely related to NLP, which is also identified as a separate building block (Gerbert et al., 2017).  NLP can provide basic summaries of text and, in some instances, infer intent (Gerbert et al., 2017). Learning from data is another component: machine learning that can predict values or classify information based on historical data (Gerbert et al., 2017).  While ML is an element within the machine vision and NLP building blocks, it is also a separate building block of AI (Gerbert et al., 2017).  The remaining building blocks include planning and exploring agents, which help identify the best sequence of actions to achieve certain goals; self-driving cars rely on this building block for navigation (Gerbert et al., 2017).  Image generation is the opposite of machine vision, as it creates images based on models.  Speech generation covers both data-based text generation and text-based speech synthesis. Handling and control refers to interactions with real-world objects (Gerbert et al., 2017). Finally, navigating and movement covers the ways robots move through a given physical environment. Self-driving cars and drones do well with their wheels and rotors; however, walking on legs, especially a single pair of legs, remains challenging (Gerbert et al., 2017).

Artificial Intelligence (AI) and machine learning (ML) have seen increasing adoption across industries and the public sector (Brook, 2018).  This trend plays a significant role in the digital world (Brook, 2018) and is driven by a customer-centric view of data, in which data is used as part of the product or service (Brook, 2018). The customer-centric model assumes data enrichment from multiple sources, with the data divided into real-time data and historical data (Brook, 2018).  Businesses build trust relationships with customers, and data is becoming the central model for many consumer services such as Amazon and Facebook (Brook, 2018).   The value of data increases over time (Brook, 2018).  The impact of machine learning and artificial intelligence has driven the need for a “corporate memory” to be rapidly adopted in organizations.  Brook (2018) suggested that organizations implement loosely coupled data silos and a data lake, which can contribute to the corporate memory and super-fast data usage in the age of AI-driven data usage.  Examples of the impact of AI and ML on BDA and the value of data over time include Coca-Cola’s global market and extensive product list, IBM’s machine learning system Watson, and GE Power using BD, ML, and the Internet of Things (IoT) to build the internet of energy (Marr, 2018).  Figure 1 shows the impact of AI and ML on Big Data Analytics and the value of data over time.


Figure 1.  Impact of AI and ML on BDA and the Value of Data Overtime (Brook, 2018).

AI is anticipated to be the most dominant factor with a disruptive impact on organizations and businesses (Hansen, 2017).  Mills (2018) suggested that organizations need to embrace BD and AI to help their businesses.  An EMC survey showed that 69% of information technology decision-makers in New Zealand believe that BDA is critical to their business strategy, and 41% have already incorporated BD into everyday business decisions (Henderson, 2015).     

The application of AI to BDA can help businesses and organizations detect correlations between factors that humans cannot perceive (Henderson, 2015).  It can allow organizations to cope with the speed of information change in today’s business world (Henderson, 2015).   AI can help organizations add a level of intelligence to their BDA to understand complex issues better and more quickly than humans can without AI (Henderson, 2015).  AI can also serve to fill the gap left by not having enough data analysts available (Henderson, 2015).  AI can also reveal insights that lead to novel solutions to existing problems or even uncover issues that were not previously known (Henderson, 2015).  A good example of AI’s impact on BDA is an AI-powered BDA system in Canada used to identify patterns in the vital signs of premature babies, enabling early detection of life-threatening infections.  Figure 2 shows AI and BD working together for better analytics and better insight. 


Figure 2.  Artificial Intelligence and Big Data (Hansen, 2017).
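As a hypothetical illustration of this kind of pattern detection (not the actual Canadian system), the following minimal Python sketch assumes scikit-learn and a made-up feature layout of heart rate, respiration rate, and oxygen saturation. It fits an unsupervised anomaly detector to an infant's historical baseline and flags readings that deviate from it:

    # Hypothetical illustration of anomaly detection over vital signs.
    # Assumes scikit-learn; the feature layout and values are invented.
    import numpy as np
    from sklearn.ensemble import IsolationForest

    rng = np.random.default_rng(0)

    # Simulated "normal" historical readings for one infant.
    normal = np.column_stack([
        rng.normal(160, 8, 500),    # heart rate (bpm)
        rng.normal(45, 5, 500),     # respiration rate (breaths/min)
        rng.normal(97, 1, 500),     # oxygen saturation (%)
    ])

    detector = IsolationForest(contamination=0.01, random_state=0)
    detector.fit(normal)

    # New readings: the second drifts in a way a human might overlook.
    new_readings = np.array([[162, 44, 97],
                             [178, 58, 91]])
    print(detector.predict(new_readings))   # 1 = normal, -1 = likely flagged

The point of the sketch is the division of labor: Big Data supplies the historical baseline, and the AI component surfaces subtle multi-variable deviations for clinicians to review.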

Conclusion

This assignment has discussed the impact of artificial intelligence (AI) on Big Data Analytics (BDA).  It began with the identification of the building blocks of AI and the impact of each building block on BDA.  BDA has an essential impact on AI because it empowers it, and AI has a crucial role in BDA, as demonstrated in various fields, especially the healthcare and financial industries.  The researcher would like to summarize this relationship between AI and BDA in a single statement: "AI without BDA is lame, and BDA without AI is blind."

References

Brook, P. (2018). Trends in Big Data and Artificial Intelligence Data.

Chibuk, J. D. (2018). Four Building Blocks for a General AI.

Gerbert, P., Hecker, M., Steinhäuser, S., & Ruwolt, P. (2017). The Building Blocks of Artificial Intelligence.

Hansen, S. (2017). How Big Data Is Empowering AI and Machine Learning?

Henderson, J. (2015). Insight: What role does Artificial Intelligence Play in Big Data?  What are the links between artificial intelligence and Big Data?

Marr, B. (2018). 27 Incredible Examples Of AI And Machine Learning In Practice.

Mills, T. (2018). Eight Ways Big Data And AI Are Changing The Business World.

Rao, S. (2017). The Five Fundamental Building Blocks for Artificial Intelligence in Banking.

The Significance of Big Data and Artificial Intelligence to any Industry

Dr. O. Aly
Computer Science

The purpose of this discussion is to address whether the combination of Big Data and Artificial Intelligence is significant to any industry.  The discussion also provides an example where Artificial Intelligence has been used and applied successfully.  The sector chosen for this application of AI is health care.

The Significance of Big Data and Artificial Intelligence Integration

As discussed in U4-DB2, Big Data empowers artificial intelligence.  Thus, there is no doubt about the benefits and advantages of utilizing Big Data in artificial intelligence for businesses.  However, the question in this discussion is whether their combination is significant to every industry or only to specific industries.

The McKinsey Global Institute reported in 2011 that not all industries are created equal when it comes to parsing the benefits of Big Data (Brown, Chui, & Manyika, 2011).  The report indicated that although Big Data is changing the game for virtually every sector, it favors some companies and industries over others, especially in the early stages of adoption.  McKinsey (Manyika et al., 2011) has also identified five domains that could take advantage of the transformative potential of Big Data: health care and retail in the United States, public sector administration in the European Union, manufacturing globally, and personal location data globally.  Figure 1 illustrates the significant financial value of Big Data across sectors.


Figure 1.  Big Data Financial Value Across Sectors (Manyika et al., 2011).

Thus, the value of Big Data Analytics is already tremendous for almost every business, though the value varies from one sector to another.  The combination of Big Data and artificial intelligence is good for innovation (Bean, 2018; Seamans, 2017), and there is no limit to innovation for any business.  Figure 2 shows the 19-year-old Go player Ke Jie reacting during the second match against Google's artificial intelligence program AlphaGo in Wuzhen.


Figure 2.  19-Year-Old Ke Jie Reacts During the Second Match Against Google's Artificial Intelligence Program AlphaGo (Seamans, 2017).

If the combination of Big Data and artificial intelligence is good for innovation, then, logically, every organization and every sector needs innovation to survive the competition.  In a survey conducted by NewVantage Partners, 97.2% of executive decision-makers reported that their companies are investing in building or launching Big Data and artificial intelligence initiatives (Bean, 2018; Patrizio, 2018).  It is also worth noting that 76.5% of the executives indicated that the availability of Big Data is empowering AI and cognitive initiatives within their organizations (Bean, 2018).  The same survey also showed that 93% of the executives identified artificial intelligence as the disruptive technology their organizations are investing in for the future.  This result shows a common consensus among executives that organizations must leverage cognitive technologies to compete in an increasingly disruptive period (Bean, 2018).

AI Application Example in the Health Care Industry

Since the healthcare industry has been identified in various research studies as one of the greatest beneficiaries of Big Data and artificial intelligence, this sector is chosen as the example of the application of both BD and AI for this discussion.  AI is becoming a transformational force in healthcare (Bresnick, 2018).  The healthcare industry has almost endless opportunities to apply technologies such as Big Data and AI to deliver more precise and impactful interventions at the right time in patient care (Bresnick, 2018).

Harvard Business Review (HBR) has indicated that 121 health AI and machine learning companies raised $2.7 billion across 206 deals between 2011 and 2017 (Kalis, Collier, & Fu, 2018).  HBR has examined ten promising artificial intelligence applications in healthcare (Kalis et al., 2018).  The findings show that the application of AI could create up to $150 billion in annual savings for U.S. health care by 2026 (Kalis et al., 2018).  The investigation has also shown that AI currently creates the most value in helping frontline clinicians be more productive and in making back-end processes more efficient, but not yet in making clinical decisions or improving clinical outcomes (Kalis et al., 2018).  Figure 3 shows the ten AI applications that could change health care.


Figure 3.  Ten Applications of AI That Could Change Health Care (Kalis et al., 2018).

Conclusion

In conclusion, the combination of Big Data and Artificial Intelligence drives innovation across all sectors, and every sector and every business needs to innovate to maintain a competitive edge.  Some sectors are leading in taking advantage of this combination of BD and AI more than others.  Health care is an excellent example of the successful employment of artificial intelligence.  However, the application of AI currently delivers most of its value in only three main areas: AI-assisted surgery, virtual nursing assistants, and administrative workflow assistance.  The use of AI in other areas of healthcare is still in its infancy and will take time to establish its roots before the full benefits of AI application are realized (Kalis et al., 2018).

References

Bean, R. (2018). How Big Data and AI Are Driving Business Innovation in 2018. Retrieved from https://sloanreview.mit.edu/article/how-big-data-and-ai-are-driving-business-innovation-in-2018/.

Bresnick, J. (2018). Top 12 Ways Artificial Intelligence Will Impact Healthcare. Retrieved from https://healthitanalytics.com/news/top-12-ways-artificial-intelligence-will-impact-healthcare.

Brown, B., Chui, M., & Manyika, J. (2011). Are you ready for the era of 'big data'? McKinsey Quarterly, 4(1), 24-35.

Kalis, B., Collier, M., & Fu, R. (2018). 10 Promising AI Applications in Health Care. Harvard Business Review. Retrieved from https://hbr.org/2018/05/10-promising-ai-applications-in-health-care.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity. McKinsey Global Institute.

Patrizio, A. (2018). Big Data vs. Artificial Intelligence.

Seamans, R. (2017). Artificial Intelligence And Big Data: Good For Innovation?