Sarbanes-Oxley (SOX) Act of 2002: The Role of COBIT

Dr. O. Aly
Computer Science

The purpose of this discussion is to provide a high-level understanding of the Sarbanes-Oxley (SOX) Act of 2002. The focus of the discussion is the role of IT control frameworks such as COBIT in the implementation of SOX. The discussion also addresses some of the other frameworks that can be used to implement SOX, with attention to one selected framework and the rationale for that choice. The discussion begins with the background of control frameworks, followed by the Sarbanes-Oxley (SOX) Act.

Controls Frameworks Background

In 1929, Wall Street crashed (Shofner & Adams, n.d.). In 1934, the US Securities and Exchange Commission (SEC) was formed, and public companies were required to perform annual audits. In 1987, the Treadway Commission, formed in response to the corrupt accounting practices of the mid-1970s, retained Coopers & Lybrand to create an accounting control framework. In 1992, Internal Control – Integrated Framework, a four-volume report, was released by the Committee of Sponsoring Organizations (COSO); a survey showed that 82% of respondents used COSO (Shofner & Adams, n.d.). In 1996, the Information Technology Governance Institute (ITGI) released the Control Objectives for Information and Related Technology (COBIT) framework. In 2002, the Sarbanes-Oxley (SOX) Act was passed, requiring companies to adopt and declare a framework used to define and assess their internal controls (Pearlson & Saunders, 2001; Shofner & Adams, n.d.).

Sarbanes-Oxley (SOX) Act

The Sarbanes-Oxley (SOX) Act, formally the Public Company Accounting Reform and Investor Protection Act of 2002, also established the Public Company Accounting Oversight Board (PCAOB). The SOX Act affects organizations that are publicly traded in the United States (Abernathy & McMillan, 2016; Pearlson & Saunders, 2001). It controls accounting methods and financial reporting for these organizations and stipulates penalties, including jail time, for executive officers. SOX introduced new limitations on auditors, including mandatory partner rotation and limits on services (Bannister, 2006). It requires new disclosure controls that inform corporate officers of material information during the reporting period. The purpose of SOX is to reduce the possibility of corporate fraud by increasing the stringency of procedures and requirements for financial reporting (Sarbanes-Oxley, 2002). Two sections of SOX are especially significant: Section 302 for financial reporting and Section 404 for internal control (Bannister, 2006).

Section 302 of the SOX Act directed the Securities and Exchange Commission (SEC) to adopt rules requiring the principal executive and financial officers of a public company to certify in their company's annual and quarterly reports that those reports are accurate and complete and that they have established and maintained adequate internal controls for public disclosure. The purpose of this section is to ensure that the CEO and CFO take a proactive role in their company's public disclosure and to give investors more confidence in the accuracy, quality, and reliability of the company's periodic SEC reports (Sarbanes-Oxley, 2018a).

Section 404 of the SOX Act requires the publicly-held company’s auditor to attest to, and report on, management’s assessment of its internal control.  It mandates that all publicly-traded companies must establish internal controls and procedures for financial reporting and must document, test and maintain those controls and procedures to ensure their effectiveness (Sarbanes-Oxley, 2018b). 
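
To make the idea of documenting and testing internal controls concrete, the sketch below shows one kind of automated check an IT team might script: a segregation-of-duties test over a user-role listing, a control area highlighted later in this discussion. This is a minimal, hypothetical illustration; the role names, conflict pairs, and user data are invented and are not prescribed by SOX or any cited source.

# Hypothetical segregation-of-duties (SoD) check of the kind used when testing
# IT controls under SOX Section 404. Role names and the conflict matrix are
# illustrative assumptions only.
CONFLICTING_ROLE_PAIRS = [
    ("create_vendor", "approve_payment"),
    ("post_journal_entry", "approve_journal_entry"),
]

user_roles = {
    "alice": {"create_vendor", "view_reports"},
    "bob": {"create_vendor", "approve_payment"},  # holds two conflicting roles
}

def find_sod_violations(user_roles, conflicts):
    """Return (user, role_a, role_b) tuples where one user holds both conflicting roles."""
    violations = []
    for user, roles in user_roles.items():
        for role_a, role_b in conflicts:
            if role_a in roles and role_b in roles:
                violations.append((user, role_a, role_b))
    return violations

for user, a, b in find_sod_violations(user_roles, CONFLICTING_ROLE_PAIRS):
    print(f"SoD violation: {user} holds both '{a}' and '{b}'")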

Business Audit and Strategic Security Risk Assessment

Security Information and Event Management (SIEM or SIM/SEM) solutions play a significant role in monitoring operational security and supporting organizations in decision making (Zhu, Hill, & Trovati, 2016). SIEM provides a standardized approach to collecting information and events, storing and querying them, and applying degrees of correlation, usually driven by rules. Leading SIEM solutions in the market include HP ArcSight, IBM Security QRadar, LogRhythm, and EMC's offering. However, SIEM does not cover business audit and strategic security risk assessment; instead, it provides inputs that need to be adequately analyzed and translated into a suitable format for senior risk assessors and strategic policymakers.
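
To illustrate what rule-driven correlation means in practice, the following toy sketch flags a brute-force login pattern in normalized events. It is an assumption-laden illustration, not any vendor's API; the event fields, threshold, and time window are invented for the example.

# Toy rule-driven correlation of the kind a SIEM performs after collecting and
# normalizing events. All event fields and thresholds are illustrative assumptions.
from collections import defaultdict
from datetime import datetime, timedelta

events = [
    {"time": datetime(2018, 5, 1, 9, 0, 0),  "source": "10.0.0.5", "type": "login_failure"},
    {"time": datetime(2018, 5, 1, 9, 0, 20), "source": "10.0.0.5", "type": "login_failure"},
    {"time": datetime(2018, 5, 1, 9, 0, 40), "source": "10.0.0.5", "type": "login_failure"},
    {"time": datetime(2018, 5, 1, 9, 1, 0),  "source": "10.0.0.5", "type": "login_success"},
]

def correlate_brute_force(events, threshold=3, window=timedelta(minutes=5)):
    """Flag sources with >= threshold login failures inside the window, followed by a success."""
    failures = defaultdict(list)
    alerts = []
    for e in sorted(events, key=lambda e: e["time"]):
        if e["type"] == "login_failure":
            failures[e["source"]].append(e["time"])
        elif e["type"] == "login_success":
            recent = [t for t in failures[e["source"]] if e["time"] - t <= window]
            if len(recent) >= threshold:
                alerts.append((e["source"], e["time"]))
    return alerts

print(correlate_brute_force(events))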

Risk assessment standards such as ISO 2700x, NIST guidance, and similar frameworks operate at a macro level and usually do not fully use the information coming from the logging and auditing activities carried out by IT operations. Several frameworks exist for auditing a company's IT controls, most notably COSO and COBIT (Bannister, 2006; Zhu et al., 2016). The COBIT and COSO frameworks become more critical in documenting and testing the effectiveness of internal controls. However, COSO alone is not sufficient (Bannister, 2006; Shofner & Adams, n.d.).

Other detective techniques are concerned with the adoption of cloud computing services rather than with security and information monitoring. Examples include Sumo Logic, Amazon Web Services (AWS) CloudTrail, and Logentries.

COBIT IT Control Framework for Sarbanes-Oxley (SOX)

COBIT stands for Control Objectives for Information and Related Technology. It is a documented set of best IT security practices designed by the Information Systems Audit and Control Association (ISACA) (Stewart, Chapple, & Gibson, 2015). COBIT is used to plan the IT security of an organization and also serves as a guideline for auditors. It provides a security control structure used to organize the complex security solutions of companies. COBIT has evolved through a series of versions: COBIT 1, COBIT 2, COBIT 3, COBIT 4 and 4.1, and COBIT 5. Figure 1 illustrates the history of COBIT.

 
Figure 1. COBIT History (Shofner & Adams, n.d.).

COBIT prescribes goals and requirements for security controls and encourages the mapping of IT security ideals to business objectives. COBIT 5 is based on five fundamental principles for governance and management of enterprise IT (itgovernanceusa.com, 2018; Shofner & Adams, n.d.): Principle 1, meeting stakeholder needs; Principle 2, covering the enterprise end to end; Principle 3, applying a single, integrated framework; Principle 4, enabling a holistic approach; and Principle 5, separating governance from management (Bannister, 2006).

Other standards and guidelines for IT security include the Open Source Security Testing Methodology Manual (OSSTMM), ISO/IEC 27002 (which replaced ISO 17799), and the Information Technology Infrastructure Library (ITIL) (Pearlson & Saunders, 2001; Shofner & Adams, n.d.).

COBIT Advantages and Rationale

COBIT is well suited to organizations focused on risk management and mitigation, and it is detailed (Pearlson & Saunders, 2001). COBIT is chosen for Sarbanes-Oxley (SOX) because it is the most widely recognized internal control framework used to achieve IT SOX compliance (itgovernanceusa.com, 2018), and it offers the greatest breadth of IT control coverage: COSO, ISO 17799, and ITIL provide medium control coverage, while COBIT provides a high level of control coverage (Bannister, 2006). Figure 2 illustrates the breadth of IT control coverage and the position of COBIT.


Figure 2.  COBIT Control Coverage (Bannister, 2006).

COBIT represents internationally accepted good practice and has become a de facto standard. It is management oriented and supported by tools and training. It is freely available, shares knowledge, and leverages expert volunteers. COBIT continually evolves and is maintained by a reputable not-for-profit organization. It maps fully onto COSO and strongly onto all the primary related standards.

COBIT is used for audit planning and audit program development, for validating current IT controls, and for assessing and reducing IT risks. It complements the COSO framework, serves as a framework for improving and benchmarking IT, and provides a foundation for IT governance (Bannister, 2006).

Summary and Conclusion

This discussion addressed various essential topics related to Sarbanes-Oxley, beginning with a brief history of control frameworks since 1929. COSO is not recommended for use alone, as it is not detailed enough for IT. ISO 17799 is also insufficient on its own, as it does not cover sound data management, third-party processes, IT delivery and support operations, audit and governance issues, software and hardware development control, or segregation of duties. Organizations should consult and agree on the framework with external auditors before implementing the program. Businesses should not select too many COBIT control objectives and control practices; simplification is highly recommended. The focus should be on key IT control deficiencies that pose a high or critical risk, such as change management issues, access control and segregation of duties, and data management issues such as backups and storage. Organizations should include IT applications such as SAP in the business process documentation because most business controls are defined by the systems and applications (Bannister, 2006).

Organizations are advised not to test too many applications and processes, but instead to take a risk and business impact approach. Businesses can also use PricewaterhouseCoopers' five-step process of inventorying spreadsheets, evaluating their use and complexity, determining the level of controls, evaluating existing controls, and developing remediation. The COSO and COBIT frameworks should be used as benchmarks, as they do not provide answers or specific controls, and should be tailored to meet the needs of the business. Organizations should analyze compliance tools and software carefully, as some of them are not yet mature. Accountability for each business and IT process should be assigned; segregation of duties is a business accountability but is facilitated by IT (Bannister, 2006).

References

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

Bannister, G. (2006). Using COBiT for Sarbanes Oxley. Retrieved from http://itgi.jp/conf200611/garybannister.pdf.

itgovernanceusa.com. (2018). COBIT (Control Objectives for Information and Related Technology). Retrieved from https://www.itgovernanceusa.com/cobit. 

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Sarbanes-Oxley. (2002). Sarbanes-Oxley Act Guideline. Retrieved from http://www.sarbanes-oxley.com/. 

Sarbanes-Oxley. (2018a). SOX Section 302: Corporate Responsibility for Financial Reports. Retrieved from http://sarbanes-oxley-101.com/SOX-302.htm. 

Sarbanes-Oxley. (2018b). SOX Section 404: Management Assessment of Internal Controls. Retrieved from http://sarbanes-oxley-101.com/SOX-404.htm. 

Shofner, S., & Adams, M. (n.d.). Introduction to COSO & COBIT. Retrieved from http://www.sfisaca.org/images/FC12Presentations/C31.pdf.

Stewart, J., Chapple, M., & Gibson, D. (2015). CISSP (ISC)² Certified Information Systems Security Professional Official Study Guide (7th ed.). Wiley.

Zhu, S. Y., Hill, R., & Trovati, M. (2016). Guide to Security Assurance for Cloud Computing: Springer.

The Impact of Cloud Computing Technology on Information Security Governance Decisions

Dr. O. Aly
Computer Science

Information security plays a significant role in the context of information technology (IT) governance. The critical governance decisions for information security lie in the areas of information security strategy, policies, infrastructure, training, and investments in tools. Cloud computing is an emerging technology that provides a new business model for accessing computing infrastructure on a virtualized, scalable, and lower-cost basis. The purpose of this discussion is to address the impact of cloud computing on decisions related to information security governance.

Cloud Computing Technology

“Cloud computing and big data are conjoined” (Hashem et al., 2015). This statement raises the question of the reason for such a relationship. Big Data has been characterized by what is often referred to as a multi-V model: variety, velocity, volume, veracity, and value (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015). While variety represents the data types, velocity reflects the rate at which the data is produced and processed (Assunção et al., 2015). Volume defines the amount of data, and veracity reflects how much the data can be trusted given the reliability of its source. Value, in turn, represents the monetary worth which organizations can derive from adopting Big Data computing. With these characteristics of Big Data, including its explosive growth rate, came challenges and issues (Jagadish et al., 2014; Meeker & Hong, 2014; Misra, Sharma, Gulia, & Bana, 2014; Nasser & Tariq, 2015; Zhou, Chawla, Jin, & Williams, 2014). The growth rate is regarded as a significant challenge for IT researchers and practitioners in designing systems that handle the data effectively and analyze it to extract relevant meaning for decision-making (Kaisler, Armour, Espinosa, & Money, 2013). Other challenges include data storage, data management, and data processing (Fernández et al., 2014; Kaisler et al., 2013), as well as Big Data variety, integration and cleaning, reduction, query and indexing, and analysis and mining (Chen et al., 2013).

Traditional systems could not meet all these challenges of Big Data (BD), and cloud computing technology emerged to address them; cloud computing is regarded as the answer to BD challenges and issues (Fernández et al., 2014). Organizations and businesses are under pressure to quickly adopt and implement technologies such as cloud computing to address the storage and processing demands of Big Data (Hashem et al., 2015). Given the increasing demands Big Data places on networks, storage, and servers, outsourcing the data to the cloud may be a practical and useful approach (Katal, Wazid, & Goudar, 2013). Over the last two decades, the demand for data storage and data security has been growing at a fast pace (Gupta, 2015), and it has led to the emergence of cloud computing technology (Gupta, 2015). Issues such as the scalability of Big Data have also pointed toward cloud computing, which can aggregate multiple disparate workloads with varying performance goals into large clusters in the cloud (Katal et al., 2013).

Various studies have provided different definitions of cloud computing; however, the National Institute of Standards and Technology (NIST) proposed an official definition. NIST defines cloud computing as “a model for enabling ubiquitous, convenient, on-demand network access to a shared pool of configurable computing resources (e.g., network, servers, storage, applications, and services) that can be rapidly provisioned and released with minimal management effort or service provider interaction” (Mell & Grance, 2011, p. 2).

Cloud computing technology offers four deployment models: public cloud, private cloud, hybrid cloud, and community cloud. The public cloud is the least secure cloud model (Puthal, Sahoo, Mishra, & Swain, 2015). The private cloud has been referred to by Armbrust et al. (2009) as internal datacenters, which are not available to the general public. The community cloud supports a specific community with particular concerns such as security requirements, policy and compliance considerations, and mission (Yang & Tate, 2012; Zissis & Lekkas, 2012). Cloud computing also offers three major service models: Infrastructure-as-a-Service (IaaS), Platform-as-a-Service (PaaS), and Software-as-a-Service (SaaS) (Mell & Grance, 2011).
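
The choice of service model determines how much of the stack the customer still governs, which in turn shapes the security governance decisions discussed below. The following sketch encodes a commonly cited, simplified shared-responsibility split; the exact boundaries vary by provider and contract, so the mapping is an assumption for illustration rather than a definitive allocation.

# Simplified, assumed shared-responsibility mapping by cloud service model.
# Real allocations differ by provider and agreement; this is illustrative only.
LAYERS = ["data", "application", "runtime", "operating_system", "virtualization", "hardware"]

RESPONSIBILITY = {
    "IaaS": {"provider": {"virtualization", "hardware"},
             "customer": {"data", "application", "runtime", "operating_system"}},
    "PaaS": {"provider": {"runtime", "operating_system", "virtualization", "hardware"},
             "customer": {"data", "application"}},
    "SaaS": {"provider": {"application", "runtime", "operating_system", "virtualization", "hardware"},
             "customer": {"data"}},
}

def customer_controls(model):
    """Return the layers the customer is assumed to govern under the given model."""
    return [layer for layer in LAYERS if layer in RESPONSIBILITY[model]["customer"]]

for model in ("IaaS", "PaaS", "SaaS"):
    print(model, "-> customer governs:", customer_controls(model))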

Cloud computing offers various benefits, from technological benefits such as data and storage, APIs, metering, and tools, to economic benefits such as pay-per-use, cost reduction, and return on investment, to non-functional benefits such as elasticity, reliability, and availability (Chang, 2015). Despite these benefits and the increasing adoption trend, cloud computing is still not widely used. Security concerns related to virtualization, hardware, networks, data, and service providers act as significant obstacles to adopting cloud computing in the IT industry (Balasubramanian & Mala, 2015; Kazim & Zhu, 2015). Security and privacy concerns have been among the major obstacles preventing full adoption of the technology (Shahzad, 2014). Purcell (2014) stated that “The advantages of cloud computing are tempered by two major concerns – security and loss of control.” The uncertainty about security has led executives to state that security is their number one concern in deploying cloud computing (Hashizume, Rosado, Fernández-medina, & Fernandez, 2013).

Cloud Computing Governance and Data Governance

The enforcement of regulatory laws such as the Health Insurance Portability and Accountability Act (HIPAA) and Sarbanes-Oxley becomes an issue especially when adopting cloud computing (Ali, Khan, & Vasilakos, 2015). Cloud computing raises security concerns that hamper its rapid adoption. Thus, cloud governance and data governance are highly recommended when adopting cloud computing.

Cloud governance is defined as the controls and processes that ensure policies are enforced (Saidah & Abdelbaki, 2014). It is a framework applied to all related parties and business processes to ensure that the cloud supports the goals of the organization and complies with all required regulations and rules. A cloud governance model should be aligned with corporate governance and IT governance, and it has to comply with the strategy of the organization to accomplish the business goals. Various studies have proposed cloud governance models.

Saidah and Abdelbaki (2014) proposed a cloud governance framework comprising three models: a policy model, an operational model, and a management model. The policy model involves data policy, service policy, business process management policy, and exit policy. The operational model includes authentication, authorization, audit, monitoring, adaptation, a metadata repository, and asset management. The management model includes policy management, security management, and service management. Figure 1 illustrates the proposed cloud governance model.


Figure 1.  The Proposed Cloud Governance Model (Saidah & Abdelbaki, 2014).

Rebollo, Mellado, and Fernández-Medina (2013) proposed a security governance framework for cloud computing environments (ISGcloud). The proposed framework is founded upon two main standards: it implements the core governance principles of the ISO/IEC 38500 governance standard, and it proposes a cloud service lifecycle based on the ISO/IEC 27036 outsourcing security draft.

When organizations decide to adopt cloud computing technology, careful consideration must be given to the deployment model as well as to the service model in order to understand the security requirements and the governance strategies (Al-Ruithe, Benkhelifa, & Hameed, 2016). Data governance for cloud computing is not a nice-to-have; it is required by rules and regulations to protect the privacy of users and employees.

The loss of control over the data is the most significant issue when adopting cloud computing, because the data is stored on computers belonging to the cloud provider. This loss of governance and control could have a potentially severe impact on the strategy of the organization and its capacity to meet its mission and goals (Al-Ruithe et al., 2016). The loss of control and governance of the data can make it impossible to comply with security requirements, can undermine the confidentiality, integrity, and availability of data, and can degrade the performance and quality of services, not to mention the compliance challenges it introduces. Thus, organizations must be aware of best practices for safeguarding, governing, and operating data when adopting cloud computing technology. NIST offers many recommendations for adopting cloud computing technology (Al-Ruithe et al., 2016). An organization should establish a data governance strategy before adopting cloud computing. This recommendation demonstrates the importance of data governance for organizations that intend to move their data and services to a cloud computing environment, since policies, rules, and the distribution of responsibilities among cloud actors will have to be set. The development of policies and data governance will assist organizations in monitoring compliance with current regulations and rules. The primary benefit of data governance in a cloud environment is to ensure security measures, privacy protection, and data quality.
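
As a concrete illustration of turning such policies into something monitorable, the sketch below checks that datasets carry the governance metadata an organization might require before they are moved to the cloud. It is a minimal, hypothetical example: the field names, approved regions, and dataset records are invented and not drawn from NIST or the cited sources.

# Hypothetical pre-migration data governance check: verify that each dataset
# carries required governance metadata before it is moved to a cloud provider.
# Field names and policy values are illustrative assumptions only.
REQUIRED_FIELDS = {"owner", "classification", "allowed_regions"}
APPROVED_REGIONS = {"us-east", "eu-west"}

datasets = [
    {"name": "payroll", "owner": "finance", "classification": "confidential",
     "allowed_regions": ["us-east"]},
    {"name": "web_logs", "owner": "it-ops", "classification": "internal"},  # missing regions
]

def check_dataset(ds):
    """Return a list of policy findings for a single dataset description."""
    findings = []
    missing = REQUIRED_FIELDS - ds.keys()
    if missing:
        findings.append(f"{ds['name']}: missing metadata {sorted(missing)}")
    for region in ds.get("allowed_regions", []):
        if region not in APPROVED_REGIONS:
            findings.append(f"{ds['name']}: region '{region}' not approved")
    return findings

for ds in datasets:
    for finding in check_dataset(ds):
        print(finding)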

The implementation of data governance for cloud computing changes based on the roles and responsibilities in the internal processes of the organization (Al-Ruithe et al., 2016). Thus, organizations should expect to face many issues. A lack of understanding of data governance is one of the major issues; a lack of training and the lack of a communication plan are additional issues organizations will face. Lack of support is another obstacle, which includes lack of top management support, lack of compliance enforcement, and lack of cloud regulation. A lack of policies, processes, and defined roles in the organization is one of the main obstacles to implementing data governance in the cloud. The lack of resources, including funding, technology, people, and skills, is considered another data governance obstacle.

Conclusion

This discussion addressed cloud computing technology and its relationship with Big Data (BD) and Big Data Analytics (BDA). Cloud computing technology emerged as a solution to the challenges that BD and BDA face. However, cloud computing is confronted with security and privacy challenges, and executives have cited security as the number one concern for cloud computing adoption. The governance of cloud computing provides a secure environment that protects data from loss or malicious attacks. Organizations are required to comply with various security and privacy regulations and rules and are under pressure to protect data, especially when using cloud computing technology. Thus, they should implement data governance and cloud computing governance frameworks to ensure such compliance.

References

Al-Ruithe, M., Benkhelifa, E., & Hameed, K. (2016). A Conceptual Framework for Designing Data Governance for Cloud Computing. Procedia Computer Science, 94, 160-167. doi:10.1016/j.procs.2016.08.025

Ali, M., Khan, S. U., & Vasilakos, A. V. (2015). Security in cloud computing: Opportunities and challenges. Information Sciences, 305, 357-383. doi:10.1016/j.ins.2015.01.025

Armbrust, M., Fox, A., Griffith, R., Joseph, A. D., Katz, R. H., Konwinski, A., . . . Stoica, I. (2009). Above The Clouds: A Berkeley View of Cloud Computing. Electrical Engineering and Computer Sciences University of California at Berkeley.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Balasubramanian, V., & Mala, T. (2015). A Review On Various Data Security Issues In Cloud Computing Environment And Its Solutions. Journal of Engineering and Applied Sciences, 10(2).

Chang, V. (2015). A Proposed Framework for Cloud Computing Adoption. International Journal of Organizational and Collective Intelligence, 6(3).

Chen, J., Chen, Y., Du, X., Li, C., Lu, J., Zhao, S., & Zhou, X. (2013). Big Data Challenge: a Data Management Perspective. Frontiers of Computer Science, 7(2), 157-164. doi:10.1007/s11704-013-3903-7

Fernández, A., Del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M., & Herrera, F. (2014). Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(5), 380-409. doi:10.1002/widm.1134

Gupta, U. (2015). Survey on Security Issues in File Management in Cloud Computing Environment. Department of Computer Science and Information Systems, Birla Institute of Technology and Science, Pilani.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hashizume, K., Rosado, D. G., Fernández-medina, E., & Fernandez, E. B. (2013). An analysis of security issues for cloud computing. Journal of internet services and applications, 4(1), 1-13. doi:10.1186/1869-0238-4-5

Jagadish, H. V., Gehrke, J., Labrinidis, A., Papakonstantinou, Y., Patel, J. M., Ramakrishnan, R., & Shahabi, C. (2014). Big Data and Its Technical Challenges. Communications of the Association for Computing Machinery, 57(7), 86-94. doi:10.1145/2611567

Kaisler, S., Armour, F., Espinosa, J. A., & Money, W. (2013). Big Data: Issues and Challenges Moving Forward. Paper presented at the Hawaii International Conference on System Sciences

Katal, A., Wazid, M., & Goudar, R. H. (2013). Big Data: Issues, Challenges, Tools and Good Practices. Paper presented at the International Conference on Contemporary Computing.

Kazim, M., & Zhu, S. Y. (2015). A Survey on Top Security Threats in Cloud Computing. International Journal Advanced Computer Science and Application, 6(3), 109-113.

Meeker, W., & Hong, Y. (2014). Reliability Meets Big Data: Opportunities and Challenges. Quality Engineering, 26(1), 102-116. doi:10.1080/08982112.2014.846119

Mell, P., & Grance, T. (2011). The NIST Definition of Cloud Computing. National Institute of Standards and Technology (NIST), 800-145, 1-7.

Misra, A., Sharma, A., Gulia, P., & Bana, A. (2014). Big Data: Challenges and Opportunities. International Journal of Innovative Technology and Exploring Engineering, 4(2).

Nasser, T., & Tariq, R. S. (2015). Big Data Challenges. Journal of Computer Engineering & Information Technology, 9307, 1-10. doi:10.4172/2324

Purcell, B. M. (2014). Big Data Using Cloud Computing. Journal of Technology Research, 5, 1-9.

Puthal, D., Sahoo, B., Mishra, S., & Swain, S. (2015). Cloud Computing Features, Issues, and Challenges: a Big Picture. Paper presented at the Computational Intelligence and Networks (CINE), 2015 International Conference on Computational Intelligence & Networks.

Rebollo, O., Mellado, D., & Fernández-Medina, E. (2013). Introducing a security governance framework for cloud computing. Paper presented at the Proceedings of the 10th International Workshop on Security in Information Systems (WOSIS), Angers, France.

Saidah, A. S., & Abdelbaki, N. (2014). A New Cloud Computing Governance Framework.

Shahzad, F. (2014). State-of-the-art Survey on Cloud Computing Security Challenges, Approaches and Solutions. Procedia Computer Science, 37, 357-362. doi:10.1016/j.procs.2014.08.053

Yang, H., & Tate, M. (2012). A Descriptive Literature Review and Classification of Cloud Computing Research. Communications of the Association for Information Systems, 31(2), 35-60.

Zhou, Z., Chawla, N., Jin, Y., & Williams, G. (2014). Big Data Opportunities and Challenges: Discussions from Data Analytics Perspectives. Institute of Electrical and Electronic Engineers: Computational Intelligence Magazine, 9(4), 62-74.

Zissis, D., & Lekkas, D. (2012). Is Cloud Computing Finally Beginning to Mature? International Journal of Cloud Computing and Services Science, 1(4), 172. doi:10.11591/closer.v1i4.1248

Zachman Enterprise Architecture

Dr. O. Aly
Computer Science

Abstract

The purpose of this project is to discuss the Zachman Enterprise Architecture, also known as the Zachman Framework. Zachman introduced the concept of enterprise architecture in 1987 and compared the framework to construction architecture, which requires components, builders, a time frame, and so forth. The framework is not a methodology but a logical, two-dimensional framework. It is not a security-based framework; however, it allows analyses of the enterprise to be presented to different groups in the enterprise in ways that relate to those groups' responsibilities. Several architectures have emerged since the inception of Zachman's Framework, and this project discusses a few of them, such as TOGAF, DoDAF, MODAF, SABSA, and COBIT. In brief, the concept of enterprise architecture did not exist until Zachman's initiative; the architecture concept was limited to building and construction in the Industrial Age. In the Information Age, Zachman was inspired to develop an information systems architecture and framework for the enterprise. The application of the architecture concept to enterprise information systems was an innovative idea from Zachman that deserves recognition.

Keywords: Zachman Framework; Zachman Enterprise Architecture.

Introduction

This project discusses the enterprise framework developed by Zachman. As Zachman (1987) indicated, information systems architecture had received little attention before he began developing the framework in 1987. Zachman was inspired to develop such a framework and architecture for enterprise information systems. This project begins with the Zachman Enterprise Architecture, followed by additional frameworks and architectures that have appeared and expanded since the inception of the Zachman Framework.

Zachman Enterprise Architecture or Zachman’s Framework

In 1987, John A. Zachman published a unique approach to the elements of the information system. Zachman is often mentioned in the literature as the primary contributor to enterprise architecture. Zachman (1987) presented a comparison between project design and implementation in classical engineering, drawing on the construction of buildings, roads, and bridges. Construction begins with requirements, and then the structure to implement those requirements is designed; before starting the implementation of a project based on stakeholders' requirements, the design to implement such a project must be developed (Zachman, 1987). Zachman provided a comparison using a generic set of architectural representations produced during the process of constructing a building, including concepts, work breakdown structure, engineering design, manufacturing engineering, and assembly and fabrication drawings.

The framework for enterprise architecture, or the Zachman Framework as it is called, is a logical structure for classifying and organizing the descriptive representations of the enterprise that are significant to the management of the enterprise and the development of its systems, both manual and automated (Zachman, 1997). The generic classification structure of the design artifacts involves the questions what, how, where, who, when, and why, intersected with the players planner, owner, designer, builder, implementer, and operator. The artifacts involve scope, concepts, logic, physics, technology, and product, and they also include material, process, geometry, instructions, timing, and objectives. Figure 1 shows the generic classification structure of Zachman's design artifacts, and Figure 2 shows the populated framework.


Figure 1.  Generic Classification Structure of Design Artifacts (Zachman, 1997).


Figure 2.  The Populated Zachman’s Framework for Enterprise Architecture (Zachman, 1997).

Zachman's Framework is a generic classification scheme for design artifacts, which are detailed representations of a complex object. The utility of such a scheme is that it enables focused concentration on selected aspects of an object without losing the contextual or holistic perspective (Zachman, 1997). The framework is logical, with the perspectives of Owner, Designer, and Builder, bounded by Scope (the Strategist) and Detail (the Implementer), in addition to the instantiation, and with six abstractions: What for things, How for process, Where for location, Who for responsibility, When for timing, and Why for motivation. The framework is comprehensive, as it addresses the enterprise as a whole, and it does not require technical professionals. The framework serves as a planning tool to make better choices by positioning issues in the context of the enterprise and viewing alternatives. It also serves as a problem-solving tool, enabling the enterprise to work with abstractions that simplify and isolate single variables without losing the sense of the complexity of the enterprise as a whole. Zachman's Framework is described as neutral because it is defined independently of tools or methodologies, so any tool or methodology can be mapped against it. It is also described as raw material for enterprise engineering (Zachman, 1997).

Zachman's Framework is the basis for architecture (Zachman, 2008). During the Industrial Age, industrial products were increasing in complexity and changing; in the Information Age, it is the enterprise that is increasing in complexity and changing continuously. Zachman suggested that enterprise architecture is the determining factor of survival in the Information Age (Zachman, 2008). Thus, the Framework for Enterprise Architecture, also called the Zachman Framework, has profound significance in placing definitions around enterprise architecture, the survival issue of the century. It is not a methodology but rather an ontology: a theory of the existence of a structured set of essential components of an object for which explicit expression is required, and probably mandatory, for creating, operating, and changing the object (Zachman, 2008).

Zachman's Framework is also described as a two-dimensional classification system based on the six communication questions of What, Where, When, Why, Who, and How, as discussed above, which intersect with the different views of Planner, Owner, Designer, Builder, Subcontractor, and Actual System (Abernathy & McMillan, 2016). The system allows analyses of an organization to be presented to different groups in the organization in ways that relate to those groups' responsibilities. The enterprise architecture framework is not security oriented; however, it helps organizations relay information to personnel in the language and format most useful to them. Since the inception of Zachman's enterprise architecture, several other architectures have been developed, and the next section addresses some of them.
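
To make the two-dimensional classification concrete, the sketch below represents the framework as a matrix keyed by (perspective, interrogative) cells. It illustrates only the structure; the example cell contents are hypothetical shorthand, not the official artifacts of the framework.

# Illustrative representation of the Zachman Framework's 6x6 classification as a
# dictionary keyed by (perspective, interrogative). Cell contents are placeholders.
PERSPECTIVES = ["Planner", "Owner", "Designer", "Builder", "Subcontractor", "Actual System"]
INTERROGATIVES = ["What", "How", "Where", "Who", "When", "Why"]

# Each cell can hold a description of the artifact that belongs there.
framework = {(p, q): None for p in PERSPECTIVES for q in INTERROGATIVES}

# Example: record a couple of artifacts (hypothetical labels).
framework[("Planner", "What")] = "List of business entities important to the enterprise"
framework[("Designer", "How")] = "Application architecture / system design model"

def artifacts_for(perspective):
    """Return the filled-in cells for one row (perspective) of the matrix."""
    return {q: framework[(perspective, q)]
            for q in INTERROGATIVES if framework[(perspective, q)] is not None}

print(artifacts_for("Planner"))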

Architectures Expansion

Since the inception of Zachman's Enterprise Architecture, several architectures have emerged alongside the growth of technology. Organizations should choose the enterprise architecture framework that represents the organization in the most useful manner, based on the needs of the stakeholders. This section discusses some of the architectures that have expanded since the Zachman Framework.

The Open Group Architecture Framework (TOGAF) is another enterprise architecture framework; it aids organizations in designing, planning, implementing, and governing enterprise information architecture (Abernathy & McMillan, 2016). TOGAF is based on four interrelated domains: technology, applications, data, and business.

The Department of Defense Architecture Framework (DoDAF) is another architecture framework; it organizes a set of products under eight viewpoints: the all viewpoint (AV), capability viewpoint (CV), data and information viewpoint (DIV), operational viewpoint (OV), project viewpoint (PV), services viewpoint (SvcV), standards viewpoint (StdV), and systems viewpoint (SV). This framework is used to ensure that new Department of Defense (DoD) technologies integrate correctly with current infrastructures (Abernathy & McMillan, 2016).

The British Ministry of Defence Architecture Framework (MODAF) is another architecture framework; it divides information into seven viewpoints: the strategic viewpoint (StV), operational viewpoint (OV), service-oriented viewpoint (SOV), systems viewpoint (SV), acquisition viewpoint (AcV), technical viewpoint (TV), and all viewpoint (AV) (Abernathy & McMillan, 2016).

Sherwood Applied Business Security Architecture (SABSA) is an enterprise security architecture framework similar to Zachman's Framework (Abernathy & McMillan, 2016). It uses the six communication questions of What, Where, When, Why, Who, and How, which intersect with six layers: Operational, Component, Physical, Logical, Conceptual, and Contextual. It is described as a risk-driven architecture (Abernathy & McMillan, 2016). Table 1 shows the SABSA Framework Matrix.

Table 1.  SABSA Framework Matrix (Abernathy & McMillan, 2016).

Control Objectives for Information and Related Technology (COBIT) is a security control development framework that documents five principles: meeting stakeholder needs, covering the enterprise end to end, applying a single integrated framework, enabling a holistic approach, and separating governance from management. These five principles drive control objectives organized into seven enablers: principles, policies, and frameworks; processes; organizational structures; culture, ethics, and behavior; information; services, infrastructure, and applications; and people, skills, and competencies.

Conclusion

This project discussed the Zachman Enterprise Architecture, also known as the Zachman Framework. Zachman introduced the concept of enterprise architecture in 1987 and compared the framework to construction architecture, which requires components, builders, a time frame, and so forth. The framework is not a methodology but a logical, two-dimensional framework. It is not a security-based framework; however, it allows analyses of the enterprise to be presented to different groups in the enterprise in ways that relate to those groups' responsibilities. Several architectures have emerged since the inception of Zachman's Framework, and this project discussed a few of them, such as TOGAF, DoDAF, MODAF, SABSA, and COBIT. In brief, the concept of enterprise architecture did not exist until Zachman's initiative; the architecture concept was limited to construction in the Industrial Age. In the Information Age, Zachman was inspired to develop an information systems architecture and framework for the enterprise. The application of the architecture concept to enterprise information systems was an innovative idea from Zachman that deserves recognition.

References

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

Zachman, J. A. (1987). A framework for information systems architecture. IBM Systems Journal, 26(3), 276. Retrieved from https://www.zachman.com/images/ZI_PIcs/ibmsj2603e.pdf

Zachman, J. A. (1997). The Framework for Enterprise Architecture: Background, Description, and Utility by John A. Zachman. Retrieved from https://www.zachman.com/resources/ea-articles-reference/327-the-framework-for-enterprise-architecture-background-description-and-utility-by-john-a-zachman. 

Zachman, J. A. (2008). The Concise Definition of The Zachman Framework by John A. Zachman. Retrieved from https://www.zachman.com/about-the-zachman-framework.

Significant Challenges Facing Information Technology (IT)

Dr. O. Aly
Computer Science

The purpose of this discussion is to present a research position on some of the most significant challenges facing information technology (IT) today. The focus is on the top five issues considered most important from the researcher's point of view. These challenges include strategy, budget, pace, scope, architectures, mergers and acquisitions, technologies, devices, skills, and the chief information officer (CIO) role.

Challenges Facing Information Technology Department

Various reports and studies have discussed the challenges that information technology (IT) departments are facing (Brooks, 2014; Global Knowledge, 2018; Heibutzki, 2018). The top five challenges chosen for this discussion are budget, pace, security, strategy, and skills.

Budget:  Businesses require a budget allotment not only to keep up with technology but also to keep up with regulations (Heltzel, 2018). Small and medium-size businesses are confronted with more budget challenges than large organizations. Understanding the business capabilities and the use of information technology can help clarify the budget requirements, which involve every department of the business because IT is all-encompassing. If the budget is limited, the business will be limited and can fall behind while the wheel of technology keeps moving at an unprecedented pace and competitors gain advantages in the market. Thus, an organization must carefully examine its financial resources so it can act as fast as its competitors.

Technology Pace: The next challenge facing the IT department is the pace of technology. In the digital age, data generation is increasing at a fast pace. The McKinsey Global Institute indicates that Big Data is the next frontier for innovation, competition, and productivity (Manyika et al., 2011). The application of Big Data (BD) and Big Data Analytics (BDA) will become a fundamental basis for competition and growth for businesses, and organizations can gain competitive advantages by using BD and BDA. Emerging technologies such as cloud computing, the Internet of Things, blockchain, and quantum computing place pressure on businesses to consider the latest technology in order to stay in business.

Security: Security is the third major challenge facing the IT department. Security comes with various regulations and rules; some are broadly applicable, while others are industry specific (CSO, 2012). The Sarbanes-Oxley Act (SOX) is an example of a broadly applicable security law, while the Health Insurance Portability and Accountability Act (HIPAA) is an example of industry-specific guidelines and requirements. The IT department should not only keep up with these regulations but also fully comply with them to protect users' private information and avoid penalties.

Strategy:  Another challenge facing IT is developing a strategy that encompasses all the requirements of the business within a governance framework. An IT strategy is not a nice-to-have; it is required for sound organizational performance (Arefin, Hoque, & Bao, 2015), and it should be aligned with the business strategy. The strategy should cover the various aspects of the business, from storing data to customer relationship management systems to analyzing data. A strategic IT plan is a comprehensive plan that outlines how technology should be used to meet IT and business goals; it is driven by the mission statement and objectives of the business. The IT strategy affects the budget of the business, as it requires investments in technology, devices, tools, and workforce.

Skills:  In the digital age and the era of BD and BDA, the IT department is challenged to hire professionals who have the skills to work with the latest technology. Skills for traditional systems such as data warehouses or relational databases are not the challenge; the challenge lies in skills for newer technologies such as machine learning algorithms, analytics, cloud computing, blockchain, and quantum computing, all of which are lacking in the professional market. While organizations are under pressure to apply BD and BDA, statistics show a 37% shortage of skilled professionals (McCafferly, 2015), an example of the skills gap that adds an additional burden on IT.

Conclusion

This discussion addressed five significant challenges facing information technology. The budget constraint in the presence of a fast technology pace is the first challenge, while keeping up with emerging technologies in the digital age is another. The IT department is required to comply with all security regulations and rules; otherwise, heavy penalties can add further constraints on the budget. A strategic IT plan is mandatory and should be aligned with the business goals and objectives. A skilled workforce is another challenge, as technology is evolving and developing the required skills takes time that organizations cannot afford in an age of rapidly evolving technologies.

References

Arefin, M. S., Hoque, M. R., & Bao, Y. (2015). The impact of business intelligence on organization’s effectiveness: an empirical study. Journal of Systems and Information Technology, 17(3), 263-285.

Brooks, C. (2014). The 5 Big Challenges Facing IT Departments.

CSO. (2012). The security laws, regulations and guidelines directory.  

Global Knowledge. (2018). 12 Challenges Facing IT Professionals. Retrieved from https://www.globalknowledge.com/us-en/resources/resource-library/articles/12-challenges-facing-it-professionals/. 

Heibutzki, R. (2018). Challenges of Information Technology Management in the 21st Century.

Heltzel, P. (2018). The 12 Biggest Issues IT Faces Today. Retrieved from https://www.cio.com/article/3245772/it-strategy/the-12-biggest-issues-it-faces-today.html. 

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

McCafferly, D. (2015). How To Overcome Big Data Barriers. Retrieved from https://www.cioinsight.com/it-strategy/big-data/slideshows/how-to-overcome-big-data-barriers.html.

Customer Relationship Management (CRM): Significant Topics

Dr. O. Aly
Computer Science

Customers are the source of all revenue. Understanding, delighting, and retaining customers over time requires carefully managing the relationship with them, and a substantial body of research addresses customer relationship management (CRM). Regarding technology, there has been an explosion of CRM platforms, with a few established players and many niche players.

The purpose of this discussion is to address significant topics regarding CRM. It begins with CRM systems and the rationale for using them, followed by challenges and costs. The discussion also covers the building blocks of CRM systems and their integration, followed by best practices in implementing CRM systems.

CRM Systems and Rationale for Using Them

CRM systems assist organizations in managing customer interactions and customer data, automating marketing, sales, and customer support, assessing business information, and managing partner, vendor, and employee relationships. A quality CRM system can be scaled to serve the needs of a small, medium, or large business (Financesonline, 2018). CRM systems can be customized to allow a business to derive actionable customer insights using back-end analytics, identify opportunities with predictive analytics, personalize customer support, and streamline operations based on the history of the customers' interactions with the business. Organizations must be aware of the CRM software available in order to select the system that best serves their needs.

Various reports identify different CRM systems. Well-regarded CRM systems include Salesforce CRM, HubSpot CRM, Freshsales, Pipedrive, Insightly, Zoho CRM, Nimble, PipelineDeals, Nutshell CRM, Microsoft Dynamics CRM, SalesforceIQ, Spiro, and ExxpertApps. Table 1 shows the best CRM systems available in the market.


Table 1.  CRM Systems  (Financesonline, 2018).

Customer satisfaction is the critical element in the success of the business (Bygstad, 2003; Pearlson & Saunders, 2001). Businesses need to continuously satisfy customers, understand their needs and expectations, and provide high-quality products or services at a competitive price to maintain success. These interactions need to be tracked by the business and analyzed in an organized way to foster long-lasting customer relationships that translate into long-term success.

CRM can help a business increase sales efficiency, drive customer satisfaction, streamline business processes and make them more efficient, and identify and resolve bottlenecks in operational processes from marketing and sales to product development (Ahearne, Rapp, Mariadoss, & Ganesan, 2012; Bygstad, 2003). The development of customer relationships is not a trivial or straightforward task. When it is done right, it gives the business a competitive edge; however, the implementation of CRM is challenging.

Challenges and Costs

The implementation of CRM demonstrates the value of customers to the business and places customer service at the top of its priorities (Pearlson & Saunders, 2001). CRM plays a significant role in coordinating the efforts of customer service, marketing, and sales within an organization. However, the implementation of CRM is challenging, especially for small businesses and startups.

Various reports have addressed the challenges of implementing CRM. Cost is the most significant challenge organizations are confronted with when implementing a CRM solution (Sage Software, 2015). Developing a clear objective to achieve with the CRM system is another challenge, and organizations must also decide on the type of deployment: on-premises or cloud-based CRM. Other challenges involve employee training, choosing the right CRM solution provider, and planning the integration in advance (Sage Software, 2015).

The cost of CRM systems varies from one vendor to another based on features and deployment options such as data importing, analytics, email integration, mobile accessibility, email marketing, multi-channel support, and whether the platform is SaaS, on-premises, or both. Some vendors offer CRM for small and medium businesses, or small businesses only, while others offer CRM systems for small, medium, and large businesses. In a report by Business-Software (2019), cost is categorized from most expensive to least expensive using dollar signs: $$$$ for most expensive, $$$ for expensive, $$ for less expensive, and $ for least expensive. Each vendor's CRM system has certain features which must be examined by organizations before deciding to adopt such a system. Table 2 gives an idea of the cost range from most expensive to least expensive.


Table 2.  CRM System Costs based on the Report by (Business-Software, 2019).

The Building Blocks of CRM Systems and Their Integration

Understanding the building blocks of a CRM system can assist in the implementation and integration of CRM systems. CRM involves four core building blocks (Meyer & Kolbe, 2005). The first is the acquisition and continuous update of a knowledge base on the needs, motivations, and behavior of customers over the lifetime of the relationship with them. The second is the application of that customer knowledge to continuously improve performance through a process of learning from successes and failures. The third is the integration of marketing, sales, and service activities to achieve a common goal. The last building block involves the implementation of appropriate systems to support customer knowledge acquisition and sharing and the measurement of CRM effectiveness.

CRM integration is a critical building block for CRM success (Meyer, 2005). The process of integrating CRM involves various organizational and operational functions of the business, such as marketing, sales, and service activities. CRM requires detailed business processes, which can be categorized into three core elements: the CRM delivery process, the CRM support process, and the CRM analysis process. The delivery process involves direct contact with customers to cover part of the customer process, such as campaign management, sales management, service management, and complaint management. The support process covers activities that are not designed to fulfill the customer process directly but provide supporting functions within the CRM context, such as market research and loyalty management. The analysis process consolidates and analyzes the knowledge of customers collected in the other CRM processes; its results are passed to the delivery process, the support process, and the service innovation and service production processes to enhance their effectiveness, for example through customer scoring and lead management, customer profiling and segmentation, and feedback and knowledge management.
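
As a concrete illustration of the customer scoring and segmentation performed in the analysis process, the sketch below computes a simplified recency-frequency-monetary (RFM) score. It is not a feature of any named CRM product; the customer records, thresholds, and segment labels are invented for the example.

# Illustrative, simplified RFM scoring of the kind a CRM analysis process might
# feed back to the delivery and support processes. All data below is invented.
from datetime import date

customers = [
    {"name": "Acme Co.", "last_purchase": date(2018, 11, 20), "orders": 12, "revenue": 48000},
    {"name": "Beta LLC", "last_purchase": date(2018, 3, 2),   "orders": 2,  "revenue": 1500},
]

def rfm_score(c, today=date(2018, 12, 31)):
    """Score recency, frequency, and monetary value from 1 (low) to 3 (high) and sum them."""
    days_since = (today - c["last_purchase"]).days
    recency = 3 if days_since <= 90 else 2 if days_since <= 365 else 1
    frequency = 3 if c["orders"] >= 10 else 2 if c["orders"] >= 4 else 1
    monetary = 3 if c["revenue"] >= 10000 else 2 if c["revenue"] >= 2000 else 1
    return recency + frequency + monetary

for c in customers:
    score = rfm_score(c)
    segment = "priority" if score >= 7 else "standard"
    print(f"{c['name']}: score={score}, segment={segment}")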

Best Practices in Implementing These CRM Systems

Various studies and reports have addressed best practices in the implementation and integration of CRM systems into the business (Salesforce, 2018; Schiff, 2018). Organizations must choose a CRM that fits their needs. Not every CRM is created equal, and if organizations choose a CRM system without properly researching its features, capabilities, and weaknesses, they could end up committed to a system that is not appropriate for the business and, as a result, could lose money. Organizations should decide whether the CRM should be cloud-based or on-premises (Salesforce, 2018; Schiff, 2018; Wailgum, 2008), whether it should be delivered as a service contract or as one that costs more upfront to install, and whether the business needs in-depth, highly customizable features or whether basic functionality will suffice. Organizations should analyze the options and decide on the CRM system that is most appropriate for the business, one that can serve the need to build strong customer relationships and gain a competitive edge in the market.

Well-trained personnel will help an organization achieve its strategic CRM goals. If organizations do not invest in training the workforce on how to utilize the CRM system, CRM tools will become useless. CRM systems are only as effective as organizations allow them to be: when the workforce is not using the CRM system to its full potential, or is misusing it, the CRM will not perform its functions properly and will not serve the needs of the business as expected (Salesforce, 2018; Schiff, 2018).

Automation is another critical factor for best practice when implementing CRM systems.  Tasks that are associated with data entry can be automated so that CRM systems will be up to date.  The automation will increase the efficiency of the CRM systems as well as the business overall (Salesforce, 2018; Schiff, 2018). 

One of the significant benefits of CRM is its potential for improving and enhancing cooperative efforts across departments of the business. When the same information is accessible across departments, CRM systems eliminate the confusion that can be caused by using different terms and different information. Data without analysis is meaningless, so organizations should mine the data to get value that can aid in making sound business decisions. CRM systems are designed to capture and organize massive amounts of data; if organizations do not take advantage of that data by turning it into actionable insight, the value of the CRM implementation will be limited. The best CRM systems come with built-in analytics features that use advanced programming to mine all captured data and produce valuable conclusions for future business decisions. When organizations take advantage of a CRM's built-in analytics and analyze the data the system procures, the resulting information can provide insight for business decisions (Salesforce, 2018).

The last element of best practice in implementing CRM is for organizations to keep it simple. The best CRM system is the one that best fits the needs and requirements of the business, and simplicity is a crucial element when implementing it. Organizations should implement a CRM that is not overly complex while still being useful and providing everything the business needs. Organizations should also consider making changes to CRM policies where necessary. The effectiveness of day-to-day operations will be the best indicator of whether the CRM performs as expected; if it does not, changes must be made until it does (Salesforce, 2018; Wailgum, 2008).

Conclusion

This discussion addressed major topics about CRM systems. It began with the identification of the best CRM systems in the market and the justification for businesses to implement them.  It also discussed the benefits and advantages of CRM systems, which give businesses a competitive edge by building strong relationships with customers to meet their needs consistently.  The implementation of a CRM system is not trivial and requires careful consideration by organizations.  Businesses are confronted with various challenges when implementing CRM systems, among which is cost.  Thus, organizations should analyze every CRM vendor to ensure the CRM system is the best fit for the business needs and provides a return on investment. The discussion also addressed various best practices, among which workforce training is a critical factor for a successful CRM program, along with the simplicity of CRM systems, so that organizations can fully utilize the systems' potential for the benefit of the business and make sound business decisions.

References

Ahearne, M., Rapp, A., Mariadoss, B. J., & Ganesan, S. (2012). Challenges of CRM implementation in business-to-business markets: A contingency perspective. Journal of Personal Selling & Sales Management, 32(1), 117-129.

Business-Software. (2019). Top 40 CRM Software Report.  

Bygstad, B. (2003). The implementation puzzle of CRM systems in knowledge-based organizations. Information Resources Management Journal (IRMJ), 16(4), 33-45.

Financesonline. (2018). 15 Best CRM Systems for Your Business. Retrieved from https://financesonline.com/15-best-crm-software-systems-business/. 

Meyer, M. (2005). Multidisciplinarity of CRM integration and its implications. Paper presented at the 38th Annual Hawaii International Conference on System Sciences (HICSS '05).

Meyer, M., & Kolbe, L. M. (2005). Integration of customer relationship management: Status quo and implications for research and practice. Journal of Strategic Marketing, 13(3), 175-198.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Sage Software. (2015). Top Challenges in CRM Implementation.  

Salesforce. (2018). 7 CRM Best Practices to Get the Most out of your CRM. Retrieved from https://www.salesforce.com/crm/best-practices/. 

Schiff, J. L. (2018). 8 CRM implementation best practices.

Wailgum, T. (2008). Five Best Practices for Implementing SaaS CRM. Retrieved from https://www.cio.com/article/2435928/customer-relationship-management/five-best-practices-for-implementing-saas-crm.html.

Customer Relationship Management (CRM)

Dr. O. Aly
Computer Science

Abstract

The purpose of this project is to discuss customer relationship management (CRM) based on the identified article by Payne and Frow (2005).  The lack of a precise definition and a clear framework led the authors to develop a generic technology-based definition for CRM that has been accepted by some practitioners. The authors proposed a strategic CRM conceptual framework based on five essential processes. It begins with the strategy development process, followed by the value creation process, the multi-channel integration process, the information management process, and the performance assessment process.  Each process plays a significant role in the proposed strategic process-based CRM framework.  This article can aid organizations that are confused about the definition and framework of CRM and can help them implement the building blocks of a CRM strategy based on the proposed framework.

Keywords: Customer Relationship Management (CRM).

Introduction

This project discusses customer relationship management (CRM) using the identified article by Payne and Frow (2005).  The project begins with the inception and various definitions of CRM, followed by the CRM adoption problems.  The discussion then covers the technology-based definition of CRM proposed by the authors based on various literature reviews, as well as their proposed strategic process-based CRM conceptual framework.

CRM Inception and Various Definitions

The term CRM emerged in the mid-1990s in the information technology (IT) vendor and practitioner communities.  The term is often used to describe technology-based customer solutions such as sales force automation (SFA).  In the academic community, the terms CRM and relationship marketing (RM) are used interchangeably. 

Payne and Frow (2005) identified twelve definitions of customer relationship management (CRM). These definitions describe the meaning and interpretation of CRM from various perspectives.  This project addresses only a few that are worth mentioning.  CRM is defined as an enterprise initiative that belongs in all areas of an organization.  It is also defined as a comprehensive strategy and process of acquiring, retaining, and partnering with selective customers to create superior value for the company and the customer.  CRM is an attempt to provide a strategic bridge between information technology and marketing strategies aimed at developing long-term relationships and profitability, which requires information-intensive strategies.  CRM is data-driven marketing.  CRM is making the business more customer-centric, using web-based tools and an internet presence.  In brief, CRM is all about customers and how organizations can deal with their customers to ensure a good product and excellent customer service at lower cost.  Amazon is an excellent example of being customer-centric.  As Jeff Bezos put it, “We see our customers as invited guests to a party, and we are the hosts. It’s our job every day to make every important aspect of the customer experience a little bit better” (Expert Market, n.d.).

CRM Adoption Problem

Many organizations struggle with the adoption of CRM because of the ambiguous view of CRM in business.  To some businesses, CRM meant direct mail, a loyalty card scheme, or a database, while others envisioned CRM as a help desk, a call center, or a data warehouse for data mining.  Other businesses considered CRM an e-commerce solution, such as a personalization engine on the internet.  The lack of a standard definition of CRM can contribute to the failure of a CRM project when organizations view CRM from a limited technology perspective or implement CRM on a fragmented basis.  The lack of a strategic framework for CRM from which to define success is another reason for the disappointing results of many CRM initiatives. 

CRM Proposed Technology-Based Definition

As a result of the lack of an official definition of CRM, the authors developed the following technology-based definition for the purpose of their study. This definition provides direction for the strategic and cross-functional emphasis of their proposed conceptual framework.

 “CRM is a strategic approach that is concerned with creating improved shareholder value through the development of appropriate relationships with key customers and customer segments. CRM unites the potential of relationship marketing strategies and IT to create profitable, long-term relationships with customers and other key stakeholders. CRM provides enhanced opportunities to use data and information to both understand customers and cocreate [sic] value with them. This requires a cross-functional integration of processes, people, operations, and marketing capabilities that are enabled through information, technology, and applications.”  

CRM Proposed Process-Based Strategic Conceptual Framework

The authors proposed a conceptual framework based on five CRM processes: the strategy development process, the value creation process, the multi-channel integration process, the information management process, and the performance assessment process.  The proposed framework illustrates an interactive set of strategic processes that begins with the strategy development process, reflecting a detailed review of the business strategy, and concludes with the performance assessment process, reflecting improvement in results and increased shareholder value.  Figure 1 shows the proposed CRM conceptual framework.

Process 1: Strategy Development

The first layer of the proposed framework requires a dual focus on the business strategy and the customer strategy.  The business strategy should be considered first to determine the customer strategy.  It begins with a review or articulation of the vision of the business, especially as it relates to CRM.  The customer strategy is the responsibility of the chief executive officer (CEO), the board, and the strategy director, as well as the marketing department. It involves examining the existing and potential customer base and identifying the most appropriate customer segmentation.  In summary, the strategy development process involves a detailed evaluation of the business strategy and the development of an appropriate customer strategy, providing a concise, unambiguous platform on which CRM activities can be developed.

Process 2: Value Creation

The second process of the proposed conceptual framework is value creation.  The value creation process transforms the outputs of the strategy development process into programs that extract and deliver value. It involves three key elements: determining the value the company can provide to its customers, determining the value the company can receive from its customers, and managing this value exchange. The first element, the value the company can provide to customers, draws on the concept of the benefits that enhance the customer offer.  Businesses should conduct a value assessment to quantify the relative importance that customers place on the various attributes of a product.  Analytical tools can also uncover significant market segments with service needs that are not entirely met by the attributes of existing products.  The second element involves the value to the organization and customer lifetime value. Customer retention is a crucial value to the organization and reflects a significant part of the research on value creation. 

Process 3: Multi-Channel Integration

The third process is multi-channel integration.  This process is one of the most critical in CRM because it takes the outputs of the first two processes, the business strategy and the value creation process, and translates them into value-adding activities with customers.  Multi-channel integration involves channel options and integrated channel management. The channel options include the sales force, outlets, telephony, direct marketing, e-commerce, and m-commerce.  Integrated channel management depends on the ability to uphold the same high standards across multiple, different channels.  The multi-channel integration process is critical because it represents the point of co-creation of customer value.  However, its success depends on the ability of the business to collect and deploy customer information from all channels and to integrate it with other relevant information.

Process 4: Information Management

The fourth process is information management. This process involves the collection, collation, and use of customer data to generate insight and appropriate marketing responses. It comprises the data repository, IT systems, analytical tools, front-office and back-office applications, and CRM technology market participants.  The data repository is the critical component of this process, as it provides a corporate memory of customers.  IT systems are required before the database can be integrated into a data warehouse and user access can be provided across the organization.  Analytical tools, such as data mining, enable effective use of the data warehouse.  Front-office applications support activities that involve a direct interface with customers, such as SFA and call center management. Back-office applications support internal administration activities and supplier relationships, including human resources, procurement, and warehouse management.  The critical concern for the front office and back office is cooperation to improve customer relationships and workflow. The CRM technology market participants are the last component of the information management process; CRM applications and CRM service providers fall into specific categories.  The critical segments for CRM applications are the integrated CRM and enterprise resource planning suite, the CRM suite, the CRM framework, CRM best of breed, and build it yourself.  CRM service providers and consultants offer implementation support and specialize in areas such as corporate strategy, CRM strategy, change management, organization design, training, human resources, business transformation, infrastructure building, systems integration, infrastructure outsourcing, business insight, research, and business process outsourcing.

To summarize, the information management process provides a means of sharing relevant customer information throughout the enterprise and replicating the mind of the customer.  IT planning should be implemented to support the CRM strategy, and data analysis tools can be used to measure business activities, providing the basis for the performance assessment process. 

Process 5: Performance Assessment

The last process of the proposed strategic CRM conceptual framework is performance assessment, covering the critical task of ensuring that the organization's strategic approach to CRM is delivered to an appropriate and acceptable standard and that a basis for future enhancement is established. This process involves two significant components: shareholder results and performance monitoring. Organizations should consider building employee value, customer value, and shareholder value, as well as reducing costs, to achieve the ultimate goal of strategic CRM.   Performance monitoring is the other aspect of this process: the metrics used to measure and monitor CRM performance should be well developed and well communicated.

Figure 1.  CRM Proposed Conceptual Framework (Payne & Frow, 2005).

Conclusion

The project discussed CRM based on the identified article by Payne and Frow (2005).  The lack of a precise definition and a clear framework led the authors to develop a generic technology-based definition for CRM that has been accepted by some practitioners. The authors proposed a strategic CRM conceptual framework based on five important processes. It begins with the strategy development process, followed by the value creation process, the multi-channel integration process, the information management process, and the performance assessment process.  Each process plays a significant role in the strategic CRM framework.  This article can aid organizations that are confused about the definition and framework of CRM and can help them implement the building blocks of a CRM strategy based on the proposed framework.

References

Expert Market. (n.d.). Amazon CRM Case Study. Retrieved from https://www.expertmarket.co.uk/crm-systems/amazon-crm-case-study. 

Payne, A., & Frow, P. (2005). A strategic framework for customer relationship management. Journal of Marketing, 69(4), 167-176.

The Importance of Enterprise Resource Planning (ERP) System

Dr. O. Aly
Computer Science

The purpose of this discussion is to answer the following questions about the importance of enterprise resource planning (ERP) systems in the context of enterprise planning:

  • Today, ERP systems sit at the center of any organization’s information technology infrastructure. Why?
  • What advantages do ERPs give to an organization? How can standardized ERPs help provide a competitive advantage?
  • What is the underlying structure and architecture of an ERP system?

The Justification for ERP Systems Importance in Information Technology Infrastructure

The term ERP probably appeared for the first time in 1992 (Klaus, Rosemann, & Gable, 2000).  Klaus et al. (2000) indicated that Lopes, in his 1992 article, showed how ERP was conceived at the time the term was coined and praised ERP systems as “better, faster and more economical business solutions” (p. 27).  ERP has been described as the new information systems paradigm. Thomas Davenport introduced the IS community to ERP systems in 1996, and ERP papers were presented at three international information systems conferences in 1997, which marked the beginning of the ERP literature.  Davenport initially avoided the term ERP and called these systems mega-packages.  Figure 1 shows the evolution of ERP.

Figure 1. The Evolution of ERP and The Introduction of Information System to ERP (Klaus et al., 2000).

The importance of ERP in the information systems literature has increased over the past few years (Klaus et al., 2000).  ERP attracted the attention of the IS field once it became apparent that large, mainly US-based corporations had begun to install these systems. Nah, Zuckweiler, and Lee-Shang Lau (2003) indicated that Holland, Light, and Gibson (1999) found that business and IT legacy systems determine the degree of IT and organizational change required for ERP implementation success.

An enterprise resource planning (ERP) system is a packaged software system that enables organizations to manage the efficient and effective use of resources such as materials, human resources, and finance (Klaus et al., 2000; Nah et al., 2003; Wailgum & Perkins, 2018).  The ERP system supports a process-oriented view of an enterprise and standardizes business processes across the organization (Nah et al., 2003).  ERP systems are comprehensive, packaged software solutions that integrate the complete range of a business’s processes and functions to present a holistic view of the business from a single information and IT architecture (Klaus et al., 2000). Organizations that implemented ERP systems found them cost-effective and a competitive necessity (Klaus et al., 2000).

ERP Advantages and ERP Standardization

ERP systems provide various benefits to organizations, including operational, managerial, strategic, IT infrastructure, and organizational benefits (Shang & Seddon, 2000). The operational benefits include cost reduction, cycle time reduction, productivity improvement, quality improvement, and customer service improvement. The managerial benefits include better resource management, better decision making, and better performance control. The strategic benefits include supporting current and future business growth plans, supporting business alliances, building business innovation, building cost leadership, generating or enhancing product differentiation, building external linkages, enabling worldwide expansion, and enabling e-business.  The IT infrastructure benefits include increased business flexibility, reduced IT costs, and increased IT infrastructure capability.  The organizational benefits include supporting organizational change, facilitating business learning and broadening employee skills, empowering employees, changing the culture with common visions, shifting employees’ focus and behavior, and improving employee morale and satisfaction (Shang & Seddon, 2000).

A significant advantage of an enterprise system is that all modules of the IS can easily communicate with each other, offering various efficiencies over stand-alone systems (Pearlson & Saunders, 2001). Information from one functional area is often needed by another area of the business.  For instance, the inventory system stores information about the vendors who supply specific parts; the same information is also required by the accounts payable system, which pays the vendors for their products.  It makes sense to integrate these two systems to have a single accurate record of vendors.  ERP systems are useful tools for organizations seeking to centralize operations and decision making, because this centralization enables effective use of organizational databases (Pearlson & Saunders, 2001).  Redundant data entries and duplicate data are eliminated; standards for numbering, naming, and coding can be enforced; and data and records can be cleaned up through standardization.  The ERP system can reinforce the use of standard procedures across different locations.   

Standardization plays a significant role in the efficiency of the enterprise (Pearlson & Saunders, 2001).  Inconsistent data can cause significant issues and must be addressed in ERP systems.  For instance, when integrating two systems such as inventory and accounts payable, the vendor name may differ between them: IBM might be listed in the inventory system as IBM Corp. while appearing in accounts payable as International Business Machines.  This inconsistency makes it challenging to integrate databases and must be addressed for ERP systems to provide optimal advantages; a minimal sketch of vendor-name standardization is shown below.  The implementation of an ERP system requires organizations to make changes in the structure of the organization and often in the individual tasks performed by workers. Managers are required to change business processes and will likely redesign them completely to accommodate the information system. 
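To make the standardization step concrete, the following is a minimal sketch of vendor-name normalization during data integration; the alias table, function name, and vendor strings are illustrative assumptions, not part of any particular ERP product.

```python
# Minimal sketch of vendor-name standardization during ERP data integration.
# The alias table and vendor strings are illustrative only.
VENDOR_ALIASES = {
    "ibm corp.": "International Business Machines",
    "ibm": "International Business Machines",
    "international business machines": "International Business Machines",
}

def canonical_vendor(raw_name: str) -> str:
    """Map a raw vendor name from inventory or payables to one canonical record."""
    key = raw_name.strip().lower()
    return VENDOR_ALIASES.get(key, raw_name.strip())

# Inventory and accounts payable now resolve to the same vendor record.
assert canonical_vendor("IBM Corp.") == canonical_vendor("International Business Machines")
```

In practice such alias tables are maintained as master data so that every module references one cleaned vendor record rather than its own local spelling.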

ERP System Architecture

Various studies have discussed ERP system frameworks and architectures.  Al-Mudimigh, Ullah, and Saleem (2009) discussed an ERP framework for an automated data mining system.  The proposed framework has three primary layers: the CRM layer, the ERP layer, and the knowledge discovery layer.  The CRM layer contains sales management, marketing management, customer service management, and prediction and forecasting.  The ERP layer contains purchasing, sales, technology maintenance, production, accounting, audit, and warehouse.  The knowledge discovery layer contains selected data, transformed data, a rule-based database, a data warehouse, data mining, and results.  Figure 2 illustrates the proposed ERP framework.

Figure 2.  ERP Proposed Framework (Al-Mudimigh et al., 2009).

Bahssas, AlBar, and Hoque (2015) discussed various types of ERP architectures, including the client-server framework, web-based ERP, cloud ERP, N-tier ERP, and mobile ERP architecture.  The mobile ERP architecture is selected for this discussion as it is a practical example in the age of the digital world.  The mobile ERP architecture is divided into four tiers: the ERP system tier, the content access engine and cache storage tier, the content extraction engine tier, and the user interface tier.  The content access engine and cache storage tier contains cache structures, CML, and a remote function call (RFC) server. This tier is responsible for building queries based on mobile users' requests and retrieving data in XML format.  The RFC server is used to invoke the business functions of an ERP system remotely.  Tier three is the content extraction engine, which takes charge of presentation logic and determines the type of browser used on a user's mobile device. Tier four is the user interface tier, where mobile devices such as WAP-enabled phones and PDAs with their particular browsers and GUIs are integrated (Bahssas et al., 2015).  Figure 3 illustrates the selected mobile ERP framework.

Figure 3.  Mobile ERP Architecture (Bahssas et al., 2015).

Conclusion

In the age of Big Data (BD) and Big Data Analytics (BDA), the role of information systems in ERP has increased more than ever before.  ERP was initially isolated from IS until Thomas Davenport introduced the IS community to ERP systems in 1996.  The integration of ERP with IS is a complex process and requires commitment from management and a long-term vision. Enterprises should plan for such a shift at the budget, IT professional, and operational levels.  This is not an overnight process; it requires a holistic view of the business operation at present as well as for the future, and an understanding of the role of current technologies such as BD and BDA in ERP.  Organizations are under pressure to become and stay competitive in the current digital world.  ERP provides various benefits to the organization, from operational benefits to IT benefits, and various studies have proposed different ERP frameworks. In summary, ERP systems sit at the center of any organization’s information technology infrastructure because of the many ways they empower businesses.

References

Al-Mudimigh, A. S., Ullah, Z., & Saleem, F. (2009). A framework of an automated data mining systems using ERP model. International Journal of Computer and Electrical Engineering, 1(5), 651.

Bahssas, D. M., AlBar, A. M., & Hoque, M. R. (2015). Enterprise resource planning (ERP) systems: design, trends and deployment. The International Technology Management Review, 5(2), 72-81.

Klaus, H., Rosemann, M., & Gable, G. G. (2000). What is ERP? Information Systems Frontiers, 2(2), 141-162.

Nah, F. F.-H., Zuckweiler, K. M., & Lee-Shang Lau, J. (2003). ERP implementation: Chief information officers’ perceptions of critical success factors. International Journal of Human-Computer Interaction, 16(1), 5-22.

Pearlson, K., & Saunders, C. (2001). Managing and Using Information Systems: A Strategic Approach. USA: John Wiley & Sons.

Shang, S., & Seddon, P. B. (2000). A comprehensive framework for classifying the benefits of ERP systems. AMCIS 2000 proceedings, 39.

Wailgum, T., & Perkins, B. (2018). What is ERP?  A Guide to Enterprise Resource Planning Systems. Retrieved from https://www.cio.com/article/2439502/enterprise-resource-planning/enterprise-resource-planning-erp-definition-and-solutions.html#tk.cio_rs.

Proposal: State-of-the-Art Healthcare System in Four States.

Dr. O. Aly
Computer Science

Abstract

The purpose of this proposal is to design a state-of-the-art healthcare system across four states: Colorado, Utah, Arizona, and New Mexico.   Big Data (BD) and Big Data Analytics (BDA) have played significant roles in various industries, including healthcare.  The value driven by BDA can save lives and minimize costs for patients.  The project proposes a design that applies BD and BDA to the healthcare system across these four states.  Cloud computing is the most appropriate technology to deal with the large volume of healthcare data at both the storage and data processing levels.  Due to the security issues of cloud computing, a Virtual Private Cloud (VPC) will be used.  The VPC provides a secure cloud environment in which network traffic is controlled using security groups and network access control lists.   The project requires other components to be fully implemented using the latest technology, such as Hadoop and MapReduce for data stream processing and machine learning for artificial intelligence, which will also support the Internet of Things (IoT).  The NoSQL databases HBase and MongoDB will be used to handle semi-structured data such as XML and unstructured data such as logs and images.  Spark will be used for real-time data processing, which can be vital for urgent care and emergency services.  This project addresses the assumptions and limitations as well as the justification for selecting these specific components.  All stakeholders in the healthcare sector, including providers, insurers, pharmaceutical companies, and practitioners, should cooperate and coordinate to facilitate the implementation process.  The rigid culture and silo pattern need to change for better healthcare, which can save the healthcare industry millions of dollars while providing excellent care to patients at the same time.

Keywords: Big Data Analytics; Hadoop; Healthcare Big Data System; Spark.

Introduction

In the age of Big Data (BD), information technology plays a significant role in the healthcare industry (HIMSS, 2018).  The healthcare sector generates a massive amount of data every day to conform to standards and regulations (Alexandru, Alexandru, Coardos, & Tudora, 2016).  The generated Big Data has the potential to support many medical and healthcare operations, including clinical decision support, disease surveillance, and population health management (Alexandru et al., 2016). This project proposes a state-of-the-art integrated system for hospitals located in Arizona, Colorado, New Mexico, and Utah.  The system is based on the Hadoop ecosystem to help the hospitals maintain and improve human health via diagnosis, treatment, and disease prevention. 

The proposal begins with an overview of Big Data Analytics in healthcare, which covers the benefits and challenges of BD and BDA in the healthcare industry.  The overview also covers the various healthcare data sources for data analytics, in different formats such as semi-structured (e.g., XML and JSON) and unstructured (e.g., images and X-rays).  The second section addresses the healthcare BDA design proposal using Hadoop. This section covers several components, starting with the requirements for the design, which include state-of-the-art technologies such as Hadoop/MapReduce, Spark, NoSQL databases, artificial intelligence (AI), and the Internet of Things (IoT).  The project also includes various diagrams, including the data flow diagram, a communication flowchart, and the overall system diagram.  The healthcare system design is bound by regulations, policies, and governance such as HIPAA, which are also covered in this project, along with the justification, limitations, and assumptions.

Big Data Analytics in Healthcare Overview

BD and BDA are terms that have been used interchangeably and described as the next frontier for innovation, competition, and productivity (Maltby, 2011; Manyika et al., 2011).  BD has a multi-V model with unique characteristics: volume refers to large datasets, velocity refers to the speed of computation as well as data generation, and variety refers to the various data types such as semi-structured and unstructured data (Assunção, Calheiros, Bianchi, Netto, & Buyya, 2015; Hu, Wen, Chua, & Li, 2014).  Various industries, including healthcare, have taken this opportunity and applied BD and BDA in their business models (Manyika et al., 2011).  The McKinsey Global Institute predicted a potential annual value of $300 billion to US healthcare (Manyika et al., 2011).  

The healthcare industry generates extensive data, driven by keeping patient records, complying with regulations and policies, and caring for patients (Raghupathi & Raghupathi, 2014).  The current trend is to digitize this explosively growing data in the age of Big Data (BD) and Big Data Analytics (BDA) (Raghupathi & Raghupathi, 2014).  BDA has revolutionized healthcare by transforming data into valuable information and knowledge used to predict epidemics, cure diseases, improve quality of life, and avoid preventable deaths (Van-Dai, Chuan-Ming, & Nkabinde, 2016).  Applications of BDA in healthcare include pervasive health, fraud detection, pharmaceutical discoveries, clinical decision support systems, computer-aided diagnosis, and biomedical applications. 

Healthcare Big Data Benefits and Challenges

The healthcare sector employs BDA in various aspects of care, such as detecting diseases at early stages, providing evidence-based medicine, minimizing doses of medication to avoid side effects, and delivering medicine based on genetic analysis.  The use of BD and BDA can reduce the readmission rate and thereby reduce healthcare-related costs for patients.  Healthcare BDA can also detect spreading diseases earlier, before an outbreak takes hold, using real-time analytics (Archenaa & Anita, 2015; Raghupathi & Raghupathi, 2014; Wang, Kung, & Byrd, 2018).   An example of the application of BDA in healthcare is Kaiser Permanente's HealthConnect system, which ensures data exchange across all medical facilities and promotes the use of electronic health records (Fox & Vaidyanathan, 2016).

Despite the various benefits of BD and BDA in the healthcare sector, several challenges and issues emerge from applying BDA in healthcare.  The nature of the healthcare industry itself poses challenges to BDA (Groves, Kayyali, Knott, & Kuiken, 2016).  The episodic culture, the data puddles, and IT leadership are the three significant challenges the healthcare industry faces in applying BDA.  The episodic culture reflects the conservative culture of healthcare and the lack of an IT mindset, creating a rigid culture; few providers have overcome this rigidity and started to use BDA technology. The data puddles reflect the silo nature of healthcare.  Silos are described as one of the most significant flaws in the healthcare sector (Wicklund, 2014).  Proper use of technology is lacking in the healthcare sector, causing the industry to fall behind other industries, and each silo uses its own methods to collect data from labs, diagnosis, radiology, emergency, case management, and so forth.  IT leadership is another challenge caused by the rigid culture of the healthcare industry: the lack of familiarity with the latest technologies among IT leaders in healthcare is a severe problem. 

Healthcare Data Sources for Data Analytics

Current healthcare data is collected from clinical and non-clinical sources (InformationBuilders, 2018; Van-Dai et al., 2016; Zia & Khan, 2017).  Electronic healthcare records are digital copies of patients' medical histories.  They contain a variety of data relevant to the care of patients, such as demographics, medical problems, medications, body mass index, medical history, laboratory test data, radiology reports, clinical notes, and payment information. These electronic healthcare records are the most important data in healthcare data analytics because they provide effective and efficient methods for providers and organizations to share data (Botta, de Donato, Persico, & Pescapé, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016; Wang et al., 2018).  

Biomedical imaging data plays a crucial role in healthcare, aiding disease monitoring, treatment planning, and prognosis.  This data can be used to generate quantitative information and make inferences from the images that provide insight into a medical condition.  Image analytics is more complicated due to the noise associated with the images, which is one of the significant limitations of biomedical analysis (Ji, Ganchev, O’Droma, Zhang, & Zhang, 2014; Malik & Sangwan, 2015; Van-Dai et al., 2016). 

Sensing data is ubiquitous in the medical domain, both for real-time and historical data analysis.  It involves several forms of medical data collection instruments, such as the electrocardiogram (ECG) and electroencephalogram (EEG), which are vital sensors for collecting signals from various parts of the human body.  Sensing data plays a significant role in intensive care units (ICUs) and in real-time remote monitoring of patients with specific conditions such as diabetes or high blood pressure.  Real-time and long-term analysis of trends and treatments in remote monitoring programs can help providers monitor the state of patients with such conditions (Van-Dai et al., 2016). 

Biomedical signals are collected from many sources such as the heart, blood pressure, oxygen saturation levels, blood glucose, nerve conduction, and brain activity.  Examples of biomedical signals include the electroneurogram (ENG), electromyogram (EMG), electrocardiogram (ECG), electroencephalogram (EEG), electrogastrogram (EGG), and phonocardiogram (PCG).  Real-time analytics of biomedical signals will provide better management of chronic diseases, earlier detection of adverse events such as heart attacks and strokes, and earlier diagnosis of disease.   These biomedical signals can be discrete or continuous based on the kind of care or severity of a particular pathological condition (Malik & Sangwan, 2015; Van-Dai et al., 2016).

Genomic data analysis helps in better understanding the relationships among genes, mutations, and disease conditions. It has great potential in the development of gene therapies to cure certain conditions.  Furthermore, genomic data analytics can assist in translating genetic discoveries into personalized medicine practice (Liang & Kelemen, 2016; Luo, Wu, Gopukumar, & Zhao, 2016; Palanisamy & Thirunavukarasu, 2017; Van-Dai et al., 2016).

Clinical text analytics using data mining is the process of transforming information from clinical notes stored in unstructured formats into useful patterns.  Manual coding of clinical notes is costly and time-consuming because of their unstructured nature, heterogeneity, and differing formats and contexts across patients and practitioners.  Methods such as natural language processing (NLP) and information retrieval can be used to extract useful knowledge from large volumes of clinical text and automatically encode clinical information in a timely manner (Ghani, Zheng, Wei, & Friedman, 2014; Sun & Reddy, 2013; Van-Dai et al., 2016).

Social network healthcare data analytics is based on various social media sources, such as social networking sites (e.g., Facebook, Twitter) and web logs, to discover new patterns and knowledge that can be leveraged to model and predict global health trends such as outbreaks of infectious epidemics (InformationBuilders, 2018; Luo et al., 2016; Van-Dai et al., 2016; Zia & Khan, 2017). Figure 1 shows a summary of these healthcare data sources.


Figure 1.  Healthcare Data Sources.

Healthcare Big Data Analytics Design Proposal Using Hadoop

The implementation of BDA in the hospitals within the four states aims to improve patient safety and clinical outcomes while promoting wellness and disease management (Alexandru et al., 2016; HIMSS, 2018).  The BDA system will take advantage of the large volume of healthcare-generated data to support various applied analytical disciplines across the statistical, contextual, quantitative, predictive, and cognitive spectrums (Alexandru et al., 2016; HIMSS, 2018).  These applied analytical disciplines will drive fact-based decision making for planning, management, and learning in the hospitals (Alexandru et al., 2016; HIMSS, 2018). 

The proposal begins with the requirements, followed by the data flow diagram, the communication flowcharts, and the overall system diagram.  The proposal then addresses the regulations, policies, and governance for the medical system.  The limitations and assumptions are also addressed, followed by the justification for the overall design.

1.      Basic Design Requirements

The basic requirements for the implementation of this proposal include not only the tools and required software but also training at all levels, from staff to nurses, clinicians, and patients.  The list of requirements is divided into system, implementation, and training requirements. 

1.1 Cloud Computing Technology Adoption Requirement

Volume is one of the most significant characteristics of BD, especially in the healthcare industry (Manyika et al., 2011).  Based on the challenges addressed earlier in dealing with BD and BDA in healthcare, the system requirements cannot be met using a traditional on-premises data center, as it cannot handle the intensive computation requirements of BD or the storage requirements for all the medical information from hospitals across the four states (Hu et al., 2014). Thus, the cloud computing environment is the more appropriate solution for the implementation of this proposal.  Cloud computing plays a significant role in BDA (Assunção et al., 2015), and the massive computation and storage requirements of BDA create a critical need for this emerging technology (Mehmood, Natgunanathan, Xiang, Hua, & Guo, 2016).  Cloud computing offers benefits such as cost reduction, elasticity, pay-per-use, availability, reliability, and maintainability (Gupta, Gupta, & Mohania, 2012; Kritikos, Kirkham, Kryza, & Massonet, 2017).  However, it also raises security and privacy issues under the standard deployment models of public, private, hybrid, and community clouds.  Thus, one of the major requirements is to adopt the Virtual Private Cloud (VPC), which has been regarded as the most prominent approach to trusted computing technology (Abdul, Jena, Prasad, & Balraju, 2014).

 1.2 Security Requirement

Cloud computing has been facing various threats (Cloud Security Alliance, 2013, 2016, 2017).   Records show that over the three years from 2015 through 2017, the numbers of breaches, lost medical records, and fine settlements were staggering (Thompson, 2017).  The Office for Civil Rights (OCR) issued 22 resolution agreements requiring monetary settlements approaching $36 million (Thompson, 2017).  Table 1 shows the data categories and the total for each year. 

Table 1.  Approximation of Records Lost by Category Disclosed on HHS.gov (Thompson, 2017)

Furthermore, a recent report published by HIPAA showed that in the first three months of 2018, 77 healthcare data breaches were reported to the OCR (HIPAA, 2018d).  In the second quarter of 2018, at least 3.14 million healthcare records were exposed (HIPAA, 2018a), and in the third quarter, 4.39 million records were exposed in 117 breaches (HIPAA, 2018c).

Thus, protecting patients' private information requires technology to extract, analyze, and correlate potentially sensitive datasets (HIPAA, 2018b).  The implementation of BDA requires security measures and safeguards to protect the privacy of patients in the healthcare industry (HIPAA, 2018b).  Sensitive data should be encrypted to prevent exposure in the event of theft (Abernathy & McMillan, 2016).  The security requirements involve security in the VPC cloud deployment model as well as at the local hospitals in each state (Regola & Chawla, 2013).  Security in the VPC deployment model should involve security groups and network access control lists so that the right individuals have access to the right applications and patient records.  A security group in a VPC acts as the first line of defense, a firewall for the associated instances of the VPC (McKelvey, Curran, Gordon, Devlin, & Johnston, 2015).  The network access control lists act as the second layer of defense, a firewall for the associated subnets, controlling the inbound and outbound traffic at the subnet level (McKelvey et al., 2015). 
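As a concrete illustration of these two firewall layers, the following is a minimal sketch using the AWS boto3 SDK; the region, VPC ID, CIDR ranges, and group names are placeholder assumptions, and a production deployment would involve many more rules and formal change control.

```python
import boto3

# Hypothetical region and VPC ID -- placeholders for illustration only.
ec2 = boto3.client("ec2", region_name="us-west-2")
VPC_ID = "vpc-0123456789abcdef0"

# First layer: a security group allowing only HTTPS from the hospital network
# to the instances that host the EHR application tier.
sg = ec2.create_security_group(
    GroupName="ehr-app-sg",
    Description="EHR application tier - HTTPS only",
    VpcId=VPC_ID,
)
ec2.authorize_security_group_ingress(
    GroupId=sg["GroupId"],
    IpPermissions=[{
        "IpProtocol": "tcp", "FromPort": 443, "ToPort": 443,
        "IpRanges": [{"CidrIp": "10.0.0.0/16", "Description": "hospital network"}],
    }],
)

# Second layer: a network ACL entry that allows the same HTTPS traffic at the
# subnet level; traffic not explicitly allowed is denied by the default rule.
acl = ec2.create_network_acl(VpcId=VPC_ID)
ec2.create_network_acl_entry(
    NetworkAclId=acl["NetworkAcl"]["NetworkAclId"],
    RuleNumber=100,
    Protocol="6",          # TCP
    RuleAction="allow",
    Egress=False,
    CidrBlock="10.0.0.0/16",
    PortRange={"From": 443, "To": 443},
)
```

The design intent is that the security group protects individual instances while the network ACL protects the whole subnet, so a misconfiguration in one layer does not immediately expose patient records.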

Security at the local hospital level in each state is mandatory to protect patient records and comply with HIPAA regulations (Regola & Chawla, 2013).  Medical equipment must be secured with authentication and authorization techniques so that only medical staff, nurses, and clinicians have access to the medical devices based on their roles.  General access should be prohibited, as every member of the hospital has a different role with different responsibilities.  Encryption should be used to hide the meaning or intent of communication from unintended users (Stewart, Chapple, & Gibson, 2015).   Encryption is an essential security control, especially for data in transit (Stewart et al., 2015).  The hospitals in all four states should implement encryption consistently, using the same types of encryption such as PKI, cryptographic applications, and symmetric key algorithms (Stewart et al., 2015).
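As a simple illustration of symmetric encryption of a sensitive record, the sketch below uses the Python cryptography library's Fernet interface; the record content is fictitious, and in practice keys would be generated and held in a dedicated key management system rather than in application code.

```python
from cryptography.fernet import Fernet

# In production the key would come from a key management service, never from code.
key = Fernet.generate_key()
cipher = Fernet(key)

# Encrypt a fictitious lab value before it leaves the application boundary.
token = cipher.encrypt(b"patient_id=P-00042;glucose=5.4 mmol/L")

# Only holders of the same key can recover the plaintext.
plaintext = cipher.decrypt(token)
assert plaintext == b"patient_id=P-00042;glucose=5.4 mmol/L"
```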

The system requirements should also include identity management systems that can interoperate with the hospitals in each state. The identity management system provides authentication and authorization, allowing access to patients' medical records only to those who should have it.  The proposal also requires various encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), and Internet Protocol Security (IPSec) to protect information transferred over public networks (Zhang & Liu, 2010).  

 1.3 Hadoop Implementation for Data Stream Processing Requirement

While the velocity of BD refers to the speed at which large volumes of data are generated and the speed required to process them (Hu et al., 2014), the variety of the data requires specific technology capabilities to handle various types of datasets such as structured, semi-structured, and unstructured data (Bansal, Deshpande, Ghare, Dhikale, & Bodkhe, 2014; Hu et al., 2014).  The Hadoop ecosystem is the most appropriate system for implementing BDA (Bansal et al., 2014; Dhotre, Shimpi, Suryawanshi, & Sanghati, 2015).  The implementation requirements include various technologies and tools.  This section covers the components required when implementing Hadoop technology across the four states for the healthcare BDA system.

Hadoop has three significant limitations, which must be addressed in this design.  The first is the lack of technical support and documentation for open-source Hadoop (Guo, 2013).   Thus, this design requires an enterprise edition of Hadoop, such as Cloudera, Hortonworks, or MapR, to get around this limitation (Guo, 2013); the final product decision will be made by the cost analysis team.  The second limitation is that Hadoop is not optimal for real-time data processing (Guo, 2013). The solution is to integrate a real-time streaming framework such as Spark, Storm, or Kafka (Guo, 2013; Palanisamy & Thirunavukarasu, 2017); the Spark integration requirement is discussed below (Guo, 2013). The third limitation is that Hadoop is not a good fit for large graph datasets (Guo, 2013). The solution is to integrate GraphLab, which is also discussed below as a separate requirement for this design.

1.3.1 Hadoop Ecosystem for Data Processing

Hadoop technologies have been front-runners for Big Data applications (Bansal et al., 2014; Chrimes, Zamani, Moa, & Kuo, 2018).  The Hadoop ecosystem will be part of the implementation requirements, as it has proven to serve well for intensive computation on large datasets (Raghupathi & Raghupathi, 2014; Wang et al., 2018).   The implementation of Hadoop technology will be performed in the VPC deployment model.  The required Hadoop version is 2.x, which includes YARN for resource management (Karanth, 2014).  Hadoop 2.x also includes HDFS snapshots, which provide a read-only image of the entire filesystem or a particular subset of it to protect against user errors and to support backup and disaster recovery (Karanth, 2014). The Hadoop platform can be implemented to gain insight into various areas (Raghupathi & Raghupathi, 2014; Wang et al., 2018). The Hadoop ecosystem involves the Hadoop Distributed File System (HDFS), MapReduce, and NoSQL databases such as HBase and Hive to handle large volumes of data using various algorithms and machine learning to extract value from medical records that are structured, semi-structured, and unstructured (Raghupathi & Raghupathi, 2014; Wang et al., 2018).  Other supporting components include Oozie for workflow, Pig for scripting, and Mahout for machine learning, which is part of artificial intelligence (AI) (Ankam, 2016; Karanth, 2014).  The ecosystem will also include Flume as a log collector, Sqoop for data exchange, and ZooKeeper for coordination (Ankam, 2016; Karanth, 2014).  HCatalog is required to manage the metadata in Hadoop (Ankam, 2016; Karanth, 2014).   Figure 2 shows the Hadoop ecosystem before integrating Spark for real-time analytics, and a minimal MapReduce sketch follows the figure.


Figure 2.  Hadoop Architecture Overview (Alguliyev & Imamverdiyev, 2014).
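The following is a minimal MapReduce sketch written for Hadoop Streaming, which lets mapper and reducer logic be expressed as ordinary scripts that read stdin and write stdout; the input layout (one comma-separated encounter record per line with a diagnosis code in the second field) is an assumption made purely for illustration.

```python
#!/usr/bin/env python3
# Minimal Hadoop Streaming sketch: count encounters per diagnosis code.
# In practice the mapper and reducer would live in two separate scripts
# submitted with the hadoop-streaming jar (-mapper / -reducer options).
import sys

def mapper():
    # Emit "<diagnosis_code>\t1" for every input record.
    for line in sys.stdin:
        fields = line.rstrip("\n").split(",")
        if len(fields) > 1:
            print(f"{fields[1]}\t1")

def reducer():
    # Hadoop delivers mapper output sorted by key; sum the counts per code.
    current_code, count = None, 0
    for line in sys.stdin:
        code, value = line.rstrip("\n").split("\t")
        if code != current_code and current_code is not None:
            print(f"{current_code}\t{count}")
            count = 0
        current_code = code
        count += int(value)
    if current_code is not None:
        print(f"{current_code}\t{count}")

if __name__ == "__main__":
    mapper() if sys.argv[1] == "map" else reducer()
```

The same logic can be checked locally with a shell pipeline such as `cat encounters.csv | python3 job.py map | sort | python3 job.py reduce` before submitting it to the cluster.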

1.3.2 Hadoop-specific File Format for Splittable and Agnostic Compression

The ability to split files plays a significant role during data processing (Grover, Malaska, Seidman, & Shapira, 2015).  Therefore, Hadoop-specific file formats such as SequenceFile, serialization formats like Avro, and columnar formats such as RCFile and Parquet should be used, because these formats share two characteristics that are essential for Hadoop applications: splittable compression and agnostic compression (Grover et al., 2015).  Splittability allows large files to be divided for input to MapReduce and other types of jobs, which is required for parallel processing and is key to leveraging Hadoop's data locality feature (Grover et al., 2015). Agnostic compression allows data to be compressed with any codec without readers having to know which codec was used, because the codec is stored in the header metadata of the file format (Grover et al., 2015).  Figure 3 summarizes the three Hadoop file types with the two common characteristics, and a short sketch of writing such a columnar format follows the figure.  


Figure 3. Three Hadoop File Types with the Two Common Characteristics.  
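As a brief illustration of producing one of these splittable, compression-agnostic formats, the sketch below uses PySpark to convert a hypothetical CSV export of lab results into Snappy-compressed Parquet; the HDFS paths and column names are assumptions for illustration only.

```python
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("lab-results-to-parquet").getOrCreate()

# Hypothetical CSV staging area; column names are illustrative only.
labs = spark.read.csv("hdfs:///staging/lab_results.csv", header=True, inferSchema=True)

# Parquet is columnar and splittable; Snappy is one of the codecs Spark supports.
(labs.write
     .mode("overwrite")
     .option("compression", "snappy")
     .parquet("hdfs:///warehouse/lab_results"))

# Downstream jobs benefit from column pruning: only the referenced columns are read.
recent = (spark.read.parquet("hdfs:///warehouse/lab_results")
          .select("patient_id", "test_code", "result_value"))
```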

1.3.3 XML and JSON Use in Hadoop

Clinical data includes semi-structured formats such as XML and JSON.  Splitting XML and JSON is not straightforward and can present unique challenges in Hadoop, since Hadoop does not provide a built-in InputFormat for either format (Grover et al., 2015).  Furthermore, JSON presents more challenges than XML because no token marks the beginning or end of a record (Grover et al., 2015). When using these file formats, two primary considerations apply.  First, a container format such as Avro should be used, because Avro provides a compact and efficient way to store and process the data once it is transformed into Avro (Grover et al., 2015).  Second, a library for processing XML or JSON should be used (Grover et al., 2015); the XMLLoader in Pig's PiggyBank library is an example for XML data, and the Elephant Bird project is an example for JSON data (Grover et al., 2015).  A brief sketch of converting JSON records into Avro is shown below.
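The sketch below shows one way to wrap newline-delimited JSON clinical notes in an Avro container using the fastavro library; the schema, field names, and file paths are illustrative assumptions rather than a prescribed record layout.

```python
import json
from fastavro import parse_schema, writer

# Hypothetical schema for a clinical note exported as JSON lines.
schema = parse_schema({
    "name": "ClinicalNote",
    "type": "record",
    "fields": [
        {"name": "patient_id", "type": "string"},
        {"name": "note_text", "type": "string"},
        {"name": "recorded_at", "type": "string"},
    ],
})

# One JSON object per line in the source file.
with open("notes.jsonl") as src:
    records = [json.loads(line) for line in src]

# Avro files are splittable and carry their schema in the header, so Hadoop
# jobs can process them without a custom InputFormat for raw JSON.
with open("notes.avro", "wb") as out:
    writer(out, schema, records)
```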

1.4 HBase and MongoDB NoSQL Database Integration Requirement

In the age of BD and BDA, traditional data stores are inadequate to handle not only the large volume of data but also the various data formats such as unstructured and semi-structured data (Hu et al., 2014).   Thus, Not Only SQL (NoSQL) databases emerged to meet the requirements of BDA.  These NoSQL data stores are used for modern, scalable databases (Sahafizadeh & Nematbakhsh, 2015).  Their scalability enables systems to increase throughput as demand increases during data processing (Sahafizadeh & Nematbakhsh, 2015).  The platform can incorporate two types of scalability to support large volumes of data: horizontal and vertical.  Horizontal scaling distributes the workload across many servers and nodes to increase throughput, while vertical scaling requires more processors, more memory, and faster hardware to be installed on a single server (Sahafizadeh & Nematbakhsh, 2015). 

NoSQL data stores come in many varieties, such as MongoDB, CouchDB, Redis, Voldemort, Cassandra, BigTable, Riak, HBase, Hypertable, ZooKeeper, Vertica, Neo4j, db4o, and DynamoDB.  These data stores are categorized into four types: document-oriented, column-oriented (or column-family) stores, graph databases, and key-value stores (EMC, 2015; Hashem et al., 2015). Document-oriented data stores can store and retrieve collections of data and documents using complex data forms in formats such as XML and JSON as well as PDF and MS Word (EMC, 2015; Hashem et al., 2015); MongoDB and CouchDB are examples (EMC, 2015; Hashem et al., 2015).  Column-oriented data stores keep content in columns rather than rows, with the attributes of the columns stored contiguously (Hashem et al., 2015); this type of data store can store and render blog entries, tags, and feedback (Hashem et al., 2015), and Cassandra, DynamoDB, and HBase are examples (EMC, 2015; Hashem et al., 2015).  Key-value stores can store and scale large volumes of data, holding a value and a key to access it (EMC, 2015; Hashem et al., 2015); the value can be complex, and this type of store is useful for holding a user's login ID as the key referencing patient values.  Redis and Riak are examples of key-value NoSQL data stores (Alexandru et al., 2016).  Graph NoSQL databases store and represent data using graph models with nodes, edges, and properties related to one another through relations, which is useful for unstructured medical data such as images and lab results; Neo4j is an example (Hashem et al., 2015).  Each of these NoSQL data stores has its own advantages and limitations.  Figure 4 summarizes these NoSQL data store types, the data they store, and examples.

Figure 4.  Big Data Analytics NoSQL Data Store Types.

The proposed design requires one or more NoSQL data stores to meet the requirements of BDA in the Hadoop environment for this healthcare system.  Healthcare big data has unique characteristics that must be addressed when selecting the data store, and consideration must be given to the various data types.   HBase and HDFS are the commonly used storage managers in the Hadoop environment (Grover et al., 2015).  HBase is a column-oriented data store that will be used to store multi-structured data (Archenaa & Anita, 2015); it sits on top of HDFS in the Hadoop ecosystem (Raghupathi & Raghupathi, 2014).   

MongoDB will also be used to store semi-structured datasets such as XML and JSON, as well as metadata for the HBase data schema to improve its accessibility and readability (Luo et al., 2016).  Riak will be used for key-value datasets such as dictionaries, hash tables, and associative arrays, which can hold login and user ID information for patients as well as for providers and clinicians (Klein et al., 2015).  Neo4j will be used to store images with nodes and edges, such as lab images and X-rays (Alexandru et al., 2016).  A minimal sketch of storing one patient's records in MongoDB and HBase is shown below.
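To illustrate how the two stores divide the work, the sketch below writes a fictitious patient's semi-structured document to MongoDB (via pymongo) and the corresponding lab result to an HBase column family (via happybase); the hosts, table names, row-key layout, and field names are assumptions for illustration only.

```python
import happybase
from pymongo import MongoClient

# Hypothetical connection endpoints.
mongo = MongoClient("mongodb://mongo.example.org:27017")
documents = mongo["ehr"]["clinical_documents"]

# A semi-structured document (e.g., parsed from a JSON or XML feed) goes to MongoDB.
documents.insert_one({
    "patient_id": "P-00042",
    "document_type": "discharge_summary",
    "sections": {"diagnosis": "E11.9", "medications": ["metformin"]},
})

# The same patient's lab result goes to HBase, keyed by patient and date so that
# a scan over one patient's history is a contiguous range of rows.
hbase = happybase.Connection("hbase-master.example.org")
lab_results = hbase.table("lab_results")
lab_results.put(
    b"P-00042#2018-10-01",
    {b"result:glucose": b"5.4", b"result:unit": b"mmol/L"},
)
```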

The proposed healthcare system has a logical data model and query patterns that need to be supported by the NoSQL databases (Klein et al., 2015). In the data model, reading patients' medical test results is a core function used to populate the user interface. The model also requires strong replica consistency when a new medical result is written for a patient, because providers make patient care decisions using these records.  All providers will be able to see the same information within the hospital systems in the four states, whether they are at the same site as the patient or providing telemedicine support from another location. 

The logical data model includes mapping the application-specific model onto the particular data model, indexing, and query language capabilities of each database.  The HL7 Fast Healthcare Interoperability Resources (FHIR) standard is used as the logical data model for records analysis.  Patient data such as demographic information (names, addresses, and telephone numbers) will be modeled using the FHIR Patient resource, while laboratory results such as result quantity and result units will be modeled using the related FHIR resources (Klein et al., 2015).  An illustrative FHIR-style resource pair is shown below. 
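For illustration, the fragment below sketches the two FHIR-style resources involved, a Patient resource for demographics and an Observation resource for a lab result; the structure follows HL7 FHIR conventions, but all identifiers and values are fictitious.

```python
# Minimal FHIR-style resources expressed as Python dictionaries (fictitious data).
patient = {
    "resourceType": "Patient",
    "id": "P-00042",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "telecom": [{"system": "phone", "value": "555-0100"}],
    "address": [{"city": "Denver", "state": "CO"}],
}

# A lab result is carried by a separate Observation resource that references
# the patient and records the result quantity and units.
observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"text": "Glucose"},
    "subject": {"reference": "Patient/P-00042"},
    "valueQuantity": {"value": 5.4, "unit": "mmol/L"},
}
```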

1.5 Spark Integration for Real-Time Data Processing Requirement

While the architecture of the Hadoop ecosystem has been designed for scenarios such as data storage, data management, statistical analysis, statistical association between data sources, distributed computing, and batch processing, this proposal requires real-time data processing, which cannot be met by Hadoop alone (Basu, 2014).  Real-time analytics will add tremendous value to the proposed healthcare system.  Thus, Apache Spark is another component required to implement this proposal (Basu, 2014).  Spark allows in-memory processing for fast response times, bypassing MapReduce operations (Basu, 2014).  With Spark integrated with Hadoop, stream processing, machine learning, interactive analytics, and data integration become possible (Scott, 2015).  Spark will run on top of Hadoop to benefit from YARN and the underlying storage of HDFS, HBase, and other Hadoop ecosystem building blocks (Scott, 2015).  Figure 5 shows the core engines of Spark, and a brief streaming sketch follows the figure.


Figure 5. Spark Core Engines (Scott, 2015).
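
A minimal PySpark sketch of this integration is shown below.  It assumes a Hadoop/YARN cluster with observation records already stored in HDFS at a hypothetical path, caches the working set in memory, and runs a simple aggregation; column names are assumptions for illustration.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F

    # Spark running on top of Hadoop, reading from HDFS (path and columns are hypothetical).
    spark = (SparkSession.builder
             .appName("HealthcareBatchAnalytics")
             .getOrCreate())

    observations = spark.read.json("hdfs:///healthcare/observations/")
    observations.cache()  # keep the working set in memory for fast, repeated queries

    # Example aggregation: average result value and record count per test code.
    summary = (observations
               .groupBy("code")
               .agg(F.avg("value").alias("avg_value"), F.count("*").alias("n")))
    summary.show()

    spark.stop()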

 1.6 Big Healthcare Data Visualization Requirement

Visualization is one of the most powerful ways to present data (Jayasingh, Patra, & Mahesh, 2016).  It helps in viewing the data in a more meaningful way, in the form of graphs, images, and pie charts that can be understood easily.  It helps in synthesizing a large volume of data, such as healthcare data, to get at the core of the raw big data and convey its key points for insight (Meyer, 2018).  Some of the commercial visualization tools include Tableau, Spotfire, QlikView, and Adobe Illustrator.  However, the most commonly used visualization tools in healthcare include Tableau, PowerBI, and QlikView.  This healthcare design proposal will utilize Tableau.

Healthcare providers are successfully transforming data from information to insight using Tableau software.  Healthcare organizations can utilize three approaches to get more from their healthcare datasets.  The first approach is to broaden data access by empowering healthcare departments to explore their own data.  The second approach is to uncover answers by combining data from multiple systems to reveal trends and outliers.  The third approach is to share insights with executives, providers, and others to drive collaboration (Tableau, 2011).  Tableau has several advantages, including interactive visualization using drag-and-drop techniques, handling large amounts of data and millions of rows with ease, and integration with scripting languages such as Python (absentdata.com, 2018).  It also provides mobile support and responsive dashboards.  The limitations of Tableau are that it requires substantial training to master fully, and it lacks automatic refreshing and conditional formatting and has a 16-column table limit (absentdata.com, 2018).  Figure 6 shows a Patient Cycle Time data visualization created with Tableau software.


Figure 6. Patient Cycle Time Data Visualization Example (Tableau, 2011).
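
Because Tableau typically connects to files, extracts, or database tables rather than directly to the analytics jobs, one lightweight option (a sketch under the assumption that aggregated results are exported as CSV; the file and column names are hypothetical) is to publish the output of the analytics layer with pandas for a Tableau workbook to consume.

    import pandas as pd

    # Hypothetical aggregated output from the analytics layer.
    cycle_times = pd.DataFrame({
        "clinic": ["Denver", "Phoenix", "Salt Lake City", "Albuquerque"],
        "avg_cycle_minutes": [42.5, 51.0, 38.2, 47.8],
    })

    # Write a file that a Tableau workbook can connect to as a data source.
    cycle_times.to_csv("patient_cycle_time.csv", index=False)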

1.7 Artificial Intelligence Integration Requirement

Artificial Intelligence (AI) is a computational technique that allows machines to perform cognitive functions, such as acting or reacting to input, similar to the way humans do (Patrizio, 2018).  Traditional computing applications react to data, and their reactions and responses must be hand-coded with human intervention (Patrizio, 2018).  AI systems, in contrast, are continuously in flux, changing their behavior to accommodate changes in the results and modifying their reactions accordingly (Patrizio, 2018).  AI techniques include video recognition, natural language processing, speech recognition, machine learning engines, and automation (Mills, 2018).

The healthcare system can benefit from BDA integration with Artificial Intelligence (AI) (Bresnick, 2018).  Since AI can play a significant role in BDA in the healthcare system, this proposal suggests implementing machine learning, a subset of AI, to deploy more precise and impactful interventions at the right time in patient care (Bresnick, 2018).  The application of AI in the proposed design requires machine learning (Patrizio, 2018).  Since the data used for AI and machine learning has already been cleaned by removing duplicates and unnecessary data, AI can take advantage of this filtered data, leading to healthcare breakthroughs such as genomic and proteomic experiments that enable personalized medicine (Kersting & Meyer, 2018).

The healthcare industry has been utilizing AI, machine learning (ML), and data mining (DM) to extract value from BD by transforming large medical datasets into actionable knowledge through predictive and prescriptive analytics (Palanisamy & Thirunavukarasu, 2017).  ML will be used to develop sophisticated algorithms that process massive medical datasets, including structured, unstructured, and semi-structured data, to perform advanced analytics (Palanisamy & Thirunavukarasu, 2017).  Apache Mahout, an open-source ML library, will be integrated with Hadoop to facilitate the execution of scalable machine learning algorithms, offering techniques such as recommendation, classification, and clustering (Palanisamy & Thirunavukarasu, 2017).
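
Although the proposal names Apache Mahout, the same clustering idea can be sketched with Spark MLlib, which is already part of this design and exposes a Python API; the following example is illustrative only, and the two vital-sign features and their values are assumptions.

    from pyspark.sql import SparkSession
    from pyspark.ml.feature import VectorAssembler
    from pyspark.ml.clustering import KMeans

    spark = SparkSession.builder.appName("PatientClustering").getOrCreate()

    # Hypothetical patient features; in practice these would come from HBase/HDFS.
    data = spark.createDataFrame(
        [(98.6, 72.0), (99.1, 88.0), (101.2, 110.0), (98.4, 70.0)],
        ["temperature", "heart_rate"],
    )

    # Assemble the feature vector and cluster patients into k groups.
    assembler = VectorAssembler(inputCols=["temperature", "heart_rate"], outputCol="features")
    features = assembler.transform(data)

    model = KMeans(k=2, seed=42, featuresCol="features").fit(features)
    model.transform(features).select("temperature", "heart_rate", "prediction").show()

    spark.stop()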

1.8 Internet of Things (IoT) Integration Requirement

Internet of Things (IoT) refers to the growing number of connected devices with IP addresses, which were not common years ago (Anand & Clarice, 2015; Thompson, 2017).  These connected devices collect information and use their IP addresses to transmit it (Thompson, 2017).  Providers in healthcare take advantage of the collected information to find new treatment methods and increase efficiency (Thompson, 2017).

The implementation of IoT will involve various technologies, including radio frequency identification (RFID), near field communication (NFC), machine to machine (M2M), wireless sensor networks (WSN), and addressing schemes (AS) such as IPv6 addresses (Anand & Clarice, 2015; Kumari, 2017).  The implementation of IoT also requires machine learning algorithms to find patterns, correlations, and anomalies that have the potential of enabling healthcare improvements (O'Brien, 2016), as illustrated in the sketch below.  Machine learning is a critical component of artificial intelligence; thus, the success of IoT depends on AI implementation.
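
As a small illustration of the kind of machine learning that IoT data could feed, the following sketch assumes scikit-learn is available and uses hypothetical vital-sign readings from remote monitoring devices to flag anomalous values; it is not prescribed by the cited sources.

    import numpy as np
    from sklearn.ensemble import IsolationForest

    # Hypothetical heart-rate and systolic blood-pressure readings from remote monitors.
    readings = np.array([
        [72, 118], [75, 121], [70, 115], [74, 119],
        [71, 117], [73, 120], [140, 190],   # last row is an abnormal reading
    ])

    detector = IsolationForest(contamination=0.15, random_state=0).fit(readings)
    labels = detector.predict(readings)   # 1 = normal, -1 = anomaly

    for reading, label in zip(readings, labels):
        if label == -1:
            print("Alert: abnormal vitals detected:", reading)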

1.9 Training Requirement

This design proposal requires training for IT professionals, providers, clinicians, and others who will use this healthcare ecosystem, depending on their roles (Alexandru et al., 2016; Archenaa & Anita, 2015).  Each component of this ecosystem should have its own training, such as training for Hadoop/MapReduce, Spark, security, and so forth.  Training will play a significant role in the successful implementation of this design to apply BD and BDA in the healthcare system of the four States of Colorado, Utah, Arizona, and New Mexico.  Patients should also be trained for remote monitoring programs such as blood sugar and blood pressure monitoring applications.  The senior generation might face some challenges; however, with technical support, these challenges can be alleviated.

2.      Data Flow Diagram

            This section discusses the data flow of the proposed design for applying BDA in the healthcare ecosystem.

2.1 HBase Cluster and HDFS Data Flow

HBase stores data in tables whose schema specifies the column families (Yang, Liu, Hsu, Lu, & Chu, 2013).  The table schema must be predefined, and the column families must be specified.  New columns can be added to families as required, making the schema flexible and able to adapt to changing application requirements (Yang et al., 2013).  HBase is developed with a master-slave architecture similar to HDFS, with its NameNode and slave nodes, and MapReduce, with its JobTracker and TaskTracker slaves (Yang et al., 2013).  HBase will play a vital role in the Hadoop cluster environment.  In HBase, a master node called the HMaster manages the cluster, and region servers store portions of the tables and perform the work on the data.  The HMaster acts as the master server, is responsible for monitoring all RegionServer instances in the cluster, and is the interface for all metadata changes.  This master executes on the NameNode in the distributed Hadoop cluster.  The HRegionServer represents the RegionServer and is responsible for serving and managing regions.  The RegionServer runs on a DataNode in the distributed Hadoop cluster.  ZooKeeper assists in selecting another machine within the cluster as the HMaster in case of a failure, unlike the HDFS framework, where the NameNode is a single point of failure.  The data flow between the DataNodes and the NameNode when integrating HBase on top of HDFS is shown in Figure 7.


Figure 7.  HBase Cluster Data Flow (Yang et al., 2013).
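
A brief sketch of this schema style using the happybase Python client is shown below; it assumes an HBase Thrift gateway is reachable on localhost, and the table, column-family, and row-key names are hypothetical.

    import happybase

    # Connect to HBase through its Thrift gateway.
    connection = happybase.Connection("localhost")

    # The column family is defined up front; individual columns can be added later as needed.
    connection.create_table("patient_vitals", {"vitals": dict()})

    table = connection.table("patient_vitals")

    # Row key combines patient ID and timestamp; columns live under the 'vitals' family.
    table.put(b"P-1001#2018-11-02T08:15:00", {
        b"vitals:heart_rate": b"72",
        b"vitals:systolic_bp": b"118",
    })

    row = table.row(b"P-1001#2018-11-02T08:15:00")
    print(row[b"vitals:heart_rate"])

    connection.close()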

2.2 HBase and MongoDB with Hadoop/MapReduce and HDFS Data Flow

The healthcare system integrates four major components: HBase, MongoDB, Hadoop/MapReduce, and a data visualization tool.  HBase is used for data storage, MongoDB is used for metadata, Hadoop/MapReduce is used for computation, and the visualization tool presents the results.  The signal data will be stored in HBase, while the metadata and other clinical data will be stored in MongoDB.  The data stored in both HBase and MongoDB will be accessible from the Hadoop/MapReduce environment for processing, as well as from the data visualization layer.  The cluster will consist of one master node, eight slave nodes, and several supporting servers.  The data will be imported into Hadoop and processed via MapReduce.  The results of the computational process will be viewed through a data visualization tool such as Tableau.  Figure 8 shows the data flow between these four components of the proposed healthcare ecosystem.


Figure 8.  The Proposed Data Flow Between Hadoop/MapReduce and Other Databases.
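
The MapReduce computation itself can be illustrated with a small Hadoop Streaming job written in Python.  This is a sketch only; it assumes the input is one comma-separated patient visit record per line with the department in the third field, which is a hypothetical layout.  The mapper emits a department key with a count of one, and the reducer sums the counts after the framework's sort-and-shuffle phase.

    # mapper.py -- emits "<department>\t1" for each visit record read from stdin
    import sys

    for line in sys.stdin:
        fields = line.strip().split(",")   # hypothetical CSV: visit_id,patient_id,department,...
        if len(fields) >= 3:
            print(f"{fields[2]}\t1")

    # reducer.py -- sums counts per department (input arrives sorted by key)
    import sys

    current_dept, count = None, 0
    for line in sys.stdin:
        dept, value = line.strip().split("\t")
        if dept != current_dept:
            if current_dept is not None:
                print(f"{current_dept}\t{count}")
            current_dept, count = dept, 0
        count += int(value)
    if current_dept is not None:
        print(f"{current_dept}\t{count}")

Both scripts would be submitted with the Hadoop Streaming jar so that records are grouped by department before the reducer runs.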

2.3 XML Design Flow Using ETL Process with MongoDB 

Healthcare records contain various types of data, from structured and semi-structured to unstructured (Luo et al., 2016).  Some of these healthcare records are XML-based records in a semi-structured, tag-based format.  XML stands for eXtensible Markup Language (Fawcett, Ayers, & Quin, 2012).  The healthcare sector can derive value from these XML documents, which represent semi-structured data (Aravind & Agrawal, 2014).  An example of an XML-based patient record is shown in Figure 9.


Figure 9.  Example of the Patient’s Electronic Health Record (HL7, 2011)

XML-based records need to be ingested into the Hadoop system for analytical purposes, so that value can be derived from this semi-structured XML-based data.  Although XML is one of the standard file formats used with MapReduce, Hadoop does not offer a standard XML “RecordReader” (Lublinsky, Smith, & Yakubovich, 2013).  Various approaches can be used to process XML semi-structured data.  An ETL (Extract, Transform, and Load) process can be used to handle XML data in Hadoop.  MongoDB, a NoSQL database required in this design proposal, handles this document-oriented data type.

The ETL process in MongoDB starts with the extract and transform phases.  The MongoDB application provides the ability to map the XML elements within the document to the downstream data structure.  The application supports the ability to unwind simple arrays or present embedded documents using appropriate data relationships such as one-to-one (1:1), one-to-many (1:M), or many-to-many (M:M) (MongoDB, 2018).  The application infers the schema information by examining a subset of documents within the target collections.  Organizations can add fields to the discovered data model that may not have been present within the subset of documents used for schema inference.  The application infers information about the existing indexes for the collections to be queried and prompts or warns about queries that do not use any indexed fields.  The application can return a subset of fields from documents using query projections.  For queries against MongoDB replica sets, the application supports the ability to specify custom MongoDB read preferences for individual query operations.  The application then infers information about the sharded cluster deployment and notes the shard key fields for each sharded collection.  For queries against MongoDB sharded clusters, the application warns against queries that do not use proper query isolation, since broadcast queries in a sharded cluster can have a negative impact on database performance (MongoDB, 2018).

The load process in MongoDB is performed after the extract and transform phases.  The application supports writing data to any MongoDB deployment, whether a single node, a replica set, or a sharded cluster.  For writes to a MongoDB sharded cluster, the application informs the user or displays an error message if XML documents do not contain a shard key.  A custom WriteConcern can be used for any write operations to a running MongoDB deployment.  For bulk loading operations, documents can be written in batches using the insert() method with MongoDB version 2.6 or above, which supports the bulk update database command.  For bulk loading into a sharded MongoDB deployment, bulk inserts into a sharded collection are supported, including pre-splitting the collection's shard key and inserting via multiple mongos processes.  Figure 10 shows this ETL process for XML-based patient records using MongoDB.


Figure 10.  The Proposed XML ETL Process in MongoDB.
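
A minimal sketch of this extract-transform-load path in Python is given below.  It assumes pymongo is available and a MongoDB instance is running locally; the XML element names are simplified illustrations rather than the HL7 structure shown in Figure 9.

    import xml.etree.ElementTree as ET
    from pymongo import MongoClient

    # Extract: parse a (hypothetical) XML patient record.
    xml_record = """
    <patient id="P-1001">
        <name family="Doe" given="Jane"/>
        <result code="glucose" value="5.4" unit="mmol/L"/>
    </patient>
    """
    root = ET.fromstring(xml_record)

    # Transform: map XML elements and attributes to a document structure.
    document = {
        "patient_id": root.get("id"),
        "name": {
            "family": root.find("name").get("family"),
            "given": root.find("name").get("given"),
        },
        "results": [
            {"code": r.get("code"), "value": float(r.get("value")), "unit": r.get("unit")}
            for r in root.findall("result")
        ],
    }

    # Load: insert the transformed document into MongoDB.
    client = MongoClient("mongodb://localhost:27017")
    client["healthcare"]["patient_records"].insert_one(document)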

2.4 Real-Time Streaming Spark Data Flow

Real-time streaming can be implemented using a real-time streaming framework such as Spark, Kafka, or Storm.  This healthcare design proposal will integrate the open-source Spark framework for real-time streaming data, such as sensing data from intensive care units, remote monitoring programs, and biomedical signals.  The data from these various sources will flow into Spark for analytics and then be imported into the data storage systems.  Figure 11 illustrates the data flow for real-time streaming analytics.

Figure 11.  The Proposed Spark Data Flow.
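
A minimal Spark Structured Streaming sketch of this flow is given below.  It assumes sensor readings arrive as JSON messages on a Kafka topic and that the Spark-Kafka connector package is available on the cluster; the broker address, topic name, and field names are hypothetical.

    from pyspark.sql import SparkSession
    from pyspark.sql import functions as F
    from pyspark.sql.types import StructType, StructField, StringType, DoubleType, TimestampType

    spark = SparkSession.builder.appName("VitalsStreaming").getOrCreate()

    schema = StructType([
        StructField("patient_id", StringType()),
        StructField("heart_rate", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    # Read the sensor stream from Kafka (hypothetical broker and topic).
    raw = (spark.readStream
           .format("kafka")
           .option("kafka.bootstrap.servers", "localhost:9092")
           .option("subscribe", "icu-vitals")
           .load())

    vitals = raw.select(F.from_json(F.col("value").cast("string"), schema).alias("v")).select("v.*")

    # Average heart rate per patient over one-minute windows.
    averages = (vitals
                .withWatermark("event_time", "2 minutes")
                .groupBy(F.window("event_time", "1 minute"), "patient_id")
                .agg(F.avg("heart_rate").alias("avg_heart_rate")))

    query = averages.writeStream.outputMode("update").format("console").start()
    query.awaitTermination()

In a full deployment, the console sink would be replaced by a sink that writes the aggregates back to the storage layer for downstream analytics.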

3.      Communication Workflow

The communication flow involves the stakeholders of the healthcare system.  These stakeholders include providers, insurers, pharmaceutical vendors, and IT professionals and practitioners.  The communication flow is centered on the patient-centric healthcare system, which uses cloud computing technology for the four States of Colorado, Utah, Arizona, and New Mexico; the stakeholders come from these States.  The patient-centric healthcare system is the central point for communication.  Patients communicate with the central system using the web-based platform and clinical forums as needed.  Providers communicate with the patient-centric healthcare system through resource usage, patient feedback, hospital visits, and service details.  Insurers communicate with the central system through claims databases and census and societal data.  Pharmaceutical vendors communicate with the central system through prescription and drug reports, which can be retrieved by providers from anywhere in these four States.  IT professionals and practitioners communicate with the central system for data streaming, medical records, genomics, and other omics data analysis and reporting.  Figure 12 shows the communication flow between these stakeholders and the central system in the cloud, which can be accessed from any of the four identified States.

Figure 12.  The Proposed Patient-Centric Healthcare System Communication Flow.

4.      Overall System Diagram

The overall system represents a state-of-the-art healthcare ecosystem that utilizes the latest technology for healthcare Big Data Analytics.  The system is bounded by regulations and policies such as HIPAA to ensure the protection of patients' privacy across the various layers of the overall system.  The system's integrated components include the latest Hadoop technology with MapReduce and HDFS.  The data governance layer is the bottom layer and contains three major building blocks: master data management (MDM), data life-cycle management (DLM), and data security and privacy management.  The MDM component is responsible for data completeness, accuracy, and availability, while the DLM component is responsible for archiving data, maintaining the data warehouse, and handling data deletion and disposal.  The data security and privacy management building block is responsible for sensitive data discovery, vulnerability and configuration assessment, application of security policies, auditing and compliance reporting, activity monitoring, identity and access management, and data protection.  The upper layers include the data layer, the data aggregation layer, the data analytics layer, and the information exploration layer.  The data layer is responsible for data sources and content formats, while the data aggregation layer involves components ranging from the data acquisition process and transformation engines to the data storage area using Hadoop, HDFS, and NoSQL databases such as MongoDB and HBase.  The data analytics layer involves the Hadoop/MapReduce mapping process, stream computing, real-time streaming, and database analytics; AI and IoT are part of this layer.  The information exploration layer involves data visualization, visualization reporting, real-time monitoring using a healthcare dashboard, and clinical decision support.  Figure 13 illustrates the overall system diagram with these layers.


Figure 13.  The Proposed Healthcare Overall System Diagram.

5.      Regulations, Policies, and Governance for the Medical Industry

Healthcare data must be stored in a secure storage area to protect the information and the privacy of patients (Liveri, Sarri, & Skouloudi, 2015).  When the healthcare industry fails to comply with regulations and policies, the resulting fines and costs can cause financial stress on the industry (Thompson, 2017).  Records show that the healthcare industry has paid millions of dollars in fines.  Advocate Health Care in suburban Chicago agreed to the largest figure as of August 2016, a total of $5.55 million (Thompson, 2017).  Memorial Health System in southern Florida became the second entity to pay more than $5 million (Thompson, 2017).  Table 2 shows the five largest fines posted to the Office for Civil Rights (OCR) site.

Table 2.  Five Largest Fines Posted to OCR Web Site (Thompson, 2017)

Hospitals must adhere carefully to data privacy regulations and legislative rules, such as HIPAA, to protect patients' medical records from data breaches.  Proper security policies and risk management must be implemented to ensure the protection of private information, as well as to minimize the impact of confidential data loss or theft (HIPAA, 2018a, 2018c; Salido, 2010).  The healthcare system design proposal requires a mechanism, including an escalation path, for hospitals or providers that are not compliant with these regulations and policies (Salido, 2010).  This design proposal implements four major principles as best practices to comply with the required policies and regulations and to protect the confidential data assets of patients and users (Salido, 2010).  The first principle is to honor policies throughout the private data life cycle (Salido, 2010).  The second principle is to minimize the risk of unauthorized access or misuse of confidential data (Salido, 2010).  The third principle is to minimize the impact of confidential data loss, while the fourth principle is to document appropriate controls and demonstrate their effectiveness (Salido, 2010).  Figure 14 shows the four principles this healthcare design proposal adheres to in order to protect healthcare data from unauthorized users and comply with the required regulations and policies.


Figure 14.  Healthcare Design Proposal Four Principles.

6.      Assumptions and Limitations

This design proposal assumes that the healthcare sector in the four States will support the application of BD and BDA across these States.  That support includes investment in the proper technology, tools, and training based on the requirements of this design proposal.  The proposal also assumes that the stakeholders, including providers, patients, insurers, pharmaceutical vendors, and practitioners, will welcome the application of BDA and take advantage of it to provide efficient healthcare services, increase productivity, decrease costs for the healthcare sector as well as for patients, and provide better care to patients.

            The main limitation of this proposal is the timeframe required to implement it.  With the support of the healthcare sector in these four States, the implementation can be expedited.  However, the siloed and rigid culture of healthcare may interfere with the implementation, which could then take longer than expected.  The initial implementation might also face unexpected challenges, most likely stemming from the lack of IT professionals and managers experienced in the BD and BDA domain.  This design proposal will be enhanced based on observations from the first few months of the implementation.

7.      Justification for the Overall Design

            Traditional database and analytical systems are inadequate when dealing with healthcare data in the age of BDA.  The characteristics of healthcare datasets, including the large volume of medical records, the variety of the data from structured to semi-structured to unstructured, and the velocity of data generation and processing, require technology such as cloud computing (Fernández et al., 2014).  Cloud computing is considered the best solution for BD and BDA because it addresses the challenges of BD storage and the demands of compute-intensive processing (Alexandru et al., 2016; Hashem et al., 2015).  The healthcare system in the four States will shift its communication technology and services for applications across the hospitals and providers (Hashem et al., 2015).  Some of the advantages of cloud computing adoption include virtualized resources, parallel processing, security, and data service integration with scalable data storage (Hashem et al., 2015).  With cloud computing technology, the healthcare sector in the four States will reduce costs and increase efficiency (Hashem et al., 2015).  When quick access to critical patient care data is required, the ability to access the data from anywhere is one of the most significant advantages of the cloud computing adoption recommended by this proposed design (Carutasu, Botezatu, Botezatu, & Pirnau, 2016).  The benefits of cloud computing include technological benefits such as virtualization, multi-tenancy, data and storage, and security and privacy compliance (Chang, 2015).  Cloud computing also offers economic benefits such as pay-per-use, cost reduction, and return on investment (Chang, 2015).  The non-functional benefits of cloud computing cover elasticity, quality of service, reliability, and availability (Chang, 2015).  Thus, the proposed design justifies the use of cloud computing for these benefits, as cloud computing has proven to be well suited to BDA, especially healthcare data analytics.

            Although cloud computing offers several benefits to the proposed healthcare system, it has suffered from security and privacy concerns (Balasubramanian & Mala, 2015; Kazim & Zhu, 2015).  The security concerns involve risk areas such as external data storage, dependency on the public internet, lack of control, multi-tenancy, and integration with internal security (Hashizume, Rosado, Fernández-medina, & Fernandez, 2013).  Traditional security techniques such as identity, authentication, and authorization are not sufficient for cloud computing environments in their current forms under the standard public and private cloud deployment models (Hashizume et al., 2013).  The increasing trend in security threats and data breaches, together with the fact that the current private and public deployment models do not meet these security challenges, has triggered the need for another deployment model to ensure security and privacy protection.  Thus, the Virtual Private Cloud (VPC) has emerged as a new deployment model of cloud computing technology (Botta et al., 2016; Sultan, 2010; Venkatesan, 2012; Zhang, Q., Cheng, & Boutaba, 2010).  The VPC takes advantage of technologies such as virtual private networks (VPNs), which will allow hospitals and providers to set up their required network settings, including security (Botta et al., 2016; Sultan, 2010; Venkatesan, 2012; Zhang, Q. et al., 2010).  The VPC deployment model will have dedicated resources with a VPN to provide the isolation required to protect patients' information (Botta et al., 2016; Sultan, 2010; Venkatesan, 2012; Zhang, Q. et al., 2010).  Thus, this proposed design will use the VPC cloud computing deployment model to store and use healthcare data in a secure and isolated environment and protect patients' medical records (Regola & Chawla, 2013).

The Hadoop ecosystem is a required component in this proposed design for several reasons.  Hadoop is a commonly used computing paradigm for processing massive volumes of data in cloud computing (Bansal et al., 2014; Chrimes et al., 2018; Dhotre et al., 2015).  Hadoop is the only technology that enables large volumes of healthcare data to be stored in their native forms (Dezyre, 2016).  Hadoop has been used to help develop better treatments for diseases such as cancer by accelerating the design and testing of effective treatments tailored to patients, expanding genetically based clinical cancer trials, and establishing a national cancer knowledge network to guide treatment decisions (Dezyre, 2016).  With the Hadoop system, hospitals in the four States will be able to monitor patient vitals (Dezyre, 2016).  Children's Healthcare of Atlanta is an example of using the Hadoop ecosystem to treat over six thousand children in its ICU units (Dezyre, 2016).

The proposed design requires the integration of NoSQL databases because they offer benefits such as support for mass storage, fast read and write operations, and easy, low-cost expansion (Sahafizadeh & Nematbakhsh, 2015).  HBase is proposed as a required NoSQL database because it is fast when reading more than six million variants, which is required when analyzing large healthcare datasets (Luo et al., 2016).  In addition, a query engine such as SeqWare can be integrated with HBase as needed to help bioinformatics researchers access large-scale whole-genome datasets (Luo et al., 2016).  HBase can store clinical sensor data, where the row key serves as the time stamp of a single value and the columns store the patients' physiological values corresponding to that time stamp (Luo et al., 2016).  HBase is a scalable, high-performance, low-cost NoSQL data store that integrates with Hadoop and sits on top of HDFS (Yang et al., 2013).  As a column-oriented NoSQL data store running on top of HDFS in the Hadoop ecosystem, HBase is well suited to processing large healthcare datasets (Yang et al., 2013).  HBase supports applications written in Avro, REST, and Thrift (Yang et al., 2013).  MongoDB is another NoSQL data store, which will be used to store metadata to improve the accessibility and readability of the HBase data schema (Luo et al., 2016).

The integration of Spark is required to overcome Hadoop's limitation with real-time data processing, for which Hadoop is not optimal (Guo, 2013).  Thus, Apache Spark is a required component of this proposal so that the healthcare BDA system can take advantage of processing data at rest using batch techniques as well as data in motion using real-time processing techniques (Liang & Kelemen, 2016).  Spark allows in-memory processing for fast response times, bypassing MapReduce operations (Liang & Kelemen, 2016).  Spark also integrates well with recent Hadoop cluster deployments (Scott, 2015).  While Spark is a powerful tool on its own for processing large volumes of medical and healthcare data, Spark alone is not well suited for production workloads.  Thus, the integration of Spark with the Hadoop ecosystem provides capabilities that neither Spark nor Hadoop can offer on its own.

The integration of AI as part of this proposal is justified by a Harvard Business Review (HBR) examination that identified ten promising AI applications in healthcare (Kalis, Collier, & Fu, 2018).  The findings of HBR's examination showed that the application of AI could create up to $150 billion in annual savings for U.S. healthcare by 2026 (Kalis et al., 2018).  The results also showed that AI currently creates the most value in helping frontline clinicians be more productive and in making back-end processes more efficient (Kalis et al., 2018).  Furthermore, IBM invested $1 billion in AI through the IBM Watson Group, and the healthcare industry is the most significant application area for Watson (Power, 2015).

Conclusion

Big Data and Big Data Analytics have played significant roles in various industries, including the healthcare industry.  The value driven by BDA can save lives and minimize costs for patients.  This project proposes a design to apply BDA in the healthcare system across the four States of Colorado, Utah, Arizona, and New Mexico.  Cloud computing is the most appropriate technology for dealing with the large volume of healthcare data.  Due to the security issues of cloud computing, the Virtual Private Cloud (VPC) deployment model will be used.  The VPC provides a secure cloud environment in which network traffic is controlled using security groups and network access control lists.

The project requires other components to be fully implemented using the latest technology, such as Hadoop and MapReduce for distributed data processing and machine learning as the AI capability that will also support the Internet of Things (IoT).  The NoSQL databases HBase and MongoDB will be used to handle semi-structured data such as XML and unstructured data such as logs and images.  Spark will be used for real-time data processing, which can be vital for urgent care and emergency services.  This project has also addressed the assumptions and limitations, as well as the justification for selecting these specific components.

In summary, all stakeholders in the healthcare sector, including providers, insurers, pharmaceutical vendors, and practitioners, should cooperate and coordinate to facilitate the implementation process.  All stakeholders are responsible for facilitating the integration of BD and BDA into the healthcare system.  The rigid culture and silo pattern need to change for a better healthcare system, one that can save millions of dollars for the healthcare industry while providing excellent care to patients.

References

Abdul, A. M., Jena, S., Prasad, S. D., & Balraju, M. (2014). Trusted Environment In Virtual Cloud. International Journal of Advanced Research in Computer Science, 5(4).

Abernathy, R., & McMillan, T. (2016). CISSP Cert Guide: Pearson IT Certification.

absentdata.com. (2018). Tableau Advantages and Disadvantages. Retrieved from https://www.absentdata.com/advantages-and-disadvantages-of-tableau/.

Alexandru, A., Alexandru, C., Coardos, D., & Tudora, E. (2016). Healthcare, Big Data and Cloud Computing. management, 1, 2.

Alguliyev, R., & Imamverdiyev, Y. (2014). Big data: big promises for information security. Paper presented at the Application of Information and Communication Technologies (AICT), 2014 IEEE 8th International Conference on.

Anand, M., & Clarice, S. (2015). Artificial Intelligence Meets Internet of Things. Retrieved from http://www.ijcset.net/docs/Volumes/volume5issue6/ijcset2015050604.pdf.

Ankam, V. (2016). Big Data Analytics: Packt Publishing Ltd.

Aravind, P. S., & Agrawal, V. (2014). Processing XML data in BigInsights 3.0. Retrieved from https://developer.ibm.com/hadoop/2014/10/31/processing-xml-data-biginsights-3-0/.

Archenaa, J., & Anita, E. M. (2015). A survey of big data analytics in healthcare and government. Procedia Computer Science, 50, 408-413.

Assunção, M. D., Calheiros, R. N., Bianchi, S., Netto, M. A. S., & Buyya, R. (2015). Big Data Computing and Clouds: Trends and Future Directions. Journal of Parallel and Distributed Computing, 79, 3-15. doi:10.1016/j.jpdc.2014.08.003

Balasubramanian, V., & Mala, T. (2015). A Review On Various Data Security Issues In Cloud Computing Environment And Its Solutions. Journal of Engineering and Applied Sciences, 10(2).

Bansal, A., Deshpande, A., Ghare, P., Dhikale, S., & Bodkhe, B. (2014). Healthcare data analysis using dynamic slot allocation in Hadoop. International Journal of Recent Technology and Engineering, 3(5), 15-18.

Basu, A. (2014). Real-Time Healthcare Analytics on Apache Hadoop* using Spark* and Shark. Retrieved from https://www.intel.com/content/dam/www/public/us/en/documents/white-papers/big-data-real-time-healthcare-analytics-whitepaper.pdf.

Botta, A., de Donato, W., Persico, V., & Pescapé, A. (2016). Integration of Cloud Computing and Internet Of Things: a Survey. Future Generation computer systems, 56, 684-700.

Bresnick, J. (2018). Top 12 Ways Artificial Intelligence Will Impact Healthcare. Retrieved from https://healthitanalytics.com/news/top-12-ways-artificial-intelligence-will-impact-healthcare.

Carutasu, G., Botezatu, M., Botezatu, C., & Pirnau, M. (2016). Cloud Computing and Windows Azure. Electronics, Computers and Artificial Intelligence.

Chang, V. (2015). A Proposed Framework for Cloud Computing Adoption. International Journal of Organizational and Collective Intelligence, 6(3).

Chrimes, D., Zamani, H., Moa, B., & Kuo, A. (2018). Simulations of Hadoop/MapReduce-Based Platform to Support its Usability of Big Data Analytics in Healthcare.

Cloud Security Alliance. (2013). The Notorious Nine: Cloud Computing Top Threats in 2013. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2016). The Treacherous 12: Cloud Computing Top Threats in 2016. Cloud Security Alliance: Top Threats Working Group. 

Cloud Security Alliance. (2017). The Treacherous 12 Top Threats to Cloud Computing. Cloud Security Alliance: Top Threats Working Group. 

Dezyre. (2016). 5 Healthcare Applications of Hadoop and Big Data Retrieved from https://www.dezyre.com/article/5-healthcare-applications-of-hadoop-and-big-data/85.

Dhotre, P., Shimpi, S., Suryawanshi, P., & Sanghati, M. (2015). Health Care Analysis Using Hadoop. International Journal of Scientific & Technology Research, 4(12), 279-281.

EMC. (2015). Data Science and Big Data Analytics: Discovering, Analyzing, Visualizing and Presenting Data. (1st ed.): Wiley.

Fawcett, J., Ayers, D., & Quin, L. R. (2012). Beginning XML: John Wiley & Sons.

Fernández, A., del Río, S., López, V., Bawakid, A., del Jesus, M. J., Benítez, J. M., & Herrera, F. (2014). Big Data with Cloud Computing: An Insight on the Computing Environment, MapReduce, and Programming Frameworks. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 4(5), 380-409. doi:10.1002/widm.1134

Fox, M., & Vaidyanathan, G. (2016). Impacts of Healthcare Big Data: A Framework With Legal and Ethical Insights. Issues in Information Systems, 17(3).

Ghani, K. R., Zheng, K., Wei, J. T., & Friedman, C. P. (2014). Harnessing big data for health care and research: are urologists ready? European urology, 66(6), 975-977.

Grover, M., Malaska, T., Seidman, J., & Shapira, G. (2015). Hadoop Application Architectures: Designing Real-World Big Data Applications: O’Reilly Media, Inc.

Groves, P., Kayyali, B., Knott, D., & Kuiken, S. V. (2016). The ‘Big Data’ Revolution in Healthcare: Accelerating Value and Innovation.

Guo, S. (2013). Hadoop operations and cluster management cookbook: Packt Publishing Ltd.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud Computing and Big Data Analytics: What is New From Databases Perspective? Paper presented at the International Conference on Big Data Analytics, Springer-Verlag Berlin Heidelberg.

Hashem, I. A. T., Yaqoob, I., Anuar, N. B., Mokhtar, S., Gani, A., & Khan, S. U. (2015). The Rise of “Big Data” on Cloud Computing: Review and Open Research Issues. Information Systems, 47, 98-115. doi:10.1016/j.is.2014.07.006

Hashizume, K., Rosado, D. G., Fernández-medina, E., & Fernandez, E. B. (2013). An analysis of security issues for cloud computing. Journal of internet services and applications, 4(1), 1-13. doi:10.1186/1869-0238-4-5

HIMSS. (2018). 2017 Security Metrics:  Guide to HIPAA Compliance: What Healthcare Entities and Business Associates Need to Know. . Retrieved on 12/1/2018 from  http://www.himss.org/file/1318331/download?token=h9cBvnl2. 

HIPAA. (2018a). At Least 3.14 Million Healthcare Records Were Exposed in Q2, 2018. Retrieved 11/22/2018 from https://www.hipaajournal.com/q2-2018-healthcare-data-breach-report/. 

HIPAA. (2018b). How to Defend Against Insider Threats in Healthcare. Retrieved 8/22/2018 from https://www.hipaajournal.com/category/healthcare-cybersecurity/. 

HIPAA. (2018c). Q3 Healthcare Data Breach Report: 4.39 Million Records Exposed in 117 Breaches. Retrieved 11/22/2018 from https://www.hipaajournal.com/q3-healthcare-data-breach-report-4-39-million-records-exposed-in-117-breaches/. 

HIPAA. (2018d). Report: Healthcare Data Breaches in Q1, 2018. Retrieved 5/15/2018 from https://www.hipaajournal.com/report-healthcare-data-breaches-in-q1-2018/. 

HL7. (2011). Patient Example Instance in XML.  

Hu, H., Wen, Y., Chua, T., & Li, X. (2014). Toward Scalable Systems for Big Data Analytics: A Technology Tutorial. Practical Innovation, Open Solution, 2, 652-687. doi:10.1109/ACCESS.2014.2332453

InformationBuilders. (2018). Data In Motion – Big Data Analytics in Healthcare. Retrieved from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/datainmotionbigdataanalytics.pdf, White Paper.

Jayasingh, B. B., Patra, M. R., & Mahesh, D. B. (2016, 14-17 Dec. 2016). Security issues and challenges of big data analytics and visualization. Paper presented at the 2016 2nd International Conference on Contemporary Computing and Informatics (IC3I).

Ji, Z., Ganchev, I., O’Droma, M., Zhang, X., & Zhang, X. (2014). A cloud-based X73 ubiquitous mobile healthcare system: design and implementation. The Scientific World Journal, 2014.

Kalis, B., Collier, M., & Fu, R. (2018). 10 Promising AI Applications in Health Care. Retrieved from https://hbr.org/2018/05/10-promising-ai-applications-in-health-care, Harvard Business Review.

Karanth, S. (2014). Mastering Hadoop: Packt Publishing Ltd.

Kazim, M., & Zhu, S. Y. (2015). A Survey on Top Security Threats in Cloud Computing. International Journal Advanced Computer Science and Application, 6(3), 109-113.

Kersting, K., & Meyer, U. (2018). From Big Data to Big Artificial Intelligence? : Springer.

Klein, J., Gorton, I., Ernst, N., Donohoe, P., Pham, K., & Matser, C. (2015, June 27 2015-July 2 2015). Application-Specific Evaluation of No SQL Databases. Paper presented at the 2015 IEEE International Congress on Big Data.

Kritikos, K., Kirkham, T., Kryza, B., & Massonet, P. (2017). Towards a Security-Enhanced PaaS Platform for Multi-Cloud Applications. Future Generation computer systems, 67, 206-226. doi:10.1016/j.future.2016.10.008

Kumari, W. M. P. (2017). Artificial Intelligence Meets Internet of Things.

Liang, Y., & Kelemen, A. (2016). Big Data Science and its Applications in Health and Medical Research: Challenges and Opportunities. Austin Journal of Biometrics & Biostatistics, 7(3).

Liveri, D., Sarri, A., & Skouloudi, C. (2015). Security and Resilience in eHealth: Security Challenges and Risks. European Union Agency For Network And Information Security.

Lublinsky, B., Smith, K. T., & Yakubovich, A. (2013). Professional hadoop solutions: John Wiley & Sons.

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: a literature review. Biomedical informatics insights, 8, BII. S31559.

Malik, L., & Sangwan, S. (2015). MapReduce Framework Implementation on the Prescriptive Analytics of Health Industry. International Journal of Computer Science and Mobile Computing, ISSN, 675-688.

Maltby, D. (2011). Big Data Analytics. Paper presented at the Annual Meeting of the Association for Information Science and Technology.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big Data: The Next Frontier for Innovation, Competition, and Productivity. McKinsey Global Institute.

McKelvey, N., Curran, K., Gordon, B., Devlin, E., & Johnston, K. (2015). Cloud Computing and Security in the Future Guide to Security Assurance for Cloud Computing (pp. 95-108): Springer.

Mehmood, A., Natgunanathan, I., Xiang, Y., Hua, G., & Guo, S. (2016). Protection of Big Data Privacy. Institute of Electrical and Electronic Engineers, 4, 1821-1834. doi:10.1109/ACCESS.2016.2558446

Meyer, M. (2018). The Rise of Healthcare Data Visualization.

Mills, T. (2018). Eight Ways Big Data And AI Are Changing The Business World.

MongoDB. (2018). ETL Best Practice.  

O’Brien, B. (2016). Why The IoT Needs Artificial Intelligence to Succeed.

Palanisamy, V., & Thirunavukarasu, R. (2017). Implications of Big Data Analytics in developing Healthcare Frameworks–A review. Journal of King Saud University-Computer and Information Sciences.

Patrizio, A. (2018). Big Data vs. Artificial Intelligence.

Power, B. (2015). Artificial Intelligence Is Almost Ready for Business.

Raghupathi, W., & Raghupathi, V. (2014). Big data analytics in healthcare: promise and potential. Health Information Science and Systems, 2(1), 1.

Regola, N., & Chawla, N. (2013). Storing and Using Health Data in a Virtual Private Cloud. Journal of medical Internet research, 15(3), 1-12. doi:10.2196/jmir.2076

Sahafizadeh, E., & Nematbakhsh, M. A. (2015). A Survey on Security Issues in Big Data and NoSQL. Int’l J. Advances in Computer Science, 4(4), 2322-5157.

Salido, J. (2010). Data Governance for Privacy, Confidentiality and Compliance: A Holistic Approach. ISACA Journal, 6, 17.

Scott, J. A. (2015). Getting Started with Spark: MapR Technologies, Inc.

Stewart, J., Chapple, M., & Gibson, D. (2015). ISC Official Study Guide.  CISSP Security Professional Official Study Guide (7th ed.): Wiley.

Sultan, N. (2010). Cloud Computing for Education: A New Dawn? International Journal of Information Management, 30(2), 109-116. doi:10.1016/j.ijinfomgt.2009.09.004

Sun, J., & Reddy, C. (2013). Big Data Analytics for Healthcare. Retrieved from https://www.siam.org/meetings/sdm13/sun.pdf.

Tableau. (2011). Three Ways Healthcare Providers are Transforming Data from Information to Insight. White Paper.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program, Using NIST 800-30 and CSF to Secure Protected Health Information.

Van-Dai, T., Chuan-Ming, L., & Nkabinde, G. W. (2016, 5-7 July 2016). Big data stream computing in healthcare real-time analytics. Paper presented at the 2016 IEEE International Conference on Cloud Computing and Big Data Analysis (ICCCBDA).

Venkatesan, T. (2012). A Literature Survey on Cloud Computing. i-Manager’s Journal on Information Technology, 1(1), 44-49.

Wang, Y., Kung, L. A., & Byrd, T. A. (2018). Big Data Analytics: Understanding its Capabilities and Potential Benefits for Healthcare Organizations. Technological Forecasting and Social Change, 126, 3-13. doi:10.1016/j.techfore.2015.12.019

Wicklund, E. (2014). ‘Silo’ one of healthcare’s biggest flaws. Retrieved from http://www.healthcareitnews.com/news/silo-one-healthcares-biggest-flaws.

Yang, C. T., Liu, J. C., Hsu, W. H., Lu, H. W., & Chu, W. C. C. (2013, 16-18 Dec. 2013). Implementation of Data Transform Method into NoSQL Database for Healthcare Data. Paper presented at the 2013 International Conference on Parallel and Distributed Computing, Applications and Technologies.

Zhang, Q., Cheng, L., & Boutaba, R. (2010). Cloud Computing: State-of-the-Art and Research Challenges. Journal of internet services and applications, 1(1), 7-18. doi:10.1007/s13174-010-0007-6

Zhang, R., & Liu, L. (2010). Security models and requirements for healthcare application clouds. Paper presented at the Cloud Computing (CLOUD), 2010 IEEE 3rd International Conference on.

Zia, U. A., & Khan, N. (2017). An Analysis of Big Data Approaches in Healthcare Sector. International Journal of Technical Research & Science, 2(4), 254-264.

 

The Relationship Between Internet of Things (IoT) and Artificial Intelligence (AI)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address the relationship between the Internet of Things (IoT) and Artificial Intelligence (AI), and whether one can be used efficiently without help from the other.  The discussion begins with an overview of the Internet of Things (IoT) and artificial intelligence (AI), followed by the relationship between them.

Internet of Things (IoT) and Artificial Intelligence Overview

Internet of Things (IoT) refers to the growing number of connected devices with IP addresses, which were not common years ago (Anand & Clarice, 2015; Thompson, 2017).  These connected devices collect information and use their IP addresses to transmit it (Thompson, 2017).  Organizations take advantage of the collected information for innovation, enhancing customer service, and optimizing processes (Thompson, 2017).  Providers in healthcare take advantage of the collected information to find new treatment methods and increase efficiency (Thompson, 2017).

IoT implementation involves various technologies, such as radio frequency identification (RFID), near field communication (NFC), machine to machine (M2M), wireless sensor networks (WSN), and addressing schemes (AS) such as IPv6 addresses (Anand & Clarice, 2015; Kumari, 2017).  RFID uses electromagnetic fields to identify and track tags attached to objects.  NFC is a set of communication protocols and technologies through which smartphones and other objects communicate within the IoT.  M2M is often used for remote monitoring.  A WSN is a large set of sensors used to monitor environmental conditions.  Addressing schemes are the primary tool used in IoT, giving an IP address to each object that needs to communicate (Anand & Clarice, 2015; Kumari, 2017).

Machine learning (ML) is a subset of AI and involves both supervised and unsupervised learning (Thompson, 2017).  In the AI domain, advances in computer science have resulted in intelligent machines that resemble humans in their functions (NMC, 2018).  Access to the categories, properties, and relationships between various datasets helps develop knowledge engineering, allowing computers to simulate human perception, learning, and decision making (NMC, 2018).  ML enables computers to learn without being explicitly programmed (NMC, 2018).  Unsupervised ML and AI enable security tools such as behavior-based analytics and anomaly detection (Thompson, 2017).  Neural networks in AI model the biological function of the human brain to interpret and react to specific inputs such as words and tone of voice (NMC, 2018).  Neural networks have been used for voice recognition and natural language processing (NLP), enabling humans to interact with machines.

The Relationship Between IoT and AI

Various reports and studies have discussed the relationship between IoT and AI.  O'Brien (2016) reported that IoT needs AI to succeed.  Jaffe (2014) suggested the same thing: IoT will not work without AI.  The future of IoT depends on ML to find patterns, correlations, and anomalies that have the potential of enabling improvement in almost every facet of daily life (Jaffe, 2014).

Thus, the success of IoT depends on AI.  IoT follows five necessary steps: sense, transmit, store, analyze, and act (O'Brien, 2016).  AI plays a significant role in the analysis step, which is where ML, the subset of AI, gets involved.  When ML is applied in the analysis step, it can change the subsequent "act" step, which dictates whether the action has high value or no value to the consumer (O'Brien, 2016).

Schatsky, Kumar, and Bumb (2018) suggested that AI can unlock the potential of IoT.  As cited in Schatsky et al. (2018), Gartner predicts that by 2022 more than 80% of enterprise IoT projects will include AI components, up from only 10% in 2018.  International Data Corp (IDC) predicts that by 2019, AI will support "all effective" IoT efforts and that, without AI, data from these deployments will have limited value (Schatsky et al., 2018).

Various companies are crafting IoT strategies that include AI (Schatsky et al., 2018).  Venture capital funding of AI-focused IoT start-ups is growing, while vendors of IoT platforms such as Amazon, GE, IBM, Microsoft, Oracle, and Salesforce are integrating AI capabilities (Schatsky et al., 2018).  The value of AI lies in its ability to extract insight from data quickly.  ML, a subset of AI, enables the automatic identification of patterns and detection of anomalies in the data that smart sensors and devices generate (Schatsky et al., 2018).  IoT is expected to combine with the power of AI, blockchain, and other emerging technologies to create the "smart hospitals" of the future (Bresnick, 2018).  Examples of AI-powered IoT devices include automated vacuum cleaners such as the iRobot Roomba, smart thermostats such as those from Nest Labs, and self-driving cars such as those from Tesla Motors (Faggella, 2018; Kumari, 2017).

Conclusion

This discussion has addressed artificial intelligence (AI), the Internet of Things (IoT), and the relationship between them.  Machine learning, a subset of AI, is required for IoT at the analysis phase.  Without this analysis phase, IoT will not provide the value-added insight organizations anticipate.  Various studies and reports have indicated that the success and the future of IoT depend on AI.

References

Anand, M., & Clarice, S. (2015). Artificial Intelligence Meets Internet of Things. Retrieved from http://www.ijcset.net/docs/Volumes/volume5issue6/ijcset2015050604.pdf.

Bresnick, J. (2018). Internet of Things, AI to Play Key Role in Future Smart Hospitals.

Faggella, D. (2018). Artificial Intelligence Plus the Internet of Things (IoT) – 3 Examples Worth Learning From.

Jaffe, M. (2014). IoT Won’t Work Without Artificial Intelligence.

Kumari, W. M. P. (2017). Artificial Intelligence Meets Internet of Things.

NMC, H. P. (2018). NMC Horizon Report: 2017 Higher Education Edition. Retrieve from https://www.nmc.org/publication/nmc-horizon-report-2017-higher-education-edition/.

O’Brien, B. (2016). Why The IoT Needs Artificial Intelligence to Succeed.

Schatsky, D., Kumar, N., & Bumb, S. (2018). Bringing the power of AI to the Internet of Things.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program, Using NIST 800-30 and CSF to Secure Protected Health Information.

The Impact of Artificial Intelligence (AI) on Big Data Analytics (BDA)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address the influence of artificial intelligence on big data analytics.  As discussed previously, Big Data empowers artificial intelligence; this discussion addresses the impact of artificial intelligence on the Big Data Analytics domain.  The discussion begins with the artificial intelligence building blocks and big data building blocks, followed by the impact of artificial intelligence on BDA.

Artificial Intelligence Building Blocks and Their Impact on BDA

Understanding the building blocks of AI can help in understanding the impact of AI on BDA.  Various reports and studies have identified different building blocks for AI.  Chibuk (2018) identified four building blocks that are expected to shape the next stage of AI.  The computation methodology is the first building block of AI; this component is structured to move computers from binary processing to infinite connections.  Information storage is the second building block, improving how data is stored and accessed in more efficient forms.  The brain-computer interface is the third building block, through which the human mind could communicate silently with a computer and thoughts could be turned into actions.  Mathematics and algorithms form the last building block, including advanced approaches such as capsule networks and networks that teach each other based on defined rules (Chibuk, 2018).

Rao (2017) has identified five fundamental building blocks for AI in the banking sector, although they are easily applicable to other sectors.  Machine learning (ML) is the first component of AI in banking, where the software can learn on its own without being programmed and adjust its algorithms to respond to new insights.  Data mining algorithms hand their findings over to a human for further work, while machine learning can act on its own (Rao, 2017).  The financial and banking industry can benefit from machine learning for fraud detection, securities settlement, and the like (Rao, 2017).  Deep learning (DL) is another building block of AI in the banking industry (Rao, 2017).  DL leverages a hierarchy of artificial neural networks, similar to the human brain, to do its job.  DL mimics the human brain to perform non-linear deductions, unlike linear traditional programs (Rao, 2017).  DL can produce better decisions by factoring in learning from previous transactions or interactions (Rao, 2017).  An example of DL is collecting information about customers and their behaviors from social networks, from which their likes and preferences can be inferred; financial institutions can use this insight to make contextual, relevant offers to those customers in real time (Rao, 2017).  Natural language processing (NLP) is the third building block for AI in banking (Rao, 2017).  NLP is a key building block in AI, helping computers learn, analyze, and understand human language (Rao, 2017).  NLP can be used to organize and structure knowledge in order to answer queries, translate content from one language to another, recognize people by their speech, mine text, and perform sentiment analysis (Rao, 2017).  Natural language generation (NLG) is another essential building block in AI, helping computers converse and interact intelligently with humans (Rao, 2017).  NLG can transform raw data into a narrative, which banks such as Credit Suisse are using to generate portfolio reviews (Rao, 2017).  Visual recognition is the last component of AI, which helps recognize images and their content (Rao, 2017).  It uses DL to find faces, tag images, identify the components of visuals, and pick out similar images from a large dataset (Rao, 2017).  Various banks, such as Australia's Westpac, are using this technology to allow customers to activate a new card from their smartphone camera, and Bank of America, Citibank, Wells Fargo, and TD Bank are using visual recognition to allow customers to deposit checks remotely via mobile apps (Rao, 2017).

Gerbert, Hecker, Steinhäuser, and Ruwolt (2017) have identified ten building blocks for AI.  They suggest that the simplest AI use cases often consist of a single building block but often evolve to combine two or more blocks over time (Gerbert et al., 2017).  Machine vision is one building block of AI: the classification and tracking of real-world objects based on visual, x-ray, laser, or other signals.  The quality of machine vision depends on the labeling of a large number of reference images, which is performed by humans (Gerbert et al., 2017).  Video-based computer vision is anticipated to recognize actions and predict motions within the next five years (Gerbert et al., 2017).  Speech recognition is another building block, involving the transformation of auditory signals into text (Gerbert et al., 2017).  Siri and Alexa can identify most words in a general vocabulary, but as the vocabulary becomes more specialized, tailored programs such as Nuance's PowerScribe for radiologists will be needed (Gerbert et al., 2017).  The information processing building block of AI involves searching billions of documents or constructing basic knowledge graphs that identify relationships in text.  This building block is closely related to NLP, which is identified as another building block of AI (Gerbert et al., 2017).  NLP can provide basic summaries of text and infer intent in some instances (Gerbert et al., 2017).  Learning from data is another component of AI; this is machine learning, which is able to predict values or classify information based on historical data (Gerbert et al., 2017).  While ML is an element within the machine vision and NLP building blocks, it is also a separate building block of AI (Gerbert et al., 2017).  Other building blocks of AI include planning and exploring agents, which can help identify the best sequence of actions to achieve certain goals; self-driving cars rely on this building block for navigation (Gerbert et al., 2017).  Image generation is another building block of AI and is the opposite of the machine vision block, as it creates images based on models.  Speech generation is another building block, covering both data-based text generation and text-based speech synthesis.  The handling and control building block of AI refers to interactions with real-world objects (Gerbert et al., 2017).  The navigating and movement building block covers the ways robots move through a given physical environment.  Self-driving cars and drones do well with their wheels and rotors; however, walking on legs, especially a single pair of legs, remains challenging (Gerbert et al., 2017).

Artificial intelligence (AI) and machine learning (ML) have seen increasing adoption across industries and the public sector (Brook, 2018). This trend plays a significant role in the digital world (Brook, 2018) and is driven by a customer-centric view of data in which data is used as part of the product or service (Brook, 2018). The customer-centric model assumes data enrichment with data from multiple sources, and the data is divided into real-time data and historical data (Brook, 2018). Businesses build a trust relationship with customers, and data is becoming the central model for many consumer services such as Amazon and Facebook (Brook, 2018). The value of the data increases over time (Brook, 2018). The impact of machine learning and artificial intelligence has driven the need for a "corporate memory" to be rapidly adopted in organizations. Brook (2018) has suggested that organizations implement loosely coupled data silos and a data lake, which can contribute to the corporate memory and to very fast data usage in the age of AI-driven analytics. Examples of the impact of AI and ML on BDA and of the value of data over time include Coca-Cola's global market and extensive product list, IBM's machine learning system Watson, and GE Power using BD, ML, and the internet of things (IoT) to build an internet of energy (Marr, 2018). Figure 1 shows the impact of AI and ML on Big Data Analytics and the value of the data over time; a minimal sketch of the corporate-memory idea follows the figure.


Figure 1. Impact of AI and ML on BDA and the Value of Data over Time (Brook, 2018).
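As a minimal sketch of the corporate-memory idea described by Brook (2018), the following Python example appends newly arriving real-time records to a historical store so that later analytics can draw on both; the field names and values are hypothetical.

# Minimal sketch: combining historical and real-time data into a single "corporate memory".
import pandas as pd

historical = pd.DataFrame({          # data already in the data lake (hypothetical)
    "customer_id": [1, 2, 3],
    "purchases": [12, 5, 9],
})
realtime = pd.DataFrame({            # newly arriving real-time events (hypothetical)
    "customer_id": [2, 3, 4],
    "purchases": [1, 2, 1],
})

# The combined view is the corporate memory that AI/ML models can learn from.
corporate_memory = (
    pd.concat([historical, realtime])
      .groupby("customer_id", as_index=False)["purchases"].sum()
)
print(corporate_memory)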

AI is anticipated to be the most dominant factor with a disruptive impact on organizations and businesses (Hansen, 2017). Mills (2018) has suggested that organizations need to embrace BD and AI to help their businesses. An EMC survey has shown that 69% of information technology decision-makers in New Zealand believe that BDA is critical to their business strategy, and 41% have already incorporated BD into everyday business decisions (Henderson, 2015).

The application of AI to BDA can assist businesses and organizations in detecting correlations between factors that humans cannot perceive (Henderson, 2015). It can allow organizations to cope with the speed at which information changes in today's business world (Henderson, 2015). AI can help organizations add a level of intelligence to their BDA so that complex issues are understood better and more quickly than humans could manage without it (Henderson, 2015). AI can also serve to fill the gap left by the shortage of data analysts (Henderson, 2015). In addition, AI can reveal insights that lead to novel solutions to existing problems or even uncover issues that were not previously known (Henderson, 2015). A good example of the impact of AI on BDA is the AI-powered BDA in Canada used to identify patterns in the vital signs of premature babies, supporting the early detection of life-threatening infections. Figure 2 shows AI and BD working together for better analytics and better insight, and the sketch after the figure illustrates this kind of pattern detection.


Figure 2. Artificial Intelligence and Big Data (Hansen, 2017).
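The following minimal Python sketch illustrates the kind of pattern detection Henderson (2015) describes, using unsupervised anomaly detection (scikit-learn's IsolationForest) over synthetic vital-sign readings; the data and parameters are illustrative only and do not represent a clinical model.

# Minimal sketch: flagging unusual vital-sign readings with unsupervised anomaly detection.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(7)

# Hypothetical vital signs: heart rate (bpm) and oxygen saturation (%).
normal = np.column_stack([rng.normal(150, 10, 500), rng.normal(97, 1.5, 500)])
unusual = np.array([[190.0, 88.0], [110.0, 85.0]])    # readings that warrant review
readings = np.vstack([normal, unusual])

detector = IsolationForest(contamination=0.01, random_state=0).fit(readings)
flags = detector.predict(readings)                    # -1 marks anomalous readings
print("flagged readings:", readings[flags == -1])

Readings flagged in this way would be surfaced to clinicians for review rather than acted on automatically.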

Conclusion

This assignment has discussed the impact of artificial intelligence (AI) on Big Data Analytics (BDA). It began by identifying the building blocks of AI and the impact of each building block on BDA. BDA has an essential impact on AI because it empowers it, and AI plays a crucial role in BDA, as demonstrated in various fields, especially the healthcare and financial industries. The researcher would like to summarize this relationship between AI and BDA in a single statement: "AI without BDA is lame, and BDA without AI is blind."

References

Brook, P. (2018). Trends in Big Data and Artificial Intelligence Data.

Chibuk, J. D. (2018). Four Building Blocks for a General AI.

Gerbert, P., Hecker, M., Steinhäuser, S., & Ruwolt, P. (2017). The Building Blocks of Artificial Intelligence.

Hansen, S. (2017). How Big Data Is Empowering AI and Machine Learning?

Henderson, J. (2015). Insight: What role does Artificial Intelligence Play in Big Data?  What are the links between artificial intelligence and Big Data?

Marr, B. (2018). 27 Incredible Examples Of AI And Machine Learning In Practice.

Mills, T. (2018). Eight Ways Big Data And AI Are Changing The Business World.

Rao, S. (2017). The Five Fundamental Building Blocks for Artificial Intelligence in Banking.