Building Blocks of a System for Healthcare Big Data Analytics

Dr. Aly, O.
Computer Science

Introduction

The purpose of this discussion is to define the building blocks of a system for healthcare Big Data Analytics and to compare that design with a networked DNA sequencing cluster currently used by an organization in the market.

The discussion begins with the Cloud Computing building blocks, followed by the Big Data Analytics building blocks and DNA Sequencing. It then addresses the building blocks of the healthcare analytics system and of the DNA Sequencing system, and compares the two systems.

Cloud Computing Building Blocks

The Cloud Computing model contains two elements: the front end and the back end. Both elements are connected to the network. The user interacts with the system through the front end, while the cloud itself is the back end. The front end is the client that the user runs on a device such as a smartphone, tablet, or laptop to access the cloud. The back end, represented by the Cloud, provides the applications, computers, servers, and data storage that together create the services (IBM, 2012).

As indicated in (Macias & Thomas, 2011), three building blocks are required to enable Cloud Computing. The first block is the “Infrastructure,” where the organization can optimize data center consolidation, enhance network performance, connect anyone, anywhere seamlessly, and implement pre-configured solutions.  The second block is the “Applications,” where the organization can identify applications for rapid deployment, and utilize automation and orchestration features.  The third block is the “Services,” where the organization can determine the right implementation model, and create a phased cloud migration plan.

In (Mousannif, Khalil, & Kotsis, 2013-14), the building blocks for Cloud Computing involve the physical layer, the virtualization layer, and the service layer. Virtualization is a basic building block in Cloud Computing: it is the technology which hides the physical characteristics of the computing platform from the front-end users and provides an abstract, emulated computing platform in their place. Clusters and grids are features of Cloud Computing that support high-performance computing applications such as simulations. Other building blocks of Cloud Computing include Service-Oriented Architectures (SOA) and Web Services (Mousannif et al., 2013-14).

Big Data Building Block

As indicated in (Verhaeghe, n.d.), there are four major building blocks for Big Data Analytics. The first building block is Big Data Management, which enables the organization to capture, store, and protect the data. The second building block is Big Data Analytics, which extracts value from the data. The third building block is Big Data Integration, which ensures that governance is applied to the data. The last building block is Big Data Applications, through which the organization applies the first three building blocks using Big Data technologies.

DNA Sequencing

DNA stands for Deoxyribonucleic Acid, which represents the smallest building block of life (Matthews, 2016). As indicated in (Salzberg, 1999), advances in biotechnology have produced enormous volumes of DNA-related information. However, the rate of data generation is outpacing the ability of scientists to analyze the data. DNA Sequencing is a technique used to determine the order of the four chemical building blocks, called “bases,” which make up the DNA molecule (genome.gov, 2015). The sequence reveals the kind of genetic information carried in a particular DNA segment. DNA sequencing can provide valuable information about the role of inheritance in susceptibility to disease and in response to environmental influences. Moreover, DNA sequencing can support rapid and cost-effective diagnosis and treatment. Markov chains and hidden Markov models are probabilistic techniques which can be used to analyze the results of DNA sequencing (Han, Pei, & Kamber, 2011). An example of a DNA Sequencing application is discussed and analyzed in (Leung et al., 2011), where the researchers applied Data Mining to biological data sets of DNA sequences for the Hepatitis B Virus.
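To make the Markov chain idea concrete, the following is a minimal sketch of how first-order transition probabilities between bases could be estimated from a sequence. It is an illustration under stated assumptions, not the method used in any of the cited works: the function name `transition_probabilities` and the toy sequence are hypothetical, and real analyses would use far longer sequences and typically hidden Markov models.

```python
from collections import Counter, defaultdict

BASES = "ACGT"

def transition_probabilities(sequence):
    """Estimate first-order Markov probabilities P(next base | current base)."""
    counts = defaultdict(Counter)
    # Count every adjacent pair (current base, next base) in the sequence.
    for current, nxt in zip(sequence, sequence[1:]):
        counts[current][nxt] += 1
    probs = {}
    for current in BASES:
        total = sum(counts[current].values())
        # A base that never appears as a predecessor gets an empty row.
        probs[current] = (
            {nxt: counts[current][nxt] / total for nxt in BASES} if total else {}
        )
    return probs

# Hypothetical toy sequence, not real genomic data.
toy_sequence = "ACGTACGTAACC"
model = transition_probabilities(toy_sequence)
for base in BASES:
    print(base, model[base])
```

Each row of the resulting table sums to one and can be read as "given the current base, how likely is each following base." Comparing such tables across sequences is one simple way to look for regions with different statistical behavior.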

DNA Sequencing was originally performed on non-networked computers, using a limited subset of data because of limited computer processing speed (Matthews, 2016). Since then, DNA Sequencing has benefited from various advanced technologies and techniques. Predictive Analytics is one such technique; applied to DNA Sequencing, it gives rise to Predictive Genomics. Cloud Computing plays a significant role in the success of Predictive Genomics for two major reasons. The first reason is the volume of the genomic data, while the second reason is the low cost (Matthews, 2016). Cloud Computing is becoming a valuable tool for various domains, including DNA Sequencing. As cited in (Blaisdell, 2017), a Transparency Market Research study projected that the healthcare Cloud Computing market would continue to grow, reaching up to $6.8 billion by 2018.

Building Block for Healthcare System

Healthcare data requires protection because of security and privacy concerns. Thus, a Private Cloud will be used in this use case. To build a Private Cloud, the virtualization layer, the physical layer, and the service layer are required. The virtualization layer consists of a hypervisor, which allows multiple operating systems to share a single hardware system. The hypervisor is a program which controls the host processors and resources by allocating the resources to each operating system. There are two types of hypervisors: native, also called bare-metal or Type 1, and hosted, also called Type 2. Type 1 runs directly on the physical hardware, while Type 2 runs on a host operating system which in turn runs on the physical hardware. Examples of native hypervisors include VMware’s ESXi and Microsoft’s Hyper-V. Examples of hosted hypervisors include Oracle VirtualBox and VMware’s Workstation. The physical layer can consist of two computer pools, one for PCs and the other for servers (Mousannif et al., 2013-14).
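The resource-allocation role of the hypervisor described above can be sketched with a toy model. This is a deliberately simplified illustration and not how any real hypervisor is implemented: the `Host` class, its fields, and the guest names are all hypothetical, and it assumes no over-commitment of CPU or memory, which real products may allow.

```python
from dataclasses import dataclass, field

@dataclass
class Host:
    """Toy model of a physical host whose resources are shared by guests."""
    cpus: int
    memory_gb: int
    guests: dict = field(default_factory=dict)  # name -> (cpus, memory_gb)

    def free(self):
        """Return the (cpus, memory_gb) not yet assigned to any guest."""
        used_cpu = sum(c for c, _ in self.guests.values())
        used_mem = sum(m for _, m in self.guests.values())
        return self.cpus - used_cpu, self.memory_gb - used_mem

    def allocate(self, name, cpus, memory_gb):
        """Assign resources to a new guest; refuse if they are not available."""
        free_cpu, free_mem = self.free()
        if name in self.guests or cpus > free_cpu or memory_gb > free_mem:
            return False
        self.guests[name] = (cpus, memory_gb)
        return True

# Hypothetical host and guest operating systems.
host = Host(cpus=8, memory_gb=32)
first = host.allocate("linux-guest", cpus=4, memory_gb=16)
second = host.allocate("windows-guest", cpus=4, memory_gb=8)
third = host.allocate("extra-guest", cpus=1, memory_gb=1)  # no CPUs left
print(first, second, third, host.free())
```

The point of the sketch is only that the guests never see the hardware directly; each one receives a share that the controlling layer has granted, and requests beyond the remaining capacity are refused.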

In (Archenaa & Anita, 2015), the researchers illustrated a secure Healthcare Analytics System. The electronic health record is a heterogeneous dataset which is given as input to HDFS through Flume and Sqoop. The data is analyzed using MapReduce and Hive, with Machine Learning algorithms applied to identify similar patterns in the data and to predict risks to a patient's health at an early stage. The HBase database is used to store the multi-structured data. STORM is used for live stream processing and for flagging emergency conditions, such as a patient's temperature falling outside the expected range. A Lambda function is also used in this healthcare system. The final building block of the Healthcare system is the reporting generated by top-layer tools such as “Hunk.” Figure 1 illustrates the Healthcare System, adapted from (Archenaa & Anita, 2015).

Figure 1.  Healthcare Analytics System. Adapted from (Archenaa & Anita, 2015)
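The emergency-condition check that the streaming component performs can be sketched in a few lines. The example below uses a plain Python generator rather than the STORM API, so it only illustrates the logic of filtering a live stream. The `Reading` type, the `out_of_range` function, the patient identifiers, and the temperature bounds are all assumptions made for illustration and are not clinical guidance.

```python
from typing import Iterable, Iterator, NamedTuple

class Reading(NamedTuple):
    patient_id: str
    temperature_c: float

# Assumed bounds for illustration only; not taken from the cited study.
LOW_C = 35.0
HIGH_C = 38.5

def out_of_range(readings: Iterable[Reading],
                 low: float = LOW_C,
                 high: float = HIGH_C) -> Iterator[Reading]:
    """Yield each reading whose temperature lies outside [low, high]."""
    for reading in readings:
        if not (low <= reading.temperature_c <= high):
            yield reading

# Hypothetical stream of readings arriving one at a time.
stream = [
    Reading("p1", 36.8),
    Reading("p2", 39.4),
    Reading("p3", 34.2),
    Reading("p1", 37.1),
]
flagged = list(out_of_range(stream))
for reading in flagged:
    print(f"ALERT {reading.patient_id}: {reading.temperature_c:.1f} C")
```

Because the function consumes readings one at a time and yields alerts as soon as they occur, the same logic would apply to an unbounded stream; in a real deployment this check would sit inside the stream-processing framework rather than a list.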

Building Block for DNA and Next Generation Sequencing System

Besides DNA Sequencing, there is next-generation sequencing (NGS), which has been growing exponentially since 2007 (Bhuvaneshwar et al., 2015). In (Bhuvaneshwar et al., 2015), the Globus Genomics System is proposed as an enhanced Galaxy workflow system, made available as a service, which offers users the capability to process and transfer data easily, reliably, and quickly. This system addresses the end-to-end NGS analysis requirements and is implemented using Amazon Cloud Computing Infrastructure. Figure 2 illustrates the framework for the Globus Genomics System, taking into account the security measures for protecting the data. Examples of healthcare organizations which are using Genomic Sequencing include Kaiser Permanente in Northern California and Geisinger Health System in Pennsylvania (Khoury & Feero, 2017).

Figure 2. Globus Genomics System for Next Generation Sequencing (NGS). Adapted from (Bhuvaneshwar et al., 2015).

In summary, Cloud Computing has reshaped the healthcare industry in many respects. Healthcare Cloud Computing and Analytics provide many benefits, from easy access to electronic patient records to DNA Sequencing and NGS. The building blocks of Cloud Computing must be implemented with care for security and privacy in order to protect patients’ data from unauthorized users. The building blocks for the Healthcare Analytics system involve advanced technologies such as Hadoop, MapReduce, STORM, and Flume, as illustrated in Figure 1. The building blocks for the DNA Sequencing and NGS System involve the Dynamic Worker Pool, HTCondor, Shared File System, Elastic Provisioner, Globus Transfer and Nexus, and Galaxy, as illustrated in Figure 2. Each system has the building blocks it requires to perform its analytics tasks.

References

Archenaa, J., & Anita, E. M. (2015). A survey of big data analytics in healthcare and government. Procedia Computer Science, 50, 408-413.

Bhuvaneshwar, K., Sulakhe, D., Gauba, R., Rodriguez, A., Madduri, R., Dave, U., . . . Madhavan, S. (2015). A case study for cloud-based high throughput analysis of NGS data using the Globus Genomics system. Computational and Structural Biotechnology Journal, 13, 64-74.

Blaisdell, R. (2017). DNA Sequencing in the Cloud. Retrieved from https://rickscloud.com/dna-sequencing-in-the-cloud/.

genome.gov. (2015). DNA Sequencing. Retrieved from https://www.genome.gov/10001177/dna-sequencing-fact-sheet/.

Han, J., Pei, J., & Kamber, M. (2011). Data mining: Concepts and techniques. Elsevier.

IBM. (2012). Cloud computing fundamentals: A different way to deliver computer resources. Retrieved from https://www.ibm.com/developerworks/cloud/library/cl-cloudintro/cl-cloudintro-pdf.pdf.

Khoury, M. J., & Feero, G. (2017). Genome Sequencing for Healthy Individuals? Think Big and Act Small! Retrieved from https://blogs.cdc.gov/genomics/2017/05/17/genome-sequencing-2/.

Leung, K., Lee, K., Wang, J., Ng, E. Y., Chan, H. L., Tsui, S. K., . . . Sung, J. J. (2011). Data mining on DNA sequences of hepatitis B virus. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB), 8(2), 428-440.

Macias, F., & Thomas, G. (2011). Three Building Blocks to Enable the Cloud. Retrieved from https://www.cisco.com/c/dam/en_us/solutions/industries/docs/gov/white_paper_c11-675835.pdf.

Matthews, K. (2016). DNA Sequencing. Retrieved from https://cloudtweaks.com/2016/11/cloud-dna-sequencing/.

Mousannif, H., Khalil, I., & Kotsis, G. (2013-14). Collaborative learning in the clouds. Information Systems Frontiers, 15(2), 159-165. doi:10.1007/s10796-012-9364-y

Salzberg, S. L. (1999). Gene discovery in DNA sequences. IEEE Intelligent Systems and their Applications, 14(6), 44-48.

Verhaeghe, X. (n.d.). The Building Blocks of a Big Data Strategy. Retrieved from https://www.oracle.com/uk/big-data/features/bigdata-strategy/index.html.