Case Study Demonstrating The Need for Data-In-Motion Analytics

Dr. Aly, O.
Computer Science

Introduction

This discussion analyzes a case study that demonstrates the need for data-in-motion analytics.  It begins with real-time data and data in motion, followed by the need for data-in-motion analytics and the case study itself.

Real-Time Data and Data-in-Motion

Data can be in one of three states: data in use, data at rest, and data in motion.  Data in use are being used by services, or users require them to accomplish specific tasks.  Data at rest are not in use and are stored or archived in storage.  Data in motion are about to change from data at rest to data in use, or are being transferred from one place to another (Chang, Kuo, & Ramachandran, 2016).

One of the significant characteristics of Big Data is velocity.  Abbasi, Sarker, and Chiang (2016) describe the speed of data generation as a “hallmark” of Big Data.  Wal-Mart is an example of this explosive data generation, collecting over 2.5 petabytes of customer transaction data every hour.  Moreover, over one billion new tweets occur every three days, and five billion search queries occur daily (Abbasi et al., 2016).  Velocity is the data in motion (Chopra & Madan, 2015; Emani, Cullot, & Nicolle, 2015; Katal, Wazid, & Goudar, 2013; Moorthy, Baby, & Senthamaraiselvi, 2014; Nasser & Tariq, 2015).  Velocity involves streams of data, structured data, and the availability of access and delivery (Emani et al., 2015).  The challenge of velocity is not only the speed of the incoming data, which can still be handled with batch processing, but also streaming such rapidly generated data in real time to support knowledge-based decisions (Emani et al., 2015; Nasser & Tariq, 2015).  Real-time data (a.k.a. data in motion) is streaming data that needs to be analyzed as it comes in (Jain, 2013).

As indicated in CSA (2013), Big Data technologies are divided into two categories: batch processing for analyzing data at rest, and stream processing for analyzing data in motion.  An example of data-at-rest analysis is sales analysis, which is not based on real-time data processing (Jain, 2013).  An example of data-in-motion analysis is association rules in e-commerce.  The response time for each processing category is different.  For stream processing, response times have ranged from milliseconds to seconds, but the more significant challenge is to stream data while reducing the response time to well below milliseconds (Chopra & Madan, 2015; CSA, 2013).  Data in motion, which is handled by stream or real-time processing, does not always need to reside in memory, and interactive analysis of large-scale data sets through technologies such as Apache Drill and Google’s Dremel provides new paradigms for data analytics.  Figure 1 illustrates the response time for each processing type.
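To make the association-rule example concrete, the following is a minimal sketch of how rule confidence could be maintained incrementally as baskets stream in, rather than recomputed in batch over stored data.  The class name, the item names, and the sample stream are hypothetical illustrations and do not come from the cited sources.

```python
from collections import Counter
from itertools import combinations


class StreamingRuleCounter:
    """Incrementally tracks item and item-pair counts over a stream of baskets."""

    def __init__(self):
        self.item_counts = Counter()
        self.pair_counts = Counter()
        self.baskets_seen = 0

    def update(self, basket):
        """Fold one basket (an iterable of item names) into the running counts."""
        items = sorted(set(basket))  # de-duplicate and fix an ordering for pairs
        self.baskets_seen += 1
        self.item_counts.update(items)
        self.pair_counts.update(combinations(items, 2))

    def confidence(self, antecedent, consequent):
        """Estimate confidence(antecedent -> consequent) from the counts so far."""
        if self.item_counts[antecedent] == 0:
            return 0.0
        pair = tuple(sorted((antecedent, consequent)))
        return self.pair_counts[pair] / self.item_counts[antecedent]


# Hypothetical stream of shopping baskets arriving one at a time.
stream = [
    ["bread", "butter"],
    ["bread", "milk"],
    ["bread", "butter", "milk"],
    ["milk"],
]

counter = StreamingRuleCounter()
for basket in stream:
    counter.update(basket)
    # The rule estimate is available after every arrival, not only at the end.
    print(counter.baskets_seen, round(counter.confidence("bread", "butter"), 2))
```

Because each basket only increments counters, the estimate is usable after every arrival.  A batch job over the same four baskets would produce the same final confidence, but only after all the data had been collected and stored.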

Figure 1.  The Batch and Stream Processing Responsiveness (CSA, 2013).

There are two kinds of systems for data at rest: NoSQL systems for interactive data-serving environments, and systems for large-scale analytics based on the MapReduce paradigm, such as Hadoop.  NoSQL systems are designed around a simpler key-value data model with built-in sharding, and they work seamlessly in a distributed cloud-based environment (Gupta, Gupta, & Mohania, 2012).  A data stream management system allows the user to analyze data in motion, rather than collecting vast quantities of data, storing it on disk, and then analyzing it.  There are various stream processing systems, such as IBM InfoSphere Streams (Gupta et al., 2012; Hirzel et al., 2013), Twitter’s Storm, and Yahoo’s S4.  These systems are designed for clusters of commodity hardware for real-time data processing (Gupta et al., 2012).
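The idea of analyzing data as it flows past, instead of storing it first, can be sketched with a sliding time window.  This is a simplified, single-process illustration and is not the API of InfoSphere Streams, Storm, or S4.  The class name and the sample timestamps are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowCounter:
    """Counts events seen within the last `window_seconds` of stream time."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.timestamps = deque()

    def add(self, timestamp):
        """Record one event and return the number of events still in the window."""
        self.timestamps.append(timestamp)
        # Evict events that have fallen out of the window; nothing older is kept.
        cutoff = timestamp - self.window_seconds
        while self.timestamps and self.timestamps[0] <= cutoff:
            self.timestamps.popleft()
        return len(self.timestamps)


# Hypothetical event timestamps (in seconds), assumed to arrive in order.
events = [0, 1, 2, 10, 11, 30]
window = SlidingWindowCounter(window_seconds=10)
counts = [window.add(t) for t in events]
print(counts)  # [1, 2, 3, 3, 3, 1]
```

Memory use is bounded by the number of events in the current window, not by the total length of the stream, which is the property that lets a stream processor run continuously.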

The Need for Data-in-Motion Analytics

The explosive growth of data provides significant implications for “real-time” predictive analytics in various application areas, ranging from health to finance (Abbasi et al., 2016).  The analysis of data in motion presents new challenges because the desired patterns and insights are moving targets, unlike those in static data (Abbasi et al., 2016).  As the velocity of the data increases, streaming analytics processes might need to be added to evaluate the precision, accuracy, and integrity of the data while it is in motion.  Moreover, the high velocity of these systems shrinks the availability window.

Traditional batch processing cycle times can expose the business to high risk, and any delay prolongs the exposure in cases such as fraud, public safety threats (Ballard et al., 2014), or intrusion detection.  As indicated in Sokol and Ames (2012), streaming analytics frameworks enable organizations to apply various continuous and predictive analytics to structured and unstructured data in motion.  These frameworks deliver high-value information in real time or near real time, rather than waiting to store the data and perform traditional business intelligence operations, which might come too late to affect situational awareness.  Thus, there is a need for real-time data analytics, or analytics on data in motion.
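As an illustration of why scoring events on arrival shortens exposure, the sketch below flags unusual transaction amounts the moment they are observed.  It uses Welford's online algorithm for the running mean and variance.  The threshold, the warm-up length, and the sample amounts are assumptions chosen for the example and do not represent a method described in the cited sources.

```python
import math


class StreamingOutlierDetector:
    """Flags values far from the running mean, using Welford's online algorithm."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # z-score above which a value is flagged
        self.warmup = warmup        # observations required before flagging
        self.count = 0
        self.mean = 0.0
        self.m2 = 0.0               # running sum of squared deviations

    def observe(self, value):
        """Score `value` against history so far, then fold it into the statistics."""
        flagged = False
        if self.count >= self.warmup:
            std = math.sqrt(self.m2 / (self.count - 1))
            if std > 0 and abs(value - self.mean) / std > self.threshold:
                flagged = True
        # Welford update: one pass, constant memory, no stored history.
        self.count += 1
        delta = value - self.mean
        self.mean += delta / self.count
        self.m2 += delta * (value - self.mean)
        return flagged


# Hypothetical transaction amounts arriving one at a time.
amounts = [20.0, 22.0, 19.0, 21.0, 20.0, 23.0, 500.0, 21.0]
detector = StreamingOutlierDetector()
flags = [detector.observe(a) for a in amounts]
print(flags)
```

In this sketch the suspicious amount is flagged as soon as it arrives, whereas a nightly batch job would surface it only at the end of its cycle.  As a simplification, flagged values are still folded into the running statistics; a real system would likely handle them separately.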

Case Study

The value of Big Data is demonstrated in case studies across many industries.  Przybyszewski (2016) demonstrates the value of real-time Big Data Analytics in industries such as banking, finance, communications, the public sector, retail and CPG, manufacturing, and healthcare and life sciences.  Figure 2 illustrates the value of Big Data Analytics in real time (a.k.a. in motion).

Figure 2. The Value of Data Analytics in Real-Time (Przybyszewski, 2016).

The use case involves a major specialty department store.  The business challenge the store faced was to improve the precision of its product marketing.  The company was interested in enabling in-store, real-time product promotion among its shoppers.  The solution was to use Big Data Analytics in real time.  The company ingested and integrated data in real time and in batch, in both structured and unstructured formats.  An ETL process transformed the raw data, which was then consumed by learning algorithms.  The retailer can now deliver real-time recommendations and promotions through all channels, including its website, store kiosks, and mobile apps.  The use of real-time Big Data Analytics resulted in an omnichannel recommendation engine similar to what Amazon does online.  Thirty-five percent of what consumers purchase on Amazon and seventy-five percent of what they watch on Netflix come from product recommendations based on that type of analysis.  The retailer benefitted from the recommendation engine by generating recommendations based on weather, loyalty, purchase history, abandoned carts, or life-stage triggers, and delivering them to shoppers in its stores (Przybyszewski, 2016).
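The trigger-based recommendations described in this use case can be pictured with a small rule-based sketch.  It is purely illustrative: the retailer's engine relied on learning algorithms, and the field names, rules, and companion-item table below are hypothetical and not taken from the case study.

```python
# Hypothetical mapping from a previously purchased item to a companion item.
COMPANION_ITEMS = {"running shoes": "running socks", "suit": "tie"}


def recommend(event, profile):
    """Return promotion labels for one shopper event, evaluated as it arrives.

    `event` and `profile` are plain dicts with hypothetical keys; a production
    engine would rely on learned models rather than hand-written rules.
    """
    promotions = []
    if event.get("weather") == "rain":
        promotions.append("rain gear")
    if profile.get("loyalty_tier") == "gold":
        promotions.append("gold member discount")
    # Remind the shopper of items left in an abandoned cart.
    for item in profile.get("abandoned_cart", []):
        promotions.append(f"reminder: {item}")
    # Suggest a companion item based on purchase history, if one is mapped.
    for item in profile.get("purchase_history", []):
        if item in COMPANION_ITEMS:
            promotions.append(COMPANION_ITEMS[item])
    return promotions


# Hypothetical in-store event and shopper profile.
event = {"shopper_id": "s-001", "weather": "rain"}
profile = {
    "loyalty_tier": "gold",
    "abandoned_cart": ["umbrella"],
    "purchase_history": ["running shoes"],
}
promotions = recommend(event, profile)
print(promotions)
```

Each event is evaluated independently and immediately, which is what allows the same logic to serve a website, a store kiosk, or a mobile app while the shopper is still present.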

Another example is in the healthcare industry.  Big Data is making a significant impact across many industries, from helping lead cancer patients to full recovery in healthcare to increasing the reach of disaster relief efforts, and more (Capella.edu, 2017).  As indicated in InformationBuilders (2018), providers can be granted real-time, single-view access to patient, clinical, and other relevant health data to support improved decision-making and facilitate effective, efficient, and error-free care.  They can also ensure accurate, on-time payment that promptly reimburses them for their time and care (InformationBuilders, 2018).  Moreover, as indicated in White-House (2014), the Centers for Medicare and Medicaid Services have begun using predictive analytics software to flag likely instances of reimbursement fraud before claims are paid.  The Fraud Prevention System helps identify the highest-risk healthcare providers for fraud, waste, and abuse in real time, and has already stopped, prevented, or identified $115 million in fraudulent payments, saving $3 for every $1 spent in the program’s first year (White-House, 2014).

In summary, real-time Big Data Analytics adds much value to the organization, complementing batch processing, which operates on data at rest.  The analytics is based on streaming real-time data, which is transformed into knowledge for better, near-instantaneous decisions.  Data-in-motion analytics is being implemented successfully across various industries, including healthcare, retail, banking, and more.

References

Abbasi, A., Sarker, S., & Chiang, R. (2016). Big data research in information systems: Toward an inclusive research agenda. Journal of the Association for Information Systems, 17(2), 3.

Ballard, C., Compert, C., Jesionowski, T., Milman, I., Plants, B., Rosen, B., & Smith, H. (2014). Information governance principles and practices for a big data landscape. IBM Redbooks.

Capella.edu. (2017). 4 Examples of Data Analytics In Action. Retrieved from https://www.capella.edu/blogs/cublog/big-data-and-analytics-in-action/.

Chang, V., Kuo, Y.-H., & Ramachandran, M. (2016). Cloud computing adoption framework: A security framework for business clouds. Future Generation Computer Systems, 57, 24-41. doi:10.1016/j.future.2015.09.031

Chopra, A., & Madan, S. (2015). Big Data: A Trouble or A Real Solution? International Journal of Computer Science Issues (IJCSI), 12(2), 221.

CSA [Cloud Security Alliance]. (2013). Big Data Analytics for Security Intelligence. Big Data Working Group.

Emani, C. K., Cullot, N., & Nicolle, C. (2015). Understandable big data: A survey. Computer Science Review, 17, 70-81.

Gupta, R., Gupta, H., & Mohania, M. (2012). Cloud computing and big data analytics: what is new from databases perspective? Paper presented at the International Conference on Big Data Analytics.

Hirzel, M., Andrade, H., Gedik, B., Jacques-Silva, G., Khandekar, R., Kumar, V., . . . Soulé, R. (2013). IBM streams processing language: Analyzing big data in motion. IBM Journal of Research and Development, 57(3/4), 7:1-7:11.

InformationBuilders. (2018). Data In Motion – Big Data Analytics in Healthcare. Retrieved from http://docs.media.bitpipe.com/io_10x/io_109369/item_674791/datainmotionbigdataanalytics.pdf, White Paper.

Jain, R. (2013). Big Data Fundamentals. Retrieved from http://www.cse.wustl.edu/~jain/cse570-13/ftp/m_10abd.pdf.

Katal, A., Wazid, M., & Goudar, R. (2013). Big data: issues, challenges, tools and good practices. Paper presented at the 2013 Sixth International Conference on Contemporary Computing (IC3).

Moorthy, M., Baby, R., & Senthamaraiselvi, S. (2014). An Analysis for Big Data and its Technologies. International Journal of Science, Engineering and Computer Technology, 4(12), 412.

Nasser, T., & Tariq, R. (2015). Big Data Challenges. J Comput Eng Inf Technol 4: 3. doi:10.4172/2324, 9307, 2.

Przybyszewski, T. (2016). Big Data – Case Studies Examples for Different Industries. Retrieved from https://www.racunarstvo.hr/wp-content/uploads/2016/03/OA_day_Big_Data_Tomasz_Przybysewski.pdf.

Sokol, L., & Ames, R. (2012). Analytics in a Big Data Environment. IBM Redbooks.

White-House. (2014). Big Data: Seizing Opportunities, Preserving Values. Executive Office of the President, White House Report to the President.