The Relationship Between Internet of Things (IoT) and Artificial Intelligence (AI)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address the relationship between the Internet of Things (IoT) and artificial intelligence (AI), and whether one can be used efficiently without help from the other.  The discussion begins with an overview of IoT and AI, followed by the relationship between them.

Internet of Things (IoT) and Artificial Intelligence Overview

Internet of Things (IoT) refers to the growing number of connected devices with IP addresses, which were not common years ago (Anand & Clarice, 2015; Thompson, 2017).  These connected devices collect information and use their IP addresses to transmit it (Thompson, 2017).  Organizations take advantage of the collected information for innovation, enhancing customer service, and optimizing processes (Thompson, 2017).  Healthcare providers take advantage of the collected information to find new treatment methods and increase efficiency (Thompson, 2017).

IoT implementation involves various technologies such as radio frequency identification (RFID), near field communication (NFC), machine to machine (M2M) communication, wireless sensor networks (WSN), and addressing schemes (AS) such as IPv6 addresses (Anand & Clarice, 2015; Kumari, 2017).  RFID uses electromagnetic fields to identify and track tags attached to objects.  NFC is a set of protocols that lets smartphones and other objects communicate at close range under IoT.  M2M is often used for remote monitoring.  A WSN is a large set of sensors used to monitor environmental conditions.  Addressing schemes are the primary tool in IoT, assigning an IP address to each object that needs to communicate (Anand & Clarice, 2015; Kumari, 2017).

Machine learning (ML) is a subset of AI and involves supervised and unsupervised learning (Thompson, 2017).  In the AI domain, advances in computer science have resulted in intelligent machines that resemble humans in their functions (NMC, 2018).  Access to categories, properties, and relationships between various datasets helps develop knowledge engineering, allowing computers to simulate human perception, learning, and decision making (NMC, 2018).  ML enables computers to learn without being explicitly programmed (NMC, 2018).  Unsupervised ML and AI allow for security tools such as behavior-based analytics and anomaly detection (Thompson, 2017).  Neural networks in AI model the biological function of the human brain to interpret and react to specific inputs such as words and tone of voice (NMC, 2018).  Neural networks have been used for voice recognition and natural language processing (NLP), enabling humans to interact with machines.
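As a toy illustration of the kind of unsupervised anomaly detection that behavior-based analytics relies on, the sketch below flags data points that deviate sharply from the rest of a series.  The readings and threshold are hypothetical, and production security tools use far more sophisticated models; this is a minimal sketch only.

```python
import statistics

def detect_anomalies(readings, threshold=2.0):
    """Flag readings more than `threshold` population standard
    deviations from the mean -- a minimal unsupervised anomaly test."""
    mean = statistics.mean(readings)
    stdev = statistics.pstdev(readings)
    if stdev == 0:
        return []  # no variation, nothing can be anomalous
    return [x for x in readings if abs(x - mean) / stdev > threshold]

# Hypothetical daily login counts; the spike stands out as anomalous.
logins = [12, 14, 13, 11, 15, 12, 14, 97]
print(detect_anomalies(logins))  # → [97]
```

No labels are needed here, which is what makes the approach "unsupervised": the baseline of normal behavior is inferred from the data itself.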

The Relationship Between IoT and AI

Various reports and studies have discussed the relationship between IoT and AI.  O’Brien (2016) reported that IoT needs AI to succeed.  Jaffe (2014) similarly suggested that IoT will not work without AI.  The future of IoT depends on ML to find patterns, correlations, and anomalies that have the potential to improve almost every facet of daily life (Jaffe, 2014).

Thus, the success of IoT depends on AI.  IoT follows five necessary steps: sense, transmit, store, analyze, and act (O’Brien, 2016).  AI plays a significant role in the analyze step, which is where ML, the subset of AI, gets involved.  When ML is applied in the analyze step, it can change the subsequent “act” step, which dictates whether the action has high value or no value to the consumer (O’Brien, 2016).
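A minimal sketch of these five steps, with a trivial baseline comparison standing in for the learned ML model in the analyze step (the readings, tolerance, and function names are hypothetical):

```python
def analyze(history, latest):
    """Step 4 (analyze): a stand-in for ML -- compare the latest
    reading against the historical average."""
    baseline = sum(history) / len(history)
    return latest - baseline

def act(deviation, tolerance=5.0):
    """Step 5 (act): decide whether the insight merits action."""
    return "alert" if abs(deviation) > tolerance else "no action"

store = []                                  # step 3: storage
for reading in [20.1, 19.8, 20.3, 35.6]:    # steps 1-2: sensed, transmitted
    if store:
        print(reading, act(analyze(store, reading)))
    store.append(reading)
```

The point of the sketch is the dependency O’Brien describes: without the analyze step, every reading would pass straight through to "act" with no way to tell a high-value action from a worthless one.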

Schatsky, Kumar, and Bumb (2018) suggested that AI can unlock the potential of IoT.  As cited in Schatsky et al. (2018), Gartner predicts that by 2022, more than 80% of enterprise IoT projects will include AI components, up from only 10% in 2018.  International Data Corp (IDC) predicts that by 2019, AI will support “all effective” IoT efforts, and that without AI, data from these deployments will have limited value (Schatsky et al., 2018).

Various companies are crafting IoT strategies that include AI (Schatsky et al., 2018).  Venture capital funding of AI-focused IoT start-ups is growing, while vendors of IoT platforms such as Amazon, GE, IBM, Microsoft, Oracle, and Salesforce are integrating AI capabilities (Schatsky et al., 2018).  The value of AI is the ability to extract insight from data quickly.  ML, a subset of AI, enables the automatic identification of patterns and detection of anomalies in the data that smart sensors and devices generate (Schatsky et al., 2018).  IoT is expected to combine with the power of AI, blockchain, and other emerging technologies to create the “smart hospitals” of the future (Bresnick, 2018).  Examples of AI-powered IoT devices include automated vacuum cleaners such as the iRobot Roomba, smart thermostat solutions such as those of Nest Labs, and self-driving cars such as those of Tesla Motors (Faggella, 2018; Kumari, 2017).

Conclusion

This discussion has addressed artificial intelligence (AI) and the Internet of Things (IoT) and the relationship between them.  Machine learning, a subset of AI, is required for IoT at the analysis phase.  Without this analysis phase, IoT will not provide the value-added insight organizations anticipate.  Various studies and reports have indicated that the success and the future of IoT depend on AI.

References

Anand, M., & Clarice, S. (2015). Artificial Intelligence Meets Internet of Things. Retrieved from http://www.ijcset.net/docs/Volumes/volume5issue6/ijcset2015050604.pdf.

Bresnick, J. (2018). Internet of Things, AI to Play Key Role in Future Smart Hospitals.

Faggella, D. (2018). Artificial Intelligence Plus the Internet of Things (IoT) – 3 Examples Worth Learning From.

Jaffe, M. (2014). IoT Won’t Work Without Artificial Intelligence.

Kumari, W. M. P. (2017). Artificial Intelligence Meets Internet of Things.

NMC, H. P. (2018). NMC Horizon Report: 2017 Higher Education Edition. Retrieved from https://www.nmc.org/publication/nmc-horizon-report-2017-higher-education-edition/.

O’Brien, B. (2016). Why The IoT Needs Artificial Intelligence to Succeed.

Schatsky, D., Kumar, N., & Bumb, S. (2018). Bringing the power of AI to the Internet of Things.

Thompson, E. C. (2017). Building a HIPAA-Compliant Cybersecurity Program, Using NIST 800-30 and CSF to Secure Protected Health Information.

The Impact of Artificial Intelligence (AI) on Big Data Analytics (BDA)

Dr. O. Aly
Computer Science

The purpose of this discussion is to discuss the influence of artificial intelligence on big data analytics.  As discussed previously, Big Data empowers artificial intelligence; this discussion addresses the reverse direction, the impact of artificial intelligence on the Big Data Analytics domain.  The discussion begins with the building blocks of artificial intelligence, followed by the impact of artificial intelligence on BDA.

Artificial Intelligence Building Blocks and Their Impact on BDA

Understanding the building blocks of AI can help in understanding the impact of AI on BDA.  Various reports and studies have identified different building blocks for AI.  Chibuk (2018) identified four building blocks that are expected to shape the next stage of AI.  Computation methodology is the first building block, structured to help computers move beyond binary operations toward far richer connections.  Storage of information is the second building block, improving how data is stored and accessed.  The brain-computer interface is the third building block, through which human minds would speak silently with a computer, and thoughts would turn into actions.  Mathematics and algorithms form the last building block, including advanced approaches such as capsule networks and networks that teach each other based on defined rules (Chibuk, 2018).

Rao (2017) identified five fundamental building blocks for AI in the banking sector, though they are easily applicable to other sectors.  Machine learning (ML) is the first component of AI in banking, where the software can learn on its own without being programmed and adjust its algorithms to respond to new insights.  Data mining algorithms hand their findings over to a human for further work, while machine learning can act on its own (Rao, 2017).  The financial and banking industry can benefit from machine learning for fraud detection, securities settlement, and the like (Rao, 2017).

Deep learning (DL) is another building block of AI in the banking industry (Rao, 2017).  DL leverages a hierarchy of artificial neural networks, similar to the human brain, to do its job.  DL mimics the human brain to perform non-linear deductions, unlike traditional linear programs (Rao, 2017).  DL can produce better decisions by factoring in learning from previous transactions or interactions (Rao, 2017).  An example of DL is inferring customers’ likes and preferences from information collected about their behavior on social networks; financial institutions can use this insight to make contextual, relevant offers to those customers in real time (Rao, 2017).

Natural language processing (NLP) is the third building block of AI in banking (Rao, 2017).  NLP is a key building block that helps computers learn, analyze, and understand human language (Rao, 2017).  NLP can be used to organize and structure knowledge in order to answer queries, translate content from one language to another, recognize people by their speech, mine text, and perform sentiment analysis (Rao, 2017).  Natural language generation (NLG) is another essential building block; it transforms structured data into natural language and helps computers converse and interact intelligently with humans (Rao, 2017).  NLG can turn raw data into a narrative, which banks such as Credit Suisse are using to generate portfolio reviews (Rao, 2017).

Visual recognition is the last component of AI, which helps recognize images and their content (Rao, 2017).  It uses DL to find faces, tag images, identify the components of visuals, and pick out similar images from a large dataset (Rao, 2017).  Banks such as Australia’s Westpac use this technology to allow customers to activate a new card from their smartphone camera, and Bank of America, Citibank, Wells Fargo, and TD Bank use visual recognition to allow customers to deposit checks remotely via a mobile app (Rao, 2017).
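As a toy illustration of the sentiment-analysis task that NLP supports, the sketch below scores text against small hand-made word lists.  Real systems learn these associations from large datasets; the lexicons and example sentences here are hypothetical.

```python
# Hypothetical sentiment lexicons; a real system would learn these.
POSITIVE = {"good", "great", "love", "excellent", "helpful"}
NEGATIVE = {"bad", "poor", "hate", "terrible", "slow"}

def sentiment(text):
    """Classify text by counting positive vs. negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"

print(sentiment("I love this great mobile banking app"))  # → positive
print(sentiment("terrible and slow service"))             # → negative
```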

Gerbert, Hecker, Steinhäuser, and Ruwolt (2017) identified ten building blocks for AI.  They suggested that the simplest AI use cases often consist of a single building block but often evolve to combine two or more blocks over time (Gerbert et al., 2017).  Machine vision, one of the building blocks, is the classification and tracking of real-world objects based on visual, x-ray, laser, or other signals.  The quality of machine vision depends on a large number of reference images labeled by humans (Gerbert et al., 2017).  Video-based computer vision is anticipated to recognize actions and predict motions within the next five years (Gerbert et al., 2017).  Speech recognition is another building block, involving the transformation of auditory signals into text (Gerbert et al., 2017).  Siri and Alexa can identify most words in a general vocabulary, but as vocabulary becomes specific, tailored programs such as Nuance’s PowerScribe for radiologists will be needed (Gerbert et al., 2017).  The information processing building block involves searching billions of documents or constructing basic knowledge graphs identifying relationships in text.  This building block is closely related to NLP, which is identified as a separate building block (Gerbert et al., 2017).  NLP can provide basic summaries of text and infer intent in some instances (Gerbert et al., 2017).  Learning from data is another component: machine learning able to predict values or classify information based on historical data (Gerbert et al., 2017).  While ML is an element within the machine vision and NLP building blocks, it is also a separate building block of its own (Gerbert et al., 2017).  Other building blocks include planning and exploring agents, which can help identify the best sequence of actions to achieve certain goals; self-driving cars rely on this building block for navigation (Gerbert et al., 2017).  Image generation is another building block, the opposite of machine vision, as it creates images based on models.  Speech generation covers both data-based text generation and text-based speech synthesis.  The handling and control building block refers to interactions with real-world objects (Gerbert et al., 2017).  The navigating and movement building block covers the ways robots move through a given physical environment.  Self-driving cars and drones do well with their wheels and rotors; however, walking on legs, especially a single pair of legs, is challenging (Gerbert et al., 2017).

Artificial intelligence (AI) and machine learning (ML) have seen increasing adoption across industries and the public sector (Brook, 2018).  This trend plays a significant role in the digital world (Brook, 2018) and is driven by a customer-centric view of data, in which data is used as part of the product or service (Brook, 2018).  The customer-centric model assumes data enrichment from multiple sources, with the data divided into real-time data and historical data (Brook, 2018).  Businesses build a trust relationship with customers, and data is becoming the central model for many consumer services such as Amazon and Facebook (Brook, 2018).  The value of data increases over time (Brook, 2018).  The impact of machine learning and artificial intelligence has driven organizations to rapidly adopt a “corporate memory.”  Brook (2018) suggested that organizations implement loosely coupled data silos and data lakes, which can contribute to the corporate memory and to super-fast data usage in the age of AI-driven data usage.  Examples of the impact of AI and ML on BDA and the value of data over time include Coca-Cola’s global market and extensive product list, IBM’s machine learning system Watson, and GE Power using BD, ML, and the Internet of Things (IoT) to build the internet of energy (Marr, 2018).  Figure 1 shows the impact of AI and ML on Big Data Analytics and the value of data over time.


Figure 1.  Impact of AI and ML on BDA and the Value of Data Over Time (Brook, 2018).

AI is anticipated to be the most dominant factor with a disruptive impact on organizations and businesses (Hansen, 2017).  Mills (2018) suggested that organizations need to embrace BD and AI to help their businesses.  An EMC survey has shown that 69% of information technology decision-makers in New Zealand believe that BDA is critical to their business strategy, and 41% have already incorporated BD into everyday business decisions (Henderson, 2015).

The application of AI to BDA can help businesses and organizations detect correlations between factors that humans cannot perceive (Henderson, 2015).  It can allow organizations to deal with the speed at which information changes in today’s business world (Henderson, 2015).  AI can help organizations add a level of intelligence to their BDA to understand complex issues better and more quickly than humans can without AI (Henderson, 2015).  AI can also serve to fill the gap left by not having enough data analysts available (Henderson, 2015).  AI can also reveal insights that lead to novel solutions to existing problems, or even uncover issues that were not previously known (Henderson, 2015).  A good example of AI’s impact on BDA is an AI-powered BDA system in Canada used to identify patterns in the vital signs of premature babies that enable the early detection of life-threatening infections.  Figure 2 shows AI and BD working together for better analytics and better insight.
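A minimal sketch of one statistical building block behind such correlation-finding: the Pearson correlation coefficient, which quantifies how strongly two series move together.  The vital-sign numbers below are made up purely for illustration, not clinical data.

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

# Hypothetical heart-rate and temperature readings that rise together,
# producing a coefficient close to 1.
heart_rate = [88, 92, 95, 99, 104]
temperature = [36.8, 37.0, 37.3, 37.6, 38.1]
print(round(pearson(heart_rate, temperature), 3))
```

An AI system scanning thousands of such variable pairs at once can surface relationships no human analyst would have time to check.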


Figure 2:  Artificial Intelligence and Big Data (Hansen, 2017).

Conclusion

This assignment has discussed the impact of artificial intelligence (AI) on Big Data Analytics (BDA).  It began with the identification of the building blocks of AI and the impact of each building block on BDA.  BDA has an essential impact on AI because it empowers it, and AI has a crucial role in BDA, as demonstrated in various fields, especially the healthcare and financial industries.  The researcher would like to summarize the relationship between AI and BDA in a single statement: “AI without BDA is lame, and BDA without AI is blind.”

References

Brook, P. (2018). Trends in Big Data and Artificial Intelligence Data.

Chibuk, J. D. (2018). Four Building Blocks for a General AI.

Gerbert, P., Hecker, M., Steinhäuser, S., & Ruwolt, P. (2017). The Building Blocks of Artificial Intelligence.

Hansen, S. (2017). How Big Data Is Empowering AI and Machine Learning?

Henderson, J. (2015). Insight: What role does Artificial Intelligence Play in Big Data?  What are the links between artificial intelligence and Big Data?

Marr, B. (2018). 27 Incredible Examples Of AI And Machine Learning In Practice.

Mills, T. (2018). Eight Ways Big Data And AI Are Changing The Business World.

Rao, S. (2017). The Five Fundamental Building Blocks for Artificial Intelligence in Banking.

Can Artificial Intelligence Support or Replace Decision Makers?

Dr. O. Aly
Computer Science

The purpose of this discussion is to discuss artificial intelligence and whether it should be used as a tool to support or to replace decision makers.  The discussion begins with a brief history of artificial intelligence (AI), followed by the foundations of AI and the question of whether AI should be used to support or replace decision makers.

The History of Artificial Intelligence

Artificial intelligence is defined as a computational technique that allows machines to perform cognitive functions, such as acting or reacting to input, similar to the way humans do (Patrizio, 2018).  The gestation of AI took place between 1943 and 1955.  The work of Warren McCulloch and Walter Pitts (1943) is regarded as the first work of artificial intelligence (AI) (Russell & Norvig, 2016).  Their work drew on three sources: knowledge of the underlying physiology and function of neurons in the brain, a formal analysis of propositional logic, and Turing’s theory of computation (Russell & Norvig, 2016).  Hebbian learning is the result of the work of Donald Hebb (1949), who demonstrated a simple updating rule for modifying the connection strengths between neurons (Russell & Norvig, 2016).  The Hebbian theory is still an influential model to this day (Russell & Norvig, 2016).
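Hebb’s updating rule can be stated compactly: the change in a connection’s weight is proportional to the product of the two neurons’ activities (Δw = η·x·y).  A minimal sketch, with an illustrative learning rate and activity values:

```python
def hebbian_update(weight, pre, post, rate=0.1):
    """One Hebbian step: strengthen a connection in proportion to the
    joint activity of the pre- and post-synaptic neurons."""
    return weight + rate * pre * post

# Neurons that repeatedly fire together see their connection grow.
w = 0.0
for _ in range(5):
    w = hebbian_update(w, pre=1.0, post=1.0)
print(w)  # grows toward 0.5 (up to floating-point rounding)
```

This is the origin of the slogan “neurons that fire together, wire together,” and the reason Hebb’s simple rule is still cited as a precursor of modern learning algorithms.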

The birth of AI came in 1956, when John McCarthy, another influential figure in AI then at Princeton, initiated a project for AI.  AI witnessed early enthusiasm and high expectations from 1952 until 1969 (Russell & Norvig, 2016).  AI then received a dose of reality between 1966 and 1973.  The era of knowledge-based systems as the key to power ran from 1969 until 1979.  From 1980 until the present, AI has become an industry.  From 1986 until today, neural networks have returned.  From 1987 until the present, AI has adopted the scientific method.  Intelligent agents have emerged from 1995 onward.  Large datasets became available from 2001 onward.  Recent works in AI suggest that the emphasis should be on data, not algorithms, to solve many problems (Russell & Norvig, 2016).

The Foundation of Artificial Intelligence

AI, ideally, takes the best possible action in a situation (Russell & Norvig, 2016).  Building an intelligent agent is not an easy task and is described as problematic.  There are eight foundations for building an intelligent agent.  Early philosophers such as Aristotle (400 B.C.) made AI conceivable by considering the ideas that the mind is in some ways like a machine, that it operates on knowledge encoded in some internal language, and that thought can be used to choose what actions to take (Russell & Norvig, 2016).  Mathematics is another building block, where mathematicians provide the tools to manipulate both certain and uncertain statements, including probabilistic ones; mathematics also set the groundwork for understanding computation and reasoning about algorithms (Russell & Norvig, 2016).  Economics formalizes the problem of making decisions that maximize the expected outcome for the decision maker (Russell & Norvig, 2016).  Neuroscience has discovered facts about how the brain works and how it is similar to and different from computers.  Computer engineering has provided the ever-more-powerful machines that make AI applications possible.  Control theory deals with designing devices that act optimally on feedback from the environment (Russell & Norvig, 2016).  Finally, understanding language requires an understanding of the subject matter and context, not just the structure of sentences, which can cause problems in AI (Russell & Norvig, 2016).

Can AI Support or Replace Decision Makers?

AI has already entered various industries such as healthcare (navatiosolutions.com, 2018; UNAB, 2018).  It has been used to manage medical records and other data, and to do repetitive jobs such as analyzing tests, X-rays, and CT scans, and performing data entry (navatiosolutions.com, 2018).  AI has been used to analyze data and reports to help select the correct, individually customized treatment path (navatiosolutions.com, 2018).  Patients can report their symptoms to an AI app that uses speech recognition to compare them against a database of illnesses.  AI acts as a virtual nurse to help monitor patients’ conditions and follow up on treatments between doctor visits (navatiosolutions.com, 2018).  AI has also been used to monitor a patient’s use of medication.  The pharmaceutical industry has taken advantage of AI to create drugs faster and more cheaply.  AI has been used in genetics and genomics to find mutations and links to disease in DNA information (navatiosolutions.com, 2018).  AI has been used to sift through data to highlight mistakes in treatments and workflow inefficiencies, and to help healthcare systems avoid unnecessary patient hospitalizations (navatiosolutions.com, 2018).  Other examples of AI’s benefits include autonomous transport systems decreasing the number of accidents and medical systems making quantum advances in health monitoring possible (UNAB, 2018).
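As a toy illustration of how a symptom report might be matched against a database of illnesses, the sketch below scores conditions by symptom overlap.  The “database,” condition names, and symptom sets are entirely hypothetical; a real clinical system would be vastly more careful and would weigh evidence probabilistically.

```python
# Hypothetical illness "database" mapping conditions to symptom sets.
ILLNESSES = {
    "flu": {"fever", "cough", "fatigue"},
    "cold": {"cough", "sneezing", "sore throat"},
    "migraine": {"headache", "nausea", "light sensitivity"},
}

def best_match(reported_symptoms):
    """Return the condition sharing the most symptoms with the report."""
    return max(ILLNESSES, key=lambda name: len(ILLNESSES[name] & reported_symptoms))

print(best_match({"fever", "cough", "fatigue"}))  # → flu
```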

The UNAB think tank (UNAB, 2018) has raised valid questions, including the singularity of humans and AI and whether the two can become integrated.  The prospect of AI controlling humans with no regard for human values is causing fear of AI technology (UNAB, 2018).  The other questions include the following (UNAB, 2018):

  • “What if AI was wholly monitoring human behavior, without human participation?
  • Who or what will be engaged in the related decision-making process?
  • To what extent would individuals accept AI despite the consequences?
  • Will the human factor as we know it disappears completely?”

These questions must be addressed before AI technology can be fully adopted and integrated into human lives.  James (2018) raised another valid question: “Can We Trust AI?”  Despite the benefits of AI, especially in the healthcare industry, these systems can still make mistakes, caused by limited training or by unknown bias in an algorithm whose neural network models are not well understood (James, 2018).  In several high-profile instances, machines have demonstrated bias, caused by a flawed training dataset or by a malicious attacker who hacks the training dataset to make it biased (IBM, n.d.).

Ethical issues come along with the adoption of AI technology (James, 2018; UNAB, 2018).  IBM has suggested instilling human values and morality into AI systems (IBM, n.d.).  However, there is no single ethical system for AI (IBM, n.d.).  Transparency seems to be key to trusting AI (IBM, n.d.; James, 2018).  People need to know how an AI system arrives at a particular conclusion, decision, or recommendation (IBM, n.d.; James, 2018).

Conclusion

This discussion has addressed artificial intelligence and its key dimensions in human life.  AI has contributed to various industries, including healthcare and pharmaceuticals, and has proven to provide value in certain areas.  However, it has also been shown to make mistakes and demonstrate bias due to flawed training datasets or malicious attacks.  There is a fear of integrating AI technology fully into human lives with no regard for human participation and human values.  Integrating values and ethics is not an easy task.

From the researcher’s point of view, AI should not be used to make decisions that are related to human values and ethics.  Human lives have many dimensions that are not always black and white; there are areas where human integrity, principles, values, and ethics play a role.  In court, there is always the notion of the “benefit of the doubt.”  Can an AI decision be based on the “benefit of the doubt” rule in court?  Another aspect of AI, from the researcher’s point of view, is: who develops AI?  AI technology is developed by humans.  Are humans trying to get rid of humans and put AI in a superior role?  AI technology has its role and its dimension in certain fields, but not in all fields and domains where a human can move and interact with other humans with integrity and values.  Let AI technology make decisions in areas where it is proven to be most useful to humans, such as promoting sales and marketing and automating certain processes to increase efficiency and productivity.  Let humans make decisions in areas where they are proven to be most useful to human lives, promoting ethics, values, integrity, and principles.  “Computers are becoming great assistants to us however they still need our thought to make good decisions” (Chibuk, 2018).

References

Chibuk, J. D. (2018). Four Building Blocks for a General AI.

IBM. (n.d.). Building Trust in AI. Retrieved from https://www.ibm.com/watson/advantage-reports/future-of-artificial-intelligence/building-trust-in-ai.html.

James, R. (2018). Can We Trust AI? Retrieved from https://www.electronicdesign.com/industrial-automation/can-we-trust-ai.

navatiosolutions.com. (2018). 10 Common Applications of Artificial Intelligence in Healthcare. Retrieved from https://novatiosolutions.com/10-common-applications-artificial-intelligence-healthcare/.

Patrizio, A. (2018). Big Data vs. Artificial Intelligence.

Russell, S. J., & Norvig, P. (2016). Artificial intelligence: A modern approach. Malaysia: Pearson Education Limited.

UNAB. (2018). Human Decision Thoughts On AI. United Nations Educational, Scientific and Cultural Organization. Retrieved from http://unesdoc.unesco.org/images/0026/002615/261563E.pdf.

The Significance of Big Data and Artificial Intelligence to any Industry

Dr. O. Aly
Computer Science

The purpose of this discussion is to address whether the combination of Big Data and artificial intelligence is significant to any industry.  The discussion also provides an example where artificial intelligence has been used and applied successfully.  The sector chosen for this example of the use of AI is health care.

The Significance of Big Data and Artificial Intelligence Integration

            As discussed in U4-DB2, Big Data empowers artificial intelligence.  Thus, there is no doubt about the benefits and advantages of utilizing Big Data in artificial intelligence for businesses.  The question in this discussion, however, is whether their combination is significant to every industry or only to specific industries.

            The McKinsey Global Institute reported in 2011 that not all industries are created equal when parsing the benefits of Big Data (Brown, Chui, & Manyika, 2011).  The report indicated that although Big Data is changing the game for virtually every sector, it favors some companies and industries over others, especially in the early stages of adoption.  McKinsey also reported (Manyika et al., 2011) five domains that could take advantage of the transformative potential of Big Data: health care, retail, and public sector administration for the U.S., retail for the European Union, and personal location data globally.  Figure 1 illustrates Big Data’s significant financial value across sectors.


Figure 1.  Big Data Financial Value Across Sectors (Manyika et al., 2011).

            Thus, the value of Big Data Analytics is already tremendous for almost every business, and the value varies from one sector to another.  The combination of Big Data and artificial intelligence is good for innovation (Bean, 2018; Seamans, 2017), and there is no limit to innovation for any business.  Figure 2 shows the 19-year-old Go player Ke Jie reacting during his second match against Google’s artificial intelligence program AlphaGo in Wuzhen.


Figure 2.  19-Year-Old Ke Jie Reacts During the Second Match Against Google’s Artificial Intelligence Program AlphaGo (Seamans, 2017).

If the combination of Big Data and artificial intelligence is good for innovation, then, logically, every organization and every sector needs innovation to survive the competition.  In a survey conducted by NewVantage Partners, 97.2% of executive decision-makers reported that their companies are investing in building or launching Big Data and artificial intelligence initiatives (Bean, 2018; Patrizio, 2018).  It is also worth noting that 76.5% of the executives indicated that the availability of Big Data is empowering AI and cognitive initiatives within their organizations (Bean, 2018).  The same survey showed that 93% of the executives identified artificial intelligence as the disruptive technology their organizations are investing in for the future.  This result shows a consensus among executives that organizations must leverage cognitive technologies to compete in an increasingly disruptive period (Bean, 2018).

AI Application Example in the Health Care Industry

Since various research studies have identified the healthcare industry as a major beneficiary of Big Data and artificial intelligence, this sector is chosen as the example of the application of both BD and AI for this discussion.  AI is becoming a transformational force in healthcare (Bresnick, 2018).  The healthcare industry has almost endless opportunities to apply technologies such as Big Data and AI to deploy more precise and impactful interventions at the right time in patient care (Bresnick, 2018).

Harvard Business Review (HBR) has indicated that 121 health AI and machine learning companies raised $2.7 billion in 206 deals between 2011 and 2017 (Kalis, Collier, & Fu, 2018).  HBR examined ten promising artificial intelligence applications in healthcare (Kalis et al., 2018).  The findings showed that the application of AI could create up to $150 billion in annual savings for U.S. health care by 2026 (Kalis et al., 2018).  The investigation also showed that AI currently creates the most value in helping frontline clinicians be more productive and in making back-end processes more efficient, but not yet in making clinical decisions or improving clinical outcomes (Kalis et al., 2018).  Figure 3 shows the ten AI applications that could change health care.


Figure 3.  Ten Applications of AI That Could Change Health Care (Kalis et al., 2018).

Conclusion

In conclusion, the combination of Big Data and Artificial Intelligence drives innovation across all sectors.  Every sector and every business needs to innovate to maintain a competitive edge, and some sectors are leading others in taking advantage of this combination of BD and AI.  Health care is an excellent example of employing artificial intelligence.  However, the application of AI currently has its most value in only three main areas: AI-assisted surgery, virtual nursing, and administrative workflow.  The use of AI in other areas of healthcare is still in its infancy and will take time to establish roots and demonstrate the great benefits of AI application (Kalis et al., 2018).

References

Bean, R. (2018). How Big Data and AI Are Driving Business Innovation in 2018. Retrieved from https://sloanreview.mit.edu/article/how-big-data-and-ai-are-driving-business-innovation-in-2018/.

Bresnick, J. (2018). Top 12 Ways Artificial Intelligence Will Impact Healthcare. Retrieved from https://healthitanalytics.com/news/top-12-ways-artificial-intelligence-will-impact-healthcare.

Brown, B., Chui, M., & Manyika, J. (2011). Are you ready for the era of ‘big data’. McKinsey Quarterly, 4(1), 24-35.

Kalis, B., Collier, M., & Fu, R. (2018). 10 Promising AI Applications in Health Care. Harvard Business Review. Retrieved from https://hbr.org/2018/05/10-promising-ai-applications-in-health-care.

Manyika, J., Chui, M., Brown, B., Bughin, J., Dobbs, R., Roxburgh, C., & Byers, A. H. (2011). Big data: The next frontier for innovation, competition, and productivity.

Patrizio, A. (2018). Big Data vs. Artificial Intelligence.

Seamans, R. (2017). Artificial Intelligence And Big Data: Good For Innovation?

The Impact of Big Data Analytics (BDA) on Artificial Intelligence (AI)

Dr. O. Aly
Computer Science

The purpose of this discussion is to address the future impact of Big Data Analytics on Artificial Intelligence.  The discussion also provides an example of the use of AI in Big Data generation and analysis.  The discussion begins with artificial intelligence, followed by the advanced level of big data analysis.  The impact of Big Data (BD) on artificial intelligence is then discussed, with various examples showing how artificial intelligence is empowered by BD.

Artificial Intelligence

Artificial Intelligence (AI) has eight definitions laid out across two dimensions of thinking and acting (Table 1) (Russell & Norvig, 2016).  The top definitions are concerned with thought processes and reasoning, while the bottom definitions address behavior.  The definitions on the left measure success in terms of fidelity to human performance, while the definitions on the right measure against an ideal performance measure called "rationality" (Russell & Norvig, 2016).  A system is "rational" if it does the "right thing" given what it knows.

Table 1:  Some Definitions of Artificial Intelligence, Organized Into Four Categories (Russell & Norvig, 2016).

Patrizio (2018) defined artificial intelligence as a computational technique allowing machines to perform cognitive functions, such as acting or reacting to input, similar to the way humans do.  Traditional computing applications react to data, but their reactions and responses have to be hand-coded, so such an application cannot react to unexpected results (Patrizio, 2018).  Artificial intelligence systems, by contrast, are continuously in flux, changing their behavior to accommodate changes in results and modifying their reactions (Patrizio, 2018).  An artificial intelligence-enabled system is designed to analyze and interpret data and to address issues based on those interpretations (Patrizio, 2018).  Using machine learning algorithms, the computer learns once how to act or react to a particular result and knows to act the same way in the future (Patrizio, 2018).  IBM has invested $1 billion in artificial intelligence through the launch of its IBM Watson Group (Power, 2015).  The health care industry is the most significant application of Watson (Power, 2015).

Advanced Level of Big Data Analysis

The fundamental analytics techniques include descriptive analytics, which breaks big data down into smaller, more useful pieces of information about what has happened, focusing on the insight gained from historical data to provide trending information on past or current events (Liang & Kelemen, 2016).  The advanced-level computational tools, however, focus on predictive analytics, determining patterns and predicting future outcomes and trends by quantifying the effects of future decisions to advise on possible outcomes (Liang & Kelemen, 2016).  Prescriptive analytics functions as a decision-support tool, exploring a set of possible actions and proposing actions based on descriptive and predictive analyses of complex data.  The advanced-level computational techniques also include real-time analytics.

Advanced-level data analysis includes various techniques (Liang & Kelemen, 2016).  Real-time analytics and meta-analysis can be used to integrate multiple data sources.  Hierarchical or multi-level models can be used for spatial data, and longitudinal and mixed models for real-time or dynamic temporal data rather than static data.  Data mining and pattern recognition can be used for trend and pattern detection.  Natural language processing (NLP) can be used for text mining, and machine learning, statistical learning, and Bayesian learning for auto-extraction of data and variables.  Artificial intelligence with automatic ensemble techniques and intelligent agents, together with deep learning methods such as neural networks, support vector machines, and dynamic state-space models, can be used for automated analysis and information retrieval.  Causal inference and the Bayesian approach can be used for probabilistic interpretations (Liang & Kelemen, 2016).

Big Data Empowers Artificial Intelligence

The trend of artificial intelligence implementation is increasing.  It is anticipated that 70% of enterprises will implement artificial intelligence (AI) by the end of 2018, up from 40% in 2016 and 51% in 2017 (Mills, 2018).  A survey conducted by NewVantage Partners of C-level executive decision-makers found that 97.2% of executives stated that their companies are investing in, building, or launching Big Data and artificial intelligence initiatives (Bean, 2018; Patrizio, 2018).  The same survey found that 76.5% of the executives feel that artificial intelligence and Big Data are becoming closely interconnected, and that the availability of data is empowering artificial intelligence and cognitive initiatives within their organizations (Patrizio, 2018).

Artificial intelligence requires data to develop its intelligence, particularly for machine learning (Patrizio, 2018).  The data used in artificial intelligence and machine learning should already be cleaned, with extraneous, duplicate, and unnecessary data removed; this cleaning is regarded as the first big step when using Big Data and artificial intelligence together (Patrizio, 2018).  The CERN data center has accumulated over 200 petabytes of filtered data (Kersting & Meyer, 2018).  Machine learning and artificial intelligence can take advantage of this filtered data, leading to many breakthroughs (Kersting & Meyer, 2018).  An example of these breakthroughs includes genomic and proteomic experiments to enable personalized medicine (Kersting & Meyer, 2018).  Another example includes historical climate data, which can be used to understand global warming and to predict the weather better (Kersting & Meyer, 2018).  The massive amounts of sensor-network readings and hyperspectral images of plants are another example, used to identify drought conditions and gain insights into plant growth and development (Kersting & Meyer, 2018). 

Multiple technologies such as artificial intelligence, machine learning, and data mining techniques have been used together to extract the maximum value from Big Data (Luo, Wu, Gopukumar, & Zhao, 2016). Artificial intelligence, machine learning, and data mining have been used in healthcare (Luo et al., 2016).  Computational tools such as neural networks, genetic algorithms, support vector machines, case-based reasoning have been used in prediction (Mishra, Dehuri, & Kim, 2016; Qin, 2012) of stock markets and other financial markets (Qin, 2012). 

AI has impacted the business world through social media and the large volume of data collected from it (Mills, 2018).  For instance, personalized content delivered in real time is increasingly used to enhance sales opportunities (Mills, 2018).  Artificial intelligence makes use of effective behavioral-targeting methodologies (Mills, 2018).  Big Data improves customer service by making it proactive and allows companies to make customer-responsive products (Mills, 2018).  Big Data Analytics (BDA) assists in predicting what customers want out of a product (Mills, 2018).  BDA has also been playing a significant role in fraud prevention using artificial intelligence (Mills, 2018).  Artificial intelligence techniques such as video recognition, natural language processing, speech recognition, machine learning engines, and automation have been used to help businesses protect against sophisticated fraud schemes (Mills, 2018).

The healthcare industry has utilized machine learning to transform the large volume of medical data into actionable knowledge, performing predictive and prescriptive analytics (Palanisamy & Thirunavukarasu, 2017).  The machine learning platform utilizes artificial intelligence to develop sophisticated algorithms that process massive datasets (structured and unstructured) and perform advanced analytics (Palanisamy & Thirunavukarasu, 2017).  For a distributed environment, Apache Mahout, an open-source machine learning library, integrates with Hadoop to facilitate the execution of scalable machine learning algorithms, offering various techniques such as recommendation, classification, and clustering (Palanisamy & Thirunavukarasu, 2017).

Conclusion

Big Data has attracted the attention of various sectors, including academia, healthcare, and even the government, while artificial intelligence has been around for some time.  Big Data offers organizations various advantages, from increasing sales to reducing costs to improving health care.  Artificial intelligence also has its advantages, providing real-time analysis and reacting to changes continuously.  The use of Big Data has empowered artificial intelligence, and various industries, such as healthcare, are taking advantage of both.  This growing trend increasingly demonstrates that businesses realize the importance of artificial intelligence in the age of Big Data, and the importance of Big Data's role in the artificial intelligence domain.

References

Bean, R. (2018). How Big Data and AI Are Driving Business Innovation in 2018. Retrieved from https://sloanreview.mit.edu/article/how-big-data-and-ai-are-driving-business-innovation-in-2018/.

Kersting, K., & Meyer, U. (2018). From Big Data to Big Artificial Intelligence? Springer.

Liang, Y., & Kelemen, A. (2016). Big Data Science and its Applications in Health and Medical Research: Challenges and Opportunities. Austin Journal of Biometrics & Biostatistics, 7(3).

Luo, J., Wu, M., Gopukumar, D., & Zhao, Y. (2016). Big data application in biomedical research and health care: a literature review. Biomedical informatics insights, 8, BII. S31559.

Mills, T. (2018). Eight Ways Big Data And AI Are Changing The Business World.

Mishra, B. S. P., Dehuri, S., & Kim, E. (2016). Techniques and Environments for Big Data Analysis: Parallel, Cloud, and Grid Computing (Vol. 17): Springer.

Palanisamy, V., & Thirunavukarasu, R. (2017). Implications of Big Data Analytics in developing Healthcare Frameworks–A review. Journal of King Saud University-Computer and Information Sciences.

Patrizio, A. (2018). Big Data vs. Artificial Intelligence.

Power, B. (2015). Artificial Intelligence Is Almost Ready for Business.

Qin, X. (2012). Making use of the big data: next generation of algorithm trading. Paper presented at the International Conference on Artificial Intelligence and Computational Intelligence.

Russell, S. J., & Norvig, P. (2016). Artificial intelligence: a modern approach. Pearson Education Limited.

Machine Learning: Logistic Regression

Dr. O. Aly
Computer Science

Introduction

The purpose of this discussion is to analyze the assumptions of Logistic Regression, and the assumptions of Regular Regression that are not applicable to Logistic Regression.  The discussion and analysis also address the types of the variables in both Logistic Regression and Regular Regression. 

Regular Linear Regression

Regression analysis is used when a linear model is fit to the data to predict values of an outcome (dependent) variable from one or more predictor (independent) variables (Field, 2013).  Linear Regression is also defined in (Field, 2013) as a method used to predict the values of continuous variables, and to make inferences about how specific variables are related to a continuous variable.  These two procedures of prediction and inference rely on information from the statistical model, which is represented by an equation or series of equations with some number of parameters (Fischetti, 2015).  Linear Regression is the most important prediction method for "continuous" variables (Giudici, 2005). 

With one predictor (independent) variable, the technique is sometimes referred to as "Simple Regression" (Field, 2013; Fischetti, 2015; Fischetti, Mayor, & Forte, 2017; Giudici, 2005).  However, when there are several predictors (independent variables) in the model, it is referred to as "Multiple Regression" (Field, 2013; Fischetti, 2015; Fischetti et al., 2017; Giudici, 2005).  In regression analysis, the differences between what the model predicts and the observed data are called "residuals," which are the same as "deviations" when looking at the Mean (Field, 2013).  These deviations are the vertical distances between what the model predicted and each data point that was observed.  Sometimes the predicted value of the outcome is less than the actual value, and sometimes it is greater, meaning that the residuals are sometimes positive and sometimes negative.  To evaluate the error in a regression model, just as the fit of the mean is assessed using the variance, a sum of squared errors can be used: the residual sum of squares (SSR), also called the sum of squared residuals (Field, 2013).  The SSR is an indicator of how well a particular line fits the data: if the SSR is large, the line is not representative of the data; if the SSR is small, the line is representative of the data (Field, 2013).
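The residuals and their sum of squares described above can be sketched directly.  The following is a minimal illustration with made-up data and a hand-picked candidate line (the values and variable names are hypothetical, not from Field, 2013):

```python
# Residuals and the residual sum of squares (SSR) for a candidate line.
# Illustrative sketch with made-up data.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 4.2, 5.9, 8.1, 9.9]

# A candidate line: predicted = b0 + b1 * x
b0, b1 = 0.1, 2.0

predicted = [b0 + b1 * x for x in xs]
residuals = [y - yhat for y, yhat in zip(ys, predicted)]  # vertical distances
ssr = sum(e ** 2 for e in residuals)  # small SSR -> the line represents the data well

print(round(ssr, 4))
```

A large SSR for some other candidate line would indicate a poorer fit to the same data.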

When using Simple Linear Regression with two variables, one predictor (independent) and one outcome (dependent), the equation is as follows (Field, 2013):

Yi = (b Xi) + errori

In this Regression Model, (b) is the correlation coefficient (more often denoted as r), and it is a standardized measure (Field, 2013).  However, an unstandardized measure of (b) can be used, in which case the equation alters to become as follows (Field, 2013):

Yi = (b0 + b1 Xi) + errori

This model differs from that of a correlation only in that it uses an unstandardized measure of the relationship (b), and consequently a parameter (b0), the value of the outcome when the predictor is zero, must be included (Field, 2013).  These parameters (b0) and (b1) are known as the Regression Coefficients (Field, 2013).

When there are more than two variables that might be related to the outcome, Multiple Regression can be used.  Multiple Regression can be used with three, four, or more predictors (Field, 2013).  The equation for Multiple Regression is as follows (Field, 2013):

Yi = (b0 + b1 X1i + b2 X2i + ... + bn Xni) + errori

The (b1) is the coefficient of the first predictor (X1), (b2) is the coefficient of the second predictor (X2), and so forth, up to (bn), the coefficient of the nth predictor (Xni) (Field, 2013). 

To assess the goodness of fit for the Regular Regression, the sum of squares, the R and R2 can be used.  When using the Mean as a model, the difference between the observed values and the values predicted by the mean can be calculated using the sum of squares (denoted SST) (Field, 2013).  This value of the SST represents how good the Mean is as a model of the observed data (Field, 2013).  When using the Regression Model, the SSR can be used to represent the degree of inaccuracy when the best model is fitted to the data (Field, 2013). 

Moreover, these two sums of squares, SST and SSR, can be used to calculate how much better the regression model is than a baseline model such as the Mean model (Field, 2013).  The improvement in prediction resulting from using the Regression Model rather than the Mean model is measured by calculating the difference between SST and SSR (Field, 2013).  This improvement is the Model Sum of Squares (SSM) (Field, 2013).  If the value of SSM is large, then the regression model is very different from using the mean to predict the outcome variable, indicating that the Regression Model has made a big improvement to how well the outcome variable can be predicted (Field, 2013).  However, if SSM is small, then the Regression Model is only a little better than the Mean model (Field, 2013).  The R2, calculated by dividing SSM by SST, measures the proportion of the improvement due to the model: it represents the amount of variance in the outcome explained by the model (SSM) relative to how much variation there was to explain in the first place (SST) (Field, 2013).  Other methods to assess the goodness of fit of the model include the F-test using Mean Squares (MS), and the F-statistic to calculate the significance of R2 (Field, 2013).  To measure the individual contribution of a predictor in Regular Linear Regression, the estimated regression coefficients (b) and their standard errors are used to compute a t-statistic (Field, 2013).
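The relationships among SST, SSR, SSM, and R2 described above can be sketched numerically.  The following is a minimal illustration with made-up data; the least-squares coefficients are computed from their textbook formulas:

```python
# Goodness of fit for Simple Linear Regression: SST, SSR, SSM, and R^2.
# A minimal sketch with made-up data.

xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 4.0, 5.0, 4.0, 5.0]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# Least-squares regression coefficients
b1 = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
      / sum((x - mean_x) ** 2 for x in xs))
b0 = mean_y - b1 * mean_x

predicted = [b0 + b1 * x for x in xs]

sst = sum((y - mean_y) ** 2 for y in ys)                      # error of the Mean model
ssr = sum((y - yhat) ** 2 for y, yhat in zip(ys, predicted))  # error of the regression model
ssm = sst - ssr                                               # improvement over the mean
r_squared = ssm / sst                                         # proportion of variance explained

print(round(r_squared, 4))
```

Here the regression model explains a clear share of the variance around the mean; with SSM near zero, R2 would be near zero and the regression would be little better than the Mean model.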

Generalization of the Regression Model is a critical additional step, because if the model cannot be generalized, then any conclusions based on the model must be restricted to the sample used (Field, 2013).  For the regression model to generalize, cross-validation can be used (Field, 2013; Fischetti, 2015) and the underlying assumptions must be met (Field, 2013).

Central Assumptions of Regular Linear Regression in Order of Importance

The assumptions of the Linear Model in order of importance as indicated in (Field, 2013) are as follows:

  1. Additivity and Linearity:  The outcome variable should be linearly related to any predictors, and with several predictors, their combined effect is best described by adding their effects together. Thus, the relationship between variables is linear.  If this assumption is not met, the model is invalid.  Sometimes, variables can be transformed to make their relationships linear (Field, 2013).
  2. Independent Errors:  The residual terms should be uncorrelated (i.e., independent) for any two observations, sometimes described as a "lack of autocorrelation" (Field, 2013).  If this assumption of independence is violated, the confidence intervals and significance tests will be invalid.  However, the estimates of the model parameters obtained by the method of least squares will still be valid but not optimal (Field, 2013).  This assumption can be tested with the Durbin-Watson test, which tests for serial correlations between errors; specifically, it tests whether adjacent residuals are correlated (Field, 2013).  The size of the Durbin-Watson statistic depends upon the number of predictors in the model and the number of observations (Field, 2013).  As a very conservative rule of thumb, values less than one or greater than three are cause for concern; however, values closer to 2 may still be problematic, depending on the sample and model (Field, 2013).
  3. Homoscedasticity:  At each level of the predictor variable(s), the variance of the residual terms should be constant, meaning that the residuals at each level of the predictor(s) should have the same variance (homoscedasticity) (Field, 2013).  When the variances are very unequal there is said to be heteroscedasticity.  Violating this assumption invalidates the confidence intervals and significance tests (Field, 2013). However, estimates of the model parameters (b) using the method of least squares are still valid but not optimal (Field, 2013).  This problem can be overcome by using weighted least squares regression in which each case is weighted by a function of its variance (Field, 2013).
  4. Normally Distributed Errors:   It is assumed that the residuals in the model are random, normally distributed variables with a mean of 0.  This assumption means that the differences between the model and the observed data are most frequently zero or very close to zero, and that differences much greater than zero happen only occasionally (Field, 2013).  This assumption is sometimes confused with the idea that the predictors have to be normally distributed; they do not (Field, 2013).  In small samples, a lack of normality invalidates confidence intervals and significance tests; in large samples it does not, because of the central limit theorem (Field, 2013).  If the concern is only with estimating the model parameters, and not with significance tests and confidence intervals, then this assumption barely matters; in other words, it matters only for significance tests and confidence intervals (Field, 2013).  This assumption can also be set aside if bootstrapped confidence intervals are used (Field, 2013).
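The Durbin-Watson test mentioned under the independent-errors assumption can be computed directly from its definition, d = sum of (et - et-1)^2 divided by the sum of et^2, where values near 2 suggest uncorrelated adjacent residuals.  A minimal sketch with made-up residual series:

```python
# Durbin-Watson statistic: tests whether adjacent residuals are correlated.
# d is close to 2 when residuals are uncorrelated; values below 1 or above 3
# are cause for concern (conservative rule of thumb).

def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Made-up residual series for illustration
uncorrelated = [0.3, 0.5, -0.2, 0.1, -0.4, 0.2, 0.0, -0.3]
trending = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8]  # strongly autocorrelated

d_ok = durbin_watson(uncorrelated)   # close to 2
d_bad = durbin_watson(trending)      # well below 2: positive autocorrelation
print(round(d_ok, 3), round(d_bad, 3))
```

The second series, whose residuals rise steadily, yields a statistic far below 2, flagging the serial correlation that invalidates confidence intervals and significance tests.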

Additional Assumptions of Regular Linear Regression

There are additional assumptions when dealing with Regular Linear Regression.  These additional assumptions are as follows as indicated in (Field, 2013).

  • Predictors are uncorrelated with "External Variables" or "Third Variables":  External variables are variables that have not been included in the regression model but influence the outcome variable.  These variables can be described as "third variables."  This assumption indicates that there should be no external variables that correlate with any of the variables included in the regression model (Field, 2013).  If external variables do correlate with the predictors, the conclusions drawn from the model become "unreliable" because other variables exist that can predict the outcome just as well (Field, 2013).
  • Variable Types: All predictor (independent) variables must be "quantitative" or "categorical," and the outcome (dependent) variable must be "quantitative," "continuous," and "unbounded" (Field, 2013).  "Quantitative" indicates that they should be measured at the interval level, and "unbounded" indicates that there should be no constraints on the variability of the outcome (Field, 2013).  
  • No Perfect Multicollinearity: If the model has more than one predictor then there should be no perfect linear relationship between two or more of the predictors.  Thus, the predictors (independent) variables should not correlate too highly (Field, 2013).
  • Non-Zero Variance: The predictors should have some variations in value; meaning they do not have variances of zero (Field, 2013).  

Logistic Regression

When the outcome variable is categorical and the predictors (independent variables) are continuous or categorical, Logistic Regression is used (Field, 2013).  Logistic Regression is multiple regression but with an outcome (dependent) variable that is categorical and predictor variables that are continuous or categorical.  Logistic Regression is the main prediction method for qualitative variables (Giudici, 2005).

Logistic Regression can have life-saving applications: in medical research, it is used to generate models from which predictions can be made about the "likelihood" that, for example, a tumor is cancerous or benign (Field, 2013).  A database is used to determine which variables are influential in predicting the "likelihood" of malignancy of a tumor (Field, 2013).  These variables can be measured for a new patient and their values placed in a Logistic Regression model, from which a "probability" of malignancy can be estimated (Field, 2013).  Logistic Regression calculates the "probability" of the outcome occurring rather than predicting the outcome itself for a given set of predictors (Ahlemeyer-Stubbe & Coleman, 2014).  The expected values of the target variable from a Logistic Regression are between 0 and 1 and can be interpreted as a "likelihood" (Ahlemeyer-Stubbe & Coleman, 2014).

There are two types of Logistic Regression: Binary Logistic Regression and Multinomial (or Polychotomous) Logistic Regression.  Binary Logistic Regression is used to predict membership of only two categorical outcomes (dependent variables), while Multinomial or Polychotomous Logistic Regression is used to predict membership of more than two categorical outcomes (Field, 2013).

Concerning the assessment of the model, the R-statistic can be used to calculate a more literal version of the multiple correlation in the Logistic Regression model.  The R-statistic is the partial correlation between the outcome variable and each of the predictor variables, and it can vary between -1 and +1.  A positive value indicates that as the predictor variable increases, so does the likelihood of the event occurring, while a negative value indicates that as the predictor variable increases, the likelihood of the outcome occurring decreases (Field, 2013).  If a variable has a small value of R, it contributes only a small amount to the model.  Other measures for such assessment include Hosmer and Lemeshow's, Cox and Snell's, and Nagelkerke's (Field, 2013).  Although these measures differ in their computation, they are conceptually similar, and they can be seen as analogous to R2 in linear regression in terms of interpretation, as they provide a gauge of the substantive significance of the model (Field, 2013).

In Logistic Regression, there is an analogous statistic, the z-statistic, which follows the normal distribution and measures the individual contribution of predictors (Field, 2013).  Like the t-tests in Regular Linear Regression, the z-statistic indicates whether the (b) coefficient for a predictor is significantly different from zero (Field, 2013).  If the coefficient is significantly different from zero, then it can be assumed that the predictor is making a significant contribution to the prediction of the outcome (Y) (Field, 2013).  The z-statistic is known as the Wald statistic, as it was developed by Abraham Wald (Field, 2013).  
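The Wald test described above can be sketched numerically: z is the coefficient divided by its standard error, compared against the standard normal distribution.  The coefficient and standard error below are hypothetical values chosen purely for illustration:

```python
# Wald (z) statistic for a logistic regression coefficient: z = b / SE(b).
# A large |z| suggests the predictor contributes significantly to the model.
import math

def wald_z(b, se):
    return b / se

def two_sided_p(z):
    # Two-sided p-value from the standard normal CDF
    phi = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    return 2 * (1 - phi)

# Hypothetical coefficient and standard error for illustration
b, se = 1.2, 0.4
z = wald_z(b, se)
p = two_sided_p(z)

print(round(z, 3), round(p, 4))
```

Note the caveat discussed next in the literature: when standard errors are deflated (for example, by overdispersion), z is inflated and predictors can be falsely deemed significant.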

Principles of Logistic Regression

One of the assumptions mentioned above for regular linear models is that the relationship between the variables must be linear for the linear regression to be valid.  However, when the outcome variable is categorical, this assumption is violated, as explained in the "Variable Types" assumption above, because the outcome (dependent) variable must be "quantitative," "continuous," and "unbounded" (Field, 2013).  To get around this problem, the data must be transformed using the logarithmic transformation, whose purpose is to express a non-linear relationship as a linear one (Field, 2013).  Logistic Regression is based on this principle: it expresses the multiple linear regression equation in logarithmic terms, called the "logit," and thus overcomes the problem of violating the assumption of linearity (Field, 2013).  The transformation logit(p) is used in Logistic Regression, with the letter (p) representing the probability of success (Ahlemeyer-Stubbe & Coleman, 2014).  The logit(p) is a non-linear transformation, and Logistic Regression is a type of non-linear regression (Ahlemeyer-Stubbe & Coleman, 2014).
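The logit transformation described here, logit(p) = ln(p / (1 - p)), maps a probability in (0, 1) onto the whole real line, which is what allows a linear equation to be fitted on the transformed scale; its inverse, the logistic (sigmoid) function, maps back to a probability.  A minimal sketch (the coefficients b0 and b1 below are hypothetical):

```python
import math

def logit(p):
    # Log-odds: maps a probability in (0, 1) to the real line
    return math.log(p / (1 - p))

def inverse_logit(x):
    # Logistic (sigmoid) function: maps the real line back to (0, 1)
    return 1 / (1 + math.exp(-x))

# The two functions are inverses of each other
p = 0.8
x = logit(p)
print(round(inverse_logit(x), 6))

# In Logistic Regression, logit(p) = b0 + b1*X, so p always stays within (0, 1)
b0, b1 = -2.0, 0.5  # hypothetical coefficients
for X in (0, 4, 8):
    print(round(inverse_logit(b0 + b1 * X), 3))
```

However large the linear predictor b0 + b1*X grows, the inverse logit keeps the predicted value between 0 and 1, which is why the output can be interpreted as a likelihood.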

Assumptions of the Logistic Regression

In the Logistic Regression, the assumptions of the ordinary regression are still applicable.  However, the following two assumptions are dealt with differently in the Logistic Regression (Field, 2013):

  • Linearity:  While in the ordinary regression, the assumption is that the outcome has a linear relationship with the predictors, in the logistic regression, the outcome is categorical, and so this assumption is violated, and the log (or logit) of the data is used to overcome this violation (Field, 2013). Thus, the assumption of linearity in Logistic Regression is that there is a linear relationship between any continuous predictors and the logit of the outcome variable (Field, 2013). This assumption can be tested by checking if the interaction term between the predictor and its log transformation is significant (Field, 2013).  In short, the linearity assumption is that each predictor has a linear relationship with the log of the outcome variable when using the Logistic Regression.
  • Independence of Errors:  In Logistic Regression, violating this assumption produces overdispersion, which occurs when the observed variance is bigger than expected from the Logistic Regression model.  Overdispersion can occur for two reasons (Field, 2013).  The first is correlated observations, when the assumption of independence is broken (Field, 2013).  The second is variability in success probabilities (Field, 2013).  Overdispersion tends to make the standard errors too small, which creates two problems.  The first problem concerns the test statistics of the regression parameters, which are computed by dividing by the standard error: if the standard error is too small, the test statistic will be too big and falsely deemed significant.  The second problem concerns the confidence intervals, which are computed from the standard errors: if the standard error is too small, the confidence interval will be too narrow, resulting in overconfidence about the likely relationship between the predictors and the outcome in the population.  In short, overdispersion occurs when the variance is larger than the variance expected from the model; it can be caused by violating the assumption of independence, and the resulting too-small standard errors (Field, 2013) can bias the conclusions about the significance of the model parameters (b-values) and their population values (Field, 2013).
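One common way to screen for the overdispersion discussed above is the dispersion parameter, estimated as the Pearson chi-square divided by its residual degrees of freedom; values substantially greater than 1 signal that the variance is larger than the model expects.  The outcomes and fitted probabilities below are made up for illustration:

```python
# Dispersion diagnostic for a logistic model:
# phi = (Pearson chi-square) / (n - k), where k = number of fitted parameters.
# phi well above 1 suggests overdispersion (variance larger than expected).

def dispersion(observed, fitted_p, n_params):
    # Pearson residual for a binary outcome: (y - p) / sqrt(p * (1 - p))
    chi_sq = sum((y - p) ** 2 / (p * (1 - p))
                 for y, p in zip(observed, fitted_p))
    return chi_sq / (len(observed) - n_params)

# Hypothetical binary outcomes and fitted probabilities from some model
y = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
p_hat = [0.8, 0.2, 0.7, 0.9, 0.3, 0.1, 0.6, 0.4, 0.8, 0.2]

phi = dispersion(y, p_hat, n_params=2)
print(round(phi, 3))  # here phi < 1, so no overdispersion is indicated
```

Had phi come out well above 1, the standard errors from the fit would be suspect and the significance of the b-values should be interpreted with caution.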

Business Analytics Methods Based on Data Types

The following table, adapted from Hodeghatta and Nayak (2016), summarizes the business analytics methods based on data types.  As shown in the table, when the response (dependent) variable is continuous and the predictor variables are either continuous or categorical, the Linear Regression method is used.  When the response (dependent) variable is categorical and the predictor variables are either continuous or categorical, Logistic Regression is used.  Other methods are also listed as additional information.

Table-1. Business Analytics Methods Based on Data Types. Adapted from (Hodeghatta & Nayak, 2016).
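The central mapping in the table can be expressed as a simple rule.  The following is a toy sketch of only the two cases described in the text (the function name and the method strings are illustrative, not from Hodeghatta & Nayak, 2016):

```python
# Choosing between Linear and Logistic Regression from the type of the
# response (dependent) variable; predictors may be continuous or categorical
# in both cases.

def choose_regression(response_type):
    if response_type == "continuous":
        return "Linear Regression"
    if response_type == "categorical":
        return "Logistic Regression"
    raise ValueError("response_type must be 'continuous' or 'categorical'")

print(choose_regression("continuous"))
print(choose_regression("categorical"))
```

The table lists further methods for other combinations of variable types, which this sketch deliberately omits.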

References

Ahlemeyer-Stubbe, A., & Coleman, S. (2014). A practical guide to data mining for business and industry: John Wiley & Sons.

Field, A. (2013). Discovering Statistics using IBM SPSS Statistics: Sage publications.

Fischetti, T. (2015). Data Analysis with R: Packt Publishing Ltd.

Fischetti, T., Mayor, E., & Forte, R. M. (2017). R: Predictive Analysis: Packt Publishing.

Giudici, P. (2005). Applied data mining: statistical methods for business and industry: John Wiley & Sons.

Hodeghatta, U. R., & Nayak, U. (2016). Business Analytics Using R-A Practical Approach: Springer.