The Ultimate Data & AI Guide

150 FAQs About Artificial Intelligence, Machine Learning and Data

Synopsis of the book

Trust us, you are not alone. We have all been lost in the buzzword jungle resulting from the hype around data and artificial intelligence. Big data, machine learning, Hadoop, GDPR, deep learning, data governance – the list of things to know is endless. It is difficult to stay on top of all the exciting developments and concepts in the field.

This book is here to help. It is the result of our work as data consultants, implementing and consulting on 500+ data projects at 100+ European companies in all sectors. This book will equip you with a solid understanding of the most important topics around data, AI and machine learning. Everything you need to know is simply explained in 150 FAQs accompanied by plenty of clarifying visualizations.

Whether you are a professional working with data or you just want to gain an overview of these topics that are increasingly shaping our societies and economies, this is the ultimate guidebook for your journey through the data and AI buzzword jungle.

Who is this book for?

The Ultimate Data & AI Guide is for you if you are:

  • Interested in AI, machine learning and big data and want to understand how they are increasingly shaping our societies and economies – no previous knowledge required!
  • Working with data in your job and want to know how to leverage its power with machine learning and AI
  • A data professional looking for inspiration for how you can drive digital transformation in your field
  • Confused (and slightly annoyed) by all the buzzword bingo and want to finally clear up the confusion

What's in it for you?

With the Ultimate Data & AI Guide you will:

  • Gain an overview and knowledge of the most important concepts around artificial intelligence, big data and machine learning.
  • Understand complex theoretical topics through simple explanations, examples and illustrations.
  • Be inspired by case studies of companies that are leveraging data and AI.
  • Get insights into real-world experiences from 500+ implemented data projects.

The book in numbers

Summary statistics
Total number of words: 131,502
Number of FAQs: 150
Total number of illustrations: 119
Total number of tables: 126
Total number of case studies: 63
Average number of words per FAQ: 877
Average reading time per FAQ: 3min 55sec
Total reading time: 9h 45min

Browse the table of contents

1_1 Digital transformation 

1 | What is digital transformation?  36

2 | What is the impact of digital transformation on companies and society?  38

3 | What are the drivers of digital transformation?  40

1_2 The role of data and AI in digital transformation 

4 | AI – why is it the engine of digital transformation?  41

5 | Data – why is it the fuel of digital transformation?  42

6 | How are data and AI applied to generate value across industries?  43

1_3 Buzzwords in digital transformation, data and AI

7 | Overview of buzzwords in data and AI  46

8 | What is the IoT and what does it have to do with big data?  47

9 | What are data lakes, data warehouses, data architectures, Hadoop and NoSQL databases?  48

10 | What are data governance and data democratization?  48

11 | What is the cloud?  49

12 | What are data science, data analytics, business intelligence, data mining and predictive analytics?  49

13 | What are machine learning, neural networks and deep learning?  51

14 | What are AI, natural language processing, computer vision and robotics?  52

2_1 Understanding data 

15 | What is data?  54

16 | Why collect data and what are the different types of data analytics?  60

17 | How is data created?  62

18 | What are the factors that have enabled an era of mass data creation and storage?  64

19 | What is data quality and what kind of data quality issues are there?  68

20 | How much data quality do you need?  70

2_2 Types of data 

21 | What are unstructured, semi-structured and structured data?  72

22 | What are master data and transactional data?  76

23 | What is streaming data and what is the difference between batch and streaming processing?  77

24 | What is big data?  78

3_1 Understanding data storage 

25 | Why can’t a company store its structured data in an Excel file like we do on PCs?  82

26 | What is a database and how does it work?  83

27 | What are the advantages of storing data in a database?  85

28 | What types of databases are there and how are they classified?  86

3_2 Relational (SQL) databases 

29 | What is a relational database system and how does it work?  87

30 | How does the relational model work?  90

31 | What is a key attribute and why is it indispensable?  93

32 | How is data accessed and manipulated in a relational database system (SQL)?  96

33 | What are the strengths of relational database systems?  100

34 | What are the limitations of relational database systems and how were they revealed with the dawn of big data?  101

3_3 Distributed file systems and non-relational (NoSQL) databases 

35 | What are computer clusters and how did the idea of “scaling out” form the basis for storing and processing big data?  104

36 | What are distributed file systems and how do we store data with them?  106

37 | What are non-relational (NoSQL) databases and what does the CAP theorem have to do with them?  108

38 | How do relational and non-relational databases compare and when is it best to use each one?  114

3_4 Popular data storage technologies 

39 | What are the types of data storage technologies?  116

40 | What are Hadoop and the Hadoop Ecosystem (e.g. Hive, HBase, Flume, Kafka)?  117

41 | What is Spark?  119

42 | What are MySQL, PostgreSQL, Oracle, Microsoft SQL Server, SAP HANA, IBM Db2 and Teradata Database?  120

43 | What are MongoDB, Neo4j, Amazon DynamoDB, CouchDB and Redis?  121

4_1 Understanding data architectures 

44 | What is a data architecture and why do companies need it?  122

45 | What are the most popular architectural blueprints?  125

4_2 Data warehouse architectures 

46 | What is a data warehouse (DWH) architecture?  125

47 | How does a DWH work?  128

48 | What does a typical data pipeline in a DWH look like?  131

49 | What are the limitations of a DWH?  132

50 | What are popular ETL tools?  134

4_3 Data lakes and streaming architectures 

51 | What is a data lake architecture?  135

52 | How does a data lake work and where should it be used?  137

53 | How do a DWH and data lake compare?  139

4_4 Cloud architectures

54 | What is the cloud?  142

55 | What types of cloud architectures are there?  144

56 | What types of cloud services are there?  146

57 | What are the advantages and disadvantages of using cloud services?  149

58 | What is a serverless architecture?  153

59 | What are the popular cloud providers and services?  154

5_1 People and job roles 

60 | What does a chief data and analytics officer do?  157

61 | What does a data architect do?  158

62 | What does a database administrator do?  159

63 | What other job roles are involved in creating and maintaining a data architecture?  159

5_2 Data governance and democratization 

64 | What are data governance and democratization and why does data need to be governed and democratized?  160

65 | What are the key elements of data governance and data democratization?  161

66 | How can we make data more findable and accessible?  162

67 | How can we make data more understandable and share knowledge on data?  163

68 | How can we make data more trustworthy and improve the quality of data?  164

69 | How can we empower the data user with self-service BI and analytics?  164

70 | How can data governance and data democratization be implemented?  165

5_3 Data security and protection (privacy)

71 | What is an overview of data security, data protection and data privacy and how do they relate to each other?  166

72 | What is data security and how can it be achieved?  167

73 | What is personal data?  170

74 | What is data protection (privacy) and why is the distinction between non-personal and personal data so important?  172

75 | General Data Protection Regulation (GDPR) – who, what, where and why?  174

6_1 Understanding AI and ML 

76 | What is AI?  181

77 | Where can AI be applied and how have approaches to create AI developed over time?  184

78 | What is currently possible with AI and what are some top breakthroughs?  186

79 | Why is AI almost tantamount to ML (AI = ML + X) today?  189

80 | What is ML and how can it create AI?  191

81 | How is a machine able to learn and why is ML often considered “Software 2.0”?  193

82 | What is a machine able to learn – can it predict the future?  196

6_2 Types of ML

83 | What types of ML are there and how do they differ?  201

84 | What is supervised ML?  204

85 | What is the difference between regression and classification?  206

86 | What is unsupervised ML?  207

87 | What are the most commonly used methods in unsupervised learning?  208

88 | What is reinforcement learning?  212

6_3 Popular ML tools 

89 | What types of ML tools are there?  215

90 | What is Python?  217

91 | What are R and RStudio?  218

92 | What is scikit-learn?  218

93 | What are TensorFlow and Keras?  219

94 | What are MLlib, PySpark and SparkR?  219

95 | What are some popular cloud-based ML tools?  220

7_1 Creating a machine learning model with supervised ML methods 

96 | What ingredients do you need and what is the recipe for creating an ML model?  221

97 | What is an ML model?  223

98 | What is a correlation and why is it necessary for ML models?  226

99 | What is feature engineering and why is it considered “applied ML”?  230

100 | What is feature selection and why is it necessary?  233

101 | Why do we need to split a dataset into training, validation and test sets?  237

102 | What does it mean to “train an ML model” and how do you do it?  240

7_2 Validating, testing and using a model

103 | What does it mean to “validate a model” and why is it necessary?  244

104 | What is the difference between validating and testing a model and why is the latter necessary?  247

105 | What are overfitting and generalization?  250

106 | Preventing overfitting: how does cross-validation work?  253

107 | Preventing overfitting: how does ensemble learning work?  254

108 | How else can overfitting be prevented?  256

109 | How much data is needed to train an ML model?  257

8_1 Some classic ML models

110 | What model classes are there in ML?  259

111 | How do generalized linear models work?  260

112 | How do decision trees work?  261

113 | How do ensemble methods such as the random forest algorithm work?  262

114 | How do we choose the right ML model?  264

8_2 Neural networks and deep learning 

115 | What are neural networks and deep learning and why do they matter?  265

116 | How do neural networks work?  269

117 | What is so special about deep neural networks compared to classic ML model classes?  271

118 | Why are neural networks so good at natural language processing and computer vision?  274

119 | Are neural networks a universal cure for all ML problems or do they also have some drawbacks?  277

120 | What is transfer learning?  280

121 | Deep neural networks – why now and what will their future look like?  281

9_1 Phases of an ML project

122 | How does the ML process work (an overview)?  284

123 | Phase 1: How can ML use cases be identified?  286

124 | Phase 2: What are data exploration and data preparation and why are they necessary?  288

125 | Phase 3: What is model creation?  295

126 | Phase 4: What is (continuous) model deployment?  295

9_2 Lessons learned from machine-learning projects

127 | How long does a machine-learning project take from the conception of the idea until the model is deployed?  298

128 | How many projects make it from the idea to the end and where do they fail?  299

129 | What are the most common reasons why projects fail?  300

130 | Why is model deployment the bottleneck for most companies implementing ML projects?  301

9_3 People and job roles in ML

131 | Which roles are required to implement an ML project?  303

132 | What does a data scientist do?  305

133 | What does a data engineer do?  305

134 | What does an ML engineer do?  306

135 | What does a statistician do?  307

136 | What does a software engineer do?  307

137 | What does a business analyst do?  308

138 | What do other roles do?  309

9_4 Agile organization and ways of working 

139 | What is agile project management and why is it appropriate for ML projects?  309

140 | What are DevOps and DataOps?  313

141 | What are the popular organizational structures and best practices?  315

9_5 Data ethics in ML 

142 | What is data ethics?  318

143 | What are the ethical considerations in data collection?  319

144 | What are the ethical considerations when creating ML models?  321

145 | What best practices and principles can ensure the ethical use of data?  323

146 | How are AI and its drivers going to develop?  327

147 | What are the implications of ML and AI for companies?  330

148 | We benefit a lot from AI, but will it cost me my job?  332

149 | Which nation will win the AI race?  335

150 | When are we going to see the creation of general AI?  339

Why we wrote this book

Let’s be honest. It is not like there isn’t any information about artificial intelligence, machine learning and data out there. The opposite is the case. As of December 2019, a Google search for these terms yielded around 455 million, 1,110 million and 6,100 million results respectively. On top of that there are plenty of books, vlogs and videos out there – more than enough, right?

There is a catch. We’ve seen it in the 500+ data projects that we have implemented and consulted for at 100+ European companies in all sectors over the past eight years. The information is either too narrow and deep (à la “Creating AI with deep reinforcement learning in Keras”)[1] or too shallow and sensational (à la “AI-fuelled technology is about to take your job”)[2]. Consequently, fear, confusion and misconceptions about these topics are widespread. Very few people are what we call data natives, i.e. people who have a solid understanding of AI, machine learning and data without being in-depth experts.

To empower our clients and fill that knowledge gap, we started implementing hands-on corporate training in 2017. Since then, we have tested, refined and honed the content of this course and shared our knowledge with hundreds of data and AI enthusiasts. This book is the result of all this training. Here, we pull together the essential information you need to know about artificial intelligence (AI), machine learning (ML) and data in one comprehensive, easy-to-understand book. This is the ultimate guide to data and AI.

This book is for those who want to gain an understanding that goes beyond scratching the surface and who want to know what they are talking about when playing the buzzword bingo of AI, ML and data. With this guide, you can skip the minutiae of how algorithms and databases work, the level of detail you would need to program them yourself, and simply learn the essentials. All you need to know is wrapped up in 150 easily navigable FAQs.

[1] Don’t worry – once you have worked through our book, you will understand what we are talking about.

[2] Whether or not this is true is covered in FAQ 148.