Yes, you can understand AI
AI. Deep Learning. Neural Networks. All buzzwords that have spread like wildfire in the last few years. From self-driving cars to personalized assistants, AI seems to be carving out its place everywhere. However, this dramatic rise is far from sudden. The technology has been a long time in the making, with decades of hard work and research leading to iterative improvements. For many outside the sometimes obscure and confusing research field, it can be difficult to grasp how we arrived at this point. But I am confident that, beneath a layer of jargon and formulas, the concepts underlying most of AI are within the reach of pretty much anyone with sufficient interest and time. For those ready for that adventure, I have compiled five landmark articles to give you a glimpse of how exactly those infamous models are built. These papers all stand out for their impact, clarity and usefulness to beginners. For each, I will also point out which figures, sections and paragraphs best offer a foundational understanding of the journey from theoretical wanderings to concrete real-world applications. Do note that I will focus on imaging models, as these tasks lead to visual results which should ease you in. Without further ado, let’s go back to 1986.
Learning representations by back-propagating errors - David E. Rumelhart, Geoffrey E. Hinton & Ronald J. Williams (1986)
Before the 1980s, artificial intelligence lay mostly dormant for years. Pivotal progress in information technology in the first half of the 20th century had set the field in motion (with the research of Shannon and Turing, to name only two). However, interest in artificial intelligence waned mid-century, as it was considered too computationally expensive to be practical. But as computing resources increased steadily, excitement renewed. This article marked a clear leap forward and is a good choice to start our overview.
This small and innocent-looking paper introduced the concept of backpropagation. This technique allows networks of neurons (the basic units of neural networks) to learn rich patterns in data by adjusting their internal parameters (called weights) through error correction. Although mathematically simple, the technique became the cornerstone of AI models to come by keeping computing costs manageable. Image recognition, natural language processing and AI assistant models (almost) all use this trick. Backpropagation allowed AI models to scale up precisely at the time that computers were starting to be able to support them. Back then, AI was a niche field, and the original authors could not have predicted what their work would lead to. A must-read.
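To make the idea concrete, here is a minimal sketch of backpropagation in plain NumPy (my own toy example, not the paper's notation): a tiny two-layer network learns the XOR function by propagating its output error backwards and nudging its weights.

```python
# A toy two-layer network trained with backpropagation on XOR (NumPy only).
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

# The weights are the "internal parameters" adjusted through error correction.
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

lr = 0.5
for step in range(5000):
    # Forward pass: compute the network's prediction.
    h = sigmoid(X @ W1 + b1)
    out = sigmoid(h @ W2 + b2)

    # Backward pass: push the error from the output back through each layer
    # using the chain rule (this is the "back-propagating errors" of the title).
    d_out = (out - y) * out * (1 - out)   # error signal at the output layer
    d_h = (d_out @ W2.T) * h * (1 - h)    # error signal at the hidden layer

    # Gradient-descent update of every weight.
    W2 -= lr * h.T @ d_out; b2 -= lr * d_out.sum(axis=0)
    W1 -= lr * X.T @ d_h;   b1 -= lr * d_h.sum(axis=0)

print(out.round(2))  # should approach [[0], [1], [1], [0]]
```

Every modern deep learning framework automates exactly this loop, just at a vastly larger scale.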
Oh, and Geoffrey Hinton: remember his name.
If you only check pictures: Figure 5 shows a simple neural network with back-propagation, pretty much the basis for all models to come.
If you only read one section: Honestly, the first paragraph (after the abstract) sums up the technique quite well.
Most insightful and jargon-free sections: The article is short and relatively straightforward. Worth a complete read-through.
ImageNet Classification with Deep Convolutional Neural Networks - Alex Krizhevsky, Ilya Sutskever & Geoffrey E. Hinton (2012)
Our next big jump came in 2012. By that point, ImageNet, a well-known database of labelled images in thousands of categories, had been available for some time and was used by researchers to train their models. The ImageNet Large Scale Visual Recognition Challenge (ILSVRC) was an annual competition comparing the accuracy of researchers' models on ImageNet. Models in the style of the previous article were commonplace, and top scores in the competition had been stagnating for a few years. Then Alex Krizhevsky made AlexNet. In a few words: AlexNet blew its competition out of the water. Its success was mainly twofold. First, the model used more “layers” (groups of nodes placed in succession) than its competition. Second, the authors trained this heavier model on multiple GPUs to accelerate the process. AlexNet was not the first so-called “deep” neural network, but its success showcased that deep neural networks could scale and deliver impressive performance in a realistic time. Frankly, this ground-breaking article has much more to offer, including the rectified linear unit (ReLU) activation and techniques such as dropout. The power of deep CNNs sparked a wave of research, leading to rapid advancements in AI across various fields in the 2010s. Oh, and look who is back: Geoffrey Hinton.
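If you want to see what “more layers” means in practice, here is a toy sketch (assuming PyTorch, and nowhere near the real AlexNet in size) combining the paper's three headline ingredients: stacked convolutions, ReLU activations and dropout.

```python
# A miniature AlexNet-flavoured CNN: stacked conv layers + ReLU + dropout.
# Sizes and class count are made up for illustration; this is NOT AlexNet itself.
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Conv2d(3, 16, kernel_size=3, padding=1),   # layer 1: learn local image filters
    nn.ReLU(),                                    # the non-saturating activation the paper popularized
    nn.MaxPool2d(2),
    nn.Conv2d(16, 32, kernel_size=3, padding=1),  # layer 2: depth = richer features
    nn.ReLU(),
    nn.MaxPool2d(2),
    nn.Flatten(),
    nn.Dropout(0.5),                              # randomly silence units to curb overfitting
    nn.Linear(32 * 8 * 8, 10),                    # 10 made-up classes, not ImageNet's 1000
)

fake_batch = torch.randn(4, 3, 32, 32)  # four random 32x32 RGB "images"
print(model(fake_batch).shape)          # torch.Size([4, 10])
```

AlexNet stacked eight learned layers this way; the recipe of convolutions, ReLU and dropout is still recognizable in CNNs today.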
If you only check pictures: Figure 2 to appreciate the leap in depth compared to Rumelhart (1986). Table 1 shows the error rate compared to the other best results in the competition. AlexNet won hands down on that metric. Figure 4 (left side) shows what would be expected from such an image recognition model during ILSVRC.
If you only read one section: You could read the first paragraph of the discussion.
Most insightful and jargon-free sections: The second and fourth paragraphs of the introduction complete the picture quite well. Section 3.2 shows the usage of multiple GPUs for the training, an approach that would become standard.
Attention Is All You Need - Google Brain & Google Research (2017)
For our next stop, in 2017, we briefly leave imaging for language comprehension tasks or, as the field is known, natural language processing (NLP). Ashish Vaswani and his colleagues introduced a new architecture to the world: the Transformer. Unlike previous models, which relied on recurrent neural networks to process text one word at a time, the Transformer completely discards recurrence and uses a mechanism called self-attention to process chunks of input text in parallel instead. This led to dramatically improved efficiency and performance. Before self-attention, NLP models were limited by the fact that the importance (or weight) of the relation between words in a sequence was dominated by the distance between them. Because of this, models had difficulty interpreting sentences such as “The cat the dog chased ran away.” Models integrating self-attention were freed from this restriction and could process complex sentences much more effectively. If you have heard of models such as BERT or GPT, this architecture is what they use. The Transformer would not stay confined to NLP tasks: clever researchers quickly found ways to apply it to other tasks, including imaging. This article is a bit more technical, but it is so pivotal that you need to give it a chance.
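Here is a minimal sketch of the idea at the heart of the paper, scaled dot-product self-attention, in NumPy. For simplicity I skip the learned query/key/value projections a real Transformer applies, so treat this as a bare-bones illustration rather than the full mechanism.

```python
# Scaled dot-product self-attention, stripped to its core (no learned projections).
import numpy as np

def self_attention(x):
    """x: (sequence_length, d) array of token embeddings."""
    d = x.shape[-1]
    q, k, v = x, x, x                        # queries, keys, values (untrained here)
    scores = q @ k.T / np.sqrt(d)            # every token scores every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the sequence
    return weights @ v                       # each output is a weighted mix of all tokens

tokens = np.random.default_rng(0).normal(size=(7, 16))  # e.g. the 7 words of the cat/dog sentence
print(self_attention(tokens).shape)  # (7, 16)
```

Note that the attention weights connect every token to every other token in a single step, regardless of how far apart they sit in the sentence, which is exactly the restriction that recurrent models suffered from.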
If you only check pictures: Figure 1 for the broad perspective. Figure 2 (left side) to better understand the self-attention mechanism. Table 1 to see at a glance why it is an efficient structure.
If you only read one section: Make it section 4 (paragraphs 2 and 3).
Most insightful and jargon-free sections: Section 3.1 (on the encoder-decoder structure) and section 3.2 on the attention mechanism (NOT 3.2.1, 3.2.2, etc. - just the paragraph before 3.2.1 suffices) are straightforward enough and useful. Paragraphs 2 and 3 of section 4 summarize the advantages of self-attention quite well. Why not the conclusion too?
Denoising Diffusion Probabilistic Models - Jonathan Ho, Ajay Jain & Pieter Abbeel (2020)
We then jump a few years ahead for another groundbreaking technique, this time in image generation. Different approaches were explored in the 2010s for image generation (for example, GANs and VAEs), but this paper introduced denoising diffusion probabilistic models (DDPMs). These models exploit the fact that we know mathematically how to gradually add noise to an image, which means a model can be trained to reverse the process step by step. This led Jonathan Ho and Ajay Jain (under the supervision of Pieter Abbeel) to ponder what could be achieved if you started with pure noise and asked a model to gradually denoise it. They found that they could trick the model into producing coherent and detailed images. In other words, they asked their denoising model to find a dog in complete noise, and it produced one. As I said, DDPMs are one of many techniques used to generate images. They deserve your attention because their general principle is easier to understand than that of most generative techniques, and they are currently more widely used than their competition. To this day, DDPMs stand out for the diversity of high-quality images they can generate. This paper is also an excellent way to introduce the growing concerns surrounding AI. In my mind, the section called “Broader Impact” is a must-read.
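To give you a feel for the mechanics, here is a NumPy sketch of the forward (noising) half of the process, using the paper's closed-form shortcut and its linear noise schedule. Generation is the learned reverse of this: a trained network predicts the noise to strip away at each step, which I do not reproduce here.

```python
# The DDPM forward process in closed form:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * noise
import numpy as np

rng = np.random.default_rng(0)
T = 1000
betas = np.linspace(1e-4, 0.02, T)    # the paper's linear noise schedule
alpha_bar = np.cumprod(1.0 - betas)   # fraction of the original image surviving at step t

x0 = rng.uniform(-1, 1, size=(8, 8))  # a stand-in "image"

def noisy_version(x0, t):
    eps = rng.normal(size=x0.shape)   # fresh Gaussian noise
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * eps

for t in [0, 499, 999]:
    xt = noisy_version(x0, t)
    print(f"t={t:4d}: {alpha_bar[t]:.3f} of the original image remains")
# By t=999 the sample is essentially pure noise -- the starting point for generation.
```

Run backwards with a network that has learned to predict the added noise, this same chain turns pure static into a coherent image, one small denoising step at a time.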
If you only check pictures: Figures 6 and 7 to better understand the denoising process.
If you only read one section: The section called “Broader Impact” is perfect here.
Most insightful and jargon-free sections: This one carries a bit more jargon than the others. Paragraph 3 of the introduction can still be useful, and feel free to read the paper in its entirety. Do not be discouraged by references to Markov chains or other model types (such as GANs).
Explaining and Harnessing Adversarial Examples - Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy (2015)
For our last paper, I want to leave model architecture innovations behind to focus on the evolving relationship between researchers and their models. As the capabilities of AI became apparent, concerns grew, and the technology itself came under scrutiny. This article was a trendsetter in that regard. At face value, the paper introduces yet another training technique, but its ramifications quickly become apparent and are in reality its main focus. Ian Goodfellow and his team realized that tailored inputs could easily mislead models into making incorrect predictions, even when the change is indistinguishable to our eye. This reminds us that AI, however powerful it is, can be vulnerable. The paper pioneered a wave of research reconsidering the robustness and trustworthiness of artificial intelligence, and defence measures would quickly follow. This article is also our first dip into the booming field of artificial intelligence ethics and privacy concerns (another conundrum in itself). Artificial intelligence is a tool, and we need to be aware of its limitations, just like with any other tool. Would-be AI enthusiasts should know about such concerns. As such, I wholeheartedly recommend this paper.
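The attack behind those misleading inputs is the paper's fast gradient sign method (FGSM): nudge every pixel a tiny step in the direction that increases the model's loss. Here is a sketch of the idea (assuming PyTorch; the stand-in model and input are mine, purely for illustration).

```python
# Fast gradient sign method: a one-step adversarial perturbation via autograd.
import torch
import torch.nn.functional as F

def fgsm(model, image, label, epsilon=0.007):
    image = image.clone().requires_grad_(True)
    loss = F.cross_entropy(model(image), label)
    loss.backward()
    # Each pixel moves by at most epsilon, yet the prediction can flip entirely.
    return (image + epsilon * image.grad.sign()).detach()

# Illustrative usage with an untrained stand-in classifier:
model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))
x = torch.rand(1, 3, 32, 32)   # a random "image"
y = torch.tensor([3])          # its (made-up) true class
x_adv = fgsm(model, x, y)
print((x_adv - x).abs().max())  # every pixel changed by at most epsilon
```

The paper's famous panda-to-gibbon example in Figure 1 uses exactly such an imperceptible perturbation, which is why that figure alone makes the point so well.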
Hey, look who’s back in the acknowledgements.
If you only check pictures: Figure 1 is impactful and to the point.
If you only read one section: The introduction in its entirety is excellent.
Most insightful and jargon-free sections: Section 3 is also great. The first paragraph of section 8 might spark your interest enough to read the article entirely. A great and somewhat accessible read.
What about now?
I appreciate that you joined me on this journey through artificial intelligence research and development. I hope it was not a daunting task; might I risk, even an enjoyable one. Like AI training, progress is an iterative process, with each breakthrough building on top of the last.
However, machine learning evolves at a fast pace, with new models, algorithms, and applications emerging almost daily. I don’t know how long this overview will stay relevant, and I accept that. If you hunger for more, I encourage you to dive deeper; I suggest starting on paperswithcode.com, a true godsend in the field. As for my part, that will be all for today.
See you next time,
Articles referenced
- Learning representations by back-propagating errors, David E. Rumelhart, Geoffrey E. Hinton & Ronald J. Williams (1986) here
- ImageNet Classification with Deep Convolutional Neural Networks, Alex Krizhevsky, Ilya Sutskever & Geoffrey E. Hinton (2012) here
- Attention Is All You Need, Google Brain & Google Research (2017) here
- Denoising Diffusion Probabilistic Models, Jonathan Ho, Ajay Jain & Pieter Abbeel (2020) here
- Explaining and Harnessing Adversarial Examples, Ian J. Goodfellow, Jonathon Shlens & Christian Szegedy (2015) here