In this solo show Bart starts what will be at least a two-part discussion on artificial intelligence, and what it means for photography and photographers. This first part starts with the easy bit — describing what AI is, and what it does with our images today. If you’ve ever wondered what the difference is between AI and machine learning, or what neural networks are, this is the show for you!
While this podcast is free for you to enjoy, it’s not free for Bart to create. Please consider supporting the show by becoming a patron on Patreon.
Reminder – you can submit questions for future Q & A shows at http://lets-talk.ie/photoq
This will be the first of a two-part series on Artificial Intelligence (AI) in photography. In this first part I’ll focus on describing what AI is, and how it’s being applied in photography, and next month we’ll dig into the deep ethical and philosophical questions AI raises for photography.
What is AI?
AI is a very broad umbrella term that covers computers doing pretty much anything that looks like thinking. The holy grail of the field is so-called general AI — a single system that can ‘think’ in all the ways humans can across all fields of knowledge. A general AI could comprehend language, sounds, and still and moving images. It could build up a complex understanding of concepts and the relationships between them by extracting information from the world, and it could use reasoning to extrapolate from this understanding. And finally, it could leverage its knowledge in creative ways to build genuinely original art. If you think that sounds like it might be tricky to pull off, you’d be dead right! General AI is to computer science as cold fusion is to physics — 30 years away, always has been, and likely always will be!
What we’ve been making spectacular strides in over the past few years are special-purpose AIs — machines that demonstrate specific kinds of intelligence within well-defined fields.
Very broadly speaking we now have AIs that can recognise things, tweak things, and most recently, even create things. Or, to be more formal about it, we have classifiers, processors, and generative AIs.
What About ML?
An AI-related term you see a lot these days, even in app UIs and marketing fluff, is ML or Machine Learning. This is another very generic term, but it’s been hijacked and abused to the point that it’s becoming synonymous with Artificial Neural Networks, which it really shouldn’t be.
To understand what ML is, we need to start right back in the early days of AI when we were trying to directly program intelligent systems. On the whole this didn’t prove very successful, with so-called expert systems being the best we could muster. An expert system is basically a collection of deeply nested if statements. Before we started to use smartphone apps to identify plants, paper analogues of expert systems were popular — these were little pocket-sized books for identifying trees, flowers, butterflies, birds, or similar. The book would start with a question with a handful of possible answers, and each answer would redirect you to another page of the book where you’d find either another question, or your answer.
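As a toy sketch, an expert system’s nested if statements might look like this in Python — the questions and species here are invented for illustration, just like the little identification books:

```python
# A toy 'expert system' for identifying trees as deeply nested if
# statements. The questions and species are invented for illustration.
def identify_tree(has_needles, has_cones, has_lobed_leaves):
    if has_needles:
        if has_cones:
            return "pine"
        else:
            return "yew"
    else:
        if has_lobed_leaves:
            return "oak"
        else:
            return "beech"

print(identify_tree(has_needles=False, has_cones=False, has_lobed_leaves=True))
# → oak
```

Each if is a page in the book, and each branch sends you on to the next question or to your answer.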
Anyway, it soon became clear that expert systems could give a speed boost over what came before, but they weren’t going to give computers fundamentally new abilities, let alone lead to general AI. So, computer scientists switched to a completely different approach — don’t teach the computer the stuff you want it to know, teach it how to learn, and then let it teach itself. Et voilà — Machine Learning was born!
Where do Neural Networks Come in?
One of the most effective techniques for implementing ML was, and remains, imitating the human brain with artificial neurons connected to each other with artificial synapses. We call these systems Artificial Neural Networks, or simply neural nets for short.
If you’re looking for a mental image, think of a very interconnected grid of nodes with an input signal applied to one end, and an output signal read out at the other. Each node represents a neuron, and each connection a synapse. Each synapse either amplifies or de-amplifies the signal passing along it. We refer to this alteration of the signal along a connection as the connection’s weight. You train the neural net by tweaking all the weights until you get useful outputs for your inputs. Assuming you have a big set of inputs with matching expected outputs (or a rule for grading the quality of outputs), you can use various algorithms to have the computer optimise the weights automatically. The result of all this work is a collection of weights. If you put those same weights into any neural net with the same layout you’ll get the same behaviour. A collection of learned weights like this is often referred to as a model. It can take thousands of hours on supercomputers to build a good model, but once you have that model you can use it on just about any computing device.
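To make that concrete, here’s a minimal single-neuron sketch in Python. The weights and inputs are invented, standing in for what real training would produce:

```python
import math

# One artificial neuron: each input travels along a 'synapse' that
# amplifies or de-amplifies it (the weight), the results are summed,
# and the sum is squashed into the range 0..1.
def neuron(inputs, weights, bias):
    signal = sum(i * w for i, w in zip(inputs, weights)) + bias
    return 1 / (1 + math.exp(-signal))  # sigmoid 'activation'

# The 'model' is nothing but these numbers — load a different set of
# learned weights into the same net and you get different behaviour.
model = {"weights": [0.8, -0.4], "bias": 0.1}
output = neuron([1.0, 0.5], model["weights"], model["bias"])
```

A real net chains thousands or millions of these neurons into layers, but the principle is exactly this: the code stays fixed, and the learned numbers are the behaviour.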
I want to draw your attention to a very important fact — the collection of weights is in effect the program for the neural net, but unlike traditional computer code, it’s utterly meaningless to us humans. We can measure the effect a neural net has on various inputs, but we can’t extract any sort of intelligible reasoning from it. We simply can’t say why Lightroom has decided your cat looks like your grandmother! This absolute opacity has some very serious consequences, so stick a pin in it for part two!
While we’re sticking pins in things for next time, let’s also mention that the effectiveness of a model is entirely dependent on the quality of the training set and the learning algorithm used. If the training data covers a genuinely representative sample of the real world your neural net will probably do well on real-world data, but if not, it won’t. The old computer science truism of garbage in → garbage out really applies here. Also, because we have zero understanding of exactly what the neural net has ‘learned’ there’s a real danger of it learning the wrong thing. If your training data isn’t broad enough, you run the very real risk of your AI doing something spectacularly stupid when it meets something not covered by its training set. Finally, you can over-train your model so it goes beyond learning abstract concepts, and instead effectively memorises the training set, making it utterly useless in the real world!
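An over-trained model behaves a little like this deliberately silly Python sketch — a ‘memoriser’ that’s flawless on its own training set and useless on anything even slightly new. The feature numbers and labels are invented, and this isn’t how real training fails, but the symptom is the same:

```python
# Training data: made-up feature tuples mapped to labels.
training_set = {(0.1, 0.9): "cat", (0.8, 0.2): "dog"}

def overtrained_model(features):
    # An over-fitted model has, in effect, memorised its training set…
    if features in training_set:
        return training_set[features]
    # …so anything it hasn't literally seen before stumps it completely.
    return "no idea!"
```

Score it against the training set and it looks perfect; show it a cat it hasn’t memorised and it falls flat on its face.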
In other words, Machine Learning is hard!
At first we implemented neural nets entirely in software, like a kind of brain simulator. This works surprisingly well, but dedicated hardware always trumps software, hence the existence of graphics cards! These days we’ve started to implement neural nets in hardware on silicon chips. When Apple tell you they’ve added so many neural engine cores to their latest processor, they’re talking about artificial neural nets right on their chips.
Rather annoyingly, Apple and others use the term ML to refer to artificial neural nets. Yes, all ANNs are ML, but not all ML is implemented as neural nets.
AI in Photography
We’ll finish this first instalment by looking at how AI is being used in photography.
It Started with Classifiers
The first AIs that made their way into mainstream photography were so-called classifiers — algorithms for recognising things in images. The first widely deployed classifiers were the facial recognition features Apple and then Adobe added into their photography apps. The apps would find faces in your photographs, prompt you to assign names to them, and then find those same people in other photos and build up little smart albums for each person. This seemingly simple feature was actually surprisingly difficult to implement, and involved a lot of serious computer science. It’s worth pausing for a moment to dig into the details because it’s illustrative of the kind of complex engineering work that underpins all AI features.
These face classifiers are built in layers, with each layer being responsible for a different aspect of the problem. First, there’s a simple classifier that’s been trained to find just one thing — faces. The chances are this was implemented as a neural net. The output of this layer was a collection of rectangles that included faces. This was then passed to the second layer which analysed the structure of each face to build an approximate fingerprint of the face based on various ratios. Finally, a third layer looked at all the fingerprints and tried to identify clusters of similar fingerprints, and group them together on the assumption they were the same person.
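Here’s a self-contained Python sketch of just that final clustering layer. Real systems work from pixels; to keep things short, each ‘photo’ here has already been reduced by the first two layers to a list of face fingerprints (single made-up numbers), so only the layering idea is realistic:

```python
# Layer 3 of the pipeline: group similar face fingerprints together on
# the assumption that near-identical fingerprints are the same person.
def cluster_similar(fingerprints, tolerance=0.05):
    clusters = []
    for fp in sorted(fingerprints):
        if clusters and fp - clusters[-1][-1] <= tolerance:
            clusters[-1].append(fp)  # close enough — same person
        else:
            clusters.append([fp])    # a new person
    return clusters

# Stand-ins for the output of layers 1 & 2 (face detection and
# fingerprinting) across three photos.
photos = [[0.31, 0.70], [0.32], [0.69]]
people = cluster_similar(fp for photo in photos for fp in photo)
```

With these invented numbers the four fingerprints collapse into two clusters, i.e. the app decides these three photos contain two people, and can build a smart album for each.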
When you ask Apple’s S-lady or Amazon’s A-lady to do something there’s a similar layered approach at work — first sound is translated to words, those words are then passed to a language model to try to extract their meaning, that’s then passed to another layer to find an answer, and that answer is passed to a final text-to-speech layer to produce the output you hear. What’s truly impressive is that all that happens in just seconds! So, next time your smart assistant does something dumb, just remember how impressive it is that it can do anything at all!
Getting back to photography, we now have classifiers that don’t just answer simple questions like “face or not”, but can scan photos for all the various subjects they contain and classify each as people, animals, cars, mountains, and so on. Again, this is all implemented in layers — the first layer finds the subjects in a photo, the second layer identifies each subject as belonging to a broad class, and for some classes another layer can give a deeper classification still. For example, a photo of a daffodil and a crocus would first be broken into two subjects, these would both then be classified as flowers, and finally as a specific species of flower. At this stage our classifiers are getting good enough that we can do meaningful text searches of the photos in our smartphone photo libraries!
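The layered hand-off can be sketched like so in Python — with all the real model logic faked by simple lookup tables, and every entry an illustrative stand-in:

```python
# A stand-in for the broad classifier, and for the deeper classifiers
# that only exist for some classes (here, just flowers).
BROAD_CLASS = {"daffodil": "flower", "crocus": "flower", "beagle": "animal"}
FLOWER_SPECIES = {"daffodil": "Narcissus pseudonarcissus",
                  "crocus": "Crocus vernus"}

def classify(subject):
    # Layer 2: assign each detected subject to a broad class.
    broad = BROAD_CLASS.get(subject, "unknown")
    # Layer 3: only some classes get handed on for deeper classification.
    species = FLOWER_SPECIES.get(subject) if broad == "flower" else None
    return broad, species
```

Feed it a daffodil and you get both the broad class and the species; feed it a beagle and the deeper flower layer never gets involved.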
Then Came Image Processing Models
After classifiers came image processing models — neural nets trained to enhance images. Apple relies very heavily on its neural engines for processing images on its smartphones. Initially the models were simply trained to enhance images as a kind of black box — you’d click the magic wand, and the photo would change, but the change would be atomic. Now, Apple have made their neural nets a little smarter by training them to move the sliders to make the image better rather than just making the image better directly. This means we can fine-tune the AI’s work, making the whole experience much more powerful.
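The difference between the two approaches might be sketched like this in Python — the slider names, numbers, and functions are all invented for illustration:

```python
# Black-box approach: the model hands back finished pixels — an atomic,
# take-it-or-leave-it change.
def black_box_enhance(pixels):
    return [min(1.0, p * 1.3) for p in pixels]

# Slider approach: the model suggests editable adjustments instead.
def suggest_sliders(pixels):
    return {"exposure": 0.3}  # an invented suggestion

def apply_sliders(pixels, sliders):
    return [min(1.0, p + sliders["exposure"]) for p in pixels]

photo = [0.2, 0.5, 0.8]
sliders = suggest_sliders(photo)
sliders["exposure"] = 0.1  # the user can dial back the AI's suggestion
edited = apply_sliders(photo, sliders)
```

With the black box you get what you’re given; with the slider approach the AI’s output is just a starting point you can keep editing.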
And in 2022 The AIs Became Creative!
2022 will go down in AI history as the year generative AI went mainstream. Classifiers could simply tell us things about our images, processors could make our images better, but generative AI can build original images to order — you simply give it some text describing what you want, and it will create an image for you. Many of these systems will allow you to have an actual conversation with the model and keep refining the image with more so-called prompts until you’re happy.
I can’t even begin to describe how many layers there must be in systems like this — they’re so advanced as to be pretty much indistinguishable from magic to all but the most expert specialists!
It’s hard to figure out exactly what all this means for the future of photography, but the one thing I’m sure about is that the effects will be profound!
To be Continued!
Now that we have a basic understanding of what AI is, we’re ready to get stuck into the deeper questions — what good can AI bring to photographers, what threats does it bring, and how will our wider society deal with this rapidly evolving new reality? All that’s on the agenda for part 2 next month, so tune in then!