LTP 125: Image Provenance with Content Credentials



In this solo show Bart digs into how it’s possible to embed verifiable information about an image’s creation, and the chain of edits that have been applied, into its metadata alongside the EXIF data and the like. This work is further along than you might realise, and while the rise of AI is driving it, it was needed long before AI, and its usefulness extends well beyond thwarting deep fakes.

While this podcast is free for you to enjoy, it’s not free for Bart to create. Please consider supporting the show by becoming a patron on Patreon.

Reminder – you can submit questions for future Q & A shows at http://lets-talk.ie/photoq

MP3 Download | RSS Feed | iTunes

Show Notes (by Bart)

Provenance is just a fancy term for the history of a thing — who made it, when, and what happened to it between then and now. For antiques, provenance is about proving a chain of ownership from the creator all the way to the current owner. For legal evidence it’s about proving a chain of custody from collection to the courtroom, and for digital images it’s about building a chain of every edit from the moment the shutter fires to the moment you see an image on your screen. Who created it, when, where, and how, and for each edit from then to now, what was done, and by whom.

That’s a pretty big ask! Surprisingly though, we don’t need to invent any new fundamental technologies to do this, we just need to agree on some standards, and work is progressing nicely on that front too!

The reason the tech industry has gone from fixin’ to make a plan to deal with photo provenance to actually getting a working system off the ground is of course AI, but really, we’ve always needed this, even before it was possible. Lies have been told with photographs for as long as there has been photography, and deceptive edits have simply always been a thing. Gardner didn’t need AI to create his propaganda shots with posed US Civil War dead in the 1860s, and Stalin erased his enemies from history and photographs just fine without generative fill! I guess I shouldn’t let the irrelevance of the trigger lessen my relief that a much-needed tool is finally being developed. So what if we’re doing a very clever thing for a rather silly reason, at least we’re finally doing it!

Anyway, in this episode I want to demystify the technology, and describe the emerging standards that are actually a lot more advanced than you may realise. Believe it or not, all the fundamental technological concepts have been in the padlocks on our browsers, and the chips in our phones for years!

The Cryptographic Foundation

In terms of the cryptography underpinning all this, it all sits on top of just two fundamental concepts — asymmetric cryptography, and cryptographic digests AKA hashes.

Those two building blocks have been assembled into the so-called Public Key Infrastructure, or PKI, that underpins pretty much our entire digital economy, and much of our online security, including most notably that all important padlock in our browsers’ address bar!

Asymmetric Encryption and Public Key Cryptography

The first building block is a rather cool type of cryptography we invented in the 1970s.

Before the 1970s, all cryptography worked with one key — you take something you want to obscure, you create some kind of secret key, and then you use some math to scramble the content with the key so you can only un-scramble it with the same key later. We now call this ‘symmetric encryption’, because the same key is used to scramble and unscramble; we only needed a name for this original approach because we invented something quite different. This new kind of encryption doesn’t use the same key to scramble and un-scramble, so we named it ‘asymmetric encryption’, and everything that had gone before became ‘symmetric encryption’.
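
To make that concrete, here’s a minimal sketch of symmetric encryption in Python using the widely used cryptography package (my choice for illustration; nothing here is tied to photography or to any standard discussed later):

```python
# A minimal sketch of symmetric encryption with Python's 'cryptography'
# package (pip install cryptography). One shared secret key does both the
# scrambling and the un-scrambling.
from cryptography.fernet import Fernet

key = Fernet.generate_key()   # the single shared secret key
cipher = Fernet(key)

scrambled = cipher.encrypt(b"Meet me at the gallery at noon")
print(cipher.decrypt(scrambled))  # b'Meet me at the gallery at noon'
```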

With asymmetric encryption you start by using some cool math to create a key-pair. This pair of keys are a kind of mathematical mirror of each other such that anything you scramble with key 1 can only be unscrambled with key 2, and vice-versa. There’s nothing special about either key, the math works the same each way — the point is that you need the other key in the pair to unscramble whatever you scramble with one of them.

If you arbitrarily pick one key and promise never to share it ever, you can freely share the other key without compromising your security. To help you remember which is which, don’t call them ‘key 1’ and ‘key 2’, call one your ‘private key’ and the other your ‘public key’. These kinds of key pairs can do some cool stuff, and we call that ‘public key cryptography’.

If you and I both have a key pair, and we exchange public keys, we can do some powerful things.

If I scramble everything I send you with your public key, and you use my public key to scramble everything you send me, then we can securely share secret information without ever needing to share a secret key, like we would have had to do with symmetric encryption. We could do the entire exchange by post card, or even with sky writing, and no one listening in to every character we exchanged could read our messages! I could exchange my same public key with 500 people, and each of our conversations would remain private to just the two of us!

Why? Because when you use my public key to scramble messages, only I can unscramble them, and when I use your public key, only you can unscramble!

As well as our conversations being secret, we can also be sure we’re talking directly to each other. If your public key decrypts something, your private key must have encrypted it, so the message must be from you!

In computer science jargon we now have confidentiality and authenticity!
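
Here’s that two-way scrambling as a runnable Python sketch, again with the cryptography package; the key size and padding scheme are just common choices of mine, not anything mandated by the standards we’ll meet later:

```python
# A minimal sketch of asymmetric encryption with Python's 'cryptography'
# package. Anything scrambled with the public key can only be unscrambled
# with the matching private key.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# Generate a key pair; share the public half, guard the private half
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

oaep = padding.OAEP(
    mgf=padding.MGF1(algorithm=hashes.SHA256()),
    algorithm=hashes.SHA256(),
    label=None,
)

# Anyone with the public key can scramble a message...
scrambled = public_key.encrypt(b"For your eyes only", oaep)

# ...but only the private key can unscramble it
print(private_key.decrypt(scrambled, oaep))  # b'For your eyes only'
```

In practice asymmetric encryption is slow, so real-world protocols usually use it just to securely agree on a temporary symmetric key, but the principle is exactly as described above.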

We’ve now nailed down the first of our two fundamental cryptographic building blocks — asymmetric encryption. Let’s move on to the next — ‘digests’.

Cryptographic Digests

A digest function is some math that takes input of any length, and spits out a fingerprint of a fixed length. Simple digests can be used for things like error detection, but to make a digest function useful cryptographically it needs to have two properties:

  1. It must be easy to go from input to digest, but effectively impossible to go the other way
  2. Small changes in input must translate to big changes to the fingerprint

We’ve figured out the math to do both of those things, so we can securely fingerprint data.
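
You can see both properties in action with Python’s built-in hashlib; the two input strings below are arbitrary examples of mine that differ by a single character, yet their fingerprints share almost nothing:

```python
# Cryptographic digests with Python's built-in hashlib: input of any length
# in, fixed-length fingerprint out, and a tiny input change completely
# scrambles the output.
import hashlib

print(hashlib.sha256(b"The shutter fired at 12:01:00").hexdigest())
print(hashlib.sha256(b"The shutter fired at 12:01:01").hexdigest())
# Two completely different 64-character fingerprints
```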

As well as being useful for the kind of cryptography we’re interested in, digest functions are also used to protect passwords stored by websites, and in that context they’re usually referred to as ‘hashing functions’ or ‘hashes’, and to make passwords more difficult to crack those digest functions can accept a key-like second input known as a ‘salt’. So, if you’ve heard of ‘salted hashes’, they’re the more complex cousins of the digest functions we’re talking about here.

Our digest functions just fingerprint data in such a way that you can’t start with a desired digest and then invent some content to produce it, or worse still, start with some forged data, and pad or tweak it to get it to digest to a specific fingerprint.

So, how are these digest functions useful to us?

Digital Signatures

If the problem you’re trying to solve is not secretly sharing information, but publicly sharing something in such a way that others can verify its source, then you need to combine digests with public key cryptography to create digital signatures.

It’s actually a very simple process — you take the information you want to sign, you digest it to a fingerprint with a cryptographic digest function, and scramble that fingerprint with your private key. If you share both the information you need to be verifiable and the scrambled fingerprint, then anyone with your public key can prove to themselves that you shared the information, and, that it hasn’t changed since you ‘signed’ it. Or, to put it into computer science jargon, you have verifiable authenticity and integrity!

How does that work? If you get some information and a scrambled fingerprint that claims to be from me, you can prove that’s true by re-calculating the fingerprint yourself, then using my public key to unscramble the fingerprint I sent, and comparing the two. If they match, the information really is from me, and it really hasn’t been changed. Why? Because any change to the information would have changed the fingerprint you calculated, and only my private key could have scrambled the fingerprint if my public key can unscramble it!
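
Here’s that whole dance as a short Python sketch with the cryptography package; the freshly generated key pair and the sample bytes are stand-ins of mine for a real signer’s key and a real photo:

```python
# A minimal sign-then-verify sketch with Python's 'cryptography' package:
# digest the data, scramble the digest with the private key, and let anyone
# holding the public key check the result.
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes
from cryptography.exceptions import InvalidSignature

private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()

data = b"RAW sensor data plus EXIF metadata"

# sign() digests the data and scrambles that digest with the private key
signature = private_key.sign(data, padding.PKCS1v15(), hashes.SHA256())

# verify() re-digests the data, unscrambles the signature with the public
# key, and compares the two fingerprints
try:
    public_key.verify(signature, data, padding.PKCS1v15(), hashes.SHA256())
    print("Valid: the data is authentic and unchanged")
except InvalidSignature:
    print("Invalid: the data was altered or signed by someone else")
```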

In the real world we have standardised formats for storing scrambled digests and public keys, and we call those ‘Digital Signatures’. With file formats that store the data and the information about that data (the metadata) in separate parts of the file, you can embed the digital signature right into the file, and you can even have it cover both the data and all the metadata that existed in the file before it was signed. For photography that means we can embed digital signatures right into our photos like we do EXIF data.

Certificates, Certificate Authorities, and the PKI

So far I’ve been very vague about how, exactly, these public keys that are at the heart of everything are supposed to get shared so that you can be sure the key you have really is my public key! How can you know it’s not some imposter’s public key?

This is where ‘Digital Certificates’ come in.

You start with a standard file format that lets you combine a public key and some identity information into a single file, then you send that file to a third party that’s trusted by everyone, they verify the identity information, then they digitally sign the public key and the identity information. We call the file we send to this third party a ‘certificate signing request’, we call these third parties ‘certificate authorities’, and we call the digitally signed files they produce that link public keys to identities ‘digital certificates’.

To make all this work we need some kind of global set of standards, protocols, procedures and policies, and we need some way of policing all that. That’s what the so-called ‘Public Key Infrastructure’ (PKI) is.

The file formats for certificate signing requests, including the possible data fields and their meanings, and the resulting certificates are covered by a global standard known as X.509. The public keys for the internet’s trusted certificate authorities are included in all our operating systems and act as so-called ‘trust anchors’, allowing us to validate certificates, and hence public keys, all over the internet. There’s a global trade organisation that governs and polices these CAs, setting the rules and removing trust from those that break them.

As for the identities in digital certificates, those depend on the problem to be solved — the padlock in your browser is driven by X.509 certificates that connect internet domain names to public keys, digital signatures in email are driven by X.509 certs that connect email addresses to public keys, and you can get X.509 certificates that bind the names and addresses of people or organisations to public keys.
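
To make certificates a little less abstract, here’s a short Python sketch that reads the identity information and public key out of an X.509 certificate with the cryptography package; cert.pem is a placeholder for any PEM-encoded certificate you happen to have:

```python
# Reading the contents of an X.509 digital certificate with Python's
# 'cryptography' package. The certificate binds identity fields to a
# public key, all signed by a certificate authority.
from cryptography import x509

with open("cert.pem", "rb") as f:  # placeholder path
    cert = x509.load_pem_x509_certificate(f.read())

print("Issued to:", cert.subject.rfc4514_string())  # the verified identity
print("Issued by:", cert.issuer.rfc4514_string())   # the certificate authority
print("Expires:", cert.not_valid_after)             # certificates are time-limited
public_key = cert.public_key()  # the key the identity is bound to
```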

Protecting Private Keys and Secure Enclaves

We have one more building block to learn about before we can put all the pieces together into a system for cryptographically certifying the provenance of digital images — we need some way of protecting private keys when they’re in the physical world.

We need to be able to vouch for an image from the moment the shutter fires to the moment it shows up on someone’s screen, so we need to digitally sign the RAW image right inside the camera, which means we need to have a precious private key with us at all times as we photograph. If anyone managed to steal that key, they could make it appear as if any image of their choice was shot with our camera. That would utterly undermine everything, so we need a reliable way of protecting the keys inside cameras.

This is where some nice hardware popularised by Apple’s smartphones comes to the rescue — hardware chips that create key pairs but have no mechanism for exporting the private keys. These tiny little chips have circuitry for receiving commands to make key pairs, to output public keys, to receive data for scrambling, and to output the data they scramble. But, they have no circuitry for exporting private keys. No amount of software hackery can get those keys out of those chips; there simply is no physical pathway for those bits to get out. The only way to do it would be to painstakingly etch away the surface of the chip to reveal the inner circuitry, then to use an electron microscope to try to read the data against the chip’s will, and to do all that without accidentally corrupting the data. Not impossible, but so difficult it’s effectively impossible!

Apple call these chips ‘secure enclaves’, and others call them ‘trusted platform modules’, but they are what make it impossible for even the FBI to decrypt an iPhone without the password, and they are what camera manufacturers are now starting to embed into their cameras to protect the private keys they use to digitally sign raw photos as they’re read from the camera sensor.

A System for Digital Provenance

So, technologically, we don’t need anything new to get digital provenance — we can use digital signatures to connect data and metadata to public keys, digital certificates to connect public keys to people, organisations & companies, and trusted platform modules can protect private keys in cameras.

What’s needed is a standardised system to tie all these things together, and buy-in from the relevant parties.

The ball got rolling on this twice, once in an Adobe-led effort, and once in a BBC-led effort, and then they came together to form a unified effort they dubbed the Coalition for Content Provenance and Authenticity, or C2PA. The C2PA has a lot of members covering a broad range of stakeholders, including:

  • Camera Manufacturers like Canon, Nikon, Leica & Sony (Leica already have cameras on the market that can digitally sign images)
  • Software companies like Adobe
  • Media organisations like the BBC & the New York Times
  • Cloud services providers like AWS, Microsoft & Google
  • Certificate authorities like DigiCert
  • Chip companies like Intel & ARM

The C2PA have developed an open standard they’ve dubbed Content Credentials for embedding provenance data into media files, and not just for photos, there’s support for video and audio files too!

Content Credentials are far enough along that you can actually start using them in the real world — Leica will sell you a camera that supports Content Credentials, any certificate authority will provide you with the needed digital certificates to assert your identity, Adobe’s photo editing tools support Content Credentials, and there’s an official validation site that will extract, validate, and visualise Content Credentials in media files at contentcredentials.org/verify. Ravnur and Microsoft have also partnered to build a full Software-as-a-Service video platform that organisations like local governments can use to publish videos of official events with Content Credentials embedded in them. Nikon and Canon have also announced upcoming support for Content Credentials in their pro camera lines.

As well as the specification being public, there are also official open source implementations of the specification which developers can use to start embedding support for Content Credentials into their apps.

How Do Content Credentials Work?

Content Credentials embed provenance information in media file metadata (like EXIF in photos) in a structure the spec dubs a Manifest. These manifests contain chains of claims and assertions that describe the creation of the media, and all the edits that get applied after that. The chain starts with assertions about how the media was first created and an ownership claim, and those two get digitally signed by the original creator or their organisation. From that point on, each time the image is edited in a Content Credentials capable app, an entry gets added to the manifest with fresh assertions and claims that are digitally signed by the person or organisation making the edits.

At each point in the chain the digital signature verifies the following:

  1. The digest of the media itself at that point in its history (so you can tell if the version you are looking at corresponds to a specific point in the chain)
  2. Some kind of preview of the current state of the media (for photos that’s a thumbnail)
  3. A collection of assertions describing the changes that were made
  4. An authorship claim, i.e. an identity
  5. A digest of the previous manifest entry to establish a chained sequence (proves the order of changes and detects undocumented changes; see the sketch below)
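
To make the chaining idea concrete, here’s a purely illustrative Python sketch. To be clear, this is not the real C2PA manifest format, and every field name below is invented for illustration; it just shows how digests of the media and of the previous entry let a verifier prove ordering and detect tampering:

```python
# An illustrative (NOT real C2PA) sketch of a chained provenance manifest:
# each entry fingerprints the media at that point AND the previous entry,
# so re-ordering or undocumented edits break the chain.
import hashlib
import json

def digest(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()

def make_entry(media, assertions, claimant, prev_entry=None):
    return {
        "media_digest": digest(media),  # fingerprints this version of the media
        "assertions": assertions,       # what was done at this step
        "claim": claimant,              # who is vouching for this step
        # fingerprint of the previous entry chains the sequence together;
        # in the real system the whole entry would now be digitally signed
        # with the claimant's private key (omitted here for brevity)
        "prev_digest": digest(json.dumps(prev_entry, sort_keys=True).encode())
                       if prev_entry else None,
    }

raw = b"raw sensor data"
edited = b"raw sensor data, cropped and colour balanced"

first = make_entry(raw, ["captured in-camera"], "photographer / camera")
second = make_entry(edited, ["crop", "levels", "colour balance"],
                    "picture editor", first)

# A doctored file matches no entry's digest, so tampering is detectable
doctored = b"raw sensor data, doctored"
print(digest(doctored) == second["media_digest"])  # False
```

Because each real entry is also digitally signed, a forger can’t simply recompute the digests and rebuild the chain; doing that would require the private keys of everyone named in the manifest.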

To illustrate this, let me describe a real-world scenario that could be happening right now this second — a New York Times photographer is in Ukraine shooting images of the war there with their Leica camera. As they shoot, each image and its EXIF data is digitally signed by the camera. They upload their images to their editor at NYT HQ, who adds captions, a title, some keywords, and copyright information, and digitally signs all that with the New York Times’ private key to create the first entry in the Content Credentials manifest. The editor is in effect verifying that the camera and the photographer were working for them, and vouching for the authenticity of the metadata. This does not prove the caption is true, it cryptographically proves the New York Times say it’s true.

The image then goes to a picture editor because it’s going to be used in a story, so they open it in Photoshop, adjust the levels, fine-tune the colour balance, and crop it. Photoshop builds a new entry for the manifest that includes assertions detailing each edit, a new digest of the image as it is after the edits are applied, a new thumbnail, and a claim of authorship of the edits, and digitally signs it all again with the NYT’s private key.

The image is then published in a story, shared on social media, and someone wonders if it’s true. They download the image, check it with the Content Credentials Checker site, and can see where the image came from, and who is vouching for it.

Later that day, an unscrupulous so-and-so also downloads the image, doctors it to make it tell a different story, and uploads it to social media. They did not use Content Credential-aware software, so the EXIF data with the Content Credential is still in the image, but the manifest has not been updated.

Another user sees the doctored image and gets suspicious, so they upload it to the verifier. They now see that the image has been tampered with, they see the original image details as the NYT photographer shot the image and the NYT editor captioned it, they see that an NYT editor cropped and tweaked it, and they see thumbnails of the original image from the camera and of the image after the sanctioned edits by the NYT. They can see how the doctored version completely distorts the story, and they’re not fooled.

All that can happen today.

The only real pain point in the chain is the final end-user experience. Everything the New York Times need is in place and working, but the user-facing stuff is really clunky today — are we really going to check images for Content Credentials?

Obviously what we need now is for the end-user apps where people encounter media to get native built-in Content Credential support, so they can badge media with embedded credentials, and let users explore those credentials visually from right within the app. The most important app categories I can think of are web browsers, social media apps, news apps, and video player apps. All the needed library code is available today from the C2PA’s Git repository, so there’s nothing stopping developers getting stuck in.

We do also have a bit of a chicken-and-egg problem — not many media creators are embedding Content Credentials, so there’s no big demand for software support, and because the credentials are not easily visible in the apps people use day-in-day-out, there’s not much pressure on media creators to add them. Maybe one good thing to come out of the current AI hype will be a drive to add Content Credential support to news and social media apps.

Final Thoughts

Something to note is that this is not just about trust in media, this is also a way of proving ownership, so creators can prove authorship of their work. It’s also being used by Adobe to expressly mark AI generated images as being AI generated when they’re created using Adobe’s AI tools. So, in the future, there will be four kinds of images online:

1. Images with Content Credentials that are asserted as being human-created and real by their verified creators
2. Images with Content Credentials that are explicitly declared as being fully or partially AI-generated
3. Images that are known to have been corrupted/manipulated because their Content Credentials fail to validate
4. Images that we know nothing about because they don’t have any embedded Content Credentials

None of this prevents fraud, what it does is add transparency. If the manifest in the final image you’re viewing validates, then you know the history in that manifest is genuine. You then have to decide if you trust all the people and organisations listed in the manifest. The manifest can’t prove that the caption and other metadata are true, just that a specific person or organisation has provably claimed they are true. Ultimately, we, the humans, will always have to decide who has and who has not earned our trust. Content Credentials just make it possible to make those decisions armed with verifiable facts about the media we’re consuming.

Finally, yes, this is very much still a work in progress, but the foundations are well and truly laid, and there’s enough of the superstructure built that early adopters can test the entire system from shutter to screen today, so if this momentum keeps going, there’s a good chance this will actually happen!
