Creators are increasingly worried that their images are being used as training data for generative AI. Generative AI models need samples to train on; once trained, a model can respond to a prompt by generating content in the style of the images it was trained on. How to keep images from becoming unauthorized training material is a question on many creators’ minds. New solutions continue to emerge, and one of them is using metadata to signal that individual images may not be used by generative AI.
Most image files carry metadata: information embedded in the file that describes the image and how it was produced. The individual descriptors are called data fields. The data can be embedded automatically by the device that generates the image or added later with post-production software. Several standards exist for embedding metadata.
Metadata Standards
- Exchangeable Image File Format (EXIF): A common standard used by devices such as cameras and smartphones. The EXIF information embedded in an image typically includes the device model, the date and time the image was taken, the copyright owner, image settings, and more.
- International Press Telecommunications Council (IPTC): The IPTC standard was initially created and heavily relied upon by media and news agencies. Its fields describe the copyright owner, licensing permissions and restrictions, and more.
- Extensible Metadata Platform (XMP): A formatting standard created by Adobe and used to store information about the creation and processing of an image.
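Because XMP is stored as an XML packet inside the image file, it can often be located with nothing more than a byte search. The sketch below illustrates that idea using only the Python standard library. The function name extract_xmp_packet and the sample bytes are hypothetical, and the approach assumes the packet is stored uncompressed and in one piece, which is not guaranteed for every file format.

```python
import re
from typing import Optional

# An XMP packet is an XML document wrapped in an <x:xmpmeta> element.
_XMP_PATTERN = re.compile(rb"<x:xmpmeta\b.*?</x:xmpmeta>", re.DOTALL)


def extract_xmp_packet(data: bytes) -> Optional[str]:
    """Return the first XMP packet found in raw file bytes, or None."""
    match = _XMP_PATTERN.search(data)
    if match is None:
        return None
    return match.group(0).decode("utf-8", errors="replace")


# Hypothetical stand-in for the bytes of an image file.
sample = (
    b"\xff\xd8 binary image data "
    b'<x:xmpmeta xmlns:x="adobe:ns:meta/"><rdf:RDF '
    b'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"/></x:xmpmeta>'
    b" more binary data \xff\xd9"
)
packet = extract_xmp_packet(sample)
print(packet)
```

For a real file, the same function could be applied to the result of open(path, "rb").read(). A dedicated metadata tool remains the more reliable choice for anything beyond a quick inspection.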
How to Protect Images from AI Using Metadata
The IPTC, in partnership with the PLUS Coalition, updated its metadata standard to version 2023.1, which includes a property called ‘Data Mining’. The ‘Data Mining’ field allows image rights owners to state whether their images may be used for AI/machine learning purposes by asserting prohibitions or permissions.
Step 1: Select a tool that supports IPTC Photo Metadata. The IPTC publishes a list of supporting software on its website.
Step 2: In the tool of your choice, access the metadata window or viewer.
Step 3: Locate the XMP property plus:DataMining.
Step 4: Set the ‘Data Mining’ field to the appropriate value. The defined values are:
- http://ns.useplus.org/ldf/vocab/DMI-UNSPECIFIED (Unspecified - no prohibition defined)
- http://ns.useplus.org/ldf/vocab/DMI-ALLOWED (Allowed)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-AIMLTRAINING (Prohibited for AI/ML training)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-GENAIMLTRAINING (Prohibited for Generative AI/ML training)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-EXCEPTSEARCHENGINEINDEXING (Prohibited except for search engine indexing)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED (Prohibited)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEECONSTRAINT (Prohibited, see Other Constraints property)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEEEMBEDDEDRIGHTSEXPR (Prohibited, see Embedded Encoded Rights Expression property)
- http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-SEELINKEDRIGHTSEXPR (Prohibited, see Linked Encoded Rights Expression property)
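The values above share a common base URL, so a small lookup table is enough to build or interpret them in a script. The following is a minimal sketch; the helper names data_mining_uri and describe are hypothetical and not part of any IPTC or PLUS library.

```python
PLUS_VOCAB_BASE = "http://ns.useplus.org/ldf/vocab/"

# Term -> label, copied from the list of values above.
DATA_MINING_TERMS = {
    "DMI-UNSPECIFIED": "Unspecified - no prohibition defined",
    "DMI-ALLOWED": "Allowed",
    "DMI-PROHIBITED-AIMLTRAINING": "Prohibited for AI/ML training",
    "DMI-PROHIBITED-GENAIMLTRAINING": "Prohibited for Generative AI/ML training",
    "DMI-PROHIBITED-EXCEPTSEARCHENGINEINDEXING":
        "Prohibited except for search engine indexing",
    "DMI-PROHIBITED": "Prohibited",
    "DMI-PROHIBITED-SEECONSTRAINT":
        "Prohibited, see Other Constraints property",
    "DMI-PROHIBITED-SEEEMBEDDEDRIGHTSEXPR":
        "Prohibited, see Embedded Encoded Rights Expression property",
    "DMI-PROHIBITED-SEELINKEDRIGHTSEXPR":
        "Prohibited, see Linked Encoded Rights Expression property",
}


def data_mining_uri(term: str) -> str:
    """Build the full value for a known term; reject anything else."""
    if term not in DATA_MINING_TERMS:
        raise ValueError(f"unknown Data Mining term: {term!r}")
    return PLUS_VOCAB_BASE + term


def describe(uri: str) -> str:
    """Return the label for a full value, or 'Unknown value'."""
    uri = uri.strip()
    if not uri.startswith(PLUS_VOCAB_BASE):
        return "Unknown value"
    term = uri[len(PLUS_VOCAB_BASE):]
    return DATA_MINING_TERMS.get(term, "Unknown value")


print(data_mining_uri("DMI-PROHIBITED-GENAIMLTRAINING"))
print(describe("http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED"))
```

Rejecting unknown terms up front guards against typos, which would otherwise produce a value that software reading the field may not recognize.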
Example
XMP Specs: plus:DataMining [URL <External>]
<plus:DataMining>
http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-AIMLTRAINING
</plus:DataMining>
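To show where the property sits in a complete document, here is a rough sketch that builds a minimal XMP structure containing plus:DataMining and reads the value back, using only Python's standard library. The function names build_xmp and read_data_mining are hypothetical. The sketch omits the xpacket wrapper and does not write anything into an image file; embedding the result is left to a metadata tool.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Namespace URIs used by XMP, RDF, and the PLUS properties.
NS = {
    "x": "adobe:ns:meta/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "plus": "http://ns.useplus.org/ldf/xmp/1.0/",
}


def build_xmp(data_mining_value: str) -> str:
    """Return a minimal XMP document carrying plus:DataMining."""
    for prefix, uri in NS.items():
        ET.register_namespace(prefix, uri)  # keep readable prefixes
    xmpmeta = ET.Element(f"{{{NS['x']}}}xmpmeta")
    rdf = ET.SubElement(xmpmeta, f"{{{NS['rdf']}}}RDF")
    desc = ET.SubElement(rdf, f"{{{NS['rdf']}}}Description")
    desc.set(f"{{{NS['rdf']}}}about", "")
    prop = ET.SubElement(desc, f"{{{NS['plus']}}}DataMining")
    prop.text = data_mining_value
    return ET.tostring(xmpmeta, encoding="unicode")


def read_data_mining(xmp: str) -> Optional[str]:
    """Return the plus:DataMining value, or None if it is absent."""
    root = ET.fromstring(xmp)
    node = root.find(".//plus:DataMining", NS)
    if node is None or node.text is None:
        return None
    return node.text.strip()  # tolerate line breaks around the value


xmp = build_xmp(
    "http://ns.useplus.org/ldf/vocab/DMI-PROHIBITED-AIMLTRAINING"
)
print(xmp)
print(read_data_mining(xmp))
```

Stripping whitespace on read means a value written across several lines, as in the example above, is still recognized.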
Please note that some regional laws may override the values set in this property, typically for search indexing purposes. It is also important to note that leaving the DataMining property without a value does not automatically grant permission for generative AI to use your images. Using the new field adds an extra layer of protection by stating your wishes in machine-readable form. It does not technically prevent access to the file, so its effect depends on AI developers and crawlers reading and honoring it.
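The point about missing values can be made concrete with a three-way check of the kind a well-behaved crawler might apply. This is only a sketch: the function name and the decision to treat every DMI-PROHIBITED value as ruling out generative AI training are assumptions made for illustration, not rules taken from the standard, and the check ignores regional law entirely.

```python
from typing import Optional

VOCAB = "http://ns.useplus.org/ldf/vocab/"
ALLOWED = VOCAB + "DMI-ALLOWED"
UNSPECIFIED = VOCAB + "DMI-UNSPECIFIED"
PROHIBITED_PREFIX = VOCAB + "DMI-PROHIBITED"


def generative_ai_training_permitted(value: Optional[str]) -> Optional[bool]:
    """True: explicitly allowed. False: prohibited. None: no statement."""
    cleaned = (value or "").strip()
    if cleaned in ("", UNSPECIFIED):
        return None  # a missing value is not a grant of permission
    if cleaned == ALLOWED:
        return True
    if cleaned.startswith(PROHIBITED_PREFIX):
        return False  # assumption: any prohibition covers gen AI training
    return None  # unrecognized value: treat as no statement


for candidate in (None, ALLOWED, VOCAB + "DMI-PROHIBITED-GENAIMLTRAINING"):
    print(candidate, "->", generative_ai_training_permitted(candidate))
```

Returning None rather than True for a missing or unspecified value keeps "no statement" distinct from "allowed", which mirrors the caution above.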
Resources
https://iptc.org/std/photometadata/specification/IPTC-PhotoMetadata-2023.1.html#data-mining
http://ns.useplus.org/LDF/ldf-XMPSpecification#DataMining