GitLab Container Registry administration [Garbage Collector]
What does the Garbage Collector do on the Container Registry?
There are two modes, which we will discuss further below.
- Removes unused image “Layers”.
- Removes unused image “Layers” + untagged “Manifests”.
In order to fully understand the implications of the Garbage Collector’s actions on the Container Registry, here is a clarification of some terms related to containerization.
Clarification: Layers / Tag / Digest / Manifest
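- Layers, in simple terms, are the filesystem changes produced by the instructions used to build an image. Each layer is stored in the registry as a separate blob, and an image is a stack of such layers.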
- Tags are said to be “mutable”, which means that the same tag can be assigned to different images (different builds) over time. For example, if we tag two successive builds of an image with the tag “latest”, the first image loses the tag and the second image now carries “latest”. To consume the first build at a later time, we need to give it a version tag (e.g. “2004v1”) or use its digest.
- Digests are said to be “immutable”. A digest (a SHA-256 hash) is computed from the content of an image, so a build whose content differs gets a different digest even if the tag remains the same. It is therefore possible to consume an image by invoking its digest. This is a best practice in terms of security, as it allows us to ensure that we are consuming the desired image (no spoofing). However, we lose some readability, as a tag is easier to read than a digest. As is often the case, there is a compromise to be made.
- Manifests contain information about an image, such as the layers it is made up of, its size, configuration, and digest (both for the image as a whole and for each of its layers). A manifest is essentially a file that describes an image and provides a way to verify its integrity.
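To make the relationship between layers, manifests, and digests concrete, here is a minimal sketch. It assumes a heavily simplified manifest (a JSON object that only lists layer digests); real manifests carry more fields, and the helper names digest_of and build_manifest are hypothetical.

```python
import hashlib
import json


def digest_of(data: bytes) -> str:
    """Return a content digest in the usual 'sha256:<hex>' form."""
    return "sha256:" + hashlib.sha256(data).hexdigest()


def build_manifest(layers):
    """Build a simplified manifest from raw layer contents.

    Returns a tuple (manifest_bytes, manifest_digest).
    """
    manifest = {"layers": [digest_of(layer) for layer in layers]}
    manifest_bytes = json.dumps(manifest, sort_keys=True).encode()
    return manifest_bytes, digest_of(manifest_bytes)


# Two hypothetical builds that share a base layer but differ in one layer.
_, build1 = build_manifest([b"base filesystem", b"application, build 1"])
_, build2 = build_manifest([b"base filesystem", b"application, build 2"])

print(build1)
print(build2)
```

Because the digest is derived from the content, changing a single layer yields a different manifest digest, which is why a digest identifies one exact build while a tag does not.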
With these definitions in mind, let’s try to understand what the Garbage Collector’s actions on the Container Registry entail.
As mentioned above, there are two ways to operate the Garbage Collector on the Container Registry:
- Remove unreferenced layers: This is the default level of cleaning. If an image layer is no longer referenced by any manifest, it is deleted: a layer that appears in no manifest is part of no image, so it can be removed safely. In our experience, this mode alone has never freed up space on your environment.
- Remove untagged manifests and unreferenced layers: The previous action is performed (remove unreferenced layers), and additionally, manifests that are no longer tagged are deleted.
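The two modes can be sketched on a toy in-memory registry. This is only an illustration of the logic described above, not the registry's actual implementation; the function garbage_collect and the data layout (dictionaries and sets of made-up identifiers) are assumptions.

```python
def garbage_collect(manifests, tags, blobs, remove_untagged=False):
    """Simulate one garbage collection pass.

    manifests: dict mapping manifest digest -> list of layer digests
    tags:      dict mapping tag name -> manifest digest
    blobs:     set of layer digests currently stored
    Returns (kept_manifests, kept_blobs).
    """
    if remove_untagged:
        # Second mode: drop every manifest that no tag points to.
        tagged = set(tags.values())
        kept_manifests = {d: l for d, l in manifests.items() if d in tagged}
    else:
        # Default mode: manifests are never deleted.
        kept_manifests = dict(manifests)

    # A layer survives only if some remaining manifest references it.
    referenced = {layer for layers in kept_manifests.values() for layer in layers}
    return kept_manifests, blobs & referenced


# Hypothetical state: "m1" lost its tag when "latest" moved to "m2".
manifests = {"m1": ["base", "app1"], "m2": ["base", "app2"]}
tags = {"latest": "m2"}
blobs = {"base", "app1", "app2", "orphan"}

default_manifests, default_blobs = garbage_collect(manifests, tags, blobs)
strict_manifests, strict_blobs = garbage_collect(
    manifests, tags, blobs, remove_untagged=True
)
print(sorted(default_blobs))
print(sorted(strict_blobs))
```

In this toy state, the default mode only removes the layer that no manifest mentions, because the untagged manifest "m1" still references "app1". That is one plausible reason the default mode frees little space when old builds are merely untagged.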
You might ask, “How do we end up with an untagged manifest?”
- When we run “docker image rm”. If the same image has multiple tags, this command removes only the tag passed as a parameter (leaving the image in place with one tag fewer). If the image has only one tag, the same command deletes both the tag and the image. Note that this command operates on the local Docker image store.
- In our example of an unversioned build where we use the same “latest” tag each time, a new manifest is created with each build. When the “latest” tag is applied to the new manifest, the old manifest is “untagged” but continues to exist. Hence the importance of explicitly tagging each version when using the “latest” tag: at any given time, the image should exist with both a “latest” tag and a “versioned” tag.
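The “latest” scenario can be reproduced with a small simulation. The registry here is just a dictionary, and push and untagged_manifests are hypothetical helpers; the digests are placeholders rather than real hashes.

```python
def push(registry, tag, manifest_digest):
    """Store a manifest and point the tag at it (moving the tag if it exists)."""
    registry["manifests"].add(manifest_digest)
    registry["tags"][tag] = manifest_digest


def untagged_manifests(registry):
    """Return the manifests that no tag points to any more."""
    return registry["manifests"] - set(registry["tags"].values())


# Case 1: every build is pushed as "latest" only.
unversioned = {"manifests": set(), "tags": {}}
push(unversioned, "latest", "sha256:build1")
push(unversioned, "latest", "sha256:build2")

# Case 2: every build also receives its own version tag.
versioned = {"manifests": set(), "tags": {}}
push(versioned, "2004v1", "sha256:build1")
push(versioned, "latest", "sha256:build1")
push(versioned, "2004v2", "sha256:build2")
push(versioned, "latest", "sha256:build2")

print(untagged_manifests(unversioned))
print(untagged_manifests(versioned))
```

In the first case the first build is left without any tag, so it becomes a candidate for deletion in the second garbage collection mode; in the second case every manifest keeps at least one tag.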
In summary:
As mentioned in the definition of the term Digest, it is possible to consume an image by specifying its digest, in which case it is not necessary for the image’s manifest to be tagged.
This explains the warning from GitLab about the second method of cleaning the Registry:
An untagged image remains “content-addressable” via its digest. For this reason, we need to make sure that images are not consumed via their digest before using the method that also removes untagged manifests.
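One way to check for digest-based consumption is to search deployment files or CI configuration for image references of the form name@sha256:<hash>. The following is a rough sketch under that assumption; the function find_digest_references, the regular expression, and the sample registry host are all hypothetical, and a real audit would also need to cover places that this kind of text search cannot see.

```python
import re

# Matches references such as registry.example.com/group/app@sha256:<64 hex chars>.
DIGEST_REF = re.compile(r"[\w.\-/:]+@sha256:[0-9a-f]{64}")


def find_digest_references(text):
    """Return every image reference in the text that is pinned by digest."""
    return DIGEST_REF.findall(text)


# Hypothetical configuration snippet with one tag reference and one digest reference.
sample = (
    "image: registry.example.com/group/app:latest\n"
    "image: registry.example.com/group/app@sha256:" + "a" * 64 + "\n"
)

for reference in find_digest_references(sample):
    print(reference)
```

If such a search returns any reference that points to an untagged manifest, removing untagged manifests would break that consumer, so the image should be tagged first or the reference updated.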