imgproxy v4: Image classification, cropping objects, and better autoquality
This is the fourth part of a series of blog posts about the new features in imgproxy v4. In this post, we’ll focus on its smart features. We’ll talk about how imgproxy can classify images, how it can crop images around objects, and how we improved the autoquality feature.
imgproxy v4 announcements:
- Internal Cache and changes to conditional request behavior
- Parallel image downloading
- Better SVG minification, RAW formats support, and colorspace preservation
- Image classification, cropping objects, and better autoquality
Image classification Pro
Image classification is a process of categorizing images into predefined classes based on their content. In some ways, it is similar to object detection, but instead of localizing specific objects within an image, it assigns a label to the entire image based on its overall content. In other words, object detection answers the question “Where is a cat in this image?” while image classification answers the question “Is there a cat in this image?”.
While object detection can be used for a kind of image classification, it is not the most efficient way to do it. Using specialized image classification models has multiple advantages:
- Performance: Image classification models do less work than object detection models, as they don’t need to localize objects within the image. This makes them much faster.
- Accuracy: By focusing on the overall content of the image, classification models can provide more accurate labels for the entire image, rather than just identifying individual objects, especially on large sets of classes.
- Not limited to objects: Image classification can categorize images based on various features, such as scenes, activities, or abstract concepts, which may not be easily localized as objects. For example, it may be used to distinguish between a photo and a painting, or to classify images as not safe for work (NSFW) or safe for work (SFW).
- Simpler training: While object detection models require annotated bounding boxes for training, image classification models only require labeled images. This makes it way easier to find or even create a dataset matching your needs.
In imgproxy Pro v4, we added support for image classification using our out-of-the-box model or any custom model you may want to use. We added plenty of config options to let you configure imgproxy with virtually any image classification model.
Our out-of-the-box model is trained to classify 560 object classes: people, animals, vehicles, food, and more. It is a multi-label model, which means it can assign multiple labels to a single image. For example, an image of a person riding a horse may be classified as both “person” and “horse”.
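To make the multi-label idea concrete, here is a minimal Python sketch that keeps every label whose score clears a threshold, instead of picking a single winner. The scores, the 0.5 threshold, and the `pick_labels` helper are made up for illustration; this is not a description of how imgproxy selects labels internally.

```python
def pick_labels(scores: dict[str, float], threshold: float = 0.5) -> list[str]:
    """Return every label whose score clears the threshold, best first."""
    kept = [(name, score) for name, score in scores.items() if score >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in kept]


# Hypothetical scores for a photo of a person riding a horse.
scores = {"person": 0.91, "horse": 0.87, "car": 0.03, "dog": 0.12}

labels = pick_labels(scores)
print(labels)  # ['person', 'horse']
```

A single-label model would report only the top entry, while a multi-label model can report both.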
To use image classification in imgproxy, use the classify_objects option of the /info endpoint. It accepts the number of classes to return and an optional list of the classes you want to use. If you don’t specify the list of classes, imgproxy will use all the classes the model was trained on. The response will include the list of classes with their confidence scores:
{
  "classification": [
    { "class_id": 153, "name": "Dog", "confidence": 0.8544922 },
    { "class_id": 351, "name": "Person", "confidence": 0.84277344 },
    { "class_id": 553, "name": "Woman", "confidence": 0.58203125 },
    { "class_id": 295, "name": "Man", "confidence": 0.50878906 }
  ]
}
Cropping objects Pro
Object-oriented cropping gravity has been part of imgproxy since v3. It makes imgproxy focus on the objects in the image when it needs to crop, whether because of the fill resizing type or the crop option. It’s useful when you want to keep important objects in frame.
The main limitation of the object-oriented gravity is that it only lets you focus on the objects; it doesn’t let you crop around them. For example, you can use it to crop an image to a 200x200 square while keeping faces in frame, but you can’t use it to cut off all the areas that don’t contain faces.
In imgproxy Pro v4, we added a new processing option – crop_objects. It enables you to crop the image around the objects in it. The option allows you to specify how much the cropping area should be expanded around the objects, and what classes of objects to crop around. It respects the resizing_type option, so if you set it to fill or fill-down, imgproxy will expand the cropping area to match the output dimensions’ aspect ratio. It also respects the IMGPROXY_OBJECT_DETECTION_GRAVITY_MODE config option when selecting objects to crop around, and the objects_position option when deciding where to place the objects in the output image.
Here’s an example of the result of applying .../resize:fill:600:600/crop_objects:2:face/... to a photo by Marco Xu:
As you can see, this feature, combined with our default model for face detection, is ideal for creating perfect headshots from random photos!
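The expansion and aspect-ratio matching described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Box` type and the helper names are hypothetical, the expansion value is assumed to scale the box around its center, and the real imgproxy logic (including how objects_position shifts the result) is not reproduced here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    left: float
    top: float
    right: float
    bottom: float

    @property
    def width(self) -> float:
        return self.right - self.left

    @property
    def height(self) -> float:
        return self.bottom - self.top


def union(boxes: list[Box]) -> Box:
    """Smallest box that contains every detected object."""
    return Box(
        min(b.left for b in boxes),
        min(b.top for b in boxes),
        max(b.right for b in boxes),
        max(b.bottom for b in boxes),
    )


def scale_about_center(box: Box, factor: float) -> Box:
    """Assumption: the expansion value multiplies width and height."""
    cx, cy = (box.left + box.right) / 2, (box.top + box.bottom) / 2
    w, h = box.width * factor, box.height * factor
    return Box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def grow_to_aspect(box: Box, aspect: float) -> Box:
    """Grow the shorter side so that width / height == aspect."""
    cx, cy = (box.left + box.right) / 2, (box.top + box.bottom) / 2
    w, h = box.width, box.height
    if w / h < aspect:
        w = h * aspect
    else:
        h = w / aspect
    return Box(cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


def clamp(box: Box, img_w: float, img_h: float) -> Box:
    """Clip to the image; near the edges this can change the aspect ratio."""
    return Box(max(box.left, 0), max(box.top, 0),
               min(box.right, img_w), min(box.bottom, img_h))


def crop_around(boxes: list[Box], expand: float, aspect: float,
                img_w: float, img_h: float) -> Box:
    area = scale_about_center(union(boxes), expand)
    return clamp(grow_to_aspect(area, aspect), img_w, img_h)


# One hypothetical 100x100 face in a 1000x800 image, square output.
face = Box(400, 200, 500, 300)
print(crop_around([face], expand=2, aspect=1.0, img_w=1000, img_h=800))
```

With a square target, the 100x100 face box grows to a 200x200 crop centered on the face; with a wider target, only the width grows further.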
Better autoquality Pro
Autoquality is a feature that automatically adjusts the compression quality factor to balance visual quality and file size. It is not a new feature for imgproxy, but in v4, we made it much better!
The first improvement is the way imgproxy calculates the quality of the output image. In previous versions, imgproxy used the plain DSSIM (structural dissimilarity) formula. The problem with it is that it treats all parts of the image equally. For example, solid areas of color (like backgrounds) hardly show any compression artifacts, yet they make up a large share of the overall DSSIM score and pull it down. As a result, an image with many solid areas can get a good (low) DSSIM score even though its detailed areas show many compression artifacts.
In imgproxy Pro v4, we use a modified version of the DSSIM formula that gives different weights to different parts of the image. So the areas where compression artifacts are more likely to appear will have a bigger impact on the overall score. This allows imgproxy to better estimate the visual quality of the output image and adjust the quality accordingly.
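The effect of weighting can be shown with a toy calculation. The per-block dissimilarity values and the weights below are invented for illustration, and the weighting scheme is an assumption; the post does not describe the actual modified formula.

```python
def plain_score(dissimilarity: list[float]) -> float:
    """Every block counts equally."""
    return sum(dissimilarity) / len(dissimilarity)


def weighted_score(dissimilarity: list[float], weights: list[float]) -> float:
    """Blocks with larger weights contribute more to the result."""
    total = sum(d * w for d, w in zip(dissimilarity, weights))
    return total / sum(weights)


# Hypothetical image: 8 flat background blocks with no visible artifacts
# and 2 detailed blocks with noticeable artifacts.
dissimilarity = [0.0] * 8 + [0.05] * 2

# Assumed weights: flat blocks matter little, detailed blocks matter a lot.
weights = [0.1] * 8 + [1.0] * 2

plain = plain_score(dissimilarity)
weighted = weighted_score(dissimilarity, weights)
print(f"plain={plain:.4f} weighted={weighted:.4f}")
```

The plain average is diluted by the flat blocks, so the image looks better than it is; the weighted score stays closer to what the detailed blocks actually show.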
The second improvement is the introduction of new ML-based autoquality models. ML-based autoquality is similar to the DSSIM-based one, but instead of using a static starting quality, it uses a machine learning model to predict the optimal starting quality for the output image. This allows imgproxy to find the optimal output image quality faster, since it starts the optimization process from a better starting point.
In imgproxy Pro v3, we provided only a JPEG model out of the box. In v4, we added models for WebP, AVIF, and JPEG XL images. The JPEG model has also improved its accuracy.
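Why a better starting point saves work can be seen in a small simulation. Everything here is a stand-in: `toy_dissimilarity` is a made-up function replacing a real encode-and-compare step, the linear walk is a deliberately simple search rather than imgproxy's actual algorithm, and the "predicted" starting values are hypothetical model outputs.

```python
from typing import Callable


def toy_dissimilarity(quality: int) -> float:
    """Made-up stand-in: higher quality means lower dissimilarity."""
    return (100 - quality) / 1000


def find_quality(start: int, target: float,
                 dissimilarity_at: Callable[[int], float],
                 lo: int = 1, hi: int = 100) -> tuple[int, int]:
    """Find the lowest quality whose dissimilarity is within the target.

    Returns (quality, number_of_trial_encodes).
    """
    encodes = 0

    def ok(q: int) -> bool:
        nonlocal encodes
        encodes += 1  # each check stands for one trial encode
        return dissimilarity_at(q) <= target

    q = start
    if ok(q):
        # Good enough already: walk down while quality stays acceptable.
        while q > lo and ok(q - 1):
            q -= 1
    else:
        # Not good enough: walk up until the next step is acceptable.
        while q < hi and not ok(q + 1):
            q += 1
        q = min(q + 1, hi)
    return q, encodes


target = 0.04  # acceptable dissimilarity in this toy setup

static_result = find_quality(80, target, toy_dissimilarity)     # fixed start
predicted_result = find_quality(62, target, toy_dissimilarity)  # assumed model guess

print("static start:   ", static_result)
print("predicted start:", predicted_result)
```

Both runs end at the same quality, but the run that starts near the answer needs far fewer trial encodes.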
imgproxy becomes smarter with every release! In v4, we added support for image classification, added a new smart cropping option that allows you to crop images around objects in them, and improved the autoquality feature. And we look forward to seeing these features in action in your projects!
More announcements are on the way, so stay tuned! And if you want to try out these features, just apply to our Early Access program!