Sony IMX500 – The World’s First AI Image Sensor Announced

by Simon Wyndham, May 21st, 2020

The new Sony IMX500 Intelligent Vision series of image sensors contains AI image analysis systems directly on the chip, which opens up some new and faster capabilities for cameras.

Sony’s new Intelligent Vision IMX500 series chips offer capabilities that are only limited by the imagination. Image: Sony Electronics

The announcement describes two new Intelligent Vision CMOS chip models, the Sony IMX500 and IMX501. From what I can tell these are the same base chip, except that the 500 is the bare chip product, whilst the 501 is a packaged product.

They are both 1/2.3” type chips with 12.3 effective megapixels. It seems clear that one of the primary markets for the new chip is security and system cameras. However, having AI processes on the chip opens up some exciting new possibilities for future video cameras, particularly those mounted on drones, or action cameras such as a GoPro or Insta360.

Sony’s new Intelligent Vision IMX500 series chips. The IMX500 on the left, the IMX501 on the right. Image: Sony Electronics

What can the Sony IMX500 sensor do?

One prominent ability of the new chip lies in functions such as object or person identification. This could mean tracking such objects, or actually identifying them. Output from the new chip doesn’t have to be in image form either. It can output metadata instead, sending a description of what it sees without the accompanying image. This can reduce the data storage requirement by up to 10,000 times.
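To get a feel for why metadata-only output saves so much space, here is a rough back-of-envelope sketch in Python. The 12.3 megapixel figure comes from the announcement, but the bytes-per-pixel value and the shape of the metadata record are my own assumptions for illustration, not Sony's output format.

```python
import json

EFFECTIVE_PIXELS = 12_300_000  # from the article: 12.3 effective megapixels
BYTES_PER_PIXEL = 1.5          # assumption: uncompressed 12-bit raw, not a Sony figure


def frame_bytes(pixels=EFFECTIVE_PIXELS, bytes_per_pixel=BYTES_PER_PIXEL):
    """Approximate size of one uncompressed frame in bytes."""
    return int(pixels * bytes_per_pixel)


def metadata_bytes(detections):
    """Size of a compact JSON description of what the sensor saw."""
    return len(json.dumps(detections, separators=(",", ":")).encode("utf-8"))


def reduction_factor(detections):
    """How many times smaller the metadata is than the uncompressed frame."""
    return frame_bytes() / metadata_bytes(detections)


# Hypothetical record: one detected person with a bounding box and a score.
example = [{"label": "person", "box": [120, 80, 360, 640], "score": 0.91}]

print(f"frame:    {frame_bytes():,} bytes")
print(f"metadata: {metadata_bytes(example)} bytes")
print(f"ratio:    {reduction_factor(example):,.0f}x")
```

The exact ratio depends heavily on how the image would otherwise have been compressed and on how much metadata is sent per frame, so treat this as an order-of-magnitude illustration rather than a reproduction of Sony's figure.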

For security or system camera purposes, a camera equipped with the new chip could count the number of people passing by it, or identify low stock on a shop shelf. It could even be programmed to identify customer behaviour by way of heat maps.
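As a sketch of how an application might count passers-by from metadata alone, the Python below counts tracked objects that cross a vertical line from left to right. The input format (a track ID and a horizontal position per detection) is hypothetical; the announcement does not describe the chip's actual output structure.

```python
def count_crossings(frames, line_x):
    """Count tracks that move from the left of line_x to on or past it.

    frames: list of per-frame detections, each a list of (track_id, x_centre)
            tuples. This layout is an assumption made for illustration.
    """
    last_x = {}      # most recent x position seen for each track
    crossings = 0
    for detections in frames:
        for track_id, x in detections:
            previous = last_x.get(track_id)
            # A crossing happens when the track was left of the line
            # in an earlier frame and is now on or to the right of it.
            if previous is not None and previous < line_x <= x:
                crossings += 1
            last_x[track_id] = x
    return crossings


# Track 1 walks left to right past x=50; track 2 walks the other way.
sample_frames = [
    [(1, 10), (2, 90)],
    [(1, 60), (2, 40)],
    [(1, 70)],
]
print(count_crossings(sample_frames, line_x=50))
```

Only the small per-frame tuples need to leave the camera here, which is the point of the metadata output described above.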

How the new Sony Intelligent Vision sensor is arranged. Image: Sony Electronics

For traditional cameras it could improve autofocus systems by identifying and tracking subjects much more precisely. AI systems like this could also make autofocus more intelligent by identifying the areas of a picture that you are likely to be focussing on. For example, if you wanted to take a photograph of a flower, the AF system would know to focus on that rather than, say, the tree branch behind it. Facial recognition would also become much faster and more reliable.

Autofocus systems today are already becoming incredibly good, but if they were backed up by ultra-fast on-chip object identification they could be even better. For 360 cameras, too, more reliable object tracking metadata will help with reframing in post.

The new chip doesn’t have to output images. It could just output a description of what it sees. Image: Sony Electronics.

Why do we need AI on chip?

There are two main reasons for placing the AI capabilities directly on the chip. The first is that it makes processing much, much quicker. The Sony IMX500 is able to perform its analysis within the time of a single frame of video, rather than having to send that data along a pipeline to be processed elsewhere. The other advantage is higher security. Quite often data is sent to the cloud for AI image analysis. Having these systems on chip takes away that potential security loophole.

Cloud AI cannot be used offline either, and it also makes reliable realtime analysis harder to achieve. The energy use and cost of cloud computing are also increasing, and that’s not good for the environment.

In terms of small cameras like GoPros, it means that this type of processing doesn’t need to be performed by another chip elsewhere in the camera. This saves power, but it also means that the camera’s main processing chip and memory can be freed up to do other things, such as better electronic stabilisation or colour processing.

An example of realtime tracking at a shop counter. Note how it is tracking the shop assistant’s limb movements as well. Image: Sony Electronics

It’s only limited by your imagination

But the abilities of the new chip, which can be custom programmed by developers to do precisely what they need, are only limited by the imagination. Sony uses a car as one example of how it could be used, identifying the driver and adjusting the car’s seating position automatically. Another example is being able to recognise whether the driver is falling asleep.

For sports cameras it might be possible for the device to analyse your form during movement. If you want to improve your yoga or martial arts, for instance, it could help identify areas for improvement by comparing you to a ‘perfect’ example. Speech recognition from lip movement could potentially be made much faster, too, and be included in all cameras. For people filming drama, this would have huge potential when it comes to logging shots or matching them to a script, if the camera is outputting the actors’ performances in text form at the same time as it is recording the image.

The IMX500 looks like a highly capable chip from a pure video perspective, too. It is capable of 4K at up to 60fps and 1080p at up to 240fps, although currently the chip is restricted to 30fps when full video and AI processing run together.
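To put those frame rates into perspective, the short Python sketch below converts each quoted rate into the time available per frame. The frame rates are the ones quoted above; the mode labels are my own shorthand, and the rest is plain arithmetic rather than anything from Sony's specifications.

```python
# Frame rates quoted in the article; the labels are informal shorthand.
QUOTED_FPS = {"4K max": 60, "1080p max": 240, "video + AI": 30}


def frame_period_ms(fps):
    """Time available per frame, in milliseconds."""
    return 1000.0 / fps


for mode, fps in QUOTED_FPS.items():
    print(f"{mode}: {fps} fps -> {frame_period_ms(fps):.2f} ms per frame")
```

At 30fps the chip has roughly 33ms per frame in which to capture the image and run its analysis, which gives a sense of the time budget involved in doing the AI work within a single frame.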

All told, while this is only the first generation of the chip, you can expect this type of capability to be rolled out into other, more conventional chips as time moves along, and it is therefore a significant development worth covering.