The new Sony IMX500 Intelligent Vision series of image sensors contains AI image analysis systems directly on the chip, which opens up some new and faster capabilities for cameras.
The announcement describes two new Intelligent Vision CMOS chip models, the Sony IMX500 and IMX501. From what I can tell these are the same base chip, except that the 500 is the bare chip product, whilst the 501 is a packaged product.
They are both 1/2.3” type chips with 12.3 effective megapixels. It seems clear that one of the primary markets for the new chip is security and system cameras. However, having AI processes on the chip offers up some exciting new possibilities for future video cameras, particularly those mounted on drones or in action cameras like a GoPro or Insta360.
What can the Sony IMX500 sensor do?
One prominent ability of the new chip lies in functions such as object or person identification. This could mean tracking such objects, or actually identifying them. Output from the new chip doesn’t have to be in image form either. It can output metadata instead, simply sending a description of what it sees without the accompanying visual image. This can reduce the data storage requirement by a factor of up to 10,000.
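To get a feel for why metadata-only output saves so much storage, here is a minimal Python sketch that compares the size of one uncompressed frame with the size of a small description of what is in it. Everything in it is an assumption for illustration: the 10 bits per pixel, the JSON layout, and the field names are hypothetical and are not Sony's actual output format. The ratio you get depends entirely on those assumptions, so it should not be read as a reproduction of the 10,000 times figure.

```python
import json

# Illustrative assumptions only: 12.3 million pixels stored uncompressed at
# 10 bits per pixel. The real sensor output format may differ.
PIXELS = 12_300_000
BITS_PER_PIXEL = 10


def frame_size_bytes(pixels: int = PIXELS, bits_per_pixel: int = BITS_PER_PIXEL) -> int:
    """Size of one uncompressed frame in bytes."""
    return pixels * bits_per_pixel // 8


def metadata_size_bytes(detections: list) -> int:
    """Size of a compact JSON description of the detections in bytes."""
    return len(json.dumps(detections, separators=(",", ":")).encode("utf-8"))


def reduction_factor(image_bytes: int, metadata_bytes: int) -> float:
    """How many times smaller the metadata is than the image."""
    return image_bytes / metadata_bytes


# Hypothetical metadata for a single detected person (not a real IMX500 payload).
detections = [
    {"label": "person", "confidence": 0.91, "box": [412, 230, 655, 910]},
]

image_bytes = frame_size_bytes()
meta_bytes = metadata_size_bytes(detections)
print(f"frame: {image_bytes} bytes, metadata: {meta_bytes} bytes")
print(f"reduction: roughly {reduction_factor(image_bytes, meta_bytes):,.0f}x")
```

In practice the saving would be smaller against compressed images and would shrink as more objects are described per frame.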
For security or system camera purposes, a camera equipped with the new chip could count the number of people passing by it, or identify low stock on a shop shelf. It could even be programmed to map customer behaviour by way of heat maps.
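As a rough idea of how a people counter might work on the receiving end, the Python sketch below counts distinct people from a stream of per-frame detection metadata. The metadata format, including the `label` and `track_id` fields, is hypothetical and assumes the sensor or its firmware assigns a stable ID to each tracked object. It is a sketch of the idea, not the way Sony's software does it.

```python
def count_people(frames):
    """Count distinct tracked people across a sequence of frames.

    Each frame is a list of detections, and each detection is a dict with a
    'label' and a 'track_id' (both hypothetical field names).
    """
    seen_ids = set()
    for detections in frames:
        for detection in detections:
            if detection["label"] == "person":
                # The same person appears in many frames, so count each ID once.
                seen_ids.add(detection["track_id"])
    return len(seen_ids)


# Three frames of made-up metadata: person 1 stays in shot, person 2 walks in,
# and a dog is ignored.
frames = [
    [{"label": "person", "track_id": 1}],
    [{"label": "person", "track_id": 1}, {"label": "person", "track_id": 2}],
    [{"label": "dog", "track_id": 3}],
]

print(count_people(frames))  # 2
```

Because only these small records need to leave the camera, no images of the passers-by have to be stored or transmitted for the count to work.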
For traditional cameras it could make autofocus systems better by identifying and tracking subjects much more precisely. AI systems like this could also make autofocus more intelligent by identifying the areas of a picture that you are likely to be focussing on. For example, if you wanted to take a photograph of a flower, the AF system would know to focus on that rather than, say, the tree branch behind it. Facial recognition would also become much faster and more reliable.
Autofocus systems today are already becoming incredibly good, but if they were backed up by ultra-fast on-chip object identification they could be even better. For 360 cameras, too, more reliable object tracking metadata will help with reframing in post.
Why do we need AI on chip?
There are two main reasons for placing the AI capabilities directly on the chip. The first is that it makes processing much, much quicker. The Sony IMX500 is able to perform its analysis within the time of a single frame of video, rather than having to send that data along a pipeline to be processed elsewhere. The other advantage is higher security. Quite often data is sent to the cloud for AI image analysis. Having these systems on chip takes away that potential security loophole.
Cloud AI cannot be used offline either, and in addition it restricts the ability to perform analysis reliably in real time. The energy use and cost of cloud computing are also increasing, and that’s not good for the environment.
In terms of small cameras like GoPros, it means that this type of processing doesn’t need to be performed by another chip elsewhere in the camera. This saves power, but it also means that the camera’s main processing chip and memory can be freed up to do other things, such as better electronic stabilisation or colour processing.
It’s only limited by your imagination
But the abilities of the new chip, which can be custom programmed by developers to do precisely what they need, are only limited by the imagination. Sony uses a car as one example of how it could be used, identifying the driver and adjusting the car’s seating position automatically. Another example is being able to recognise whether the driver was falling asleep.
For sports cameras it might be possible for the device to identify your form during movement. If you want to improve your yoga or martial arts, for instance, it could help identify areas for improvement by comparing you to a ‘perfect’ example. Speech recognition from lip movement could also potentially be made much faster, and be included in all cameras. For people filming drama, this would have huge potential when it comes to logging shots or matching them to a script, if the camera is outputting actors’ performances in text form at the same time as it is recording the image.
The IMX500 looks like a highly capable chip from a pure video perspective, too. It is capable of 4K at up to 60fps and 1080p at up to 240fps, although currently the chip is restricted to 30fps for full video and AI processing together.
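To put those frame rates in context, the short Python sketch below works out how long a single frame lasts at each rate, which is the window any per-frame analysis has to fit into. This is simple arithmetic on the frame rates quoted above, not a latency figure published by Sony, and the function name is just for illustration.

```python
def frame_budget_ms(fps: float) -> float:
    """Duration of one frame in milliseconds at the given frame rate."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    return 1000.0 / fps


for fps in (30, 60, 240):
    print(f"{fps} fps -> {frame_budget_ms(fps):.2f} ms per frame")
```

At 30fps that works out at roughly 33 milliseconds per frame, against about 4 milliseconds at 240fps.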
All told, while this is only the first generation of the chip, you can expect this type of capability to be rolled out into other, more conventional chips as time moves along, which makes it a significant development worth covering.
Do you have any ideas for a camera feature that on-chip AI could make possible? Let us know in the comments below!