More resolution, High Dynamic Range (HDR), higher frame rates… How exactly do the limits of the human visual system impact our technology?
When is enough, enough? This is one of the most interesting questions brought to mind after a discussion that came up at the recent ARRI Broadcast Day held at Die Fernsehwerft studio facilities in the new creative district of East Berlin Harbour. In a studio filled with international guests from across the industry, Marc Shipman-Mueller – ARRI’s head of camera systems – opened up this topic in a way that has permanently realigned the way I view today’s core developments in digital cinema acquisition, post and delivery.
These are purely my opinions based on an evolving thought process rooted in an exploration of new technology, and not an absolute truth. I may well end up being wrong, and am happy to learn things that may change my views expressed here. In fact, I’d love to open this up to you, and hear some of your thoughts, so I encourage you to leave comments.
The Holy Grail
A decade ago, matching the performance and aesthetic of 35mm film was the “Holy Grail” of digital cinema. We have now met and surpassed the challenges laid down at the birth of digital cinema, matching 35mm emulsion technically, if not in almost every other way as well. We are now far more concerned with the nuances of digital color science than with arguing over film vs. digital. That battle has been won and, for the most part, the result is accepted across the industry.
So, what’s next for digital cinema? And how much of it is commercially driven by the need to sell new TVs and consumer gadgets rather than by a meaningful improvement of the visual experience? Perhaps more than we’d like to think. On the other hand, after a few years of feeling through the darkness, some core standards are finally being defined and adopted throughout the industry.
Digital Cinema 2.0
If the initial phase in the evolution of digital cinema saw us match and surpass 35mm film, the second phase has shifted gear and direction entirely. It is no longer a question of imitating what came before, but imagining and realising an unknown future. Film as the holy grail of the cinematic experience was a fixed, known goal, but the question of where to head next has seen us charge down a few blind alleys only to backtrack and change course.
The Cart before the Horse?
Perhaps it is this very lack of a fixed, known goal on the horizon that has allowed us to follow technology for technology’s sake, to the detriment of both the art and the consumer of filmed entertainment. Television manufacturers have charged ahead, undeterred by the recent rise and fall of 3D TV, to UHD, SUHD, HDR and whatever other acronyms they can legitimately or illegitimately make up to dazzle consumers into parting with their cash. This was already happening before formal standards were agreed, while content creators found themselves lost for technical direction and content providers scrambled to launch their first UHD channels and services, despite the fact that there was very little UHD content to show.
UHD, High Dynamic Range (HDR), wide color gamut, high frame rate… We’ve thrown everything we can at the screen, when perhaps fine tuning a more measured combination of technologies could give the best results for both creators and audience.
Just because we can do something doesn’t mean it’s the best (or only) solution to the problem. We may, in fact, not even fully understand the problem.
The resolution war has quietened over the past couple of years. We all know 4K is here to stay, and we’d be foolish not to acknowledge that 8K has already made it over the horizon and is charging toward us. Granted, I don’t believe we will see 8K adopted as a mainstream delivery resolution anytime soon, but RED (and of course NHK and partners on the broadcast side of things) has made 8K acquisition a reality, and it’s actually not as impractical as the cynics would like to think.
However, simply increasing pixel resolution has some far-reaching implications.
More pixels at a higher color bit-depth means a lot more data. This increases bandwidth requirements on location, through post and, of course, delivery. The whole point of increasing pixel resolution is to increase perceived sharpness and detail in the image, although there are other ways to achieve this without the huge overhead.
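The bandwidth cost is easy to put rough numbers on. A back-of-the-envelope sketch in Python – the resolutions, bit depths and uncompressed rates here are illustrative assumptions, not any specific camera’s actual recording format:

```python
# Rough uncompressed video bandwidth, as a back-of-the-envelope sketch.
# All figures are illustrative assumptions, not a real recording format.

def raw_bandwidth_gbps(width, height, bit_depth, fps, channels=3):
    """Uncompressed data rate in gigabits per second."""
    bits_per_frame = width * height * channels * bit_depth
    return bits_per_frame * fps / 1e9

hd      = raw_bandwidth_gbps(1920, 1080, 10, 24)    # 1080p, 10-bit, 24fps
uhd     = raw_bandwidth_gbps(3840, 2160, 10, 24)    # UHD quadruples the pixels
uhd_hfr = raw_bandwidth_gbps(3840, 2160, 12, 120)   # add bit depth and frame rate

print(f"1080p/10-bit/24fps : {hd:6.2f} Gbit/s")
print(f"UHD/10-bit/24fps   : {uhd:6.2f} Gbit/s")
print(f"UHD/12-bit/120fps  : {uhd_hfr:6.2f} Gbit/s")
```

Quadrupling the pixel count alone quadruples the raw data rate; stack higher bit depth and frame rate on top and the multiplier compounds quickly.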
There is a difference between perceived resolution and actual pixel resolution, and it is an important one to draw. The only thing that matters to the viewer is “perceived” sharpness and detail and, as such, it is imperative to bring the response of human vision into the equation when weighing the pros and cons of different technologies, advancements and their implications. Quite often, solving one issue by brute force – such as a 400% increase in actual pixel count – creates other problems.
All too often in these discussions it seems we don’t acknowledge how we see, process and perceive the illusion of moving images at all.
Resolution and Human Vision
Only a small region of our retinas packs cone cells at a very high density. This area is called the “fovea” and, while it comprises less than 1% of the retina, it takes up over 50% of the visual cortex in the brain. Only the central two degrees of the visual field fall on the fovea – approximately twice the width of your thumbnail held at arm’s length.
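That two-degree window is easy to translate into physical sizes on a screen. A quick sketch – the viewing distances are assumptions for illustration, while the two-degree figure is the one quoted above:

```python
import math

# Width of the fovea's ~2-degree window at assumed viewing distances.

def foveal_width_cm(distance_cm, degrees=2.0):
    """Width of the foveal field at a given viewing distance, in cm."""
    return 2 * distance_cm * math.tan(math.radians(degrees / 2))

print(f"arm's length (70 cm): {foveal_width_cm(70):.1f} cm")   # ~ a thumbnail pair
print(f"sofa to TV (300 cm) : {foveal_width_cm(300):.1f} cm")  # ~10 cm of screen
```

Even at a typical living-room distance, only around ten centimetres of the screen is ever in full foveal detail at any instant.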
When you observe in detail an object which occupies more than this central two degrees of your field of vision, your eye must scan the area and your brain fills in the gaps. Your peripheral field of view is effectively of far lower resolution, but your brain is able to construct the overall image of the scene in front of you as your eye passes over it. You remember and recall detail in parts of the scene after your eye has moved on, even when in motion, which is why you maintain real-time visual awareness over your entire field of view.
What you are seeing, however, is a constantly updated construct processed in your brain rather than the exact “real” projected image your eyes are seeing. You are perceiving a computationally constructed reality all the time, something far more complex and nuanced than a camera recording the entirety of an image focussed on an image sensor at a fixed, uniform pixel density and frame rate.
This is why your eyes scan the screen as you read this: to see full detail on another part of the screen, you have to move your eyes and read it again. You are aware of the rest of the text and the overall layout and spatial position of everything else in your field of view, but to actually read the text you have to physically move your eyes.
A moving image on a screen is an illusion. It is a trick that our brain perceives as a fluid moving image. We should be careful not to break the illusion, which is more fragile than you might think.
Pixel resolution is not the only measure of sharpness. Mathematically, there is no doubt that increasing the pixel count gets the job done, but beyond a certain threshold the perceptual returns diminish towards zero while the data rate keeps multiplying.
Our perception of detail and sharpness is based intrinsically on contrast. It is edge contrast especially which allows our brain to separate and recognise one object from another. It is also contextual to the rest of the image, is not necessarily uniform – especially when in motion – and is largely informed by contrast in brightness and color. This is why simply increasing pixel resolution over the entire image is quite possibly a wasteful brute force solution to a problem which can be better addressed by feeding our visual system with just the extra information it needs, exactly where it needs it to achieve the desired illusion.
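As an illustration of how edge contrast, rather than pixel count, drives perceived sharpness, here is a minimal 1D unsharp-mask sketch in plain Python; the signal and the amount are made up for demonstration:

```python
# Boosting edge contrast without adding pixels: a 1D unsharp-mask sketch.
# The step-edge signal and the sharpening amount are illustrative only.

def blur3(signal):
    """Simple 3-tap box blur with clamped edges."""
    padded = [signal[0]] + list(signal) + [signal[-1]]
    return [(padded[i] + padded[i + 1] + padded[i + 2]) / 3
            for i in range(len(signal))]

def unsharp(signal, amount=1.0):
    """sharpened = original + amount * (original - blurred)"""
    blurred = blur3(signal)
    return [s + amount * (s - b) for s, b in zip(signal, blurred)]

edge = [0, 0, 0, 1, 1, 1]          # a soft step edge
print(unsharp(edge, amount=1.0))   # over/undershoot appears around the step
```

The overshoot and undershoot added around the step is exactly the kind of edge-contrast boost our visual system reads as “sharper”, at no cost in pixel count.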
One of the practical examples shown by Marc Shipman-Mueller at the ARRI Broadcast Day was a side-by-side comparison between 1080p SDR and UHD HDR, followed by 1080p HDR and UHD HDR on calibrated monitors in a darkened room. The difference between 1080p SDR and UHD HDR was clear, but the 1080p HDR and UHD HDR which followed immediately afterwards was not.
After repeated viewings, everyone agreed that, compared to the 1080p SDR, the 1080p HDR provided 90% of the perceived increase in detail achieved by quadrupling the pixel count, when sitting at the same (normal) viewing distance from the screen. That’s a lot of wasted pixels.
Judder
It’s time to address the elephant in the room: judder. We’ve all seen it – it’s arguably the biggest challenge we currently face in maintaining the illusion of motion, occurring when our brains fail to smoothly connect and integrate individual frames.
But, what exactly causes judder?
Judder is caused when our brains fail to smoothly connect movement of objects with bright, high-contrast edges across individual frames. The reason it is more of a problem for us in the context of HDR is the higher display brightness, but it comes down to a combination of factors:
- the ratio of the area of moving detail compared to the total field of view,
- speed of motion,
- contrast in the detail,
- sharpness of the detail,
- brightness level of the display.
Display brightness is not really a root cause, but it definitely makes the situation worse. The lower the overall image brightness, the less obvious the effect, and at SDR-display levels, judder has typically been hidden by motion blur at film and video frame rates.
However, at HDR contrast ratios and display backlight levels, it is no longer possible to hide these artefacts.
So how do we solve this problem? Read on.
Persistence of Vision and Frame Rates
In the real world, our eyes feed our brains with a constant stream of image information. We don’t see in discrete frames or fixed images sampled over time. However, the illusion of motion we perceive when we view a series of still images in quick succession is caused by what we call the persistence of vision: the retina is able to hold an image for about 40ms.
The 24fps standard came about as the minimum acceptable frame rate for the perception of smooth motion: high enough for the illusion to hold, but low enough to be economical.
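The arithmetic behind that minimum is simple. A quick sketch, using the approximate 40ms retention figure quoted above:

```python
# Frame interval vs the ~40ms retinal retention figure quoted above.
RETENTION_MS = 40  # approximate figure from the text

for fps in (24, 48, 120):
    interval_ms = 1000 / fps
    print(f"{fps:3d} fps -> {interval_ms:5.1f} ms between frames")
```

At 24fps each frame lasts roughly 41.7ms, sitting right at the edge of that ~40ms window – which is consistent with the illusion holding, but only just.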
The reason 24fps still feels cinematic is largely due to historical and cultural reference. In other words, it’s the way it has always been. The subtle, almost imperceptible flicker informs us to expect big-screen Hollywood budgets, award-winning performances and high production and artistic value compared to the lower expectations of television and video.
This has been fine for more than 100 years of motion pictures, for film, television, all the way up to what I consider the end of “Digital Cinema 1.0”, where we have now matched 35mm film in resolution and color. But whether on a small screen or cinema projection, our viewing experience has remained within the threshold of acceptable contrast, color and brightness for the illusion of motion to hold at traditional frame rates.
With high-resolution images, high dynamic range and brighter displays comes the need for higher frame rates. Simply put, we perceive less judder at higher frame rates because the relative motion of objects is reduced between frames. Areas of very high detail, high contrast and fast motion could require very high frame rates to be perceived without any judder.
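The effect of frame rate on the size of each per-frame jump is easy to sketch; the pan speed here is an assumption for illustration:

```python
# Why higher frame rates reduce judder: a moving object's jump between
# consecutive frames shrinks as the frame rate rises.

def jump_per_frame(speed_px_per_sec, fps):
    """Displacement of an object between consecutive frames, in pixels."""
    return speed_px_per_sec / fps

SPEED = 3840  # assumed: a pan crossing a UHD-wide frame in one second

for fps in (24, 48, 120):
    print(f"{fps:3d} fps: {jump_per_frame(SPEED, fps):.0f} px jump per frame")
```

The same pan that jumps 160 pixels per frame at 24fps jumps only 32 at 120fps, giving the brain far less work to do in stitching the frames together.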
There are other advantages to high frame-rate acquisition. Shooting at 120fps allows motion blur to be interpolated in post, and makes it easy to master to any number of lower frame rate delivery requirements.
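Part of what makes 120fps attractive as a mastering rate is pure arithmetic: it divides evenly into the common delivery rates, so conforming down can be simple frame decimation (or blending). A sketch:

```python
# 120 fps divides evenly into the common delivery rates, so mastering
# down can be straightforward frame decimation (or blending).

MASTER_FPS = 120

for target in (24, 30, 60):
    step = MASTER_FPS // target
    print(f"{target} fps: keep every {step}th frame "
          f"(remainder {MASTER_FPS % target})")

# e.g. decimating a short run of frame indices down to 24 fps:
frames = list(range(10))
kept = frames[::MASTER_FPS // 24]
print(kept)  # [0, 5]
```

By contrast, conforming 48fps material to 30fps would need non-integer pulldown, which is messier.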
There are also disadvantages, the first being obvious: like resolution, increasing frame rate drastically increases the required data bandwidth to capture, store and process moving images.
Super realism and the soap-opera effect
The second disadvantage is one being faced by some of the first filmmakers to experiment with high frame rates. Many critics of Peter Jackson’s decision to shoot and master The Hobbit at 48fps – and, more recently, of Ang Lee’s Billy Lynn’s Long Halftime Walk in 3D at 120fps – claim that the hyper-real effect of the crystal-clear, super-sharp look, while technically impressive, is unwatchable and un-cinematic.
It is not the illusion of motion that is broken, but the dream-like quality that absorbs you into a story and removes you from reality. It’s the ethereal experience that is broken, and this, I would argue, is perhaps more important to the experience than 4K 1000-nit displays.
Of course we may just need to get used to a new aesthetic.
Maybe the ideal solution is not to hit the entire image with the sledgehammer of 4K or 8K resolution at 120fps. Could there be a more targeted approach where we increase detail and frame rate in selected parts of the image only? Remember, you only see the central two degrees of your entire field of vision in high detail, and even while watching a screen you are constantly scanning, moving your eyes from point of interest to point of interest. Shouldn’t we be thinking more about how we will actually experience and perceive the images we create?
Targeting our solutions to human perception means understanding that our brains are already doing an awful lot of sophisticated real-time compression and filtering. Our retinas are part of our central nervous system, and there are about 150 million receptors in the eye and only 1 million optic nerve fibres. The retina spatially encodes (compresses) the image to fit the limited capacity of the optic nerve.
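Those numbers imply a striking compression ratio before the signal even leaves the eye:

```python
# The spatial "compression ratio" implied by the figures above:
# ~150 million receptors funnelled into ~1 million optic nerve fibres.
receptors = 150_000_000
fibres = 1_000_000
print(f"~{receptors // fibres}:1 spatial compression before the optic nerve")
```

Roughly 150:1, before the visual cortex has done any of its own filtering at all.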
Your brain is dynamically processing a constant stream of narrowly-focused high-res image information from the fovea along with a low-res wider field of view, augmented with extra detail from memory recall, to build what we know and see as our world around us.
If your perception of reality is already based on your eyes and brain filtering out a ton of unnecessary extra information, and passing on mostly just what changes from moment to moment (not unlike the spatial and temporal compression we already use for digital video), maybe we only need to present extra information where it matters – where it will be seen and affect our perception of the image.
Perhaps in this way we can enhance how we experience cinema, adding a sense of enhanced realism and clarity, but stepping back from the point where we disengage from the dream.
Sure, 4K and 8K HDR, HFR, wide color gamut images on a massive cutting-edge OLED display might make you feel like you’re looking through a window to a real-world scene on the other side. But is reality really what we want from our storytelling?
I don’t think so.
Let us know what you think of these technologies, trends and the impact on how we tell and experience cinema in the comments below.