Converting HDR to SDR with FFmpeg

HDR (high dynamic range) videos encoded with the PQ (perceptual quantizer) transfer function, as in HDR10, use the BT.2020 colourspace. Its advantage is that it can display colours with higher luminance than the BT.709 space used for high-definition television broadcasting and Blu-ray discs. BT.2020-capable displays can handle a much wider range of colours and, depending on the technology used, can produce darker blacks and brighter whites. But what happens if you attempt to play an HDR video on a normal monitor or television with only Rec. 709 capabilities? You might have guessed it: the colours look weak and washed out.

The following comparison shows the perceived effect. The image on the right has been tone-mapped and simulates a still rendered in the BT.2020 colourspace and viewed on an HDR screen. The image on the left is what you will see when viewing it on an SDR screen or monitor. These images are meant for viewing on a Rec. 709-capable display. If you are using an HDR monitor, you might not notice any difference at all.

The following diagrams illustrate the difference between the two colourspaces:

Rec. 709

Rec. 2020

So what is going on here? We are encountering colours in the Rec. 2020 gamut which simply don’t exist in Rec. 709. You might expect these colours to be clipped at the maximum brightness of the display, but this doesn’t happen. Out-of-gamut colours are moved back somewhere into the mid-range of the Rec. 709 gamut in an uncontrolled fashion. That is why they appear dull and with poor contrast. The above image suffers severely, especially in the blue-green and cyan tones.
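To make "colours which don't exist in Rec. 709" concrete, here is a minimal Python sketch that converts linear-light RGB from BT.2020 primaries to BT.709 primaries. The matrix holds the commonly quoted coefficients rounded to four decimals; the helper names `bt2020_to_bt709` and `in_gamut` are made up for this illustration and are not part of FFmpeg.

```python
# Linear-light RGB conversion from BT.2020 primaries to BT.709 primaries.
# Commonly quoted coefficients, rounded to four decimals (each row sums to ~1).
BT2020_TO_BT709 = (
    ( 1.6605, -0.5876, -0.0728),
    (-0.1246,  1.1329, -0.0083),
    (-0.0182, -0.1006,  1.1187),
)

def bt2020_to_bt709(rgb):
    """Convert one linear RGB triple from BT.2020 to BT.709 primaries."""
    return tuple(sum(m * c for m, c in zip(row, rgb)) for row in BT2020_TO_BT709)

def in_gamut(rgb, eps=1e-3):
    """True if every component lies within [0, 1], allowing for rounding."""
    return all(-eps <= c <= 1.0 + eps for c in rgb)

white = bt2020_to_bt709((1.0, 1.0, 1.0))  # neutral colours are left unchanged
green = bt2020_to_bt709((0.0, 1.0, 0.0))  # fully saturated BT.2020 green

print(white, in_gamut(white))
print(green, in_gamut(green))
```

White stays inside the [0, 1] range, while the saturated green ends up with negative red and blue components and a green component above 1. An 8-bit Rec. 709 signal has no way to represent such a value, so something has to decide what to do with it.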

There are two ways to remedy this problem, and both involve tone-mapping operators. You can either apply an output filter before rendering the content to the screen, or re-encode the video source to the target colour space. Re-encoding is useful in situations where the display doesn’t apply a filter at all: some early 4K TVs did not include HDR support and thus rely on content in Rec. 709.
The opposite process is called inverse tone-mapping and is used to properly render Rec. 709 content in the Rec. 2020 colour space.

Tone-mapping operators can be thought of as sophisticated functions designed to compress the dynamic range of a source so that it fits an output device with a smaller dynamic range. This is different from using LUTs, as a LUT is ideally designed for a single type of output device and, in the case of print, even a specific type of substrate.

Two types of operators exist.

Global operators map each HDR value to a compressed colour value using the same function for every pixel, independently of its neighbours. The main advantage of this method is performance, as it can usually be done in real time. However, it is possible to lose some detail in extremely bright or dark scenes.

Local operators can be applied to a greater range of HDR sources, as they are based on the idea that human perception of variations in brightness is limited to small regions and not the entire image. Usually they apply a small radial filter, which also modifies the colour values of neighbouring pixels. This can lead to halo effects around finer details, which have to be compensated for.
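As a rough illustration of a global operator, the sketch below applies the basic Reinhard curve x / (1 + x) to a tiny grid of values. The pixel values are invented for the example and the function names are not taken from any library; real implementations work on full frames and usually on luminance or per-channel RGB.

```python
def reinhard(x):
    """Basic Reinhard curve: maps [0, inf) into [0, 1)."""
    return x / (1.0 + x)

def tonemap_global(pixels, curve=reinhard):
    """Apply the same curve to every value; no pixel looks at its neighbours."""
    return [[curve(v) for v in row] for row in pixels]

# Hypothetical linear-light values, where 1.0 stands for SDR reference white.
hdr = [[0.05, 0.5, 1.0],
       [2.0, 4.0, 10.0]]

sdr = tonemap_global(hdr)
for row in sdr:
    print(["%.3f" % v for v in row])
```

Because each output depends only on its own input, the loop is trivially fast and parallel. The cost is visible in the last row: inputs of 4.0 and 10.0 land at roughly 0.8 and 0.91, so a large difference in the highlights is squeezed into a small one, which is the loss of detail mentioned above.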

FFmpeg is the application I used to convert BT.2020 to BT.709 by applying different built-in tone-mapping operators. It’s fast, open-source and rather simple to use. The commands below extract PNG stills from a video source, but once you are happy with the results you can, for example, pipe the output directly into your encoder of choice.

ffmpeg.exe -i input.mkv -vf select=gte(n\,360) -vframes 1 output.png
This instructs ffmpeg to select frame number 360 (frame numbering starts at 0) and write a single frame to a PNG file. This image was also used as a reference for the comparison between the different filters.

ffmpeg.exe -i input.mkv -vf select=gte(n\,360),zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable,zscale=t=bt709:m=bt709:r=tv,format=yuv420p -vframes 1 output.png

To understand what the above command does, think of the -vf argument as a filter chain: the output of each filter is passed to the next comma-separated filter, and the final entry sets the output format.
The first step linearises the source and converts it from YUV to 32-bit floating-point RGB; the colour primaries are still BT.2020 at this point. I also set the nominal peak luminance to 100 nits, as SDR TVs are normally in the range of 100-200 nits. The next zscale filter converts the primaries to BT.709. This pushes some colours out of gamut, which is why it is important to work in 32-bit floating point: the colour information isn’t clipped and lost, it is just out of range. At this point I apply the hable tone-mapping filter.
The final zscale applies the BT.709 transfer function and matrix, and the last step converts the format back to 8-bit YUV 4:2:0. Check below to see the difference.
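For readers curious what the hable operator does to a value, here is a small Python sketch of Hable's filmic curve using the constants published for Uncharted 2, compared with plain clipping. The white point of 11.2 is Hable's original choice and is an assumption here; FFmpeg's own implementation normalises against the signal peak and may differ in detail.

```python
def hable_raw(x):
    """Hable's filmic curve with the constants published for Uncharted 2."""
    a, b, c, d, e, f = 0.15, 0.50, 0.10, 0.20, 0.02, 0.30
    return (x * (a * x + c * b) + d * e) / (x * (a * x + b) + d * f) - e / f

def hable(x, white=11.2):
    """Normalise so that the chosen white point maps to exactly 1.0."""
    return hable_raw(x) / hable_raw(white)

def clip(x):
    """Hard clipping to the displayable range, for comparison."""
    return min(max(x, 0.0), 1.0)

# Linear-light inputs, where 1.0 stands for SDR reference white (assumed scale).
for x in (0.18, 1.0, 4.0, 11.2):
    print(f"{x:5.2f}  hable={hable(x):.3f}  clip={clip(x):.3f}")
```

Clipping sends everything above 1.0 to the same value, whereas the Hable curve keeps rolling off smoothly up to the white point, so highlights retain some separation at the cost of a darker mid-range.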

The only problem here is that clipped colours will be shifted into white. You will not notice this effect in the original. That is why you should set the desaturation option (desat) of the tonemap filter, which keeps the clipped colours in gamut instead of letting them wash out. Look at the red and orange lights in the comparison.

ffmpeg.exe -i input.mkv -vf select=gte(n\,360),zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,format=yuv420p -vframes 1 E:\output.png

There are also other tone-mapping algorithms which you may use (clip, gamma, linear, mobius, reinhard). Their function is explained in more detail on the FFmpeg filters page. I didn’t use them because they seem to produce results that are either too dark or too bright, and some even oversaturate certain colours. The images below are compared to the hable filter:






Finally, to output the colour-corrected stream to an encoder such as x265, make a few modifications to the command. For example, to encode an MKV file using a constant rate factor of 18.0 and the “slower” preset:

ffmpeg.exe -i input.mkv -vf zscale=t=linear:npl=100,format=gbrpf32le,zscale=p=bt709,tonemap=tonemap=hable:desat=0,zscale=t=bt709:m=bt709:r=tv,format=yuv420p -c:v libx265 -crf 18 -preset slower output.mkv
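If you want to script this rather than type it, the following Python sketch assembles the same kind of command as an argument list, one filter stage per line. Nothing is executed. The function name, its parameters and the optional `desat` argument are hypothetical conveniences for this example, and the filter options should be checked against your own FFmpeg build (zscale needs a build with zimg support).

```python
import shlex

def build_ffmpeg_args(src, dst, crf=18, preset="slower", desat=None):
    """Return an argument list for a tone-mapped x265 encode (nothing is run)."""
    tonemap = "tonemap=tonemap=hable"
    if desat is not None:
        tonemap += f":desat={desat}"      # optional highlight desaturation setting
    stages = [
        "zscale=t=linear:npl=100",        # linearise, 100-nit nominal peak
        "format=gbrpf32le",               # 32-bit float planar RGB
        "zscale=p=bt709",                 # convert primaries to BT.709
        tonemap,                          # Hable tone-mapping operator
        "zscale=t=bt709:m=bt709:r=tv",    # BT.709 transfer and matrix, TV range
        "format=yuv420p",                 # back to 8-bit YUV 4:2:0
    ]
    return ["ffmpeg", "-i", src, "-vf", ",".join(stages),
            "-c:v", "libx265", "-crf", str(crf), "-preset", preset, dst]

args = build_ffmpeg_args("input.mkv", "output.mkv")
print(shlex.join(args))
```

Keeping the stages in a list makes it easy to swap the tone-mapping algorithm or the preset for a test run, and the resulting list can be handed to `subprocess.run` once the printed command looks right.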

It makes sense to tune encoding speed to the available CPU resources. The tone-mapping algorithm places 100% load on a single CPU core. Running our tone-mapping setup with null output results in an average speed of about 4.3 frames per second on an AMD Ryzen 7 1800X CPU running at 3.8 GHz. This leaves 7 cores free for encoding. The optimal choice is an encoder preset that results in an overall rate at or just below 4.3 frames per second, as a faster preset cannot speed up the pipeline beyond what the tone-mapping step delivers. Check out my other guide on encoding to HEVC for advice.
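To get a feel for what 4.3 frames per second means in practice, here is a small back-of-the-envelope calculation in Python. The two-hour, 24 fps source is a made-up example and the function name is only for illustration.

```python
def encode_hours(duration_s, source_fps, pipeline_fps):
    """Wall-clock hours needed when the whole pipeline runs at pipeline_fps."""
    total_frames = duration_s * source_fps
    return total_frames / pipeline_fps / 3600.0

# Hypothetical source: a two-hour film at 24 frames per second.
hours = encode_hours(duration_s=2 * 3600, source_fps=24, pipeline_fps=4.3)
print(f"{hours:.1f} h")
```

With these assumed numbers the job takes a little over 11 hours regardless of how fast the encoder could go on its own, so the spare cores are best spent on a slower, higher-quality preset.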