The Protocol Wars: How to win and enhance User eXperience – Part 1
I thought it was time to write a blog about a topic I have been talking about quite a lot: optimizing User eXperience. In my opinion, it doesn’t matter how good your benchmarks or metrics are; at the end of the day it is all about how happy a user is with your nicely built system and what their perceived UX is.
In Part 1 of this series we will deep-dive into the basics of the modern (video) codecs used by vendors like Citrix, VMware or Nutanix (Frame). In the next parts we will look at why a GPU helps with a video codec and how to optimize the User eXperience for different scenarios.
To know how to optimize the User eXperience it is good to have a (basic) understanding of the methods behind encoders. For example, the HDX3DPro encoder of Citrix uses a video codec called H264, while the “For actively changing regions” setting uses a combination of JPEG and H264. But if you don’t know the strong and weak spots of those codecs, then it can be hard to select the right codec for a certain situation. So let’s start with digging into the basics of those codecs!
If you read the documentation of the different remoting solutions, for example the Citrix documentation about HDX3DPro, you will see many different terms:
If you Google those terms you will need days to read all the articles written about them. In this article I will describe the most-used ones and the ones you really need to know about, hopefully in a simple and understandable way, because those terms are the basics of how modern encoders work.
Let’s start with Colors! Most of us see them every day, from the moment we wake up till the moment we go to sleep.
But did you know that the color I see, if I look for example at grass, doesn’t have to be the same color you will see? Yes, I’m serious, colors are a perception!
“We never really perceive what color really is, as it physically is. This fact makes color the most relative medium in art.” Josef Albers
The perception of Colors is a combination of the way our eyes and brain are built and work. And this is different for every person, as our DNA is different per person.
But perception is not only created by the way we are built, it is also shaped by the way our brain is wired to process input. And this can lead to very strange things. For example, look at the picture below: do you see the “squared shape” in the middle of the Gradient background? Do you think the Colors on the far ends are the same?
No? And what if I removed the background a bit?
We perceive Color by its surroundings. This is even better illustrated in the images below.
The reason why we see a different color in the first picture is that our brain processes B differently because it is in the shadow. This phenomenon is called an “Optical Illusion”.
Color is nothing more than electromagnetic radiation at a certain wavelength, which is measured in nanometers. The electromagnetic spectrum also contains X-rays, infrared, UV and even the microwaves you use to heat your food.
When you look at the figures of the different colors humans can see you will notice that compared to other animals we only see a small fraction of what the world could show us.
This has everything to do with the cone cells in our eyes: we only have 3 types of them, and we only see the colors in the 380 to 740 nm range.
So humans can only perceive around 10 million different colors, but have you noticed that we even talk about a billion colors in the digital world? Why that is will be answered in a few minutes.
“Pigeons are stupid because they shit on everything, but they can see 100 million colors.” Rody Kossen, during a TeamRGE presentation about Colors
An important thing to remember, especially when we look at compression later on, is that our brain and eyes are not well trained to see differences in color; they are better trained to see differences in brightness. If we compared two types of green which are just slightly different, we would only notice the difference in color when they are directly next to each other.
The Bit-depth is a description of the number of colors that can be displayed. But it is used in 2 ways, which can be a “bit” confusing:
- The number of bits used to indicate the color of a single pixel, for example 24 bit.
- The number of bits used for each color component of a single pixel (Red, Green, Blue), for example 8 bit. Sometimes it is also combined with Alpha, which describes transparency.
The most commonly used descriptions for Bit-depth are:
- High Color or 16 bit Color = 5 bits per color + 1 unused bit = 32,768 colors (2^15)
- True Color or 24 bit Color + 8 bit Alpha = 8 bits per color = 16,777,216 colors (2^24)
- Deep Color or 30 bit Color + 10 bit Alpha = 10 bits per color = 1.073 billion colors (2^30)
Huh, wait.. what? 1 Billion colors? But we can only see 10 million colors!
That is still true. However, if a display can only create 10 million colors we will see a new issue: Color Banding.
The reason for 1 billion colors is that with so many more colors available, the transition from, for example, light green to green is much smoother. This results in much better image quality and gets rid of the banding issue.
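To put some numbers on the banding story, here is a small Python sketch. It only does the arithmetic: how many colors a given bit-depth can describe, and how wide each visible "band" becomes when a single-channel gradient is stretched across the screen. The 3840-pixel width is my own assumption (a gradient spanning a 4K-wide display), not a value from any vendor documentation.

```python
def total_colors(bits_per_channel: int, channels: int = 3) -> int:
    """Number of distinct colors for the given bits per color component."""
    return 2 ** (bits_per_channel * channels)


def gradient_steps(bits_per_channel: int, width_px: int):
    """Return (shades per channel, average band width in pixels) for a
    single-channel gradient stretched across width_px pixels."""
    levels = 2 ** bits_per_channel
    return levels, width_px / levels


# Assumption: the gradient spans a 3840-pixel-wide (4K) display.
for bits in (5, 8, 10):
    levels, band = gradient_steps(bits, width_px=3840)
    print(f"{bits:>2} bits/channel: {total_colors(bits):>13,} colors, "
          f"{levels:>4} shades per channel, bands about {band:.2f} px wide")
```

With fewer shades per channel, each shade has to cover more pixels, and that is exactly the stair-step effect we call banding.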
The Color Gamut describes the subset of colors which can be displayed in relation to the human eye. The Color Gamut is often displayed in a graph form like this:
The numbers around the shape (from 380 up to 700) describe the wavelength of the color in nanometers. The x and y axes are chromaticity coordinates, which together describe the Hue and Saturation of a color.
There are a few standardized gamuts which you might recognize:
- Rec.709 (Commonly used on Blu-ray and Television broadcasting)
- Rec.2020 (Ultra HD Blu-ray)
- Adobe RGB
In the graph, you can clearly see the big difference between Rec.709 and Rec.2020. But even Rec.2020 doesn’t contain all the colors the human eye can see. At the moment it is just way too expensive and difficult to reproduce every color with a display. An important thing with Color Gamut is that content is displayed in the gamut it was made for: if you view a Rec.709 input on a display set to Rec.2020, the colors will be way off. This is also important when you are a photographer and want to print your image on a printer, as a mismatch of gamut could lead to a very strange result.
An important part of displaying colors on a digital screen is to take into account that most screens do not display colors correctly. For example, most television screens display colors way too saturated. You can see this at most electronics stores, where they show you the brightest and most saturated pictures of flowers, etc. This is just because a television showing a “dull” picture doesn’t sell very well.
To make sure that a screen displays colors correctly you must calibrate it. Calibration can be performed with a special device called a “colorimeter”, like the X-Rite i1 Display Pro, which measures the displayed colors of a reference image. The difference between the measured color and the reference color is called ΔE or Delta E. This value should be lower than 2 to be unnoticeable by the human eye.
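To give a feeling for what a Delta E number is, here is a minimal Python sketch of the oldest and simplest variant, CIE76, which is just the straight-line distance between two colors in the CIELAB color space. Which formula a given calibration tool uses is not something I cover here, and newer formulas such as CIEDE2000 exist, so treat this purely as an illustration. The two Lab values are made up for the example.

```python
import math


def delta_e_cie76(lab1, lab2):
    """CIE76 color difference: Euclidean distance between two
    CIELAB colors given as (L*, a*, b*) tuples."""
    return math.dist(lab1, lab2)


# Hypothetical values, not real measurements:
reference = (50.0, 20.0, -10.0)   # the color the test image asked for
measured = (50.5, 21.0, -9.0)     # what the colorimeter read from the screen

difference = delta_e_cie76(reference, measured)
print(f"Delta E = {difference:.2f}")
print("below the threshold of 2" if difference < 2 else "noticeable difference")
```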
The software used for these devices can create special color profiles, known as ICC (International Color Consortium) profiles, which you can import into your Windows 10 OS. If your monitor supports it you can also tweak the color settings directly on the monitor, so the calibration is device-independent.
There are a few screens on the market that are calibrated out of the box, like the EIZO ColorEdge, but those are mostly targeted at Photographers or Graphic Designers. There are also less costly versions from Dell in their UltraSharp series.
Compression and codecs
Now that we know the basics of colors and what bit-depths and color gamuts are, it’s time to dig into how this is used to compress and encode digital images or videos. Almost every digital image we see is compressed, as uncompressed data just takes too much storage space.
So compression is used in almost every image we see on the internet or every video we watch on YouTube. There are mainly 3 different types of compression:
Lossy compression is an irreversible compression where information gets lost in the process. It is mainly used to (highly) reduce the size of data at the cost of data loss. A few common lossy compressions are MP3, JPEG, and MPEG-4.
Lossless compression is a form of compression where you shrink the data size without losing information. The level of compression is much lower than with lossy compression, but it is fully reversible. I think you can imagine that nobody would use ZIP if they lost data with it. ZIP and RAR are common lossless formats (TAR on its own only bundles files and is usually combined with a compressor such as gzip), and Dolby TrueHD / PNG / FLAC are also examples of lossless compression.
The last compression type is a “special” one. It sits between Lossless and Lossy: it’s a form of compression where you remove details that we can’t see or hear. It is still irreversible as we leave information behind, but it can heavily reduce the file size if done right.
As mentioned earlier, our eyes are less sensitive to differences in color (Chroma) than to differences in brightness (Luma). This fact is used in a very common color compression called “Chroma Subsampling”. Chroma Subsampling reduces the amount of Chroma information in an image without lowering the Luma. There are different levels of Chroma Subsampling, and the most common are 4:4:4 and 4:2:0. They are also referred to as YUV 4:4:4 or YUV 4:2:0, where YUV is just another (digital) way to describe the colors (Y=Luma, U&V describe the Chroma).
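To make the Luma/Chroma split a bit more tangible, here is a small Python sketch that converts an RGB color into one brightness value and two color values. I am assuming the Rec.709 coefficients here; other standards use slightly different weights, and real encoders work on integer values rather than the 0.0 to 1.0 floats used below.

```python
def rgb_to_yuv(r, g, b):
    """Convert normalized RGB (0.0-1.0) to Luma (Y) and two Chroma
    values (U, V), assuming Rec.709 coefficients."""
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b   # brightness, mostly green
    u = (b - y) / 1.8556                        # blue-difference chroma
    v = (r - y) / 1.5748                        # red-difference chroma
    return y, u, v


for name, rgb in {"gray": (0.5, 0.5, 0.5),
                  "red": (1.0, 0.0, 0.0),
                  "blue": (0.0, 0.0, 1.0)}.items():
    y, u, v = rgb_to_yuv(*rgb)
    print(f"{name:>4}: Y={y:.3f} U={u:+.3f} V={v:+.3f}")
```

Notice that a gray pixel ends up with (practically) zero Chroma: all of its information sits in the Luma value, which is the part Chroma Subsampling leaves untouched.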
So how does this work? What is 4:2:0? Let’s take a closer look!
With Chroma Subsampling we take a set of pixels (usually 4 by 2 pixels) and depending on the subsampling rate we only keep a part of their information. This is done by the following formula:
Chroma Subsampling = X:Y:Z
- X = The number of horizontal pixels and describes the Luma
- Y = The number of Chromatic pixels in the first row of X pixels
- Z = The number of Chromatic pixels in the second row of X pixels
4:2:0 is the most commonly found subsampling rate and is used on YouTube, Blu-rays and DVDs.
With 4:2:0 we take all the luma (4 pixels wide), the chroma of 2 pixels of the first row (the first and the third), and the chroma of 0 pixels of the second row. In comparison to the original image, we will only have 25% of the color information.
With 4:2:2 we take all the luma (4 pixels wide), the chroma of 2 pixels of the first row and of 2 pixels of the second row. In this situation, you remove 50% of all color information.
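The percentages above can be checked with a few lines of Python. This sketch follows the simplified reading of the X:Y:Z notation used in this article (a block of X by 2 pixels, Y chroma samples in the first row, Z in the second). The function names are my own, and the "total data" figure assumes one Luma plane and two Chroma planes with the same number of bits per sample.

```python
def chroma_fraction(scheme: str) -> float:
    """Fraction of Chroma samples kept in a block of X by 2 pixels."""
    x, y, z = (int(part) for part in scheme.split(":"))
    return (y + z) / (2 * x)


def total_fraction(scheme: str) -> float:
    """Fraction of all samples kept: one full Luma plane plus two
    subsampled Chroma planes (U and V)."""
    return (1 + 2 * chroma_fraction(scheme)) / 3


for scheme in ("4:4:4", "4:2:2", "4:2:0"):
    print(f"{scheme}: {chroma_fraction(scheme):.0%} of the color information, "
          f"{total_fraction(scheme):.0%} of the total data")
```

So 4:2:0 throws away three quarters of the color information, and because the Luma is kept in full, that works out to roughly half of the raw data.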
In many situations we won’t notice this “loss” of color information, but as you might remember our eyes see the difference in color better if they are next to each other. So in which situation do we see this? When displaying text!
In the above example, you can clearly see that with YUV 4:2:0 colors seem to “blend” into each other, especially when we use two very opposite colors like red letters on a blue background. This can also be seen with content that contains thin lines, like AutoCAD drawings. You will also notice that the colors look duller.
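You can reproduce this blending effect with a tiny experiment. The Python sketch below takes a 4 by 2 block of alternating red and blue pixels (think of thin red lines on a blue background), keeps the Luma of every pixel, but only keeps the Chroma of the top-left pixel of every 2 by 2 group, as described for 4:2:0 above. The Rec.709 coefficients and the "copy the kept Chroma to its neighbours" reconstruction are my assumptions; real decoders may average or interpolate instead.

```python
def rgb_to_yuv(r, g, b):
    # Assumption: Rec.709 coefficients, normalized 0.0-1.0 values.
    y = 0.2126 * r + 0.7152 * g + 0.0722 * b
    return y, (b - y) / 1.8556, (r - y) / 1.5748


def yuv_to_rgb(y, u, v):
    r = y + 1.5748 * v
    b = y + 1.8556 * u
    g = (y - 0.2126 * r - 0.0722 * b) / 0.7152
    # Clamp to the displayable range and round for readable output.
    return tuple(round(min(1.0, max(0.0, c)), 2) for c in (r, g, b))


def roundtrip_420(rows):
    """rows: two rows of RGB tuples with the same, even length."""
    top, bottom = ([rgb_to_yuv(*px) for px in row] for row in rows)
    out_top, out_bottom = [], []
    for x in range(0, len(top), 2):
        _, u, v = top[x]  # Chroma of the top-left pixel of the 2x2 group
        for dx in (0, 1):
            # Every pixel keeps its own Luma but gets the shared Chroma.
            out_top.append(yuv_to_rgb(top[x + dx][0], u, v))
            out_bottom.append(yuv_to_rgb(bottom[x + dx][0], u, v))
    return [out_top, out_bottom]


RED, BLUE = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)
block = [[RED, BLUE, RED, BLUE],
         [RED, BLUE, RED, BLUE]]
result = roundtrip_420(block)
for row in result:
    print(row)
```

The red pixels survive, but every blue pixel comes back as a darker red: its brightness is still that of blue, but its color was borrowed from the red neighbour. That is the blending and dulling you see around text and thin lines.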
So Chroma Subsampling should not be used in every situation. In the next blog parts, we will dive deeper into how the encoders use Chroma Subsampling and how to fine-tune it for different situations.
Video Codecs – The magicians of this World
If you thought that magic didn’t exist then you have not looked at video codecs yet. These codecs have many tricks up their sleeve to compress moving images without any really noticeable quality loss. They even compress better than image codecs like JPEG. If we compare a screenshot of a webpage with a 5-second movie at 30 FPS, the movie can be many times smaller than the single screenshot! I’m not kidding!
Why a video codec?
So why do we need a video codec in the first place? Well, the direct answer is storage space. If we would not compress a video stream we would really need some big storage, especially with 4K or 8K video streams.
If we do a quick calculation we can see how much storage a video stream needs to store 1 second of data:
Uncompressed Full HD - 24-Bit (3 Byte) Colors (1 MB = 1024 * 1024 bytes):
- 1920 * 1080 * 3 = 5.9 MB per frame
- 30 FPS = 178 MB/Second
- 60 FPS = 356 MB/Second
Uncompressed 4K - 24-Bit (3 Byte) Colors:
- 3840 * 2160 * 3 = 23.7 MB per frame
- 30 FPS = 712 MB/Second
- 60 FPS = 1424 MB/Second
You can imagine that you would need an insane amount of storage for just 1 movie. If we compare this with an encoded stream, we would only need around 8 GB for a Full HD movie. Isn’t that magic? Curious how they do this?
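If you want to redo this calculation for other resolutions or frame rates, a few lines of Python are enough. The sketch below counts 1 MB as 1024 * 1024 bytes, so the results may differ slightly from hand-rounded figures. The 2-hour movie length is my own assumption to get a feeling for the total size; only the 8 GB figure for the encoded movie comes from the text above.

```python
MB = 1024 * 1024   # assumption: binary megabytes
GB = 1024 * MB


def frame_bytes(width, height, bytes_per_pixel=3):
    """Size of one uncompressed frame (3 bytes = 24-bit color)."""
    return width * height * bytes_per_pixel


def stream_mb_per_second(width, height, fps, bytes_per_pixel=3):
    """Uncompressed data rate of a video stream in MB per second."""
    return frame_bytes(width, height, bytes_per_pixel) * fps / MB


for name, (w, h) in {"Full HD": (1920, 1080), "4K": (3840, 2160)}.items():
    for fps in (30, 60):
        rate = stream_mb_per_second(w, h, fps)
        print(f"{name:>7} @ {fps} FPS: {rate:.0f} MB/second")

# Assumption: a 2-hour Full HD movie at 30 FPS.
movie_seconds = 2 * 60 * 60
raw_gb = frame_bytes(1920, 1080) * 30 * movie_seconds / GB
print(f"Uncompressed movie: {raw_gb:.0f} GB, "
      f"about {raw_gb / 8:.0f} times larger than an 8 GB encoded version")
```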
There are many video codecs available today. So let’s take a look at the most common ones.
H264 or AVC is the most used codec at the moment. It is used on Blu-ray discs but also by many streaming media companies like Netflix. Due to its success almost every device can use hardware decoding to decode H264, which ensures optimal playback performance. It is also used by Remote Graphics vendors like Citrix, VMware, and Nutanix in their encoders to ensure high-quality graphics.
Sid Bala wrote a great blogpost on how H264 works, which you can find here: https://sidbala.com/h-264-is-magic/
H265 is the direct successor of H264 and is also called High Efficiency Video Coding (HEVC). It was introduced as a standard in 2013 and is used on Ultra HD Blu-ray discs. This codec was developed to reduce the bitrate in order to facilitate higher resolutions (up to 8K) and HDR. The average bitrate reduction can be up to 60% in comparison with H264. But this higher compression comes at the cost of performance: it can use 8 times more CPU than H264!
There are currently quite a few devices/vendors supporting H265:
- NVIDIA GPUs since 2015
- Intel since Skylake (partial) / Kaby Lake
- Apple since iPhone 6
- Android since Lollipop (5.0)
One of the downsides of H264 and H265 is that they require users to pay royalties. This was one of the reasons for Google to create its own codec, VP9, for YouTube: with the huge increase in people using the platform there was a need for another codec. VP9 was released in 2013 and is Open-Source and royalty-free; it can be found at https://www.webmproject.org/.
VP9 is supported by most common browsers like Mozilla Firefox, Google Chrome, and Microsoft Edge. It is not supported in Microsoft Internet Explorer.
The codec is supported by almost all devices on the market today.
Another interesting codec is AV1, which was introduced in 2018. It was developed by the Alliance for Open Media (https://aomedia.org/), which is supported by quite a few major companies:
AV1 claims to have 20%~40% higher data compression than VP9 & H265 and is optimized for web usage and video conferencing. However, as it is a newly introduced codec, there is currently no commonly available hardware decoder. The first hardware decoder was released by Chips&Media in October 2019 (https://en.chipsnmedia.com/page/product_view/5919). As NVIDIA and Intel are both part of this Alliance I would expect more hardware support for it in the upcoming generations of chips.
Without the support of hardware, this codec is very CPU hungry. The current AV1 encoder requires a dual Intel Xeon 8280 setup (4 GHz / 56 Cores and 112 Threads in total) to encode a 4K stream at 80 FPS. In comparison, this same setup would encode 365 Frames per Second if it encoded the stream as H265.
As mentioned earlier with the different codecs, hardware support is very important. That’s why most GPUs contain special “ASICs” (application-specific integrated circuits) to perform those tasks. Intel calls them Intel QuickSync Video, NVIDIA uses NVENC (NVIDIA ENCoder) and NVDEC (NVIDIA DECoder). These specialized chips are used to either encode or decode video streams.
It depends on the vendor which codecs they support, and there can also be a big difference between encoding or decoding.
If we take a look at the support of NVIDIA cards for Encoding we will see that they only support H264 and H265:
The support for codecs is much broader for the decoding part:
As you can see in the above graphs, it is important to check the requirements for encoding and decoding against the GPU you are buying. So for example, if you want to use Hardware Decoding for YouTube (VP9), make sure to buy at least a Pascal card.
The same goes for Remoting Protocols: check which codecs they use and whether your graphics card supports them. For example, Citrix can use H265 with Citrix HDX3DPro Hardware Encoding.
I think we handled all the basics! You have read about things like color gamut, compression, YUV, codecs and hardware support. These are the ingredients of the modern encoders used by remote display protocols like ICA or Blast. In the upcoming blogs, you will read more about how to use these basics to understand which setting you should use in which situation.
This will help you to create the best User eXperience!
Sources: Thanks to Simon Schaber for the YUV4:4:4 images