I work in the Scalable Systems Team at Yahoo/Verizon Media, where we mainly develop distributed systems, analytics data platforms and recommendation systems. A couple of years ago, in 2018, our team was assigned a project to improve the user experience of Verizon Media's VOD (non-live) streaming video services on mobile devices, such as the videos appearing in the Finance, News and Sports portals. During this work we made a number of interesting observations about user habits and network infrastructure behaviour. We are sharing them because they could be helpful to engineers and researchers working on video services similar to those available on the Yahoo portals.
Streamed videos are usually divided into segments of constant length, 8 seconds in our case. Each segment is available in a number of qualities, where a higher bitrate gives better video quality and resolution. Each segment is downloaded separately from the media server. The segments are not all downloaded in advance at the beginning of the session; they are fetched continuously as the session proceeds. The video player application has an internal buffer that stores downloaded segments until they are played, and the buffer size is defined in seconds of playback time. The player sends a request for the next segment whenever the internal buffer is not full. If playback of a segment finishes before the next segment has been downloaded, the player enters a rebuffering state, freezing the picture and waiting until the next segment becomes available in the buffer.
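The buffering behaviour described above can be captured in a few lines. The sketch below is our own simplified model, not the Yahoo player's code: it assumes one request at a time, a hypothetical 30-second buffer capacity, and known per-segment download times, and it returns the stall time and the playback time of a session.

```python
SEGMENT_SECONDS = 8.0  # segment length used in the article


def simulate_session(download_times, buffer_capacity=30.0):
    """Replay one session and return (rebuffer_seconds, playback_seconds).

    download_times[i] is the wall-clock time (s) needed to fetch segment i.
    buffer_capacity (s) is a hypothetical player setting (must be >= 8 s).
    The startup wait for segment 0 is not counted as rebuffering.
    """
    buffer = 0.0    # seconds of playable video held by the player
    rebuffer = 0.0  # accumulated stall time
    for i, dl in enumerate(download_times):
        # The next request is sent only when the buffer has room for a whole
        # segment; meanwhile playback keeps draining it, so no stall occurs.
        buffer = min(buffer, buffer_capacity - SEGMENT_SECONDS)
        if i > 0:
            if dl > buffer:
                # The buffer runs dry before the segment arrives: rebuffering.
                rebuffer += dl - buffer
                buffer = 0.0
            else:
                buffer -= dl
        buffer += SEGMENT_SECONDS
    return rebuffer, SEGMENT_SECONDS * len(download_times)
```

For example, `simulate_session([1, 1, 1, 1, 40, 1])` returns `(18.0, 48.0)`: the fifth download takes 40 seconds, the buffer holds only 22 seconds of video when that request is sent, and the player stalls for the remaining 18 seconds.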
The video player decides per segment which bitrate to request. Higher quality segments are larger and take longer to download. The algorithm responsible for choosing the bitrate of the next segment is called the ABR (Adaptive Bitrate) algorithm. This algorithm may base its decision on:
- The status of the internal buffer.
- The estimated network bandwidth.
- Additional parameters, like the characteristics of the playback device, features of the video being played, and many more.
In general, when the network bandwidth is high and the buffer is not empty, the ABR requests a segment with a higher bitrate. Conversely, when the buffer is empty or almost empty and the download speed is low, it requests a segment with a lower bitrate. This technique is called adaptive bitrate streaming.
The video session quality is usually measured by two parameters:
- Rebuffering ratio, defined as the overall time the player spends in the rebuffering state divided by the overall playback time. In practice, however, different monitoring applications can compute different rebuffering values from the same log data, so results reported by different monitoring applications cannot be compared directly.
- Average bitrate, the average bitrate of the segments played.
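As one straightforward reading of these two definitions (monitoring tools differ in how they clean the data, as noted above), both metrics can be computed from per-segment playback records. The `SegmentRecord` schema below is hypothetical; records from many sessions can be pooled into one list to get an aggregate value.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SegmentRecord:
    bitrate_bps: float    # bitrate of the segment that was played
    play_seconds: float   # playback time of the segment (8 s in the article)
    stall_seconds: float  # rebuffering time just before this segment played


def session_metrics(records: List[SegmentRecord]) -> Tuple[float, float]:
    """Return (rebuffering_ratio, average_bitrate_bps)."""
    playback = sum(r.play_seconds for r in records)
    stalls = sum(r.stall_seconds for r in records)
    # Plain mean over segments; with constant-length segments this equals
    # the time-weighted mean bitrate.
    avg_bitrate = sum(r.bitrate_bps for r in records) / len(records)
    return stalls / playback, avg_bitrate
```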
There is a clear tradeoff between the two parameters. If a higher bitrate is requested, the video plays in higher quality, improving the user experience, but the download takes longer and there is a greater chance that the segment will not be ready before the player consumes all the data in the buffer, forcing it into a rebuffering state. Lower bitrate segments, on the other hand, have lower quality but download faster, reducing the chance of a rebuffering event. It is generally believed, however, that users are willing to compromise on video quality but become displeased when they experience rebuffering.
Our initial assumption in this project was that by improving the ABR algorithm used by the video player we could significantly improve the user experience. However, while studying logged data collected on various ABR implementations and their performance, we found that the assumptions about user behaviour and network bandwidth variation patterns on which ABR designs are based are not accurate. Hence in most cases an ABR does not improve the user experience, and in some cases it can even degrade it. Our observations and measurements were performed on videos embedded in Yahoo portal webpages. For dedicated movie streaming services like Netflix, the situation regarding the player window size and the importance viewers give to video quality may be different.
Improving User Experience
Our work dealt with mobile devices, and all measurements and tests were performed on the Yahoo web video player. The observed average rebuffering ratio was about 1%, and reducing it by even 0.1% was considered a significant milestone. The assumption was that this goal could be reached by improving the ABR algorithm used by the media player. The ABR algorithm currently used by the media player is very similar to the one used in ExoPlayer. It is implemented as a set of rules, each proposing a bitrate for the next segment in a different way: according to the estimated network bandwidth, the internal buffer occupancy, or other parameters. The final bitrate requested from the media server is the lowest of the bitrates proposed by the rules.
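A rule-based ABR of this kind can be sketched as a list of functions, each proposing a bitrate, with the minimum winning. This is an illustrative sketch, not ExoPlayer's or the Yahoo player's actual code: the bitrate ladder, the 0.75 bandwidth safety factor, the 8-second buffer threshold and the state keys are all hypothetical. The startup rule mirrors the dedicated rule for the first two segments of a session described in the next paragraph.

```python
from typing import Callable, Dict, List

# Hypothetical bitrate ladder: bitrate (bps) -> frame height (pixels).
LADDER: Dict[int, int] = {
    400_000: 240,
    800_000: 360,
    1_300_000: 480,
    2_500_000: 720,
    5_000_000: 1080,
}

State = Dict[str, float]
Rule = Callable[[State], int]


def highest_fitting(allowed) -> int:
    """Highest ladder bitrate in `allowed`, or the lowest rung if none fits."""
    allowed = list(allowed)
    return max(allowed) if allowed else min(LADDER)


def startup_rule(s: State) -> int:
    # The first two segments of a session are fetched at the lowest bitrate.
    return min(LADDER) if s["segment_index"] < 2 else max(LADDER)


def buffer_rule(s: State, low_water_s: float = 8.0) -> int:
    # Nearly empty buffer: play it safe with the lowest bitrate.
    return min(LADDER) if s["buffer_seconds"] < low_water_s else max(LADDER)


def bandwidth_rule(s: State, safety: float = 0.75) -> int:
    # Only request what a discounted bandwidth estimate can sustain.
    return highest_fitting(b for b in LADDER if b <= safety * s["bandwidth_bps"])


def window_rule(s: State) -> int:
    # Never exceed the resolution of the player window.
    return highest_fitting(b for b, h in LADDER.items() if h <= s["window_height"])


RULES: List[Rule] = [startup_rule, buffer_rule, bandwidth_rule, window_rule]


def choose_bitrate(state: State, rules: List[Rule] = RULES) -> int:
    # Every rule proposes a bitrate; the lowest (most conservative) one wins.
    return min(rule(state) for rule in rules)
```

For instance, with a 480-pixel-high window and 20 Mbps of estimated bandwidth, `choose_bitrate({"segment_index": 5, "buffer_seconds": 20, "bandwidth_bps": 20e6, "window_height": 480})` returns 1,300,000: the window rule is the most conservative, and no amount of extra bandwidth changes the outcome.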
The first point we had to explain was why the average video bitrate in production was much lower than in the simulation: 1.3 Mbps in production versus 2.1 Mbps in simulation. We recognized a factor that we had ignored in the simulation as irrelevant, but which turned out to be the most influential one in production. In the simulation, we assumed that the bottleneck for video quality was the available network bandwidth. However, ABR algorithms also take the player window size into account, so that they never request video with a resolution higher than the actual player window size. It turned out that in 82% of all ABR choices the bitrate was limited by the player window size. Moreover, if we remove the first two segments of each session from the statistics (their resolution is always very low due to a dedicated startup rule), the bitrate of 89% of segments is limited by the player window size and not by other factors. Fig. 1 presents the three possible outcomes of the bitrate choice. The left chart shows the results for all segments, while the right chart shows the results for “all except the first two segments.” Yellow, “resolution,” stands for the cases where the segment bitrate was chosen according to the player window resolution. Red, “equal,” represents the cases where the player window size limit and the lowest of the other rules gave the same result. Blue, “below,” is the situation where the segment resolution limit enforced by the bandwidth rule was actually lower than the limit given by the player window size. Only in the blue cases can the ABR improve the bitrate.
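Given per-decision logs, the Fig. 1 breakdown amounts to comparing the window limit with the minimum of the other rules. The tuple format `(segment_index, window_limit_bps, other_rules_min_bps)` is a hypothetical log layout; `skip_first=2` corresponds to the "all except the first two segments" view.

```python
from collections import Counter


def classify_decision(window_limit_bps: int, other_rules_min_bps: int) -> str:
    """Label one ABR decision with the Fig. 1 categories."""
    if window_limit_bps < other_rules_min_bps:
        return "resolution"  # the player window was the binding limit
    if window_limit_bps == other_rules_min_bps:
        return "equal"       # window limit and the other rules agree
    return "below"           # another rule chose less than the window allows


def category_shares(decisions, skip_first: int = 0) -> dict:
    """decisions: iterable of (segment_index, window_limit_bps, other_rules_min_bps).

    skip_first=2 drops the startup segments, as in the right chart of Fig. 1.
    """
    counts = Counter(
        classify_decision(window_bps, other_bps)
        for idx, window_bps, other_bps in decisions
        if idx >= skip_first
    )
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}
```

Only the "below" share is headroom that a better ABR could, in principle, turn into a higher bitrate.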
In the simulation we assumed the player window size to be HD, which corresponds to a bitrate of ~5 Mbps. Although we did not collect data about the player window resolution in production, we do know that the average video bitrate is about 1.3 Mbps, which roughly corresponds to Standard Definition video with a resolution of 640x480. If we look only at long video sessions, the average bitrate rises to about 2 Mbps, which roughly corresponds to 1280x720 (720p) resolution. This result is surprising, since this resolution is below the display resolution of most smartphones as of 2018, and the bitrate is far below the bandwidth of even 3G cellular networks. The best explanation we can think of for the low resolution and bitrate is not poor network conditions, but the fact that most people do not switch to full-screen mode when watching short videos on news sites. This is known to video streaming engineers and is reinforced to some extent by an experiment performed at YouTube showing that most viewers don’t care about video quality. The situation may be different for full-length film streaming services like Netflix, but in our case we can conclude that the network bandwidth is not the bottleneck for better video quality. In other words, a trivial ABR algorithm that always chooses the highest bitrate available for a given player window resolution cannot be outperformed in 89% of cases, in either average bitrate or rebuffering ratio. But is the ABR perhaps very effective at preventing rebuffering in the remaining 11% of its decisions?
We partially answered this question by analyzing the second disparity between the simulated and production results: why the production rebuffering ratio of the proposed ABR was no better than that of the default ABR, while in simulation the results were very different. Here, again, we concluded that the reason was the difference between the simulated and actual environments, particularly in network conditions. Like many works in this field, we used a Norwegian bandwidth measurement dataset to simulate variable network bandwidth. In this dataset, built from actual network measurements performed in 2012, the network bandwidth changes dynamically from high to low and from low to high over time. Under those conditions the ABR has to keep adapting the requested segment bitrate to ever-changing network conditions, and its influence on player performance is great.
However, it turned out that this scenario was not the common case in the real world of 2018. When we studied the video session logs available from production, we saw that the network behaves differently. In most, if not all, rebuffering events, the bandwidth did not decrease gradually in advance, which would have allowed the ABR algorithm to adapt the requested segment bitrate to the new network conditions and store more data in the buffer. Instead, the network bandwidth was high and stable until it came to a sudden halt, where the device was either fully disconnected or the download speed was extremely low. Moreover, in most cases the internal player buffer was full at the moment of disconnection, but the disconnection lasted long enough to drain the buffer entirely, leading to a rebuffering event. Under those conditions no ABR could perform better than the existing one, since the network failure cannot be predicted, and even if it could be, the player could not be any better prepared for it. Hence, in this case too, a trivial ABR is likely to perform just as well as a highly complex one.
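To first order, the stall caused by such an outage can be written down directly. Assume (our simplification) that the buffer holds B seconds of video, at most its capacity B_max, when the connection drops, that the outage lasts D seconds with essentially zero throughput, and ignore any partially downloaded segment:

```latex
T_{\text{rebuf}} = \max\left(0,\ D - B\right), \qquad 0 \le B \le B_{\max}
```

Both B and D are measured in seconds of playback time, so neither depends on the bitrate the ABR chose before the outage. With a hypothetical 30-second buffer that is full at disconnection and a 45-second outage, the player stalls for 15 seconds whichever ABR it runs.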
Moreover, many ABR implementations can actually degrade the user experience instead of improving it. A common example: after a short rebuffering event the player switches to a very low resolution for the next couple of segments and then switches back to high resolution. The reason is that when the ABR estimates the available network bandwidth, it effectively averages the zero bandwidth of the disconnection period with the high bandwidth of the network after the connection returns. The ABR therefore wrongly concludes that the available bandwidth is very low and requests the lowest resolution available from the server, while in practice the network is fast and the highest resolution for the given window could be downloaded. In this situation a trivial ABR that simply requests the highest resolution for the given window would perform better than a complex one.
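The effect can be reproduced with a deliberately simple estimator that divides the bits downloaded by the time spent over the last few downloads. This is a hypothetical estimator chosen for illustration, not the one used by ExoPlayer or the Yahoo player. The scenario assumes 8-second segments at 2.5 Mbps (20 Mbit each) over a 20 Mbps link, with one download spanning a 40-second outage.

```python
from collections import deque


class WindowedBandwidthEstimator:
    """Estimate = total bits / total download time over the last `window` downloads.

    A deliberately simple, hypothetical estimator used only for illustration.
    """

    def __init__(self, window: int = 3):
        self.samples = deque(maxlen=window)  # (bits, seconds) per download

    def add_download(self, bits: float, seconds: float) -> None:
        self.samples.append((bits, seconds))

    def estimate_bps(self) -> float:
        total_bits = sum(bits for bits, _ in self.samples)
        total_secs = sum(secs for _, secs in self.samples)
        return total_bits / total_secs


if __name__ == "__main__":
    est = WindowedBandwidthEstimator(window=3)
    # Each 8 s segment at 2.5 Mbps is 20 Mbit; on a 20 Mbps link it takes 1 s.
    # The third download spans a 40 s outage, so it takes 41 s in total.
    for seconds in [1, 1, 41, 1, 1, 1]:
        est.add_download(20e6, seconds)
        print(round(est.estimate_bps() / 1e6, 2))
    # prints 20.0, 20.0, 1.4, 1.4, 1.4, 20.0 (Mbps), one per line
```

Because the stalled download stays in the window for three decisions, the estimate sits near 1.4 Mbps on a 20 Mbps link until that sample ages out, producing the low-then-high resolution pattern even though the network recovered immediately.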
Another issue arose when we tried to better understand the nature of the metric we were trying to improve: the rebuffering ratio in the web player. The rebuffering ratio is defined as the overall rebuffering time divided by the overall playing time. Note, however, that there is no industry standard for measuring this metric. Different monitoring applications apply different data preparation and cleaning techniques, so the resulting values can differ even when calculated from the same log data. The metrics used by Yahoo are relatively strict, and the rebuffering values presented below cannot be directly compared with values reported on other platforms.
While studying the web player report data, we noticed that the rebuffering ratio has very high variance even between consecutive days (Fig. 2). Since each day's result is an average over tens of thousands of sessions, we expected the variance to be much lower.
We dug deeper into the data, and it turned out that the rebuffering durations per session were also not distributed as we expected. We discovered that rebuffering time per session follows a long-tail distribution: a very small portion of video sessions contributes a very significant share of the rebuffering time. For example, on a certain day, in a representative sample of 1,000 sessions with an overall rebuffering time of 4,782 seconds, 1,214.8 seconds of rebuffering were contributed by just 5 sessions; that is, 0.5% of the sessions accounted for about 25% of the rebuffering time. For those 5 sessions the average rebuffering ratio was almost 50%. However, it is highly unlikely that a user would endure such a high rebuffering ratio without quitting the session within seconds, so these reports are not indicative of real rebuffering and must reflect some erroneous player state. On the other hand, if we consider the data related to these sessions irrelevant and ignore it, the rebuffering ratio for that day drops from 0.8% to 0.6%, a ratio that was considered an excellent result.
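Both observations, the concentration of rebuffering in a few sessions and the effect of discarding implausible ones, are easy to check on per-session data. The sketch below works on `(rebuffer_seconds, playback_seconds)` pairs; the `max_session_ratio` cut-off is a hypothetical cleaning rule, not the one applied at Yahoo.

```python
from typing import Iterable, List, Optional, Tuple

Session = Tuple[float, float]  # (rebuffer_seconds, playback_seconds)


def top_k_share(rebuffer_seconds: List[float], k: int) -> float:
    """Fraction of total rebuffering time contributed by the k worst sessions."""
    worst = sorted(rebuffer_seconds, reverse=True)[:k]
    return sum(worst) / sum(rebuffer_seconds)


def pooled_ratio(sessions: Iterable[Session],
                 max_session_ratio: Optional[float] = None) -> float:
    """Pooled rebuffering ratio: total stall time / total playback time.

    If max_session_ratio is given, sessions whose own ratio exceeds it are
    discarded as implausible (a hypothetical cleaning rule).
    """
    kept = [(r, p) for r, p in sessions
            if max_session_ratio is None or r / p <= max_session_ratio]
    return sum(r for r, _ in kept) / sum(p for _, p in kept)
```

Reporting the pooled ratio both with and without such a cut-off makes it visible how much of a day's value comes from a handful of sessions.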
Summarising our findings, we can claim that:
- Since the video resolution is mostly limited by the player window size, no ABR can significantly outperform the ABRs currently used in production in terms of average video bitrate. These conclusions will become even more valid as more users switch to 5G cellular networks, since we do not expect mobile device display resolution to grow as fast as bandwidth.
- Since most of the time the available network bandwidth is high enough for the player to keep the internal buffer full of video data at the best available bitrate, and since the usual network problem is a full disconnection, which cannot be predicted, no ABR can outperform the ABR algorithms currently used in terms of reducing the rebuffering ratio. Moreover, the bandwidth measurements performed by current ABR algorithms can be very misleading, yielding irrelevant values.
- For a better understanding of the user experience, the rebuffering ratio should not be represented as a single number. The actual distribution of rebuffering periods among video sessions and segments should be taken into account as well.
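One way to follow the last recommendation is to report a small profile of the per-session distribution alongside the pooled ratio. The choice of fields and percentiles below is ours, for illustration; `statistics.quantiles` (Python 3.8+) with the inclusive method interpolates the percentile cut points.

```python
from statistics import quantiles
from typing import Dict, List, Tuple


def rebuffering_profile(sessions: List[Tuple[float, float]]) -> Dict[str, float]:
    """Summarize per-session rebuffering rather than one pooled number.

    sessions: (rebuffer_seconds, playback_seconds) pairs; needs at least two.
    """
    ratios = [r / p for r, p in sessions]
    cuts = quantiles(ratios, n=100, method="inclusive")  # 99 percentile cut points
    return {
        "pooled_ratio": sum(r for r, _ in sessions) / sum(p for _, p in sessions),
        "share_with_stall": sum(1 for x in ratios if x > 0) / len(ratios),
        "p50": cuts[49],
        "p90": cuts[89],
        "p99": cuts[98],
    }
```

A day whose pooled ratio jumps while the median and the share of sessions with any stall stay flat points to a few extreme sessions rather than a broad degradation.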
Thanks to Ohad Shacham and Eddie Bortnikov, who took part in this research, and to Ofer Shem Tov, Amit Agarwal and Deepak Somashekhara for reviewing.
For questions and suggestions, please contact Boris Kapchits.
Originally published at https://www.verizonmedia.com.