Subjective video quality







From Wikipedia, the free encyclopedia
 


Subjective video quality is video quality as experienced by humans. It is concerned with how video is perceived by a viewer (also called "observer" or "subject") and designates their opinion on a particular video sequence. It is related to the field of Quality of Experience. Measuring subjective video quality is necessary because objective quality assessment algorithms such as PSNR have been shown to correlate poorly with subjective ratings. Subjective ratings may also be used as ground truth to develop new algorithms.

Subjective video quality tests are psychophysical experiments in which a number of viewers rate a given set of stimuli. These tests are quite expensive in terms of time (preparation and running) and human resources and must therefore be carefully designed.

In subjective video quality tests, typically, SRCs ("Sources", i.e. original video sequences) are treated with various conditions (HRCs for "Hypothetical Reference Circuits") to generate PVSs ("Processed Video Sequences").[1]

Measurement

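The main idea of measuring subjective video quality is similar to the mean opinion score (MOS) evaluation for audio. To evaluate the subjective video quality of a video processing system, one typically selects source sequences, chooses the system settings to be tested, recruits viewers, runs the test in a defined environment, and analyzes the results. Each of these steps is described in the sections below.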

Many parameters of the viewing conditions may influence the results, such as room illumination, display type, brightness, contrast, resolution, viewing distance, and the age and educational level of viewers. It is therefore advised to report this information along with the obtained ratings.

Source selection

Typically, a system should be tested with a representative number of different contents and content characteristics. For example, one may select excerpts from contents of different genres, such as action movies, news shows, and cartoons. The length of the source video depends on the purpose of the test, but typically, sequences of no less than 10 seconds are used.

The amount of motion and spatial detail should also cover a broad range. This ensures that the test contains sequences which are of different complexity.

Sources should be of pristine quality. There should be no visible coding artifacts or other properties that would lower the quality of the original sequence.

Settings

The design of the HRCs depends on the system under study. Typically, multiple independent variables are introduced at this stage, and they are varied with a number of levels. For example, to test the quality of a video codec, independent variables may be the video encoding software, a target bitrate, and the target resolution of the processed sequence.
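As a rough illustration of how independent variables and their levels combine, the following sketch builds a full factorial set of HRCs and applies each one to every SRC to enumerate the resulting PVSs. The encoder names, bitrates, resolutions, and source names are hypothetical placeholders, and a real test may use only a subset of the combinations.

```python
from itertools import product

# Hypothetical independent variables and their levels for a codec test.
encoders = ["encoder_a", "encoder_b"]
bitrates_kbps = [500, 1500, 4000]
resolutions = ["1280x720", "1920x1080"]

# Hypothetical source sequences (SRCs).
sources = ["src01_action", "src02_news", "src03_cartoon"]

# Full factorial design: one HRC per combination of levels.
hrcs = [
    {"encoder": e, "bitrate_kbps": b, "resolution": r}
    for e, b, r in product(encoders, bitrates_kbps, resolutions)
]

# Each SRC processed with each HRC yields one PVS.
pvs_list = [
    {"src": src, "hrc_id": hrc_id, **hrc}
    for src in sources
    for hrc_id, hrc in enumerate(hrcs, start=1)
]

print(len(hrcs), "HRCs,", len(pvs_list), "PVSs")
```

The number of PVSs grows multiplicatively with every added variable or level, which is one reason the design has to be balanced against the available test time.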

It is advised to select settings that result in ratings which cover the full quality range. In other words, assuming an Absolute Category Rating scale, the test should show sequences that viewers would rate from bad to excellent.

Viewers

Number of viewers

Viewers are also called "observers" or "subjects". A certain minimum number of viewers should be invited to a study, since a larger number of subjects increases the reliability of the experiment outcome, for example by reducing the standard deviation of averaged ratings. Furthermore, there is a risk of having to exclude subjects for unreliable behavior during rating.
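The effect of panel size on the spread of an averaged rating can be sketched with the standard error of the mean, which shrinks with the square root of the number of viewers. The ratings below are made up for illustration, and the larger panel is simply the smaller one repeated, which is an assumption chosen to keep the rating distribution the same.

```python
import math
import statistics


def standard_error(ratings):
    """Standard error of the mean: sample standard deviation / sqrt(n)."""
    return statistics.stdev(ratings) / math.sqrt(len(ratings))


# Hypothetical 5-point ratings of one sequence by six viewers.
small_panel = [3, 4, 4, 5, 2, 4]
# Same rating distribution, four times as many viewers (assumption).
large_panel = small_panel * 4

print(f"n={len(small_panel)}: SE={standard_error(small_panel):.3f}")
print(f"n={len(large_panel)}: SE={standard_error(large_panel):.3f}")
```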

The minimum number of subjects that are required for a subjective video quality study is not strictly defined. According to ITU-T, any number between 4 and 40 is possible, where 4 is the absolute minimum for statistical reasons, and inviting more than 40 subjects has no added value. In general, at least 15 observers should participate in the experiment. They should not be directly involved in picture quality evaluation as part of their work and should not be experienced assessors.[2] In other documents, it is also claimed that at minimum 10 subjects are needed to obtain meaningful averaged ratings.[3]

However, most recommendations for the number of subjects have been designed for measuring video quality encountered by a home television or PC user, where the range and diversity of distortions tend to be limited (e.g., to encoding artifacts only). Given the large ranges and diversity of impairments that may occur on videos captured with mobile devices and/or transmitted over wireless networks, generally, a larger number of human subjects may be required.

Brunnström and Barkowsky have provided calculations for estimating the minimum number of subjects necessary based on existing subjective tests.[4] They claim that in order to ensure statistically significant differences when comparing ratings, a larger number of subjects than usually recommended may be needed.

Viewer selection

Viewers should be non-experts in the sense of not being professionals in the field of video coding or related domains. This requirement is introduced to avoid potential subject bias.[2]

Typically, viewers are screened for normal vision or corrected-to-normal vision using Snellen charts. Color blindness is often tested with Ishihara plates.[2]

There is an ongoing discussion in the QoE community as to whether a viewer's cultural, social, or economic background has a significant impact on the obtained subjective video quality results. A systematic study involving six laboratories in four countries found no statistically significant impact of the subjects' language, culture, or country of origin on video quality ratings.[5]

Test environment

Subjective quality tests can be done in any environment. However, because heterogeneous contexts may influence the results, it is typically advised to perform tests in a neutral environment, such as a dedicated laboratory room. Such a room may be sound-proofed, with walls painted in neutral grey, and using properly calibrated light sources. Several recommendations specify these conditions.[6][7] Controlled environments have been shown to result in lower variability in the obtained scores.[5]

Crowdsourcing

Crowdsourcing has recently been used for subjective video quality evaluation, and more generally, in the context of Quality of Experience.[8] Here, viewers give ratings using their own computer, at home, rather than taking part in a subjective quality test in laboratory rooms. While this method allows for obtaining more results than in traditional subjective tests at lower costs, the validity and reliability of the gathered responses must be carefully checked.[9]

Analysis of results

Opinions of viewers are typically averaged into the mean opinion score (MOS). To this end, the labels of categorical scales may be translated into numbers. For example, the responses "bad" to "excellent" can be mapped to the values 1 to 5, and then averaged. MOS values should always be reported with their statistical confidence intervals so that the general agreement between observers can be evaluated.
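A minimal sketch of this calculation is given below. It assumes the usual five-point labels "bad", "poor", "fair", "good", and "excellent", and it uses a normal approximation for the confidence interval. For small panels a Student's t interval would be wider. The function name and the sample responses are hypothetical.

```python
import math
import statistics

# Mapping of categorical labels to numbers (five-point scale assumed).
LABEL_TO_SCORE = {"bad": 1, "poor": 2, "fair": 3, "good": 4, "excellent": 5}


def mos_with_ci(labels, confidence=0.95):
    """Return (MOS, lower bound, upper bound) for one stimulus.

    Uses a normal approximation: MOS +/- z * s / sqrt(n).
    """
    scores = [LABEL_TO_SCORE[label] for label in labels]
    mos = statistics.fmean(scores)
    z = statistics.NormalDist().inv_cdf(0.5 + confidence / 2)
    half_width = z * statistics.stdev(scores) / math.sqrt(len(scores))
    return mos, mos - half_width, mos + half_width


# Hypothetical responses of eight viewers to one PVS.
responses = ["good", "good", "fair", "excellent", "good", "poor", "fair", "good"]
mos, low, high = mos_with_ci(responses)
print(f"MOS = {mos:.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

A narrow interval indicates that viewers largely agree on the rating, while a wide one suggests disagreement or too few viewers.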

Subject screening

Often, additional measures are taken before evaluating the results. Subject screening is a process in which viewers whose ratings are considered invalid or unreliable are rejected from further analysis. Invalid ratings are hard to detect, as subjects may have rated without looking at a video, or may have cheated during the test. The overall reliability of a subject can be determined by various procedures, some of which are outlined in ITU-R and ITU-T recommendations.[2][7] For example, the correlation between a person's individual scores and the overall MOS, evaluated for all sequences, is a good indicator of their reliability in comparison with the remaining test participants.
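The correlation-based check can be sketched as follows. This is an illustrative simplification rather than the exact procedure of any recommendation: the threshold of 0.75, the function names, and the ratings are assumptions, and the MOS here includes the subject's own scores.

```python
import math


def pearson(x, y):
    """Pearson linear correlation coefficient of two equal-length lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if norm_x == 0 or norm_y == 0:
        return 0.0  # constant ratings carry no information
    return cov / (norm_x * norm_y)


def screen_subjects(ratings, threshold=0.75):
    """Keep subjects whose scores correlate with the overall MOS.

    ratings maps a subject ID to a list of scores, one per PVS,
    in the same PVS order for every subject.
    """
    n_pvs = len(next(iter(ratings.values())))
    mos = [
        sum(scores[i] for scores in ratings.values()) / len(ratings)
        for i in range(n_pvs)
    ]
    correlations = {s: pearson(scores, mos) for s, scores in ratings.items()}
    kept = [s for s, r in correlations.items() if r >= threshold]
    return kept, correlations


# Hypothetical ratings of five PVSs; "s4" rates inconsistently.
ratings = {
    "s1": [1, 2, 3, 4, 5],
    "s2": [1, 3, 3, 4, 5],
    "s3": [2, 2, 4, 4, 5],
    "s4": [5, 1, 4, 2, 3],
}
kept, correlations = screen_subjects(ratings)
print("kept:", kept)
```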

Advanced models

While rating stimuli, humans are subject to biases. These may lead to different and inaccurate scoring behavior and consequently result in MOS values that are not representative of the "true quality" of a stimulus. In recent years, advanced models have been proposed that aim to describe the rating process formally and then recover quality scores from noisy subjective ratings. According to Janowski et al., subjects may have an opinion bias that generally shifts their scores, as well as a scoring imprecision that is dependent on the subject and stimulus to be rated.[10] Li et al. have proposed to differentiate between subject inconsistency and content ambiguity.[11]
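The idea of an opinion bias that shifts a subject's scores can be illustrated with a deliberately simple estimate: the average difference between a subject's scores and the MOS. This sketch is not the model of Janowski et al. or of Li et al., both of which also model scoring imprecision. The function names and ratings are hypothetical.

```python
def estimate_subject_bias(ratings):
    """Estimate each subject's bias as the mean of (score - MOS) over all PVSs.

    ratings maps a subject ID to a list of scores, one per PVS,
    in the same PVS order for every subject.
    """
    subjects = list(ratings)
    n_pvs = len(ratings[subjects[0]])
    mos = [
        sum(ratings[s][i] for s in subjects) / len(subjects)
        for i in range(n_pvs)
    ]
    return {
        s: sum(ratings[s][i] - mos[i] for i in range(n_pvs)) / n_pvs
        for s in subjects
    }


def remove_bias(ratings, bias):
    """Subtract each subject's estimated bias from all of their scores."""
    return {s: [score - bias[s] for score in scores] for s, scores in ratings.items()}


# Hypothetical ratings: same ranking, but shifted use of the scale.
ratings = {
    "lenient": [3, 4, 5, 5],
    "neutral": [2, 3, 4, 4],
    "strict": [1, 2, 3, 3],
}
bias = estimate_subject_bias(ratings)
corrected = remove_bias(ratings, bias)
print(bias)
print(corrected)
```

With complete data the MOS itself does not change, because the estimated biases sum to zero. What changes is the spread of scores per stimulus, and therefore the width of the confidence intervals.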

Standardized testing methods

There are many ways to select proper sequences, system settings, and test methodologies. A few of them have been standardized. They are thoroughly described in several ITU-R and ITU-T recommendations, among them ITU-R BT.500[7] and ITU-T P.910.[2] While there is an overlap in certain aspects, the BT.500 recommendation has its roots in broadcasting, whereas P.910 focuses on multimedia content.

A standardized testing method usually describes the following aspects:

Another recommendation, ITU-T P.913,[6] gives researchers more freedom to conduct subjective quality tests in environments different from a typical testing laboratory, while still requiring them to report all details necessary to make such tests reproducible.

Examples

Below, some examples of standardized testing procedures are explained.

Single-Stimulus

Double-stimulus or multiple stimulus

Choice of methodology

Which method to choose largely depends on the purpose of the test and on constraints in time and other resources. Some methods are less prone to context effects (i.e. cases where the order of stimuli influences the results), which are unwanted test biases.[12] ITU-T P.910 notes that methods such as Degradation Category Rating (DCR) should be used for testing the fidelity of transmission, especially in high-quality systems. Absolute Category Rating (ACR) and ACR with Hidden Reference (ACR-HR) are better suited for qualification tests and, because they give absolute results, for comparing systems. The Pair Comparison (PC) method has high discriminatory power, but it requires longer test sessions.
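The longer sessions of pair comparison follow from simple counting, as the sketch below suggests. It assumes that every unordered pair of conditions is shown exactly once, whereas a single-stimulus method shows each condition once. Actual test plans may show pairs in both orders or use only a subset of pairs. The function name and method labels are hypothetical.

```python
from math import comb


def presentations_per_viewer(n_conditions, method):
    """Number of rating trials one viewer completes for one source sequence."""
    if method == "single_stimulus":
        return n_conditions  # each condition rated once
    if method == "pair_comparison":
        return comb(n_conditions, 2)  # each unordered pair rated once
    raise ValueError(f"unknown method: {method}")


for n in (4, 8, 12):
    single = presentations_per_viewer(n, "single_stimulus")
    pairs = presentations_per_viewer(n, "pair_comparison")
    print(f"{n} conditions: {single} single-stimulus trials, {pairs} pairs")
```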

Databases

The results of subjective quality tests, together with the stimuli used, are called databases. A number of subjective picture and video quality databases based on such studies have been made publicly available by research institutes. These databases, some of which have become de facto standards, are used worldwide by television, cinema, and video engineers to design and test objective quality models, since such models can be trained against the subjective data. An overview of publicly available databases has been compiled by the Video Quality Experts Group, and video assets have been made available in the Consumer Digital Video Library.
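One common way to test an objective model against such a database is to check how well its scores rank the stimuli compared with the MOS, for instance with a Spearman rank correlation. The sketch below is self-contained and uses invented numbers, not values from any real database. The function names are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    norm_x = math.sqrt(sum((a - mean_x) ** 2 for a in rx))
    norm_y = math.sqrt(sum((b - mean_y) ** 2 for b in ry))
    return cov / (norm_x * norm_y)


# Invented example: objective metric scores and MOS for five PVSs.
metric_scores = [30.1, 32.5, 35.0, 38.2, 41.0]
mos_values = [2.1, 1.9, 3.4, 4.0, 4.4]
print(f"Spearman correlation: {spearman(metric_scores, mos_values):.2f}")
```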

References

  • ITU-T Rec. P.910: Subjective video quality assessment methods for multimedia applications, 2008.
  • Winkler, Stefan. "On the properties of subjective ratings in video quality experiments". Proc. Quality of Multimedia Experience, 2009.
  • Brunnström, Kjell; Barkowsky, Marcus (2018-09-25). "Statistical quality of experience analysis: on planning the sample size and statistical significance testing". Journal of Electronic Imaging. 27 (5): 053013. Bibcode:2018JEI....27e3013B. doi:10.1117/1.jei.27.5.053013. ISSN 1017-9909. S2CID 53058660.
  • Pinson, M. H.; Janowski, L.; Pepion, R.; Huynh-Thu, Q.; Schmidmer, C.; Corriveau, P.; Younkin, A.; Callet, P. Le; Barkowsky, M. (October 2012). "The Influence of Subjects and Environment on Audiovisual Subjective Tests: An International Study". IEEE Journal of Selected Topics in Signal Processing. 6 (6): 640–651. Bibcode:2012ISTSP...6..640P. doi:10.1109/jstsp.2012.2215306. ISSN 1932-4553. S2CID 10667847.
  • ITU-T P.913: Methods for the subjective assessment of video quality, audio quality and audiovisual quality of Internet video and distribution quality television in any environment, 2014.
  • ITU-R BT.500: Methodology for the subjective assessment of the quality of television pictures, 2012.
  • Hossfeld, Tobias (2014-01-15). "Best Practices for QoE Crowdtesting: QoE Assessment With Crowdsourcing". IEEE Transactions on Multimedia. 16 (2): 541–558. doi:10.1109/TMM.2013.2291663. S2CID 16862362.
  • Hossfeld, Tobias; Hirth, Matthias; Redi, Judith; Mazza, Filippo; Korshunov, Pavel; Naderi, Babak; Seufert, Michael; Gardlo, Bruno; Egger, Sebastian (October 2014). "Best Practices and Recommendations for Crowdsourced QoE - Lessons learned from the Qualinet Task Force "Crowdsourcing"". hal-01078761.
  • Janowski, Lucjan; Pinson, Margaret (2015). "The Accuracy of Subjects in a Quality Experiment: A Theoretical Subject Model". IEEE Transactions on Multimedia. 17 (12): 2210–2224. doi:10.1109/tmm.2015.2484963. ISSN 1520-9210. S2CID 22343847.
  • Li, Zhi; Bampis, Christos G. (2017). "Recover Subjective Quality Scores from Noisy Measurements". 2017 Data Compression Conference (DCC). IEEE. pp. 52–61. arXiv:1611.01715. doi:10.1109/dcc.2017.26. ISBN 9781509067213. S2CID 14251604.
  • Pinson, Margaret; Wolf, Stephen. "Comparing Subjective Video Quality Testing Methodologies". SPIE Video Communications and Image Processing Conference, Lugano, Switzerland, July 2003.

External links


    Retrieved from "https://en.wikipedia.org/w/index.php?title=Subjective_video_quality&oldid=1219212571"




    This page was last edited on 16 April 2024, at 11:42 (UTC).
