Qualitative variation: Difference between revisions
From Wikipedia, the free encyclopedia
 


DrMicro (talk | contribs)
19,885 edits
→‎Formulae: Corrections to formulae
Line 19: Line 19:

Wilcox gives a number of formulae for various indices of QV {{Harv|Wilcox|1973}}. The first, which he designates DM for "Deviation from the Mode", is a standardized form of the [[variation ratio]] and is analogous to [[variance]] as deviation from the mean.



The formula for this is

The formula for this is derived as follows:



:<math> ModVR = \sum_{ i = 1 }^K ( f_m - f_i ) </math>

:<math> M = \sum_{ i = 1 }^K ( f_m - f_i ) </math>



where ''f''<sub>m</sub> is the modal frequency, ''K'' is the number of categories and ''f''<sub>i</sub> is the frequency of the ''i''<sup>th</sup> group.

Line 27: Line 27:

This can be simplified to



:<math> ModVR = Kf_m - N</math>

:<math> M = Kf_m - N</math>



where ''N'' is the total size of the sample.



Freeman's index is<ref name=Freemen1965>Freeman LC (1965) Elementary applied statistics. New York: John Wiley and Sons pp 40-43</ref>



: <math> v = 1 - \frac{ f_m }{ N } </math>



This is related to ModVR as follows:

This is related to M as follows:



:<math> \frac{ ( \frac{ f_m }{ N } ) - \frac{ 1 }{ K } }{ \frac{ N }{ K }\frac{ ( K - 1 )} { N } } = \frac{ ModVR }{ N( K - 1 ) }</math>

:<math> \frac{ ( \frac{ f_m }{ N } ) - \frac{ 1 }{ K } }{ \frac{ N }{ K }\frac{ ( K - 1 )} { N } } = \frac{ M }{ N( K - 1 ) }</math>



The ModVR is then defined as


:<math> ModVR = 1 - \frac{ Kf_m - N }{ N( K - 1 ) }</math>


Low values of ModVR correspond to a small amount of variation and high values to larger amounts of variation.



One formula for IQV,<ref>[http://www.xycoon.com/qualitative_variation.htm IQV at xycoon]</ref> given as M2 in {{Harv|Gibbs|1975|p=472}} is:


Revision as of 10:47, 29 May 2013

An index of qualitative variation (IQV) is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little-studied in the statistics literature. The simplest is the variation ratio, while the most sophisticated is the information entropy.

Properties

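There are various indices of qualitative variation; a number of them are summarized or devised by Wilcox (Wilcox 1967), (Wilcox 1973), who requires the following standardization properties to be satisfied:

  • Variation varies between 0 and 1.
  • Variation is 0 if and only if all cases belong to a single category.
  • Variation is 1 if and only if cases are evenly divided across all categories.[1]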

In particular, the value of these standardized indices does not depend on the number of categories or number of samples.
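The standardization properties can be checked as a rough sanity test. They are: values stay between 0 and 1, the value is 0 exactly when all cases share one category, and the value is 1 exactly when cases are evenly divided. The Python sketch below tests a candidate index against every possible distribution of a small number of cases. The helper names `m2` and `satisfies_wilcox_properties` are hypothetical. Gibbs' M2, K/(K-1) × (1 - Σ p_i²), serves only as an example index.

```python
from itertools import product

def m2(counts):
    # Gibbs' M2 = K/(K-1) * (1 - sum of squared proportions); assumes K >= 2
    k, n = len(counts), sum(counts)
    return k / (k - 1) * (1 - sum((c / n) ** 2 for c in counts))

def satisfies_wilcox_properties(index, k, n, tol=1e-9):
    """Check the three standardization properties on every way of spreading
    n cases over k categories (exhaustive, so only practical for small n, k)."""
    for counts in product(range(n + 1), repeat=k):
        if sum(counts) != n:
            continue
        value = index(list(counts))
        single = max(counts) == n          # all cases in one category
        even = len(set(counts)) == 1       # cases evenly divided
        if not (-tol <= value <= 1 + tol):
            return False                   # property 1: stays within [0, 1]
        if (abs(value) <= tol) != single:
            return False                   # property 2: 0 exactly when one category
        if (abs(value - 1) <= tol) != even:
            return False                   # property 3: 1 exactly when evenly split
    return True

print(satisfies_wilcox_properties(m2, k=3, n=6))  # True
```

Because the search is exhaustive, it only scales to a handful of categories and cases. It is a check on a formula, not a proof. The unstandardized 1 - Σ p_i² fails property 3, since its maximum is 1 - 1/K rather than 1.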

For any index, the closer the distribution is to uniform, the larger the variance; the larger the differences in frequencies across categories, the smaller the variance.

Indices of qualitative variation are in this sense analogous to information entropy, which is minimized when all cases belong to a single category and maximized in a uniform distribution. Indeed, information entropy can itself be used as an index of qualitative variation.
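One common way to turn Shannon entropy into a 0-to-1 index is to divide it by ln K, its value for a uniform distribution over K categories. This normalization is an assumption of the sketch below, not something taken from the sources cited here, and the function names are hypothetical.

```python
import math

def shannon_entropy(counts):
    # H = -sum p_i ln p_i over non-empty categories (0 ln 0 is taken as 0)
    n = sum(counts)
    return sum(-(c / n) * math.log(c / n) for c in counts if c > 0)

def normalized_entropy(counts):
    # Divide by ln K, the entropy of a uniform distribution over K categories,
    # so the result runs from 0 (one category) to 1 (even split); assumes K >= 2
    return shannon_entropy(counts) / math.log(len(counts))

print(normalized_entropy([10, 0, 0, 0]))  # 0.0
print(normalized_entropy([5, 5, 5, 5]))   # 1.0, up to floating-point rounding
```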

One characterization of a particular index of qualitative variation (IQV) is as a ratio of observed differences to maximum differences.

Formulae

Wilcox gives a number of formulae for various indices of QV (Wilcox 1973). The first, which he designates DM for "Deviation from the Mode", is a standardized form of the variation ratio and is analogous to variance as deviation from the mean.

The formula for this is derived as follows:

<math> M = \sum_{i=1}^{K} (f_m - f_i) </math>

where <math>f_m</math> is the modal frequency, K is the number of categories and <math>f_i</math> is the frequency of the i-th group.

This can be simplified to

<math> M = \sum_{i=1}^{K} f_m - \sum_{i=1}^{K} f_i = K f_m - N </math>

where N is the total size of the sample.

Freeman's index is[2]

<math> v = 1 - \frac{f_m}{N} </math>

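This is related to M as follows:

<math> \frac{\frac{f_m}{N} - \frac{1}{K}}{\frac{N}{K} \cdot \frac{K-1}{N}} = \frac{M}{N(K-1)} </math>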

The ModVR is then defined as

<math> \mathrm{ModVR} = 1 - \frac{K f_m - N}{N(K-1)} </math>

Low values of ModVR correspond to a small amount of variation and high values to larger amounts of variation.
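The steps from M to Freeman's v to ModVR can be traced numerically. The sketch below computes all three for one sample and checks the simplification M = K f_m - N along the way. The function name is hypothetical, and integer category counts are assumed.

```python
def deviation_from_mode(counts):
    """Return (M, v, ModVR) for category frequencies f_1, ..., f_K (K >= 2)."""
    k, n = len(counts), sum(counts)
    f_m = max(counts)                         # modal frequency
    m = sum(f_m - f_i for f_i in counts)      # M = sum over i of (f_m - f_i)
    assert m == k * f_m - n                   # the simplified form M = K f_m - N
    v = 1 - f_m / n                           # Freeman's index
    mod_vr = 1 - m / (n * (k - 1))            # ModVR: 0 for one category, 1 for an even split
    return m, v, mod_vr

m, v, mod_vr = deviation_from_mode([6, 3, 1])
print(m)  # 8
```

For the counts [6, 3, 1], M = 8, v = 0.4 and ModVR = 0.6. Rearranging the ModVR formula gives ModVR = K(N - f_m)/(N(K - 1)) = K v/(K - 1), which the example satisfies: 3/2 × 0.4 = 0.6.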

One formula for IQV,[3] given as M2 in (Gibbs 1975, p. 472), is:

<math> M_2 = \frac{K}{K-1}\left(1 - \sum_{i=1}^{K} p_i^2\right) </math>

where K is the number of categories and <math>p_i</math> is the proportion of observations that fall in a given category i. The factor of <math>\frac{K}{K-1}</math> is for standardization.

The unstandardized index, <math>1 - \sum_{i=1}^{K} p_i^2</math>, denoted as M1 (Gibbs 1975, p. 471), can be interpreted as one minus the likelihood that a random pair of samples will belong to the same category (Lieberson 1969, p. 851). This formula for IQV is therefore a standardized likelihood of a random pair falling in different categories. M1 and M2 can be interpreted in terms of the variance of a multinomial distribution (Swanson 1976) (there called an "expanded binomial model").
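The pair interpretation of M1 can be checked by brute force. Enumerate every ordered pair of observations drawn with replacement and count the share whose categories differ. The sketch below does this for a small made-up sample. The labels and function names are illustrative assumptions.

```python
from itertools import product

def m1(counts):
    # Unstandardized index M1 = 1 - sum of squared category proportions
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

def m2(counts):
    # M2 divides M1 by its largest possible value, 1 - 1/K (i.e. multiplies by K/(K-1))
    k = len(counts)
    return m1(counts) / (1 - 1 / k)

def share_of_unalike_pairs(labels):
    # Brute force over all ordered pairs drawn with replacement
    pairs = list(product(labels, repeat=2))
    return sum(a != b for a, b in pairs) / len(pairs)

labels = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
counts = [labels.count(x) for x in ("a", "b", "c")]
print(m1(counts), share_of_unalike_pairs(labels))  # both about 0.62
```

Here M1 = 0.62 and M2 = 0.62 / (2/3) = 0.93. Dividing by 1 - 1/K, the largest value M1 can take, is the same as multiplying by K/(K - 1). M2 is thus a ratio of observed to maximum differences.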

Related indices

The sum

<math> \sum_{i=1}^{K} p_i^2 </math>

has also found application. This is known as the Simpson index in ecology and as the Herfindahl index or the Herfindahl-Hirschman index (HHI) in economics. A variant of this is known as the Hunter-Gaston index in microbiology.[4]
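The sum of squared proportions is simple to compute from raw counts or shares. In competition economics the HHI is commonly reported with market shares in percent, so it runs from near 0 up to 10,000. The sketch below assumes that convention, and its function names are hypothetical.

```python
def sum_of_squared_shares(shares):
    # Simpson / Herfindahl sum: squared proportions, from counts, sales or percent shares
    total = sum(shares)
    return sum((s / total) ** 2 for s in shares)

def hhi_points(shares):
    # Assumed convention: shares expressed in percent before squaring,
    # so a single firm with 100% of the market scores 10,000
    return 10_000 * sum_of_squared_shares(shares)

print(round(hhi_points([50, 30, 20])))  # 3800
```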

M1

The M1 statistic defined above has been proposed several times in a number of different settings under a variety of names. These include Gini's index of mutability,[5] Simpson's measure of diversity,[6] Bachi's index of linguistic homogeneity,[7] Mueller and Schuessler's index of qualitative variation,[8] Gibbs and Martin's index of industry diversification,[9] Lieberson's index,[10] and Blau's index in sociology, psychology and management studies.[11] The formulations of all these indices are identical.

Simpson's D is defined as

<math> D = 1 - \sum_{i=1}^{K} \frac{n_i (n_i - 1)}{n (n - 1)} </math>

where n is the total sample size and <math>n_i</math> is the number of items in the i-th category.

For large n we have

<math> D \approx 1 - \sum_{i=1}^{K} p_i^2 </math>
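The exact D corresponds to drawing two items without replacement, and the large-n form corresponds to drawing with replacement. The gap between them shrinks as n grows. The sketch below compares the two on a small and a large sample with the same proportions. The function names and counts are made up.

```python
def simpson_d(counts):
    # D = 1 - sum n_i (n_i - 1) / (n (n - 1)); needs n >= 2
    n = sum(counts)
    return 1 - sum(c * (c - 1) for c in counts) / (n * (n - 1))

def simpson_d_large_n(counts):
    # Large-n approximation 1 - sum p_i^2
    n = sum(counts)
    return 1 - sum((c / n) ** 2 for c in counts)

for counts in ([5, 3, 2], [500, 300, 200]):
    print(counts, simpson_d(counts), simpson_d_large_n(counts))
```

With 10 items the two forms differ by about 0.07. With 1,000 items in the same proportions they differ by less than 0.001.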

Another statistic that has been proposed is the coefficient of unalikeability, which ranges between 0 and 1.[12]

<math> u = \frac{1}{n^2 - n} \sum_{i=1}^{n} \sum_{j \ne i} c(x_i, x_j) </math>

where n is the sample size, <math>x_1, \ldots, x_n</math> are the observations, and c(x, y) = 1 if x and y are unalike (fall in different categories) and 0 otherwise.

For large n we have

<math> u \approx 1 - \sum_{i=1}^{K} p_i^2 </math>

where K is the number of categories.
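A direct implementation of the unalikeability coefficient compares every ordered pair of distinct observations. The sketch below assumes c(x, y) = 1 when the two observations fall in different categories, which is what makes u a measure of unalikeability. It checks the pairwise count against a closed form based on category counts. The function names are hypothetical.

```python
def unalikeability(values):
    # u = (ordered pairs of distinct observations that are unalike) / (n^2 - n),
    # with c(x, y) = 1 when x and y differ and 0 when they are alike; needs n >= 2
    n = len(values)
    unalike = sum(
        1
        for i in range(n)
        for j in range(n)
        if i != j and values[i] != values[j]
    )
    return unalike / (n * n - n)

def unalikeability_from_counts(counts):
    # Same quantity from category counts: (n^2 - sum n_k^2) / (n^2 - n)
    n = sum(counts)
    return (n * n - sum(c * c for c in counts)) / (n * n - n)

values = ["a"] * 5 + ["b"] * 3 + ["c"] * 2
print(unalikeability(values), unalikeability_from_counts([5, 3, 2]))
```

Expanding n_k(n_k - 1) shows that (n² - Σ n_k²)/(n² - n) is algebraically the same as 1 - Σ n_k(n_k - 1)/(n(n - 1)). Under this convention, u therefore coincides with Simpson's D computed from the same counts.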

M2

Greenberg's monolingual non-weighted index of linguistic diversity[13] is the M2 statistic defined above.

Other measures

Berger-Parker index

The Berger-Parker index equals the maximum <math>p_i</math> value in the dataset, i.e. the proportional abundance of the most abundant type. This corresponds to the weighted generalized mean of the <math>p_i</math> values when q approaches infinity, and hence equals the inverse of the true diversity of order infinity, <math>1/{}^\infty D</math>.
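The limit can be illustrated numerically. With r = q - 1, the Hill number of order q is 1 divided by the generalized mean of order r of the p_i, each weighted by p_i itself. For large r that mean should approach max p_i. The sketch below uses a made-up sample and hypothetical function names. r = 500 is an arbitrary large value; much larger values eventually underflow to 0 in floating point.

```python
def berger_parker(counts):
    # Proportional abundance of the most abundant category: max p_i
    n = sum(counts)
    return max(counts) / n

def self_weighted_power_mean(p, r):
    # Generalized mean of order r of the p_i, weighted by the p_i themselves:
    # (sum p_i * p_i**r) ** (1/r); it approaches max p_i as r grows
    return sum(pi * pi ** r for pi in p) ** (1 / r)

counts = [5, 3, 2]
p = [c / sum(counts) for c in counts]
print(berger_parker(counts))             # 0.5
print(self_weighted_power_mean(p, 500))  # close to 0.5
print(1 / berger_parker(counts))         # 2.0, the true diversity of order infinity
```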

Rényi entropy

The Rényi entropy is a generalization of the Shannon entropy to values of q other than unity. It can be expressed as:

<math> {}^qH = \frac{1}{1-q} \ln\left(\sum_{i=1}^{K} p_i^q\right) </math>

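which equals

<math> {}^qH = \ln\left(\left(\sum_{i=1}^{K} p_i^q\right)^{1/(1-q)}\right) = \ln\left({}^qD\right) </math>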


This means that taking the logarithm of true diversity based on any value of q gives the Rényi entropy corresponding to the same value of q.

The value of <math>{}^qD</math> is also known as the Hill number.[14]
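The identity between the Rényi entropy and the logarithm of the Hill number can be checked for several orders q. The sketch below uses made-up proportions and hypothetical function names. It handles q = 1 through its limit (the Shannon entropy and its exponential), because the general formulas divide by 1 - q.

```python
import math

def renyi_entropy(p, q):
    # ^qH = ln(sum p_i^q) / (1 - q); q = 1 is taken as its limit, the Shannon entropy
    p = [x for x in p if x > 0]
    if q == 1:
        return -sum(x * math.log(x) for x in p)
    return math.log(sum(x ** q for x in p)) / (1 - q)

def hill_number(p, q):
    # True diversity of order q: ^qD = (sum p_i^q) ** (1 / (1 - q)); q = 1 gives exp(H)
    p = [x for x in p if x > 0]
    if q == 1:
        return math.exp(-sum(x * math.log(x) for x in p))
    return sum(x ** q for x in p) ** (1 / (1 - q))

p = [0.5, 0.3, 0.2]
for q in (0, 0.5, 1, 2, 3):
    # the logarithm of the Hill number recovers the Renyi entropy of the same order
    assert math.isclose(renyi_entropy(p, q), math.log(hill_number(p, q)))
print(hill_number(p, 0))  # 3.0, the number of non-empty categories
```

At q = 0 the Hill number is simply the number of non-empty categories. At q = 2 it is the inverse of the sum of squared proportions, (Σ p_i²)^(-1).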

Evaluation of indices

Different indices give different values of variation and may be used for different purposes; several are used and critiqued especially in the sociology literature.

If one wishes to simply make ordinal comparisons between samples (is one sample more or less varied than another), the choice of IQV is relatively less important, as they will often give the same ordering.
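Whether two indices rank samples the same way is easy to check directly. The sketch below ranks samples by ModVR, 1 - (K f_m - N)/(N(K - 1)), and by Gibbs' M2. The function names and counts are made up. Both indices order the first set of samples identically, while the second set shows that agreement is not guaranteed.

```python
def mod_vr(counts):
    # Wilcox's ModVR = 1 - (K f_m - N) / (N (K - 1))
    k, n = len(counts), sum(counts)
    return 1 - (k * max(counts) - n) / (n * (k - 1))

def m2(counts):
    # Gibbs' M2 = K/(K-1) * (1 - sum p_i^2)
    k, n = len(counts), sum(counts)
    return k / (k - 1) * (1 - sum((c / n) ** 2 for c in counts))

def ranking(samples, index):
    # Sample names ordered from least to most varied under the given index
    return sorted(samples, key=lambda name: index(samples[name]))

agree = {"A": [8, 1, 1], "B": [5, 3, 2], "C": [4, 3, 3]}
disagree = {"D": [5, 5, 0], "E": [6, 2, 2]}
print(ranking(agree, mod_vr), ranking(agree, m2))        # same order: A, B, C
print(ranking(disagree, mod_vr), ranking(disagree, m2))  # the orders differ
```

In the second case, [5, 5, 0] and [6, 2, 2] have ModVR values 0.75 and 0.6 but M2 values 0.75 and 0.84. ModVR looks only at the modal frequency, while M2 uses every category's share.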

In some cases it is useful not to standardize an index to run from 0 to 1 regardless of the number of categories or samples (Wilcox 1973, p. 338), but indices are generally standardized in this way.

See also

Notes

  1. ^ This can only happen if the number of cases is a multiple of the number of categories.
  2. ^ Freeman LC (1965) Elementary applied statistics. New York: John Wiley and Sons pp 40-43
  3. ^ IQV at xycoon
  4. ^ Hunter PR, Gaston MA (1988) Numerical index of the discriminatory ability of typing systems: an application of Simpson's index of diversity. J Clin Microbiol 26(11): 2465-2466
  5. ^ Gini CW (1912) Variability and mutability, contribution to the study of statistical distributions and relations. Studi Economico-Giuridici della R. Università di Cagliari
  6. ^ Simpson EH (1949) Measurement of diversity. Nature 163: 688
  7. ^ Bachi R (1956) A statistical analysis of the revival of Hebrew in Israel. In: Bachi R (ed) Scripta Hierosolymitana, Vol III, Jerusalem: Magnes Press pp 179-247
  8. ^ Mueller JH, Schuessler KF (1961) Statistical reasoning in sociology. Boston: Houghton Mifflin
  9. ^ Gibbs JP, Martin WT (1962) Urbanization, technology and division of labor: International patterns. American Sociological Review 27: 667-677
  10. ^ Lieberson S (1969) Measuring population diversity. American Sociological Review 34(6): 850-862
  11. ^ Blau P (1977) Inequality and Heterogeneity. New York: Free Press
  12. ^ Perry M, Kader G (2005) Variation as unalikeability. Teaching Statistics 27(2): 58-60
  13. ^ Greenberg JH (1956) The measurement of linguistic diversity. Language 32: 109-115
  14. ^ Hill MO (1973) Ecology 54(2): 427-432

References


    Retrieved from "https://en.wikipedia.org/w/index.php?title=Qualitative_variation&oldid=557326386"

    Categories: 
    Statistical deviation and dispersion
    Summary statistics for categorical data
    Categorical data



    This page was last edited on 29 May 2013, at 10:47 (UTC).

    This version of the page has been revised. Besides normal editing, the reason for revision may have been that this version contains factual inaccuracies, vandalism, or material not compatible with the Creative Commons Attribution-ShareAlike License.


