Qualitative variation

Revision as of 10:47, 29 May 2013

An index of qualitative variation (IQV) is a measure of statistical dispersion in nominal distributions. There are a variety of these, but they have been relatively little-studied in the statistics literature. The simplest is the variation ratio, while the most sophisticated is the information entropy.

Properties

There are various indices of qualitative variation; a number are summarized and devised by Wilcox (Wilcox 1967), (Wilcox 1973), who requires the following standardization properties to be satisfied:

Variation varies between 0 and 1.
Variation is 0 if and only if all cases belong to a single category.
Variation is 1 if and only if cases are evenly divided across all categories.^[1]

In particular, the value of these standardized indices does not depend on the number of categories or number of samples.

For any index, the closer to uniform the distribution, the larger the variance, and the larger the differences in frequencies across categories, the smaller the variance.

Indices of qualitative variation are in this sense analogous to information entropy, which is minimized when all cases belong to a single category and maximized in a uniform distribution. Indeed, information entropy can be used as an index of qualitative variation.

One characterization of a particular index of qualitative variation (IQV) is as a ratio of observed differences to maximum differences.

Formulae

Wilcox gives a number of formulae for various indices of QV (Wilcox 1973). The first, which he designates DM for "Deviation from the Mode", is a standardized form of the variation ratio, and is analogous to variance as deviation from the mean.

The formula for this is derived as follows:

M=\sum _{i=1}^{K}(f_{m}-f_{i})

where f_m is the modal frequency, K is the number of categories and f_i is the frequency of the i-th group.

This can be simplified to

M=Kf_{m}-N

where N is the total size of the sample.

Freeman's index is^[2]

v=1-{\frac {f_{m}}{N}}

This is related to M as follows:

{\frac {({\frac {f_{m}}{N}})-{\frac {1}{K}}}{{\frac {N}{K}}{\frac {(K-1)}{N}}}}={\frac {M}{N(K-1)}}

The ModVR is then defined as

ModVR=1-{\frac {Kf_{m}-N}{N(K-1)}}

Low values of ModVR correspond to small amounts of variation and high values to larger amounts of variation.
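The ModVR formula above can be checked numerically. The following is a minimal sketch, not a reference implementation: the function names `mod_vr` and `variation_ratio` and the sample counts are hypothetical. It also illustrates that, by rearranging the formula, ModVR equals Freeman's v multiplied by K/(K-1).

```python
def mod_vr(frequencies):
    """ModVR = 1 - (K*f_m - N) / (N*(K - 1)) for a list of category counts."""
    k = len(frequencies)      # K, the number of categories
    n = sum(frequencies)      # N, the total sample size
    if k < 2 or n == 0:
        raise ValueError("need at least two categories and one observation")
    f_m = max(frequencies)    # modal frequency
    return 1 - (k * f_m - n) / (n * (k - 1))


def variation_ratio(frequencies):
    """Freeman's index v = 1 - f_m / N."""
    return 1 - max(frequencies) / sum(frequencies)


counts = [5, 3, 2]  # hypothetical counts for K = 3 categories, N = 10
print(mod_vr(counts))           # 0.75
print(variation_ratio(counts))  # 0.5
```

With all cases in one category, for example `[10, 0, 0]`, the function returns 0, and with an even split such as `[4, 4, 4]` it returns 1, matching the standardization properties listed earlier.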

One formula for IQV,^[3] given as M2 in (Gibbs 1975, p. 472) is:

{\text{IQV}}:={\frac {K}{K-1}}\left(1-\sum _{i=1}^{K}p_{i}^{2}\right)

where K is the number of categories, and $p_{i}=f_{i}/N$ is the proportion of observations that fall in a given category i. The factor of ${\frac {K}{K-1}}$ is for standardization.

The unstandardized index, $\left(1-\sum _{i=1}^{K}p_{i}^{2}\right)$ , denoted as M1 (Gibbs 1975, p. 471), can be interpreted as the likelihood that a random pair of samples will belong to different categories (Lieberson 1969, p. 851), so this formula for IQV is a standardized likelihood of a random pair falling in different categories. M1 and M2 can be interpreted in terms of the variance of a multinomial distribution (Swanson 1976) (there called an "expanded binomial model").
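As an illustration of the two formulas, the sketch below computes M1 and the standardized IQV (M2) from a list of category counts. The function names `m1` and `iqv` and the example counts are assumptions made for this example only.

```python
def m1(frequencies):
    """Unstandardized index M1 = 1 - sum(p_i^2), with p_i = f_i / N."""
    n = sum(frequencies)
    if n == 0:
        raise ValueError("need at least one observation")
    return 1 - sum((f / n) ** 2 for f in frequencies)


def iqv(frequencies):
    """Standardized index M2 = K / (K - 1) * M1."""
    k = len(frequencies)
    if k < 2:
        raise ValueError("need at least two categories")
    return k / (k - 1) * m1(frequencies)


counts = [5, 3, 2]  # hypothetical counts, so p = (0.5, 0.3, 0.2)
print(round(m1(counts), 4))   # 0.62
print(round(iqv(counts), 4))  # 0.93
```

The factor K/(K-1) rescales M1 so that an even split across the K categories gives exactly 1, since M1 alone can reach at most 1 - 1/K.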

Related indices

The sum

\sum _{i=1}^{K}p_{i}^{2}

has also found application. This is known as the Simpson index in ecology and as the Herfindahl index or the Herfindahl-Hirschman index (HHI) in economics. A variant of this is known as the Hunter-Gaston index in microbiology.^[4]

M1

The M1 statistic defined above has been proposed several times in a number of different settings under a variety of names. These include Gini's index of mutability,^[5] Simpson's measure of diversity,^[6] Bachi's index of linguistic homogeneity,^[7] Mueller and Schuessler's index of qualitative variation,^[8] Gibbs and Martin's index of industry diversification,^[9] Lieberson's index,^[10] and Blau's index in sociology, psychology and management studies.^[11] The formulations of all these indices are identical.

Simpson's D is defined as

D=1-\sum {\frac {n_{i}(n_{i}-1)}{n(n-1)}}

where n is the total sample size and n_i is the number of items in the i-th category.

For large n we have

D\sim 1-\sum _{i=1}^{K}p_{i}^{2}

Another statistic that has been proposed is the coefficient of unalikeability which ranges between 0 and 1.^[12]

u={\frac {\sum _{i\neq j}c(x_{i},x_{j})}{n^{2}-n}}

where n is the sample size and c(x_i, x_j) = 1 if x_i and x_j are unalike and 0 otherwise.

For large n we have

u\sim 1-\sum _{i=1}^{K}p_{i}^{2}

where K is the number of categories.
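The large-n statements above can be checked on a small example. This sketch assumes that the coefficient of unalikeability sums c over all ordered pairs of distinct observations and that c is 1 for an unalike pair, which is the reading consistent with the stated limit. The function names and the sample are hypothetical.

```python
from collections import Counter
from itertools import permutations


def simpson_d(sample):
    """Simpson's D = 1 - sum(n_i * (n_i - 1)) / (n * (n - 1))."""
    counts = Counter(sample)
    n = len(sample)
    return 1 - sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))


def unalikeability(sample):
    """Share of ordered pairs of distinct observations that are unalike."""
    n = len(sample)
    unalike = sum(1 for x, y in permutations(sample, 2) if x != y)
    return unalike / (n * n - n)


def m1(sample):
    """Large-n limit 1 - sum(p_i^2) with p_i = n_i / n."""
    counts = Counter(sample)
    n = len(sample)
    return 1 - sum((c / n) ** 2 for c in counts.values())


small = ["a"] * 5 + ["b"] * 3 + ["c"] * 2   # n = 10
large = small * 100                          # n = 1000, same proportions
print(simpson_d(small), unalikeability(small), m1(small))
print(simpson_d(large), m1(large))
```

Under these assumptions the two finite-sample statistics coincide (both are 62/90 for the small sample), and both move toward 1 - Σp² = 0.62 as n grows with the proportions held fixed.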

M2

Greenberg's monolingual non-weighted index of linguistic diversity^[13] is the M2 statistic defined above.

Other measures

Berger-Parker index

The Berger-Parker index equals the maximum $p_{i}$ value in the dataset, i.e. the proportional abundance of the most abundant type. This corresponds to the weighted generalized mean of the $p_{i}$ values when q approaches infinity, and hence equals the inverse of the true diversity of order infinity ($1/{}^{\infty }\!D$).

Rényi entropy

The Rényi entropy is a generalization of the Shannon entropy to other values of q than unity. It can be expressed:

{}^{q}H={\frac {1}{1-q}}\;\ln \left(\sum _{i=1}^{K}p_{i}^{q}\right)

which equals

{}^{q}H=\ln \left({1 \over {\sqrt[{q-1}]{\sum _{i=1}^{K}p_{i}p_{i}^{q-1}}}}\right)=\ln({}^{q}\!D)

This means that taking the logarithm of true diversity based on any value of q gives the Rényi entropy corresponding to the same value of q.

The value of ${}^{q}\!D$ is also known as the Hill number.^[14]
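The relationship between the Rényi entropy, the Hill number and the Berger-Parker index can be illustrated numerically. The sketch below is one possible implementation for q ≠ 1 only; the function names `hill_number` and `renyi_entropy` and the example proportions are assumptions.

```python
import math


def hill_number(p, q):
    """True diversity of order q: (sum of p_i^q) ** (1 / (1 - q)), for q != 1."""
    if q == 1:
        raise ValueError("q = 1 is the Shannon limit and is not handled here")
    s = sum(pi ** q for pi in p if pi > 0)
    return s ** (1 / (1 - q))


def renyi_entropy(p, q):
    """Renyi entropy of order q: ln(sum of p_i^q) / (1 - q), for q != 1."""
    if q == 1:
        raise ValueError("q = 1 is the Shannon limit and is not handled here")
    s = sum(pi ** q for pi in p if pi > 0)
    return math.log(s) / (1 - q)


p = [0.5, 0.3, 0.2]  # hypothetical proportions summing to 1

# The logarithm of the Hill number matches the Renyi entropy of the same order.
print(math.log(hill_number(p, 2)), renyi_entropy(p, 2))

# For a large order q, 1 / D approaches the Berger-Parker index max(p).
print(1 / hill_number(p, 200), max(p))
```

For a uniform distribution over K categories the Hill number equals K for every order q, which is one way to read it as an effective number of categories.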

Evaluation of indices

Different indices give different values of variation, and may be used for different purposes: several are used and critiqued in the sociology literature especially.

If one wishes to simply make ordinal comparisons between samples (is one sample more or less varied than another), the choice of IQV is relatively less important, as they will often give the same ordering.
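As a small illustration of this point, the sketch below ranks three hypothetical samples by ModVR and by the IQV (M2) formula given earlier. The sample counts are invented for the example, and agreement here does not imply that the two indices always agree.

```python
def mod_vr(frequencies):
    """ModVR = 1 - (K*f_m - N) / (N*(K - 1))."""
    k, n = len(frequencies), sum(frequencies)
    return 1 - (k * max(frequencies) - n) / (n * (k - 1))


def iqv(frequencies):
    """IQV (M2) = K / (K - 1) * (1 - sum(p_i^2))."""
    k, n = len(frequencies), sum(frequencies)
    return k / (k - 1) * (1 - sum((f / n) ** 2 for f in frequencies))


# Hypothetical counts over the same three categories, N = 10 in each sample.
samples = {"A": [9, 1, 0], "B": [6, 3, 1], "C": [4, 3, 3]}

by_mod_vr = sorted(samples, key=lambda name: mod_vr(samples[name]))
by_iqv = sorted(samples, key=lambda name: iqv(samples[name]))

print(by_mod_vr)  # ['A', 'B', 'C']
print(by_iqv)     # ['A', 'B', 'C']
```

The numerical values differ (for sample B, ModVR is 0.6 while IQV is about 0.81), but the ordering from least to most varied is the same for these three samples.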

In some cases it is useful to not standardize an index to run from 0 to 1, regardless of number of categories or samples (Wilcox 1973, p. 338), but one generally so standardizes it.

Notes

^ This can only happen if the number of cases is a multiple of the number of categories.

^ Freeman LC (1965) Elementary applied statistics. New York: John Wiley and Sons pp 40-43

^ IQV at xycoon

^ Hunter PR, Gaston MA (1988) Numerical index of the discriminatory ability of typing systems: an application of Simpson's index of diversity. J Clin Microbiol 26(11): 2465-2466

^ Gini CW (1912) Variability and mutability, contribution to the study of statistical distributions and relations. Studi Economico-Giuridici della R. Università di Cagliari

^ Simpson EH (1949) Measurement of diversity. Nature 163:688

^ Bachi R (1956) A statistical analysis of the revival of Hebrew in Israel. In: Bachi R (ed) Scripta Hierosolymitana, Vol III, Jerusalem: Magnes Press pp 179-247

^ Mueller JH, Schuessler KF (1961) Statistical reasoning in sociology. Boston: Houghton Mifflin

^ Gibbs JP, Martin, WT (1962) Urbanization, technology and division of labor: International patterns. American Sociological Review 27: 667-677

^ Lieberson S (1969) Measuring population diversity. American Sociological Review 34(6) 850-862

^ Blau P (1977) Inequality and Heterogeneity. Free Press, New York

^ Perry M, Kader G (2005) Variation as unalikeability. Teaching Statistics 27 (2) 58-60

^ Greenberg JH (1956) The measurement of linguistic diversity. Language 32: 109-115

^ Hill MO (1973) Ecology 54 (2) 427-432

References

Gibbs, Jack P.; Poston, Jr., Dudley L. (1975), "The Division of Labor: Conceptualization and Related Measures", Social Forces, 53 (3): 468–476, doi:10.2307/2576589, JSTOR 2576589

Lieberson, Stanley (1969), "Measuring Population Diversity", American Sociological Review, 34 (6): 850–862, doi:10.2307/2095977, JSTOR 2095977

Swanson, David A. (1976), "A Sampling Distribution and Significance Test for Differences in Qualitative Variation", Social Forces, 55 (1): 182–184, doi:10.2307/2577102, JSTOR 2577102

Wilcox, Allen R. (1967), Indices of qualitative variation (PDF)

Wilcox, Allen R. (1973), "Indices of Qualitative Variation and Political Measurement", The Western Political Quarterly, 26 (2): 325–343, doi:10.2307/446831, JSTOR 446831

Retrieved from "https://en.wikipedia.org/w/index.php?title=Qualitative_variation&oldid=557326386"