Generalized chi-squared distribution
Probability density function (figure: generalized chi-square probability density function)
Cumulative distribution function (figure: generalized chi-square cumulative distribution function)
Notation: $\tilde{\chi}(\boldsymbol{w}, \boldsymbol{k}, \boldsymbol{\lambda}, s, m)$
Parameters: $\boldsymbol{w}$, vector of weights of noncentral chi-square components
$\boldsymbol{k}$, vector of degrees of freedom of noncentral chi-square components
$\boldsymbol{\lambda}$, vector of non-centrality parameters of chi-square components
$s$, scale of normal term
$m$, offset
Support: $x \in [m, +\infty)$ if $w_i \ge 0,\ s = 0$; $x \in (-\infty, m]$ if $w_i \le 0,\ s = 0$; $x \in \mathbb{R}$ otherwise
Mean: $\sum_j w_j (k_j + \lambda_j) + m$
Variance: $2 \sum_j w_j^2 (k_j + 2 \lambda_j) + s^2$
CF: $\frac{\exp\left[ i t \left( m + \sum_j \frac{w_j \lambda_j}{1 - 2 i w_j t} \right) - \frac{s^2 t^2}{2} \right]}{\prod_j \left( 1 - 2 i w_j t \right)^{k_j/2}}$

In probability theory and statistics, the generalized chi-squared distribution (or generalized chi-square distribution) is the distribution of a quadratic form of a multinormal variable (normal vector), or a linear combination of different normal variables and squares of normal variables. Equivalently, it is also a linear sum of independent noncentral chi-square variables and a normal variable. There are several other such generalizations for which the same term is sometimes used; some of them are special cases of the family discussed here, for example the gamma distribution.

Definition

The generalized chi-squared variable may be described in multiple ways. One is to write it as a weighted sum of independent noncentral chi-square variables ${\chi'}^2$ and a standard normal variable $z$:[1][2]

$\tilde{\chi}(\boldsymbol{w}, \boldsymbol{k}, \boldsymbol{\lambda}, s, m) = \sum_i w_i\, {\chi'}^2(k_i, \lambda_i) + s z + m.$

Here the parameters are the weights $w_i$, the degrees of freedom $k_i$ and non-centralities $\lambda_i$ of the constituent non-central chi-squares, and the coefficients $s$ and $m$ of the normal. Some important special cases of this have all weights $w_i$ of the same sign, or have central chi-squared components, or omit the normal term.
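For illustration, the following minimal Python/NumPy sketch (with arbitrary, illustrative parameter values not taken from the article) draws samples directly from this definition and checks the sample mean and variance against the moment formulas in the infobox:

```python
import numpy as np

rng = np.random.default_rng(0)

# illustrative parameters (w, k, lam, s, m); any values of this shape would do
w = np.array([1.0, -0.5, 2.0])     # weights of the noncentral chi-square components
k = np.array([1.0, 2.0, 3.0])      # degrees of freedom of each component
lam = np.array([0.3, 0.2, 1.5])    # non-centrality parameters of each component
s, m = 0.8, -2.0                   # scale of the normal term and offset

n = 1_000_000
# each column: a noncentral chi-square draw with k_i degrees of freedom and non-centrality lam_i
comps = rng.noncentral_chisquare(k, lam, size=(n, len(w)))
samples = comps @ w + s * rng.standard_normal(n) + m

print(samples.mean(), np.sum(w * (k + lam)) + m)               # compare with the mean formula
print(samples.var(), 2 * np.sum(w**2 * (k + 2 * lam)) + s**2)  # compare with the variance formula
```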

Since a non-central chi-squared variable is a sum of squares of normal variables with different means, the generalized chi-square variable is also defined as a sum of squares of independent normal variables, plus an independent normal variable: that is, a quadratic in normal variables.

Another equivalent way is to formulate it as a quadratic form of a normal vector $\boldsymbol{x}$:[3][4]

$\tilde{\chi} = q(\boldsymbol{x}) = \boldsymbol{x}' \mathbf{Q_2} \boldsymbol{x} + \boldsymbol{q_1}' \boldsymbol{x} + q_0.$

Here $\mathbf{Q_2}$ is a matrix, $\boldsymbol{q_1}$ is a vector, and $q_0$ is a scalar. These, together with the mean $\boldsymbol{\mu}$ and covariance matrix $\mathbf{\Sigma}$ of the normal vector $\boldsymbol{x}$, parameterize the distribution. The parameters of the former expression (in terms of non-central chi-squares, a normal and a constant) can be calculated in terms of the parameters of the latter expression (quadratic form of a normal vector).[4] If (and only if) $\mathbf{Q_2}$ in this formulation is positive-definite, then all the $w_i$ in the first formulation will have the same sign.
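One standard way to carry out this conversion is to whiten $\boldsymbol{x}$ and complete the square. The following NumPy sketch only illustrates that idea; it assumes $\mathbf{\Sigma}$ is positive-definite, groups nearly equal eigenvalues by rounding, and uses illustrative names, and it is not the implementation of the cited reference:

```python
import numpy as np

def quad_to_gx2_params(mu, Sigma, Q2, q1, q0, tol=1e-12):
    """Hypothetical helper: convert q(x) = x'Q2 x + q1'x + q0 with x ~ N(mu, Sigma)
    into generalized chi-square parameters (w, k, lam, s, m)."""
    Q2 = (Q2 + Q2.T) / 2                      # symmetrize the quadratic part
    S = np.linalg.cholesky(Sigma)             # Sigma = S S' (assumes positive-definite Sigma)
    A = S.T @ Q2 @ S                          # quadratic part after substituting x = mu + S y
    d, R = np.linalg.eigh(A)                  # A = R diag(d) R'
    b = R.T @ (S.T @ (2 * Q2 @ mu + q1))      # linear part in the rotated coordinates
    c = mu @ Q2 @ mu + q1 @ mu + q0           # constant part

    nz = np.abs(d) > tol                      # components with a nonzero eigenvalue
    shift = b[nz] / (2 * d[nz])               # complete the square: d u^2 + b u = d (u + shift)^2 - d shift^2
    w, idx = np.unique(np.round(d[nz], 12), return_inverse=True)
    k = np.bincount(idx)                      # degrees of freedom = eigenvalue multiplicities
    lam = np.bincount(idx, weights=shift**2)  # non-centralities of the grouped components
    s = np.linalg.norm(b[~nz])                # leftover linear terms combine into one normal term
    m = c - np.sum(d[nz] * shift**2)          # offset
    return w, k, lam, s, m
```

As a sanity check, the mean $\sum_j w_j (k_j + \lambda_j) + m$ computed from the returned parameters should equal $\operatorname{tr}(\mathbf{Q_2}\mathbf{\Sigma}) + \boldsymbol{\mu}'\mathbf{Q_2}\boldsymbol{\mu} + \boldsymbol{q_1}'\boldsymbol{\mu} + q_0$.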

For the most general case, a reduction towards a common standard form can be made by using a representation of the following form:[5]

$X = (z + a)^\mathrm{T} A (z + a) + c^\mathrm{T} z = (x + b)^\mathrm{T} D (x + b) + d^\mathrm{T} x + e,$

where D is a diagonal matrix and where x represents a vector of uncorrelated standard normal random variables.

Computing the pdf/cdf/inverse cdf/random numbers

The probability density, cumulative distribution, and inverse cumulative distribution functions of a generalized chi-squared variable do not have simple closed-form expressions. However, numerical algorithms [5][2][6][4] and computer code (Fortran and C, Matlab, R, Python, Julia) have been published to evaluate some of these, and to generate random samples.
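The cited algorithms of Imhof and Davies are based on numerically inverting the characteristic function. Purely to illustrate the idea, the following rough Python/SciPy sketch evaluates the cdf by Gil-Pelaez inversion of the characteristic function given in the infobox; it is not one of the published implementations, and the fixed integration limit is a crude simplification:

```python
import numpy as np
from scipy.integrate import quad

def gx2_cf(t, w, k, lam, s, m):
    # characteristic function of the generalized chi-square (see infobox), scalar t
    num = np.exp(1j * t * (m + np.sum(w * lam / (1 - 2j * w * t))) - s**2 * t**2 / 2)
    den = np.prod((1 - 2j * w * t) ** (k / 2))
    return num / den

def gx2_cdf(x, w, k, lam, s, m, upper=200.0):
    # Gil-Pelaez: P(X <= x) = 1/2 - (1/pi) * integral_0^inf Im[cf(t) exp(-i t x)] / t dt
    integrand = lambda t: (gx2_cf(t, w, k, lam, s, m) * np.exp(-1j * t * x)).imag / t
    val, _ = quad(integrand, 1e-9, upper, limit=500)
    return 0.5 - val / np.pi

# example call with illustrative parameters:
# gx2_cdf(0.0, np.array([1.0, -1.0]), np.array([1.0, 2.0]), np.array([0.5, 0.2]), 0.5, 0.0)
```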

Applications

The generalized chi-squared is the distribution of statistical estimates in cases where the usual statistical theory does not hold, as in the examples below.

In model fitting and selection

If a predictive model is fitted by least squares, but the residuals have either autocorrelation or heteroscedasticity, then alternative models can be compared (in model selection) by relating changes in the sum of squares to an asymptotically valid generalized chi-squared distribution.[3]

Classifying normal vectors using Gaussian discriminant analysis

If $\boldsymbol{x}$ is a normal vector, its log likelihood is a quadratic form of $\boldsymbol{x}$, and is hence distributed as a generalized chi-squared. The log likelihood ratio that $\boldsymbol{x}$ arises from one normal distribution versus another is also a quadratic form, so distributed as a generalized chi-squared.[4]

In Gaussian discriminant analysis, samples from multinormal distributions are optimally separated by using a quadratic classifier, a boundary that is a quadratic function (e.g. the curve defined by setting the likelihood ratio between two Gaussians to 1). The classification error rates of different types (false positives and false negatives) are integrals of the normal distributions within the quadratic regions defined by this classifier. Since this is mathematically equivalent to integrating a quadratic form of a normal vector, the result is an integral of a generalized-chi-squared variable.[4]
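As a concrete illustration, such an error rate can be estimated by Monte Carlo sampling (the means and covariances below are made up, and this is not the integration method of the cited reference); the decision statistic is a quadratic form of a normal vector, and is therefore a generalized chi-square variable under each class:

```python
import numpy as np

rng = np.random.default_rng(0)

# two illustrative 2-D Gaussian classes
mu_a, Sigma_a = np.array([0.0, 0.0]), np.eye(2)
mu_b, Sigma_b = np.array([2.0, 1.0]), np.array([[2.0, 0.5], [0.5, 1.0]])

def log_lik(X, mu, Sigma):
    # log density of N(mu, Sigma) evaluated at each row of X
    d = X - mu
    quad = np.einsum('ni,ij,nj->n', d, np.linalg.inv(Sigma), d)
    return -0.5 * quad - 0.5 * np.log(np.linalg.det(2 * np.pi * Sigma))

# samples from class a; the log likelihood ratio (a quadratic form of X) decides the class
X = rng.multivariate_normal(mu_a, Sigma_a, size=200_000)
llr = log_lik(X, mu_a, Sigma_a) - log_lik(X, mu_b, Sigma_b)
print(np.mean(llr < 0))   # estimated rate of misclassifying class-a samples as class b
```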

In signal processing

The following application arises in the context of Fourier analysis in signal processing, renewal theory in probability theory, and multi-antenna systems in wireless communication. The common factor of these areas is that the sum of exponentially distributed variables is of importance (or identically, the sum of squared magnitudes of circularly-symmetric centered complex Gaussian variables).

If $Z_i$ are $k$ independent, circularly-symmetric centered complex Gaussian random variables with mean 0 and variance $\sigma_i^2$, then the random variable

$\tilde{Q} = \sum_{i=1}^k |Z_i|^2$

has a generalized chi-squared distribution of a particular form. The difference from the standard chi-squared distribution is that the $Z_i$ are complex and can have different variances, and the difference from the more general generalized chi-squared distribution is that the relevant scaling matrix A is diagonal. If $\sigma_i^2 = \sigma^2$ for all $i$, then $\tilde{Q}$, scaled down by $\sigma^2/2$ (i.e. multiplied by $2/\sigma^2$), has a chi-squared distribution, $\chi^2(2k)$, also known as an Erlang distribution. If the $\sigma_i^2$ have distinct values for all $i$, then $\tilde{Q}$ has the pdf[7]

$f(x; k, \sigma_1^2, \ldots, \sigma_k^2) = \sum_{i=1}^{k} \frac{e^{-\frac{x}{\sigma_i^2}}}{\sigma_i^2 \prod_{j=1,\, j \ne i}^{k} \left(1 - \frac{\sigma_j^2}{\sigma_i^2}\right)} \quad \text{for } x \ge 0.$
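As a check of the distinct-variance case, the following Monte Carlo sketch (with illustrative variances) uses the fact that $|Z_i|^2$ is exponentially distributed with mean $\sigma_i^2$, and compares samples of $\tilde{Q}$ with the pdf above:

```python
import numpy as np

rng = np.random.default_rng(1)
sig2 = np.array([0.5, 1.0, 2.0])   # distinct variances sigma_i^2 (illustrative)

# |Z_i|^2 for a circularly-symmetric complex Gaussian with variance sigma_i^2
# is exponential with mean sigma_i^2, so tilde-Q is a sum of such exponentials
Q = rng.exponential(scale=sig2, size=(500_000, len(sig2))).sum(axis=1)

def pdf_distinct(x, sig2):
    # the pdf above for pairwise-distinct variances
    terms = [np.exp(-x / s2) / (s2 * np.prod([1 - t2 / s2 for j, t2 in enumerate(sig2) if j != i]))
             for i, s2 in enumerate(sig2)]
    return np.sum(terms, axis=0)

print(Q.mean(), sig2.sum())   # mean check: E[tilde-Q] = sum_i sigma_i^2
# a normalized histogram of Q should track pdf_distinct(np.linspace(0, 15, 400), sig2)
```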

If there are sets of repeated variances among $\sigma_i^2$, assume that they are divided into $M$ sets, each representing a certain variance value. Denote $\boldsymbol{r} = (r_1, r_2, \ldots, r_M)$ to be the number of repetitions in each group; that is, the $m$th set contains $r_m$ variables that have variance $\sigma_m^2$. Then $\tilde{Q}$ represents an arbitrary linear combination of independent $\chi^2$-distributed random variables with different degrees of freedom:

$\tilde{Q} = \sum_{m=1}^{M} \frac{\sigma_m^2}{2} Q_m, \quad Q_m \sim \chi^2(2 r_m).$

The pdf of $\tilde{Q}$ is[8]

$f(x; \boldsymbol{r}, \sigma_1^2, \ldots, \sigma_M^2) = \prod_{m=1}^{M} \frac{1}{\sigma_m^{2 r_m}} \sum_{k=1}^{M} \sum_{l=1}^{r_k} \frac{\Psi_{k,l,\boldsymbol{r}}}{(r_k - l)!} (-x)^{r_k - l} e^{-\frac{x}{\sigma_k^2}}, \quad x \ge 0,$

where

$\Psi_{k,l,\boldsymbol{r}} = (-1)^{r_k - 1} \sum_{\boldsymbol{i} \in \Omega_{k,l}} \prod_{j \ne k} \binom{i_j + r_j - 1}{i_j} \left(\frac{1}{\sigma_j^2} - \frac{1}{\sigma_k^2}\right)^{-(r_j + i_j)},$

with $\boldsymbol{i} = [i_1, \ldots, i_M]^T$ from the set $\Omega_{k,l}$ of all partitions of $l - 1$ (with $i_k = 0$) defined as

$\Omega_{k,l} = \Big\{ [i_1, \ldots, i_M] \in \mathbb{Z}^M : \sum_{j=1}^{M} i_j = l - 1,\ i_k = 0,\ i_j \ge 0 \text{ for all } j \Big\}.$
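A corresponding sampling sketch for this repeated-variance case (with illustrative values of $\sigma_m^2$ and $r_m$); a normalized histogram of the draws can be compared against the pdf above:

```python
import numpy as np

rng = np.random.default_rng(2)
sig2_m = np.array([0.5, 2.0])    # distinct variance values sigma_m^2 (illustrative)
r = np.array([2, 3])             # number of repetitions r_m of each variance

n = 500_000
Qm = rng.chisquare(2 * r, size=(n, len(r)))   # Q_m ~ chi-square with 2 r_m degrees of freedom
Q = Qm @ (sig2_m / 2)                         # tilde-Q = sum_m (sigma_m^2 / 2) Q_m

print(Q.mean(), np.sum(r * sig2_m))           # mean check: E[tilde-Q] = sum_m r_m sigma_m^2
```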

See also

* Noncentral chi-squared distribution
* Chi-squared distribution

References

1. Davies, R.B. (1973). "Numerical inversion of a characteristic function". Biometrika, 60 (2), 415–417.
2. Davies, R.B. (1980). "Algorithm AS155: The distribution of a linear combination of χ² random variables". Applied Statistics, 29, 323–333.
3. Jones, D.A. (1983). "Statistical analysis of empirical models fitted by optimisation". Biometrika, 70 (1), 67–88.
4. Das, Abhranil; Geisler, Wilson S. (2020). "Methods to integrate multinormals and compute classification measures". arXiv:2012.14331 [stat.ML].
5. Sheil, J.; O'Muircheartaigh, I. (1977). "Algorithm AS106: The distribution of non-negative quadratic forms in normal variables". Applied Statistics, 26, 92–98.
6. Imhof, J. P. (1961). "Computing the Distribution of Quadratic Forms in Normal Variables". Biometrika, 48 (3/4), 419–426. doi:10.2307/2332763. JSTOR 2332763.
7. Hammarwall, D.; Bengtsson, M.; Ottersten, B. (2008). "Acquiring Partial CSI for Spatially Selective Transmission by Instantaneous Channel Norm Feedback". IEEE Transactions on Signal Processing, 56, 1188–1204.
8. Björnson, E.; Hammarwall, D.; Ottersten, B. (2009). "Exploiting Quantized Channel Norm Feedback through Conditional Statistics in Arbitrarily Correlated MIMO Systems". IEEE Transactions on Signal Processing, 57, 4027–4041.
External links

* Davies, R.B.: Fortran and C source code for "Linear combination of chi-squared random variables": http://www.statsresearch.co.nz/robert/QF.htm
* Das, A.: MATLAB code to compute the statistics, pdf, cdf, inverse cdf and random numbers of the generalized chi-square distribution: https://www.mathworks.com/matlabcentral/fileexchange/85028-generalized-chi-square-distribution

