Data reduction






From Wikipedia, the free encyclopedia
 


Data reduction is the transformation of numerical or alphabetical digital information derived empirically or experimentally into a corrected, ordered, and simplified form. Its purpose can be two-fold: to reduce the number of data records by eliminating invalid data, or to produce summary data and statistics at different aggregation levels for various applications.[1] Data reduction does not necessarily mean loss of information. For example, the body mass index reduces two dimensions (height and body mass) into a single measure without losing the information needed for its intended use.
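The two purposes named above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the record layout, the field names (`segment`, `travel_time_s`), and the validity rule (a reading must be present and positive) are hypothetical and are not taken from the cited handbook.

```python
from collections import defaultdict
from statistics import mean


def drop_invalid(records):
    """Keep only records whose reading is present and positive (assumed rule)."""
    return [
        r for r in records
        if r["travel_time_s"] is not None and r["travel_time_s"] > 0
    ]


def summarize(records, key):
    """Aggregate readings by the given key into a count and a mean."""
    groups = defaultdict(list)
    for r in records:
        groups[r[key]].append(r["travel_time_s"])
    return {k: {"count": len(v), "mean": mean(v)} for k, v in groups.items()}


# Hypothetical raw records: two of the six are invalid.
raw = [
    {"segment": "A", "travel_time_s": 60},
    {"segment": "A", "travel_time_s": 80},
    {"segment": "A", "travel_time_s": None},
    {"segment": "B", "travel_time_s": 100},
    {"segment": "B", "travel_time_s": -5},
    {"segment": "B", "travel_time_s": 120},
]

valid = drop_invalid(raw)              # fewer records
summary = summarize(valid, "segment")  # summary statistics per segment
```

Passing a different `key` (for example an hour-of-day field, if the records had one) would give the same summary at another aggregation level.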

When information is derived from instrument readings, there may also be a transformation from analog to digital form. When the data are already in digital form, the "reduction" of the data typically involves some editing, scaling, encoding, sorting, collating, and producing tabular summaries. When the observations are discrete but the underlying phenomenon is continuous, smoothing and interpolation are often needed. Data reduction is often undertaken in the presence of reading or measurement errors, and some idea of the nature of these errors is needed before the most likely value can be determined.
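As a rough sketch of the smoothing and interpolation mentioned above, the following Python functions apply linear interpolation and a centered moving average to discrete samples. Both techniques are chosen here for simplicity and are assumptions of this example; the text does not prescribe any particular method, and the function names are illustrative.

```python
def linear_interpolate(xs, ys, x):
    """Estimate y at x from samples (xs, ys); xs must be strictly increasing."""
    if not xs[0] <= x <= xs[-1]:
        raise ValueError("x lies outside the sampled range")
    for i in range(1, len(xs)):
        if x <= xs[i]:
            x0, x1 = xs[i - 1], xs[i]
            y0, y1 = ys[i - 1], ys[i]
            return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def moving_average(ys, window):
    """Centered moving average; the window shrinks near the two ends."""
    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        lo = max(0, i - half)
        hi = min(len(ys), i + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    return smoothed


estimate = linear_interpolate([0, 10], [0, 100], 2.5)  # 25.0
smooth = moving_average([1, 2, 3, 4, 5], 3)            # [1.5, 2.0, 3.0, 4.0, 4.5]
```

A moving average treats measurement errors as roughly zero-mean noise; if the errors have a different character (for example occasional large outliers), a median filter or a fitted model may be a better choice.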

An example in astronomy is the data reduction in the Kepler satellite. This satellite records 95-megapixel images once every six seconds, generating dozens of megabytes of data per second, which is orders of magnitude more than the downlink bandwidth of 550 kB/s. The on-board data reduction encompasses co-adding the raw frames for thirty minutes, reducing the bandwidth by a factor of 300. Furthermore, interesting targets are pre-selected and only the relevant pixels are processed, which amount to 6% of the total. This reduced data is then sent to Earth, where it is processed further.
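The factor of 300 follows from the figures above: thirty minutes of frames taken every six seconds is 1800 / 6 = 300 frames summed into one. The Python sketch below illustrates co-adding and pixel selection on toy data. It is not the satellite's actual software; the function names and the flat list-of-pixels frame layout are assumptions made for the example.

```python
FRAME_INTERVAL_S = 6         # one raw frame every six seconds (from the text)
COADD_WINDOW_S = 30 * 60     # frames are co-added for thirty minutes
FRAMES_PER_SUM = COADD_WINDOW_S // FRAME_INTERVAL_S  # 300 raw frames per sum


def co_add(frames):
    """Sum a sequence of equally sized frames pixel by pixel."""
    return [sum(pixel_values) for pixel_values in zip(*frames)]


def select_pixels(frame, target_indices):
    """Keep only the pixels at the pre-selected indices."""
    return {i: frame[i] for i in target_indices}


# Toy data: three 4-pixel frames, with pixels 1 and 3 marked as targets.
frames = [[1, 2, 3, 4], [1, 2, 3, 4], [2, 2, 2, 2]]
summed = co_add(frames)                # [4, 6, 8, 10]
reduced = select_pixels(summed, [1, 3])  # {1: 6, 3: 10}
```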

Research has also been carried out on the use of data reduction in wearable (wireless) devices for health monitoring and diagnosis applications. For example, in the context of epilepsy diagnosis, data reduction has been used to increase the battery lifetime of a wearable EEG device by selecting and transmitting only the EEG data that is relevant for diagnosis and discarding background activity.[2]

Types of Data Reduction

Dimensionality Reduction

When dimensionality increases, data becomes increasingly sparse, while density and distance between points, which are critical to clustering and outlier analysis, become less meaningful. Dimensionality reduction helps reduce noise in the data and allows for easier visualization, such as the example below where 3-dimensional data is transformed into 2 dimensions to show hidden parts. One method of dimensionality reduction is the wavelet transform, in which data is transformed to preserve relative distance between objects at different levels of resolution; it is often used for image compression.[3]

An example of dimensionality reduction.
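To make the wavelet idea concrete, here is a minimal Python sketch using the Haar wavelet, the simplest wavelet, in an unnormalized averaging form. It decomposes a signal into one overall average plus detail coefficients at successively finer resolutions, keeps only the largest coefficients, and reconstructs an approximation. The choice of Haar, the function names, and the "keep the k largest" rule are assumptions of this example, not something the article specifies.

```python
def haar_forward(signal):
    """Full Haar decomposition; the length must be a power of two."""
    n = len(signal)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    approx = [float(v) for v in signal]
    details = []  # coarser levels are placed before finer ones
    while len(approx) > 1:
        pairs = list(zip(approx[0::2], approx[1::2]))
        details = [(a - b) / 2 for a, b in pairs] + details
        approx = [(a + b) / 2 for a, b in pairs]
    return approx + details


def haar_inverse(coeffs):
    """Rebuild the signal from coefficients produced by haar_forward."""
    approx = coeffs[:1]
    pos = 1
    while pos < len(coeffs):
        detail = coeffs[pos:pos + len(approx)]
        pos += len(approx)
        approx = [v for a, d in zip(approx, detail) for v in (a + d, a - d)]
    return approx


def keep_largest(coeffs, k):
    """Zero every coefficient except the k largest in magnitude."""
    order = sorted(range(len(coeffs)), key=lambda i: abs(coeffs[i]), reverse=True)
    keep = set(order[:k])
    return [c if i in keep else 0.0 for i, c in enumerate(coeffs)]


coeffs = haar_forward([8, 6, 2, 4])                # [5.0, 2.0, 1.0, -1.0]
approximation = haar_inverse(keep_largest(coeffs, 2))  # [7.0, 7.0, 3.0, 3.0]
```

Storing two coefficients instead of four values halves the data while preserving the coarse shape of the signal. Practical implementations usually normalize the coefficients so that magnitudes are comparable across levels; that step is omitted here for brevity.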

Numerosity Reduction

This method of data reduction reduces the data volume by choosing alternative, smaller forms of data representation. Numerosity reduction can be split into two groups: parametric and non-parametric methods. Parametric methods (regression, for example) assume the data fits some model, estimate the model parameters, store only the parameters, and discard the data. One example of this is in the image below, where the volume of data to be processed is reduced based on more specific criteria. Another example is a log-linear model, which obtains the value at a point in m-dimensional space as a product over appropriate marginal subspaces. Non-parametric methods do not assume a model; examples include histograms, clustering, and sampling.[4]

An example of data reduction via numerosity reduction
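The contrast between the two groups can be sketched in Python. The first function is parametric: it fits a straight line by ordinary least squares and keeps only the slope and intercept. The second is non-parametric: it replaces the values with equal-width histogram bin counts. The function names and the sample data are illustrative assumptions.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def equal_width_histogram(values, bins):
    """Return (lowest value, bin width, counts per bin)."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values match
    counts = [0] * bins
    for v in values:
        index = min(int((v - lo) / width), bins - 1)  # max value goes in last bin
        counts[index] += 1
    return lo, width, counts


# Parametric: four points are replaced by two stored parameters.
slope, intercept = fit_line([0, 1, 2, 3], [1, 3, 5, 7])  # (2.0, 1.0)

# Non-parametric: seven values are replaced by three bin counts.
lo, width, counts = equal_width_histogram([1, 2, 2, 3, 8, 9, 10], 3)  # counts [4, 0, 3]
```

The parametric form is far smaller but is only as good as the assumed model; the histogram makes no such assumption but loses the position of each value within its bin.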

Statistical modelling

Data reduction can be obtained by assuming a statistical model for the data. Classical principles of data reduction include sufficiency, likelihood, conditionality and equivariance.[5]
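Sufficiency can be illustrated with a small Python sketch. Assuming independent Bernoulli trials with success probability p (a model chosen here purely for illustration), the pair (number of trials, number of successes) is a sufficient statistic: the likelihood of any sample depends on the data only through that pair, so the individual observations can be discarded without affecting inference about p.

```python
def likelihood_from_sample(sample, p):
    """Likelihood of a sequence of 0/1 outcomes under Bernoulli(p)."""
    value = 1.0
    for outcome in sample:
        value *= p if outcome == 1 else (1 - p)
    return value


def sufficient_statistic(sample):
    """Reduce the sample to (number of trials, number of successes)."""
    return len(sample), sum(sample)


def likelihood_from_statistic(n, successes, p):
    """The same likelihood, computed from the reduced data only."""
    return p ** successes * (1 - p) ** (n - successes)


first = [1, 0, 1, 1, 0]
second = [0, 1, 1, 0, 1]  # different order, same statistic (5, 3)

stat_first = sufficient_statistic(first)
stat_second = sufficient_statistic(second)
gap = abs(likelihood_from_sample(first, 0.3)
          - likelihood_from_statistic(*stat_second, 0.3))
```

Here `gap` is zero up to floating-point rounding, showing that two numbers carry everything the five observations say about p under this model.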


References

  1. "Travel Time Data Collection Handbook" (PDF). Retrieved 6 December 2020.
  2. Iranmanesh, S.; Rodriguez-Villegas, E. (2017). "A 950 nW Analog-Based Data Reduction Chip for Wearable EEG Systems in Epilepsy". IEEE Journal of Solid-State Circuits. 52 (9): 2362–2373. doi:10.1109/JSSC.2017.2720636. hdl:10044/1/48764. S2CID 24852887.
  3. Han, J.; Kamber, M.; Pei, J. (2011). "Data Mining: Concepts and Techniques (3rd ed.)" (PDF). Retrieved 6 December 2020.
  4. Han, J.; Kamber, M.; Pei, J. (2011). "Data Mining: Concepts and Techniques (3rd ed.)" (PDF). Retrieved 6 December 2020.
  5. Casella, George (2002). Statistical inference. Roger L. Berger. Australia: Thomson Learning. pp. 271–309. ISBN 0-534-24312-6. OCLC 46538638.


    Retrieved from "https://en.wikipedia.org/w/index.php?title=Data_reduction&oldid=1212744290"

    Category: 
    Exploratory data analysis
     



    This page was last edited on 9 March 2024, at 11:09 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


