Saliency map








 

From Wikipedia, the free encyclopedia
 


A view of the fort of Marburg (Germany) and the saliency map of the image using color, intensity and orientation.

In computer vision, a saliency map is an image that highlights either the region on which people's eyes focus first or the most relevant regions for machine learning models.[1] The goal of a saliency map is to reflect the degree of importance of a pixel to the human visual system or to an otherwise opaque ML model.

For example, in this image, a person first looks at the fort and light clouds, so they should be highlighted on the saliency map. Saliency maps engineered in artificial or computer vision are typically not the same as the actual saliency map constructed by biological or natural vision.

Application

Overview

Saliency maps have applications in a variety of different problems. Some general applications:

Human eye

Explainable Artificial Intelligence

Saliency as a segmentation problem

Saliency estimation may be viewed as an instance of image segmentation. In computer vision, image segmentation is the process of partitioning a digital image into multiple segments (sets of pixels, also known as superpixels). The goal of segmentation is to simplify and/or change the representation of an image into something that is more meaningful and easier to analyze. Image segmentation is typically used to locate objects and boundaries (lines, curves, etc.) in images. More precisely, image segmentation is the process of assigning a label to every pixel in an image such that pixels with the same label share certain characteristics.[7]
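The segmentation view can be illustrated with a minimal sketch: thresholding a saliency map assigns each pixel one of two labels, salient or background. The function name `segment_by_saliency`, the toy values and the threshold of 0.5 are assumptions chosen for illustration; practical methods generally use more elaborate labelling than a fixed threshold.

```python
def segment_by_saliency(saliency, threshold):
    """Label each pixel 1 (salient) or 0 (background) by thresholding."""
    return [[1 if value >= threshold else 0 for value in row]
            for row in saliency]


# Toy 3x4 saliency map with values in [0, 1] (made-up numbers).
saliency = [
    [0.1, 0.2, 0.1, 0.0],
    [0.2, 0.9, 0.8, 0.1],
    [0.1, 0.7, 0.6, 0.0],
]

labels = segment_by_saliency(saliency, threshold=0.5)
for row in labels:
    print(row)
```

Pixels that receive the same label here share one characteristic, namely that their saliency lies on the same side of the threshold.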

Algorithms

Overview

There are three forms of classic saliency estimation algorithms implemented in OpenCV: static saliency, motion saliency, and objectness.

In addition to classic approaches, neural-network-based approaches are also popular, including neural networks for motion saliency estimation.

Example implementation

First, we calculate the distance of each pixel to all the other pixels in the same frame:

$$\mathrm{SALS}(I_k) = \sum_{i=1}^{N} |I_k - I_i|$$

where $I_i$ is the value of pixel $i$, in the range [0, 255]. The following equation is the expanded form of this equation:

$$\mathrm{SALS}(I_k) = |I_k - I_1| + |I_k - I_2| + \cdots + |I_k - I_N|$$

where $N$ is the total number of pixels in the current frame. We can then restructure the formula by grouping together the terms that have the same value of $I$:

$$\mathrm{SALS}(I_k) = \sum_{n=0}^{255} F_n \times |I_k - I_n|$$

where $F_n$ is the frequency of $I_n$, and $n$ ranges over [0, 255], so that $I_n$ denotes gray level $n$. The frequencies are expressed in the form of a histogram, and computing the histogram takes $O(N)$ time.
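The regrouping of terms by pixel value can be checked with a short sketch. It assumes a grayscale frame flattened into a list of integers in [0, 255]; the function names and the toy frame are illustrative and do not come from any particular library.

```python
from collections import Counter


def saliency_naive(pixels):
    """Sum of |I_k - I_i| over all pixels i, computed directly for each k."""
    return [sum(abs(p - q) for q in pixels) for p in pixels]


def saliency_histogram(pixels, levels=256):
    """Same quantity, computed from the histogram of pixel values."""
    freq = Counter(pixels)  # F_n for every value n that occurs, one pass
    # Saliency of each possible gray level: sum over n of F_n * |level - n|.
    table = [sum(count * abs(level - n) for n, count in freq.items())
             for level in range(levels)]
    # Every pixel looks up the saliency of its own gray level.
    return [table[p] for p in pixels]


frame = [0, 0, 10, 10, 10, 255]  # flattened toy frame (made-up values)
assert saliency_naive(frame) == saliency_histogram(frame)
print(saliency_histogram(frame))
```

The direct version compares every pixel with every other pixel, while the histogram version touches each pixel only when counting and when looking up its result.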

Time complexity

This saliency map algorithm has $O(N)$ time complexity. Computing the histogram takes $O(N)$ time, where $N$ is the number of pixels in a frame. In addition, the subtraction and multiplication in the equation require 256 operations. Consequently, the time complexity of this algorithm is $O(N + 256)$, which equals $O(N)$.
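As a rough worked count, take a 1280×720 frame (an arbitrary size chosen for illustration). If the 256-term sum is evaluated once for each of the 256 possible gray levels, the constant part becomes 256 × 256, which is still independent of $N$:

```latex
\begin{aligned}
N &= 1280 \times 720 = 921{,}600 \\
\text{direct pairwise sum: } N^2 &\approx 8.5 \times 10^{11} \text{ operations} \\
\text{histogram form: } N + 256 \times 256 &= 921{,}600 + 65{,}536 = 987{,}136 \text{ operations}
\end{aligned}
```

These are operation counts under the stated assumptions, not measured running times.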

Pseudocode

All of the following code is pseudo-MATLAB code. First, read the data from the video sequences.

for k = 2:13 % loop over frames 2 to 13; k increases by one in every iteration
    I = imread(currentfilename); % read the current frame
    I1 = im2single(I); % convert the image to single precision (required by vl_slic)
    Iprev = imread(previousfilename); % read the previous frame
    I2 = im2single(Iprev);
    regionSize = 10; % SLIC superpixel size; this value was chosen experimentally
    regularizer = 1; % SLIC regularization parameter
    segments1 = vl_slic(I1, regionSize, regularizer); % superpixels of the current frame
    segments2 = vl_slic(I2, regionSize, regularizer); % superpixels of the previous frame
    numsuppix = max(segments1(:)); % number of superpixels; see http://www.vlfeat.org/overview/slic.html
    regstats1 = regionprops(segments1, 'all'); % region properties of the current frame
    regstats2 = regionprops(segments2, 'all'); % region properties of the previous frame
    % ... the loop body continues with the distance computations below
end

After reading the data, we apply superpixel segmentation to each frame. spnum1 and spnum2 represent the number of superpixels in the current frame and in the previous frame, respectively.

% First, we calculate the center distance between each pair of superpixels.
% This is the core of the algorithm.
for i = 1:spnum1 % superpixels of the current frame
    for j = 1:spnum2 % superpixels of the previous frame
        centredist(i, j) = sum((center(i) - center(j)).^2); % squared distance between centers
    end
end

Then we calculate the color distance of each pixel. We call this process the contract function.

for i = 1:spnum1 % superpixels of the current frame
    for j = 1:spnum2 % superpixels of the previous frame
        % transpose Centroid so that both operands are column vectors
        posdiff(i, j) = sum((regstats1(j).Centroid.' - mupwtd(:, i)).^2); % calculate the color distance
    end
end

After these two steps, we obtain a saliency map for each frame, and we store all of these maps in a new folder.
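The nested loops over the two frames can be sketched in runnable form as follows. This is an illustration only: it assumes each superpixel is summarised by an (x, y) centroid and uses the squared Euclidean distance, and the function name and the centroid values are hypothetical rather than taken from the code above.

```python
def pairwise_sq_dist(points_a, points_b):
    """dist[i][j] = squared Euclidean distance between points_a[i] and points_b[j]."""
    return [[sum((a - b) ** 2 for a, b in zip(pa, pb)) for pb in points_b]
            for pa in points_a]


# Hypothetical superpixel centroids (x, y) of the current and previous frame.
current_centroids = [(0.0, 0.0), (3.0, 4.0)]
previous_centroids = [(0.0, 0.0), (6.0, 8.0), (3.0, 0.0)]

dist = pairwise_sq_dist(current_centroids, previous_centroids)
for row in dist:
    print(row)
```

Passing the same list for both arguments compares a frame with itself, and passing two different lists compares the current frame with the previous one.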

Difference in algorithms

The major difference between saliency functions one and two lies in the contract function. If spnum1 and spnum2 both represent the number of superpixels in the current frame, the contract function belongs to the first saliency function. If spnum1 represents the number of superpixels in the current frame and spnum2 the number in the previous frame, the contract function belongs to the second saliency function. The third saliency function uses the contract function that takes the center distance between pixels of the same frame: we apply it to each frame and then subtract the previous frame's saliency map from the current frame's saliency map. The resulting image is the saliency result of the third saliency function.
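The subtraction step of the third saliency function can be sketched as below, assuming both saliency maps are stored as equally sized lists of rows. The function name and the toy values are illustrative, and whether negative differences are kept, clipped, or replaced by their absolute value is not specified in the text, so the sketch simply keeps the signed values.

```python
def saliency_difference(current_map, previous_map):
    """Per-pixel difference: current frame's saliency minus previous frame's."""
    return [[c - p for c, p in zip(cur_row, prev_row)]
            for cur_row, prev_row in zip(current_map, previous_map)]


previous_map = [[1, 2], [3, 4]]  # toy saliency map of the previous frame
current_map = [[1, 5], [3, 0]]   # toy saliency map of the current frame

motion_map = saliency_difference(current_map, previous_map)
print(motion_map)
```

Pixels whose saliency did not change between the two frames come out as zero, so the result emphasises regions that changed.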

Saliency result

Datasets

A saliency dataset usually contains human eye movements recorded on image sequences. Such a dataset is valuable for creating new saliency algorithms or benchmarking existing ones. The most important dataset parameters are spatial resolution, size, and eye-tracking equipment. As an example, here is part of the large table of datasets from the MIT/Tübingen Saliency Benchmark.

Saliency datasets
Dataset      | Resolution   | Size        | Observers | Duration | Eye tracker
CAT2000      | 1920×1080 px | 4000 images | 24        | 5 s      | EyeLink 1000 (1000 Hz)
EyeTrackUAV2 | 1280×720 px  | 43 videos   | 30        | 33 s     | EyeLink 1000 Plus (1000 Hz, binocular)
CrowdFix     | 1280×720 px  | 434 videos  | 26        | 1–3 s    | The Eyetribe Eyetracker (60 Hz)
SAVAM        | 1920×1080 px | 43 videos   | 50        | 20 s     | SMI iViewX™ Hi-Speed 1250 (500 Hz)

To collect a saliency dataset, image or video sequences and eye-tracking equipment must be prepared, and observers must be invited. Observers must have normal or corrected-to-normal vision and must all sit at the same distance from the screen. At the beginning of each recording session, the eye tracker is recalibrated. To do this, the observer fixates their gaze on the center of the screen. Then the session starts, and saliency data are collected by showing the sequences and recording eye gazes.

The eye-tracking device is a high-speed camera capable of recording eye movements at a rate of at least 250 frames per second. Images from the camera are processed by software running on a dedicated computer, which returns the gaze data.
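One simple way to turn recorded gaze data into an image-shaped array is to count how many gaze samples land on each pixel. The sketch below assumes gaze samples are already given as (x, y) screen coordinates in pixels; the function name and the sample values are hypothetical, and benchmark datasets typically apply further processing, such as fixation detection and smoothing, that is not shown here.

```python
def fixation_map(gaze_points, width, height):
    """Count gaze samples per pixel, ignoring samples that fall off screen."""
    counts = [[0] * width for _ in range(height)]
    for x, y in gaze_points:
        if 0 <= x < width and 0 <= y < height:
            counts[int(y)][int(x)] += 1  # row index is y, column index is x
    return counts


# Made-up gaze samples on a 4x3 pixel "screen"; one sample is off screen.
gaze = [(1.2, 0.7), (1.9, 0.1), (3.0, 2.5), (-1.0, 1.0), (0.0, 2.0)]

counts = fixation_map(gaze, width=4, height=3)
for row in counts:
    print(row)
```

Pixels with higher counts were looked at more often, which is the kind of ground truth a saliency algorithm can be compared against.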

References

  1. Subhash, Bijil (6 March 2022). "Explainable AI: Saliency Maps". Medium. Retrieved 26 May 2024.
  2. Guo, Chenlei; Zhang, Liming (January 2010). "A Novel Multiresolution Spatiotemporal Saliency Detection Model and Its Applications in Image and Video Compression". IEEE Transactions on Image Processing. 19 (1): 185–198. Bibcode:2010ITIP...19..185G. doi:10.1109/TIP.2009.2030969. ISSN 1057-7149. PMID 19709976. S2CID 1154218.
  3. Tong, Yubing; Konik, Hubert; Cheikh, Faouzi; Tremeau, Alain (1 May 2010). "Full Reference Image Quality Assessment Based on Saliency Map Analysis". Journal of Imaging Science and Technology. 54 (3): 30503-1–30503-14. doi:10.2352/J.ImagingSci.Technol.2010.54.3.030503. hdl:11250/142490.
  4. Goferman, Stas; Zelnik-Manor, Lihi; Tal, Ayellet (October 2012). "Context-Aware Saliency Detection". IEEE Transactions on Pattern Analysis and Machine Intelligence. 34 (10): 1915–1926. doi:10.1109/TPAMI.2011.272. ISSN 1939-3539. PMID 22201056.
  5. Jiang, Huaizu; Wang, Jingdong; Yuan, Zejian; Wu, Yang; Zheng, Nanning; Li, Shipeng (June 2013). "Salient Object Detection: A Discriminative Regional Feature Integration Approach". 2013 IEEE Conference on Computer Vision and Pattern Recognition. IEEE. pp. 2083–2090. arXiv:1410.5926. doi:10.1109/cvpr.2013.271. ISBN 978-0-7695-4989-7.
  6. Müller, Romy (2024). "How explainable AI affects human performance: A systematic review of the behavioural consequences of saliency maps". arXiv:2404.16042 [cs.HC].
  7. Maity, A. (2015). "Improvised Salient Object Detection and Manipulation". arXiv:1511.02999 [cs.CV].
    Retrieved from "https://en.wikipedia.org/w/index.php?title=Saliency_map&oldid=1230617565"




    This page was last edited on 23 June 2024, at 18:49 (UTC).

    Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.


