Structured support vector machine

The structured support vector machine is a machine learning algorithm that generalizes the support vector machine (SVM) classifier. Whereas the SVM classifier supports binary classification, multiclass classification, and regression, the structured SVM allows training of a classifier for general structured output labels.

As an example, a sample instance might be a natural language sentence, and the output label an annotated parse tree. Training a classifier consists of showing it pairs of sample instances and correct output labels. After training, the structured SVM model allows one to predict the corresponding output label for new sample instances; that is, given a natural language sentence, the classifier can produce the most likely parse tree.

Training

For a set of $n$ training instances $(\mathbf{x}_i, y_i) \in \mathcal{X} \times \mathcal{Y}$, $i = 1, \dots, n$, from a sample space $\mathcal{X}$ and label space $\mathcal{Y}$, the structured SVM minimizes the following regularized risk function:

\[
\min_{\mathbf{w}} \; \|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \max_{y \in \mathcal{Y}} \left( \Delta(y_i, y) + \langle \mathbf{w}, \Psi(\mathbf{x}_i, y) \rangle - \langle \mathbf{w}, \Psi(\mathbf{x}_i, y_i) \rangle \right)
\]

The function is convex in $\mathbf{w}$ because the maximum of a set of affine functions is convex. The function $\Delta : \mathcal{Y} \times \mathcal{Y} \to \mathbb{R}_{+}$ measures a distance in label space and is an arbitrary function (not necessarily a metric) satisfying $\Delta(y, z) \geq 0$ and $\Delta(y, y) = 0$ for all $y, z \in \mathcal{Y}$. The function $\Psi : \mathcal{X} \times \mathcal{Y} \to \mathbb{R}^{d}$ is a feature function, extracting a feature vector from a given sample and label. The design of this function depends very much on the application.
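As a concrete illustration, the following sketch instantiates $\Delta$, $\Psi$, and the regularized risk for multiclass classification, the simplest structured problem. The sizes, names, and 0-1 loss are illustrative assumptions, not part of the general formulation; the later snippets in this article reuse these definitions.

import numpy as np

K, D = 3, 4  # number of labels and input dimension (illustrative sizes)

def delta(y_true, y):
    # Label loss: 0-1 loss here; any function with delta(y, y) == 0 and
    # delta(y, z) >= 0 is admissible.
    return 0.0 if y == y_true else 1.0

def psi(x, y):
    # Joint feature map Psi(x, y): copy x into the block associated with y.
    feat = np.zeros(K * D)
    feat[y * D:(y + 1) * D] = x
    return feat

def regularized_risk(w, X, Y, C):
    # ||w||^2 + C * sum_i max_y (delta(y_i,y) + <w,Psi(x_i,y)> - <w,Psi(x_i,y_i)>)
    risk = w @ w
    for x, yi in zip(X, Y):
        risk += C * max(delta(yi, y) + w @ (psi(x, y) - psi(x, yi))
                        for y in range(K))
    return risk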

Because the regularized risk function above is non-differentiable, it is often reformulated as a quadratic program by introducing one slack variable $\xi_i$ for each sample, each representing the value of the maximum. The standard structured SVM primal formulation is given as follows:

\[
\begin{aligned}
\min_{\mathbf{w}, \boldsymbol{\xi}} \quad & \|\mathbf{w}\|^2 + C \sum_{i=1}^{n} \xi_i \\
\text{s.t.} \quad & \langle \mathbf{w}, \Psi(\mathbf{x}_i, y_i) \rangle - \langle \mathbf{w}, \Psi(\mathbf{x}_i, y) \rangle + \xi_i \geq \Delta(y_i, y), \qquad i = 1, \dots, n, \quad \forall y \in \mathcal{Y}
\end{aligned}
\]
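For label spaces small enough to enumerate every constraint, this primal QP can be written down directly. The sketch below does so with the cvxpy modeling library, reusing delta, psi, K, and D from the previous snippet; the data and the value of C are illustrative.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, D))       # six toy samples
Y = rng.integers(0, K, size=6)    # their labels
C = 1.0

w = cp.Variable(K * D)
xi = cp.Variable(len(X), nonneg=True)   # one slack variable per sample
constraints = []
for i, (x, yi) in enumerate(zip(X, Y)):
    for y in range(K):
        # <w, Psi(x_i, y_i)> - <w, Psi(x_i, y)> + xi_i >= Delta(y_i, y)
        constraints.append(w @ (psi(x, yi) - psi(x, y)) + xi[i] >= delta(yi, y))
problem = cp.Problem(cp.Minimize(cp.sum_squares(w) + C * cp.sum(xi)), constraints)
problem.solve()

With $n$ samples and $|\mathcal{Y}|$ labels this QP already has $n \cdot |\mathcal{Y}|$ constraints, which is why the delayed constraint generation described in the Separation section is needed for genuinely structured label spaces.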

Inference

At test time, only a sample $\mathbf{x} \in \mathcal{X}$ is known, and a prediction function $f : \mathcal{X} \to \mathcal{Y}$ maps it to a predicted label from the label space $\mathcal{Y}$. For structured SVMs, given the vector $\mathbf{w}$ obtained from training, the prediction function is the following:

\[
f(\mathbf{x}) = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \; \langle \mathbf{w}, \Psi(\mathbf{x}, y) \rangle
\]

Therefore, the maximizer over the label space is the predicted label. Solving for this maximizer is the so-called inference problem, and is similar to making a maximum a posteriori (MAP) prediction in probabilistic models. Depending on the structure of the function $\Psi$, solving for the maximizer can be a hard problem.
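In the enumerable toy setting from the Training section, the prediction function reduces to a direct loop; a minimal sketch, reusing psi and K from above:

def predict(w, x):
    # f(x) = argmax_y <w, Psi(x, y)>, by exhaustive enumeration over labels.
    return max(range(K), key=lambda y: float(w @ psi(x, y)))

For genuinely structured outputs such as sequences or parse trees, the same argmax is computed by a combinatorial algorithm, for example dynamic programming, rather than by enumeration.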

Separation

The above quadratic program involves a very large, possibly infinite number of linear inequality constraints. In general, the number of inequalities is too large to be optimized over explicitly. Instead, the problem is solved by using delayed constraint generation, where only a finite, small subset of the constraints is used. Optimizing over a subset of the constraints enlarges the feasible set and yields a solution that provides a lower bound on the objective. To test whether the solution violates any constraints of the complete set of inequalities, a separation problem needs to be solved. As the inequalities decompose over the samples, for each sample the following problem needs to be solved:

\[
y_i^* = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \left( \Delta(y_i, y) + \langle \mathbf{w}, \Psi(\mathbf{x}_i, y) \rangle - \langle \mathbf{w}, \Psi(\mathbf{x}_i, y_i) \rangle - \xi_i \right)
\]

The right-hand-side objective to be maximized is composed of the constant $-\langle \mathbf{w}, \Psi(\mathbf{x}_i, y_i) \rangle - \xi_i$ and a term dependent on the variable optimized over, namely $\Delta(y_i, y) + \langle \mathbf{w}, \Psi(\mathbf{x}_i, y) \rangle$. If the achieved right-hand-side objective is smaller than or equal to zero, no violated constraints for this sample exist. If it is strictly larger than zero, the most violated constraint with respect to this sample has been identified. The problem is enlarged by this constraint and re-solved. The process continues until no violated inequalities can be identified.
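A minimal sketch of the resulting cutting-plane loop, reusing delta, psi, K, and D from the Training snippets. Here solve_qp is a hypothetical callback that re-solves the primal QP restricted to the constraints collected so far (for example, the cvxpy formulation above limited to working_set), and the tolerance is illustrative.

import numpy as np

def most_violated(w, xi_i, x, y_true):
    # Separation problem for one sample: maximize over y the quantity
    # Delta(y_i, y) + <w, Psi(x_i, y)> - <w, Psi(x_i, y_i)> - xi_i.
    scores = [delta(y_true, y) + w @ (psi(x, y) - psi(x, y_true)) - xi_i
              for y in range(K)]
    y_star = int(np.argmax(scores))
    return y_star, scores[y_star]

def cutting_plane(X, Y, solve_qp, tol=1e-4):
    working_set = []                         # constraints generated so far
    w, xi = np.zeros(K * D), np.zeros(len(X))
    while True:
        violated = 0
        for i, (x, yi) in enumerate(zip(X, Y)):
            y_star, score = most_violated(w, xi[i], x, yi)
            if score > tol:                  # strictly positive: violated
                working_set.append((i, y_star))
                violated += 1
        if violated == 0:                    # no violated inequalities remain
            return w, xi
        w, xi = solve_qp(working_set)        # enlarge the QP and re-solve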

If the constants are dropped from the above problem, we obtain the following problem to be solved:

\[
y_i^* = \underset{y \in \mathcal{Y}}{\operatorname{argmax}} \left( \Delta(y_i, y) + \langle \mathbf{w}, \Psi(\mathbf{x}_i, y) \rangle \right)
\]

This problem looks very similar to the inference problem. The only difference is the addition of the term $\Delta(y_i, y)$. Most often, $\Delta$ is chosen such that it has a natural decomposition in label space. In that case, the influence of $\Delta$ can be encoded into the inference problem, and solving for the most violating constraint is equivalent to solving the inference problem.
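As one illustration of such a decomposition (an assumed setting, not part of the general formulation): for a chain-structured label with per-position Hamming loss, $\Delta$ adds 1 to the score of every non-gold label at every position, so ordinary inference on the adjusted scores returns the most violated constraint. The decoder viterbi mentioned in the comment is a hypothetical helper standing in for any standard chain decoder.

import numpy as np

def loss_augment(unary, y_gold):
    # unary: (T, K) per-position label scores; y_gold: gold label sequence.
    aug = unary + 1.0                             # +1 for every (position, label)...
    aug[np.arange(len(y_gold)), y_gold] -= 1.0    # ...except the gold label
    return aug  # most violated label: viterbi(aug, pairwise), i.e. plain inference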

Retrieved from "https://en.wikipedia.org/w/index.php?title=Structured_support_vector_machine&oldid=1136227013"

Categories: 
Structured prediction
Support vector machines
 



This page was last edited on 29 January 2023, at 09:50 (UTC).

Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization.



Privacy policy

About Wikipedia

Disclaimers

Contact Wikipedia

Code of Conduct

Developers

Statistics

Cookie statement

Mobile view



Wikimedia Foundation
Powered by MediaWiki