Jump to content

Main menu Navigation ●Main page ●Contents ●Current events ●Random article ●About Wikipedia ●Contact us ●Donate Contribute ●Help ●Learn to edit ●Community portal ●Recent changes ●Upload file

●Create account ●Log in ●Create account ● Log in Pages for logged out editors learn more ●Contributions ●Talk

(Top) 1 Applications 1.1 Example: sequence tagging 2 Techniques 2.1 Structured perceptron 3 References 4 External links

Structured prediction

●Català ●فارسی ●日本語 ●Русский ●Українська ●Tiếng Việt Edit links ●Article ●Talk ●Read ●Edit ●View history Tools Actions ●Read ●Edit ●View history General ●What links here ●Related changes ●Upload file ●Special pages ●Permanent link ●Page information ●Cite this page ●Get shortened URL ●Download QR code ●Wikidata item Print/export ●Download as PDF ●Printable version In other projects ●Wikimedia Commons Appearance From Wikipedia, the free encyclopedia

Machine learning and data mining
Part of a series on
Paradigms Supervised learning Unsupervised learning Online learning Batch learning Meta-learning Semi-supervised learning Self-supervised learning Reinforcement learning Curriculum learning Rule-based learning Quantum machine learning
Problems Classification Generative modeling Regression Clustering Dimensionality reduction Density estimation Anomaly detection Data cleaning AutoML Association rules Semantic analysis Structured prediction Feature engineering Feature learning Learning to rank Grammar induction Ontology learning Multimodal learning
Supervised learning (classification • regression) Apprenticeship learning Decision trees Ensembles Bagging Boosting Random forest k-NN Linear regression Naive Bayes Artificial neural networks Logistic regression Perceptron Relevance vector machine (RVM) Support vector machine (SVM)
Clustering BIRCH CURE Hierarchical k-means Fuzzy Expectation–maximization (EM) DBSCAN OPTICS Mean shift
Dimensionality reduction Factor analysis CCA ICA LDA NMF PCA PGD t-SNE SDL
Structured prediction Graphical models Bayes net Conditional random field Hidden Markov
Anomaly detection RANSAC k-NN Local outlier factor Isolation forest
Artificial neural network Autoencoder Cognitive computing Deep learning DeepDream Feedforward neural network Recurrent neural network LSTM GRU ESN reservoir computing Restricted Boltzmann machine GAN Diffusion model SOM Convolutional neural network U-Net Transformer Vision Mamba Spiking neural network Memtransistor Electrochemical RAM (ECRAM)
Reinforcement learning Q-learning SARSA Temporal difference (TD) Multi-agent Self-play
Learning with humans Active learning Crowdsourcing Human-in-the-loop RLHF
Model diagnostics Coefficient of determination Confusion matrix Learning curve ROC curve
Mathematical foundations Kernel machines Bias–variance tradeoff Computational learning theory Empirical risk minimization Occam learning PAC learning Statistical learning VC theory
Machine-learning venues ECML PKDD NeurIPS ICML ICLR IJCAI ML JMLR
Related articles Glossary of artificial intelligence List of datasets for machine-learning research List of datasets in computer vision and image processing Outline of machine learning
v t e

Structured predictionorstructured (output) learning is an umbrella term for supervised machine learning techniques that involves predicting structured objects, rather than scalar discreteorreal values.^[1]

Similar to commonly used supervised learning techniques, structured prediction models are typically trained by means of observed data in which the true prediction value is used to adjust model parameters. Due to the complexity of the model and the interrelations of predicted variables the process of prediction using a trained model and of training itself is often computationally infeasible and approximate inference and learning methods are used.

Applications[edit]

For example, the problem of translating a natural language sentence into a syntactic representation such as a parse tree can be seen as a structured prediction problem^[2] in which the structured output domain is the set of all possible parse trees. Structured prediction is also used in a wide variety of application domains including bioinformatics, natural language processing, speech recognition, and computer vision.

Example: sequence tagging[edit]

Sequence tagging is a class of problems prevalent in natural language processing, where input data are often sequences (e.g. sentences of text). The sequence tagging problem appears in several guises, e.g. part-of-speech tagging and named entity recognition. In POS tagging, for example, each word in a sequence must receive a "tag" (class label) that expresses its "type" of word:

This	DT
is	VBZ
a	DT
tagged	JJ
sentence	NN
.	.

The main challenge of this problem is to resolve ambiguity: the word "sentence" can also be a verb in English, and so can "tagged".

While this problem can be solved by simply performing classification of individual tokens, that approach does not take into account the empirical fact that tags do not occur independently; instead, each tag displays a strong conditional dependence on the tag of the previous word. This fact can be exploited in a sequence model such as a hidden Markov modelorconditional random field^[2] that predicts the entire tag sequence for a sentence, rather than just individual tags, by means of the Viterbi algorithm.

Techniques[edit]

Probabilistic graphical models form a large class of structured prediction models. In particular, Bayesian networks and random fields are popular. Other algorithms and models for structured prediction include inductive logic programming, case-based reasoning, structured SVMs, Markov logic networks, Probabilistic Soft Logic, and constrained conditional models. Main techniques:

Structured perceptron[edit]

One of the easiest ways to understand algorithms for general structured prediction is the structured perceptron of Collins.^[3] This algorithm combines the perceptron algorithm for learning linear classifiers with an inference algorithm (classically the Viterbi algorithm when used on sequence data) and can be described abstractly as follows. First define a "joint feature function" Φ(x, y) that maps a training sample x and a candidate prediction y to a vector of length n (x and y may have any structure; n is problem-dependent, but must be fixed for each model). Let GEN be a function that generates candidate predictions. Then:

Let

w

be a weight vector of length n

For a pre-determined number of iterations:

For each sample

x

in the training set with true output

t

Make a prediction

{\hat {y}}={\operatorname {arg\,max} }\,\{{y}\in {GEN}({x})\}\,({w}^{T}\,\phi ({x},{y}))

Update

w

, from

{\hat {y}}

t

{w}={w}+{c}(-\phi ({x},{\hat {y}})+\phi ({x},{t}))

c

islearning rate

In practice, finding the argmax over ${GEN}({x})$ will be done using an algorithm such as Viterbi or an algorithm such as max-sum, rather than an exhaustive search through an exponentially large set of candidates.

The idea of learning is similar to multiclass perceptron.

References[edit]

^ Gökhan BakIr, Ben Taskar, Thomas Hofmann, Bernhard Schölkopf, Alex Smola and SVN Vishwanathan (2007), Predicting Structured Data, MIT Press.

^ ^a ^b Lafferty, J.; McCallum, A.; Pereira, F. (2001). "Conditional random fields: Probabilistic models for segmenting and labeling sequence data" (PDF). Proc. 18th International Conf. on Machine Learning. pp. 282–289.

^ Collins, Michael (2002). Discriminative training methods for hidden Markov models: Theory and experiments with perceptron algorithms (PDF). Proc. EMNLP. Vol. 10.

Noah Smith, Linguistic Structure Prediction, 2011.
Michael Collins, Discriminative Training Methods for Hidden Markov Models, 2002.

External links[edit]

Implementation of Collins structured perceptron

Retrieved from "https://en.wikipedia.org/w/index.php?title=Structured_prediction&oldid=1177841516" Category: ●Structured prediction Hidden categories: ●Articles with short description ●Short description matches Wikidata ●This page was last edited on 29 September 2023, at 22:51 (UTC). ●Text is available under the Creative Commons Attribution-ShareAlike License 4.0; additional terms may apply. By using this site, you agree to the Terms of Use and Privacy Policy. Wikipedia® is a registered trademark of the Wikimedia Foundation, Inc., a non-profit organization. ●Privacy policy ●About Wikipedia ●Disclaimers ●Contact Wikipedia ●Code of Conduct ●Developers ●Statistics ●Cookie statement ●Mobile view