|
Abstracts
| IJCV-06 |
K. Q. Weinberger and L. K. Saul (2006).
Unsupervised learning of image manifolds by semidefinite programming. International Journal of Computer Vision 70(1): 77-90. [pdf]
Abstract:
Can we detect low dimensional structure in high dimensional data sets of images? In this paper, we propose an algorithm for unsupervised learning of image manifolds by semidefinite programming. Given a data set of images, our algorithm computes a low dimensional representation of each image with the property that distances between nearby images are preserved. More generally, it can be used to analyze high dimensional data that lies on or near a low dimensional manifold. We illustrate the algorithm on easily visualized examples of curves and surfaces, as well as on actual images of faces, handwritten digits, and solid objects.
|
| AAAI-06 |
K. Q. Weinberger and L. K. Saul (2006).
An introduction to nonlinear dimensionality reduction by maximum variance unfolding.
To appear in Proceedings of the Twenty First National Conference on
Artificial Intelligence (AAAI-06), Boston, MA.
[pdf]
Abstract:
Many problems in AI are simplified by clever representations of sensory
or symbolic input. How to discover such representations automatically,
from large amounts of unlabeled data, remains a fundamental challenge.
The goal of statistical methods for dimensionality reduction is to
detect and discover low dimensional structure in high dimensional data.
In this paper, we review a recently proposed algorithm -- maximum
variance unfolding -- for learning faithful low dimensional
representations of high dimensional data. The algorithm relies on
modern tools in convex optimization that are proving increasingly useful
in many areas of machine learning.
|
| ICASSP-06 |
F. Sha and L. K. Saul (2006).
Large margin Gaussian mixture modeling for phonetic classification and recognition.
To appear in Proceedings of the International Conference on
Acoustics, Speech, and Signal Processing (ICASSP-06), Toulouse, France.
[pdf]
Abstract:
We develop a framework for large margin classification
by Gaussian mixture models (GMMs).
Large margin GMMs have many parallels to
support vector machines (SVMs) but
use ellipsoids to model classes instead of half-spaces.
Model parameters are trained discriminatively to
maximize the margin of correct classification,
as measured in terms of Mahalanobis distances.
The required optimization is convex over the model's
parameter space of positive semidefinite matrices
and can be performed efficiently.
Large margin GMMs are naturally suited to large
problems in multiway
classification; we apply them
to phonetic classification and recognition on the TIMIT database.
On both tasks, we obtain significant improvement over baseline systems trained
by maximum likelihood estimation.
For the problem of phonetic classification, our results are competitive
with other state-of-the-art classifiers, such as hidden
conditional random fields.
|
| NIPS-05 |
K. Q. Weinberger, J. Blitzer, and L. K. Saul (2006).
Distance metric learning for large margin nearest neighbor classification.
To appear in Y. Weiss, B. Schoelkopf, and J. Platt (eds.),
Advances in Neural Information Processing Systems 18.
MIT Press: Cambridge, MA.
[pdf]
Abstract:
We show how to learn a Mahanalobis distance metric for k-nearest
neighbor (kNN) classification by semidefinite programming. The metric is
trained with the goal that the k-nearest neighbors always belong to the
same class while examples from different classes are separated by a
large margin. On seven data sets of varying size and difficulty,we find
that metrics trained in this way lead to significant improvements in kNN
classification---for example, achieving a test error rate of 1.3% on the
MNIST handwritten digits. Our approach has many parallels to support
vector machines, including a convex objective function based on the
hinge loss, but does not require modifications for problems with large
numbers of classes.
|
| book chapter |
L. K. Saul, K. Q. Weinberger, J. H. Ham, F. Sha, and D. D. Lee (2006).
Spectral methods for dimensionality reduction.
To appear in O. Chapelle, B. Schoelkopf, and A. Zien (eds.),
Semisupervised Learning. MIT Press: Cambridge, MA.
[pdf]
Abstract:
Spectral methods have recently emerged as a powerful tool for nonlinear
dimensionality reduction and manifold learning. These methods are able
to reveal low dimensional structure in high dimensional data from the
top or bottom eigenvectors of specially constructed matrices. To
analyze data that lies on a low dimensional submanifold, the matrices
are constructed from sparse weighted graphs whose vertices represent
input patterns and whose edges indicate neighborhood relations. The main
computations for manifold learning are based on tractable, polynomial-time
optimizations, such as shortest path problems, least squares fits,
semidefinite programming, and matrix diagonalization. This chapter
provides an overview of unsupervised learning algorithms that can
be viewed as spectral methods for linear and nonlinear dimensionality reduction.
|
| ISMIR |
J. A. Burgoyne and L. K. Saul (2005).
Learning harmonic relationships in digital audio with Dirichlet-based hidden Markov models.
In Proceedings of the Sixth International Conference on Music Information Retrieval (ISMIR-05),
pages 438-443.
London, England.
[pdf]
Abstract:
Harmonic analysis is a standard musicological tool for
understanding many pieces of Western classical music
and making comparisons among them. Traditionally, this
analysis is done on paper scores, and most past research
in machine-assisted analysis has begun with digital representations
of them. Human music students are also taught
to hear their musical analyses, however, in both musical
recordings and performances. Our approach attempts to
teach machines to do the same, beginning with a corpus
of recorded Mozart symphonies. The audio files are first
transformed into an ordered series of normalized pitch
class profile (PCP) vectors. Simplified rules of tonal harmony
are encoded in a transition matrix. Classical music
tends to change key more frequently than popular music,
and so these rules account not only for chords, as most
previous work has done, but also for the keys in which
they function. A hidden Markov model (HMM) is used
with this transition matrix to train Dirichlet distributions
for major and minor keys on the PCP vectors. The system
tracks chords and keys successfully and shows promise
for a real-time implementation.
|
| ICMC-05 |
J. A. Burgoyne and L. K. Saul (2005).
Visualization of low dimensional structure in tonal pitch space.
In Proceedings of the International Computer Music Conference (ICMC-05), pages 243-246.
Barcelona, Spain.
[pdf]
Abstract:
In his 2001 monograph Tonal Pitch Space, Fred Lerdahl
defined a distance function over tonal and post-tonal
harmonies distilled from years of research on music cognition.
Although this work references the toroidal structure
commonly associated with harmonic space, it stops
short of presenting an explicit embedding of this torus. It
is possible to use statistical techniques to recreate such an
embedding from the distance function, yielding a more
complex structure than the standard toroidal model has
heretofore assumed. Nonlinear techniques can reduce the
dimensionality of this structure and be tuned to emphasize
global or local anatomy. The resulting manifolds highlight
the relationships inherent in the tonal system and offer a
basis for future work in machine-assisted analysis and music
theory.
|
| ICML-05 |
F. Sha and L. K. Saul (2005).
Analysis and extension of spectral methods for nonlinear dimensionality reduction.
In Proceedings of the Twenty Second International Conference on Machine Learning (ICML-05),
pages 785-792. Bonn, Germany.
[pdf]
Abstract:
Many unsupervised algorithms for nonlinear dimensionality reduction,
such as locally linear embedding (LLE) and Laplacian eigenmaps, are
derived from the spectral decompositions of sparse matrices. While
these algorithms aim to preserve certain proximity relations on average,
their embeddings are not explicitly designed to preserve local features
such as distances or angles. In this paper, we show how to construct a
low dimensional embedding that maximally preserves angles between nearby
data points. The embedding is derived from the bottom eigenvectors of
LLE and/or Laplacian eigenmaps by solving an additional (but small)
problem in semidefinite programming, whose size is independent of the
number of data points. The solution obtained by semidefinite programming
also yields an estimate of the data's intrinsic dimensionality.
Experimental results on several data sets demonstrate the merits of our approach.
|
| AISTATS-05 |
K. Q. Weinberger, B. D. Packer, and L. K. Saul (2005).
Nonlinear dimensionality reduction by semidefinite programming and kernel
matrix factorization.
In Z. Ghahramani and R. Cowell (eds.),
Proceedings of the Tenth International Workshop on Artificial Intelligence
and Statistics, pages 381-388. Barbados, West Indies.
Best student paper award. [pdf]
Abstract:
We describe an algorithm for nonlinear
dimensionality reduction based on semidefinite programming
and kernel matrix factorization. The algorithm learns
a kernel matrix for high dimensional data that lies on or near a low
dimensional manifold. In earlier work, the kernel matrix was
learned by maximizing the variance in feature space
while preserving the distances and angles between nearest neighbors.
In this paper, exploiting recent ideas for
semi-supervised learning on graphs, we show that for well-sampled manifolds,
the entire kernel matrix can be very accurately reconstructed
from a much smaller
submatrix of inner products between randomly chosen landmarks.
By representing the kernel matrix in this way,
we are able to achieve order-of-magnitude reductions in computation
time and to handle much larger data sets.
|
| AISTATS-05 |
J. H. Ham, D. D. Lee, and L. K. Saul (2005).
Semisupervised alignment of manifolds.
In Z. Ghahramani and R. Cowell (eds.),
Proceedings of the Tenth International Workshop on Artificial Intelligence
and Statistics, pages 120-127. Barbados, West Indies.
[pdf]
Abstract:
In this paper, we study a family of semi-supervised
learning algorithms for "aligning" different data sets
that are characterized by the same underlying manifold.
The optimizations of these algorithms are based
on graphs that provide a discretized approximation
to the manifold. Partial alignments of the data
sets -- obtained from prior knowledge of their
manifold structure or from pairwise correspondences
of subsets of labeled examples -- are completed by integrating
supervised signals with unsupervised frameworks
for manifold learning. As an illustration of this semisupervised setting,
we show how to learn mappings between different data sets of images that
are parameterized by the same underlying modes of variability
(e.g., pose and viewing angle). The curse of dimensionality in these
problems is overcome by exploiting the low dimensional structure
of image manifolds.
|
| NIPS-04 |
F. Sha and L. K. Saul (2005).
Real-time pitch determination of one or more voices by
nonnegative matrix factorization.
In L. K. Saul, Y. Weiss, and L. Bottou (eds.),
Advances in Neural Information Processing Systems 17,
pages 1233-1240.
MIT Press: Cambridge, MA.
[pdf]
Abstract: An auditory "scene", composed of overlapping acoustic sources, can
be viewed as a complex object whose constituent parts are
the individual sources. Pitch is known to be an
important cue for auditory scene analysis. In this paper,
with the goal of building agents that operate in human
environments, we describe a real-time system
to identify the presence of one or more voices
and compute their pitch. The signal processing
in the front end is based on instantaneous frequency estimation,
a method for tracking the partials of voiced speech,
while the pattern-matching
in the back end is based on nonnegative matrix factorization,
an unsupervised algorithm for learning the parts of complex objects.
While supporting a framework to analyze complicated auditory
scenes, our system maintains real-time operability and state-of-the-art
performance in clean speech.
|
| NIPS-04 |
J. Blitzer, K. Q. Weinberger, L. K. Saul, and F. C. N. Pereira (2005).
Hierarchical distributed representations for statistical language models.
In L. K. Saul, Y. Weiss, and L. Bottou (eds.),
Advances in Neural Information Processing Systems 17, pages 185-192.
MIT Press: Cambridge, MA.
[pdf]
Abstract: Statistical language models
estimate the probability of a word occurring in a given context. The
most common language models rely on a discrete enumeration of
predictive contexts (e.g., n-grams) and consequently fail to
capture and exploit statistical regularities across these contexts.
In this paper, we show how to learn hierarchical, distributed
representations of word contexts that maximize the predictive value of
a statistical language model. The representations are initialized by
unsupervised algorithms for linear and nonlinear dimensionality
reduction, then fed as input into a hierarchical
mixture of experts, where each expert is a multinomial distribution
over predicted words. While the distributed
representations in our model are inspired by the neural probabilistic
language model of Bengio et al, our
particular architecture enables us to work with significantly larger
vocabularies and training corpora. For example, on a large-scale
bigram modeling task involving a sixty thousand word vocabulary and a
training corpus of three million sentences, we demonstrate consistent
improvement over class-based bigram models.
We also discuss extensions of our approach to longer multiword
contexts.
|
| IMC-04 |
Y. Mao and L. K. Saul (2004).
Modeling distances in large-scale networks by matrix factorization.
In Proceedings of the Second Internet Measurement
Conference (IMC-04), pages 278-287. Sicily, Italy.
[pdf]
Abstract: In this paper, we propose a model for representing and predicting
distances in large-scale networks by matrix factorization.
The model is useful for network distance sensitive applications,
such as content distribution networks,
topology-aware overlays, and server selections. Our approach
overcomes several limitations of previous coordinates-based
mechanisms, which cannot model asymmetric routing or sub-optimal
routing policies. We present two algorithms---singular value
decomposition (SVD) and nonnegative matrix factorization (NMF)---that
learn scalable models of network distances from limited numbers of
measurements. Extensive simulations on real-world data sets show that
these models lead to more accurate and robust predictions of latencies
in large-scale networks than previous approaches.
|
| ICML-04 |
K. Q. Weinberger, F. Sha, and L. K. Saul (2004).
Learning a kernel matrix for nonlinear dimensionality reduction.
In Proceedings of the Twenty First International Confernence on Machine Learning
(ICML-04), pages 839-846. Banff, Canada. Outstanding student paper award.
[pdf,
matlab]
Abstract: We investigate how to learn a kernel
matrix for high dimensional data that lies on
or near a low dimensional manifold.
Noting that the kernel matrix implicitly maps the data into
a nonlinear feature space, we show how to
discover a mapping that unfolds the underlying
manifold from which the data was sampled.
The kernel matrix is constructed by maximizing
the variance in feature space subject to local
constraints that preserve the angles and distances
between nearest neighbors.
The main optimization involves an
instance of semidefinite programming---a fundamentally
different computation than previous algorithms for manifold
learning, such as Isomap and locally linear
embedding.
The optimized kernels perform better than polynomial and Gaussian
kernels for problems in manifold learning,
but worse for problems in large margin classification.
We explain these results in terms of the geometric properties of different kernels
and comment on various interpretations of other manifold learning
algorithms as kernel methods.
|
| CVPR-04 |
K. Q. Weinberger and L. K. Saul (2004).
Unsupervised learning of image manifolds by semidefinite programming.
In Proceedings of the IEEE Conference on Computer Vision and Pattern
Recognition (CVPR-04), vol. 2, pages 988-995. Washington D.C.
Outstanding student paper award.
[pdf,
matlab]
Abstract:
Can we detect low dimensional
structure in high dimensional data sets of images
and video? The problem of dimensionality
reduction arises often in computer vision
and pattern recognition. In this paper,
we propose a new solution to
this problem based on semidefinite programming.
Our algorithm can be used to analyze high dimensional data
that lies on or near a low dimensional manifold. It
overcomes certain limitations of previous work
in manifold learning, such as Isomap and locally linear
embedding.
We illustrate
the algorithm on easily visualized examples of curves and surfaces,
as well as on actual images of faces, handwritten digits,
and solid objects.
|
| ICASSP-04 |
V. Jain and L. K. Saul (2004).
Exploratory analysis and visualization of speech and music
by locally linear embedding.
In Proceedings of the International Conference of
Speech, Acoustics, and Signal Processing (ICASSP-04),
vol. 3, pages 984-987. Montreal, Canada.
[pdf]
Abstract
Many problems in voice recognition and
audio processing involve feature
extraction from raw waveforms. The goal of
feature extraction is to reduce the dimensionality of the
audio signal while preserving the
informative signatures that, for example, distinguish
different phonemes in speech or identify particular
instruments in music.
If the acoustic variability of a data set is
described by a small number of continuous features,
then we can imagine the data as lying on
a low dimensional manifold in the high dimensional
space of all possible waveforms. Locally linear embedding (LLE)
is an unsupervised learning algorithm for feature
extraction in this setting. In this paper, we
present results from the exploratory analysis
and visualization of speech and music by LLE.
|
| ICASSP-04 |
Y. Lin, D. D. Lee, and L. K. Saul (2004).
Nonnegative deconvolution for time of arrival estimation.
In Proceedings of the International Conference of
Speech, Acoustics, and Signal Processing (ICASSP-04), vol. 2, pages 377-380. Montreal, Canada.
[pdf]
Abstract
The interaural time difference (ITD) of arrival is a primary
cue for acoustic sound source localization. Traditional estimation
techniques for ITD based upon cross-correlation
are related to maximum-likelihood estimation of a simple
generative model. We generalize the time difference estimation
into a deconvolution problem with nonnegativity
constraints. The resulting nonnegative least squares optimization
can be efficiently solved using a novel iterative algorithm
with guaranteed global convergence properties. We
illustrate the utility of this algorithm using simulations and
experimental results from a robot platform.
|
| ICASSP-04 |
F. Sha, J. A. Burgoyne, and L. K. Saul (2004).
Multiband statistical learning for f0 estimation in speech.
In Proceedings of the International Conference of
Speech, Acoustics, and Signal Processing (ICASSP-04), vol. 5, pages 661-664. Montreal, Canada.
[pdf]
Abstract:
We investigate a simple algorithm that combines
multiband processing and least squares fits
to estimate f0 contours in speech.
The algorithm is untraditional in several respects: it
makes no use of FFTs or autocorrelation at the pitch
period; it updates the pitch incrementally on a sample-by-sample basis;
it avoids peak picking and does
not require interpolation in time or frequency to obtain high resolution
estimates; and it works reliably, in real time,
without the need for postprocessing to produce smooth contours.
We show that a baseline implementation of
the algorithm, though already quite accurate, is
significantly improved by incorporating a model of statistical learning
into its final stages. Model parameters are estimated from
training data to minimize the likelihood of
gross errors in f0 as well as errors
in classifying voiced versus unvoiced speech.
Experimental results on several databases confirm the benefits
of statistical learning.
|
| JMLR-03 |
L. K. Saul and S. T. Roweis (2003).
Think globally, fit locally: unsupervised learning of low dimensional manifolds.
Journal of Machine Learning Research 4:119-155.
[ps.gz,
pdf]
Abstract:
The problem of dimensionality reduction arises in many fields of
information processing, including machine learning, data compression,
scientific visualization, pattern recognition, and neural computation.
Here we describe locally linear embedding (LLE), an unsupervised
learning algorithm that computes low dimensional, neighborhood
preserving embeddings of high dimensional data. The data, assumed to be
sampled from an underlying manifold, are mapped into a single global
coordinate system of lower dimensionality. The mapping is derived from
the symmetries of locally linear reconstructions, and the actual
computation of the embedding reduces to a sparse eigenvalue problem.
Notably, the optimizations in LLE--though capable of generating highly
nonlinear embeddings--are simple to implement, and they do not involve
local minima. In this paper, we describe the implementation of the
algorithm in detail and discuss several extensions that enhance its
performance. We present results of the algorithm applied to data sampled
from known manifolds, as well as to collections of images of faces,
lips, and handwritten digits. These examples are used to provide
extensive illustrations of the algorithm's performance--both successes
and failures--and to relate the algorithm to previous and ongoing work
in nonlinear dimensionality reduction.
|
| EuroSpeech-03 |
L. K. Saul, F. Sha, and D. D. Lee (2003).
Statistical signal processing with nonnegativity constraints.
In Proceedings of the Eighth European Conference
on Speech Communication and Technology, volume 2, pages 1001-1004, Geneva, Switzerland.
[pdf]
Abstract:
Nonnegativity constraints arise frequently in statistical learning
and pattern recognition. Multiplicative updates provide natural
solutions to optimizations involving these constraints. One well
known set of multiplicative updates is given by the Expectation-
Maximization algorithm for hidden Markov models, as used in
automatic speech recognition. Recently, we have derived similar
algorithms for nonnegative deconvolution and nonnegative
quadratic programming. These algorithms have applications to
low-level problems in voice processing, such as fundamental
frequency estimation, as well as high-level problems, such as
the training of large margin classifiers. In this paper, we describe
these algorithms and the ideas that connect them.
|
| ICML-03 |
J. H. Ham, D. D. Lee, and L. K. Saul (2003).
Learning high dimensional correspondences from low dimensional manifolds.
In Proceedings of the ICML 2003 Workshop on
The Continuum from Labeled to Unlabeled Data in Machine Learning and Data Mining,
pages 34-41, Washington, D.C.
[pdf]
Abstract:
Many different high dimensional data sets are
characterized by the same underlying modes of
variability. When these modes of variability
are continuous and few in number, they can
be viewed as parameterizing a low dimensional
manifold. The manifold provides a compact
shared representation of the data, suggesting correspondences
between the high dimensional examples
from different data sets. These correspondences,
though naturally induced by the underlying
manifold, are difficult to learn using traditional
methods in supervised learning. In this
paper, we generalize three methods in unsupervised
learning -- principal components analysis,
factor analysis, and locally linear embedding -- to
discover subspaces and manifolds that provide
common low dimensional representations of different
high dimensional data sets. We use the
shared representations discovered by these algorithms
to put high dimensional examples from
different data sets into correspondence. Finally,
we show that a notion of self-correspondence
between examples in the same data set can be
used to improve the performance of these algorithms
on small data sets. The algorithms are
demonstrated on images and text.
|
| COLT-03 |
F. Sha, L. K. Saul, and D. D. Lee (2003).
Multiplicative updates for large margin classifiers.
In Proceedings of the Sixteenth Annual Conference on Computational Learning Theory,
pages 188-202, Washington D.C.
[pdf]
Abstract:
Various problems in nonnegative quadratic programming
arise in the training of large margin classifiers.
We derive multiplicative updates for these problems that
converge monotonically to the desired solutions for
hard and soft margin classifiers. The updates differ strikingly in
form from other multiplicative updates used in machine learning.
In this paper, we provide complete proofs of
convergence for these updates and
extend previous work to incorporate
sum and box constraints in addition to nonnegativity.
|
| AISTATS-03 |
A. I. Schein, L. K. Saul, and L. Ungar (2003).
A generalized linear model for principal
component analysis of binary data.
In C. M. Bishop and B. J. Frey (eds.),
Proceedings of the Ninth International Workshop on Artificial Intelligence
and Statistics, pages 14-21, Key West, FL.
[pdf]
Abstract:
We investigate a generalized linear model for
dimensionality reduction of binary data. The
model is related to principal component analysis (PCA) in the same way that logistic
regression is related to linear regression. Thus
we refer to the model as logistic PCA. In this
paper, we derive an alternating least squares
method to estimate the basis vectors and generalized linear coefficients of the logistic PCA
model. The resulting updates have a simple
closed form and are guaranteed at each iteration to improve the model's likelihood. We
evaluate the performance of logistic PCA as
measured by reconstruction error rates on
data sets drawn from four real world applications. In general, we find
that logistic PCA is
much better suited to modeling binary data
than conventional PCA.
|
| NIPS-02 |
L. K. Saul, D. D. Lee, C. L. Isbell, and Y. LeCun (2003).
Real time voice processing with audiovisual feedback: toward
autonomous agents with perfect pitch.
In S. Becker, S. Thrun, and K. Obermayer (eds.),
Advances in Neural Information Processing Systems 15, pages 1205-1212. MIT Press:
Cambridge, MA.
[pdf,
matlab]
Abstract:
We have implemented a real time front end for detecting voiced
speech and estimating its fundamental frequency. The front end performs the
signal processing for voice-driven agents that attend to the pitch
contours of human speech and provide continuous audiovisual feedback.
The algorithm we use for pitch tracking has several distinguishing
features: it makes no use of FFTs or autocorrelation at the pitch
period; it updates the pitch incrementally on a sample-by-sample basis;
it avoids peak picking and does
not require interpolation in time or frequency to obtain high resolution
estimates; and it works reliably over a four octave range, in real time,
without the need for postprocessing to produce smooth contours. The
algorithm is based on two simple ideas in neural computation: the
introduction of a purposeful nonlinearity, and the error signal of a
least squares fit. The pitch tracker is used in two real time multimedia
applications: a voice-to-MIDI player that synthesizes
electronic music from vocalized melodies, and an audiovisual Karaoke
machine with multimodal feedback.
Both applications run on a laptop and display the
user's pitch scrolling across the screen as he or she
sings into the computer.
|
| NIPS-02 |
F. Sha, L. K. Saul, and D. D. Lee (2003).
Multiplicative updates for nonnegative quadratic programming in support vector machines.
In S. Becker, S. Thrun, and K. Obermayer (eds.),
Advances in Neural Information Processing Systems 15,
pages 1065-1073. MIT Press:
Cambridge, MA.
[pdf]
Abstract:
We derive multiplicative updates for solving the nonnegative quadratic
programming problem in support vector machines (SVMs). The updates have
a simple closed form, and we prove that they converge monotonically to
the solution of the maximum margin hyperplane. The updates optimize the
traditionally proposed objective function for SVMs. They do not involve
any heuristics such as choosing a learning rate or deciding which
variables to update at each iteration. They can be used to adjust all
the quadratic programming variables in parallel with a guarantee of
improvement at each iteration. We analyze the asymptotic convergence of
the updates and show that the coefficients of non-support vectors decay
geometrically to zero at a rate that depends on their margins. In
practice, the updates converge very rapidly to good classifiers.
|
| |