There is a close correspondence between the behavior of self-organizing neural networks and the statistical method of principal component analysis. In fact, a self-organizing, fully interconnected 2-layer network with i inputs and j outputs (with i > j) can be used to extract the first j principal components from the input vector, thus reducing the size of the input vector by i - j elements. With j = 1, such a network acts as a maximum eigenfilter by extracting the first principal component from the input vector. We begin the discussion of the underlying technique with this simplest case.

Fig. 1: Hebbian-based maximum eigenfilter. The nth element $\mathbf{x}(n)$ of the input vector, consisting of m inputs $x_1(n), \dots, x_m(n)$, is transformed by the weight vector $\mathbf{w}(n)$ into the output $y(n)$. For clarity: m is the number of inputs, and n indexes the training samples.
The algorithm used to train the network is based on Hebb's
postulate of learning. It states that a synaptic weight $w_i$
varies with time, growing strong when the presynaptic signal $x_i$
and the postsynaptic signal $y$ coincide with each other (see Fig. 1).
Assuming that all nodes use linear activation functions and no
biases are applied, such that

$$y(n) = \sum_{i=1}^{m} w_i(n)\, x_i(n), \tag{1}$$

we may write

$$w_i(n+1) = w_i(n) + \eta\, y(n)\, x_i(n). \tag{2}$$
Here n denotes the index of the current training sample, and
$\eta$ the learning rate. The attentive reader will notice that the
unconstrained use of this learning algorithm would drive $w_i$
to infinity, because the magnitude of the weight vector can only
grow and is never decreased. To overcome this problem, some sort of
normalization or saturation factor needs to be introduced. A
proportional decrease of $w_i$ by a
normalization term introduces competition among the synapses,
which, as a principle of self-organization, is essential for the
stabilization of the learning process.
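The growth problem is easy to reproduce. The following is a minimal sketch, not the authors' implementation: it applies the unconstrained Hebbian update to Gaussian toy inputs and records the length of the weight vector after every step. The function names, the toy data, and the learning rate are all assumptions made for illustration.

```python
import random


def hebbian_step(w, x, eta):
    """One unconstrained Hebbian update: w_i <- w_i + eta * y * x_i."""
    y = sum(w_i * x_i for w_i, x_i in zip(w, x))  # linear output, no bias
    return [w_i + eta * y * x_i for w_i, x_i in zip(w, x)]


def weight_norm(w):
    """Euclidean length of the weight vector."""
    return sum(w_i * w_i for w_i in w) ** 0.5


rng = random.Random(0)          # fixed seed so the run is repeatable
w = [0.1, 0.1]                  # small positive initial weights
norms = [weight_norm(w)]
for _ in range(200):
    x = [rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)]  # hypothetical toy input
    w = hebbian_step(w, x, eta=0.05)
    norms.append(weight_norm(w))
```

The squared length changes by $2\eta\, y^2 + \eta^2 y^2 \lVert \mathbf{x} \rVert^2$ per step, which is never negative, so the recorded lengths in `norms` cannot shrink (up to rounding).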
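Including a normalization term in Eq. 2, we can rewrite the learning rule as

$$w_i(n+1) = \frac{w_i(n) + \eta\, y(n)\, x_i(n)}{\sqrt{\sum_{i=1}^{m} \left[ w_i(n) + \eta\, y(n)\, x_i(n) \right]^2}}, \tag{3}$$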
where the summation in the denominator covers all of the synapses
connected to the output neuron. For a low learning rate $\eta$, this
equation can, without much loss of precision, be simplified
to

$$w_i(n+1) = w_i(n) + \eta\, y(n)\, x_i(n) - \eta\, y^2(n)\, w_i(n). \tag{4}$$
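The simplification can be made explicit with a first-order expansion in $\eta$. The sketch below rests on two assumptions that are not spelled out in the text: the normalization divides the Hebbian update $w_i(n) + \eta\, y(n)\, x_i(n)$ by the Euclidean length of the updated weight vector, and the weight vector has unit length before the update, $\sum_i w_i^2(n) = 1$. It also uses $y(n) = \sum_i w_i(n)\, x_i(n)$.

$$
\begin{aligned}
\sum_{i} \left[ w_i(n) + \eta\, y(n)\, x_i(n) \right]^2
  &= 1 + 2\eta\, y^2(n) + O(\eta^2), \\
\left[ 1 + 2\eta\, y^2(n) + O(\eta^2) \right]^{-1/2}
  &= 1 - \eta\, y^2(n) + O(\eta^2), \\
w_i(n+1)
  &= \left[ w_i(n) + \eta\, y(n)\, x_i(n) \right] \left[ 1 - \eta\, y^2(n) \right] + O(\eta^2) \\
  &= w_i(n) + \eta\, y(n) \left[ x_i(n) - y(n)\, w_i(n) \right] + O(\eta^2).
\end{aligned}
$$

Dropping the terms of order $\eta^2$ leaves the simplified rule.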
The first two terms, $w_i(n) + \eta\, y(n)\, x_i(n)$, in
Eq. 4 represent the usual Hebbian modification to $w_i$.
They account for the self-amplification
effect responsible for the self-organizing nature of the described
infrastructure. The remaining negative term, $-\eta\, y^2(n)\, w_i(n)$,
prevents an unlimited growth of $w_i$ and
is responsible for stabilization. It transforms $x_i(n)$ into
a form¹ that is dependent on both the synaptic weight $w_i(n)$
and the output $y(n)$:

$$x_i'(n) = x_i(n) - y(n)\, w_i(n). \tag{5}$$

Substituting Eq. 5 into Eq. 4, we get a more general form of the learning rule:

$$w_i(n+1) = w_i(n) + \eta\, y(n)\, x_i'(n). \tag{6}$$
During the course of training, we present all n elements
of the input vector x to the small network. If the weights are
initialized to small positive values and $\eta$ is chosen
appropriately small, the algorithm converges quickly to its final
state. The estimated weight vector w can then be frozen by
suspending further adaptation. We may now feed the input vector
x into the network and get a transformed time series from the
output neuron: the first principal component of the input
vector.
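The training procedure just described can be sketched in a few lines. This is an illustration under stated assumptions rather than the original implementation: the update used is $w_i \leftarrow w_i + \eta\, y\, (x_i - y\, w_i)$, and the function names, the two-dimensional toy data, the learning rate, and the number of passes are hypothetical choices.

```python
import random


def train_max_eigenfilter(samples, eta=0.01, epochs=50):
    """Estimate the weight vector of a single linear output neuron."""
    w = [0.1] * len(samples[0])          # small positive initial weights
    for _ in range(epochs):
        for x in samples:
            y = sum(w_i * x_i for w_i, x_i in zip(w, x))   # forward pass
            # Hebbian term y * x_i, stabilized by the decay term y * y * w_i
            w = [w_i + eta * y * (x_i - y * w_i) for w_i, x_i in zip(w, x)]
    return w


def project(w, samples):
    """Feed the samples through the frozen filter."""
    return [sum(w_i * x_i for w_i, x_i in zip(w, x)) for x in samples]


# Hypothetical toy data drawn from a zero-mean distribution that is
# stretched along the direction (1, 1) / sqrt(2).
rng = random.Random(1)
samples = []
for _ in range(500):
    a = rng.gauss(0.0, 2.0)   # large spread along (1, 1)
    b = rng.gauss(0.0, 0.3)   # small spread along (1, -1)
    samples.append([(a + b) / 2 ** 0.5, (a - b) / 2 ** 0.5])

w = train_max_eigenfilter(samples)       # weights are frozen from here on
first_pc = project(w, samples)           # transformed series from the output
```

For this toy data the weight vector should settle close to the unit vector along (1, 1), the direction of largest variance. The sign of a principal component is arbitrary; starting from positive weights merely makes the positive orientation the likely outcome here.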
Combining several maximum eigenfilters as proposed above into one 2-layer network and implementing competition between the output nodes, we can now generate an infrastructure able to perform a complete Hebbian-based PCA (see Fig. 2).

Fig. 2: Principal Component Analysis (PCA) using a Hebbian 2-layer network. The input and output nodes use linear activation functions and are fully interconnected with Hebbian links. The unsupervised Hebbian learning algorithm extracts the first j principal components from input vector x using the weight vector w. The resulting output vector y may then be used as the input vector for a regular feedforward network. In our implementation, the PCA component and the regular computation component are combined as symbionts into one network.
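Now, the synaptic weight $w_{ji}$ is
adapted in accordance with a generalized form of Hebbian
learning:

$$\Delta w_{ji}(n) = \eta \left[ y_j(n)\, x_i(n) - y_j(n) \sum_{k=1}^{j} w_{ki}(n)\, y_k(n) \right]. \tag{7}$$

The adjusted weight $w_{ji}(n+1)$ is
calculated as

$$w_{ji}(n+1) = w_{ji}(n) + \Delta w_{ji}(n). \tag{8}$$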
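Eq. 7 may require some further
explanation. As shown in Fig. 2, the PCA
network now has i inputs and j outputs with i >
j. The output $y_j(n)$
is calculated during the forward pass as

$$y_j(n) = \sum_{i} w_{ji}(n)\, x_i(n) = \mathbf{w}_j^T(n)\, \mathbf{x}(n), \tag{9}$$

where the vector $\mathbf{w}_j$
consists of all the synaptic links
connected to neuron j. In contrast, the vector
represented by $w_{ki}$
in Eq. 7 is the vector of all synaptic links
from neuron i to all output neurons. Using the
notation $x_i'(n)$ as before, we can
rewrite Eq. 7 as

$$\Delta w_{ji}(n) = \eta\, y_j(n) \left[ x_i'(n) - w_{ji}(n)\, y_j(n) \right], \tag{10}$$

where $x_i'(n)$ again denotes a
modified version of the ith element of the input
vector $\mathbf{x}(n)$, which is now a function
of the index j:

$$x_i'(n) = x_i(n) - \sum_{k=1}^{j-1} w_{ki}(n)\, y_k(n). \tag{11}$$

Going one step further, we can define

$$x_i''(n) = x_i'(n) - w_{ji}(n)\, y_j(n) \tag{12}$$

and rewrite Eq. 7 in a form that corresponds to Hebb's postulate of learning:

$$\Delta w_{ji}(n) = \eta\, y_j(n)\, x_i''(n). \tag{13}$$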
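In contrast to the maximum eigenfilter, which requires a forward information flow for the backward adaptation, the concurrent implementation for the PCA network requires an additional backward information flow. The algorithm can be split into two parts:
1. The input vector x is passed forward through the network, and each output $y_j(n)$ is calculated according to Eq. 9.
2. The outputs are passed backward through the network, and the weight adjustment $\Delta w_{ji}(n)$ is calculated for the particular link.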
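The two passes can be sketched as follows. This is a minimal illustration under assumptions, not the authors' link-level implementation: the weights are held in a plain list of rows `W[j][i]`, the backward pass keeps one running residual per input that starts at $x_i$ and has $w_{ki}\, y_k$ subtracted for each output already processed, and the update applied is $\Delta w_{ji} = \eta\, y_j\, x_i''$. All names, the toy data, and the learning rate are hypothetical.

```python
import random


def gha_step(W, x, eta):
    """Run one forward and one backward pass; W[j][i] links input i to output j."""
    # Forward pass: y_j = sum_i w_ji * x_i
    y = [sum(w_ji * x_i for w_ji, x_i in zip(row, x)) for row in W]

    # Backward pass: residual[i] starts as x_i. After output j is processed it
    # holds x_i - sum_{k<=j} w_ki * y_k, which is x'_i for the next output.
    residual = list(x)
    for j, row in enumerate(W):
        for i in range(len(x)):
            x2 = residual[i] - row[i] * y[j]   # x''_i, computed with the old w_ji
            residual[i] = x2
            row[i] += eta * y[j] * x2          # delta_w_ji = eta * y_j * x''_i
    return y


# Hypothetical zero-mean toy data: three independent inputs whose standard
# deviations (2.0, 0.7, 0.2) make the coordinate axes the principal directions.
rng = random.Random(2)
samples = [
    [rng.gauss(0.0, 2.0), rng.gauss(0.0, 0.7), rng.gauss(0.0, 0.2)]
    for _ in range(500)
]

# Two outputs, three inputs, small positive initial weights.
W = [[rng.uniform(0.05, 0.15) for _ in range(3)] for _ in range(2)]
for _ in range(50):
    for x in samples:
        gha_step(W, x, eta=0.005)
```

With this toy data the first row of `W` should approach the first coordinate axis and the second row the second axis, each up to sign. Processing the outputs in a fixed order matters: every output only sees what the outputs before it have not already explained.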

The Hebbian-based maximum eigenfilter as well as the PCA network
require the input vector to be normalized to zero mean.
Additionally, we found a normalization to unit standard
deviation useful. The reduced output vector $\mathbf{y}$
inherits the zero-mean property. However, the standard deviation of
the elements of $\mathbf{y}$
varies.
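A possible preprocessing step is sketched below. It assumes the training data is a list of samples with one column per input, uses the population standard deviation (division by the number of samples), and maps a constant column to zeros; these are choices made for the example, and the function name is hypothetical.

```python
def standardize(samples):
    """Scale every input column to zero mean and unit standard deviation."""
    n = len(samples)
    m = len(samples[0])
    means = [sum(row[i] for row in samples) / n for i in range(m)]
    stds = [
        (sum((row[i] - means[i]) ** 2 for row in samples) / n) ** 0.5
        for i in range(m)
    ]
    # A constant column has no spread; map it to zeros instead of dividing by 0.
    return [
        [(row[i] - means[i]) / stds[i] if stds[i] > 0 else 0.0 for i in range(m)]
        for row in samples
    ]


data = [[1.0, 10.0], [2.0, 20.0], [3.0, 60.0]]   # hypothetical raw inputs
z = standardize(data)
```

The means and standard deviations estimated on the training data would also have to be applied to any data fed through the frozen filter later; the sketch leaves that bookkeeping out.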
In general, the algorithm converges very quickly. There is, however, one serious problem: errors in the input vector (e.g. a forgotten split adjustment for a particular price) may cause some weights to break through the constraint mechanism and grow to infinity, which in turn destroys the PCA filter. The implementation should throw an exception when a particular weight grows above an appropriate upper limit.
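Such a guard might look like the sketch below. The exception class, the function name, and the limit of 10.0 are assumptions for illustration; the text does not say what an appropriate upper limit is. Since the weight vectors of a converged filter have roughly unit length, any bound well above 1 would serve the same purpose.

```python
import math


class WeightLimitExceeded(ArithmeticError):
    """Raised when a synaptic weight leaves the permitted range (hypothetical name)."""


def check_weights(weights, limit=10.0):
    """Raise if any weight is NaN or larger in magnitude than `limit`."""
    for index, w in enumerate(weights):
        if math.isnan(w) or abs(w) > limit:
            raise WeightLimitExceeded(
                f"weight {index} has value {w!r}, outside the limit {limit}"
            )


check_weights([0.71, -0.70])        # well-behaved weights pass silently

try:
    check_weights([0.71, 1.0e6])    # a runaway weight trips the guard
    tripped = False
except WeightLimitExceeded:
    tripped = True
```

Calling the check after every weight update stops training at the first faulty sample, which makes the offending input much easier to locate than a filter that has silently diverged.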
¹ The form $x_i'(n)$ is introduced in
order to simplify the derivation of the more complex case of a
complete Hebbian-based PCA.
and
should be seen as variables. The proposed
algorithm acts on the link level and should be a method of the
object representing the synaptic link. The variable
is
a property of neuron i and is set to zero before the
feedback pass starts; the variable
is
temporary, introduced merely in order to avoid multiple
calculations of
.