.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C $Id: bm.l,v 1.2 92/11/30 11:53:42 drew Exp $
.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C
.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C   Copyright 1990,1991,1992,1993 by The University of Toronto,
.C		      Toronto, Ontario, Canada.
.C 
.C			 All Rights Reserved
.C 
.C Permission to use, copy, modify, distribute, and sell this software
.C and  its documentation  for any   purpose is hereby granted without
.C fee, provided that the above copyright notice appears in all copies
.C and  that both  the  copyright  notice  and this permission  notice
.C appear in supporting documentation, and that the name of University
.C of Toronto not  be used in  advertising or publicity  pertaining to
.C distribution of   the   software   without specific, written  prior
.C permission.  University  of Toronto makes no representations  about
.C the suitability of  this software for any  purpose.  It is provided
.C "as is" without express or implied warranty.
.C
.C UNIVERSITY OF TORONTO DISCLAIMS ALL WARRANTIES WITH REGARD  TO THIS
.C SOFTWARE, INCLUDING ALL  IMPLIED WARRANTIES  OF MERCHANTABILITY AND
.C FITNESS, IN NO EVENT SHALL UNIVERSITY  OF TORONTO BE LIABLE FOR ANY
.C SPECIAL,  INDIRECT  OR    CONSEQUENTIAL  DAMAGES OR     ANY DAMAGES
.C WHATSOEVER RESULTING FROM LOSS OF USE, DATA  OR PROFITS, WHETHER IN
.C AN ACTION OF CONTRACT, NEGLIGENCE OR OTHER TORTIOUS ACTION, ARISING
.C OUT  OF OR  IN CONNECTION  WITH  THE USE   OR  PERFORMANCE  OF THIS
.C SOFTWARE.
.C
.CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC
.C
.de BF          \" boldface a word
\fB\\$1\fP
..
.de FT          \" start a field table with title
.PP
.nf
.ta 2.5i 4.5i
.ce 1
\\$1
..
.de CF          \" center a function on a line
.sp
.ce 1
\\$1
.sp
..
.TH bm LOCAL "April 1992" "Xerion" "Xerion Manual"
.SH NAME
bm \- Xerion Boltzmann Machine module

.SH SYNOPSIS
.nf
bm  [ commands ]

run [ run options ] bm [ commands ]
.fi

.SH DESCRIPTION 

\fIbm\fP is a version of the Boltzmann Machine algorithm, built using
the Xerion Neural Network Simulator. As such, it understands all of
the commands that are built into the Xerion simulator.

The Boltzmann Machine (bm) is an adaptive, stochastic Hopfield net,
possibly with hidden units, that uses a Monte Carlo algorithm
(simulated annealing) to find low-energy configurations of active and
inactive units.  The following sections describe this implementation.
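.PP
For reference, with unit states s_i in {0,1} and symmetric weights
w_ij, the energy can be written as E = -sum over i<j of w_ij*s_i*s_j.
The C sketch below computes it for a dense weight matrix.  It is an
illustration only: the function name \fInetEnergy\fP, the fixed-size
matrix, the 0/1 state convention, and the treatment of biases as
weights from an always-on unit are assumptions, not the data
structures used in bm.c.
.nf
```c
#include <assert.h>

#define MAX_UNITS 64

/* Hypothetical helper: energy of binary (0/1) unit states s[0..n-1]
   under symmetric weights w, E = -sum over i<j of w[i][j]*s[i]*s[j].
   Biases are assumed to be weights from an always-on unit. */
double netEnergy(int n, double w[][MAX_UNITS], const int *s)
{
  double energy = 0.0;
  assert(n <= MAX_UNITS);
  for (int i = 0; i < n; i++)
    for (int j = i + 1; j < n; j++)
      energy -= w[i][j] * s[i] * s[j];
  return energy;
}
```
.fi
For two units joined by a weight of 2, turning both on gives E = -2
instead of 0, which is why a positive weight favors co-activation.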

For some background and references on the Boltzmann training and
relaxation algorithms, see the LaTeX documentation in the
$XERIONDIR/doc/*.tex files.

.SH ERROR AND GRADIENT UPDATE ALGORITHMS

Below is a greatly reduced and simplified pseudocode representation of
the routines that calculate the error of the network on an example
set and the derivatives of this error with respect to the connection
weights.  Where possible, actual variable and function names are
used.  Most of the support functions (setOutput, clampOutput,
unitCostUpdate, etc.) are defined in the bm.c file.

.nf
/***********************************************************
 ********  Update net error and associated derivs   ********
 ***********************************************************/
int             errorDerivUpdate(net, exampleSet)
  Net           net ;
  ExampleSet    exampleSet ;
{
  /* Initialize vars:           */ 
  net->error = 0;                /*  etc.  */

  /* Go through all training cases (examples).  For each one,
     do a positive phase annealing, gather stats ("correlations",
     i.e., products), a negative phase annealing, gather stats,
     and then update gradients and cost. */

  for numExamples = 0 to net->batchSize - 1 { 
    getNextExample(exampleSet) ;
    for relaxPhase = 0 to numRelaxations-1 by 1 {

      /* Positive phase */
    
      temp = tMax ;
      for all input, output units  { clampOutput } ;
      /* Relax net, using simulated annealing.  */ 
      while (temp > tMin) {
        for all hidden units { updateActivity };
        temp = temp * tDecay;
      }

      /* Positive Phase Stats */
      /* Calc  <Si*Sj>+ */
      temp = tMin;
      for (sweep = 0; sweep < numSamplingSweeps ; sweep++) {
        for all hidden units { updateActivity } ;
        for all units { setIncomingProducts, POSITIVE } ; 
      }

      /* Negative phase */
 
      temp = tMax ;
      for all input units  { clampOutput } ;
      /* Relax net, using simulated annealing. */ 
      while (temp > tMin) {
        for all hidden and output units { updateActivity };
        temp = temp * tDecay;
      }

      /* Negative Phase Stats */
      /* Calc  <Si*Sj>-   */
      temp = tMin;
      for (sweep = 0; sweep < numSamplingSweeps ; sweep++) {
        for all hidden and output units { updateActivity } ;
        for all units { setIncomingProducts, NEGATIVE } ;
      }
    }

    /* gradient update */
    updateNetGradients(net) ;

    /* update the error for the net */
    updateNetActivities(net) ;
  }

  /* update cost of weights after all examples done.  */
  for all units { updateCost } ;

  return 0 ;
}
/***********************************************************/

/***********************************************************
 *****               Update net error only           *******
 ***********************************************************/
int             errorUpdate(net, exampleSet)
  Net           net ;
  ExampleSet    exampleSet ;
{
  net->error = 0.0 ;
  for numExamples = 0 to net->batchSize - 1 { 
    getNextExample(exampleSet) ;
    /* Similar to negative phase in training.  Relax net with 
       only the input units clamped. */

    temp = tMax ;
    for all input units  { clampOutput } ;
    /* Relax net, using simulated annealing.  */
    while (temp > tMin) {
      for all hidden and output units { updateActivity };
      temp = temp * tDecay;
    }

    /* Update net error: for each output unit, the squared
       difference between the actual and desired ("target")
       output, summed over output units and over cases. */
    for all output units { unitUpdateError } ;
  }

  /* update cost of weights after all examples done.  */
  for all units { updateCost } ;

  return 0 ;
}
/***********************************************************/
.fi

.SH DETAILS ON IMPORTANT SECTIONS AND FEATURES

This section describes in detail some important points of the above
procedures.

.SS Sampling the (Si,Sj) correlations:
The gradient of the true error function with respect to the weights is
based on [<Si*Sj>+ - <Si*Sj>- ], the difference between the unit
activation ("spin", in statistical mechanics spin glass jargon)
correlations, or co-occurrences, in the +phase (clamped) and the
-phase (unclamped).  Because the network activations are stochastic,
these measurements are noisy, so each correlation must be estimated by
averaging over many samples.
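.PP
As an illustration of the statistics involved, the C sketch below
keeps running sums of the products Si*Sj over the samples of one
phase, averages them, and forms a derivative for each weight from the
plus-phase and minus-phase averages.  The \fICorrStats\fP type, the
function names, and the sign convention (derivative = <Si*Sj>- minus
<Si*Sj>+, so that a minimizer raises weights between units that
co-occur more in the clamped phase) are assumptions; the actual code
may, for example, also scale the difference by the temperature.
.nf
```c
#include <assert.h>
#include <string.h>

#define MAX_UNITS 16

/* Hypothetical accumulator for <Si*Sj> in one phase. */
typedef struct {
  int numSamples;
  double sum[MAX_UNITS][MAX_UNITS];
} CorrStats;

void corrReset(CorrStats *c)
{
  memset(c, 0, sizeof *c);
}

/* Add one sample: the current binary states s[0..n-1]. */
void corrAddSample(CorrStats *c, int n, const int *s)
{
  assert(n <= MAX_UNITS);
  for (int i = 0; i < n; i++)
    for (int j = 0; j < n; j++)
      c->sum[i][j] += s[i] * s[j];
  c->numSamples++;
}

/* Average product <Si*Sj> over the samples collected so far. */
double corrMean(const CorrStats *c, int i, int j)
{
  assert(c->numSamples > 0);
  return c->sum[i][j] / c->numSamples;
}

/* Derivative of the minimized objective with respect to w[i][j],
   under the assumed convention <Si*Sj>- minus <Si*Sj>+. */
double weightDeriv(const CorrStats *plus, const CorrStats *minus,
                   int i, int j)
{
  return corrMean(minus, i, j) - corrMean(plus, i, j);
}
```
.fi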

The sampling is governed in our algorithm by two parameters:
\fInumRelaxations\fP and \fInumSamplingSweeps\fP.  They control an
outer and an inner loop, respectively: the number of network
relaxations (annealing runs) in each "plus" and "minus" phase, and the
number of sampling sweeps (at \fItemp\fP = \fItMin\fP, at
"equilibrium") per relaxation.  A large value for
\fInumRelaxations\fP means that many different local energy minima (at
low-temperature equilibrium) are sampled over many relaxations,
whereas a large value for \fInumSamplingSweeps\fP means that many
samples are taken around each of relatively few (perhaps identical or
very similar) local minima.  Sampling, and hence learning, is best
when both values are large, but then each training epoch becomes
intolerably slow.
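.PP
The cost of this trade-off can be estimated by counting sweeps through
the net per training example.  The C sketch below assumes, following
the pseudocode above, that each of the \fInumRelaxations\fP
relaxations in each of the two phases consists of \fIannealSteps\fP
annealing sweeps followed by \fInumSamplingSweeps\fP sampling sweeps;
the function names and the \fIannealSteps\fP input are hypothetical,
and the \fInoAnnealInPosPhase\fP shortcut is ignored.
.nf
```c
#include <assert.h>

/* Number of <Si*Sj> samples gathered per phase for one example. */
long samplesPerPhase(int numRelaxations, int numSamplingSweeps)
{
  assert(numRelaxations >= 0 && numSamplingSweeps >= 0);
  return (long) numRelaxations * numSamplingSweeps;
}

/* Sweeps through the net for one example: two phases, each with
   numRelaxations relaxations of (annealing + sampling) sweeps. */
long sweepsPerExample(int annealSteps, int numRelaxations,
                      int numSamplingSweeps)
{
  assert(annealSteps >= 0);
  return 2L * numRelaxations * ((long) annealSteps + numSamplingSweeps);
}
```
.fi
With 10 annealing sweeps, 5 relaxations and 20 sampling sweeps, each
phase yields 100 samples at a cost of 300 sweeps per example.
Doubling \fInumSamplingSweeps\fP gives 200 samples for 500 sweeps,
while doubling \fInumRelaxations\fP gives 200 samples for 600 sweeps,
because it also doubles the annealing overhead.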

.SS Updating unit activations (outputs) during relaxation:

Updating a unit's activation in the BM consists of feeding the
weighted sum of inputs from connected units into a stochastic "move
generator" function.  The user may choose between the "Metropolis" and
"heatbath" stochastic move generation (state change) procedures.

.IP "\fIMetropolis\fP" 1i 
If changing the current state reduces the energy, do it; else do it
anyway, with a probability of exp(delta_E/T), where delta_E is
E[current state] - E[other state].

.IP "\fIHeatbath\fP" 1i
Set the unit activation = 1 with probability 1/(1+exp(delta_E/T)),
where delta_E is E[unit on] - E[unit off].

.PP
Unit activation updates during network relaxation may be done
synchronously or asynchronously.  In synchronous updating, each unit
at time step t computes its new activation level using the weighted
sum of inputs from connected units from time step t-1.  In other
words, all units are updated "simultaneously".  

In asynchronous updating, each unit's weighted sum is computed with
the activation/output values of connected units "as they are", i.e.,
some of those units may already have been updated during this time
step t, while others may not yet have been.

Synchronous updating is implemented via an extra field in the Unit
extension structure: "\fIReal old\fP", that holds the "old" activation
value from step t-1 of the unit.  This old value is then used in
computing the weighted sum of inputs to a connected unit, instead of
using the current output value of the unit.  Asynchronous updating,
with random-order network traversal (default), is *strongly*
recommended.
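.PP
The difference between the two schemes can be seen in a small C
sketch.  To keep it deterministic, it uses a hypothetical threshold
rule (a unit turns on when its net input is non-negative) in place of
the stochastic Metropolis or heatbath rule, a dense weight matrix, and
hypothetical function names.  The synchronous sweep first copies all
states into an \fIold\fP array, playing the role of the \fIReal old\fP
field, and the random visiting order for asynchronous sweeps comes
from a Fisher-Yates shuffle.
.nf
```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_UNITS 64

/* Hypothetical deterministic stand-in for the stochastic update rule:
   the unit turns on when its net input is non-negative. */
static int thresholdRule(double netInput)
{
  return netInput >= 0.0 ? 1 : 0;
}

/* Weighted sum of inputs to unit i, read from the state vector s. */
static double netInputTo(int n, double w[][MAX_UNITS], const int *s, int i)
{
  double sum = 0.0;
  for (int j = 0; j < n; j++)
    if (j != i)
      sum += w[i][j] * s[j];
  return sum;
}

/* Synchronous sweep: every unit reads the states from step t-1,
   kept in old[] (the role of the "Real old" field). */
void synchronousSweep(int n, double w[][MAX_UNITS], int *s)
{
  int old[MAX_UNITS];
  assert(n <= MAX_UNITS);
  memcpy(old, s, n * sizeof *s);
  for (int i = 0; i < n; i++)
    s[i] = thresholdRule(netInputTo(n, w, old, i));
}

/* Asynchronous sweep: units are visited in the given order, and each
   one sees the states of units already updated in this sweep. */
void asynchronousSweep(int n, double w[][MAX_UNITS], int *s,
                       const int *order)
{
  for (int k = 0; k < n; k++) {
    int i = order[k];
    s[i] = thresholdRule(netInputTo(n, w, s, i));
  }
}

/* Fisher-Yates shuffle: a random visiting order for asynchronous sweeps. */
void randomOrder(int n, int *order)
{
  for (int i = 0; i < n; i++)
    order[i] = i;
  for (int i = n - 1; i > 0; i--) {
    int j = rand() % (i + 1);
    int tmp = order[i];
    order[i] = order[j];
    order[j] = tmp;
  }
}
```
.fi
With two units joined by a weight of -1 and both initially on, a
synchronous sweep turns both off, while an asynchronous sweep in the
order 0, 1 leaves unit 0 off and unit 1 on, because unit 1 already
sees unit 0's new state.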

.SS Updating gradients and network cost:

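The cost update procedure can limit the size of the weights by
imposing a "weight cost" (also known as "weight decay"): for each
link, a penalty proportional to the squared weight is added to the
network cost, and its derivative is added to the link's derivative: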

.nf
For each Link \fIlink\fP:
    unit->net->cost  += weightCost*square(link->weight) ;
    link->deriv      += 2.0*weightCost*link->weight ;
.fi
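.PP
A self-contained C rendering of the loop above, as a sketch: the
\fILink\fP struct here is a stand-in holding only the two fields the
loop touches, and the function returns the cost term rather than
adding it to unit->net->cost, so both the type and the function name
\fIapplyWeightCost\fP are assumptions rather than bm.c definitions.
.nf
```c
#include <assert.h>

/* Stand-in for the simulator's link type: only the fields used here. */
typedef struct {
  double weight;
  double deriv;
} Link;

/* Adds the weight-cost gradient 2*weightCost*w to each link's deriv
   and returns the total cost term, sum of weightCost*w*w. */
double applyWeightCost(Link *links, int numLinks, double weightCost)
{
  double cost = 0.0;
  assert(weightCost >= 0.0);
  for (int i = 0; i < numLinks; i++) {
    cost += weightCost * links[i].weight * links[i].weight;
    links[i].deriv += 2.0 * weightCost * links[i].weight;
  }
  return cost;
}
```
.fi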

It is strongly recommended that the weights be changed using the
momentum update of the \fIminimize\fP command (minimize -momentum).
Steepest descent (minimize -steepest) may also be used.  We do not
recommend using the Conjugate Gradient and Line Search options in the
minimization package for the Boltzmann Machine, because the network
error function calculated in our implementation is *not* the same
error function whose gradient is computed.  The latter, a divergence
measure between probability distributions, is too expensive to
compute efficiently.  (In mft, fem, and bp, the error E that is
computed is the same function whose gradient dE/dW is used, so line
search procedures can be quite effective.)  See the \fIminimize\fP man
(sman) page and online help for further details.
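.PP
For readers unfamiliar with it, the momentum method in its common
textbook form computes each step as delta(t) = -epsilon*dE/dw +
alpha*delta(t-1) and then adds delta(t) to the weight.  The C sketch
below implements that textbook rule; the names \fIepsilon\fP
(learning rate), \fIalpha\fP (momentum) and \fImomentumStep\fP are
illustrative assumptions, and this is not the code behind
\fIminimize -momentum\fP, whose actual parameters are described on its
own man page.
.nf
```c
#include <assert.h>

/* Hypothetical textbook momentum step (not Xerion's minimize code).
   deriv[i] is dE/dw[i]; delta[i] holds the previous step and is
   overwritten with the new one. */
void momentumStep(int n, double *w, const double *deriv, double *delta,
                  double epsilon, double alpha)
{
  assert(epsilon > 0.0 && alpha >= 0.0 && alpha < 1.0);
  for (int i = 0; i < n; i++) {
    delta[i] = -epsilon * deriv[i] + alpha * delta[i];
    w[i] += delta[i];
  }
}
```
.fi
Two steps with a constant derivative of 1, epsilon = 0.5 and alpha =
0.5 change the weight by -0.5 and then by -0.75, showing how momentum
lengthens steps along a consistent gradient direction.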

.SH NET PARAMETERS AND "GLOBAL" NETWORK VARIABLES

The net parameters that govern the training, testing, and relaxation
dynamics of the network may be set in a *.in file (as part of setting
up a net and example sets), in the Xerion "Network Parameters" window,
or in the Xerion main command window.  There are also a few variables
that *report* on relaxation, training, and testing but cannot be set
by the user.  The settable network parameters are marked with the
/* netParam: */ comment.  (All of these variables are defined in the
bm.h file in the bm directory.)

.IP "\fIint  batchSize ;            /* netParam: */\fP" 1i
The number of examples to process during each batch of training. If
this value is set to 1, the weights are updated after every example
(online training). If set to 0, all the examples in the training set
are processed before the weights are updated (batch training). Any
other positive number can be used for "semi-batch" learning.

.IP "\fIReal  weightCost ;            /* netParam: */\fP" 1i
Cost associated with the magnitude of the weights.  Limiting the
absolute magnitude of the weights in this way can sometimes improve
the trained net's "generalization" capabilities.  See the section on
updating gradients and network cost above.

.IP "\fIReal  zeroErrorRadius ;       /* netParam: */\fP" 1i
Interval of acceptance for agreement between the actual output and the
desired (target) output of a unit: the degree to which a "near miss"
counts as a "hit".

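.PP
One plausible way such a radius enters a squared-error measure like
the one in the pseudocode (squared difference between actual and
target output, summed over output units) is sketched in C below:
differences no larger than \fIzeroErrorRadius\fP count as zero.
Whether bm.c zeroes only the near misses, as here, or also shrinks
larger differences by the radius is not stated on this page, so the
rule and the function name \fIoutputError\fP are assumptions.
.nf
```c
#include <assert.h>

/* Hypothetical squared error over n output units with a zero-error
   radius: an output within zeroErrorRadius of its target adds nothing. */
double outputError(int n, const double *actual, const double *target,
                   double zeroErrorRadius)
{
  double error = 0.0;
  assert(zeroErrorRadius >= 0.0);
  for (int i = 0; i < n; i++) {
    double diff = actual[i] - target[i];
    if (diff > zeroErrorRadius || diff < -zeroErrorRadius)
      error += diff * diff;
  }
  return error;
}
```
.fi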
.IP "\fIReal  tMax ;                  /* netParam: */\fP" 1i
Maximum temperature in annealing.  Usually something between 5
and 30 is considered reasonable.

.IP "\fIReal  tMin ;                  /* netParam: */\fP" 1i
Minimum temperature in annealing.  Typically set to 1.0.

.IP "\fIReal  tDecay ;                /* netParam: */\fP" 1i
Factor by which to lower temperature at each annealing step.
Something between .80 (fast annealing) and .99 (very slow annealing)
is typical.
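.PP
The length of the annealing schedule follows directly from these
three parameters.  The C sketch below counts the sweeps performed by
the "while (temp > tMin) { ...; temp = temp * tDecay; }" loop in the
pseudocode above; the function name is hypothetical.
.nf
```c
#include <assert.h>

/* Number of annealing sweeps for the schedule: start at tMax and
   multiply by tDecay after each sweep while temp > tMin. */
int annealingSteps(double tMax, double tMin, double tDecay)
{
  int steps = 0;
  assert(tMin > 0.0 && tDecay > 0.0 && tDecay < 1.0);
  for (double temp = tMax; temp > tMin; temp *= tDecay)
    steps++;
  return steps;
}
```
.fi
For example, with \fItMax\fP = 20 and \fItMin\fP = 1, a decay factor
of 0.8 gives 14 annealing sweeps per relaxation, while 0.99 gives
roughly 300.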

.IP "\fIReal  temperature ;\fP" 1i
Current network temperature in annealing process.

.IP "\fIint   annealMethod ;          /* netParam: */\fP" 1i
Whether to use Metropolis (set to 0) or Heatbath (set to 1) stochastic
state change method.

.IP "\fIint   numSamplingSweeps ;     /* netParam: */\fP" 1i
Number of sweeps through the network at low-temperature "equilibrium"
to gather the <Si*Sj> samples.  See further explanation above.

.IP "\fIint   numRelaxations ;        /* netParam: */\fP" 1i
Number of times to relax the net in the positive and negative phases
for each training example.  See further explanation above.

.IP "\fIint   synchronousUpdate ;     /* netParam: */\fP" 1i
Whether to update the units synchronously (set to 1) or
asynchronously (set to 0) during relaxation.  Asynchronous updating is
considered much better in the vast majority of cases.  Note also that
the units are traversed in random order in the asynchronous case, and
in "standard" (row-major) order in the synchronous case.

.IP "\fIint   noAnnealInPosPhase ;     /* netParam: */\fP" 1i
Whether to do a simple 1-step relaxation in the positive (clamped)
phase of learning, instead of a full annealing.  If the net has only
one hidden layer and no connections between hidden units, annealing
is unnecessary, because all of the units to which each hidden unit is
connected are clamped and remain unchanged throughout the relaxation.
Setting this parameter to 1 when appropriate can cut the training time
roughly in half.

.IP "\fIint   delayCount  ;           /* netParam: */\fP" 1i
Degree of relaxation slowdown used for viewing the details of network
relaxation.  Basically, a trivial loop from 1 to delayCount*1000 is
used between network relaxation sweeps.  This may be used while
viewing the "testing" of cases (e.g. "clicking" on them in the
"Activations" window) and only while the "inRelaxation" updating
option is turned on in the Activations window. When viewing, try
setting the var to 100 or 1000; performance and ease of viewing will
vary depending on your machine's speed ("raw" CPU speed, memory access
speed, screen update speed, etc.), of course.

.IP "\fIReal  relaxSweepCountAve ;\fP" 1i
The *average* number of relaxation sweeps per example per training loop.

.SH FILES
.nf
.ta 3.65i
$XERIONDIR/src/sim/bm/bm.[ch]	Source code for the simulator
$XERIONDIR/src/sim/bm/bm-train.[ch]	Source code for the simulator
$XERIONDIR/nets/bm/*.in	Input files for sample nets
$XERIONDIR/nets/bm/*.layout	Layout files for sample nets
$XERIONDIR/nets/bm/*.ex	Example sets for sample nets
$XERIONDIR/config/bmrc	Initialization file for bm
.fi

.SH SEE ALSO 
minimize(1XERION), run(LOCAL), mft(LOCAL), fem(LOCAL), bp(LOCAL),
hcl(LOCAL), kcl(LOCAL), scl(LOCAL)

For general information on the Xerion simulator, its user interface,
and implementation and portability issues, see the appropriate man
(sman) page, README file, or, once inside Xerion, the online help
page.

For some background and references on the BP, MFT, Boltzmann, and FEM
training and relaxation algorithms, see the LaTeX documentation in the
$XERIONDIR/doc/*.tex files.

.SH AUTHOR
.nf
Evan W. Steeg (steeg@ai.toronto.edu)
Dept. of Computer Science
University of Toronto,
Toronto, ON, Canada
.fi
