One way to do unsupervised machine learning is to build a probabilistic generative model of the sensory data. Such a model can be used to predict future data, infer missing features, communicate the data efficiently, and detect novelty. While many algorithms have been proposed for fitting the parameters of these generative models, much less emphasis has been placed on learning their structure. Some examples of 'model structure' are: the number of clusters in the data, the intrinsic dimensionality of the data, the number of hidden variables, and the conditional independence relationships between different variables. I will describe a Bayesian approach to learning the structure of probabilistic generative models. This approach resolves the tension between fitting the data well and keeping the models simple by averaging over all possible settings of the model parameters. Unfortunately, for most non-trivial models these averages are intractable, resulting in the need for approximations. I will present variational approximations, which are deterministic, generally fast, and have an objective function that is guaranteed to increase monotonically. I provide some theoretical results showing how variational optimisation generalises both the EM algorithm and belief propagation algorithms. This approach can be used (1) to automatically infer the most probable number of clusters and the intrinsic latent-space dimensionality of each cluster, (2) to find the number of sources in a blind source separation (ICA) problem, (3) to infer the dimensionality of the state space of a linear dynamical system, and (4) to optimise over the structure of a mixture-of-experts network.
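
The monotonically increasing objective mentioned above can be made concrete with a small sketch (my illustration, not from the abstract): EM for a two-component 1-D Gaussian mixture, viewed as coordinate ascent on a variational lower bound. The E-step sets q(z) to the exact posterior, which makes the bound tight, so recording the log-likelihood after each E-step traces a non-decreasing objective. The data and initial parameters here are arbitrary choices for the demonstration.

```python
import math
import random

random.seed(0)
# Synthetic data: two well-separated Gaussian clusters
data = [random.gauss(-2.0, 1.0) for _ in range(200)] + \
       [random.gauss(3.0, 1.0) for _ in range(200)]

def norm_pdf(x, mu, var):
    """Density of a 1-D Gaussian with mean mu and variance var."""
    return math.exp(-(x - mu) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

# Initial mixing weights, means, and variances (arbitrary starting point)
pi, mu, var = [0.5, 0.5], [-1.0, 1.0], [1.0, 1.0]
bounds = []
for _ in range(30):
    # E-step: responsibilities r[n][k] = q(z_n = k), the exact posterior
    r = []
    ll = 0.0
    for x in data:
        p = [pi[k] * norm_pdf(x, mu[k], var[k]) for k in range(2)]
        s = sum(p)
        ll += math.log(s)
        r.append([pk / s for pk in p])
    # After the E-step the variational bound equals the log-likelihood
    bounds.append(ll)
    # M-step: maximise the bound over the parameters
    for k in range(2):
        nk = sum(rn[k] for rn in r)
        pi[k] = nk / len(data)
        mu[k] = sum(rn[k] * x for rn, x in zip(r, data)) / nk
        var[k] = sum(rn[k] * (x - mu[k]) ** 2 for rn, x in zip(r, data)) / nk

# The objective never decreases across iterations
assert all(b2 >= b1 - 1e-9 for b1, b2 in zip(bounds, bounds[1:]))
```

The Bayesian structure-learning approach described in the abstract goes further than this maximum-likelihood sketch, averaging over parameters rather than optimising them, but the coordinate-ascent view of the bound is the same mechanism.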