SUMMARY The seismic waveform inversion problem is usually cast in the framework of Bayesian statistics, in which prior information on the model parameters is combined with the data and the physics of the forward problem to estimate the a posteriori probability density (PPD) in model space. The PPD is a function of an objective or fitness function computed from the observed and synthetic data. In general, the PPD or the fitness function is multimodal and its shape is unknown. Global optimization methods such as simulated annealing (SA) and genetic algorithms (GA) do not require that the shape of the fitness function be known. In this paper, we investigate GAs as a means to rapidly sample the most significant portion or portions of the PPD when very little prior information is available.

First, we use a simple three-operator (selection, crossover and mutation) GA acting on a randomly chosen finite population of haploid, binary-coded models. We use plane-wave-transformed synthetic seismic data and a normalized cross-correlation function E(m) in the frequency domain as the fitness function. A moderate crossover probability, a low mutation probability, a high update probability and a suitable population size are required to converge very close to the global maximum of the fitness function.

Next, in an attempt to accelerate convergence, we show that concepts from simulated annealing can be used to stretch the fitness function: we use exp[E(m)/T] rather than E(m) as the fitness function, where T is a control parameter analogous to temperature in simulated annealing. By a schemata analysis, we show that at low temperatures, schemata with above-average fitness are reproduced in large numbers, causing much more rapid convergence of the algorithm. A high temperature T assigns nearly equal selection probability to most of the schemata and thus retains diversity among the members of the population.
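The effect of the exp[E(m)/T] stretching on selection can be illustrated with a minimal sketch; the fitness values below are a toy example standing in for cross-correlation values E(m), and all names are illustrative rather than taken from the paper:

```python
import math

def selection_probs(fitnesses, T):
    """Selection probabilities under the stretched fitness exp[E(m)/T].

    A low temperature T amplifies fitness differences (near-greedy selection);
    a high T flattens them, keeping diversity in the population.
    """
    w = [math.exp(f / T) for f in fitnesses]
    total = sum(w)
    return [x / total for x in w]

E = [0.9, 0.8, 0.5]                    # toy fitness values for three models
cold = selection_probs(E, 0.05)        # low T: the best model dominates
hot = selection_probs(E, 10.0)         # high T: nearly uniform probabilities
```

A step-function cooling schedule then amounts to evaluating `selection_probs` with a high T for the first generations and a very low T thereafter.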
Thus a GA with a step-function cooling schedule (a very high temperature at the beginning, followed by rapid cooling to a very low temperature) improves performance dramatically: high fitness values are obtained rapidly using only half as many models as a conventional GA would require. Similar performance can also be achieved by starting with a high mutation probability and then decreasing it to a very low value, while retaining the same low temperature throughout.

We also address the problem of ‘genetic drift’, which causes finite-population GAs to converge to one peak or another when the algorithm is applied to a highly multimodal fitness function with several peaks of nearly the same height. A parallel genetic algorithm based on the concept of ‘punctuated equilibria’ is implemented to circumvent this problem. We run several GAs, each with a finite subpopulation, in parallel, and collect many good models from each of these runs. These are then used to characterize the most significant portion(s) of the PPD in model space. We then compute the weighted mean model and use the collected good models to estimate uncertainty in the derived model parameters.
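The parallel scheme of isolated subpopulations pooled into a weighted mean model can be sketched as follows. This is a toy illustration only: the bit-counting fitness stands in for the waveform cross-correlation E(m), and the operator probabilities, population sizes and helper names are illustrative assumptions, not the paper's settings:

```python
import random

def run_subpopulation_ga(fitness, n_bits, pop_size, n_gen, seed):
    """Minimal three-operator GA (selection, one-point crossover, mutation)
    on haploid binary-coded models; returns the best model of each generation."""
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    best = []
    for _ in range(n_gen):
        ranked = sorted(pop, key=fitness, reverse=True)
        best.append(ranked[0])
        parents = ranked[: pop_size // 2]          # truncation selection
        children = []
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randrange(1, n_bits)         # one-point crossover
            child = a[:cut] + b[cut:]
            child = [bit ^ (rng.random() < 0.01) for bit in child]  # mutation
            children.append(child)
        pop = children
    return best

# Toy fitness: fraction of ones (a stand-in for E(m))
fitness = lambda m: sum(m) / len(m)

# 'Punctuated equilibria' flavour: several isolated subpopulations evolved
# in parallel, pooling their good models to map the high-fitness region(s)
good = [m for s in range(4) for m in run_subpopulation_ga(fitness, 16, 20, 30, s)]

# Fitness-weighted mean model from the pooled good models
w = [fitness(m) for m in good]
mean_model = [sum(wi * m[i] for wi, m in zip(w, good)) / sum(w)
              for i in range(16)]
```

The spread of the pooled `good` models about `mean_model` can then serve as a rough uncertainty estimate on each model parameter.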