I think that I should begin with my first use of probability theory, which was in a paper with Dorothy Wrinch in the Phil. Mag. for 1919. My interest started through Dr E. P. Farrow, a plant ecologist, who introduced me to Karl Pearson's Grammar of Science, still the best general work on scientific method, though most philosophers of science appear not to have heard of it. Anyhow, it convinced me that scientific method is neither deduction from a set of axioms nor a way of making plausible guesses, as Bertrand Russell said, but that it is a matter of successive approximation to probability distributions.

The fundamental rules were laid down by Bayes and Laplace in the eighteenth century. They led to a result for ordinary sampling: that if a test of a property φ is applied l + m times and succeeds l and fails m times, the probability of a φ at the next trial is (l + 1)/(l + m + 2). If l is large and m is 0, this approaches 1, and for a long time the argument was held to be a justification of induction. C. D. Broad, however, pointed out in 1918 that the same argument leads to the conclusion that if l φ's are found in l trials, and the class sampled is of number n, the probability that all the n are φ's is (l + 1)/(n + 1); if the class is numerous we shall never attach a high probability to the proposition that all the members are φ's until we have examined nearly every member. This is completely contrary to scientific practice; a scientist may formulate a general rule on 10 instances and expect it to hold for 100 or 1,000.

The trouble is seen if we go back to the start. The Laplace rule said that the possible numbers 0, 1, ..., n of φ's in the class are initially equally likely; that is, before we have made any tests at all, the probability that every member has the property is 1/(n + 1). It therefore expresses a violent prejudice against any general law, a totally unacceptable description of the scientific attitude. Wrinch and I therefore said that Laplace's rule should be modified to avoid this; we suggested taking probability 1/4 that all members are φ's, 1/4 that all are not-φ's, and distributing the remaining 1/2 uniformly. The result is then that if the first l are all φ's, the ratio of the probabilities that all or not all the n are φ's is near (l + 1)/2, which is satisfactory.

In the same paper we considered the case where all values of r, the number of φ's in the class, are possible, the initial probabilities differing smoothly. We showed that if n is large the posterior probabilities are nearly in the ratios of the direct probabilities, that is, those of the sample given r. This was in fact the method of maximum likelihood, first given that name by Fisher a few years later. We did not think it at all remarkable at the time, thinking that all statisticians used it already. I was later astonished to find what horrors many of them used, and indeed still use.

In this early paper we were already distinguishing between what I call estimation and significance tests. As I now state it, an estimation problem is one where a parameter in a law is capable of a range of values, with no special need to select one. This is covered by the method of maximum likelihood with extensions. One of significance is where we consider a change in