Relying on basic physics, laboratory and field evidence, and accounting for the different timescales involved, we derive an approximate energy function for earthquakes. The fracture term is disregarded because virtually all earthquakes occur on pre‐existing faults and because fault gouge has a very low fracture energy. Disregarding also the gravitational term, the importance of which depends on the type of seismic focal mechanism, yields that the energy function has only thermal and radiative terms. The self‐similarity ranges in the bulk rocks and gouge suggest taking the basic element as two cubes of 10 m side, with in common one face, over which slip occurs. An earthquake is a cascade of such slip processes on a series of neighbouring elements and the volume involved can be self‐similarly defined as the two megacubes embedding the set of elements participating in the slip. Dimensional analysis suggests that the slip process is composed of three stages. The first stage consists of a stick‐slip episode with an average velocity v∼ 10−1 m s−1 over a time t∼ 10−2 s. In this first stage, stage I, virtually all energy is converted into heat, with a temperature increase of the order of 102 K on the sliding surface. The second stage, stage II, in which the occurrence of further slip episodes is hardly relevant, consists of the propagation of the thermal wave generated in stage I to the whole shear zone, with a characteristic time of the order of 102 s. In light of the comparatively low permeability of fault gouge with respect to heat diffusivity, this temperature increase induces a pore fluid pressure increase. When the transition from hydrostatic to lithostatic pressure is completed over the whole shear zone, a third stage, stage III, is entered, in which, provided that the pressurization is maintained, virtually frictionless high‐velocity slip occurs, converting all the available energy into elastic radiation. The duration of this purely radiative stage, the amount of slip and the size of the earthquake depend on the number of elements cooperatively participating in the cascade slip, which is ruled, just as in the usual single‐stage cellular automata models, by the correlation length over which the strain level is near the rupture threshold. At odds with the classical single‐stage cellular automata, the model does not require the introduction of strong jumps in stress to be ignited and appears thus also capable of explaining the Coulomb failure stress quandary of very small triggering stresses, with the ignition of large events requiring excess stresses of just 10−2 to 10−3 MPa. The global seismic efficiency, under the assumption that granular effects and viscous resistance are disregarded, is ∼ 1. Assuming then statistical self‐similarity on the fault plane for the patches that slip cooperatively and approximating their pattern as a Sierpinski carpet, yields partial and approximately constant stress drop values independent of event size. These emerge from the fractal nature of the slip surface interpreted according to the seismological assumption of constant homogeneous slip on the Euclidean rectangular plane fault, which embeds the slipping patches.