"Constructive wavelet networks" are investigated as a universal tool for function approximation. The parameters of such networks are obtained via some "direct" Monte Carlo procedures. Approximation bounds are given. Typically, it is shown that such networks with one layer of "wavelons" achieve an L(2) error of order O(N(-(rho/d))), where N is the number of nodes, d is the problem dimension and rho is the number of summable derivatives of the approximated function. An algorithm is also proposed to estimate this approximation based on noisy input-output data observed from the function under consideration. Unlike neural network training, this estimation procedure does not rely on stochastic gradient type techniques such as the celebrated "backpropagation" and it completely avoids the problem of poor convergence or undesirable local minima.