In this paper, we study how to take samples at a data source so as to improve the freshness of the data samples received at a remote receiver. We use non-linear functions of the age of information to measure data freshness, and provide a survey of non-linear age functions and their applications. We study the sampler design problem of optimizing these data freshness metrics, possibly subject to a sampling rate constraint. This sampling problem is formulated as a constrained Markov decision process (MDP) with a possibly uncountable state space. We present a complete characterization of the optimal solution to this MDP: the optimal sampling policy is a deterministic or randomized threshold policy, where the threshold and the randomization probabilities are determined by the optimal objective value of the MDP and the sampling rate constraint. The optimal sampling policy can be computed by bisection search, which circumvents the curse of dimensionality. These age optimality results hold for (i) general data freshness metrics represented by monotonic functions of the age of information, (ii) general service time distributions of the queueing server, (iii) both continuous-time and discrete-time sampling problems, and (iv) sampling problems both with and without the sampling rate constraint. Numerical results suggest that the optimal sampling policies can be much better than zero-wait sampling and classic uniform sampling.
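To make the idea of a threshold policy computed by bisection concrete, the following is a minimal sketch and not the algorithm as specified in the paper. It assumes a continuous-time setting without a sampling rate constraint, i.i.d. service times with a finite discrete distribution, and a non-negative, non-decreasing age penalty `p` with antiderivative `P`. It further assumes a renewal-reward (Dinkelbach-style) characterization: after a delivery with service time `y`, the sampler waits until the expected penalty at the next delivery reaches a threshold `beta`, and `beta` is the value at which the expected penalty per cycle minus `beta` times the expected cycle length is zero. All function names (`expectation`, `wait_time`, `surplus`, `zero_wait_average`, `optimal_threshold`) are hypothetical.

```python
def expectation(f, values, probs):
    """E[f(Y)] for a discrete service time Y with support `values` and pmf `probs`."""
    return sum(q * f(v) for v, q in zip(values, probs))


def wait_time(y, beta, p, values, probs, z_max=1e3, tol=1e-10):
    """Smallest wait z >= 0 such that E[p(y + z + Y')] >= beta.

    Assumes p is non-decreasing and that the inequality holds at z = z_max.
    """
    def penalty_at_delivery(z):
        return expectation(lambda v: p(y + z + v), values, probs)

    if penalty_at_delivery(0.0) >= beta:
        return 0.0  # threshold already met: send the next sample immediately
    lo, hi = 0.0, z_max
    while hi - lo > tol:  # inner bisection over the waiting time
        mid = 0.5 * (lo + hi)
        if penalty_at_delivery(mid) >= beta:
            hi = mid
        else:
            lo = mid
    return hi


def surplus(beta, p, P, values, probs):
    """E[penalty accumulated in a cycle] - beta * E[cycle length] under threshold beta."""
    mean_y = expectation(lambda v: v, values, probs)
    total = 0.0
    for y, q in zip(values, probs):
        z = wait_time(y, beta, p, values, probs)
        # Age grows from y to y + z + Y' over a cycle of length z + Y'.
        reward = expectation(lambda v: P(y + z + v) - P(y), values, probs)
        total += q * (reward - beta * (z + mean_y))
    return total


def zero_wait_average(P, values, probs):
    """Time-average penalty of zero-wait sampling (requires E[Y] > 0)."""
    mean_y = expectation(lambda v: v, values, probs)
    reward = expectation(
        lambda y: expectation(lambda v: P(y + v) - P(y), values, probs),
        values, probs)
    return reward / mean_y


def optimal_threshold(p, P, values, probs, tol=1e-9):
    """Outer bisection for the threshold beta at which surplus(beta) = 0."""
    lo = 0.0                                    # surplus >= 0 here since p >= 0
    hi = zero_wait_average(P, values, probs)    # zero-wait is feasible, so surplus <= 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if surplus(mid, p, P, values, probs) > 0:
            lo = mid  # threshold too small
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    # Toy example (an assumption, not from the paper): linear age penalty,
    # service time equal to 0 or 2 with probability 1/2 each.
    values, probs = [0.0, 2.0], [0.5, 0.5]
    p = lambda t: t
    P = lambda t: 0.5 * t * t
    beta = optimal_threshold(p, P, values, probs)
    print("threshold / average age:", beta)
    print("zero-wait average age:  ", zero_wait_average(P, values, probs))
```

In this toy example the surplus equation can be solved by hand, giving a threshold of sqrt(8) - 1 (about 1.83), compared with an average age of 2 under zero-wait sampling. Because both searches are one-dimensional, no discretization of the state space is needed, which is the sense in which a bisection-based solution sidesteps the curse of dimensionality. A sampling rate constraint or a non-linear penalty such as an exponential would require extending this sketch.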