In the first part of this series of papers, we set out to answer the following fundamental question: for constrained sampling, what kind of signal can be uniquely represented or recovered by the obtained (distributed) discrete sample sequence(s)? We term this study <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">sparsity constrained sensing</italic>, or <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">sparse sensing</italic>. It is different from <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">compressed sensing</italic>, which exploits the sparse representation of a signal to reduce sample complexity (compressed sampling or acquisition). We use <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">sparsity constrained sensing</italic> to denote a class of methods devoted to improving the efficiency and reducing the cost of the sampling implementation itself. Here, “sparsity” refers to sampling at a low temporal or spatial rate, which captures applications where sampling is implemented with cheaper hardware, e.g., with lower power consumption, less memory, and lower throughput. We take frequency and direction of arrival (DoA) estimation as concrete examples and derive the necessary and sufficient conditions on the sampling strategy. Interestingly, we prove that these problems can be reduced to a (multiple) remainder model, which is equivalent to studying the residue representation of a signal. As a straightforward corollary, we supplement and complete the theory of <italic xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink">co-prime sampling</italic>, which has received considerable attention over the last decade.
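The reduction to a remainder model can be illustrated with a toy case. The Python sketch below assumes a single noise-free complex tone with an integer frequency (in Hz), observed for one second at each of several low sampling rates; all function names are hypothetical and chosen for illustration. Each undersampled DFT peaks at the frequency modulo the sampling rate, so the samples only reveal residues, and those residues pin down the frequency uniquely within one period of the lcm of the rates, as the CRT guarantees. This is the classic single-tone setting, not the paper's general conditions for multiple frequencies or DoA.

```python
import cmath
from functools import reduce
from math import gcd


def sample_tone(freq, rate):
    """Samples of exp(j*2*pi*freq*t) taken at t = n / rate for n = 0, ..., rate - 1."""
    return [cmath.exp(2j * cmath.pi * freq * n / rate) for n in range(rate)]


def dft_peak_bin(x):
    """Index of the largest-magnitude bin of a naive DFT of length len(x)."""
    size = len(x)
    mags = [
        abs(sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / size) for n in range(size)))
        for k in range(size)
    ]
    return max(range(size), key=mags.__getitem__)


def recover_frequency(freq, rates):
    """Undersample a tone at each rate and list every frequency consistent with the residues."""
    # An integer frequency aliases to bin (freq mod rate) at each undersampled rate.
    residues = [dft_peak_bin(sample_tone(freq, rate)) for rate in rates]
    # Residues can only separate frequencies within one period of length lcm(rates).
    span = reduce(lambda a, b: a * b // gcd(a, b), rates)
    candidates = [
        f for f in range(span)
        if all(f % rate == r for rate, r in zip(rates, residues))
    ]
    return residues, candidates
```

For example, `recover_frequency(53, [7, 11])` returns `([4, 9], [53])`: two rates far below 53 Hz still identify the tone because lcm(7, 11) = 77 exceeds it. With rates `[6, 10]` the lcm is only 30, and the same call returns `([5, 3], [23])`, i.e., the tone is aliased, which is why the lcm of the selected rates governs the recoverable dynamic range.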
Our results also connect two classic parameter estimation frameworks, the Chinese Remainder Theorem (CRT) method and co-prime sensing, under a unified interpretation. In addition, we advance the understanding of the robust remainder problem, which models sampling with noise, and derive a sharpened tradeoff between the parameter dynamic range and the error bound. We prove that, for <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$N$</tex-math></inline-formula>-frequency estimation in either complex or real waveforms, once the least common multiple (lcm) of the selected sampling rates is sufficiently large, the error tolerance bound approaches a constant independent of <inline-formula xmlns:mml="http://www.w3.org/1998/Math/MathML" xmlns:xlink="http://www.w3.org/1999/xlink"><tex-math notation="LaTeX">$N$</tex-math></inline-formula>.
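To make the robust remainder problem concrete, the following Python sketch implements a closed-form robust CRT for a single unknown, a standard construction from earlier literature and not the sharpened multi-frequency analysis claimed here. It assumes moduli M_i = gamma * cofactors[i] with pairwise co-prime cofactors, an unknown value in [0, gamma * prod(cofactors)), and residue errors of magnitude below gamma / 4; it also assumes the noisy residues are not re-reduced modulo M_i after the error is added. The names `crt` and `robust_crt` are hypothetical.

```python
from math import prod


def crt(residues, moduli):
    """Classic CRT: the unique x in [0, prod(moduli)) with x = r_i (mod m_i), co-prime m_i."""
    total = prod(moduli)
    x = 0
    for r, m in zip(residues, moduli):
        partial = total // m
        x += r * partial * pow(partial, -1, m)  # pow(a, -1, m) is the modular inverse
    return x % total


def robust_crt(noisy_residues, gamma, cofactors):
    """Estimate the unknown from residues r_i + err_i, assuming |err_i| < gamma / 4."""
    r1, g1 = noisy_residues[0], cofactors[0]
    # Exact integers q_i1 = n_1*g1 - n_i*g_i; rounding absorbs the bounded errors.
    q = [round((ri - r1) / gamma) for ri in noisy_residues[1:]]
    # From n_1*g1 = q_i1 (mod g_i), recover n_1 modulo each remaining cofactor.
    xi = [(qi * pow(g1, -1, gi)) % gi for qi, gi in zip(q, cofactors[1:])]
    n1 = crt(xi, cofactors[1:])
    # Folding integers n_i, then one estimate per channel: n_i * M_i + noisy residue.
    folds = [n1] + [(n1 * g1 - qi) // gi for qi, gi in zip(q, cofactors[1:])]
    estimates = [
        ni * gamma * gi + ri for ni, gi, ri in zip(folds, cofactors, noisy_residues)
    ]
    # Averaging keeps the final error within the per-residue error bound.
    return sum(estimates) / len(estimates)
```

With gamma = 20 and cofactors (3, 5, 7), the moduli are 60, 100, and 140 and the dynamic range is lcm = 2100. The value 1234.5 has residues 34.5, 34.5, and 114.5; perturbing them by +3, -4, and +2 gives `robust_crt([37.5, 30.5, 116.5], 20, [3, 5, 7])`, which stays within 5 of the true value even though every residue is wrong. The factor gamma shared by all moduli is what trades dynamic range for error tolerance in this construction.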