Population-based studies exploring hypothalamic-pituitary-adrenal (HPA) axis activity as a potential explanatory link between stress-related and metabolic disorders have been hampered by the lack of reliable measures of chronic cortisol exposure. The purpose of this review is to summarize the current literature on the reliability of HPA axis measures and to discuss the feasibility of performing them in population-based studies. We identified articles through PubMed using search terms related to cortisol, HPA axis, adrenal imaging, and reliability. The diurnal salivary cortisol curve (generated from multiple salivary samples collected from awakening to midnight) and 11 p.m. salivary cortisol had the highest between-visit reliabilities (r = 0.63-0.84 and 0.78, respectively). The cortisol awakening response and dexamethasone-suppressed cortisol had the next highest between-visit reliabilities (r = 0.33-0.67 and 0.42-0.66, respectively). Based on our own data, the inter-reader reliability (r_s) of adrenal gland volume from non-contrast CT was 0.67-0.71 for the left and 0.47-0.70 for the right adrenal gland. Although a single 8 a.m. salivary cortisol is one of the easiest measures to perform, it had the lowest between-visit reliability (r = 0.18-0.47). Based on the current literature, multiple salivary cortisol samples across the diurnal curve (including awakening cortisol), dexamethasone-suppressed cortisol, and adrenal gland volume are measures of HPA axis tone with similar reliabilities; they likely reflect chronic cortisol burden and are feasible to perform in population-based studies.
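The reliabilities quoted above are correlation coefficients between repeated measurements. The sketch below illustrates one way such coefficients might be computed: a Pearson r for between-visit reliability and a rank-based (Spearman) coefficient for inter-reader reliability. It assumes that the reviewed studies used coefficients of these kinds, which the text does not specify in detail. The function names and all numeric values are hypothetical illustrations, not data or methods from the review.

```python
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / sqrt(sxx * syy)


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        tied_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = tied_rank
        i = j + 1
    return ranks


def spearman_rs(x, y):
    """Spearman rank correlation: Pearson r computed on the ranks."""
    return pearson_r(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Hypothetical 11 p.m. salivary cortisol values (arbitrary units)
    # for the same six participants at two visits.
    visit_1 = [1.2, 0.8, 2.1, 1.5, 0.9, 1.7]
    visit_2 = [1.0, 0.9, 1.8, 1.6, 1.1, 1.4]
    print("between-visit r:", round(pearson_r(visit_1, visit_2), 2))

    # Hypothetical left adrenal gland volumes (cm^3) from two readers.
    reader_a = [4.1, 5.3, 3.8, 6.0, 4.7]
    reader_b = [4.4, 5.0, 4.0, 5.8, 4.2]
    print("inter-reader r_s:", round(spearman_rs(reader_a, reader_b), 2))
```

In practice a statistics package would normally replace these hand-written functions; they are spelled out here only to make the calculation explicit.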