Few implementation science (IS) measures have been evaluated for validity, reliability, and utility, the last of which refers to whether a measure captures meaningful aspects of implementation contexts. We present a real-world case study of rigorous IS measure development: the Barriers and Facilitators in Implementation of Task-Sharing in Mental Health services (BeFITS-MH) measure. Our objective is to offer lessons learned and a framework for enhancing measurement utility. We summarize the conceptual and empirical work that informed the development of BeFITS-MH, including a description of the Delphi process, detailed translation and local adaptation procedures, and concurrent pilot testing. Because validity and reliability are key aspects of measure development, we also report on how we assessed the measure's construct validity and its utility with respect to the implementation outcomes of acceptability, appropriateness, and feasibility. Continuous stakeholder involvement and concurrent pilot testing led to several adaptations of the measure's structure, scaling, and format that enhanced its contextual relevance and utility. Broad terms such as "program," "provider type," and "type of service" had to be adapted because the interventions, the types of task-sharing providers employed, and the clients served were heterogeneous across the three global sites. Item selection benefited from the iterative process, which made it possible to identify which aspects of the identified barriers and facilitators were most relevant and which were common across sites. Program implementers' conceptions of the measure's utility in terms of acceptability, appropriateness, and feasibility clustered into several common categories. This case study describes a rigorous, multi-step process for developing a pragmatic IS measure. The process and lessons learned can aid in the teaching, practice, and research of IS measure development.
The importance of including the experiences and knowledge of different types of stakeholders in different global settings was reinforced; doing so produced a more globally useful measure while still allowing for locally relevant adaptation. To increase the measure's relevance, it is important to target actionable domains that, in line with program implementers' preferences, predict markers of utility (e.g., successful uptake). With this case study, we provide a detailed roadmap for others seeking to develop and validate IS measures that maximize local utility and impact.