Abstract

The use of apps in mental health is increasing because they are accessible and easy to use, although evidence on their effectiveness and safety remains insufficient. EvalDepApps aims to develop an evaluation tool for depression management apps. A systematic review with meta-analysis (SRMA) was performed to evaluate the efficacy and safety of apps for depression and to identify the evaluation criteria used. The PRISMA methodology was followed, and the MEDLINE, PsycINFO, and Embase databases were searched. Risk of bias was assessed with the RoB 2 tool. A two-round online Delphi study was carried out to prioritize the most relevant criteria identified. Forty-four people (26 professionals, 18 patients) were invited and asked to rate the importance of each criterion on a Likert scale (1 to 6). Criteria that reached high consensus were selected; those with medium consensus were submitted to the second round. Empathy sessions (6) and co-design sessions (6) were held with patients (23) and professionals (33) in Catalonia, the Canary Islands, and Andalusia to identify the aspects the tool should cover.

Twenty-nine studies were included in the SRMA (67% with unclear risk of bias). mHealth interventions had a significant effect in reducing depressive symptoms compared with non-active controls (Hedges' g = −0.62, 95% CI: −0.87 to −0.37, I² = 87%). In Round 1 of the Delphi (59% participation), 24 criteria reached high consensus, 20 medium consensus, and 7 low consensus. In Round 2 (52% participation), 4 criteria reached high consensus. The empathy sessions showed that the functions most requested by patients were anxiety reduction and information about their condition; for professionals, the most requested function was suicide prevention. In the co-design sessions, participants proposed that the tool provide a ranking of apps, recommendation systems, and a highly visual format.
The SRMA and the Delphi process ensure that the tool will be based on scientific evidence and expert judgment, while the empathy and co-design sessions ensure that it will fit the needs of end users.

Key messages
• It is relevant to evaluate the quality of health apps, especially those aimed at vulnerable populations such as people with depression.
• It is crucial that evaluation tools for digital health interventions be developed on the basis of scientific evidence and with the involvement of end users.