Background
The potential value of big data in health care is considerable. The Chinese State Council issued the Guides on Promoting and Regulating the Development of Big Data Applications in Health Care to encourage the opening and sharing of health data. However, some data controllers in China are unwilling to open and share data because of privacy risks. We aimed to identify the medical information that should be removed or processed before health data are shared, so as to ensure that data are shared in a safe and controlled manner.

Methods
Using cluster sampling, we selected ten departments in a hospital in Shanghai between March 5, 2018, and May 1, 2018. A questionnaire was administered to 456 patients in these departments. It requested basic personal information and asked about awareness of privacy, control over data, and willingness to share data. In addition, a convenience sample of 50 medical staff, 25 patients, and six management personnel was selected for one-to-one interviews about issues related to medical information privacy. Feedback from patients, doctors, and management personnel was summarised and used, in consultation with experts, to define the categories of medical data that should be considered when opening and sharing health data in China. All analyses were descriptive.

Findings
We defined five categories of identifiable characteristics: name, identity document number (eg, identification card number and driver's licence number), contact information (eg, telephone number and home address), account number (eg, social security number and medical record file number), and biometric identification (eg, fingerprint and DNA). We identified eight types of diseases: reproduction-related diseases, infectious diseases, mental diseases, malignant tumours, hereditary diseases, anal diseases, rare diseases, and other incurable diseases. Finally, we defined three identities: mother, patient with a malignant tumour, and government personnel.
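As a rough illustration only, and not a method described in this study, the five categories of identifiable characteristics, eight disease types, and three identities could be encoded as a screening step applied to patient records before sharing. This Python sketch assumes that records are plain dictionaries. Every field name and label (for example `id_card_number` or `malignant_tumour`) and the helpers `strip_identifiers` and `needs_further_processing` are hypothetical. Direct identifiers are dropped outright. Records that mention a sensitive disease type or identity are only flagged, because the study does not specify how such information should be processed.

```python
from typing import Any

# Hypothetical field names grouped under the five categories of
# identifiable characteristics defined in the Findings.
IDENTIFIER_CATEGORIES: dict[str, set[str]] = {
    "name": {"name"},
    "identity_document_number": {"id_card_number", "drivers_licence_number"},
    "contact_information": {"telephone_number", "home_address"},
    "account_number": {"social_security_number", "medical_record_file_number"},
    "biometric_identification": {"fingerprint", "dna"},
}

# Hypothetical labels for the eight privacy-sensitive disease types.
SENSITIVE_DISEASE_TYPES: set[str] = {
    "reproduction_related", "infectious", "mental", "malignant_tumour",
    "hereditary", "anal", "rare", "other_incurable",
}

# Hypothetical labels for the three privacy-sensitive identities.
SENSITIVE_IDENTITIES: set[str] = {
    "mother", "malignant_tumour_patient", "government_personnel",
}

# Flatten the category map into a single set of fields to drop.
IDENTIFIER_FIELDS: set[str] = set().union(*IDENTIFIER_CATEGORIES.values())


def strip_identifiers(record: dict[str, Any]) -> dict[str, Any]:
    """Return a copy of the record without any direct-identifier field."""
    return {key: value for key, value in record.items() if key not in IDENTIFIER_FIELDS}


def needs_further_processing(record: dict[str, Any]) -> bool:
    """Flag a record whose disease types or identities appear on the sensitive lists."""
    diseases = set(record.get("disease_types", []))
    identities = set(record.get("identities", []))
    return bool(diseases & SENSITIVE_DISEASE_TYPES) or bool(identities & SENSITIVE_IDENTITIES)


if __name__ == "__main__":
    record = {
        "name": "Patient A",
        "telephone_number": "00000000",
        "age": 54,
        "disease_types": ["malignant_tumour"],
        "identities": [],
    }
    shared = strip_identifiers(record)
    print(shared)  # {'age': 54, 'disease_types': ['malignant_tumour'], 'identities': []}
    print(needs_further_processing(shared))  # True
```

The sketch flags sensitive diseases and identities rather than deleting them. This reflects the abstract's distinction between information to be removed and information to be processed. The actual processing would depend on the regulations the authors call for.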
Health data were classified by privacy level into three types: summary datasets, limited datasets, and identifiable datasets.

Interpretation
The Health Care Commission in China should set regulations that define the content and scope of private data in health care. The five categories of identifiable characteristics, eight types of diseases, and three types of identities that we have defined could serve as a reference when setting these regulations, so that secondary use of health data becomes possible.

Funding
National Natural Science Foundation (grant number 71473164).