Objective: To investigate the current status and problems of surgical pathological diagnosis of lung cancer in China, to construct a structured pathological database of lung cancer, and thereby to raise the level of pathological standardization and the quality of scientific data. Methods: A case report form (CRF) was designed according to the diagnostic criteria for radical resection specimens of lung cancer, covering general information, smoking history, the pathology report (including molecular data), treatment and prognosis. The original clinicopathological data of patients with primary lung cancer who underwent surgical resection at 23 centers from January 2013 to December 2017 were retrospectively collected. The raw data, recorded as continuous text, were de-identified and filtered, and then structured using natural language processing combined with a domain knowledge base. Results: A total of 153 817 unstructured pathology reports, 57 748 molecular reports and 13 295 records of treatment and/or follow-up information were collected, yielding 75 941 valid structured documents (covering 86 979 primary lesions). The quality of treatment and follow-up data was unsatisfactory. The number of CRF items used showed an increasing trend over time and had no significant difference between general hospitals and cancer hospitals (P<0.05). Items that were still rarely used as of 2017 included peripheral lung disease, pTNM stage, spread through air spaces (STAS) and pathological evaluation of response to neoadjuvant therapy. The male-to-female ratio was 1.2∶1.0; 8 648 patients (11.39%) had a smoking history, and the ratio of smokers to non-smokers was 0.92∶1.00. The age group with the highest incidence was 60-69 years, accounting for 38.76% of cases.
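The structuring step can be illustrated with a minimal rule-based sketch. The study used natural language processing combined with a domain knowledge base, whose details are not given here; regular-expression matching is a deliberately simple substitute for that pipeline, and the synonym patterns, the `StructuredReport` fields, the `structure_report` and `deidentify` helpers and the digit-masking rule are all hypothetical. The `stas_reported` field records whether STAS is mentioned at all, whatever the finding, which is one way a per-item usage rate could be tallied across reports.

```python
import re
from dataclasses import dataclass, field
from typing import List, Optional

# Hypothetical synonym patterns standing in for the study's domain knowledge base.
# Word boundaries keep "squamous carcinoma" from matching inside "adenosquamous
# carcinoma"; the lookbehind keeps "non-small cell carcinoma" from counting as
# small cell carcinoma.
HISTOLOGY_PATTERNS = {
    "adenocarcinoma": r"\badenocarcinoma\b",
    "squamous cell carcinoma": r"\bsquamous (?:cell )?carcinoma\b",
    "small cell carcinoma": r"(?<!non-)\bsmall cell carcinoma\b",
    "adenosquamous carcinoma": r"\badenosquamous carcinoma\b",
    "sarcomatoid carcinoma": r"\bsarcomatoid carcinoma\b",
}
# pT, pN and an optional M category written as one token, e.g. "pT1bN0M0".
PTNM_PATTERN = r"\bpT(?:is|X|[0-4][a-c]?)N(?:[0-3]|X)(?:M(?:[01][a-c]?|X))?\b"
# Spread through air spaces, abbreviated or spelled out.
STAS_PATTERN = r"\bSTAS\b|\bspread through air spaces?\b"


@dataclass
class StructuredReport:
    """A few hypothetical CRF fields; a real CRF has many more."""
    histology: List[str] = field(default_factory=list)
    ptnm: Optional[str] = None
    stas_reported: bool = False  # True if STAS is mentioned, whatever the finding


def deidentify(text: str) -> str:
    """Crude stand-in for desensitization: mask runs of six or more digits."""
    return re.sub(r"\b\d{6,}\b", "[ID]", text)


def structure_report(text: str) -> StructuredReport:
    """Map one free-text pathology report onto the hypothetical CRF fields."""
    clean = deidentify(text)
    report = StructuredReport()
    for term, pattern in HISTOLOGY_PATTERNS.items():
        if re.search(pattern, clean, flags=re.IGNORECASE):
            report.histology.append(term)
    match = re.search(PTNM_PATTERN, clean)
    if match:
        report.ptnm = match.group(0)
    report.stas_reported = re.search(STAS_PATTERN, clean, flags=re.IGNORECASE) is not None
    return report


if __name__ == "__main__":
    sample = ("Patient ID 12345678. Right upper lobe: invasive adenocarcinoma, "
              "acinar predominant; STAS not seen. pT1bN0M0.")
    print(structure_report(sample))
```

For the sample report, the sketch extracts `adenocarcinoma` as the histological type and `pT1bN0M0` as the stage, and sets `stas_reported` to `True` because STAS is addressed even though it was not seen. A production system would need far richer vocabularies, negation handling and validation against manual review.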
The five most common pathological subtypes were adenocarcinoma (74.58%), squamous cell carcinoma (18.01%), small cell carcinoma (2.18%), adenosquamous carcinoma (1.71%) and sarcomatoid carcinoma (0.82%). Histological subtype was significantly correlated with sex, age and smoking status (P<0.05): adenocarcinoma (58.5%) and squamous cell carcinoma (31.6%) were the main types in male patients, whereas adenocarcinoma (91.6%) and squamous cell carcinoma (3.4%) were the main types in female patients; adenocarcinoma (85.6%) predominated in non-smokers, whereas adenocarcinoma and squamous cell carcinoma accounted for 50.6% and 37.7%, respectively, in smokers; and the proportion of adenocarcinoma decreased with age, while the proportions of squamous cell carcinoma and small cell carcinoma increased. The five most commonly used immunohistochemical (IHC) markers were TTF1, CK7, ALK-Ventana, Napsin A and p63, and the most common panels comprised 7-9 IHC markers. The overall EGFR mutation rate was 51.32% (10 335/20 139, all by PCR); the overall ALK-positive rate was 6.18% (2 084/33 726), with positive rates of 3.01%, 8.93% and 6.58% on the PCR, FISH and IHC (Ventana) platforms, respectively; and the KRAS mutation rate was 7.01% (662/9 441, all by PCR). The positive rates of EGFR, ALK and KRAS were 58.14% (9 986/17 175), 6.59% (1 791/27 176) and 7.52% (607/8 068), respectively, in adenocarcinoma, and 5.83% (113/1 939), 0.40% (1/251) and 1.76% (15/852), respectively, in squamous cell carcinoma. Because of the poor quality of the prognostic data, no meaningful survival analysis could be performed. Conclusions: The standardization of pathology reports (including molecular testing) for lung cancer in China is generally good, but most reports are still written as unstructured continuous text. Postoperative pathological staging, pathological evaluation of response to neoadjuvant therapy and high-quality prognostic data require more attention and improvement.
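Each molecular positive rate is simply the number of positive cases divided by the number tested. The short Python check below recomputes every rate from the counts quoted above and compares it with the published percentage; the `REPORTED` dictionary layout and the `positive_rate` helper are illustrative names, not part of the study. The per-platform ALK rates (PCR, FISH, IHC) are left out because their denominators are not reported.

```python
# (positive, tested, published %) figures taken from the abstract.
REPORTED = {
    "EGFR, all cases (PCR)": (10335, 20139, 51.32),
    "ALK, all cases (pooled platforms)": (2084, 33726, 6.18),
    "KRAS, all cases (PCR)": (662, 9441, 7.01),
    "EGFR, adenocarcinoma": (9986, 17175, 58.14),
    "ALK, adenocarcinoma": (1791, 27176, 6.59),
    "KRAS, adenocarcinoma": (607, 8068, 7.52),
    "EGFR, squamous cell carcinoma": (113, 1939, 5.83),
    "ALK, squamous cell carcinoma": (1, 251, 0.40),
    "KRAS, squamous cell carcinoma": (15, 852, 1.76),
}


def positive_rate(positive: int, tested: int) -> float:
    """Percentage of tested cases that were positive, rounded to 2 decimals."""
    if tested <= 0:
        raise ValueError("tested must be a positive count")
    if not 0 <= positive <= tested:
        raise ValueError("positive must lie between 0 and tested")
    return round(100.0 * positive / tested, 2)


if __name__ == "__main__":
    # Compare each recomputed rate with the published one (tolerance: half a
    # unit in the second decimal place).
    for label, (pos, n, published) in REPORTED.items():
        computed = positive_rate(pos, n)
        status = "ok" if abs(computed - published) < 0.005 else "MISMATCH"
        print(f"{label:<36} {pos:>6}/{n:<6} {computed:6.2f}% "
              f"(published {published:.2f}%) {status}")
```

Because the overall ALK figure pools three platforms with different positive rates, its value depends on the mix of platforms used across centers, so it is best read alongside the platform-specific rates.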
The IHC marker panels are reasonably balanced but could be made more precise. Structured lung cancer report templates and an intelligent structured database management model are recommended to improve the standardization of pathological diagnosis and the quality of the data.