Two major endpoints of genotoxicity testing are gene mutation and chromosome damage (CD). CD includes clastogenicity and aneugenicity, which are detected by the chromosomal aberration (CA) test or the micronucleus (MN) test. Many in silico systems for predicting bacterial mutagenicity (i.e., Ames test results) have been developed and marketed, and they perform well. By contrast, in silico prediction of CD appears to have progressed less than Ames prediction. Reasons include the diversity of mechanisms and detection methods, the large number of false positives, and conflicting test results. Nevertheless, some (quantitative) structure-activity relationship ((Q)SAR) models, e.g., Derek Nexus [Derek], ADMEWorks [AWorks] and CASE Ultra [MCase], can predict CA test results. Therefore, the performance of these three (Q)SAR models was compared using the expanded Carcinogenicity Genotoxicity eXperience (CGX) dataset, in order to understand the current situation and guide future development. The constructed dataset contained 440 chemicals (325 carcinogens and 115 non-carcinogens). Sensitivity, specificity, accuracy and applicability were 56.0, 86.9, 68.6 and 89.1%, respectively, for Derek; 67.7, 61.5, 65.2 and 99.3% for AWorks; and 91.0, 64.9, 80.5 and 97.7% for MCase. MCase thus showed higher sensitivity and accuracy than Derek or AWorks. Analysis of predictivity for specific chemical classes revealed no remarkable differences among the models. The (Q)SAR models tended to give positive predictions for alkylating agents, aromatic amines or amides, aromatic nitro compounds, epoxides, halides, and N-nitro or N-nitroso compounds. In an additional investigation, MCase showed high sensitivity but low specificity in predicting in vivo MN results. Refinement of the test data used for in silico systems (e.g., consideration of cytotoxicity or re-evaluation of conflicting test results) will be needed to improve the performance of CD prediction.
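The abstract reports sensitivity, specificity, accuracy and applicability without defining them. The sketch below assumes the conventional definitions: sensitivity = TP/(TP+FN), specificity = TN/(TN+FP), and accuracy = (TP+TN) divided by all chemicals that received a call. It also assumes that applicability is the share of chemicals falling inside a model's applicability domain, meaning those that received a positive or negative call. The class `PredictionCounts`, its fields and the example counts are hypothetical illustrations, not data from this study.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PredictionCounts:
    """Hypothetical tally of one (Q)SAR model's calls against CA test results.

    Assumes standard confusion-matrix definitions; the study may differ.
    """
    tp: int  # CA-positive chemicals predicted positive
    fn: int  # CA-positive chemicals predicted negative
    tn: int  # CA-negative chemicals predicted negative
    fp: int  # CA-negative chemicals predicted positive
    out_of_domain: int = 0  # assumed: chemicals for which the model gave no call

    @property
    def predicted(self) -> int:
        # Chemicals inside the applicability domain (received a call)
        return self.tp + self.fn + self.tn + self.fp

    def sensitivity(self) -> float:
        return 100 * self.tp / (self.tp + self.fn)

    def specificity(self) -> float:
        return 100 * self.tn / (self.tn + self.fp)

    def accuracy(self) -> float:
        # Computed over predicted chemicals only
        return 100 * (self.tp + self.tn) / self.predicted

    def applicability(self) -> float:
        return 100 * self.predicted / (self.predicted + self.out_of_domain)

    def positive_fraction(self) -> float:
        # Share of CA positives among the predicted chemicals
        return (self.tp + self.fn) / self.predicted


def accuracy_from_rates(sens: float, spec: float, positive_fraction: float) -> float:
    """Accuracy as a weighted mean of sensitivity and specificity (in %)."""
    return positive_fraction * sens + (1 - positive_fraction) * spec


# Hypothetical example: 100 CA positives, 100 CA negatives, 5 out of domain
counts = PredictionCounts(tp=90, fn=10, tn=60, fp=40, out_of_domain=5)
print(counts.sensitivity(), counts.specificity(), counts.accuracy())
print(round(counts.applicability(), 1))
```

Under these definitions, accuracy is a weighted mean of sensitivity and specificity, weighted by the share of CA positives among the predicted chemicals. A highly sensitive model therefore scores closer to its sensitivity when positives dominate the evaluable set.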