Accurate preoperative prediction of lymph node metastasis and the degree of tumor invasion would help surgeons decide on an appropriate extent of resection, reducing unnecessary complications and minimizing the risk of recurrence. We analyzed gene expression profiles characteristic of the invasiveness of colorectal carcinoma in a total of 89 cases, using a cDNA array and pattern classification algorithms. We defined binary classes for a panel of clinicopathologic parameters, each of which was divided at different levels for categories (discrete parameters) or values (continuous parameters). We searched for an optimal combination of genes to discriminate the classes using a feature subset selection algorithm, applied to a set of genes preselected on the basis of a statistically significant difference in expression (two-sided t test, P ≤ 0.05). Specifically, we used sequential forward feature selection, which adds genes one at a time to find the combination that minimizes the leave-one-out classification error rate of a k-nearest neighbor classifier. During gene preselection, we found a remarkable difference in gene expression patterns according to the anatomical location of the cancers. The difference was most prominent when the classes were set as cecum, ascending colon, transverse colon, and descending colon (CATD) versus sigmoid colon and rectum (SR). By stratifying cases by these two locations, we were able to extract gene expression profiles characteristic of the presence versus absence of lymph node metastasis, lymphatic invasion, and vascular invasion, as well as the degree of mural invasion and pathological stage, with an accuracy of more than 90%. These results suggest that colorectal cancers harbor distinct molecular pathophysiological states according to their right-to-left location, and that stratification by location is important for pattern classification of cDNA array data.
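The selection procedure described in the abstract (sequential forward feature selection scored by the leave-one-out error of a k-nearest neighbor classifier) can be illustrated with a minimal sketch. This is not the authors' implementation. The function names `loo_knn_error` and `forward_select`, the Euclidean distance, the default `k=3`, the `max_genes` cap, and the stop-when-no-improvement rule are all assumptions made for illustration. The t-test preselection step is assumed to have been done already, so `candidates` holds the column indices of the preselected genes.

```python
from collections import Counter
from math import dist


def loo_knn_error(X, y, features, k=3):
    """Leave-one-out error rate of a k-NN classifier restricted to `features`.

    X is a list of samples (each a list of expression values), y the binary
    class labels. Euclidean distance and k=3 are illustrative assumptions.
    """
    errors = 0
    for i in range(len(X)):
        query = [X[i][f] for f in features]
        # Distances from the held-out sample to every other sample.
        neighbours = sorted(
            (dist(query, [X[j][f] for f in features]), j)
            for j in range(len(X))
            if j != i
        )[:k]
        # Majority vote among the k nearest neighbours.
        votes = Counter(y[j] for _, j in neighbours)
        predicted = votes.most_common(1)[0][0]
        errors += predicted != y[i]
    return errors / len(X)


def forward_select(X, y, candidates, k=3, max_genes=10):
    """Greedily add the gene that most lowers the leave-one-out error.

    Stops when no remaining gene improves the error rate or when
    `max_genes` (an assumed cap) is reached.
    """
    selected, best_error = [], float("inf")
    remaining = list(candidates)
    while remaining and len(selected) < max_genes:
        error, gene = min(
            (loo_knn_error(X, y, selected + [g], k), g) for g in remaining
        )
        if error >= best_error:
            break  # no remaining gene improves the error rate
        selected.append(gene)
        remaining.remove(gene)
        best_error = error
    return selected, best_error
```

Under the stratification the abstract reports, such a routine would be run separately on the CATD and SR subsets rather than on all cases pooled together.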