Abstract

With a flood of cancer genome sequences expected soon, distinguishing ‘driver’ from ‘passenger’ mutations will be an important task. To discover the key characteristics of driver gene fusions in cancer, we performed a systematic analysis of known fusion genes using a compendium of ‘molecular concepts’, including functional domains, pathways, gene ontologies and molecular interactions. Although analysis of domain architectures and shared pathways was less informative, we found that cancer-related fusion genes tend to share common gene ontologies or engage distinct interaction networks. We hypothesized that such ‘signatures’ of molecular concepts could be used to distinguish biologically meaningful gene fusions underlying cancer, much as signature genes define certain phenotypes. We therefore designed an algorithm, the concept signature score (ConSig score), to quantitatively rank human genes underlying cancer by the strength of their association with the molecular concepts characteristic of cancer genes. To integrate high-throughput genomic data, we characterized the chromosomal imbalances associated with gene fusions and found that recurrent gene fusions exhibit distinctive patterns of copy number alteration corresponding to different portions of the fusion partners. We named this pattern the fusion breakpoint principle and confirmed it by large-scale meta-analysis of recurrent gene fusions using high-resolution array CGH/SNP datasets. Next, we applied the ConSig technology to paired-end transcriptome sequencing data to benchmark fusion candidates, which were then assessed for chromosomal aberrations complying with the fusion breakpoint principle by integrating high-quality copy number data. The ConSig score identified the known EML4-ALK fusion as the top-ranked candidate in the H2228 lung cancer cell line, and we found further evidence of an R3HDM2-NFE2 fusion in the H1792 cell line.
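The idea of ranking genes by their association with concepts characteristic of cancer genes can be illustrated with a minimal sketch. This is not the published ConSig formula, which the abstract does not give. The weighting scheme (a smoothed log2 enrichment of known cancer genes within each concept), the averaging over a gene's concepts, the use of the higher-scoring partner to rank a fusion, and all function and variable names are assumptions made for illustration only.

```python
import math
from collections import defaultdict


def concept_weights(gene_concepts, cancer_genes, pseudocount=1.0):
    """Weight each concept by its enrichment for known cancer genes.

    ASSUMPTION: smoothed log2 ratio of the cancer-gene fraction inside a
    concept to the background fraction. Illustrative, not the ConSig formula.
    """
    n_cancer = sum(1 for gene in gene_concepts if gene in cancer_genes)
    if n_cancer == 0:
        raise ValueError("need at least one annotated cancer gene")
    background = n_cancer / len(gene_concepts)

    # Invert the annotation table: concept -> set of member genes.
    members = defaultdict(set)
    for gene, concepts in gene_concepts.items():
        for concept in concepts:
            members[concept].add(gene)

    weights = {}
    for concept, genes in members.items():
        hits = len(genes & cancer_genes)
        # The pseudocount shrinks small concepts toward the background rate.
        frac = (hits + pseudocount * background) / (len(genes) + pseudocount)
        weights[concept] = math.log2(frac / background)
    return weights


def signature_score(gene, gene_concepts, weights):
    """Mean concept weight of a gene; 0.0 for genes with no annotations."""
    concepts = gene_concepts.get(gene, ())
    if not concepts:
        return 0.0
    return sum(weights.get(c, 0.0) for c in concepts) / len(concepts)


def rank_fusions(candidates, gene_concepts, weights):
    """Rank (5' gene, 3' gene) pairs by the higher-scoring partner.

    ASSUMPTION: taking the maximum over partners is a hypothetical choice.
    """
    scored = [
        (pair, max(signature_score(g, gene_concepts, weights) for g in pair))
        for pair in candidates
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)
```

In practice, `gene_concepts` would be built from annotation sources such as gene ontology terms and interaction partners, and `cancer_genes` from a curated list of known cancer genes; candidates from paired-end sequencing would then be passed to `rank_fusions` before any copy number filtering.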
We show that the R3HDM2-NFE2 fusion, which results in overexpression of wild-type NFE2, promotes cell proliferation and invasion. Moreover, through analysis of SNP arrays and lung tissue microarrays (TMAs), we find that chromosomal rearrangements at the NFE2 locus are recurrent in a small subset of patient tumors, suggesting that NFE2 may contribute to a new molecular class of lung cancer. Taken together, the ConSig technology suggests the functional importance of putative fusions in cancer, whereas the breakpoint principle helps interpret large-scale cancer genomic data sets to explore potential recurrence. Thus, the methodology described here can filter the large number of fusion candidates generated by paired-end sequencing data and preferentially identify driver gene fusions in cancer.

Citation Format: {Authors}. {Abstract title} [abstract]. In: Proceedings of the 101st Annual Meeting of the American Association for Cancer Research; 2010 Apr 17-21; Washington, DC. Philadelphia (PA): AACR; Cancer Res 2010;70(8 Suppl):Abstract nr 2214.