As the volume of somatic genome data in biomedical repositories grows, it becomes essential to predict cancer-related document sets using machine learning models. Most traditional gene-based somatic cancer mining models do not incorporate somatic gene ranking or feature extraction, because of the high computational and memory cost these steps incur on large datasets. A wide range of feature selection and feature extraction strategies exists, and they are widely used across many domains. All of these strategies aim to remove redundant and irrelevant features from the training data so that new documents can be classified more accurately. Data extraction is the task of returning the data relevant to an information need from a large collection of resources. Ranking consists of sorting the returned items according to some criterion, so that the "best" results appear at the top of the list. Mapping somatic genomes and their synonyms to a biomedical document ranking is difficult on very large biomedical document sets. To overcome these limitations, a novel feature-ranking-based fuzzy clustering framework is designed and implemented on large biomedical databases. Experiments are run with different cluster sizes and gene feature sets for somatic document clustering. The results indicate that the proposed model achieves high cluster quality together with document ranking for somatic gene-based document indexing.
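To make the idea of feature ranking followed by fuzzy clustering concrete, the following is a minimal sketch and not the authors' implementation. It assumes TF-IDF term weights, a variance-based ranking criterion, and standard fuzzy c-means with fuzzifier m = 2; the function names `tf_idf`, `rank_features`, and `fuzzy_c_means`, as well as the toy documents, are hypothetical and chosen only for illustration.

```python
import math
import random


def tf_idf(docs):
    """Return (vocab, rows) where rows[i][j] is the TF-IDF weight of vocab[j] in docs[i]."""
    vocab = sorted({t for d in docs for t in d})
    n = len(docs)
    df = {t: sum(1 for d in docs if t in d) for t in vocab}
    rows = []
    for d in docs:
        # Smoothed IDF; this weighting is an assumption, not taken from the paper.
        rows.append([(d.count(t) / len(d)) * (math.log((1 + n) / (1 + df[t])) + 1.0)
                     for t in vocab])
    return vocab, rows


def rank_features(vocab, rows, top_k):
    """Keep the top_k terms with the highest variance across documents (assumed criterion)."""
    n = len(rows)
    scored = []
    for j, term in enumerate(vocab):
        col = [r[j] for r in rows]
        mean = sum(col) / n
        scored.append((sum((x - mean) ** 2 for x in col) / n, term, j))
    scored.sort(key=lambda s: (-s[0], s[1]))
    keep = scored[:top_k]
    return [t for _, t, _ in keep], [[r[j] for _, _, j in keep] for r in rows]


def fuzzy_c_means(points, c, m=2.0, iters=100, tol=1e-6, seed=0):
    """Standard fuzzy c-means; returns (memberships, centers)."""
    rng = random.Random(seed)
    n, dim = len(points), len(points[0])
    # Random initial memberships, each row normalized to sum to 1.
    u = []
    for _ in range(n):
        w = [rng.random() + 1e-9 for _ in range(c)]
        total = sum(w)
        u.append([x / total for x in w])
    centers = []
    for _ in range(iters):
        # Center update: mean of the points weighted by membership^m.
        centers = []
        for k in range(c):
            w = [u[i][k] ** m for i in range(n)]
            total = sum(w)
            centers.append([sum(w[i] * points[i][d] for i in range(n)) / total
                            for d in range(dim)])
        # Membership update: proportional to distance^(-2/(m-1)).
        shift = 0.0
        for i in range(n):
            dists = [max(math.dist(points[i], centers[k]), 1e-12) for k in range(c)]
            inv = [d ** (-2.0 / (m - 1.0)) for d in dists]
            total = sum(inv)
            new_row = [x / total for x in inv]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return u, centers


# Toy, already tokenized documents (hypothetical data).
docs = [
    ["tp53", "tp53", "mutation", "tumor"],
    ["tp53", "tp53", "tumor", "somatic"],
    ["brca1", "brca1", "breast", "cancer"],
    ["brca1", "brca1", "cancer", "somatic"],
]
vocab, rows = tf_idf(docs)
terms, reduced = rank_features(vocab, rows, top_k=2)
memberships, centers = fuzzy_c_means(reduced, c=2)
```

Each row of `memberships` gives the degree to which one document belongs to each cluster, which is what distinguishes fuzzy clustering from a hard assignment; sorting documents within a cluster by that degree is one simple way to derive a ranking from it.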