Clinical trial research in oncology relies heavily on clinical documentation within the electronic medical record (EMR) to determine patient eligibility based on inclusion and exclusion criteria. Structured data elements within the EMR serve as the primary information source for defining patient cohorts, and clinical cancer stage and performance status are two pivotal eligibility criteria. However, clinical stage and performance status are inconsistently available in the structured fields of the EMR, even though they are consistently documented in clinical notes. In addition, the way these values are recorded in unstructured text is not standardized. This lack of structured, standardized data limits the development of artificial intelligence (AI) models. To increase the completeness of clinical records, a clinical research team at a community oncology practice was consulted to identify requirements, and essential clinical features were extracted from de-identified data. The methods outlined in this paper focus on eliminating false positives so that the resulting structured fields can support the future development of large language models (LLMs). These methods increased patient record completeness with high accuracy: the accuracy of the developed models ranged from 97.5% to 97.75%. Among more than 60,000 patients, the numerical staging, TNM (tumor, node, metastasis) staging, and Karnofsky performance score models added a structured field for 29.62%, 21.01%, and 40.64% of patients, respectively. Additionally, a semi-supervised natural language processing (NLP) algorithm applied to the performance status task achieved a mean absolute error (MAE) of 1.57.
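The abstract does not describe the extraction rules themselves. As a minimal sketch, assuming a rule-based (regular expression) approach, the Python below shows how TNM staging, numerical stage, and a Karnofsky performance score might be pulled from note text. A simple validity check stands in for false-positive elimination, and the standard MAE formula is included because it is the metric reported for the performance status result. The patterns, the function names (`extract_tnm`, `extract_stage`, `extract_kps`, `mean_absolute_error`), and the filtering rule are hypothetical illustrations, not the authors' method.

```python
import re
from typing import Optional, Sequence

# Hypothetical patterns for illustration only; not the rules used in this work.
# TNM, e.g. "cT2N1M0", "pT3 N0 M0", "ypT1a N0 MX"
TNM_RE = re.compile(
    r"\b([cpy]{0,2})T(is|[0-4][a-d]?|X)\s*N([0-3][a-c]?|X)\s*M([01][a-c]?|X)\b",
    re.IGNORECASE,
)
# Numerical stage; the word "stage" is required as context, e.g. "stage IIIB", "Stage 3"
STAGE_RE = re.compile(r"\bstage\s+(IV|III|II|I|[0-4])([A-C]?)\b", re.IGNORECASE)
# Karnofsky score, e.g. "KPS 80%", "KPS of 90", "Karnofsky performance status: 70"
KPS_RE = re.compile(
    r"\b(?:KPS|Karnofsky(?:\s+performance\s+(?:status|score))?)\s*(?:of|:|=)?\s*(\d{1,3})\b",
    re.IGNORECASE,
)

ARABIC_TO_ROMAN = {"0": "0", "1": "I", "2": "II", "3": "III", "4": "IV"}


def _norm(category: str) -> str:
    """Normalize a T/N/M category: 'X' upper case, suffix letters lower case."""
    return "X" if category.upper() == "X" else category.lower()


def extract_tnm(text: str) -> Optional[str]:
    """Return the first TNM string found, normalized (e.g. 'cT2N1M0'), or None."""
    match = TNM_RE.search(text)
    if match is None:
        return None
    prefix, t, n, m = match.groups()
    return f"{prefix.lower()}T{_norm(t)}N{_norm(n)}M{_norm(m)}"


def extract_stage(text: str) -> Optional[str]:
    """Return the first numerical stage in Roman numerals (e.g. 'IIB'), or None."""
    match = STAGE_RE.search(text)
    if match is None:
        return None
    base = match.group(1).upper()
    return ARABIC_TO_ROMAN.get(base, base) + match.group(2).upper()


def extract_kps(text: str) -> Optional[int]:
    """Return the first valid Karnofsky score, or None.

    Karnofsky scores are multiples of 10 from 0 to 100; any other captured
    number is discarded here as a likely false positive (an assumed rule).
    """
    for match in KPS_RE.finditer(text):
        value = int(match.group(1))
        if 0 <= value <= 100 and value % 10 == 0:
            return value
    return None


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """MAE = (1/n) * sum(|y_true_i - y_pred_i|)."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(a - b) for a, b in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    note = "Assessment: cT2N1M0 adenocarcinoma, stage IIB. KPS 80%."
    print(extract_tnm(note), extract_stage(note), extract_kps(note))
```

In this sketch, rejecting Karnofsky values outside the 0 to 100 range or not divisible by 10 trades some recall for precision; the context rules a real pipeline would need (for example, negated or historical mentions) are not modeled.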
This work demonstrates a use case for natural language processing (NLP) in streamlining clinical trial enrollment by providing an efficient and accurate method for detecting key clinical values in unstructured patient data. A similar methodology, combined with more advanced algorithms such as LLMs, could be applied to the extracted structured fields to detect additional patient elements, such as molecular biomarkers, findings from imaging reports, postoperative surgical outcomes (e.g., clear margins), and treatment outcomes.