Abstract

Abstract GPT-3 is a large-scale natural language model developed by OpenAI that can perform many different tasks, including topic classification. Although researchers claim that it requires only a small number of in-context examples to learn a task, in practice GPT-3 requires these training examples to be either of exceptional quality or a higher quantity than easily created by hand. To address this issue, this study teaches GPT-3 to classify whether a question is related to data science by augmenting a small training set with additional examples generated by GPT-3 itself. This study compares two augmented classifiers: the Classification Endpoint with an increased training set size and the Completion Endpoint with an augmented prompt optimized using a genetic algorithm. We find that data augmentation significantly increases the accuracy of both classifiers, and that the embedding-based Classification Endpoint achieves the best accuracy of about 76%, compared to human accuracy of 85%. In this way, giving large language models like GPT-3 the ability to propose their own training examples can improve short text classification performance.

Full Text
Paper version not known

Talk to us

Join us for a 30 min session where you can share your feedback and ask us any queries you have

Schedule a call

Disclaimer: All third-party content on this website/platform is and will remain the property of their respective owners and is provided on "as is" basis without any warranties, express or implied. Use of third-party content does not indicate any affiliation, sponsorship with or endorsement by them. Any references to third-party content is to identify the corresponding services and shall be considered fair use under The CopyrightLaw.