This article presents research on an approach to analyzing and comparing information sources based on large volumes of textual data using natural language processing (NLP) methods. The objects of the study were Telegram news channels, which served as sources of text data. The texts were preprocessed (cleaned, tokenized, and lemmatized) to form a global dictionary of the unique words found across all sources. For each source, a vector representation of its texts was built whose dimension equals the number of unique words in the global dictionary; the frequency of each word in the channel's texts is recorded at the corresponding position of the vector. Applying cosine similarity to each pair of vectors yields a square matrix that quantifies the degree of similarity between the sources. The results demonstrate the effectiveness of the proposed approach for the quantitative assessment of textual similarity across sources. The study also identified the need for further optimization of the algorithm, in particular parameterization to balance accuracy against computational cost, and the filtering of words that carry excessive weight, such as domain-specific terms or channel names. The proposed method can be applied to analyzing information flows, identifying relationships between sources, and studying the socio-cultural influence of media content in the modern information environment.
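
The following is a minimal sketch of the pipeline described above, written under stated assumptions rather than reproducing the authors' implementation: the channel corpora in `channel_texts` are illustrative placeholders, and lemmatization is reduced to simple cleaning and tokenization for brevity.

```python
# Sketch of the described pipeline: preprocess texts per channel, build a
# global dictionary, form frequency vectors, and compute pairwise cosine
# similarity. `channel_texts` and `preprocess` are illustrative stand-ins.
import re
import numpy as np

def preprocess(text: str) -> list[str]:
    """Clean and tokenize; a full pipeline would also lemmatize each token."""
    text = re.sub(r"[^\w\s]", " ", text.lower())  # strip punctuation
    return text.split()

# Toy stand-in corpora: one concatenated text per source (hypothetical data).
channel_texts = {
    "channel_a": "breaking news about the economy and markets",
    "channel_b": "markets react to the latest economy news",
    "channel_c": "sports results and match highlights",
}

# Global dictionary: unique words across all sources, with fixed positions.
tokens_by_channel = {c: preprocess(t) for c, t in channel_texts.items()}
vocab = sorted({w for toks in tokens_by_channel.values() for w in toks})
index = {w: i for i, w in enumerate(vocab)}

# One frequency vector per channel; dimension equals the vocabulary size.
vectors = np.zeros((len(channel_texts), len(vocab)))
for row, toks in enumerate(tokens_by_channel.values()):
    for w in toks:
        vectors[row, index[w]] += 1

# Pairwise cosine similarity yields a square source-similarity matrix.
unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
similarity = unit @ unit.T
print(np.round(similarity, 2))
```

In this sketch the diagonal of the resulting matrix is 1.0 (each source compared with itself), while off-diagonal entries reflect vocabulary overlap; the over-weighted words mentioned above (e.g., a channel's own name repeated in every post) would inflate those entries unless filtered or down-weighted.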