GitHub Projects Research Articles

It has long been recognized that modern software is developed in multiple languages (hence the terms multilingual or multi-language software). Yet, to date, we have only very limited knowledge of how multilingual software systems are constructed. For instance, it is not yet clear how different languages are used and selected together in multilingual software development, or why. Given that using multiple languages in a single software project has become the norm, understanding language use and selection (i.e., the language profile) as a basic element of multilingual construction in contemporary software engineering is an essential first step. In this article, we set out to fill this gap with a large-scale characterization study of language use and selection in open-source multilingual software. We start by presenting an updated overview of language use in 7,113 GitHub projects spanning the past 5 years, characterizing overall statistics of language profiles, followed by a deeper look into the functionality relevance/justification of language selection in these projects through association rule mining. We proceed with an evolutionary characterization of 1,000 GitHub projects for each of the past 10 years to provide a longitudinal view of how language use and selection have changed over the years, as well as how the association between functionality and language selection has evolved. Among many other findings, our study revealed a growing trend of using three to five languages in one multilingual software project and noticeable stability in top language selections. We found a non-trivial association between language selection and certain functionality domains, which was less stable over time than that with individual languages. In a historical context, we also observed major shifts in these characteristics of multilingual systems, both in contrast to earlier peer studies and along the evolutionary timeline. Our findings offer essential knowledge on multilingual construction in modern software development. Based on our results, we also provide insights and actionable suggestions for both researchers and developers of multilingual systems.
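
The abstract names association rule mining as the technique linking functionality to language selection but does not say which algorithm or thresholds were used. As a rough illustration only, the sketch below computes support and confidence for single-antecedent rules of the form "functionality domain -> language". The project data, the `mine_rules` helper, and both threshold values are hypothetical and are not taken from the study.

```python
from collections import Counter

# Hypothetical toy data (not from the study): each project is a
# (functionality domain, set of languages) pair.
projects = [
    ("web", {"JavaScript", "HTML", "CSS"}),
    ("web", {"JavaScript", "HTML", "Python"}),
    ("ml", {"Python", "C++"}),
    ("ml", {"Python", "C++", "CUDA"}),
    ("web", {"JavaScript", "CSS"}),
]


def mine_rules(projects, min_support=0.2, min_confidence=0.6):
    """Return rules 'domain -> language' as (domain, language, support, confidence).

    support    = share of all projects that have both the domain and the language
    confidence = share of projects in the domain that use the language
    """
    n = len(projects)
    domain_counts = Counter(domain for domain, _ in projects)
    pair_counts = Counter(
        (domain, lang) for domain, langs in projects for lang in langs
    )
    rules = []
    for (domain, lang), count in pair_counts.items():
        support = count / n
        confidence = count / domain_counts[domain]
        if support >= min_support and confidence >= min_confidence:
            rules.append((domain, lang, support, confidence))
    return sorted(rules)


for domain, lang, support, confidence in mine_rules(projects):
    print(f"{domain} -> {lang}: support={support:.2f}, confidence={confidence:.2f}")
```

A full analysis would typically mine multi-item rules with an algorithm such as Apriori or FP-Growth; this simplified version keeps only one antecedent and one consequent so that the two measures are easy to follow.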

Purpose: This study aims to uncover the worldwide usage patterns and growth trends of Library Genesis (LibGen) and Sci-Hub, two popular alternative access platforms for scholarly publications. Design/methodology/approach: This study uses a webometric approach to analyze worldwide usage patterns and growth trends of LibGen and Sci-Hub. Data on online presence, usage metrics, and usage patterns were collected between May and June 2023 from web-based sources such as the LibGen and Sci-Hub databases, Google Trends, SimilarWeb, and existing Python and GitHub projects. The analyses combine statistical techniques with reviews of previous literature. Findings: The study reveals that LibGen and Sci-Hub have experienced a notable surge in popularity despite facing copyright infringement claims, legal disputes, and restrictions. Conversion rates and unique visitors have significantly increased, with users from various nations, including countries such as the USA and China. Statistical data shows a preference for accessing science and technology-related resources, particularly in the field of medicine. Most of the downloads originated from reputable publishers and academic journals. The main motivations behind using these platforms are the high costs of and limited access to scholarly publications. This phenomenon has attracted researchers, academics, students, and information seekers globally as they seek to overcome financial constraints and institutional barriers hindering access to valuable knowledge. Research limitations: The study acknowledges limitations related to potential discrepancies and biases from external software and incomplete representation of source databases. The dynamic nature of the platforms also means data is only available up to 2022, potentially excluding recent developments. Additionally, the country-specific data may not be exact, as people use Virtual Private Networks (VPNs) to access the platforms and bypass restrictions.
Practical implications: The findings shed light on the challenges faced by the academic community and offer potential implications for policymakers, publishers, and researchers aiming to address the growing demand for affordable and accessible scholarly publications. Originality/value: The originality of this study lies in its examination of shadow libraries as alternative access routes to scholarly publications. This research contributes to the existing knowledge domain by offering valuable insights into the extent and trends of shadow platform use, focusing on the implications of copyright infringement, legal concerns, and restrictions on scholarly knowledge dissemination.

Articles published on GitHub Projects

Foundation models in robotics: Applications, challenges, and the future

Architecture decisions in quantum software systems: An empirical study on Stack Exchange and GitHub

Describing and Sharing Molecular Visualizations Using the MolViewSpec Toolkit.

Deep semi-supervised learning for recovering traceability links between issues and commits

DaGAN++: Depth-Aware Generative Adversarial Network for Talking Head Video Generation.

How Are Multilingual Systems Constructed: Characterizing Language Use and Selection in Open-Source Multilingual Software

Planktonic, benthic and sympagic copepods collected from the desalination unit of Mario Zucchelli Research Station in Terra Nova Bay (Ross Sea, Antarctica).

SafeNet: Towards mitigating replaceable unsafe Rust code via a recommendation‐based approach

Rapid: Zero-shot Domain Adaptation for Code Search with Pre-trained Models

A Comparative Analysis of Centralized and Decentralized Developer Autonomous Organizations Managing Conflicts in Discussing External Crises

Webometric analysis of alternative access to scholarly publication

Class correlation correction for unbiased scene graph generation

Detecting outdated code element references in software repository documentation

Data pipeline quality: Influencing factors, root causes of data-related issues, and processing problem areas for developers

On the validity of retrospective predictive performance evaluation procedures in just-in-time software defect prediction

Study the correlation between the readme file of GitHub projects and their popularity

Protecting by attacking: A personal information protecting method with cross-modal adversarial examples

A Procedure to Continuously Evaluate Predictive Performance of Just-In-Time Software Defect Prediction Models During Software Development

On Wasted Contributions: Understanding the Dynamics of Contributor-Abandoned Pull Requests–A Mixed-Methods Study of 10 Large Open-Source Projects

A Proposed Simulation Technique for Population Stability Testing in Credit Risk Scorecards
