Abstract
Resistance prediction and mutation ranking are important tasks in the analysis of tuberculosis sequence data. Due to standard regimens for the use of first-line antibiotics, resistance co-occurrence, in which samples are resistant to multiple drugs, is common. Analysing all drugs simultaneously should therefore enable patterns reflecting resistance co-occurrence to be exploited for resistance prediction. Here, multi-label random forest (MLRF) models are compared with single-label random forest (SLRF) models, both for predicting phenotypic resistance from whole genome sequences and for identifying important mutations for better prediction of resistance to four first-line drugs, in a dataset of 13402 Mycobacterium tuberculosis isolates. Results confirmed that MLRFs can improve performance compared to conventional clinical methods (by 18.10%) and SLRFs (by 0.91%). In addition, we identified a list of candidate mutations that are important for resistance prediction or that are related to resistance co-occurrence. Moreover, we found that restricting our analysis to a subset of top-ranked mutations was sufficient to achieve satisfactory performance. The source code can be found at http://www.robots.ox.ac.uk/~davidc/code.php.
Highlights
As reported by the World Health Organization, resistance co-occurrence is very common, and is especially so between first-line drugs for treating tuberculosis (TB): isoniazid (INH), ethambutol (EMB), rifampicin (RIF), and pyrazinamide (PZA) (World Health Organization, 2017)
We focus on comparing multi-label random forest (MLRF) and single-label random forest (SLRF) in terms of classification performance, mutation ranking, and the effect of feature selection on the performance
Our results show that the MLRF is the best performing model for all drugs except PZA. In terms of AUC, feature set F3 was the best feature set for INH, RIF, and MDR-TB, while feature set F1 was the best feature set for EMB, PZA, and FDR-TB
Summary
As reported by the World Health Organization, resistance co-occurrence is very common, and is especially so between first-line drugs for treating tuberculosis (TB): isoniazid (INH), ethambutol (EMB), rifampicin (RIF), and pyrazinamide (PZA) (World Health Organization, 2017). Two types of resistance co-occurrence are especially important: (i) multi-drug resistant TB (MDR-TB), defined as cases that are resistant to at least INH and RIF; and (ii) extensively drug-resistant TB (XDR-TB), defined as isolates that are resistant to INH and RIF, plus any of the fluoroquinolones (such as levofloxacin or moxifloxacin) and at least one of the three injectable second-line drugs: amikacin, capreomycin, or kanamycin. Resistance co-occurrence to anti-TB drugs has become an urgent public health concern (World Health Organization, 2017).
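The two definitions above can be written as a simple rule over the set of drugs an isolate is resistant to. The following is a minimal sketch for illustration only, not the authors' code: the function names, the use of lower-case drug names as set members, and the restriction of the fluoroquinolone group to the two examples named in the text are all assumptions.

```python
# Illustrative sketch of the MDR-TB / XDR-TB definitions given in the text.
# All names here are hypothetical; they are not taken from the paper's code.

# The text names these two fluoroquinolones only as examples ("such as"),
# so this set is an assumption and is not exhaustive.
FLUOROQUINOLONES = {"levofloxacin", "moxifloxacin"}

# The three injectable second-line drugs listed in the text.
INJECTABLES = {"amikacin", "capreomycin", "kanamycin"}

# MDR-TB requires resistance to at least these two first-line drugs.
MDR_CORE = {"isoniazid", "rifampicin"}


def is_mdr(resistant_to):
    """Return True if the isolate is resistant to at least INH and RIF."""
    return MDR_CORE <= set(resistant_to)


def is_xdr(resistant_to):
    """Return True if the isolate is MDR and also resistant to at least one
    fluoroquinolone and at least one injectable second-line drug."""
    drugs = set(resistant_to)
    return (
        is_mdr(drugs)
        and bool(drugs & FLUOROQUINOLONES)
        and bool(drugs & INJECTABLES)
    )


def classify(resistant_to):
    """Return the most specific label that applies to the isolate."""
    if is_xdr(resistant_to):
        return "XDR-TB"
    if is_mdr(resistant_to):
        return "MDR-TB"
    return "neither"
```

For example, under these assumptions an isolate resistant to isoniazid, rifampicin, and ethambutol would be labelled MDR-TB, while one that is additionally resistant to moxifloxacin and amikacin would be labelled XDR-TB.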