Abstract

Sequence Series Data (SSD) refers to multi-dimensional data consisting of measurements over sequences that can be ordered. Such data is frequently encountered in genomic and text sentiment analysis data sets, but collecting it can be time-consuming and labour-intensive, so the available data sets are often low-resolution. We therefore apply six machine learning regression methods to SSD super-resolution, i.e. recovering high-resolution data sets by exploiting self-similarity in low-resolution data sets. Furthermore, we propose a novel Long Short-Term Memory (LSTM) network, the Interaction Encoded LSTM (IELSTM) network, which is capable of handling multiple distant interactions among sequences. On four genomic data sets, the IELSTM network generally shows better overall reconstruction quality than ridge regression, LASSO regression, orthogonal matching pursuit regression, multilayer perceptron regression, and random forest regression.

