Classification of sheep behaviour from sequences of tri-axial accelerometer data has the potential to enhance sheep management. Sheep behaviour is inherently imbalanced (e.g., more ruminating than walking), which leads to poor classification of the minority activities even though these are important. Existing works have not addressed class imbalance and rely on traditional machine learning (ML) techniques, e.g., Random Forest (RF). We investigated Deep Learning (DL) models suited to sequential data, namely Long Short-Term Memory (LSTM) and Bidirectional LSTM (BLSTM), trained on imbalanced data. Two data sets were collected in normal grazing conditions using jaw-mounted and ear-mounted sensors. Novel to this study, data samples were labelled not only with typical single classes (e.g., walking) but also, depending on the observed behaviours, with compound classes (e.g., walking_grazing). The number of steps a sheep performed in each observed 10 s time window was also recorded and incorporated in the models. We designed several multi-class classification studies in which class imbalance was addressed using synthetic data. DL models outperformed traditional ML models, especially with augmented data (e.g., 4-Class + Steps: LSTM 88.0%, RF 82.5%). DL methods also generalised better to unseen sheep (F1-score: BLSTM 0.84, LSTM 0.83, RF 0.65). LSTM, BLSTM and RF all achieved sub-millisecond average inference time, making them suitable for real-time applications. The results demonstrate the effectiveness of DL models for sheep behaviour classification in grazing conditions and show that DL techniques can generalise across different sheep. The study provides a strong foundation for the development of such models for real-time animal monitoring.