Current trends in language modelling leverage large language models pre-trained on huge corpora of data to achieve state-of-the-art results on several NLP tasks. Humans, on the other hand, acquire language from a small amount of data using cognitive principles. Recently, a continual learning approach was proposed that uses compositionality to disentangle the syntax and semantics of an input sentence for downstream sequence-to-sequence tasks. In this work, we show how curriculum learning can be incorporated into this framework to improve performance. More specifically, we first show that using the model of interest with a reduced hidden size as the auxiliary model that generates the curriculum is not necessarily optimal, and second, we propose a novel variant of the one-best-score approach to curriculum learning in which a sequence-to-sequence model serves as the auxiliary model: its conditional probabilities of word predictions act as a proxy for difficulty and are used to construct the curriculum. Results on a variety of translation tasks demonstrate the superiority of the proposed approach over several baselines, improving sentence accuracy with respect to both knowledge transfer and catastrophic forgetting by a margin of at least 35% over the best-performing baseline on the English-French translation task.
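
As a rough illustration of the difficulty-based ordering described above (not the authors' implementation), the sketch below scores each training pair by the average negative conditional log-probability that an auxiliary sequence-to-sequence model assigns to the reference target words, then sorts pairs from easiest to hardest. The auxiliary-model interface `token_log_probs(src, tgt)` and the toy scorer are assumptions made purely for illustration.

```python
# Minimal sketch of difficulty-based curriculum construction, assuming an
# auxiliary seq2seq model that exposes per-token log-probabilities of the
# reference target given the source (a hypothetical interface).

from typing import Callable, List, Sequence, Tuple


def sentence_difficulty(
    token_log_probs: Callable[[Sequence[str], Sequence[str]], List[float]],
    src: Sequence[str],
    tgt: Sequence[str],
) -> float:
    """Proxy difficulty: the lower the auxiliary model's average
    log-probability of the target words, the harder the pair."""
    log_probs = token_log_probs(src, tgt)  # one log P(w_t | src, w_<t) per target word
    return -sum(log_probs) / max(len(log_probs), 1)


def build_curriculum(
    pairs: List[Tuple[Sequence[str], Sequence[str]]],
    token_log_probs: Callable[[Sequence[str], Sequence[str]], List[float]],
) -> List[Tuple[Sequence[str], Sequence[str]]]:
    """Order (source, target) pairs from easiest to hardest."""
    scored = [(sentence_difficulty(token_log_probs, s, t), (s, t)) for s, t in pairs]
    scored.sort(key=lambda item: item[0])
    return [pair for _, pair in scored]


if __name__ == "__main__":
    import math

    # Toy stand-in for the auxiliary model: pretend shorter targets are
    # predicted more confidently (uniform, length-dependent probabilities).
    def toy_log_probs(src, tgt):
        return [math.log(1.0 / (1 + len(tgt)))] * len(tgt)

    data = [
        ("the cat sleeps on the red sofa".split(),
         "le chat dort sur le canapé rouge".split()),
        ("the cat sleeps".split(), "le chat dort".split()),
    ]
    for src, tgt in build_curriculum(data, toy_log_probs):
        print(" ".join(tgt))  # shorter (easier) pair is scheduled first
```

In practice the scores would come from a trained auxiliary sequence-to-sequence model rather than the toy function shown here, and the sorted order would drive the sampling schedule during continual training.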