Network devices are increasingly equipped with multiple network interfaces, which makes it possible to improve throughput through multipath transport protocols, especially multipath TCP (MPTCP). However, the most widely used MPTCP protocols share a common limitation: they rely on rigid and conservative methods. They were designed with little consideration of the fact that real networks are dynamic and that network conditions change frequently, so current MPTCP performs poorly in many realistic scenarios. In this paper, we propose MP-OL, a lightweight multipath congestion control algorithm based on online learning. MP-OL models congestion control as a multi-armed bandit problem and adjusts the sending rate of each subflow flexibly and adaptively through online learning. As a result, MP-OL can adapt to a wide range of network scenarios and achieves fairness and high performance in dynamic network environments. It can also switch flexibly between online learning and a traditional method, which reduces computational complexity while preserving learning efficiency and makes MP-OL easy to deploy and use. Experimental results show that, compared with the leading MPTCP variants, MP-OL achieves significant improvements in fairness and link utilization, and shows better resilience to non-congestion loss and better adaptability to unstable network conditions. In real networks, MP-OL also achieves higher throughput.
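To make the bandit framing concrete, the following is a minimal sketch of rate selection as a multi-armed bandit, with one learner per subflow. It is not the MP-OL algorithm itself: the epsilon-greedy strategy, the candidate rates in `RATES`, the toy `utility` function, and all class and function names are assumptions introduced only for illustration.

```python
import random

# Hypothetical arms: candidate sending rates (Mbps) for one subflow.
RATES = (2.0, 4.0, 6.0, 8.0, 10.0)


def utility(rate, capacity, rng):
    """Toy reward (an assumption): delivered rate minus a penalty on excess rate."""
    delivered = min(rate, capacity)
    excess = max(0.0, rate - capacity)  # stands in for congestion loss
    noise = rng.uniform(-0.5, 0.5)      # bounded measurement noise
    return delivered - 5.0 * excess + noise


class SubflowBandit:
    """Epsilon-greedy bandit that learns which rate yields the best utility."""

    def __init__(self, epsilon, rng):
        self.epsilon = epsilon
        self.rng = rng
        self.counts = [0] * len(RATES)
        self.values = [0.0] * len(RATES)  # running mean reward per arm

    def choose(self):
        # Try every arm once before exploiting.
        for arm, count in enumerate(self.counts):
            if count == 0:
                return arm
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(RATES))  # explore
        return self.best_arm()                     # exploit

    def update(self, arm, reward):
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]

    def best_arm(self):
        return max(range(len(RATES)), key=lambda a: self.values[a])


def run(capacities, steps=200, epsilon=0.1, seed=0):
    """Run one bandit per subflow; return the rate each one settles on."""
    rng = random.Random(seed)
    bandits = [SubflowBandit(epsilon, rng) for _ in capacities]
    for _ in range(steps):
        for bandit, capacity in zip(bandits, capacities):
            arm = bandit.choose()
            bandit.update(arm, utility(RATES[arm], capacity, rng))
    return [RATES[b.best_arm()] for b in bandits]
```

With two simulated subflows whose capacities are 6 and 4 Mbps, `run([6.0, 4.0])` returns `[6.0, 4.0]`: each learner settles on the highest candidate rate that does not exceed its path capacity. In this toy setup the subflows learn independently, so the sketch says nothing about the coupling between subflows that fairness would require.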