As one of the important psychological stress reactions, micro-expressions (MEs) are spontaneous and subtle facial movements that usually occur in high-stakes situations and can reveal genuine human feelings and cognition. ME recognition (MER) has essential applications in many fields, such as lie detection, criminal investigation, and psychological healing. However, because discriminative ME features are hard to learn from fleeting, subtle facial reactions and available ME data are scarce, this research topic is still far from well studied. To this end, we propose a deep prototypical learning framework, namely ME-PLAN, with a local attention mechanism for the MER problem. Specifically, ME-PLAN consists of two components, i.e., a 3D residual prototypical network and a local-wise attention module: the former aims to learn precise ME feature prototypes through expression-related knowledge transfer and episodic training, and the latter directs attention to local facial movements. Furthermore, to reduce the dependence of most MER methods on manually annotated apex frames, we propose an apex frame spotting method with Unimodal Pattern Constrained (UPC) and further extract ME key-frame sequences based on the detected apex frames to train the proposed ME-PLAN in an end-to-end manner. Finally, through extensive experiments and interpretable analysis of apex frame spotting and MER on a composite database, we demonstrate the superiority and effectiveness of the proposed methods.
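To make the notion of a feature prototype concrete, the following is a minimal sketch of generic prototype-based classification within a single training episode, not the ME-PLAN implementation itself. It assumes that embeddings have already been produced by some feature extractor (here replaced by hand-written 2D vectors), that each class prototype is the mean of its support embeddings, and that a query is assigned to the nearest prototype under squared Euclidean distance. The function names `class_prototypes` and `classify`, the class labels, and the toy values are all hypothetical.

```python
import math


def class_prototypes(support):
    """Average the support embeddings of each class into one prototype."""
    prototypes = {}
    for label, embeddings in support.items():
        dim = len(embeddings[0])
        prototypes[label] = [
            sum(e[i] for e in embeddings) / len(embeddings) for i in range(dim)
        ]
    return prototypes


def classify(query, prototypes):
    """Return the nearest-prototype label and a softmax over negative distances."""
    dists = {
        label: sum((q - p) ** 2 for q, p in zip(query, proto))
        for label, proto in prototypes.items()
    }
    # Subtract the smallest distance before exponentiating for numerical stability.
    d_min = min(dists.values())
    exps = {label: math.exp(-(d - d_min)) for label, d in dists.items()}
    total = sum(exps.values())
    probs = {label: v / total for label, v in exps.items()}
    return min(dists, key=dists.get), probs


# Toy episode: hypothetical 2D embeddings standing in for learned ME features.
support = {
    "negative": [[0.0, 0.0], [0.2, 0.0]],
    "positive": [[1.0, 1.0], [1.0, 0.8]],
    "surprise": [[-1.0, 1.0], [-0.8, 1.0]],
}
prototypes = class_prototypes(support)
label, probs = classify([0.9, 1.0], prototypes)
```

In episodic training of this kind, the negative log of the probability assigned to the true class would typically serve as the loss for each query, so that the feature extractor is updated to pull same-class embeddings toward their prototype.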