Most existing neural soft-object manipulation systems rely on differentiable simulation as the default physics engine. However, current explicit differentiable simulation solvers manipulate the soft body through a heuristic search. This search easily gets stuck in local minima in two cases: single-stage tasks where the initial state of the end effectors is sub-optimal, and multi-stage tasks that require complex hierarchical actions. Furthermore, existing deep soft-object manipulation systems do not consider soft multi-body dynamics and model the dynamic system with naive explicit visual cues (e.g., image frames). To address these challenges, we propose iDSoftor, an implicit (i.e., energy-based) differentiable simulator for soft-object manipulation that guides the differentiable physics solver to deform various soft objects. The key idea of our work is to integrate energy-based models into soft-object differentiable simulation. This lets the solver escape the local minima of common explicit (heuristic, transport-based) methods and stabilizes the gradients used to train differentiable simulators. On simple tasks, such as single-stage tasks, our method can find a suitable initial state, which we support with theoretical arguments. On complex multi-stage tasks with multimodal states, such as implicit visual cues, our method iteratively adjusts the manipulation trajectory according to energy priorities. To evaluate the feasibility of our method, we formulate several novel soft-object manipulation tasks and present preliminary results.
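To make the local-minimum argument concrete, the toy sketch below contrasts plain gradient descent from a heuristic starting point with descent from an energy-selected initial state. It is a hypothetical 1-D illustration, not the iDSoftor implementation. Everything in it is an assumption: `x` stands in for the end effector's initial position, `task_loss` stands in for a differentiable-simulation loss with a sub-optimal local minimum, and `energy` is a hand-written stand-in for a learned energy-based model.

```python
# Hypothetical 1-D toy, NOT the iDSoftor method: it only illustrates how an
# energy-based prior over initial states can steer gradient descent away
# from a local minimum that traps a heuristic starting point.


def task_loss(x: float) -> float:
    # Double well standing in for a simulation loss:
    # local minimum near x = 0.96, global minimum near x = -1.04.
    return (x * x - 1.0) ** 2 + 0.3 * x


def task_grad(x: float) -> float:
    # Analytic derivative of task_loss.
    return 4.0 * x * (x * x - 1.0) + 0.3


def energy(x: float) -> float:
    # Hypothetical stand-in for a trained energy-based model that assigns
    # low energy to promising initial configurations.
    return (x + 1.0) ** 2


def gradient_descent(x0: float, lr: float = 0.01, steps: int = 2000) -> float:
    # Plain explicit gradient descent on the task loss, starting from x0.
    x = x0
    for _ in range(steps):
        x -= lr * task_grad(x)
    return x


def energy_guided_init(candidates: list[float]) -> float:
    # Choose the candidate initial state with the lowest energy.
    return min(candidates, key=energy)


heuristic_start = 0.5
x_local = gradient_descent(heuristic_start)

candidates = [i / 10.0 for i in range(-20, 21)]  # grid over [-2, 2]
x_energy = gradient_descent(energy_guided_init(candidates))

print(f"heuristic start -> x={x_local:.3f}, loss={task_loss(x_local):.3f}")
print(f"energy-guided   -> x={x_energy:.3f}, loss={task_loss(x_energy):.3f}")
```

In this toy, the heuristic start at `x = 0.5` slides into the shallower well near `x = 0.96`. The energy-selected start at `x = -1.0` reaches the lower-loss well near `x = -1.04`. A real system would learn the energy from data rather than hard-code it.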