In this paper, we propose SelfSphNet, a self-supervised learning network that estimates the motion of an arbitrarily moving spherical camera without any labeled training data. Numerous learning-based methods for camera motion estimation have recently been proposed; however, most of them require an enormous amount of labeled training data, which is difficult to acquire experimentally. To solve this problem, SelfSphNet employs two loss functions to estimate the frame-to-frame camera motion, providing two supervision signals to the network from unlabeled training data alone. First, a 5 DoF epipolar angular loss, computed from the dense optical flow between spherical images, estimates the 5 DoF motion between two image frames. This loss exploits a unique property of spherical optical flow: its rotational and translational components can be decoupled by a derotation operation. This operation rests on the fact that spherical images can be rotated to any orientation without any loss of information, making it possible to “decouple” the dense optical flow between pairs of spherical images into a purely translational state. Next, a photometric reprojection loss estimates the full 6 DoF motion using a depth map generated from the decoupled optical flow. This minimization strategy enables our network to be optimized without any labeled training data. To confirm the effectiveness of the proposed approach, we conducted several experiments on camera motion and trajectory estimation, comparing SelfSphNet with a previous self-supervised learning approach, SfMLearner, and with a fully supervised approach that uses the same baseline network as SelfSphNet. Moreover, transfer learning to a new scene was conducted to verify that the proposed method can optimize the network with newly collected unlabeled data.
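
As a rough illustration of the derotation idea described above (a minimal sketch, not the authors' implementation), the snippet below undoes an estimated rotation on the bearing vectors of a spherical image so that the residual flow is purely translational. The equirectangular sampling, the function names, and the rotation convention (R maps frame-1 bearings to frame-2 bearings) are assumptions introduced here for clarity.

```python
import numpy as np

def bearing_grid(height, width):
    """Unit bearing vectors for every pixel of an equirectangular image (assumed layout)."""
    lon = (np.arange(width) + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (np.arange(height) + 0.5) / height * np.pi
    lon, lat = np.meshgrid(lon, lat)                       # (H, W) grids
    return np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)                # (H, W, 3)

def derotate(bearings_2, R):
    """Express frame-2 bearings in frame-1 orientation by undoing the rotation R.

    bearings_2 : (H, W, 3) unit bearings of matched points in frame 2
    R          : (3, 3) rotation assumed to take frame-1 bearings to frame-2 bearings
    After derotation, (derotated - bearings_1) is driven by translation only;
    for a pure rotation it vanishes, which is what makes the decoupling possible.
    """
    return bearings_2 @ R                                  # row-vector form of R.T @ p per pixel
```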