We aim to reconstruct volumetric MRI from orthogonal cine acquisition aided by sparse priors from 2 static 3D MRI volumes through implicit neural representation (NeRP) learning, with the goals of eliminating the large-scale training datasets required by data-driven sparse MRI reconstruction and of supporting a clinical workflow of real-time 3D motion tracking during MR-guided radiotherapy.

A multilayer perceptron network was trained to learn the NeRP of a patient-specific MRI dataset, taking the 4D coordinates of voxel location and motion state as input and outputting the corresponding voxel intensity. Prior knowledge of patient breathing motion was first embedded into the network weights by learning, through optimization, the NeRP of 2 static 3D MRI volumes acquired at different breathing motion states. The prior knowledge was then augmented from 2 to 31 motion states by querying the optimized network at interpolated and extrapolated motion-state coordinates. Starting from the prior-augmented network as the initialization point, the network was further trained using sparse samples from 2 orthogonal cine slices. The final volumetric reconstruction was obtained by querying the trained network at the desired 3D spatial locations.

We evaluated the proposed method using 5-minute volumetric MRI time series with 340-ms temporal resolution collected from 7 patients with liver carcinoma. The time series were acquired with a golden-angle radial MRI sequence and reconstructed through retrospective sorting. Two MRI volumes, at inhale and exhale states respectively, were selected from the first 30 s of the time series for prior embedding and augmentation. The remaining 4.5-minute time series was used to evaluate volumetric reconstruction: each MRI volume was retrospectively subsampled to 2 orthogonal slices, and the network-reconstructed images were compared with the ground-truth images in terms of image quality and the capability to support 3D target motion tracking. Across the 7 patients evaluated, the peak signal-to-noise ratio between the model reconstruction and the ground truth was 54.66 ± 6.16 dB, and the structural similarity index measure was 0.99 ± 0.01. Gross tumor volume (GTV) contours estimated by deforming a reference-state MRI to the model-reconstructed and ground-truth MRI volumes showed good consistency: the 95th-percentile Hausdorff distance between GTV contours was 1.89 ± 1.13 mm, which is less than the voxel dimension, and the mean GTV centroid position difference between ground truth and model estimation was less than 1 mm in all 3 orthogonal directions.

Volumetric MRI reconstruction from orthogonal cine acquisition with sparse priors is feasible when prior knowledge is modeled through implicit neural representation learning. The model-reconstructed images showed sufficient accuracy to support 3D motion tracking of abdominal targets. By eliminating the need for large-scale training datasets, the method promises to enable clinical implementation of 3D motion tracking for precision radiation therapy.
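As a rough illustration of the coordinate network described above, the sketch below shows a multilayer perceptron that maps a 4D coordinate (3 spatial dimensions plus a motion-state coordinate) to a voxel intensity. The Fourier-feature encoding, layer width, and depth are assumptions made for illustration; the abstract does not specify these architectural details.

```python
import torch
import torch.nn as nn


class FourierFeatures(nn.Module):
    """Encode 4D coordinates with random Fourier features (an assumed encoding;
    the actual input encoding is not specified in this abstract)."""

    def __init__(self, in_dim=4, num_freqs=64, scale=10.0):
        super().__init__()
        self.register_buffer("B", torch.randn(in_dim, num_freqs) * scale)

    def forward(self, coords):
        proj = 2.0 * torch.pi * coords @ self.B
        return torch.cat([torch.sin(proj), torch.cos(proj)], dim=-1)


class NeRPMLP(nn.Module):
    """MLP mapping (x, y, z, motion state) -> voxel intensity; width and depth are illustrative."""

    def __init__(self, num_freqs=64, width=256, depth=8):
        super().__init__()
        self.encoding = FourierFeatures(num_freqs=num_freqs)
        layers, in_dim = [], 2 * num_freqs
        for _ in range(depth):
            layers += [nn.Linear(in_dim, width), nn.ReLU()]
            in_dim = width
        layers.append(nn.Linear(in_dim, 1))  # scalar intensity per coordinate
        self.mlp = nn.Sequential(*layers)

    def forward(self, coords):
        # coords: (N, 4) with entries normalized, e.g., to [-1, 1]
        return self.mlp(self.encoding(coords))
```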
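The prior-embedding and motion-state-augmentation steps could look roughly like the following, assuming voxel intensities are supervised with a mean-squared-error loss and that the two static volumes are assigned motion-state coordinates 0 (exhale) and 1 (inhale). The function names, iteration count, learning rate, and the interpolation/extrapolation range of motion-state coordinates are hypothetical choices, not values taken from the abstract.

```python
import torch
import torch.nn.functional as F


def embed_prior(model, coords, intensities, iters=2000, lr=1e-4):
    """Embed the prior by fitting the NeRP to the 2 static 3D MRI volumes.

    coords: (N, 4) rows of (x, y, z, state) pooled from both volumes, with
    state = 0 for exhale and 1 for inhale (an assumed convention).
    intensities: (N, 1) ground-truth voxel values. Minibatching over voxels
    is omitted for brevity; hyperparameters are illustrative.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.mse_loss(model(coords), intensities)
        loss.backward()
        opt.step()
    return model


def augment_motion_states(model, spatial_coords, num_states=31):
    """Expand the prior from 2 to 31 motion states by querying the optimized
    network at interpolated/extrapolated motion-state coordinates.

    The coordinate range below (slightly beyond [0, 1]) is an assumption used
    to illustrate extrapolation beyond the two embedded states.
    """
    states = torch.linspace(-0.2, 1.2, num_states)
    n = spatial_coords.shape[0]
    volumes = []
    with torch.no_grad():
        for s in states:
            state_col = torch.full((n, 1), float(s), device=spatial_coords.device)
            coords = torch.cat([spatial_coords, state_col], dim=-1)
            volumes.append(model(coords))
    return torch.stack(volumes)  # (num_states, N, 1) prior-augmented intensities
```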
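For each dynamic frame, fine-tuning from the prior-augmented initialization on the 2 orthogonal cine slices and then querying the full 3D grid could be sketched as below. The reported image-quality comparison uses the peak signal-to-noise ratio, shown here under the assumption of intensities normalized to [0, 1]; the iteration count and learning rate are again illustrative, and how the motion-state coordinate is assigned to a new frame is not specified in the abstract.

```python
import torch
import torch.nn.functional as F


def reconstruct_frame(model, cine_coords, cine_vals, query_coords, iters=200, lr=1e-4):
    """Fine-tune the prior-augmented network on sparse samples from 2 orthogonal
    cine slices, then query it at all desired 3D spatial locations.

    cine_coords: (M, 4) coordinates of the sampled cine voxels (the motion-state
    entry for a new frame is an assumption of this sketch).
    cine_vals: (M, 1) measured cine intensities.
    query_coords: (K, 4) coordinates of the full 3D grid to reconstruct.
    """
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    for _ in range(iters):
        opt.zero_grad()
        loss = F.mse_loss(model(cine_coords), cine_vals)
        loss.backward()
        opt.step()
    with torch.no_grad():
        volume = model(query_coords)  # volumetric reconstruction for this frame
    return volume


def psnr(recon, truth):
    """Peak signal-to-noise ratio in dB, assuming intensities normalized to [0, 1]."""
    mse = F.mse_loss(recon, truth)
    return 10.0 * torch.log10(1.0 / mse)
```

Because each frame starts from the prior-augmented weights, only the sparse cine samples are needed to adapt the network to the current motion state, which is the intent of the initialization strategy described above.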