Perceptual decisions involve a process that evolves over time until it reaches a decision boundary, and a central question is how this process unfolds. Recent psychophysical data indicate that the visual system extracts motion-axis information faster than motion-direction information (Kwon et al., 2015, J Vision). To understand the underlying mechanisms, we developed a biophysically realistic cortical network model of decision making, generalizing the two-variable reduced spiking neural network (Wong & Wang, 2006, J Neuroscience) to four variables. The network input is based on motion energy (Adelson & Bergen, 1985, JOSA A) and the temporal profile of surround influence (Tadin et al., 2006, J Neuroscience). The model reproduces the prior experimental finding that the motion axis is extracted before the motion direction, and it reveals that the axis-wise inhibitory connections between the direction-selective neural populations are stronger than the direction-wise inhibitory connections. We further designed a recurrent deep neural network to validate this connectivity pattern. Our model provides a quantitative explanation for the temporal evolution of motion direction judgments: spatiotemporal filtering for visual motion integration, center-surround antagonism, and stronger axis-wise inhibition between the selective populations together explain how the visual system can extract motion axis orientation before detecting motion direction.
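To make the connectivity asymmetry concrete, the following is a minimal sketch of a four-variable reduced attractor network in the style of the Wong & Wang (2006) mean-field model, with one population per motion direction (right, left, up, down). It assumes that "axis-wise" inhibition refers to connections between populations tuned to different motion axes and "direction-wise" inhibition to connections between opposite directions on the same axis; that reading, the parameter values, and the stimulus currents are illustrative assumptions of this sketch, not fitted values from the study.

```python
# Sketch: four-variable reduced attractor network (Wong & Wang 2006 style),
# extended to four direction-selective populations. All parameters are
# illustrative placeholders, not the values used in the reported model.
import numpy as np

def H(x, a=270.0, b=108.0, d=0.154):
    """Reduced mean-field transfer function: firing rate (Hz) vs. input current (nA)."""
    y = a * x - b
    return y / (1.0 - np.exp(-d * y))

# Population order: 0=right, 1=left, 2=up, 3=down.
# {right, left} share the horizontal axis; {up, down} share the vertical axis.
J_self = 0.26            # recurrent self-excitation (nA)
J_within_axis = 0.02     # "direction-wise" inhibition (opposite directions, same axis)
J_between_axes = 0.06    # "axis-wise" inhibition (populations tuned to different axes);
                         # drawn stronger here purely to illustrate the asymmetry
                         # described in the abstract (an assumption of this sketch)

J = np.full((4, 4), -J_between_axes)
J[0, 1] = J[1, 0] = -J_within_axis   # right <-> left
J[2, 3] = J[3, 2] = -J_within_axis   # up <-> down
np.fill_diagonal(J, J_self)

tau_s, gamma, dt = 0.1, 0.641, 1e-3   # NMDA time constant (s), coupling, step (s)
I0, sigma = 0.3255, 0.005             # background current (nA), noise amplitude

def simulate(I_stim, T=2.0, seed=0):
    """Integrate the four NMDA gating variables S for T seconds.

    I_stim: length-4 array of stimulus currents, e.g. a motion-energy read-out
    routed to each direction-selective population (placeholder values below)."""
    rng = np.random.default_rng(seed)
    S = np.full(4, 0.1)
    n_steps = int(T / dt)
    rates = np.zeros((n_steps, 4))
    for t in range(n_steps):
        x = J @ S + I0 + I_stim + sigma * rng.standard_normal(4)
        r = H(x)
        S = np.clip(S + dt * (-S / tau_s + (1.0 - S) * gamma * r), 0.0, 1.0)
        rates[t] = r
    return rates

# Rightward stimulus: strongest drive to "right", weaker to "left",
# least to the orthogonal axis (illustrative values).
rates = simulate(np.array([0.04, 0.02, 0.01, 0.01]))
axis_signal = rates[:, :2].sum(1) - rates[:, 2:].sum(1)   # horizontal vs. vertical
dir_signal = rates[:, 0] - rates[:, 1]                    # right vs. left
print("final axis signal:", axis_signal[-1], "final direction signal:", dir_signal[-1])
```

In this sketch, reading out the summed same-axis activity against the orthogonal axis gives an axis signal, and the difference between the two same-axis populations gives a direction signal; with the stronger between-axis inhibition assumed above, the axis signal tends to diverge from zero earlier than the direction signal, which is the qualitative pattern the abstract attributes to the full biophysical model.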