Studies of fluctuations and correlations of soft hadrons and hard and electromagnetic probes of the dense and strongly interacting medium require event-by-event hydrodynamic simulations of high-energy heavy-ion collisions that are computing intensive. We develop a (3+1)D viscous hydrodynamic model -- CLVisc that is parallelized on Graphics Processing Unit (GPU) using Open Computing Language (OpenCL) with 60 times performance increase for space-time evolution and more than 120 times for the Cooper-Frye particlization relative to that without GPU parallelization. The pseudo-rapidity dependence of anisotropic flow $v_n(\eta)$ are then computed in CLVisc with initial conditions given by the A Multi-Phase Transport (AMPT) model, with energy density fluctuations both in the transverse plane and along the longitudinal direction. Although the magnitude of $v_n(\eta)$ and the ratios between $v_2(\eta)$ and $v_3(\eta)$ are sensitive to the effective shear viscosity over entropy density ratio $\eta_v/s$, the shape of the $v_{n}(\eta)$ distributions in $\eta$ do not depend on the value of $\eta_v/s$. The decorrelation of $v_n$ along the pseudo-rapidity direction due to the twist and fluctuation of the event-planes in the initial parton density distributions is also studied. The decorrelation observable $r_n(\eta^a, \eta^b)$ between $v_n\{-\eta^a\}$ and $v_n\{\eta^a\}$ with the auxiliary reference window $\eta^b$ is found not sensitive to $\eta_v/s$ when there is no initial fluid velocity. For small $\eta_v/s$, the initial fluid velocity from mini-jet partons introduces sizable splitting of $r_n(\eta^a, \eta^b)$ between the two reference rapidity windows $\eta^b \in [3, 4]$ and $\eta^b \in [4.4, 5.0]$, as has been observed in experiment. The implementation of CLVisc and guidelines on how to efficiently parallelize scientific programs on GPUs are also provided.