Over the last decade, the number of videos uploaded and shared through web-based multimedia platforms and mobile applications has grown rapidly worldwide, largely because cloud-based applications such as iCloud, YouTube, Facebook, Twitter, and WhatsApp offer affordable and secure environments for storing and sharing videos. However, this growth has created new challenges for forensic analysts and investigators, since videos can be used to commit serious crimes such as blackmail, fraud, and forgery. Source Camera Identification (SCI) has therefore become a central task in image and video forensics. The closely related task of camera model identification can help identify perpetrators or narrow down a search, and can also be used to enhance SCI systems. Existing solutions fall into two main categories: Photo Response Non-Uniformity (PRNU)-based methods, and machine learning techniques such as support vector machines (SVMs) and deep learning models. This work combines the two categories in a hierarchical deep learning model for camera model identification from smartphone videos, in which PRNU features are extracted by CNN-based structures during training. The proposed network has six streams that extract both low-level and high-level features, and these streams are combined by a fusion layer based on joint sparse representation, for which dedicated forward and backward functions are defined. The approach was implemented and evaluated through extensive experiments, which show an average frame-level camera model identification accuracy of 69.9% on the Daxing dataset and 81.6% on the QUFVD dataset.
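The PRNU-based category mentioned above can be illustrated with a minimal sketch of the classical, non-learned pipeline. It assumes the common sensor model in which a frame is approximately the scene content multiplied by (1 + K), where K is the multiplicative PRNU pattern, plus random noise. It estimates K from the noise residuals of several frames and compares fingerprints by normalized cross-correlation. To keep the sketch self-contained, a 3x3 box filter stands in for the wavelet denoiser typically used in PRNU work. All helper names (`denoise_box3`, `noise_residual`, `estimate_prnu`, `ncc`) are hypothetical. This is not the six-stream network or the sparse-representation fusion layer proposed in this work.

```python
import numpy as np


def denoise_box3(img: np.ndarray) -> np.ndarray:
    """Stand-in denoiser (assumption): 3x3 mean filter with edge replication."""
    padded = np.pad(img, 1, mode="edge")
    h, w = img.shape
    out = np.zeros((h, w), dtype=np.float64)
    for dy in range(3):
        for dx in range(3):
            out += padded[dy:dy + h, dx:dx + w]
    return out / 9.0


def noise_residual(frame: np.ndarray) -> np.ndarray:
    """Noise residual W = I - F(I), where F is the denoiser above."""
    frame = frame.astype(np.float64)
    return frame - denoise_box3(frame)


def estimate_prnu(frames: list[np.ndarray]) -> np.ndarray:
    """Estimate K as sum(W_i * I_i) / sum(I_i ** 2) over grayscale frames."""
    num = np.zeros(frames[0].shape, dtype=np.float64)
    den = np.zeros(frames[0].shape, dtype=np.float64)
    for f in frames:
        f = f.astype(np.float64)
        num += noise_residual(f) * f
        den += f * f
    return num / np.maximum(den, 1e-12)  # guard against all-black pixels


def ncc(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized cross-correlation between two equally sized arrays."""
    a = a - a.mean()
    b = b - b.mean()
    return float((a * b).sum() / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))


if __name__ == "__main__":
    # Synthetic demo: flat frames of varying brightness from one simulated sensor.
    rng = np.random.default_rng(0)
    K = 0.02 * rng.standard_normal((64, 64))        # simulated camera fingerprint
    K_other = 0.02 * rng.standard_normal((64, 64))  # fingerprint of another camera
    frames = [c * (1.0 + K) + rng.normal(0.0, 1.0, K.shape) for c in range(60, 200, 14)]
    est = estimate_prnu(frames)
    print("same camera:", ncc(est, K), "other camera:", ncc(est, K_other))
```

In the demo, the estimated fingerprint correlates strongly with the simulated sensor pattern and only weakly with an unrelated one. Attributing a query video then amounts to correlating its frame residuals with each stored fingerprint. Normalized cross-correlation is used here for simplicity; the PRNU literature often uses peak-to-correlation energy instead.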