Forest managers and nature conservationists rely on precise mapping of single trees from remote sensing data for efficient estimation of forest attributes. In recent years, the quantification of dead wood in particular has attracted growing interest. However, tree-level approaches utilizing segmented single trees are still limited in accuracy, and their application is therefore mostly restricted to research studies. Furthermore, the combined classification of presegmented single trees with respect to tree species and health status is important for practical use but has so far received little attention. Therefore, we introduce Silvi-Net, an approach based on convolutional neural networks (CNNs) that fuses airborne lidar data and multispectral (MS) images for 3D object classification. First, we segment single 3D trees from the lidar point cloud, render multiple silhouette-like side-view images, and enrich them with calibrated laser echo characteristics. Second, the projected outlines of the segmented trees are used to crop and mask the MS orthomosaic, yielding an MS image patch for each tree. Third, we independently train two ResNet-18 networks to learn meaningful features from the two datasets. This optimization process starts from pretrained CNN weights and recursively retrains the model parameters. Finally, the extracted features are fused and classified using a standard multilayer perceptron and majority voting. We analyzed the network's performance on data captured in two study areas, the Chernobyl Exclusion Zone (ChEZ) and the Bavarian Forest National Park (BFNP). For both study areas, the lidar point density was approximately 55 points/m², and the true orthophotos had ground sampling distances of 10 cm (ChEZ) and 20 cm (BFNP).
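The final fusion step can be illustrated with a small, dependency-free sketch. It assumes (this is not stated in the abstract) that each rendered side view yields one feature vector from the lidar ResNet-18, that this vector is concatenated with the tree's MS feature vector, that a one-hidden-layer MLP classifies each fused vector, and that the per-view predictions are combined by majority vote. The function names (`classify_tree`, `mlp_predict`, `random_params`) are hypothetical, feature extraction itself is omitted, and the weights are random placeholders rather than trained parameters.

```python
import random
from collections import Counter

# Class labels for the ChEZ study area, as listed in the abstract.
CLASSES = ["pine", "birch", "alder", "dead"]


def linear(x, weights, bias):
    """Dense layer: weights is a list of rows (out_dim x in_dim)."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(v):
    """Element-wise rectified linear unit."""
    return [max(0.0, x) for x in v]


def mlp_predict(features, params):
    """Hypothetical one-hidden-layer MLP; returns the index of the highest logit."""
    w1, b1, w2, b2 = params
    logits = linear(relu(linear(features, w1, b1)), w2, b2)
    return max(range(len(logits)), key=logits.__getitem__)


def classify_tree(view_features, ms_features, params):
    """Assumed late fusion: concatenate each side-view feature vector with the
    tree's MS feature vector, classify each fused vector, then majority-vote."""
    votes = [mlp_predict(v + ms_features, params) for v in view_features]
    # Counter.most_common orders equal counts by first occurrence,
    # so a tie is resolved in favor of the earliest view's prediction.
    winner, _ = Counter(votes).most_common(1)[0]
    return winner, votes


def random_params(in_dim, hidden_dim, n_classes, rng):
    """Untrained placeholder weights, only to make the sketch executable."""
    def mat(rows, cols):
        return [[rng.gauss(0.0, 0.1) for _ in range(cols)] for _ in range(rows)]
    return (mat(hidden_dim, in_dim), [0.0] * hidden_dim,
            mat(n_classes, hidden_dim), [0.0] * n_classes)


if __name__ == "__main__":
    rng = random.Random(0)
    dim = 8  # small stand-in for the 512-d ResNet-18 embedding
    params = random_params(2 * dim, 16, len(CLASSES), rng)
    views = [[rng.random() for _ in range(dim)] for _ in range(5)]  # 5 side views
    ms = [rng.random() for _ in range(dim)]  # one MS patch per tree
    label, votes = classify_tree(views, ms, params)
    print(CLASSES[label], votes)
```

Voting over several rendered views means a single ambiguous silhouette does not decide the label on its own; whether the original method fuses per view or pools the views before fusion is not specified here.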
In general, the trained models showed high generalization capacity on independent test data, achieving an overall accuracy (OA) of 96.1% for the classification of pines, birches, alders, and dead trees (ChEZ) and 91.5% for coniferous trees, deciduous trees, snags, and dead trees (BFNP). Interestingly, adding the lidar-based imagery increased the OA by 2.5% (ChEZ) and 5.9% (BFNP) compared with experiments using MS imagery only. Moreover, Silvi-Net outperformed the baseline method PointNet++ in OA by 11.3% (ChEZ) and 2.2% (BFNP). Overall, the effectiveness of our approach was demonstrated on 2D and 3D datasets from two natural forest areas (400–530 trees/ha), acquired with different sensor models and at varying geometric and spectral resolutions. Through transfer learning, Silvi-Net converges quickly, even on datasets with a reduced number of samples. Consequently, operators can generate reliable maps, which are of major importance for applications such as automated inventory and monitoring projects.