Abstract. The Station for Measuring Ecosystem–Atmosphere Relations (SMEAR) II, located within the boreal forest of Finland, is unique worldwide for its wide range of long-term measurements tracking the Earth–atmosphere interface. In this study, we characterize the composition of organic aerosol (OA) at SMEAR II by quantifying its driving constituents. We utilize a multi-year data set of OA mass spectra measured in situ with an Aerosol Chemical Speciation Monitor (ACSM) at the station. To our knowledge, this mass spectral time series is the longest of its kind published to date. As in previously reported OA source apportionment efforts based on multi-seasonal or multi-annual data sets, we approached the OA characterization through positive matrix factorization (PMF) with a rolling window. However, the existing methods for extracting minor OA components were found to be insufficient for our rather remote site. To overcome this issue, we tested a new statistical analysis framework. This included unsupervised feature extraction and classification stages to explore a large number of unconstrained PMF runs conducted on the measured OA mass spectra. Anchored by these results, we finally constructed a relaxed chemical mass balance (CMB) run that resolved different OA components from our observations. The presented combination of statistical tools provided a data-driven analysis methodology, which in our case achieved robust solutions with minimal subjectivity. Following the extensive statistical analyses, we were able to divide the 2012–2019 SMEAR II OA data (mass concentration interquartile range (IQR): 0.7, 1.3, and 2.6 µg m−3) into three sub-categories – low-volatility oxygenated OA (LV-OOA), semi-volatile oxygenated OA (SV-OOA), and primary OA (POA) – showing that the tested methodology was able to provide results consistent with the literature.
LV-OOA was the dominant OA type (organic mass fraction IQR: 49 %, 62 %, and 73 %). The seasonal cycle of LV-OOA was bimodal, with peaks both in summer and in February. We associated the wintertime LV-OOA with anthropogenic sources and assumed biogenic influence in LV-OOA formation in summer. Through a brief trajectory analysis, we estimated summertime natural LV-OOA formation of tens of ng m−3 h−1 over the boreal forest. SV-OOA was the second highest contributor to OA mass (organic mass fraction IQR: 19 %, 31 %, and 43 %). Given the clear summer peak of SV-OOA, we consider biogenic processes to be the main drivers of its formation. Unlike for LV-OOA, the highest SV-OOA concentrations were detected in stable summertime nocturnal surface layers. Two nearby sawmills also played a significant role in SV-OOA production, as previous studies at SMEAR II have also shown. POA, taken as a mix of two different OA types reported previously, hydrocarbon-like OA (HOA) and biomass burning OA (BBOA), made up a minor OA mass fraction (IQR: 2 %, 6 %, and 13 %). Notably, the quantification of POA at SMEAR II using ACSM data was not possible following existing rolling PMF methodologies. Both POA organic mass fraction and mass concentration peaked in winter. The appearance of POA at SMEAR II was linked to strong southerly winds. No similar dependence on wind direction and speed was observed for the other OA types. The high wind speeds probably enabled POA transport to SMEAR II from faraway sources in a relatively fresh state. At lower wind speeds, POA likely evaporated and/or aged into oxidized organic aerosol before detection. The POA organic mass fraction was significantly lower than that reported from aerosol mass spectrometer (AMS) measurements made 2 to 4 years before the ACSM measurements.
While the co-located long-term measurements of black carbon supported the hypothesis of higher POA loadings before 2012, it is also possible that short-term POA pollution plumes were averaged out due to the low time resolution of the ACSM combined with the further 3 h data averaging needed to ensure good signal-to-noise ratios (SNRs). Despite the length of the ACSM data set, we did not focus on quantifying long-term trends of POA (or of the other components) due to the high sensitivity of OA composition to meteorological anomalies, the occurrence of which is likely not normally distributed over the 8-year measurement period. Given the distinct and realistic seasonal cycles and meteorological dependences of the independent OA subtypes, complemented by the reasonably low degree of unexplained OA variability, we believe that the presented data analysis approach performs well. Therefore, we hope that these results also encourage other researchers with multi-year time series of similar data to tackle the data analysis via similar semi- or unsupervised machine-learning approaches. In this way, the presented method could be further optimized and its usability explored and evaluated in other environments as well.
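To illustrate the rolling-window idea behind the factorization described above, the following is a minimal sketch and not the analysis pipeline used in this study. It assumes a non-negative data matrix `X` (time steps by m/z channels) with per-element uncertainties `sigma`, and it substitutes a simple uncertainty-weighted non-negative matrix factorization with multiplicative updates for the actual PMF solver. The function names, window length, step, and synthetic data are all hypothetical choices made for illustration.

```python
import numpy as np

def weighted_nmf(X, sigma, n_factors, n_iter=200, seed=0):
    """Approximately minimise Q = sum(((X - G @ F) / sigma)**2) with G, F >= 0.

    Stand-in for PMF: uses multiplicative updates, which require X >= 0.
    """
    rng = np.random.default_rng(seed)
    n_rows, n_cols = X.shape
    W = 1.0 / sigma**2                      # weights from measurement uncertainty
    G = rng.random((n_rows, n_factors)) + 0.1   # factor time series
    F = rng.random((n_factors, n_cols)) + 0.1   # factor profiles (spectra)
    eps = 1e-12                             # guards against division by zero
    WX = W * X
    for _ in range(n_iter):
        G *= (WX @ F.T) / ((W * (G @ F)) @ F.T + eps)
        F *= (G.T @ WX) / (G.T @ (W * (G @ F)) + eps)
    q = float(np.sum(W * (X - G @ F) ** 2))
    return G, F, q

def rolling_windows(n_rows, window, step):
    """Yield (start, stop) row indices of overlapping windows."""
    for start in range(0, n_rows - window + 1, step):
        yield start, start + window

def rolling_factorization(X, sigma, n_factors, window, step):
    """Run one independent factorization per window and collect the results."""
    results = []
    for start, stop in rolling_windows(X.shape[0], window, step):
        G, F, q = weighted_nmf(X[start:stop], sigma[start:stop], n_factors)
        results.append({"start": start, "stop": stop, "G": G, "F": F, "Q": q})
    return results

# Synthetic example (hypothetical data): 120 time steps, 20 channels, 3 factors.
rng = np.random.default_rng(1)
X = rng.random((120, 3)) @ rng.random((3, 20))
sigma = np.full_like(X, 0.05)
runs = rolling_factorization(X, sigma, n_factors=3, window=60, step=30)
```

Each window yields its own set of factor profiles, so profiles can vary over time. Factors from different windows are not automatically matched to one another; in this study that is handled by the feature extraction and classification stages, which the sketch does not attempt to reproduce.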