Abstract. Observations of aerosol and trace gases in the remote troposphere are vital to quantify background concentrations and identify long-term trends in atmospheric composition on large spatial scales. Measurements made at high altitude are often used to study free-tropospheric air; however, such high-altitude sites can be influenced by boundary layer air masses. Thus, accurate information on air mass origin and transport pathways to high-altitude sites is required. Here we present a new method, based on the source–receptor relationship (SRR) obtained from backwards WRF-FLEXPART simulations and a k-means clustering approach, to identify source regions of air masses arriving at measurement sites. Our method is tailored to areas of complex terrain and to stations influenced by both local and long-range sources. We have applied this method to the Chacaltaya (CHC) GAW station (5240 m a.s.l.; 16.35° S, 68.13° W) for the 6-month duration of the “Southern Hemisphere high-altitude experiment on particle nucleation and growth” (SALTENA) to identify where sampled air masses originate and to quantify the influence of the surface and the free troposphere. A key aspect of our method is that it is probabilistic: for each observation time, more than one air mass (cluster) can influence the station, and the percentage influence of each air mass can be quantified. This is in contrast to binary methods, which label each observation time as influenced by either boundary layer or free-troposphere air masses. Air sampled at CHC is a mix of air masses of different provenance. We find that on average 9 % of the air, at any given observation time, has been in contact with the surface within 4 d prior to arriving at CHC. Furthermore, 24 % of the air has been located within the first 1.5 km above ground level (surface included). Consequently, 76 % of the air sampled at CHC originates from the free troposphere.
However, pure free-tropospheric influences are rare, and samples are often influenced concurrently by both boundary layer and free-tropospheric air masses. A clear diurnal cycle is present, with very few air masses that have been in contact with the surface being detected at night. The 6-month analysis also shows that the dominant air mass (cluster) originates in the Amazon and is responsible for 29 % of the sampled air. Furthermore, the influence of short-range clusters (origins within 100 km of CHC) varies at high temporal frequency, modulated by local meteorology driven by the diurnal cycle, whereas the variability of the mid- and long-range clusters (>200 km) occurs on timescales governed by synoptic-scale dynamics. To verify the reliability of our method, in situ sulfate observations from CHC are combined with the SRR clusters to correctly identify the previously known source of the sulfate: the Sabancaya volcano, located 400 km north-west of the station.
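The probabilistic attribution described above can be illustrated with a minimal sketch. It is not the implementation used in this study. It assumes that the SRR for one observation time is available as a flat list of residence-time values, one per grid cell, and that each cell already carries a cluster label (for example from a k-means step, which is not reproduced here) and a height above ground level. The function names, the toy values, and the data layout are hypothetical and chosen only for illustration.

```python
def cluster_influence(srr, labels, n_clusters):
    """Percentage of the total SRR attributed to each cluster at one observation time."""
    totals = [0.0] * n_clusters
    for value, label in zip(srr, labels):
        totals[label] += value
    grand_total = sum(totals)
    if grand_total == 0:
        # No residence time recorded: no cluster influences the station.
        return [0.0] * n_clusters
    return [100.0 * t / grand_total for t in totals]


def layer_influence(srr, heights_agl_m, top_m=1500.0):
    """Percentage of the total SRR located at or below top_m above ground level."""
    total = sum(srr)
    if total == 0:
        return 0.0
    below = sum(v for v, h in zip(srr, heights_agl_m) if h <= top_m)
    return 100.0 * below / total


# Toy example (hypothetical values): four grid cells, three clusters.
srr = [2.0, 1.0, 3.0, 4.0]                 # residence time per cell
labels = [0, 0, 1, 2]                      # assumed cluster label per cell
heights = [200.0, 1200.0, 3000.0, 6000.0]  # cell height above ground (m)

influence = cluster_influence(srr, labels, n_clusters=3)
low_level = layer_influence(srr, heights)  # share within the lowest 1.5 km
free_trop = 100.0 - low_level              # remainder assigned to the free troposphere
```

In contrast to a binary label, every observation time receives a full vector of percentages, so concurrent influence from several clusters, and from both the lowest 1.5 km and the free troposphere, remains visible.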