The aim of this study was to externally validate a hidden Markov model (HMM) that classifies gamma analysis results of in vivo electronic portal imaging device (EPID) measurements into categories of anatomical change for lung cancer patients. Additionally, the relationship between HMM classification and deviations in dose-volume histogram (DVH) metrics was evaluated. The HMM was developed at CHU de Québec (CHUQ) and trained on features extracted from gamma analysis maps of in vivo EPID measurements from 483 fractions (24 patients treated with three-dimensional conformal radiotherapy (3D-CRT) or intensity-modulated radiotherapy), using the EPID measurement of the first treatment fraction as reference. The model inputs were the average gamma value, the standard deviation of the gamma values, and the average of the highest 1% of gamma values, each averaged over all beams in a fraction. The HMM classified each fraction into one of three categories: no anatomical change (Category 1), some anatomical change with no clinical action needed (Category 2), and severe anatomical change requiring clinical action (Category 3). The external validation dataset consisted of EPID measurements from 263 fractions of 30 patients treated at Maastro with volumetric modulated arc therapy (VMAT) or hybrid plans (containing both static beams and VMAT arcs). Gamma analysis features were extracted in the same way as for the CHUQ dataset, using the EPID measurement of the first fraction as reference (γQ), and additionally using an EPID dose prediction as reference (γM). For Maastro patients, cone beam computed tomography (CBCT) scans and the image-guided radiotherapy (IGRT) classification of these images were available for each fraction. Contours were propagated from the planning CT to the CBCTs, and the dose was recalculated on the CBCTs using a Monte Carlo dose engine. DVH metrics for targets and organs at risk (OARs: lungs, heart, mediastinum, spinal cord, brachial plexus) were extracted for each fraction and compared to the planned dose.
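The three per-fraction model inputs can be illustrated with a short sketch. It assumes each beam (or arc) yields a 2D gamma map as a NumPy array, that pixels outside the evaluated region are marked NaN, and that "highest 1%" means the top 1% of valid pixels after sorting (rounded, with at least one pixel). The function name `fraction_features` is hypothetical and is not the CHUQ implementation.

```python
import numpy as np

def fraction_features(gamma_maps, top_fraction=0.01):
    """Hypothetical sketch: (mean gamma, std gamma, mean of highest 1%), averaged over beams.

    gamma_maps: list of 2D arrays, one gamma map per beam or arc in a fraction.
    Assumption: NaN pixels lie outside the evaluated region and are ignored.
    """
    per_beam = []
    for gmap in gamma_maps:
        values = np.asarray(gmap, dtype=float).ravel()
        values = values[~np.isnan(values)]  # keep only evaluated pixels
        # Number of pixels making up the highest 1% (assumed rounding, at least one pixel).
        n_top = max(1, round(top_fraction * values.size))
        top = np.sort(values)[-n_top:]
        per_beam.append((values.mean(), values.std(), top.mean()))
    # Average each of the three beam-level features over all beams of the fraction.
    return tuple(np.mean(per_beam, axis=0))
```

For example, one beam with a uniform gamma of 0.5 and one beam with values from 0.00 to 0.99 give an average gamma of 0.4975 and an average top-1% value of 0.745 for the fraction.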
HMM classification of the external validation set was compared to threshold classification based on the average gamma value alone (a surrogate for clinical classification at CHUQ), to IGRT classification as performed at Maastro, and to differences in DVH metrics extracted from 3D dose recalculations on the CBCTs. Relative to average gamma threshold classification, the HMM achieved an accuracy of 65.4% for γQ and 65.0% for γM. Compared with IGRT classification, the overall accuracy was 29.7% for γQ and 23.2% for γM; hence, HMM classification and IGRT classification of anatomical changes did not correspond. However, there was a trend towards larger deviations in DVH metrics for fractions classified into higher categories by the HMM for large OARs (lungs, heart, mediastinum), but not for the targets and small OARs (spinal cord, brachial plexus). The external validation shows that transferring the HMM for anatomical change classification to a different center is challenging but can still be valuable: the HMM trained at CHUQ cannot be used directly to classify anatomical changes in the Maastro data, but it may be usable in a different capacity, as an indicator of changes in the 3D dose based on two-dimensional EPID measurements.
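The comparison against a reference classification can be sketched as follows, assuming that "accuracy" is the proportion of fractions for which both methods assign the same category and that the reference (threshold or IGRT) has been mapped onto the same three categories. The cut-offs `t_low` and `t_high` and both function names are hypothetical; the actual CHUQ thresholds are not given here.

```python
import numpy as np

def threshold_classify(mean_gamma, t_low, t_high):
    """Map per-fraction average gamma to Category 1, 2, or 3.

    t_low and t_high are hypothetical cut-offs, not the clinical CHUQ values.
    """
    mean_gamma = np.asarray(mean_gamma, dtype=float)
    # Category 1 below t_low, Category 3 at or above t_high, Category 2 in between.
    return np.where(mean_gamma < t_low, 1, np.where(mean_gamma >= t_high, 3, 2))

def agreement(pred, ref, n_classes=3):
    """Overall accuracy and confusion matrix (rows: reference, columns: prediction)."""
    pred = np.asarray(pred)
    ref = np.asarray(ref)
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for r, p in zip(ref, pred):
        cm[r - 1, p - 1] += 1  # categories are 1-based
    accuracy = np.trace(cm) / cm.sum()  # fraction of matching categories
    return accuracy, cm
```

The confusion matrix is kept alongside the single accuracy figure because it shows whether disagreements come mainly from adjacent categories (for example 1 versus 2) or from the clinically relevant Category 3.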