The aim of this study was to test the feasibility and reliability of the Animal Welfare Indicators (AWIN) protocol for welfare assessment of dairy goats when applied to semi-extensive farming conditions. We recruited 13 farms located in the NW Italian Alps, where three assessors independently applied a modified version of the AWIN welfare assessment protocol for goats, integrated with some indicators derived from the AWIN welfare assessment protocol for sheep. The applied protocol consisted of nine individual-level animal-based indicators (body condition score, hair coat condition, abscesses, overgrown claws, udder asymmetry, fecal soiling, nasal discharge, ocular discharge, and improper disbudding) and seven group-level animal-based indicators (severe lameness, Qualitative Behavior Assessment (QBA), thermal stress, oblivion, Familiar Human Approach Test (FHAT), synchrony at grazing, and synchrony at resting). On most farms, the level of welfare was good. Many of the considered welfare problems (overgrown claws, fecal soiling, discharges, and thermal stress) were never recorded. However, oblivion, severe lameness, poor hair coat condition, and abscesses were detected on some farms, with percentages ranging from 5 to 35%. The mean percentage of animals with normal body condition was 67.9 ± 5.7%. The level of synchronization during resting was low on average (14.3 ± 7.2%). Applying the whole protocol required more than 4 h/farm and 3 min/goat. Inter-observer reliability ranged from excellent (udder asymmetry, overgrown claws, discharges, synchrony at resting, use of shelter) to acceptable (abscesses, fecal soiling, and oblivion), but it was insufficient for hair coat condition, improper disbudding, synchrony at grazing, and QBA.
Differences in the assessors' backgrounds and feasibility constraints (i.e., use of binoculars in unfenced pastures, individual-level assessment conducted during the morning milking in narrow and dark pens, and difficulties in using the scan and instantaneous sampling method because many animals moved at the same time) can affect the reliability of data collection. Extensive training seems necessary to score animals properly when applying the QBA, whereas the FHAT seems promising for evaluating the Human-Animal Relationship of goats at pasture but needs to be validated. Indicators that evaluate the synchrony of activities also need to be validated in order to identify the best time of day to perform the observations.