Volcanic tremors are often observed during periods of volcanic activity and eruptions, and their generation processes provide clues for understanding subsurface volcanic fluid activity and eruption dynamics. However, tremors are characterized by continuous oscillations that mask P- and S-wave arrivals; hence, few studies have precisely located the tremor source, which is the most fundamental information for understanding the generation mechanism. In this study, we focus on volcanic tremors excited by continuous gas emissions at a vent called Y2a at Iwo-Yama in the Kirishima Volcanic Complex, Japan, to clarify the source process of the tremor as well as the gas emission activity. We simultaneously observed the volcanic tremor, using a small-aperture array of six seismometers, and the gas emission activity, using a newly developed visual IoT system that can be operated without commercial electricity. Multiple signal classification (MUSIC) analysis locates the tremor sources at depths ranging from the ground surface to approximately 200 m beneath the Y2a and Y2b vents, which are approximately 30 m apart, over approximately four months from November 2021 to February 2022. The source locations of the tremors in the 2 Hz (1.2–2.6 Hz), 4 Hz (3–4 Hz), and 5 Hz (4–5.5 Hz) bands differ somewhat from one another and change with time. The source tends to be located deeper when the 2 Hz amplitude is large. When the 2 Hz amplitude is small, infrasound generated by the gas emission activity dominates the tremor signals, as recognized from a wave propagation velocity that matches the acoustic velocity of 330 m/s. The visual IoT system succeeded in detecting long-term changes in the gas emission activity, and we found that the 2 Hz tremor amplitude was well correlated with the amount of hot water in the boiling pool of Y2a, which was controlled by precipitation and by evaporation on non-rainy days.
From these observations, we infer that the volcanic tremor is generated by resonance of volcanic gas and hot water in a crack-like structure beneath Y2a. The resonance is triggered by the counterforces of the gas emissions in the boiling pool, and infrasound is dominant during periods when the hot water in the boiling pool is depleted. Temporal changes in the source depths may be caused by changes in the fluid properties, the configuration of the resonator, and/or the strengths of the underground sources and the infrasound. Our simultaneous observations with the seismic array and the visual IoT system clarify that even continuous gas emission activity that appears stable is controlled by external factors such as precipitation.
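
As an illustration of how a propagation velocity near 330 m/s can be recognized with a small array, the following minimal sketch fits a plane wave to arrival times at six stations and returns the apparent velocity and back azimuth. It is a simple least-squares slowness fit, not the MUSIC analysis used in this study. The station coordinates, the synthetic arrival times, and the function name `fit_plane_wave` are hypothetical and chosen only for demonstration.

```python
import math


def fit_plane_wave(coords, arrival_times):
    """Least-squares plane-wave fit across a small array.

    coords: list of (east, north) station positions in metres.
    arrival_times: list of arrival times in seconds, one per station.
    Returns (apparent_velocity_m_s, back_azimuth_deg_clockwise_from_north).
    """
    n = len(coords)
    # Remove the means so the unknown origin time drops out of the fit.
    mx = sum(x for x, _ in coords) / n
    my = sum(y for _, y in coords) / n
    mt = sum(arrival_times) / n
    xs = [x - mx for x, _ in coords]
    ys = [y - my for _, y in coords]
    ts = [t - mt for t in arrival_times]

    # Normal equations for the slowness vector (sx, sy): t = sx*x + sy*y.
    sxx = sum(x * x for x in xs)
    syy = sum(y * y for y in ys)
    sxy = sum(x * y for x, y in zip(xs, ys))
    bx = sum(x * t for x, t in zip(xs, ts))
    by = sum(y * t for y, t in zip(ys, ts))
    det = sxx * syy - sxy * sxy
    if abs(det) < 1e-12:
        raise ValueError("stations are collinear; slowness is not resolvable")
    sx = (syy * bx - sxy * by) / det
    sy = (sxx * by - sxy * bx) / det

    slowness = math.hypot(sx, sy)
    if slowness == 0.0:
        raise ValueError("zero slowness; apparent velocity is undefined")
    velocity = 1.0 / slowness
    # The slowness vector points along the propagation direction, so the
    # back azimuth (direction toward the source) is the opposite direction.
    back_azimuth = math.degrees(math.atan2(-sx, -sy)) % 360.0
    return velocity, back_azimuth


# Hypothetical six-station layout in metres (east, north); not the real array.
stations = [(0.0, 0.0), (20.0, 5.0), (-15.0, 18.0),
            (8.0, -22.0), (-25.0, -10.0), (30.0, 25.0)]

# Synthetic acoustic wave travelling toward azimuth 60 deg at 330 m/s.
azimuth = math.radians(60.0)
sx_true = math.sin(azimuth) / 330.0
sy_true = math.cos(azimuth) / 330.0
times = [10.0 + sx_true * x + sy_true * y for x, y in stations]

velocity, back_azimuth = fit_plane_wave(stations, times)
print(f"apparent velocity {velocity:.1f} m/s, back azimuth {back_azimuth:.1f} deg")
```

For this noise-free synthetic input, the fit should recover an apparent velocity of about 330 m/s and a back azimuth near 240°. With real tremor data, the arrival times would have to be replaced by inter-station delays measured from the continuous signal (for example, by cross-correlation), and an apparent velocity well above the acoustic value would point to seismic rather than airborne propagation.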