To overcome of the high cost of developing IoT (Internet of Things) services by vertically integrating devices and services, Open IoT has been developed to enable various IoT services to be developed by integrating horizontally separated devices and services. For Open IoT, we have proposed Tacit Computing technology to discover the devices that can provide the data users need on demand and use them dynamically. We have also proposed an automatic GPU (graphics processing unit) offloading method as an elementary technology of Tacit Computing. However, our GPU offloading method can improve only a limited number of applications because it only optimizes the extraction of parallelizable loop statements. Therefore, in this paper, to improve performances of more applications automatically, we propose an improved GPU offloading method with fewer data transfers between the CPU and GPU that can improve performance of many IoT applications. We evaluate our proposed GPU offloading method by applying it to Darknet and Fourier Transform, which are general large applications for CPU, and find that it can process them 3 times and 5 times as quickly as only using CPUs within 10-hour tuning time.