In the era of Industry 4.0, factories around the world are becoming increasingly automated and intelligent, and smart cameras play an important role in this shift. A smart camera is equipped with a processor, memory, a communication interface, and an operating system, so it can preprocess large amounts of data on board to support subsequent automatic inspection and decision-making. Additionally, because a smart camera is an independent system, it does not affect a factory's existing systems, which is a major advantage in troubleshooting. Moreover, thanks to technological breakthroughs in recent years, using Graphics Processing Units (GPUs) to perform massively parallel computation can significantly boost overall efficiency. Therefore, as a growing number of factories seek to increase the production capacity of their production lines, how to use GPUs to support this improvement has become an important issue. Based on this scenario, this paper used the NVIDIA Tegra TX1 platform, which has 256 CUDA GPU cores and a quad-core ARM Cortex-A57 processor, together with a Basler USB 3.0 industrial camera, to simulate a smart industrial camera that has a GPU and can perform large numbers of complex computations. This paper designed a method to recognize and count objects in real time in a high-speed industrial inspection environment with large volumes of data, in order to verify the concept we propose: a smart camera with GPU cores. The experimental results confirmed our ideas, and the software architecture presented in this paper is simple and efficient. For future applications in the Internet of Things or the Internet of Everything, this architecture can serve as a valuable reference.