Assessment of individual factory workers' productivity and safety is increasingly demanded in the manufacturing industry. Although long-term accuracy in identifying and locating people is key for such applications, recent technologies such as vision-based person re-identification (Re-ID) and radio-based Real-Time Location Systems (RTLS) do not meet the requirements in open-world settings. In addition, vision-based person Re-ID may raise privacy concerns because it uses a person's appearance for identification. This article presents a sensor fusion framework that combines deep-learning-based person detection from vision with radio beacon ranging to meet the requirements of open-world settings while preserving privacy through an opt-out option. Its scalability is evaluated via Rank-1 person identification accuracy on images generated by a three-dimensional (3D) game engine, which enables hard-to-reproduce extreme scenarios. The representative Rank-1 accuracy is 88.7% with 4 people for low-latency applications (e.g., personalized geofencing) and 90.8% with 8 people for latency-tolerant applications (e.g., analysis of an individual worker's productivity).