Digital twins are increasingly useful in metaverse-related applications, but constructing them usually requires deep technical expertise and costly resources. This position paper presents an early prototype (see Figure 1) of a web-based digital twin authoring system that lets untrained users collaboratively build digital twin environments. The system explores combining photogrammetry with GAN-based machine learning models to support near-real-time collaboration between two kinds of users: capture-client users, who scan objects with common smartphone cameras, and editing-client users, who construct 3D scenes on thin-client devices.