Visual servoing driven by digital twins promises to make assembly and dispensing tasks in smart manufacturing more flexible and efficient. Its effective deployment depends on robust and accurate estimation of the vision-motion correlation. Network-based methods are frequently used in visual servoing to approximate the mapping between 2D image feature errors and 3D velocities, which can improve the accuracy and reliability of visual servoing systems and help exploit the capabilities of digital twin technology in smart manufacturing. However, obtaining sufficient training data for these methods is challenging, so improving model generalization to reduce data requirements is imperative. To address this issue, we propose a learning-based approach for estimating the Jacobian matrices of visual servoing that combines an extreme learning machine (ELM) with a differential evolution (DE) algorithm. In the first stage, the ELM approximates the pseudoinverse of the image Jacobian matrix, which addresses the problems associated with traditional visual servoing and is robust to external disturbances such as image noise and camera calibration errors. In the second stage, DE is used to select the input weights and hidden-layer biases and to determine the ELM's output weights. Experiments on a digital twin operating platform for a 4-DOF robot with an eye-in-hand configuration demonstrate better performance than classical visual servoing and traditional ELM-based visual servoing in various cases.
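The two stages can be illustrated with a minimal sketch, which is not the authors' implementation. It assumes a single-hidden-layer ELM with a tanh activation that maps image feature errors to velocity commands, and a basic DE/rand/1/bin loop that searches over the input weights and hidden biases. For each DE candidate the output weights are obtained in closed form with the Moore-Penrose pseudoinverse, a common evolutionary-ELM arrangement that may differ from the scheme used in the paper. All function names, hyperparameters, and the synthetic data (8 feature-error components, 4 velocity components) are hypothetical.

```python
import numpy as np

def elm_hidden(X, W, b):
    """Hidden-layer activations for inputs X (n, d), weights W (d, h), biases b (h,)."""
    return np.tanh(X @ W + b)

def elm_output_weights(X, Y, W, b):
    """Least-squares output weights via the Moore-Penrose pseudoinverse."""
    return np.linalg.pinv(elm_hidden(X, W, b)) @ Y

def elm_predict(X, W, b, beta):
    return elm_hidden(X, W, b) @ beta

def unpack(theta, d, h):
    """Split a flat DE candidate into input weights (d, h) and biases (h,)."""
    return theta[: d * h].reshape(d, h), theta[d * h:]

def fitness(theta, X_tr, Y_tr, X_val, Y_val, h):
    """Validation RMSE of the ELM defined by one DE candidate."""
    W, b = unpack(theta, X_tr.shape[1], h)
    beta = elm_output_weights(X_tr, Y_tr, W, b)
    err = elm_predict(X_val, W, b, beta) - Y_val
    return float(np.sqrt(np.mean(err ** 2)))

def de_elm(X_tr, Y_tr, X_val, Y_val, h=10, pop_size=10, gens=5,
           F=0.5, CR=0.9, seed=0):
    """DE/rand/1/bin search over ELM input weights and hidden biases."""
    rng = np.random.default_rng(seed)
    d = X_tr.shape[1]
    dim = d * h + h
    pop = rng.uniform(-1.0, 1.0, size=(pop_size, dim))
    cost = np.array([fitness(p, X_tr, Y_tr, X_val, Y_val, h) for p in pop])
    history = [float(cost.min())]
    for _ in range(gens):
        for i in range(pop_size):
            others = [j for j in range(pop_size) if j != i]
            a, b_, c = pop[rng.choice(others, size=3, replace=False)]
            mutant = a + F * (b_ - c)                 # differential mutation
            cross = rng.random(dim) < CR              # binomial crossover mask
            cross[rng.integers(dim)] = True           # keep at least one mutant gene
            trial = np.where(cross, mutant, pop[i])
            trial_cost = fitness(trial, X_tr, Y_tr, X_val, Y_val, h)
            if trial_cost <= cost[i]:                 # greedy selection
                pop[i], cost[i] = trial, trial_cost
        history.append(float(cost.min()))
    W, b = unpack(pop[np.argmin(cost)], d, h)
    beta = elm_output_weights(X_tr, Y_tr, W, b)
    return W, b, beta, history

def make_synthetic_data(n=200, seed=1):
    """Synthetic stand-in data (an assumption, not robot measurements)."""
    rng = np.random.default_rng(seed)
    A = rng.normal(size=(4, 8))                       # stand-in for a Jacobian pseudoinverse
    E = rng.uniform(-1.0, 1.0, size=(n, 8))           # image feature errors
    V = -0.5 * E @ A.T + 0.1 * np.sin(E @ A.T)        # velocity commands
    return E, V
```

A possible usage, again on the synthetic data only: call `E, V = make_synthetic_data()`, split the samples into training and validation parts, run `W, b, beta, history = de_elm(E[:150], V[:150], E[150:], V[150:])`, and then compute a velocity command for a new feature error with `elm_predict(e_new, W, b, beta)`. Because selection is greedy, the best validation error stored in `history` never increases from one generation to the next.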