Coincidence measurement is an emerging technique for optical imaging. By measuring the second-order coherence g2, sample features such as reflection/transmission amplitude and phase delay can be extracted pixel by pixel using established algorithms. However, an accurate measurement of g2 requires a substantial number of collected photons, which is difficult to achieve under low-light conditions. Here, we propose a deep-learning approach for Jones matrix imaging that uses photon arrival data directly. A variational autoencoder (β-VAE) is trained on numerical data in an unsupervised manner to obtain a minimal data representation, which can be transformed into an image with little effort. We demonstrate that as few as 88 photons collected per pixel on average suffice to extract a Jones matrix image, with accuracy surpassing that of previous semi-analytic algorithms derived from g2. Our approach not only automates the formulation of imaging algorithms but can also assess whether a designed experimental procedure provides sufficient information, which can be useful for equipment and algorithm design across a wide range of imaging applications.
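To illustrate why a g2 measurement depends on photon count, the following minimal sketch estimates the zero-delay g2 from two binary detector click records using the standard normalized coincidence ratio. It is not the imaging algorithm proposed here; the function name g2_zero_delay and the representation of the data as one 0/1 entry per time bin are assumptions made for illustration only.

```python
import math


def g2_zero_delay(clicks_a, clicks_b):
    """Estimate g2 at zero delay from two binary click records.

    Assumption (illustrative): each record holds one 0/1 entry per time bin,
    and both records cover the same bins.
    """
    if len(clicks_a) != len(clicks_b) or len(clicks_a) == 0:
        raise ValueError("click records must be non-empty and of equal length")

    n_bins = len(clicks_a)
    singles_a = sum(1 for a in clicks_a if a)  # clicks on detector A
    singles_b = sum(1 for b in clicks_b if b)  # clicks on detector B
    # Coincidences: bins in which both detectors clicked.
    coincidences = sum(1 for a, b in zip(clicks_a, clicks_b) if a and b)

    if singles_a == 0 or singles_b == 0:
        return math.nan  # g2 is undefined without clicks on both detectors

    # Coincidence probability divided by the product of singles probabilities.
    return coincidences * n_bins / (singles_a * singles_b)


print(g2_zero_delay([1, 1, 0, 0], [1, 0, 1, 0]))  # prints 1.0
```

Because the numerator counts only coincident clicks, a handful of detected photons per pixel yields very few coincidences, so an estimate of this kind fluctuates strongly from one acquisition to the next.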