Early feed-forward neural methods for arbitrary image style transfer matched the encoded feature maps only up to their second-order statistics, i.e., the mean and variance (or covariance). Recent methods have begun to use feature statistics beyond second order, e.g., higher-order moments, slicing-based matching, and histogram matching, for better style representation. However, even the distribution matching of the latest methods neglects channel correlation and the ordering of duplicated values in the feature map. This results in insufficient style transfer or artifacts in the stylized images, e.g., unexpected color or pattern co-occurrences, colors or style patterns skewed to one side, and noise between background and foreground. In this work, we show how to design a feature transform layer that accounts for both channel correlation and duplicated values in the feature map, enabling exact feature distribution matching in style transfer. Our experimental results show that the stylized images obtained with our method are qualitatively more similar to the target style images without losing content clarity, and quantitatively superior to existing methods in all evaluated style measures, e.g., a 0.0311 lower sliced Wasserstein distance than the previous best method and the most favored choice in a user study with 25.54% preference.
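As background for the distribution-matching operations the abstract refers to, the following is a minimal NumPy sketch, not the paper's implementation, of the two baselines it builds on: second-order (mean/std) matching and sort-based exact per-channel matching. Function names are illustrative, and the feature maps are assumed to be flattened to the same shape (C, H*W).

```python
import numpy as np

def adain_match(content, style, eps=1e-6):
    """Second-order matching per channel (mean/std), as in early
    feed-forward methods. `content` and `style` have shape (C, HW)."""
    c_mean = content.mean(axis=1, keepdims=True)
    c_std = content.std(axis=1, keepdims=True) + eps
    s_mean = style.mean(axis=1, keepdims=True)
    s_std = style.std(axis=1, keepdims=True) + eps
    return (content - c_mean) / c_std * s_std + s_mean

def sort_match(content, style):
    """Exact per-channel distribution matching: each content value is
    replaced by the style value of the same rank within its channel.
    Assumes `content` and `style` have the same shape (C, HW)."""
    idx = np.argsort(content, axis=1)       # rank order of content values
    sorted_style = np.sort(style, axis=1)   # style values in ascending order
    matched = np.empty_like(content)
    np.put_along_axis(matched, idx, sorted_style, axis=1)
    return matched

# Toy usage: 4 channels, 64 spatial locations.
rng = np.random.default_rng(0)
content = rng.normal(size=(4, 64))
style = rng.normal(2.0, 3.0, size=(4, 64))
out = sort_match(content, style)
# Channel-wise value distributions of `out` now match `style` exactly,
# but ties in `content` are ordered arbitrarily by argsort, and each
# channel is matched independently of the others -- precisely the
# duplicated-value and channel-correlation gaps the abstract points out.
```

Note that the per-channel, rank-based matching above treats channels independently and breaks ties arbitrarily; the paper's proposed feature transform layer is motivated by exactly these two shortcomings.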