Abstract

This study aims to learn task-agnostic representations of narrative multimedia. Existing studies have focused only on the stories within narrative multimedia, without considering their physical features. We propose a method for incorporating multi-modal features of narrative multimedia into a unified vector representation. For narrative features, we embed character networks, as in existing studies. Textual features are represented with an LSTM (Long Short-Term Memory) autoencoder. We apply a convolutional autoencoder to visual features; the convolutional autoencoder can also be applied to spectrograms of audio features. To combine these features, we propose two methods: early fusion and late fusion. The early fusion method composes feature representations for each scene and then learns a representation of a narrative work by predicting time-sequential changes in the scene-level features. The late fusion method concatenates feature vectors that are trained over the whole narrative work. Finally, we apply the proposed methods to webtoons (i.e., comics that are serially published on the web). The proposed methods are evaluated by applying the learned vector representations to predicting users' preferences for the webtoons.
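To make the two fusion strategies concrete, the following PyTorch sketch shows how the encoder halves of the per-modality autoencoders (an LSTM encoder for text, a convolutional encoder for scene images or spectrograms) could feed an early-fusion sequence model and a late-fusion concatenation. All module names, dimensions, and the next-scene prediction objective are illustrative assumptions, not the authors' implementation.

```python
# Minimal sketch of early vs. late fusion over per-modality autoencoder features.
# Hypothetical names and dimensions; not the paper's actual code.
import torch
import torch.nn as nn

class TextEncoder(nn.Module):
    """Encoder half of an LSTM autoencoder: a scene's word embeddings -> one vector."""
    def __init__(self, emb_dim=128, hidden_dim=64):
        super().__init__()
        self.lstm = nn.LSTM(emb_dim, hidden_dim, batch_first=True)

    def forward(self, word_embs):               # (batch, n_words, emb_dim)
        _, (h, _) = self.lstm(word_embs)
        return h[-1]                             # (batch, hidden_dim)

class ImageEncoder(nn.Module):
    """Encoder half of a convolutional autoencoder: a scene image (or spectrogram) -> one vector."""
    def __init__(self, channels=3, out_dim=64):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(channels, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.fc = nn.Linear(32, out_dim)

    def forward(self, img):                      # (batch, channels, H, W)
        return self.fc(self.conv(img).flatten(1))

class EarlyFusion(nn.Module):
    """Early fusion: concatenate per-scene modality vectors, then model the scene
    sequence with an LSTM. The final hidden state is the work-level representation;
    training could predict the next scene's fused vector."""
    def __init__(self, scene_dim, work_dim=128):
        super().__init__()
        self.lstm = nn.LSTM(scene_dim, work_dim, batch_first=True)
        self.next_scene = nn.Linear(work_dim, scene_dim)  # predicts the next fused scene vector

    def forward(self, scene_vecs):               # (batch, n_scenes, scene_dim)
        out, (h, _) = self.lstm(scene_vecs)
        return h[-1], self.next_scene(out)        # work vector, per-step predictions

def late_fusion(work_level_vecs):
    """Late fusion: concatenate modality vectors each trained over the whole work."""
    return torch.cat(work_level_vecs, dim=-1)

# Toy usage: one work with 4 scenes, each scene having text and image features.
text_enc, img_enc = TextEncoder(), ImageEncoder()
scenes = []
for _ in range(4):
    t = text_enc(torch.randn(1, 20, 128))        # 20 word embeddings for the scene
    v = img_enc(torch.randn(1, 3, 64, 64))       # one scene image
    scenes.append(torch.cat([t, v], dim=-1))     # per-scene early fusion
scene_seq = torch.stack(scenes, dim=1)           # (1, 4, 128)
work_vec, _ = EarlyFusion(scene_dim=128)(scene_seq)
```

Under these assumptions, the resulting work-level vector (from either fusion path) would be the representation fed to a downstream task such as predicting user preferences for a webtoon.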
