Language and image processing in social media and fashion e-commerce
Susana Zoghbi
Katholieke Universiteit Leuven, Belgium
J Comput Eng Inf Technol
Abstract
Statement of the Problem: It has been our longstanding dream to develop machines that understand language and images just as humans do. This goal has only grown more pressing with the explosion of data generated by social media and e-commerce. This talk addresses two tasks: developing algorithms that process language from social media in a product-retrieval setting, and jointly processing images and language for fine-grained attribute recognition in the fashion e-commerce domain. These tasks are challenging because user-generated content is extremely noisy and unstructured.

Methodology & Theoretical Orientation: We develop novel textual representations based on a combination of deep learning and the family of latent Dirichlet allocation (LDA) models. Our core insight is that we can learn representations that connect images and language by leveraging pairs of aligned documents as found in the wild on the web. Our proposed multi-idiomatic latent Dirichlet allocation (MiLDA) model explicitly accounts for the topic distribution shared between sources, while modeling both the differences and similarities in their language. In addition, our neural network (NN) learns to embed images and text into a shared low-dimensional space where related concepts occupy nearby regions.

Conclusion & Significance: We created statistical models that learn to bridge the cross-idiomatic gap between social media and e-commerce, directly from widely available, inexpensive data on the web. Additionally, our models allow us to semantically connect images and language. We illustrate this with the task of cross-modal search: given a query image, we retrieve words that describe its visual content (image to text), and given a set of textual descriptors, we find images that display those attributes (text to image).
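To make the cross-modal search idea concrete, the sketch below shows how retrieval reduces to nearest-neighbour lookup once images and text live in a shared low-dimensional space. This is a minimal illustration under stated assumptions, not the authors' actual model: the embeddings here are random stand-ins for what the NN would learn, the attribute vocabulary is invented, and the dimensionality is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
DIM = 8  # assumed embedding dimensionality (hypothetical)

# Toy "database": stand-in embeddings for three product images and
# five attribute words, all living in the same DIM-dimensional space.
# In the real system these would come from the trained network.
image_embeddings = rng.normal(size=(3, DIM))  # image side
word_embeddings = rng.normal(size=(5, DIM))   # text side
words = ["floral", "sleeveless", "denim", "lace", "striped"]

def cosine_sim(a, b):
    """Cosine similarity between each row of `a` and each row of `b`."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T

# Image -> text: for a query image, rank attribute words by similarity.
query_image = image_embeddings[0:1]
word_scores = cosine_sim(query_image, word_embeddings)[0]
ranked_words = [words[i] for i in np.argsort(-word_scores)]

# Text -> image: for a query attribute, rank images by similarity.
query_word = word_embeddings[2:3]  # the "denim" vector
image_scores = cosine_sim(query_word, image_embeddings)[0]
ranked_images = list(np.argsort(-image_scores))
```

Because related concepts occupy nearby regions of the shared space, the same similarity function serves both retrieval directions; only the roles of query and database swap.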
Biography
Email: susana.zoghbi@cs.kuleuven.be