DOI: 10.1145/3308558.3313614 (WWW Conference Proceedings)

Improving Outfit Recommendation with Co-supervision of Fashion Generation


The task of fashion recommendation involves two main challenges: visual understanding and visual matching. Visual understanding aims to extract effective visual features. Visual matching aims to model a human notion of compatibility in order to compute how well fashion items match. Most previous studies rely on the recommendation loss alone to guide both visual understanding and matching. The features captured by these methods describe basic characteristics (e.g., color, texture, shape) of the input items, but they are not directly related to the visual signals of the output items (the items to be recommended). This is problematic because the features lack the aesthetic characteristics (e.g., style, design) from which the output items could be directly inferred. They are learned under the recommendation loss alone, whose only supervision signal is whether two given items match or not.
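To make the criticism concrete, the sketch below shows what recommendation-only supervision looks like. It is a minimal illustration, not code from the paper, and it assumes a dot-product compatibility score and a binary cross-entropy loss; the function names `compatibility` and `recommendation_loss` are hypothetical. The point it shows is that the only target the feature extractor ever sees is a single matched/unmatched bit per item pair.

```python
import math

def compatibility(top_feat, bottom_feat):
    """Dot-product compatibility score between two item feature vectors (illustrative)."""
    return sum(t * b for t, b in zip(top_feat, bottom_feat))

def recommendation_loss(top_feat, bottom_feat, matched):
    """Binary cross-entropy on the compatibility score.

    `matched` is 1 for a compatible pair and 0 otherwise: this single bit is the
    only supervision that reaches the feature extractor.
    """
    p = 1.0 / (1.0 + math.exp(-compatibility(top_feat, bottom_feat)))
    eps = 1e-12  # guard against log(0)
    return -(matched * math.log(p + eps) + (1 - matched) * math.log(1.0 - p + eps))
```

Calling `recommendation_loss` on the same pair of feature vectors with `matched=1` and `matched=0` gives two different losses, but nothing in either value says what the matching item should look like.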

To address this problem, we propose a neural co-supervision learning framework, called the FAshion Recommendation Machine (FARM). FARM improves visual understanding by adding the supervision of a generation loss, which we hypothesize encodes aesthetic information better. FARM enhances visual matching with a novel layer-to-layer matching mechanism that fuses aesthetic information more effectively. This mechanism also prevents the model from concentrating on generation quality at the expense of recommendation performance.
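The co-supervision idea can be sketched as a joint objective: a ranking loss for recommendation plus a weighted generation loss. The sketch below is a toy, under several stated assumptions, and is not FARM's actual formulation. It assumes a BPR-style pairwise loss, a plain squared-error reconstruction term standing in for whatever generation loss the paper uses, and a hypothetical `gen_weight` trade-off. Its `layer_to_layer_score` is one hypothetical reading of "layer-to-layer matching": compare the generated item's features with a candidate's features at every layer and average the scores.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def layer_to_layer_score(generated_layers, candidate_layers):
    """Hypothetical layer-to-layer matching: compare the generated item's features
    with a candidate's features at every layer and average the per-layer scores."""
    if len(generated_layers) != len(candidate_layers):
        raise ValueError("both items need features from the same number of layers")
    per_layer = [dot(g, c) for g, c in zip(generated_layers, candidate_layers)]
    return sum(per_layer) / len(per_layer)

def bpr_loss(pos_score, neg_score):
    """Pairwise ranking loss -log(sigmoid(pos - neg)), computed stably."""
    x = pos_score - neg_score
    if x >= 0:
        return math.log1p(math.exp(-x))
    return -x + math.log1p(math.exp(x))

def generation_loss(generated, target):
    """Mean squared error between a generated image and the real item (flattened pixels)."""
    return sum((g - t) ** 2 for g, t in zip(generated, target)) / len(target)

def co_supervised_loss(pos_score, neg_score, generated, target, gen_weight=0.5):
    """Recommendation loss plus a weighted generation loss. `gen_weight` is a
    hypothetical knob that keeps generation quality from dominating training."""
    return bpr_loss(pos_score, neg_score) + gen_weight * generation_loss(generated, target)

# Toy per-layer features (hypothetical values) for a generated bottom and two candidates.
generated = [[1.0, 0.0], [0.5, 0.5]]
positive = [[1.0, 0.0], [0.5, 0.5]]
negative = [[0.0, 1.0], [0.5, -0.5]]
pos_score = layer_to_layer_score(generated, positive)
neg_score = layer_to_layer_score(generated, negative)
loss = co_supervised_loss(pos_score, neg_score, [0.2, 0.8], [0.25, 0.75])
```

In this form the generation term gives the shared features a pixel-level target, while the ranking term still decides which candidate wins. Lowering `gen_weight` is the simplest lever for keeping generation from crowding out recommendation.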

Extensive experiments on two publicly available datasets show that FARM outperforms state-of-the-art models on outfit recommendation in terms of AUC and MRR. Detailed analyses of generated and recommended items demonstrate that FARM can encode better features and generate high-quality images that serve as references to improve recommendation performance.
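For readers unfamiliar with the two metrics, the sketch below gives their standard definitions. It is not the paper's evaluation code. The candidate-set construction and tie handling used in the experiments are not stated here, so these are assumptions: AUC counts ties as one half, and MRR gives a tied ground-truth item the best rank among its ties.

```python
def auc(pos_scores, neg_scores):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    total = 0.0
    for p in pos_scores:
        for n in neg_scores:
            total += 1.0 if p > n else 0.5 if p == n else 0.0
    return total / (len(pos_scores) * len(neg_scores))

def mrr(candidate_scores, true_indices):
    """Mean reciprocal rank of the ground-truth candidate for each query.

    candidate_scores[q] holds the scores of all candidates for query q, and
    true_indices[q] is the position of the correct item among them.
    """
    total = 0.0
    for scores, t in zip(candidate_scores, true_indices):
        rank = 1 + sum(1 for s in scores if s > scores[t])
        total += 1.0 / rank
    return total / len(true_indices)
```

For example, `auc([0.9, 0.4], [0.5, 0.1])` orders three of the four positive/negative pairs correctly. `mrr([[0.2, 0.9, 0.1], [0.8, 0.3, 0.5]], [1, 2])` averages reciprocal ranks of 1 and 1/2.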
