A Semantic Vector Representation of Illustrations


Referring to existing illustrations helps novice drawers realize their ideas. To find such helpful references in a large image collection, we first build a semantic vector representation of illustrations by training convolutional neural networks. Because the proposed vector space correctly reflects the semantic meaning of illustrations, users can efficiently search for references with similar attributes. In addition to conventional image-based search, which finds references from a single query, we propose a semantic morphing algorithm that searches for intermediate illustrations that gradually connect two queries. Several experiments were conducted to demonstrate the effectiveness of the proposed methods.
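
The two search modes described above can be illustrated with a minimal sketch. It assumes every illustration has already been mapped to a feature vector and uses cosine similarity to compare vectors, which is an assumption since the measure is not stated on this page. Morphing is approximated by looking up the nearest gallery item for evenly spaced linear interpolations between two queries. This is a naive stand-in, not the semantic morphing algorithm proposed in the paper, and all function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query, gallery, top_k=3):
    """Return indices of the top_k gallery vectors most similar to query."""
    ranked = sorted(
        range(len(gallery)),
        key=lambda i: cosine_similarity(query, gallery[i]),
        reverse=True,
    )
    return ranked[:top_k]


def naive_morph(query_a, query_b, gallery, steps=3):
    """Nearest gallery item for each interpolation point between two queries.

    Assumption: plain linear interpolation, with repeated items skipped.
    """
    path = []
    for s in range(1, steps + 1):
        t = s / (steps + 1)
        point = [(1 - t) * a + t * b for a, b in zip(query_a, query_b)]
        best = search(point, gallery, top_k=1)[0]
        if best not in path:
            path.append(best)
    return path


# Toy 2-D vectors standing in for the 4,096-dimensional features.
gallery = [[1.0, 0.0], [0.9, 0.3], [0.5, 0.5], [0.3, 0.9], [0.0, 1.0]]
nearest = search([1.0, 0.1], gallery, top_k=2)
path = naive_morph([1.0, 0.0], [0.0, 1.0], gallery, steps=3)
```

With real feature vectors, a linear scan like this becomes slow for large collections, so an approximate nearest-neighbor index would likely be preferable in practice.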

Pre-trained model

Name                                                               Link
Pre-trained model for tag prediction, version 2.0                  Download
Pre-trained model for extracting feature vectors, version 2.0      Download
Pre-trained model for tag prediction, version 1.0                  Download
Pre-trained model for extracting feature vectors, version 1.0      Download
Network configuration file (tag prediction)                        Download
Network configuration file (feature vectors)                       Download
Image mean file of our dataset                                     Download
List of tags used for tag prediction (JSON format)                 Download

We provide the two pre-trained models used in our experiments. The first is a Convolutional Neural Network (CNN) model that estimates 1,539 tags representing semantic attributes of illustrations. The second is another CNN model that outputs a 4,096-dimensional feature vector for a given illustration. We expect these models to be especially useful for the following users: (1) engineers who want to estimate semantic tags (e.g., short hair, smile, sitting) from illustrations and comics, (2) programmers who want to implement a system that detects pornographic illustrations, and (3) researchers who are interested in illustrations or animations and want to extract semantic feature vectors.
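
As a rough illustration of how the output of the tag prediction model might be consumed, the sketch below pairs tag names with per-tag scores and keeps those at or above a threshold. It assumes the model yields one score in [0, 1] per tag and that the tag list is a flat JSON array of strings. Neither detail is specified on this page, and `plausible_tags` is a hypothetical helper rather than part of any released library.

```python
import json


def plausible_tags(tag_names, scores, threshold=0.5):
    """Return (tag, score) pairs with score >= threshold, highest first."""
    if len(tag_names) != len(scores):
        raise ValueError("expected exactly one score per tag")
    kept = [(name, score) for name, score in zip(tag_names, scores)
            if score >= threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


# Stand-in for the downloadable tag list (assumed format: JSON array of strings).
tag_names = json.loads('["short hair", "smile", "sitting"]')
# Stand-in for the scores a tag prediction model would produce for one image.
scores = [0.91, 0.42, 0.77]

result = plausible_tags(tag_names, scores, threshold=0.5)
```

The threshold trades recall for precision, so a lower value returns more tags at the cost of more false positives.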

Besides the original pre-trained models used in our main paper (version 1.0), we also provide a more recent version (version 2.0) whose accuracy was improved by training the CNN for much longer. The quantitative results are shown in the following table. We recommend version 2.0 if you simply want a high-performance model.

Model name                          General  Character  Copyright  Rating
Pre-trained VGG model (16 layers)   10.1     4.05       6.46       59.3
Illustration2Vec (version 1.0)      32.2     27.8       57.4       85.9
Illustration2Vec (version 2.0)      32.5     32.9       61.4       85.8

Mean average precision of four tag categories (higher is better). The details are shown in our main paper.
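
For readers unfamiliar with the metric, the following sketch computes average precision for a single ranked result list and then averages it over several lists. It uses the common definition (the mean of the precision values at each relevant rank). The exact evaluation protocol is described in the main paper and may differ in detail, and the function names here are illustrative only.

```python
def average_precision(ranked_relevance):
    """Average precision of one ranking, given booleans in ranked order."""
    hits = 0
    precision_sum = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this relevant rank
    return precision_sum / hits if hits else 0.0


def mean_average_precision(rankings):
    """Mean of the average precision over several rankings."""
    if not rankings:
        return 0.0
    return sum(average_precision(r) for r in rankings) / len(rankings)


# Two toy rankings: True marks a correct (relevant) item at that rank.
rankings = [[True, False, True, False], [False, True]]
score = mean_average_precision(rankings)
```

In this toy case the first ranking has an average precision of (1/1 + 2/3) / 2 and the second has 1/2, so the mean is 2/3.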

How to use

To use our pre-trained models, either the Caffe or the Chainer framework is required. For users who are not familiar with Caffe or Chainer, we have published a simple Python library on GitHub that predicts semantic tags and extracts feature vectors. Please refer to the project page for details.


The pre-trained models and the other files we provide are licensed under the MIT License. In other words, you can freely use our models for both academic and commercial purposes.

Warning: For character tags and copyright tags, the original copyright holders retain their rights. Please take care when explicitly using their works.


Please cite our paper in your publications if it helps your research:

  Author = {Saito, Masaki and Matsui, Yusuke},
  booktitle = {SIGGRAPH Asia Technical Briefs},
  Title = {Illustration2Vec: A Semantic Vector Representation of Illustrations},
  Year = {2015}