NLP x RecSys

UPDATE 5/14: Our SOTA model (>0.6 item recall) is now updated on OtakuRoll.

UPDATE 5/4: Submitted our SOTA results for publication!

UPDATE 4/3: Got valuable feedback from Reddit roast thread *initiate search for devs*…

UPDATE 3/28: Launched! Sign up for free at

UPDATE 3/24: We received feedback on our final project report. We’ll now be working hard towards publication at RecSys 2021!

UPDATE 3/1: DeepAniNet (our model) is now live! Try it on the site.

UPDATE 2/21: Hacked together a prototype with a well-known baseline algorithm. Try it out now!

UPDATE 2/16: We handed in our project proposal, so it’s time to start building models! We’re implementing a former NeurIPS paper.

UPDATE 2/07: We finished scraping our dataset and are converging on a paper we want to emulate.

(This spun out of a CS224N team project.)


Most recommendation engines are trained on a large user-item (preference) matrix. Combined with matrix completion techniques like RPCA, this gives rise to traditional algorithms like collaborative filtering.
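To make the user-item matrix idea concrete, here is a minimal sketch of item-based collaborative filtering with NumPy. The toy matrix, the cosine similarity choice, and the helper names `item_similarity` and `recommend` are assumptions for illustration, not the baseline we actually deployed.

```python
import numpy as np

# Toy user-item preference matrix (hypothetical data).
# Rows are users, columns are shows; 1 = liked, 0 = no interaction.
shows = ["naruto", "bleach", "fullmetal alchemist", "hunter x hunter", "one piece"]
R = np.array([
    [1, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
], dtype=float)

def item_similarity(R):
    """Cosine similarity between the item columns of the preference matrix."""
    norms = np.linalg.norm(R, axis=0, keepdims=True)
    norms[norms == 0] = 1.0          # avoid division by zero for unseen items
    normalized = R / norms
    sim = normalized.T @ normalized
    np.fill_diagonal(sim, 0.0)       # an item should not recommend itself
    return sim

def recommend(R, user, k=2):
    """Score items by their similarity to the items the user already liked."""
    scores = item_similarity(R) @ R[user]
    scores[R[user] > 0] = -np.inf    # mask items the user already liked
    return np.argsort(-scores)[:k]

top = recommend(R, user=2, k=2)
print([shows[i] for i in top])
```

Note that every score here comes from co-occurrence in the matrix alone; nothing about the shows' actual content is used.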

However, I believe that for higher forms of art like anime, these independence assumptions do not hold, as anime recommendations inherently travel by word of mouth. Internet forums create huge chain-reaction effects that inflate the viewership of bad anime, while good ones often slip under the radar.

To take the best of both worlds, we want to pioneer an NLP-driven recommendation system that uses the representations of Transformer encoders like BERT as the basis for content embeddings.


Our goal is to first train a good language model on Internet forum data, Crunchyroll reviews, and show descriptions. That becomes the basis for an actual recommendation engine, which takes user-submitted sets of favorite shows that look like:

user 1: {naruto, bleach, fullmetal alchemist, etc.}

user 2: {hunter x hunter, one piece, naruto, etc.}

We can embed users by the shows they enjoyed, showing off our representations, then feed those embeddings to the downstream task of reconstructing relevances and retrieving nearest neighbors. There are many ways to go about this, but we want to combine ideas from:

  • Matrix factorization
    • Dense latent vectors of users/shows
    • Support for cold start (new) users/shows
  • Representation Learning
    • Encodings of shows’ content
    • Hybrid loss functions
      • Same ones as word2vec for embeddings
      • Regression losses like RMSE for relevance
      • Reconstruction losses for latent vectors
  • Online Learning
    • Retraining latent representations
    • Fine-tuning embedding layer over time
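As a rough sketch of embedding a user by the shows they enjoyed and then retrieving nearest neighbors: the vectors below are hand-picked toy values standing in for real content embeddings, and mean pooling plus cosine similarity are assumptions chosen for simplicity, not a description of the final model. The helper names `embed_user` and `nearest_shows` are hypothetical.

```python
import numpy as np

# Hypothetical content embeddings, one vector per show. In a real pipeline
# these would come from a text encoder; here they are hand-picked toy values.
show_vecs = {
    "naruto":              np.array([0.9, 0.1, 0.0]),
    "bleach":              np.array([0.8, 0.2, 0.1]),
    "one piece":           np.array([0.7, 0.3, 0.0]),
    "fullmetal alchemist": np.array([0.5, 0.1, 0.6]),
    "hunter x hunter":     np.array([0.6, 0.2, 0.4]),
}

def embed_user(favorites, show_vecs):
    """Represent a user as the mean of their favorite shows' embeddings."""
    return np.mean([show_vecs[s] for s in favorites], axis=0)

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def nearest_shows(favorites, show_vecs, k=2):
    """Rank shows the user has not listed by similarity to the user vector."""
    user_vec = embed_user(favorites, show_vecs)
    candidates = [s for s in show_vecs if s not in favorites]
    candidates.sort(key=lambda s: cosine(user_vec, show_vecs[s]), reverse=True)
    return candidates[:k]

recs = nearest_shows({"naruto", "bleach"}, show_vecs, k=2)
print(recs)
```

Because the user vector is built only from show embeddings, a brand-new user with a short favorites list can still get recommendations, which is one way to approach the cold start bullet above.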

The challenge will be how we can support many different ideas while still optimizing one fully end-to-end network. We hope to come up with something innovative!


TL;DR: We achieve amazing results (>60% item recall@100) on our dataset, coined AnimeULike, and equivalent SOTA results on CiteULike (for scientific articles). In other words, more than 60% of the anime you will watch appear in our model’s top 100 results, out of a database of 10,000 anime. Moreover, the recommendations are far more diverse than those of WMF and Top-k CF.

More on this to come as our manuscript gets reviewed.
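For readers unfamiliar with the metric, item recall@k for one user can be sketched as below. The function name `recall_at_k` and the toy lists are hypothetical, and the exact evaluation protocol (how items are held out, how scores are averaged across users) is not covered here, so treat this only as the general definition.

```python
def recall_at_k(ranked_items, relevant_items, k):
    """Fraction of the relevant items that appear in the top-k of the ranking."""
    if not relevant_items:
        return 0.0
    top_k = set(ranked_items[:k])
    hits = sum(1 for item in relevant_items if item in top_k)
    return hits / len(relevant_items)

# Toy example: the model's ranking for one user, and what they actually watched.
ranked = ["naruto", "one piece", "bleach", "hunter x hunter", "fullmetal alchemist"]
watched = {"bleach", "fullmetal alchemist", "one piece"}

score = recall_at_k(ranked, watched, k=3)
print(round(score, 3))
```

With k=100 and a catalogue of about 10,000 titles, a score above 0.6 means most of a user's held-out shows land in the top 1% of the ranking.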
