In case you didn’t get the link, here’s the live site, with a video demo (sign-up should work; if not, log in with username “email@example.com” and password “zzz”).
UPDATE 10/3: Our AWS compute credits ran out. The instance has been stopped.
UPDATE 2/20: You can view the legacy codebase here.
I met Girish Kumar, now one of my close friends, at Pear Garage, a startup accelerator where I was fortunate to be among the two dozen selected out of hundreds of Stanford-affiliated undergrads, grads, and PhD students. The goal was to “pear” up and hack on a project through the year with their mentorship (but most importantly, Google/AWS compute credits). Like me (see the end section of my resume), Girish was a science fair enthusiast in high school; he was a Google Science Fair finalist for building an automated quiz generator back in 2016. We bonded not only over that but also over the potential to improve on his work with a fully deep learning pipeline (more suited to our times).
(Girish and I have since moved on to bigger things. This project is mostly legacy, so I should be able to share this information. The code repo, however, remains private.)
A fully end-to-end deep learning pipeline
- Sentence classifier
The sentence classifier is trained on the Stanford Question Answering Dataset (SQuAD). It takes as input the sentence embedding obtained from Google’s Universal Sentence Encoder and outputs the probability that the sentence makes a good question. Later experiments showed improvement when we incorporated context from a sliding window of words before and after the sentence, with the window size set as a hyperparameter.
- Gap classifier
The gap classifier, trained on the same dataset, iterates through each word in the encoded candidate sentences, takes that word’s embedding as input, and classifies whether the word is a good candidate for the blank in the multiple-choice question (MCQ).
- NN search for distractors
With the sentence and gap candidates in hand, the final step is to do a nearest-neighbors search on a word2vec dictionary we trained on a biology textbook. All candidate distractors then go through post-processing (i.e. making sure they are the same part of speech (POS) as the blanked-out word), and the generated distractors are the four that remain closest to the blanked-out word.
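To make the distractor step concrete, here is a minimal sketch in plain Python. It is not our actual code: the toy 3-dimensional vectors stand in for the trained word2vec dictionary, the `POS` table stands in for a real part-of-speech tagger, cosine similarity is an assumed choice of metric, and the names `cosine` and `top_distractors` are made up for illustration.

```python
import math

# Toy stand-ins (assumptions): a real run would use the trained word2vec
# vectors and a real POS tagger instead of these hand-written tables.
VECTORS = {
    "mitochondria": (0.9, 0.1, 0.0),
    "chloroplast": (0.8, 0.2, 0.1),
    "ribosome": (0.7, 0.3, 0.0),
    "nucleus": (0.6, 0.4, 0.1),
    "lysosome": (0.5, 0.5, 0.2),
    "vacuole": (0.1, 0.9, 0.3),
    "divides": (0.9, 0.1, 0.1),
}
POS = {word: "NOUN" for word in VECTORS}
POS["divides"] = "VERB"


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_distractors(answer, vectors, pos_tags, k=4):
    """Return the k words most similar to `answer` that share its POS tag."""
    target = vectors[answer]
    scored = [
        (cosine(target, vec), word)
        for word, vec in vectors.items()
        # Post-processing: skip the answer itself and any POS mismatch.
        if word != answer and pos_tags.get(word) == pos_tags.get(answer)
    ]
    scored.sort(reverse=True)  # most similar first
    return [word for _, word in scored[:k]]


distractors = top_distractors("mitochondria", VECTORS, POS)
print(distractors)
```

With these toy vectors, “divides” sits very close to “mitochondria” but is dropped by the POS check, and “vacuole” passes the check but is too far away to make the top four.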
A full stack application
For those who actually appreciate web development, the part of the project I myself devoted the most time to: we used Django REST framework with token authentication for the backend, connected to PostgreSQL on an Amazon RDS instance. We first built the frontend in ReactJS (which I had good experience in), but the state updates ultimately turned the code into spaghetti (again, I’m not an expert at this…), so we ditched it for VueJS, a simpler framework that served the purpose, for our current site.
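For a rough idea of what that backend wiring looks like, here is a sketch of a Django settings fragment for token authentication and a PostgreSQL database. It is not our actual configuration: the environment variable names and the default database name are hypothetical, and the host would be whatever endpoint your RDS instance exposes.

```python
import os

# Sketch of a settings.py fragment (an illustration, not the real config).
INSTALLED_APPS = [
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "rest_framework",
    "rest_framework.authtoken",  # provides the token model and table
]

REST_FRAMEWORK = {
    # Clients send an "Authorization: Token <key>" header with each request.
    "DEFAULT_AUTHENTICATION_CLASSES": [
        "rest_framework.authentication.TokenAuthentication",
    ],
}

DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql",
        # Hypothetical variable names; keep credentials out of source control.
        "NAME": os.environ.get("DB_NAME", "quizkly"),
        "USER": os.environ.get("DB_USER", ""),
        "PASSWORD": os.environ.get("DB_PASSWORD", ""),
        "HOST": os.environ.get("DB_HOST", ""),  # e.g. the RDS endpoint
        "PORT": os.environ.get("DB_PORT", "5432"),
    }
}
```

Reading the connection details from the environment keeps the same settings file usable for both a local database and the hosted one.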
Why train on a biology textbook?
This ties into our motivation: we targeted medical students to start with, as they go through the most rote memorization when preparing for exams like the MCAT and USMLE Step 1.
It was encouraging to iterate on our pipeline and take ownership of both the full-stack web app and the deep learning pipeline. Quizkly got to the point where, just from a quality standpoint, a good fraction of the questions were indistinguishable from real practice resources, according to the two medical students we beta-tested it with.
U.n.f.o.r.t.u.n.a.t.e.l.y, the app did not gain traction with medical students, for two main reasons:
- Popular alternatives like Anki, which already has a huge database of M.D.-approved practice questions, already exist. We realized medical students were generally risk-averse and stuck to tried-and-true methods.
- The reception we got was primarily “cool idea, might use it for free but won’t pay for it,” whereas we had expected to use a subscription pricing model like Quizlet’s. With just the two of us technical people building this as a service, we couldn’t justify continuing to work on it if it couldn’t bring immediate value.