Meta’s Belebele: The Marvelous Breakthrough in Machine Reading Comprehension! 🌟📚🤖

The Journey
2 min read · Sep 14, 2023

Hello…ello…! Meta recently released Belebele, a multiple-choice machine reading comprehension (MRC) dataset spanning 122 language variants. The dataset enables the evaluation of monolingual and multilingual models in high-, medium-, and low-resource languages.

Each question has four multiple-choice answers and is linked to a short passage from the FLORES-200 dataset.

The human annotation procedure was carefully designed to create questions that discriminate between different levels of generalizable language comprehension, and it is reinforced by extensive quality checks.


While all questions directly relate to the passage, the English dataset on its own proves difficult enough to challenge state-of-the-art language models. Being fully parallel, this dataset enables a direct comparison of model performance across all languages.
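
Because every language variant shares the same questions, per-language accuracy can be compared head to head. Here is a minimal sketch of that comparison. The `accuracy_by_language` helper and the toy predictions are assumptions made up for illustration, not part of Meta's release, and the language codes simply follow the FLORES-200 naming style.

```python
from collections import defaultdict


def accuracy_by_language(predictions):
    """Compute accuracy per language.

    `predictions` is an iterable of (language, predicted_answer, correct_answer)
    tuples. This layout is a hypothetical one chosen for the sketch.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for language, predicted, gold in predictions:
        total[language] += 1
        if predicted == gold:
            correct[language] += 1
    return {language: correct[language] / total[language] for language in total}


# Toy, made-up predictions: two questions answered in two language variants.
toy_predictions = [
    ("eng_Latn", 1, 1),
    ("eng_Latn", 3, 3),
    ("fra_Latn", 1, 1),
    ("fra_Latn", 2, 3),
]

scores = accuracy_by_language(toy_predictions)
for language, score in sorted(scores.items()):
    print(f"{language}: {score:.2f}")
```

With four answer options per question, random guessing would land at about 25% accuracy, which gives a handy floor when reading such scores.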

Composition

  • 900 questions per language variant
  • 488 distinct passages, each with 1–2 associated questions.
  • Each question has 4 multiple-choice answers, exactly 1 of which is correct.
  • 122 language/language variants (including English).
  • 900 x 122 = 109,800 total questions.
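
The composition above can be sketched as a small data structure plus a quick arithmetic check. The `MRCQuestion` class and its field names are hypothetical, picked for illustration only, and may not match the field names in the released files.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MRCQuestion:
    """One multiple-choice item (hypothetical field names)."""

    passage: str
    question: str
    answers: tuple       # the four candidate answers
    correct_index: int   # position of the single correct answer, 0 to 3

    def __post_init__(self):
        # Each question has exactly four answers, exactly one of them correct.
        if len(self.answers) != 4:
            raise ValueError("expected exactly 4 answers")
        if not 0 <= self.correct_index < 4:
            raise ValueError("correct_index must be between 0 and 3")


QUESTIONS_PER_VARIANT = 900
LANGUAGE_VARIANTS = 122
DISTINCT_PASSAGES = 488

total_questions = QUESTIONS_PER_VARIANT * LANGUAGE_VARIANTS
avg_questions_per_passage = QUESTIONS_PER_VARIANT / DISTINCT_PASSAGES

print(total_questions)
print(round(avg_questions_per_passage, 2))
```

The average of a bit under two questions per passage is consistent with each passage having one or two questions.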

Training Set

The Belebele dataset is intended to be used only as a test set, and not for training or validation. Therefore, for models requiring additional task-specific training, the authors propose using an assembled training set consisting of samples from pre-existing multiple-choice QA datasets in English.

Researchers considered diverse datasets and determined the most compatible ones: RACE, SciQ, MultiRC, MCTest, MCScript2.0, and ReClor.

For each of the six datasets, the authors unpack and restructure the passages and questions from their respective formats. They then filter out less suitable samples (e.g. questions with multiple correct answers).
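
The restructure-then-filter idea can be sketched roughly as follows. This is not Meta's assemble_training_set.py: the raw record layout (`passage`, `question`, `options`, `correct`) and the helper names are assumptions made up for illustration, and the only filter shown is the one example the text mentions.

```python
def has_single_correct_answer(raw):
    """Keep only samples with exactly one correct option."""
    return len(raw["correct"]) == 1


def to_common_format(raw):
    """Map a raw record (hypothetical layout) to one shared structure."""
    return {
        "passage": raw["passage"].strip(),
        "question": raw["question"].strip(),
        "answers": list(raw["options"]),
        "correct_index": raw["correct"][0],
    }


def assemble(raw_samples):
    """Filter out unsuitable samples, then restructure the rest."""
    return [
        to_common_format(raw)
        for raw in raw_samples
        if has_single_correct_answer(raw)
    ]


# Toy, made-up records: the second has two correct options and is dropped.
toy_raw = [
    {"passage": " A cat sat. ", "question": "Who sat?",
     "options": ["A cat", "A dog", "A bird", "A fish"], "correct": [0]},
    {"passage": "It rained.", "question": "What happened?",
     "options": ["Rain", "Showers", "Snow", "Sun"], "correct": [0, 1]},
]

training_samples = assemble(toy_raw)
print(len(training_samples))
```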

In the end, the dataset comprises 67.5k training samples and 3.7k development samples, more than half of which are from RACE. Meta provided a script (assemble_training_set.py) to reconstruct this dataset for anyone to perform task finetuning.

Belebele opens up new avenues for evaluating and analyzing the multilingual abilities of language models and NLP systems. Yuhooo!

Full Paper: The Belebele Benchmark

Follow for more things on AI! The Journey — AI By Jasmin Bharadiya
