Facebook Launches Casual Conversations v2, a More Inclusive Dataset for Measuring Fairness in AI Models
Facebook has announced the release of Casual Conversations v2, a dataset designed to help measure fairness and inclusivity in AI models. This consent-driven dataset is a more inclusive and diverse successor to the original Casual Conversations, which was launched two years ago. It contains 26,467 video monologues recorded in seven countries and features 11 categories, some self-provided and some annotated, for evaluating the fairness and robustness of certain types of AI models.
One of the main challenges in evaluating fairness in AI models is the lack of diverse and inclusive datasets. With Casual Conversations v2, researchers can assess how well AI models work for different demographic groups, particularly in computer vision and speech recognition applications. According to the announcement, it is the first open-source dataset with videos collected from multiple countries that uses highly accurate and detailed demographic information to help test AI models for fairness and robustness.
Casual Conversations v2 improves on its predecessor by including more granular categories. Seven are self-provided by participants: age, gender, language/dialect, geolocation, disability, physical adornments, and physical attributes. The remaining four (voice timbre, apparent skin tone, recording setup, and activity) were labeled by annotators following detailed guidelines, to improve consistency and reduce subjective annotations during labeling.
What sets Casual Conversations v2 apart is its inclusivity and diversity. For the self-provided categories, participants could enter their information in their preferred language, making participation more accessible to non-English speakers. The category design was also informed by a more comprehensive literature review of relevant demographic categories, and the dataset was created in consultation with internal experts in fields such as civil rights.
Casual Conversations v2 is publicly available to help researchers measure fairness and test robustness. Using this dataset, researchers can investigate whether a speech recognition system performs consistently across a variety of demographic characteristics and environments. The dataset is a significant step towards building more inclusive and fair AI models that serve communities more effectively.
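To make the speech recognition use case concrete, here is a minimal sketch of how such a consistency check might look: it computes word error rate (WER) separately for each demographic group and compares the results. Everything in it is an assumption for illustration. The record fields (`age_group`, `reference`, `hypothesis`), the group labels, and the toy transcripts are hypothetical and do not reflect the actual schema or contents of Casual Conversations v2.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def word_errors(reference: str, hypothesis: str) -> Tuple[int, int]:
    """Return (word-level edit distance, number of reference words)."""
    ref, hyp = reference.split(), hypothesis.split()
    # Classic dynamic-programming Levenshtein distance over words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # deletion
                curr[j - 1] + 1,         # insertion
                prev[j - 1] + (r != h),  # substitution (0 if words match)
            ))
        prev = curr
    return prev[-1], len(ref)


def wer_by_group(records: List[dict], group_key: str) -> Dict[str, float]:
    """Aggregate WER per value of `group_key` (a hypothetical metadata field)."""
    errors: Dict[str, int] = defaultdict(int)
    words: Dict[str, int] = defaultdict(int)
    for rec in records:
        e, n = word_errors(rec["reference"], rec["hypothesis"])
        errors[rec[group_key]] += e
        words[rec[group_key]] += n
    return {g: errors[g] / words[g] for g in errors if words[g] > 0}


# Toy, made-up records; real transcripts and labels would come from the dataset
# and from the speech recognition system under test.
records = [
    {"age_group": "18-30", "reference": "the cat sat on the mat",
     "hypothesis": "the cat sat on the mat"},
    {"age_group": "18-30", "reference": "hello world",
     "hypothesis": "hello word"},
    {"age_group": "60+", "reference": "good morning everyone",
     "hypothesis": "good morning"},
    {"age_group": "60+", "reference": "see you later",
     "hypothesis": "see you later"},
]

per_group = wer_by_group(records, "age_group")
gap = max(per_group.values()) - min(per_group.values())
print(per_group, gap)
```

A large gap between the best and worst group would suggest the system is not performing consistently. In practice one would also want enough samples per group and some measure of statistical uncertainty before drawing conclusions.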
References:
https://ai.facebook.com/blog/casual-conversations-v2-dataset-measure-fairness/