Cindy Conaway deposited Creating a Meaningful Genre Schema and Metadata using IMDb data for a Large-Scale Digital Humanities Project in Media Studies on Humanities Commons 2 years, 8 months ago
This long-term DH project examines the social networks of actors and crews across more than 32,500 media items released between 1938 and 2017. The primary source is the Internet Movie Database (IMDb). IMDb is robust and provides free downloadable data, but the data is problematic (Conaway/Shichtman DH2018). “Genres can be approached from the point of view of the industry and its infrastructure . . . aesthetic traditions . . . broader socio-cultural environment . . . audience understanding and response” (Neale). IMDb applies genre terms inconsistently: what it calls “genres” actually combines traditional genres, subgenres, and target audiences, and a single item may carry multiple selections. IMDb also relies heavily on users for its data and for much of its editing. “Although user editing allows a reference website such as IMDb to be up-to-date, it diffuses the responsibility for fact-checking, leading to greater uncertainty about accuracy and objectivity of information” (Wasserman). Other schemas use macro or idiosyncratic descriptors that allow an item to be included in multiple “lists.” The Library of Congress uses simply Comedy, Drama, Action, etc. AFI adds “Most Thrilling” (action, horror, adventure). Netflix’s “genres, based on a complicated algorithm that uses reams of data about users’ viewing habits . . . number in the tens of thousands” (Telegraph), including “Family Watch Together TV.” It has taken significant additional research and reorganization to use the IMDb data effectively for statistical analysis. While most people can tell a western from science fiction, it is harder to deal with hybrid genres like dramedies or family movies, or with genre combinations like science fiction western or action with romance. Therefore, we created a taxonomy with a variety of categories, including subjects, styles, settings, and audiences, with a concise definition for each category. If other scholars also use this schema, each media item can be described in a way that allows for effective and relatively consistent coding by multiple scholars.
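As a rough illustration of how a multi-category schema of this kind might be represented for coding, the Python sketch below splits an IMDb-style genre string into separate facets. Only the four facet names (subjects, styles, settings, audiences) come from the abstract. The mapping table, the class and function names, and the choice of which facet each label falls under are hypothetical and are not the authors' actual taxonomy. The sketch also assumes the genre field arrives as a comma-separated string with `\N` marking a missing value, as in IMDb's downloadable title files.

```python
from dataclasses import dataclass, field

# Facet names are taken from the abstract; everything else is hypothetical.
FACETS = ("subjects", "styles", "settings", "audiences")

# Illustrative mapping only: NOT the authors' taxonomy.
# Each IMDb label is assigned to one facet and one schema term.
ILLUSTRATIVE_MAP = {
    "Western": ("settings", "western"),
    "Sci-Fi": ("subjects", "science fiction"),
    "Comedy": ("styles", "comedy"),
    "Drama": ("styles", "drama"),
    "Family": ("audiences", "family"),
}


@dataclass
class ItemCoding:
    """Facet-by-facet description of one media item."""
    title: str
    facets: dict = field(
        default_factory=lambda: {name: set() for name in FACETS}
    )
    # Labels with no mapping are kept for manual review, not dropped.
    unmapped: set = field(default_factory=set)


def code_item(title, imdb_genres):
    """Split a comma-separated genre string and file each label by facet."""
    coding = ItemCoding(title)
    # Assumed convention: r"\N" (or an empty string) means no genre data.
    if not imdb_genres or imdb_genres == r"\N":
        return coding
    for raw_label in imdb_genres.split(","):
        label = raw_label.strip()
        if label in ILLUSTRATIVE_MAP:
            facet, term = ILLUSTRATIVE_MAP[label]
            coding.facets[facet].add(term)
        elif label:
            coding.unmapped.add(label)
    return coding


# A hybrid item keeps each component in its own facet.
example = code_item("Hypothetical Title", "Sci-Fi,Western,Family")
```

Keeping unmapped labels in a separate set, rather than discarding them, reflects the abstract's point that the source data needs additional research before it can support statistical analysis: anything the mapping does not cover is flagged for a human coder.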