Wikipedia biographies favor men
A few notes on the methodology of the biographical gender research:
- MZMcBride pulled a randomized selection of 500 Wikipedia biographies of living persons (BLPs) for Kohs, at his request. McBride used the following code:
    SELECT page_title  -- assumed: the SELECT clause is missing from the source
    FROM page
    JOIN categorylinks
      ON cl_from = page_id
    WHERE cl_to = 'Living_people'
      AND page_is_redirect = 0
      AND page_namespace = 0
      AND page_random > RAND()
    ORDER BY page_random
    LIMIT 500;
- Kohs then took the first 200 names from the list and pasted them into a spreadsheet. He marked with an "M" or an "F" every name that unambiguously indicates a gender -- e.g., "Mike" and "Mohamed" are male, "Carolyn" and "Edith" are female. He left blank any name with a reasonable level of ambiguity, and he was conservative about it: names like "Sandy" and "Casey" stayed blank. He also left blank many non-Western names whose gender he could not judge ("Suriya" and "Nyjer", for example).
- Then, Kohs manually looked up each of the blank entries on Wikipedia. For all but one, the gender was fairly easy to identify -- Kohs noted that the majority (or near majority) of them either played rugby or soccer or were Olympic athletes.
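The first-pass marking described above was done by hand in a spreadsheet, but the rule can be sketched in a few lines of Python. This is only an illustration, not Kohs's actual procedure: the lookup sets contain just the four example names quoted in these notes, and the function names `first_pass_gender` and `tally` are hypothetical. Any name not in the sets stays blank, mirroring the conservative approach of leaving ambiguous or unfamiliar names for a manual lookup.

```python
# Hypothetical lookup sets; only these four example names come from the notes.
MALE_NAMES = {"mike", "mohamed"}
FEMALE_NAMES = {"carolyn", "edith"}


def first_pass_gender(full_name):
    """Return "M", "F", or "" (blank) based on the first name alone."""
    parts = full_name.split()
    if not parts:
        return ""
    first = parts[0].lower()
    if first in MALE_NAMES:
        return "M"
    if first in FEMALE_NAMES:
        return "F"
    # Anything not in the sets (e.g. "Sandy", "Casey", "Suriya") stays blank
    # and is left for a manual lookup.
    return ""


def tally(names):
    """Count how many names fall into each first-pass bucket."""
    counts = {"M": 0, "F": 0, "": 0}
    for name in names:
        counts[first_pass_gender(name)] += 1
    return counts
```

Calling `tally` on the list of sampled names would show how many entries were resolved by name alone and how many still need to be looked up individually.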
Studies like this are the kind of work that any of the 50+ staff members of the Wikimedia Foundation should be turning out on a daily basis, but Kohs does them for free.
Here is the list of 200 names: