25 March 2014

Telling gender from Facebook posts

Researchers have found that it is possible to predict gender and age purely from Facebook status messages. A September 2013 study published in the research journal PLOS ONE says that there are clear differences in the way males and females use language on social media. Age and location can also affect the choice of words, researchers said in the report, titled "Personality, Gender, and Age in the Language of Social Media: The Open-Vocabulary Approach".

Females are more likely to describe emotions with words like "love", "wonderful", and "excited", and to use "I", while males are more likely to swear and to talk about electronic games and sports, said researchers H. Andrew Schwartz, Johannes Eichstaedt, Margaret Kern, Lukasz Dziurzynski, Stephanie Ramones, Megha Agrawal, Achal Shah, Michael Kosinski, David Stillwell, Martin Seligman, and Lyle Ungar, all from the University of Pennsylvania except Kosinski, who is from the University of Cambridge.
 
Word clouds created by the researchers show that feminine status messages tend to be about looking forward to tomorrow, how cute puppies and babies are, feelings about the other half, or family members. The masculine ones tend to be about battles, football, Xbox, and the government.

Areas to be explored further include whether emotionally stable individuals do indeed spend time writing about enjoyable social activities that may foster greater emotional stability, such as ‘sports’, ‘vacation’, ‘beach’, and ‘team’, and whether introverts are actually interested in Japanese media, as the research suggests, with ‘anime’, ‘manga’, ‘japanese’, and Japanese-style emoticons such as ^_^ predominating in their language.

The researchers started with approximately 19 million Facebook status updates written by 136,000 participants, who had voluntarily given permission to share them through the MyPersonality application. To be included in the study, users had to use English as a primary language, have written at least 1,000 words in their status updates, and be under 65 years old. This resulted in 74,941 volunteers, and 309 million words across 15.4 million status updates studied.
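
For readers who want to see how such inclusion criteria could work in practice, here is a minimal Python sketch. It is an illustration only, not the researchers' code: the `Participant` record, its field names, the `"en"` language code, and the simple whitespace word count are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List

# Thresholds taken from the criteria described in the article.
MIN_WORDS = 1000   # at least 1,000 words across all status updates
MAX_AGE = 65       # participants must be under 65


@dataclass
class Participant:
    # Hypothetical record layout; the study's real data format is not described here.
    user_id: str
    primary_language: str
    age: int
    status_updates: List[str]


def word_count(participant: Participant) -> int:
    # Assumption: words are counted by splitting on whitespace.
    return sum(len(update.split()) for update in participant.status_updates)


def is_eligible(participant: Participant) -> bool:
    # Apply the three criteria: English, enough text, and under the age limit.
    return (
        participant.primary_language == "en"
        and word_count(participant) >= MIN_WORDS
        and participant.age < MAX_AGE
    )


def select_participants(participants: List[Participant]) -> List[Participant]:
    # Keep only the participants who meet every criterion.
    return [p for p in participants if is_eligible(p)]
```

Run over the full pool of volunteers, a filter along these lines would be the step that narrows 136,000 participants down to the 74,941 actually studied.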
  
The full study can be viewed here.