

Analyzing Subtitles - Chick Flick or Guy Movie?

Romantic comedy or action movie? While this cliché certainly oversimplifies the differences in movie preferences between men and women, it is common knowledge that certain movie genres primarily target one gender. The colloquial term chick flick refers to movies targeted at a female audience, whereas guy movies are mainly aimed at male viewers. For example, a study found that female viewers have a stronger preference for movies with happier themes than their male counterparts (Banerjee et al., 2008).

For this post, my goal is to investigate how chick flicks and guy movies differ in the words spoken. As obtaining the original scripts is rather difficult, I decided to analyze the movies’ subtitles instead, which were available from Amazon Video. At the end of this post, you will find a brief description of how subtitles can be downloaded from the Amazon website. Since subtitles are protected by copyright, please understand that I cannot share the files that I used in this project. I downloaded the subtitles of the top 1,000 Amazon Video bestsellers (7th March, 2017) and gathered additional information from IMDb. To determine whether a film is targeted at a male or female audience, I used the proportion of votes cast by women on the IMDb score. In this sample, women make up on average about one-quarter of all gender votes.

The 25% of movies with the lowest share of female votes were defined as guy movies, while the top 25% were defined as chick flicks. Next, I removed movies with fewer than 1,000 total votes to ensure a fair comparison. Considering the rather vague definition of these two terms, I believe that this approach is sufficient for this context. The following table shows the top ten movies for each category:

Justice League: Throne of Atlantis (5.6%)

First, I used the ‘wordcloud’ package in R to plot a comparison word cloud, which highlights the words that were heavily used in one of the two categories. To remove movie-specific words, such as character names, I dropped all words from the sample that did not appear in the subtitles of at least three different movies. The more frequent use of profane language in guy movies and the higher use of positive words in chick flicks are in line with the assumption that women prefer relationship movies while men are more interested in thriller and action genres. A quick sentiment analysis performed with the ‘sentimentr’ R package confirms that movies targeted at a female audience are slightly more positive than movies produced for a male audience. These results show that the use of certain words differs significantly between the two categories. Furthermore, these findings suggest that we can use subtitles to determine whether a movie is more likely to target a male or female audience. As this is a traditional text classification problem, a wide variety of machine learning algorithms could be applied, such as support vector machines, naive Bayes classifiers, or boosting classifiers.

Even though some of these algorithms would probably be more accurate and robust for this task, I used a classification tree model (the ‘rpart’ R package) because it produces results that are easy to understand and apply. The goal of this method is to derive a set of rules from input variables that can predict which class an item belongs to. In this case, the input variables for each movie are the numbers of occurrences of each word stem. The target category is either chick flick or guy movie. The following tree is based on a randomly selected training set.

According to this model, if the word stem ‘love’ appears more than 11 times in the subtitles and ‘gun’ is only used once, the movie is most likely targeted at women. However, if ‘love’ occurs no more than 11 times in the subtitles and ‘hell’ at least once, the movie is probably produced for a male audience. When applied to the test set, this model predicts 82% of the categories correctly. This is significantly higher than the 54% no-information rate, which is the percentage of the largest class in the training set. Overall, I think it is quite interesting to see how the results support the assumption about typical chick flicks and guy movies.
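
To make the two rule paths and the comparison against the no-information rate more concrete, here is a minimal sketch. The analysis in this post was done in R with ‘rpart’; the Python below is only an illustration, not the fitted model. The function names (`classify`, `no_information_rate`, `accuracy`) are hypothetical, the toy data is invented, and reading "‘gun’ is only used once" as "at most once" is an assumption. Branches of the tree that the post does not describe return `None`.

```python
from collections import Counter

CHICK_FLICK = "chick flick"
GUY_MOVIE = "guy movie"


def classify(stem_counts):
    """Apply the two rule paths quoted in the post; return None otherwise."""
    love = stem_counts.get("love", 0)
    if love > 11:
        # Assumption: "'gun' is only used once" is read as "at most once".
        if stem_counts.get("gun", 0) <= 1:
            return CHICK_FLICK
    elif stem_counts.get("hell", 0) >= 1:
        return GUY_MOVIE
    return None  # branch not described in the post


def no_information_rate(train_labels):
    """Share of the largest class among the training labels."""
    counts = Counter(train_labels)
    return max(counts.values()) / len(train_labels)


def accuracy(predicted, actual):
    """Fraction of items whose predicted label equals the true label."""
    hits = sum(p == a for p, a in zip(predicted, actual))
    return hits / len(actual)


# Invented toy data, purely for illustration (not taken from the post).
train_labels = [CHICK_FLICK, CHICK_FLICK, GUY_MOVIE, GUY_MOVIE, GUY_MOVIE]
test_movies = [
    ({"love": 15, "gun": 0}, CHICK_FLICK),
    ({"love": 3, "hell": 4}, GUY_MOVIE),
    ({"love": 2, "hell": 1}, CHICK_FLICK),  # misclassified by the rules
    ({"love": 20, "gun": 5}, GUY_MOVIE),    # falls into an undescribed branch
]

predictions = [classify(counts) for counts, _ in test_movies]
truth = [label for _, label in test_movies]

baseline = no_information_rate(train_labels)
score = accuracy(predictions, truth)
print(f"no-information rate: {baseline:.2f}, accuracy: {score:.2f}")
```

A model is only informative if its test accuracy clearly exceeds the baseline obtained by always predicting the largest class, which is the comparison behind the 82% versus 54% figures.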
