The major goal of the project is to discover how natural language discourse reflects social dynamics in English, Arabic, Chinese, and other languages. We develop computational models in our analyses of a large and diverse collection of documents from these languages and associated cultures (such as political speeches, letters, emails, chat, tweets). Our expectation is that these computer analyses of language/discourse can predict socially significant states, such as leadership, status, familiarity of group members, personality, social cohesion, deception, and social disequilibrium. This research is expected not only to advance the social sciences but also to address key national security questions that require the processing of large amounts of textual communication.
The central question is how language/discourse patterns are diagnostic of socially significant states and whether such patterns can predict those states ahead of time. The patterns are manifested in words, sentence syntax, semantics, speech act categories, cohesion, and discourse genre (e.g., narrative, informational text). Our project has uncovered interesting patterns for diverse samples of documents in different languages and cultures, but this summary will focus on the recent political crises in the Middle East and North Africa. We have conducted computer analyses on political speeches and tweets, both in Arabic and in English translation. The computer systems have included the Linguistic Inquiry and Word Count (LIWC; Pennebaker, Booth, & Francis, 2007), Coh-Metrix (Graesser & McNamara, 2011; Graesser, McNamara, & Kulikowich, 2011), speech act classifiers (Shala, Rus, & Graesser, 2010), a presupposition detector currently in development, and a host of other automated tools developed by researchers in the social sciences and computational linguistics at Texas, Cornell, and Memphis.
Our methodologies involve semi-automated document analysis, combined with experimental techniques. The main group of documents being analyzed in the project includes 89 political speeches by leaders of 7 Arabic-speaking countries: Mubarak (Egypt), Gaddafi (Libya), Ben Ali (Tunisia), Saleh (Yemen), Bashar al-Assad (Syria), King Mohammed VI (Morocco), and King Abdullah II (Jordan). We are focusing on the speeches within a month or so before or after December 2010, which is designated as the date when the crisis reached a peak of international attention. Tweets are also available, both in Arabic and in English translation. The documents in English have been run through Coh-Metrix and LIWC, whereas cohesion analyses are being conducted on the Arabic texts. The speeches that occur near the December 2010 crises are being compared to speeches from earlier in the leaders’ reigns, which lasted between 6 and 42 years. We are also performing more fine-grained analyses of the speeches over time, before and after the downfall of the leaders or major episodes of social discord. Are there language/discourse patterns that can diagnostically predict social disequilibrium in a country? To interpret the scores, z-score norms have been computed for a number of measures and principal components of Coh-Metrix (Graesser, McNamara, & Kulikowich, 2011) and LIWC, based on 37,520 texts that are representative of what a typical adult English speaker would have been exposed to.
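As an illustration of how a score can be expressed relative to such norms, the following minimal Python sketch standardizes one measure against a reference sample. The function name and all numeric values are hypothetical placeholders, not data or code from the project, and the use of the sample standard deviation is an assumption.

```python
from statistics import mean, stdev

def z_score(value, norm_values):
    """Standardize `value` against a reference (norm) sample.

    Assumption: the norm sample's mean and sample standard deviation
    define the scale; the project's actual norming procedure may differ.
    """
    mu = mean(norm_values)
    sigma = stdev(norm_values)
    if sigma == 0:
        raise ValueError("norm sample has zero variance")
    return (value - mu) / sigma

# Hypothetical narrativity scores for a tiny stand-in norm corpus.
norm_narrativity = [0.40, 0.50, 0.60, 0.50, 0.50]

# Hypothetical raw narrativity score for one speech.
speech_narrativity = 0.35

z = z_score(speech_narrativity, norm_narrativity)
print(f"narrativity z-score: {z:.2f}")
```

A negative z-score here means the speech is less narrative than the typical text in the norm sample; a value near zero means it is typical.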
One set of analyses on Arabic-speaking leaders was an attempt to confirm some findings from our analyses of the speeches of Mao Zedong of China. The language/discourse patterns were very different in historically good times (China’s economy was good) versus bad times (war and civil strife). When times were good, Mao’s speeches showed Coh-Metrix z-scores with relatively high narrativity (stories), low cohesion, and simple syntax; LIWC principal components showed high conversational interaction and narrative presence, with fewer negative emotions. When times were bad, Mao’s z-scores had the opposite sign on each of these measures. We therefore analyzed the texts of the Arabic leaders of the December 2010 crises to see whether their scores matched the profile of bad times. Except for cohesion, where the bad-times profile predicts a positive score, the profile of mean z-scores matched the predictions of bad times: narrativity (-.52), cohesion (-.16), syntactic simplicity (-.71), conversational interaction (-.48), narrative presence (-.34), and negative emotions (.21). This was a very encouraging confirmation of the findings for Mao. When there is war and civil strife, there tends to be a deviation from speeches with stories, conversation, simple syntax, and a more positive emotional slant. To substantiate this more rigorously, we are currently analyzing earlier speeches of these leaders from relatively good historical times.
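The comparison above amounts to checking the sign of each mean z-score against the sign predicted by the bad-times profile. A minimal Python sketch of that check follows; the observed values are the means reported above, while the dictionary keys, the function name, and the sign encoding are illustrative assumptions rather than the project's code.

```python
# Predicted sign of each z-score in "bad times", i.e. the opposite of the
# good-times profile described in the text (+1 = above norm, -1 = below).
BAD_TIMES_SIGNS = {
    "narrativity": -1,
    "cohesion": +1,
    "syntactic_simplicity": -1,
    "conversational_interaction": -1,
    "narrative_presence": -1,
    "negative_emotions": +1,
}

# Mean z-scores reported for the leaders' speeches near December 2010.
observed = {
    "narrativity": -0.52,
    "cohesion": -0.16,
    "syntactic_simplicity": -0.71,
    "conversational_interaction": -0.48,
    "narrative_presence": -0.34,
    "negative_emotions": 0.21,
}

def sign_matches(observed_z, predicted_signs):
    """Return {measure: True/False} for whether the observed sign matches.

    Note: a z-score of exactly 0 is treated as non-positive here, which is
    an arbitrary choice for this sketch.
    """
    return {
        name: (observed_z[name] > 0) == (sign > 0)
        for name, sign in predicted_signs.items()
    }

matches = sign_matches(observed, BAD_TIMES_SIGNS)
mismatches = [name for name, ok in matches.items() if not ok]
print("mismatched measures:", mismatches)  # prints: mismatched measures: ['cohesion']
```

A sign check ignores the magnitude of each deviation, so it is only a coarse summary of how well a profile fits.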
We recently completed documentation and verification studies for the first complete French version of LIWC (Piolet, Booth, Chung, Davids & Pennebaker, 2011). We have also completed the Chinese LIWC, as well as the Russian LIWC (Kailer & Chung, 2011), and the Arabic LIWC (Hayeri, Chung, Booth, & Pennebaker, 2010).
We have further developed a program that tracks natural language in small online working groups in the classroom or laboratory and assesses the group dynamics. While it is usually impossible to access the online chats of high-value terrorist groups, the findings here can be useful for better understanding emerging leadership and the group dynamics of extremist groups that post online and that may or may not engage in violent behaviors. Experimental subjects participated in two counterbalanced 20-minute tasks in which they had to collectively generate a meaningful solution to a complex visual task. Preliminary results showed that groups whose members were matched on personality had different language and communication patterns than groups with randomly assigned members. First, personality-matched groups used more first person plural pronouns (e.g., we, us, our) than did non-matched groups. Second, the linguistic profile for each of the personality dimensions was expressed more strongly when participants communicated with members who had similar personality profiles than when they communicated with randomly assigned group members. These results point to implicit processes in creating a sense of in-groupness, and to the ability of function word analysis to detect them. The results are promising because, until recently, it has been almost impossible to efficiently record, monitor, and assess ongoing communication patterns in order to identify emerging leadership and group dynamics.
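The first person plural finding rests on a simple function word measure: the percentage of words in a transcript that are first person plural pronouns. The Python sketch below shows one way to compute such a rate. The word list, tokenizer, and example messages are hypothetical simplifications; this is not the LIWC dictionary or the project's program, and contractions such as "we're" are not counted.

```python
import re

# Assumed, simplified word list; the actual LIWC category is broader.
FIRST_PERSON_PLURAL = {"we", "us", "our", "ours", "ourselves"}

def first_person_plural_rate(text):
    """Return first person plural pronouns as a percentage of all words."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    hits = sum(1 for token in tokens if token in FIRST_PERSON_PLURAL)
    return 100.0 * hits / len(tokens)

# Hypothetical chat excerpts, invented for illustration only.
matched_group_chat = "We think our plan works for us"
unmatched_group_chat = "I think my plan works for me"

matched_rate = first_person_plural_rate(matched_group_chat)
unmatched_rate = first_person_plural_rate(unmatched_group_chat)
print(f"matched: {matched_rate:.1f}%  unmatched: {unmatched_rate:.1f}%")
```

Expressing the count as a percentage of total words keeps the measure comparable across groups that produce different amounts of text.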
Anticipated Outcomes of Research:
We are developing diagnostic detectors of social disequilibrium in a culture based on the political speeches of its leaders. Trouble would be detected when the speeches deviate from an oral linguistic style with a positive emotional stance. We are developing a single quantitative index of social disequilibrium (ranging from 0 to 1) from the Coh-Metrix and LIWC indices. Such an index can track political leaders over time and possibly predict crisis points.
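The metric itself is still under development, so its form is not specified here. Purely as an illustration of how several z-scores could be collapsed into one value between 0 and 1, the hypothetical Python sketch below orients each z-score toward the bad-times direction, averages them with equal weights, and applies a logistic function. The function name, the equal weighting, and the logistic squashing are all assumptions, not the project's method.

```python
import math

# Assumed direction of each measure in "bad times" (+1 above norm, -1 below).
BAD_TIMES_SIGNS = {
    "narrativity": -1,
    "cohesion": +1,
    "syntactic_simplicity": -1,
    "conversational_interaction": -1,
    "narrative_presence": -1,
    "negative_emotions": +1,
}

def disequilibrium_index(z_scores, bad_times_signs):
    """Map a profile of z-scores to a value strictly between 0 and 1.

    Each z-score is multiplied by its predicted bad-times sign so that
    larger values always mean "more like bad times". The oriented scores
    are averaged (equal weights, an assumption) and passed through a
    logistic function. A profile at the norm on every measure yields 0.5.
    """
    oriented = [bad_times_signs[name] * z_scores[name] for name in bad_times_signs]
    average = sum(oriented) / len(oriented)
    return 1.0 / (1.0 + math.exp(-average))

# Mean z-scores reported in this summary for speeches near December 2010.
crisis_profile = {
    "narrativity": -0.52,
    "cohesion": -0.16,
    "syntactic_simplicity": -0.71,
    "conversational_interaction": -0.48,
    "narrative_presence": -0.34,
    "negative_emotions": 0.21,
}
neutral_profile = {name: 0.0 for name in BAD_TIMES_SIGNS}

crisis_index = disequilibrium_index(crisis_profile, BAD_TIMES_SIGNS)
neutral_index = disequilibrium_index(neutral_profile, BAD_TIMES_SIGNS)
print(f"crisis: {crisis_index:.2f}  neutral: {neutral_index:.2f}")
```

Under these assumptions, values above 0.5 indicate a profile that leans toward the bad-times pattern. Any real index would need weights and thresholds estimated from data.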
Potential Impact on DoD Capabilities and Broader Implications for National Defense:
If the style of the speeches of political leaders, and the types of language used in social media, are both diagnostic signals of social disequilibrium, then our metrics could be used to detect critical periods of change in regions of conflict, and to identify critical changes in online groups that have been identified to be of strategic interest.