Tuesday, March 04, 2014

How to Measure “Influence” In Social Networks

In social networks, new ideas and thoughts can spread quickly. An analysis of this diffusion can answer various questions, such as which topics are important and how quickly they propagate. It is of particular interest to find people who successfully share their own ideas and concepts. These people can influence others to change their behavior and can bring new terms into the communication network.

This post describes my master's thesis, whose goal is to find the most influential people in social networks. The thesis was written as a collaboration between MIT and the University of Applied Sciences Northwestern Switzerland, and the results are now implemented as new functionality in Condor 3. Various communication networks were used as test data to validate this new metric as a meaningful measurement of influence.

Defining influence


Different applications use different definitions for the term “influence”. For example, the so-called Klout Score calculates influence based on the number of followers, the frequency of retweets and some other factors. Unfortunately, the exact calculation is proprietary, so the score cannot be compared with other values.

Despite these various definitions of influence, each of them tries to measure whether a person can cause a certain behavior change in their environment. Often, this behavior is directly visible in the communication network, for example in the form of new discussion topics, retweets or changes in the structure of the network.

Observable behavior changes can be found in the language used by the people in the communication network, which changes over time. An influential person is able to introduce new ideas, beliefs and behavior patterns. Therefore, “influence” can be defined as the number of new terms, concepts, and ideas which a person has introduced into the network and which are subsequently used by other members of the network.
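
As a rough illustration of this definition, the following Python sketch counts, for each person, the terms they used first that were later used by someone else. It is not the calculation used in Condor. The message format (timestamp, sender, text), the simple tokenizer and the function names are assumptions made for this example only.

```python
from collections import defaultdict

def tokenize(text):
    """Lowercase, split on whitespace and strip surrounding punctuation (assumed tokenizer)."""
    words = (w.strip(".,;:!?\"'()") for w in text.lower().split())
    return {w for w in words if w}

def introduced_term_counts(messages):
    """messages: iterable of (timestamp, sender, text) tuples (hypothetical format).

    Returns {sender: number of terms they used first that another sender used later}.
    """
    first_user = {}              # term -> sender who used it first
    adopters = defaultdict(set)  # term -> other senders who used it afterwards
    for _, sender, text in sorted(messages, key=lambda m: m[0]):
        for term in tokenize(text):
            if term not in first_user:
                first_user[term] = sender
            elif first_user[term] != sender:
                adopters[term].add(sender)
    scores = defaultdict(int)
    for term, users in adopters.items():
        if users:
            scores[first_user[term]] += 1
    return dict(scores)

messages = [
    (1, "alice", "Have you seen the new showroom?"),
    (2, "bob", "The showroom looks great"),
    (3, "carol", "Great showroom indeed"),
]
scores = introduced_term_counts(messages)
print(scores)
```

In this toy network, alice is credited with two adopted terms and bob with one. Note that common words such as "the" are counted as well, so a real implementation would need stop-word filtering and enough history to know which terms are actually new.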

This definition of influence requires analyzing messages to measure their impact on the receiver. If the receiver of a message d writes new messages soon afterwards, they might have been influenced by the message they received. To determine whether or not a new message has been influenced by d, three things need to be checked:
  • Time difference to d
  • Similarity to d
  • Did the user’s behavior change in any way?
Influential messages provoke the receiver to send new messages soon afterwards, and those messages use some of the same words. In addition, it should be checked what kind of messages the person usually sends. For example, if someone always talks about Apple products and retweets nearly every tweet from Apple, then a new Apple-related tweet won’t have much influence on this person. It would be much more relevant if this person suddenly retweeted a tweet from Google about a new feature in Android. In this case, the tweet by Google would be influential, as it even managed to get an Apple fan to tweet about the rival.
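
The three checks can be sketched in Python as follows. This is only a minimal illustration under assumptions of my own, not the formula from the thesis: the linear time decay, the one-hour window, the word-overlap similarity and the name influence_score are all hypothetical choices.

```python
def tokens(text):
    """Lowercase word set with surrounding punctuation stripped (assumed tokenizer)."""
    return {w.strip(".,;:!?\"'()#@") for w in text.lower().split()} - {""}

def influence_score(d_text, d_time, reply_text, reply_time, prior_texts, window=3600.0):
    """Score in [0, 1] for how strongly message d may have influenced a later message.

    prior_texts: earlier messages of the receiver, used as their usual vocabulary.
    window: assumed maximum delay in seconds for a message to count as a reaction.
    """
    # 1. Time difference to d: sooner reactions weigh more (linear decay, assumed).
    delay = reply_time - d_time
    if delay < 0 or delay > window:
        return 0.0
    time_weight = 1.0 - delay / window

    # 2. Similarity to d: words the new message shares with d.
    d_terms = tokens(d_text)
    shared = d_terms & tokens(reply_text)
    if not d_terms or not shared:
        return 0.0

    # 3. Behavior change: shared words the receiver has not used before.
    known = set()
    for text in prior_texts:
        known |= tokens(text)
    new_terms = shared - known

    return time_weight * len(new_terms) / len(d_terms)

history = ["love my new iphone", "apple keynote was great"]
android = influence_score("android gets a new feature", 0,
                          "this android feature looks useful", 900, history)
apple = influence_score("apple keynote was great", 0,
                        "apple keynote great", 900, history)
print(android, apple)
```

With this history, the Android message receives a positive score because the receiver picks up words they had not used before, while the Apple message scores zero because it only repeats the receiver's usual vocabulary.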

Test data

The new metric “influence” has been tested with various networks. The primary use cases are Twitter and email networks. The following examples provide an overview of how the metric can be used to gain new information about a network.

Twitter: Swiss politicians


In Switzerland, approximately one-third of the members of Parliament have a Twitter account. However, only some of them are interactive and involve many other people in the conversation. Others may have a large number of followers because of their political profile, but are not important in the Twitter network. The measurement of influence gives a good overview of who is active in the network and manages to introduce new topics and hashtags into it. People who are influential in the network might not be the most famous politicians, but they are important in deciding what topics other politicians talk about.

The color indicates the political party and the node size the influence of the politician.

Twitter: BMW

By fetching all tweets about a given brand, it becomes possible to find important thought leaders who talk about the company or the product. For the brand BMW, a search for the most important Twitter accounts was carried out over a short period of time (one single day in February 2014). In this time frame, the accounts @BMW and @BMW_Espana are very central in their subnetwork. However, the account BMW_Ocean was more influential, as it talked about a new showroom in Plymouth (England) where new BMW cars were presented. This caused a lot of discussion in the network about the showroom and the new models that were on display there. Even though BMW_Ocean is not very central to the network and doesn’t generate a lot of retweets, it was very successful in conveying its message. Only the metric “influence” accurately represents this fact.
The image on the right shows the interesting part of the network, where BMW_Ocean managed to influence others.

Email: COINs Seminar

The course “Collaborative Innovation Networks”, or COINs for short, involves students from five universities who take the course at the same time: MIT, SCAD, Aalto University, University of Cologne and University of Bamberg. Cross-university project teams were created and worked together for the term/semester. A special feature of this course is that the students use the Condor software to analyze the email communication within their project teams. All messages are cc’d to a dummy email address throughout the course.

For the analysis, every member of the project teams in the 2012 and 2013 courses was asked the following question:

Who in your team had the greatest influence on the result of your project?

In total, 45 answers from 16 project teams with a total of 84 people were obtained. Since the answer to this question is highly subjective, the answers in most teams are not unanimous. The data can be used as a comparison to the calculated value of the Influence measure, but it must be noted that some uncertainties exist. Nevertheless, evaluating the participants' responses against the calculated Influence scores for each project team does serve as a valid quality check.
Node size represents the amount of influence in the network.
The results show a very strong correlation between the answers given and the results of the influence calculations. In 10 of the 16 teams (62.5%), the person who received the highest Influence score also had the most votes. In three other teams, the person with the highest Influence score received at least one vote, and only three teams showed no positive correlation between the number of votes and the Influence score. However, in one case not all communication was sent to the dummy Gmail address.

Simple network metrics such as Betweenness Centrality would not work in this case, because the people in each project team all sent messages to everyone else. This would result in a Betweenness Centrality of exactly 0 for every project member. The calculation of influence takes a lot more information into account and is therefore much better at identifying the important members of an email communication network.
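
The zero value can be checked with a small Python sketch. It uses Brandes' algorithm for unweighted, undirected graphs, written out here so the example has no dependencies. The team member names are made up, and this is not the Condor implementation.

```python
from collections import deque

def betweenness(adj):
    """Unnormalized betweenness centrality (Brandes' algorithm).

    adj: {node: set of neighbours} for an unweighted, undirected graph.
    """
    bc = {v: 0.0 for v in adj}
    for s in adj:
        stack = []
        preds = {v: [] for v in adj}  # predecessors on shortest paths from s
        sigma = {v: 0 for v in adj}   # number of shortest paths from s
        dist = {v: -1 for v in adj}
        sigma[s] = 1
        dist[s] = 0
        queue = deque([s])
        while queue:                  # breadth-first search from s
            v = queue.popleft()
            stack.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        while stack:                  # accumulate dependencies, farthest nodes first
            w = stack.pop()
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                bc[w] += delta[w]
    # Each undirected pair is visited from both ends, so halve the totals.
    return {v: b / 2 for v, b in bc.items()}

team = ["ann", "ben", "cem", "dee"]  # hypothetical project team
complete = {p: {q for q in team if q != p} for p in team}
team_bc = betweenness(complete)

chain = {"ann": {"ben"}, "ben": {"ann", "cem"}, "cem": {"ben"}}
chain_bc = betweenness(chain)
print(team_bc, chain_bc)
```

In the fully connected team, every pair of members is linked directly, so nobody lies on a shortest path between two others and all values are 0. In the chain, only the person in the middle gets a non-zero value.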

Conclusion


The inclusion of text analysis adds important insights to the analysis of social networks. Calculating the influence of a single message and its direct impact on a receiver is a useful extension and generalization of existing approaches, which often work only for individual, predefined networks.

The biggest challenge is addressing the variety of individual network properties that need to be taken into account in order to convert the messages into a common schema for efficient analysis. However, this study demonstrates that these challenges can be overcome and it is possible to trace the diffusion of new ideas, words and concepts among users over time based on the content of their digital communication.

A disadvantage of the method is that it is not optimized for a particular network or for a specific language. The Influence metric calculation also assumes that people have not used the identified keywords in prior communications. This assumption may not always be true, because sufficient historical data reaching further back in time is often not available.

However, the selected test cases have demonstrated that a relatively wide range of possible applications can be covered with meaningful accuracy. In these tests, the new content-based influence measure outperformed the common structural network measures in identifying the influential people in a communication network.