Tuesday, December 27, 2011

Newt or Mitt: Predicting the 2012 US Presidential Republican Candidate

Based on earlier work analyzing and visualizing the hidden links in Wikipedia, we have added a new dynamic Wikipedia mapping function to Condor. The new Wiki-Evolution feature collects all links originating and/or pointing to one or more Wikipedia articles over a given time period.
Today I used this feature to see what the Wikipedians have to say about Mitt Romney and Newt Gingrich. The picture below shows the bidirectional link network of the most important articles about the two contenders for Republican US presidential candidate. The nodes are colored by actuality: the more edits they have had in the last 14 days, the darker the red (0 to 20 edits is grey, 21 to 50 edits is orange, over 50 edits is dark red).
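The coloring rule can be written down in a few lines. This is only a sketch of the thresholds stated above; the function name and the sample edit counts are made up for illustration and are not part of Condor.

```python
def actuality_color(recent_edits: int) -> str:
    """Map the number of edits in the last 14 days to a node color."""
    if recent_edits < 0:
        raise ValueError("edit count cannot be negative")
    if recent_edits <= 20:
        return "grey"
    if recent_edits <= 50:
        return "orange"
    return "dark red"


# Hypothetical edit counts, for illustration only.
sample_counts = {"Article A": 3, "Article B": 37, "Article C": 112}
colors = {title: actuality_color(n) for title, n in sample_counts.items()}
print(colors)
```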

As we can see, the articles about the two candidates have attracted the most edits, but the rock group LMFAO (whose singer SkyBlu was involved in an altercation on a plane with Romney in 2010) and Pope John Paul II (who shares with Gingrich the distinction of having been chosen by Time Magazine as Person of the Year) have also had lots of edits. Among the other presidential contenders, only Herman Cain has attracted a similar number of recent edits.
The video below shows 4 months, August 2011 to December 2011, in the candidates' lives as reflected through Wikipedia. The pages linked to Mitt are in blue, the pages linked to Newt are in green.



The timeline comes out marvelously:
It starts with the Ames Straw Poll (August 13) where both candidates did very poorly. It then shows Mitt Romney embedded into the network of other candidates, with an explosion of activity around September 18 (his shuffle with the surging Rick Perry candidacy). Discussions about Bain Capital and his membership in the Mormon church are continuous editing topics. On October 8 the Newt Gingrich candidacy takes off, slowly at first, but then really explodes on October 24, when he and Herman Cain, the favorite at that time, accepted a Tea Party-sponsored debate. It subsequently illustrates the fight between Mitt and Newt, taking punches at each other, ending with Gingrich's link to the National Review, and Tim Pawlenty's support of Romney's campaign.

Finally, I was also curious whom the Wikipedians, as of December 27, would vote for. The picture below shows the betweenness of the two candidate pages in the Wikipedia link network.

While the race is close, Mitt (betweenness centrality of 0.54) is slightly ahead of Newt (betweenness centrality of 0.53). Let's wait and see how well the swarm predicts!
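For readers who want to see what betweenness centrality measures, here is a minimal sketch of Brandes' algorithm on an undirected, unweighted graph. It is not Condor's implementation. The normalization (dividing by the number of node pairs that exclude the node itself) is an assumption, and the toy graph is invented.

```python
from collections import deque


def betweenness(graph):
    """Brandes' algorithm for an undirected, unweighted graph.

    graph: dict mapping each node to a set of neighbours.
    Returns a dict of betweenness scores normalized to [0, 1].
    """
    score = {v: 0.0 for v in graph}
    for source in graph:
        # Breadth-first search, counting shortest paths from source.
        order = []
        preds = {v: [] for v in graph}
        paths = {v: 0 for v in graph}
        dist = {v: -1 for v in graph}
        paths[source] = 1
        dist[source] = 0
        queue = deque([source])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in graph[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    paths[w] += paths[v]
                    preds[w].append(v)
        # Walk back from the farthest nodes, accumulating dependencies.
        dep = {v: 0.0 for v in graph}
        for w in reversed(order):
            for v in preds[w]:
                dep[v] += paths[v] / paths[w] * (1.0 + dep[w])
            if w != source:
                score[w] += dep[w]
    n = len(graph)
    # Every undirected pair was counted twice, so divide by (n-1)(n-2).
    scale = (n - 1) * (n - 2)
    return {v: (s / scale if scale > 0 else 0.0) for v, s in score.items()}


# Toy graph (invented): a hub connected to three otherwise unlinked pages.
toy = {"hub": {"a", "b", "c"}, "a": {"hub"}, "b": {"hub"}, "c": {"hub"}}
print(betweenness(toy))
```

A page scores high when many shortest paths between other pages run through it, which is why the score can be read as a measure of how central a candidate's article is in the link network.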

Sunday, November 27, 2011

Will the EURO break up?

With the crisis in the Eurozone approaching its climax, I was curious to read the collective mind. On the Web, in the blogosphere, and on Twitter there is a lot of buzz about Eurozone breakup or survival.

I decided to ask both the swarm (through blogs) and experts (through News Web sites) as well as the crowd (through Twitter), using our Condor coolhunting tool.
It turns out, swarm and experts think the Euro will survive intact - albeit by quite a slim margin:

The picture above shows the Blog/Web site network, with the two search terms weighted by the importance (betweenness centrality) of the bloggers and Web sites. The bloggers/Web sites vote 52% for Eurozone survival, and 48% for Eurozone breakup.

The crowd, measured through the tweeters, believes the opposite. The picture below shows today's (11/27/2011) snapshot of the retweets about “euro survive” and “euro breakup”.

The crowd on Twitter votes only 33% for Eurozone survival, with a decisive 67% of the vote for Eurozone breakup.

The question now is: whom to trust? The crowd is fickle, and the wisdom of crowds easily flips to madness, while the swarm usually has a much better grasp of what the future might bring. So perhaps it’s not as bleak for the EURO as everybody thinks?

What do the Wikipedians think about the Euro?
As an additional expert opinion, I also checked, using our new Wikimaps tool, what the Wikipedians think about the EURO, exploiting the hidden link structure in Wikipedia. I ranked the links by two different algorithms: (1) by the number of links and backlinks, and (2) by actuality, i.e. freshness of the edits.
As the two pictures below show, the link network looks very different for the two ranking algorithms:


Just looking at the Wikipedia linking structure (top picture) puts the different coins and currencies making up the Euro closest. While the economy of Europe is important for both networks, in the actuality picture (bottom network) the economy of Greece and Portugal, Frankfurt, the European Central Bank, the International Monetary Fund, and Angela Merkel suddenly become key players.

Sunday, November 20, 2011

Occupy Wallstreet battling TeaParty – Divided they tweet!

Today (11/20/11) I ran a Condor twitter analysis for #ows (the Occupy Wallstreet Twitter tag) and #teaparty (the Tea Party Twitter tag), trying to predict public sentiment for these two social movements.
I only collected retweets, and constructed the retweet network, measuring the importance of people retweeting based on their social network position. The picture below shows the resulting network: each dot is a twitterer, each line is one or more retweets. Surprisingly we get three clear clusters, an Occupy Wallstreet cluster (blue, at the bottom), a Tea Party cluster (yellow, in the center) and a mixed cluster at the top. Red dots are people tweeting both about #ows and #teaparty.

A closer look at these three clusters tells us that the blue cluster is Occupy Wallstreet sympathizers talking about issues near and dear to them, the yellow cluster is Tea-Party sympathizers doing the same about their cause, while the mixed cluster at the top consists of Occupy Wallstreet sympathizers badmouthing the Tea Party, and Tea Party sympathizers lambasting Occupy Wallstreet and Barack Obama.

Aggregating the network, and weighting the tweet of each twitterer with her/his social network position, leads to 55% of weighted votes for Occupy Wallstreet, and 45% for the Tea Party. The results are clear: Occupy Wallstreet sympathizers carry more weight in the Twittersphere than Tea Party members – the question of course remains how representative this is for the rest of the American population.
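The weighted vote can be sketched as follows. This assumes each twitterer has already been assigned to one camp and given an importance score (for example a centrality value); the function name, the user names and the numbers are invented for illustration.

```python
def weighted_vote_share(votes, weights):
    """Share of total weight per camp.

    votes:   dict user -> camp label
    weights: dict user -> importance score (users without a score count as 0)
    """
    totals = {}
    for user, camp in votes.items():
        totals[camp] = totals.get(camp, 0.0) + weights.get(user, 0.0)
    grand_total = sum(totals.values())
    if grand_total == 0:
        raise ValueError("all weights are zero")
    return {camp: total / grand_total for camp, total in totals.items()}


# Invented example: three users, two camps.
votes = {"user1": "ows", "user2": "teaparty", "user3": "ows"}
weights = {"user1": 0.25, "user2": 0.5, "user3": 0.25}
print(weighted_vote_share(votes, weights))
```

The point of the weighting is that a single well-connected twitterer can count as much as several peripheral ones.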

I then also checked positivity and negativity of tweets. Again I was in for a surprise. Usually human beings are optimists, and positivity is much larger than negativity. But not so here: for both Tea Party and Occupy Wallstreet tweets, negativity was about twice as large as positivity. In an additional twist, the (mostly negative) tweets about the Tea Party were more positive than the tweets about Occupy Wallstreet (see picture below).

The first conclusion of this chart is the general unhappiness with the current political situation. While both Tea Party and Occupy Wallstreet sympathizers are very unhappy, Tea Party twitterers are slightly happier, although they seem to carry less political weight.
Finally, I looked at what the key issues of the Occupy Wallstreet discussion were today, collecting the most recent blog posts with Condor (see semantic network picture below).

While the Tea Party members rejoice about the booing of Michelle Obama and Joe Biden at the Nascar race in Florida, the Occupy Wallstreet sympathizers lambast Mayor Bloomberg for his lifestyle and the closing of Zuccotti Park. Religion is quite central - as expected - for the Tea Party sympathizers, while a large part of the discussion is focussed on the presidential candidates.

Tuesday, November 15, 2011

What creative swarms can learn from the bees

Last Friday night I had a great discussion with Billie Bivins, host of the show "Make Art...Feel Better" at the Belmont Media Center about creative swarming and the bees. She even got me to cobble together my own bee. Here is the link to the resulting video. Very cool.

Tuesday, July 19, 2011

Wikimaps Revised

The first version of the Wikimaps Page (http://www.ickn.org/wikimaps/) that we published a couple of weeks ago helped to visualize the basic idea of Wikimaps. It consists of an interactive animation that allows visitors to visually track the changes in Wikipedia articles over a given time period. Real world activities and events are reflected in updates of the respective articles and the links between them.

Rise and Fall (of Swiss Tennis Star Hingis) on Wikimaps
A good example is the retirement of the Swiss tennis player Martina Hingis. While her page (node) is still well connected to the network in 2008 (roughly a year after she retired), the page is not listed in the network anymore after February 2010. It is important to note that this view of the network is filtered and only displays nodes that “survive the cut”: The page of the former number one ranked player is still there and has many links pointing to it, just not enough to appear in this filtered “most important pages” view.

Another example is the case of the former president of the International Monetary Fund, Dominique Strauss-Kahn. We tracked changes in related pages for a time span of approximately 8 months and built a network with weekly snapshots. On May 14th 2011, Strauss-Kahn was arrested in New York City. This event led to a spike in the activities in the network surrounding the page of Strauss-Kahn. Interestingly enough, the increased activities that led to this spike were not solely based on pages directly related to the arrest. The attention led to a general increase of activities on related pages.

Watch a video of the changes in the Dominique Strauss-Kahn graph:


The following graph shows the spike in activities in the graph around the 14th of May.

(Activity in a network is defined as the sum of additions and deletions of nodes within a given time frame)
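This definition translates directly into set operations on consecutive snapshots. The sketch below assumes each snapshot is simply the set of node names present at that date; the function name and the sample data are invented.

```python
def activity(snapshots):
    """Activity between consecutive snapshots: additions plus deletions of nodes.

    snapshots: list of sets of node names, ordered in time.
    Returns one number per pair of consecutive snapshots.
    """
    result = []
    for before, after in zip(snapshots, snapshots[1:]):
        added = after - before
        removed = before - after
        result.append(len(added) + len(removed))
    return result


# Invented weekly snapshots.
weeks = [{"a", "b"}, {"b", "c", "d"}, {"b", "c", "d"}]
print(activity(weeks))
```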

Although we think this first visualization is already pretty cool, the results did not really surprise anyone. The data that was initially used was very static. We simply picked (seemingly related) categories and selected the pages that had the highest indegree values. Pages that would be “close” or relevant but not members of the selected categories would never show up in the graph.

To mitigate these shortcomings we decided to change the way we collect the pages that are considered candidates for the graph. The most promising idea was, and still is, a combination of weighted components, possibly applied in multiple iterations. Or as we call it, a Filtered Breadth First Search.

Effective Filtering is Key
One of the challenges of working with the Wikipedia graph is its size. An optimal algorithm should therefore handle the trade-off between maintaining a small sub-graph and still returning meaningful results. A naively executed BFS would quickly lead to an explosion of articles that would have to be considered. To prevent this we only follow edges (links) that are considered interesting or relevant. The decision whether to follow a link during the execution of the search is based on a weighted mix of the following metrics:
● Local Indegree
● Global Indegree
● Number of recent page edits
● Reciprocal Links to source page
● Shortest Path Distance to Source Page
● Wikipedia Full-text search results
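
To make the idea concrete, here is a rough sketch of a filtered breadth-first search. It is not the actual Wikimaps code: the weights, the threshold, the depth limit and the metric names are all invented, and each metric is assumed to be pre-scaled to a value between 0 and 1 (with the shortest-path distance turned into a closeness score, so that nearer pages score higher).

```python
from collections import deque

# Hypothetical weights; the metric names mirror the list above.
WEIGHTS = {
    "local_indegree": 0.2,
    "global_indegree": 0.1,
    "recent_edits": 0.3,
    "reciprocal_link": 0.2,
    "closeness_to_source": 0.1,
    "fulltext_hit": 0.1,
}


def link_score(metrics, weights=WEIGHTS):
    """Weighted mix of metric values, each assumed to lie in [0, 1]."""
    return sum(weights[name] * metrics.get(name, 0.0) for name in weights)


def filtered_bfs(source, neighbours, metrics_for, threshold=0.5, max_depth=2):
    """BFS that only follows links whose score reaches the threshold.

    neighbours(page)      -> iterable of pages linked from page
    metrics_for(src, dst) -> dict of metric name -> value in [0, 1]
    """
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        page, depth = queue.popleft()
        if depth == max_depth:
            continue
        for target in neighbours(page):
            if target in visited:
                continue
            if link_score(metrics_for(page, target)) >= threshold:
                visited.add(target)
                queue.append((target, depth + 1))
    return visited


# Invented toy data: only links into A, C and D look "interesting".
links = {"S": ["A", "B"], "A": ["C"], "B": ["E"], "C": ["D"], "D": [], "E": []}


def toy_metrics(src, dst):
    if dst in {"A", "C", "D"}:
        return {"local_indegree": 1.0, "recent_edits": 1.0, "reciprocal_link": 1.0}
    return {}


print(sorted(filtered_bfs("S", lambda p: links.get(p, []), toy_metrics)))
```

Pruning at the link level keeps the frontier small: a rejected page is never expanded, so its own links never enter the queue.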

Naive Degree-Based Filtering leads to “boring” results
It would be a lot easier to simply include pages based on a single metric, namely the one that is the least expensive and seemingly very meaningful: the (local) indegree, the number of pages that link to a certain page. The problem is that this metric strongly favors so-called hub pages: pages that are linked to a lot although they are not directly related semantically. Typical examples are pages for certain dates or countries. There were ideas to filter these pages using blacklists or to work with an indegree band (as opposed to a lower limit).

These two ideas, however, turned out to be very tedious and error-prone. We further believe that the most relevant results can only be found by a cleverly tuned combination of many factors.

Outlook
There is another network on Wikipedia besides the one that is based on articles and links. It’s the network of the Wikipedia authors and their collaborations. We anticipate that incorporating this information will further improve the relevance of the nodes in a Wikimap network. Read this previous blog post for an explanation of the basic idea.

posted by Reto Kleeb

Thursday, June 30, 2011

Wikimaps: Dynamic Maps of Knowledge

Wikipedia does not only provide the digital world with a vast amount of high quality information, it also opens up new opportunities to investigate the processes that lie behind the creation of the content as well as the relations between knowledge domains.

In their daily work Wikipedia editors make sure to keep articles updated: natural disasters, shiny new pop icons and scandals are reflected in new articles or in links between them. But how do these pages and their links evolve over time? Can we visually track how ties between subject areas grow stronger? Is there a way to notice that an article becomes more influential?

Our first attempt to come up with an answer to these questions was the development of a visualization that renders pages as nodes of a graph. If there is a link between two pages, it is represented as an edge. Each graph represents a snapshot of the articles at a specific date; the slider and the video controls on the left allow you to navigate back and forth in time.

http://www.ickn.org/wikimaps/
Try it out: Scroll to zoom in and out, use the video controls to start and pause the animation or drag the slider to any point in time.

Selection of the Nodes
There are currently 3.6 million articles in the English Wikipedia, and displaying nodes for all of them at the same time hardly makes sense. For our first prototype we decided to display a subset of the 50 most important nodes out of a given data-set.

How do we define importance? We decided to select the top nodes by using their indegree value (the number of links that point to an article), a trivial way to measure basic influence and relevance. The data-sets that are used are based on related categories on Wikipedia, e.g. to look at modern musical groups we look at all the members of the categories “Musical groups established in 1990”, “Musical groups established in 1991” and so forth.
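
The selection step amounts to counting incoming links and keeping the top entries. The sketch below assumes the links are available as (source, target) pairs and that self-links are ignored; both the function name and the sample data are invented.

```python
from collections import Counter


def top_by_indegree(links, candidates, n=50):
    """Return the n candidate pages with the most incoming links.

    links:      iterable of (source, target) page pairs
    candidates: set of pages in the data-set (e.g. members of the categories)
    """
    indegree = Counter(
        target for source, target in links
        if target in candidates and source != target
    )
    # Pages without incoming links count as 0; ties are broken by title.
    ranked = sorted(candidates, key=lambda page: (-indegree[page], page))
    return ranked[:n]


# Invented example.
sample_links = [("a", "b"), ("c", "b"), ("a", "c"), ("b", "b")]
print(top_by_indegree(sample_links, {"a", "b", "c"}, n=2))
```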

Collecting the necessary data is a time-consuming process. The usual approach for doing network analysis on Wikipedia is to use complete database dumps that are provided by the Wikimedia Foundation. The problem with these dumps is that they are either very large (complete dump that contains all historical data: 5 TB) or do not provide a high enough date resolution to accurately track the development of current events. To get around these issues we developed a data fetcher that uses the HTTP API. It continuously collects and stores the minimal amount of information that we need to build link networks for a selected list of articles with the desired date resolution.
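
To give an impression of what such a request looks like, the sketch below builds a MediaWiki API URL that lists the revisions of one article within a time window. Which API calls our fetcher actually issues is not described here, so treat the choice of query (`prop=revisions`) and the helper name as assumptions. The code only constructs the URL and does not contact the network.

```python
from urllib.parse import urlencode

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"


def revisions_query_url(title, start, end, limit=50):
    """Build a MediaWiki API URL listing revisions of one article in a time window.

    start and end are ISO 8601 timestamps, e.g. "2011-05-01T00:00:00Z".
    """
    params = {
        "action": "query",
        "format": "json",
        "prop": "revisions",
        "titles": title,
        "rvprop": "ids|timestamp",
        "rvdir": "newer",  # enumerate from oldest to newest
        "rvstart": start,
        "rvend": end,
        "rvlimit": limit,
    }
    return API_ENDPOINT + "?" + urlencode(params)


print(revisions_query_url("Martina Hingis",
                          "2010-01-01T00:00:00Z", "2010-03-01T00:00:00Z"))
```

Asking only for revision ids and timestamps keeps each response small, which fits the goal of storing the minimal amount of information needed.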

Future Work
Looking at the changes in the graph over time, it becomes clear that the simple indegree criterion does suffer from some shortcomings. It does not work to discover (fast) rising subjects. Or speaking figuratively: Despite the attention they currently receive, Lady Gaga and Justin Bieber do not stand a chance against Madonna or Eric Clapton. While one might claim that this situation is perfectly justified and reflects their artistic contributions, it would still be interesting to develop a set of metrics to select and rank nodes based on short term spikes in interest or relevance.
posted by Reto Kleeb

Sunday, April 17, 2011

The US – a Loophole Society – or a Society of Trust?

My immersion into the loophole society concept took place in 2007 when I was bringing used computers to Ghana, to be donated to schools. While the total value of the computers was about $1200, getting them through Ghanaian customs took two weeks and cost me another $1200. I had to hire an agent, who was a relative of the headmaster at the receiving school, who expected to be paid $200 to shepherd me through the myriad customs clearance offices. This customs process, designed to plug customs loopholes for importers, doubled the costs of the goods. However when I had delivered the computers I found out that I could have bought the same computers for about $1200 on the public Makola market in Accra – so it seems clever people always find ways to exploit the loopholes.

It is my perception that the loophole society concept is not restricted to African countries. Even the US has become more and more a society where people exploiting loopholes are rewarded and admired. Last week we learned that, by clever exploitation of tax loopholes, GE had $10.8 billion in profits, but a tax bill of $0 for 2009. The loophole phenomenon however is by no means restricted to big companies, but trickles down to individuals looking for loopholes to get a little break in dealing with others.

For me, the culture of loopholes, as compared to a culture of trust, is based on small worlds, or more precisely, the lack of small worlds. In a society with a small world structure where everybody knows everybody, loopholes have little chance. Exploiting loopholes is replaced by a culture of trust. The smaller the “world”, the more people value their reputation and their social capital and therefore don’t dare exploit loopholes.

I learned about the differences between “small worlds” – engendering trust, and the “big world” encouraging exploitation of loopholes recently when I was attending a meeting of the Swiss-American Chamber of Commerce. A frustrated Swiss businessman – coming from a very small world – bitterly complained about the 500 page contract that the lawyers of his US business partners wanted him to sign. As he said, in Switzerland business contracts are still one or two pages, containing the key points of the business deals, and not 500 pages of provisions trying to plug every possible loophole. Because, as he said, if something goes wrong, instead of trying to resolve the issue, lawyers from both parties will start poring over the 500 pages, and try to find the loopholes in their favor. This is great news for the lawyers, as it keeps them happily employed. It is not such great news for the Swiss business owner, because he will have to spend most of his profits, and then some, on the fees of his American lawyers.

Doing sponsored research in both the US and Switzerland gives another opportunity to compare the loophole society with the trust society. Research dollars spent at a top US university carry an overhead of 70%. This compares to an overhead rate of 15% in Switzerland. This means, that out of every US research dollar, 70 cents are spent on internal university administration, whose main task it is to make sure that the other 30 cents are not squandered. Compare this to the overhead at the Swiss university, where 15 cents on every Swiss Franc are spent on oversight and administration, and the remaining 85 cents on the researchers.

While the last two examples are somewhat oversimplified, they nevertheless illustrate a larger trend. The point really is that we should be moving towards a society of trust, and not a society of exploiting loopholes. This means that we should try to create localized small worlds based on self-organization and trust, where individuals are trusted to do the “right thing”, but are also held accountable for their own actions.

Saturday, April 16, 2011

Might growing health care costs be a good thing?

Everybody is complaining about the ever-rising costs of health care. But could it be that this is actually a good thing, because it means we can afford to spend an ever-rising share of our disposable income on our health?
While there is undoubtedly some misuse of our healthcare dollars, and money is wasted on unnecessary beauty operations, or even worse, on lawyers filing malpractice suits, I think that the overall fraction of disposable income a society can afford to spend on healthcare is a good benchmark for gross national happiness.
There are many variables influencing happiness, such as income, being married, and age, but being in good health has been found to be one of the most reliable predictors of happiness, as has been shown by many researchers. Countries which are able to spend a large amount of their income on healthcare should therefore be happier.

Do national happiness and healthcare spending indeed correlate? Because I could not find statistics, I did a quick calculation myself. I looked up mean health care spending per head in PPP$ (purchasing power parity adjusted dollars) of the OECD countries in 2001. I then compared these numbers to the gross national happiness index as listed on the World Database of Happiness. As a control variable in my model I took country size, looking up the population numbers on Wikipedia. Below are the actual numbers, showing that the US and Switzerland are the record spenders on healthcare per head, but are also fairly happy, although small countries like Denmark, Iceland, and Luxembourg are even happier, while spending less money on healthcare.

When I did a linear multivariate regression with these numbers, using health spending per head and country size as independent variables, and happiness as the dependent variable, I found an adjusted R squared of 0.58, with standardized significant coefficients of 0.83** for health spending per head, and -0.38** for population size. To put this in simple language, this means that 58% of the variation in happiness between countries is explained by health care spending and the country's size. The more a country spends on individual health care, and the smaller the country is, the happier its inhabitants are.

What’s the conclusion for the US? Well, this means investing money in health care actually might not be such a bad thing, but please, allow for local autonomy, giving subgroups of the population a say on how the money is being spent.

Saturday, March 05, 2011

Prediction Market predicted Oscars correctly 11 out of 12 times

I just stumbled on this interesting blog post, which examined how well the Intrade prediction market predicted this and last year's Oscars. It seems the market picked the winner correctly 11 out of 12 times.
Also interesting is the comment by BarTaxCa on the post, noting that the prediction differs depending on which prediction market one picks (HSX, Intrade, Inkling market). So it seems there is still a role to play for analyzing the wisdom of swarms through their Web buzz on IMDB and Rottentomatoes. In fact, what we found is that throwing the two together (prediction market + Web buzz) leads to the best results.

Monday, January 17, 2011

Facebook Pages, and why we know that you probably like Lady Gaga.

The Idea
Ever since Facebook rolled out pages in 2007, it has become very easy for users to show their interest in music, film, books, artists and other entities in various categories by clicking the "like" button on a specific Facebook page. Most of the time, the information about your personal “likes” is not protected automatically and can therefore be accessed by everyone, even by people who are not logged in.
We know that Mark Zuckerberg likes the Yankees and is a fan of Jay-Z, but that might just be of interest to his friends or People magazine. But there is much more information that we can infer from the social graph. Can Barack Obama know about the preferred beverages or favorite books of his fans? He can! ...but he probably doesn't care. With the information provided by Facebook’s social graph it is easy to identify connections between books, films or brands - without conducting a survey.
The Data
Building a network by linking two pages, depending on the frequency of their occurrence on the same user profile produces graphs like the following.
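
The linking step can be sketched in a few lines. This assumes the data is available as a mapping from user to the set of pages on that user's profile; the function name, the `min_count` cut-off and the sample profiles are invented.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_edges(profiles, min_count=1):
    """Count how often two pages appear on the same user profile.

    profiles: dict user -> set of liked pages
    Returns dict (page_a, page_b) -> number of profiles listing both,
    keeping only pairs that occur at least min_count times.
    """
    counts = Counter()
    for pages in profiles.values():
        # Sorting gives every pair a canonical order, so (A, B) == (B, A).
        for a, b in combinations(sorted(pages), 2):
            counts[(a, b)] += 1
    return {pair: c for pair, c in counts.items() if c >= min_count}


# Invented profiles.
profiles = {
    "user1": {"Page A", "Page B", "Page C"},
    "user2": {"Page A", "Page B"},
    "user3": {"Page C"},
}
print(cooccurrence_edges(profiles, min_count=2))
```

The edge weight is the co-occurrence count, so raising `min_count` thins the graph down to the strongest connections.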










The fact that people are providing this rich information creates different opportunities for analysis. Surely Facebook is already taking advantage of their data, but in social science and marketing user behaviour could be analysed. Certainly the advertising industry could benefit from, and would pay money for, such demographic information.
Demo Prototype
This web application illustrates a potential use of the data, which is based on 20 000 public Facebook profiles from different countries. An underlying bipartite “user to page” relation is used as a data source.



You can navigate through the TagCloud by clicking on a random entity. Different colors indicate categories (film, books, music, interests, other). The average of other pages listed in categories for the current page can be seen in the middle graph. The last graph shows the relative percentage of users liking this page in different countries.
It gives you a broad idea of the structure, though the current data is not representative of all Facebook users as the data was crawled from just 8 countries.
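
The per-country percentages in the last graph can be derived from the bipartite “user to page” relation roughly as follows. This is only a sketch with invented names and data, assuming each user is assigned to exactly one country.

```python
def like_share_by_country(likes, country_of, page):
    """Fraction of each country's users who like the given page.

    likes:      iterable of (user, page) pairs (the bipartite relation)
    country_of: dict user -> country
    """
    fans = {user for user, liked in likes if liked == page}
    totals, hits = {}, {}
    for user, country in country_of.items():
        totals[country] = totals.get(country, 0) + 1
        if user in fans:
            hits[country] = hits.get(country, 0) + 1
    return {country: hits.get(country, 0) / totals[country] for country in totals}


# Invented sample.
likes = [("u1", "P"), ("u2", "P"), ("u3", "Q")]
country_of = {"u1": "IT", "u2": "US", "u3": "IT", "u4": "US"}
print(like_share_by_country(likes, country_of, "P"))
```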
The key findings from this visualization:
  • The most popular pages are so commonly liked that they do not give a strong indication of individual personalities. E.g. Lady Gaga, Michael Jackson, Barack Obama.
  • Clicking through less popular pages reveals the “long tail” of Facebook pages with interesting cliques, and connections between them. E.g. Conservatives in the U.S. or Movie Fans.
  • Brand awareness and popularity in specific countries, e.g. Nutella in Italy can be observed.
So, if you want to stand out among those 500 million Facebook users, just don't like Lady Gaga, Michael Jackson or Barack Obama.

Facebook Pages - Categories

The chart below shows how categories of Facebook pages are used in different countries.
On average, users from Great Britain and the United States list twice as many pages in their profiles as users from Brazil. Furthermore, differences in certain categories can be identified. Listing books or activities seems to be very unpopular, in contrast to pages in the music or TV categories.

Tuesday, January 04, 2011

To Be a Better Manager Means Not to Be a Manager!

I think the time has come to fundamentally rethink the way we train and reward managers. While social entrepreneurship has become a popular buzzword at management schools, and Andrew Cuomo, the new governor of the state of New York, asked all his senior staff to take an ethics class in the first sixty days of his tenure, this is still just lip service. My proposal is far more radical:
Make managers redundant!

Let me explain what I mean.

4 Motivational Phenotypes of Knowledge Workers
When trying to understand the behavior and motivation of knowledge workers, it helps to group them into four phenotypes. These four types of knowledge workers, vastly differing in skill set and motivation, are: (1) the artists, (2) the scientists, (3) the teachers, and (4) the managers.

Artists want to create something new and beautiful, to touch the lives of people interacting with their art. Whether it is painters, sculptors, actors, singers, or orchestra musicians, they do what they do mostly not because they are paid to do it, but because they love what they do.

Scientists want to discover something new, to further the state of the art in their chosen field of science. Whether it is pure science like physics or astronomy, or applied science like medicine or engineering, their goal is to create something new by taking what is there, and combining it in new, innovative ways.

Teachers want to impart knowledge to their students. They want their pupils to understand, to become lifelong learners, and to be self-sustaining members of society. The creativity of teachers consists of developing new ways and methods of conveying and transferring knowledge.

Managers want to increase the success of the organization they are leading. Their creativity consists of taking the output of scientists, artists, and teachers to make the organizations they lead succeed. The main motivation of managers, as stipulated by proponents of the free market theory, is to increase the revenue of the organization they are leading, and thus also their own paycheck.

While artists and scientists want to create something radically new, either a new piece of art, or a new scientific insight, managers and teachers are mostly executors. Most of the time they do not really excel in creating new things, but in executing project plans, or executing curricula. Our education system rewards teachers for producing managers, not artists and scientists.

Income is negatively correlated to intrinsic motivation
Artists do what they do because they love it. They are the most intrinsically motivated of the four phenotypes – followed by the scientists and the teachers, who are scientists and teachers because that’s what they like, and not to get rich quick. This is very different for the managers, who most of the time chose their profession to be successful. They expect their success to be rewarded by fat paychecks and high status in society. The income of artists, on the other hand, shows a decidedly long-tail distribution, meaning that there are very few Picassos and Brad Pitts getting rich and famous. Rather, the vast majority of artists can expect to make very little money over the course of their careers. Salaries of scientists and teachers show a similar distribution, with most of them living off quite modest salaries. Income distribution of managers, on the other hand, shows a fat tail, meaning that many can expect to make a substantial income, and a still sizeable number can expect to make a lot of money. The most popular way for scientists, artists, and teachers to increase the size of their salaries is to accept “managerial” roles.

The key difference therefore between managers and the three other phenotypes is that artists, scientists, and teachers are intrinsically motivated, while managers are motivated extrinsically.

On a side note I would like to emphasize however that this discussion is about phenotypes. This means that this distinction into four categories is about oversimplified role types. Artists, scientists, and teachers don’t mind getting rich and famous, and managers might genuinely want their company to succeed in making the world a better place. Reality is never black or white, but rather somewhere in the middle, and most managers also have traits of an artist, scientist, or teacher, and the other way round.

So what are my recommendations for a manager?
The answer is very short: Forget about being a manager!

Trust your emotions. Become an artist, teacher, and scientist. Discover the joys of creating something new, of coaching your employees and helping them grow. This will help you start doing what you love, and not what you are paid to do, so that you become intrinsically motivated in your job. This will also make you much happier. In short, become a coolfarmer!