Real time search of all social media

Last Updated: Thu, Oct 25, 2012 20:40 hrs

As Americans await the results of the presidential election on November 6, two guys from Delhi could get a jump on that information: Vipul Ved Prakash and Rishab Aiyer Ghosh, co-founders of Topsy, a San Francisco-based company that tracks and analyses millions of posts on social media.

Topsy shot into the limelight when the social media platform Twitter unveiled its political index in August, to track discussion about the election and the two main candidates, US President Barack Obama and his Republican rival Mitt Romney. The Twitter Political Index, or Twindex, was built in partnership with Topsy. In an interview, Ghosh, who is also Chief Scientist at Topsy, explained how Twindex worked: “There are between a hundred thousand and a million tweets about each candidate each day, by thousands and thousands of people. And we are actually scanning every single tweet in English, and analysing and indexing the sentiment for all the terms in those tweets. That includes Obama and Romney and that’s the content that’s used by Twitter for the political index.” Topsy says its sentiment analysis of tweets about the candidates closely resembled polling firm Gallup’s results.
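As a rough illustration of the idea Ghosh describes, scoring each tweet and then aggregating per candidate, here is a minimal sketch. The word lists, the function names `tweet_sentiment` and `daily_index`, and the percentage scale are all assumptions made for illustration; the article does not describe Topsy's actual method, which is certainly more sophisticated.

```python
from collections import defaultdict

# Hypothetical toy lexicon; a real system would use a far richer model.
POSITIVE = {"great", "win", "strong", "love"}
NEGATIVE = {"bad", "lose", "weak", "hate"}


def tweet_sentiment(text):
    """Return +1, -1 or 0 by comparing counts of positive and negative words."""
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return (score > 0) - (score < 0)


def daily_index(tweets, terms):
    """Percentage of non-neutral tweets mentioning each term that are positive."""
    positive = defaultdict(int)
    total = defaultdict(int)
    for text in tweets:
        sentiment = tweet_sentiment(text)
        if sentiment == 0:
            continue  # neutral tweets do not move the index
        lowered = text.lower()
        for term in terms:
            if term in lowered:
                total[term] += 1
                positive[term] += sentiment > 0
    return {t: round(100 * positive[t] / total[t]) for t in terms if total[t]}


tweets = [
    "Obama was great tonight",
    "Romney looked strong",
    "Obama had a bad debate",
    "Romney will win",
    "Obama speaks at noon",
]
print(daily_index(tweets, ["obama", "romney"]))
```

In this toy sample the neutral tweet is ignored, so each candidate's score reflects only the tweets that carry a clear positive or negative signal.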

Ghosh and Ved Prakash, who were friends growing up in Delhi, founded Topsy with two others five years ago as a platform for indexing and searching social media. Ghosh said, “Our understanding was that the web has become much more personal and most new content is being shared in public conversations rather than being distributed through publication on websites. Existing web search technologies did not really provide access to all this new content.”

According to Ghosh, Topsy now has the world’s biggest public index of social data, “bigger than Google’s index, bigger than Bing’s and Twitter’s own index of tweets”. It includes all social media posts for the past five years, and every single post is accessible and indexable. On a typical day, Topsy’s servers will analyse and index over 400 million posts. During events that draw a lot of public attention, like a recent presidential debate, this traffic can be as high as five million tweets in an hour about the two candidates.

Topsy’s data are valuable because they also track sentiment, not just frequency of mentions or exposure. Ghosh says it’s not a replacement for an opinion poll, but points out the advantages of Topsy’s analytics: “It’s real time and it’s much much bigger in terms of sample size because you’re getting the sentiment aggregated for hundreds of thousands if not millions of data points and you get it within seconds. Whereas with an opinion poll you can phrase your questions carefully but you can only ask a few hundred people and it takes you days to get the results.”

The launch of Apple’s iPhone 4S last year demonstrated the accuracy of Topsy’s data. The early response from reviewers and the media was negative. Topsy had been contracted by a publication to analyse sentiment about the product, and based on its analysis of thousands of tweets, it reported the model had actually proved popular with consumers. The publication didn’t use the data, says Ghosh. “They were doing a story on how everyone is negative about the iPhone 4S and when they saw our data they were like, ‘Oh, we won’t use this because it doesn’t seem right’,” he recalls. Ten days later, Apple announced that sales of the iPhone 4S had exceeded those of all other iPhone models.

Consumer brands tracking buyer interest in their brands and products, and media organisations following popular trends, are the main customers of Topsy, according to Ghosh, followed by the financial and political sectors. The company launched its first search product tied to Twitter in 2009.

Topsy also offered an Application Programming Interface, or API, which was used by many of its commercial clients. A couple of months ago, Topsy launched an analytics product accessible through a user interface.

The company has raised $30 million through two rounds of venture capital funding, and currently has 45 employees, most of them engineers.

Ghosh says they don’t have any paid subscribers from India yet, but there are many who use the consumer site as well as the API site. He sees potential in India, though.

“Twitter is actually very popular in India and a lot of people talk about political events, brands, activities and things like that. Our sentiment analysis only works in English, and since most tweets and social posts in India are actually in English, that means that our product is actually very applicable to India,” says Ghosh.

Topsy expects to index and analyse roughly 250 billion items by the end of the year, with over 16 trillion pre-computed metrics. Ghosh explains what this means: “Using Obama as an example, you have how many times has he been mentioned in Spanish, in Mexico, by influential people. That’s one metric. Since we’re measuring multiple dimensions like language, sentiment, geographical location, and we have those for every single term, so it’s not just Obama but anything that anyone ever said. That’s a large number of metrics.” Supporters of Obama and Romney, however, will be interested only in one metric on November 6, “Who will win?”
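The kind of pre-computed, multi-dimensional metric Ghosh describes can be pictured as a counter keyed by every combination of term, language, location and influence. The sketch below is only an illustration: the post fields (`terms`, `lang`, `country`, `influential`) and the function name `precompute` are hypothetical, not Topsy's data model.

```python
from collections import Counter


def precompute(posts):
    """Count mentions for each (term, language, country, influential) combination."""
    metrics = Counter()
    for post in posts:
        for term in post["terms"]:
            key = (term, post["lang"], post["country"], post["influential"])
            metrics[key] += 1
    return metrics


# Made-up sample posts, for illustration only.
posts = [
    {"terms": ["obama"], "lang": "es", "country": "MX", "influential": True},
    {"terms": ["obama", "romney"], "lang": "en", "country": "US", "influential": False},
    {"terms": ["obama"], "lang": "es", "country": "MX", "influential": True},
]

metrics = precompute(posts)
# Mentions of "obama" in Spanish, in Mexico, by influential people:
print(metrics[("obama", "es", "MX", True)])
```

Because every term is counted along every dimension, the number of distinct keys grows multiplicatively, which is how an index of billions of items can yield trillions of metrics.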
