Predicting what topics will trend on Twitter — from MIT


At the Interdisciplinary Workshop on Information and Decision in Social Networks at MIT in November, Associate Professor Devavrat Shah and his student, Stanislav Nikolov, will present a new algorithm that can, with 95 percent accuracy, predict which topics will trend an average of an hour and a half before Twitter’s algorithm puts them on the list — and sometimes as much as four or five hours before.

The algorithm could be of great interest to Twitter, which could charge a premium for ads linked to popular topics, but it also represents a new approach to statistical analysis that could, in theory, apply to any quantity that varies over time: the duration of a bus ride, ticket sales for films, maybe even stock prices.

Like all machine-learning algorithms, Shah and Nikolov’s needs to be “trained”: it combs through data in a sample set — in this case, data about topics that previously did and did not trend — and tries to find meaningful patterns. What distinguishes it is that it’s nonparametric, meaning that it makes no assumptions about the shape of patterns.

In principle, Shah says, the new algorithm could be applied to any sequence of measurements performed at regular intervals. But the correlation between historical data and future events may not always be as clear cut as in the case of Twitter posts. Filtering out all the noise in the historical data might require such enormous training sets that the problem becomes computationally intractable even for a massively distributed program. But if the right subset of training data can be identified, Shah says, “It will work.”