Twitter migrates core infrastructure to the JVM and supports more than 400 million Tweets a day.
Robert Benson gets a project update from Tung Vo, senior manager of software engineering.
Twitter’s 2006 launch was a long time ago in internet years, and the real-time information network has been on an upward trajectory ever since. By 2009, journalists were reporting that numbers alone couldn’t possibly describe Twitter’s social impact. As with the internet itself, the larger Twitter’s user base grew, the more useful the service became. Today Twitter is in the vanguard of the social web, an emerging cultural network in which static content from a relatively small group of publishers is being replaced by free-form, dynamic interaction among millions of participants. And while analysts debate whether Web 2.0 is truly a new paradigm or merely the collaborative medium the internet was originally envisioned to be, Twitter’s more than 200 million active users aren’t wasting their 140 characters arguing about internet semantics. They’re simply tweeting, to the tune of more than 400 million Tweets per day, as they discuss every topic imaginable, from casual chatter to world-changing social and political issues.
Thanks to Java, Twitter facilitates those discussions more competently than ever.
“Performance is one of the most important products that any service can deliver to its customers,” says Robert Benson, senior director of software engineering at Twitter. “End users want Twitter to be fast so they can get real-time information. Reliability and performance are huge goals for us, and that’s why part of our core strategy involves moving to the Java Virtual Machine [JVM] runtime environment. Twitter no longer has the performance issues it previously had, and that’s in large part due to our moving to the JVM.”
Twitter is one of the top 10 most-visited sites on the internet. Unregistered users can read Tweets, and registered users can post them via the web, SMS, or apps for mobile devices. Part of what makes Twitter attractive is its user-friendly functionality. Hashtags, trending topics, following, @replying, and retweeting all contribute to its ease of use and popularity.
Engineers at Twitter admit that while the company has always cared deeply about the quality of its service, delivering on that has been challenging in the face of such explosive growth.
Most longtime Twitter users are familiar with the fail whale error message. The whale illustration, created by Chinese-Australian artist Yiying Lu, pops up to inform users that Twitter is over capacity and urges them to try again later. But service outages on Twitter have been noticeably less frequent since late 2010. That’s no coincidence. Benson says that Twitter’s engineers have been doing a lot of thinking about Twitter’s architecture and the challenge of handling so many requests every second. Years of constantly refining their approach have brought them to a solution that uses the JVM to build systems that can handle that load easily by scaling horizontally.
[Infographic: average number of search queries per day, monthly active Twitter users, and average number of Tweets sent per day]
Robert Benson and Ben Hindman chat in one of Twitter’s casual conference booths.
“We don’t want to depend upon machines getting taller and taller or the resources in those individual boxes increasing,” Benson says. “We want computers that can handle requests in parallel. The JVM is a managed language runtime that can deal with concurrency in a very efficient way. It can handle these types of workloads. A great deal of our data center was dedicated to handling the API traffic of our customers. That can now be managed with far fewer machines on the JVM while delivering huge boosts in performance.”
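To make the idea of handling requests in parallel on the JVM concrete, here is a minimal Java sketch. It is an illustration only, not Twitter's code: the class `ParallelRequests`, the `handle` method, and the pool size are hypothetical names and values chosen for the example. It uses the standard `java.util.concurrent` thread pool and `CompletableFuture` to process several requests at once and collect the responses in request order.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical illustration of parallel request handling; not Twitter's code.
final class ParallelRequests {

    // Stand-in for real request handling (hypothetical).
    static String handle(int requestId) {
        return "response-" + requestId;
    }

    // Submits every request to a shared fixed-size pool, then waits for each
    // result. Responses come back in the same order as the request IDs.
    static List<String> handleAll(List<Integer> requestIds, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<CompletableFuture<String>> pending = new ArrayList<>();
            for (int id : requestIds) {
                pending.add(CompletableFuture.supplyAsync(() -> handle(id), pool));
            }
            List<String> responses = new ArrayList<>();
            for (CompletableFuture<String> future : pending) {
                responses.add(future.join()); // blocks until this request is done
            }
            return responses;
        } finally {
            pool.shutdown(); // stop accepting work; submitted tasks have already finished
        }
    }

    public static void main(String[] args) {
        // prints [response-1, response-2, response-3]
        System.out.println(handleAll(Arrays.asList(1, 2, 3), 4));
    }
}
```

In a sketch like this, adding capacity means adding more threads or more machines running the same code, which is the horizontal style of scaling Benson describes, rather than relying on a single larger box.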
The Twitter team has moved many of the company’s most critical systems to a set of services written in Java and Scala running on the JVM. The service now operates worldwide and can handle sustained peak traffic during major events like the Super Bowl and the US presidential election without an appearance from the fail whale. Users enjoy a very fast system that enables them to get information within seconds about events taking place all over the world.
Benson says the migration to the JVM not only delivers performance wins; it also provides something he likes to call observability. “Running a service of this scale, things go wrong all the time, either because of runtime issues or because software is being deployed every hour,” he says. “We want to be sure we understand why those failures happen. With the JVM, it is a lot easier for us to examine those events in a robust way than it was with other runtimes we have used in the past.”
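The article does not say which tools Twitter uses for this, so the following is only a sketch of one kind of observability the JVM offers out of the box. It reads uptime, live thread count, and heap usage from the standard `java.lang.management` beans. The class name `RuntimeSnapshot` and the output format are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryMXBean;
import java.lang.management.RuntimeMXBean;
import java.lang.management.ThreadMXBean;

// Hypothetical illustration of inspecting a running JVM; not Twitter's tooling.
final class RuntimeSnapshot {

    // Builds a one-line summary of the current JVM from its management beans.
    static String describe() {
        RuntimeMXBean runtime = ManagementFactory.getRuntimeMXBean();
        ThreadMXBean threads = ManagementFactory.getThreadMXBean();
        MemoryMXBean memory = ManagementFactory.getMemoryMXBean();

        long heapUsedBytes = memory.getHeapMemoryUsage().getUsed();
        return "uptimeMs=" + runtime.getUptime()
                + " liveThreads=" + threads.getThreadCount()
                + " heapUsedBytes=" + heapUsedBytes;
    }

    public static void main(String[] args) {
        // Values differ on every run, so no fixed output is shown here.
        System.out.println(describe());
    }
}
```

The same beans can also be read remotely over JMX, so a service can be examined while it runs without adding logging code for each question.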
Finally, one of the critical factors in moving to the JVM was the OpenJDK open source project. “As the guy who needs to think strategically about how my organization can work, not only inside the building but also externally, the open source nature of the JVM is very important to me because we can see the source on which we’re building the core infrastructure,” Benson says. “We hire engineers to work with that codebase and community to improve the runtime so that we can build faster and more predictable services on top of the JVM. These are the kinds of things that made the choice of migrating to the JVM easy.”
Benson discusses hiring with recruiter Christian Bogeberg (left) and Hindman in Twitter's Commons, where three meals a day are served to employees, free of charge.
The traffic on Twitter runs the gamut from logistical instructions (“September 17. Zuccotti Park. Bring tent. #OccupyWallStreet”) to simple details of people’s daily lives (“Corn dogs for lunch again. #Winning”). But no matter what people tweet about, they want Twitter to perform the same way, every time they use it.
Benson says the JVM plays an important role in refining the predictability of the service. “Because the JVM is a managed runtime, our developers can work more quickly,” he explains. “They don’t need to worry about manually managing memory. But the tradeoff is, we need the JVM to efficiently manage garbage, which are objects that the process doesn’t need anymore.”
Predictability is difficult to achieve in the garbage collection process. Benson and his team are investing engineering resources in fine-tuning the JVM so that its garbage collection behaves more predictably. “This is critical in a high-throughput, low-latency system like ours,” he continues. “We are working with people in the JVM community to improve the garbage collection strategies and heuristics.”
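Tuning garbage collection generally starts with measuring it. As a rough sketch, and not a description of Twitter's own approach, the standard `GarbageCollectorMXBean` interface reports how many collections each collector has run and the approximate accumulated time spent in them. The class name `GcReport` is hypothetical. These are cumulative totals rather than per-pause latencies, so they give only a coarse view of predictability.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

// Hypothetical illustration of reading GC statistics; not Twitter's tooling.
final class GcReport {

    // Sums collection counts across all collectors.
    // A collector reports -1 when the count is undefined; those are skipped.
    static long totalCollections() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long count = gc.getCollectionCount();
            if (count > 0) {
                total += count;
            }
        }
        return total;
    }

    // Sums approximate accumulated collection time in milliseconds.
    // A collector reports -1 when the time is undefined; those are skipped.
    static long totalCollectionTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long millis = gc.getCollectionTime();
            if (millis > 0) {
                total += millis;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        // Collector names and values depend on the JVM and its settings.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.println(gc.getName()
                    + " collections=" + gc.getCollectionCount()
                    + " timeMs=" + gc.getCollectionTime());
        }
        System.out.println("totalCollections=" + totalCollections()
                + " totalTimeMs=" + totalCollectionTimeMillis());
    }
}
```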
Benson’s team is also helping to improve the JVM as Twitter moves toward a cloud architecture where there are no dedicated boxes for anything and processes can flow around a data center, automatically rescheduled in the wake of failures. “With a cloud infrastructure, we have multiple JVMs that run side by side in a single box,” he notes. “We are actively investigating ways to improve how that works to minimize performance blips due to the colocation of workloads. Our goal is to contribute back to the OpenJDK community so that we help push that platform forward.”
To help explain the importance of the JVM within the Twitter infrastructure, here is a high-level overview of the primary ways in which the JVM affects an average Tweet as it is set loose into the Twittersphere:
Benson believes his team is pushing the JVM in ways that it has rarely been pushed elsewhere. “Our latency requirements, the predictability of the garbage collection, the number of languages we run on the JVM, and much more make Twitter an exciting place to work,” he says. “The JVM gives us the flexibility to construct new and better products.”
As talented Java and Scala developers work on Twitter’s data systems, analytic systems, client infrastructure, and multiple other projects, they do more than enrich the Twitter experience for end users. They also help enhance the Java platform. That’s something most service companies don’t have the knowledge or wherewithal to do, so it’s no surprise that Twitter attracts some of the brightest software engineers in the industry.
“There are many reasons the JVM is a great fit for us,” Benson summarizes. “We benefit from concurrency, ability to run multiple languages in the same process, and observability of the JVM. Our operations are more cost-effective because we can do more with less in our data centers. Most importantly, we can deliver reliable service to our customers. The JVM gives us many tools we need to experiment, iterate, and quickly deliver what the world wants.”
Based in Santa Barbara, California, David Baum and Ed Baum write about innovative businesses, emerging technologies, and compelling lifestyles.
Article originally written for Java Magazine