Java and Scalability

Every time I hear someone say that Java is “scalable” my initial reaction is to kick the person who said it in the shin.

I have been talking to a lot of people lately about the tools we are using at Gnip. Every time I tell someone that major parts of our system are written in Java, the response seems to be, “Oh, for its scaling capability?” While I was safely ensconced in the Ruby world I had hope that this malformed meme was dead. It seems that in the wider world it’s not quite dead.

I never actually kick the person, by the way. Instead I just sigh and explain that, no, that is not the reason. Scalability cannot be the reason we use Java, because Java does not scale any better, or worse, than any other general-purpose language.

There are a variety of different sorts of scalability. The most interesting type of scaling in the context of web applications, like Gnip, is how easily you can increase the number of requests/sec the system can handle. This sort of scalability, or lack thereof, derives pretty much entirely from the architecture of the system. No language will magically make your system be able, or unable, to handle an order of magnitude increase in the number of requests.

The culture[1] of Java actually encourages the development of moderately, rather than highly, scalable systems. It does this by favoring the use of multi-threading, shared state, vertical scaling and large monolithic components. These techniques do not scale infinitely. Fortunately, Java is fast enough that they can scale to quite significant levels. Even though the culture of Java encourages these less than perfectly scalable techniques, you can build highly scalable systems with Java quite readily. You just have to be willing to buck the culture when it is appropriate.
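As a rough illustration of bucking that culture, here is a minimal sketch of the shared-nothing alternative to one big shared data structure: state is split into partitions by key, and each partition owns its data outright. The class and method names (`PartitionedCounter`, `partitionFor`, `record`, `count`) are hypothetical and are not taken from Gnip's code. The sketch assumes each partition would be served by a single thread or process, so it does no locking.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Hypothetical sketch: a request counter whose state is partitioned by key.
 * No partition ever reads another partition's map, so each one could be
 * moved to its own process or machine without changing the logic.
 * Not thread-safe as written; it assumes one owner per partition.
 */
final class PartitionedCounter {
    // One independent map per partition.
    private final List<Map<String, Long>> partitions = new ArrayList<>();

    PartitionedCounter(int partitionCount) {
        if (partitionCount <= 0) {
            throw new IllegalArgumentException("partitionCount must be positive");
        }
        for (int i = 0; i < partitionCount; i++) {
            partitions.add(new HashMap<>());
        }
    }

    /** Maps a key to the index of the single partition that owns it. */
    int partitionFor(String key) {
        // floorMod keeps the index non-negative even for negative hash codes.
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    /** Records one request for the key and returns the key's new total. */
    long record(String key) {
        return partitions.get(partitionFor(key)).merge(key, 1L, Long::sum);
    }

    /** Returns how many requests have been recorded for the key. */
    long count(String key) {
        return partitions.get(partitionFor(key)).getOrDefault(key, 0L);
    }

    public static void main(String[] args) {
        PartitionedCounter counter = new PartitionedCounter(4);
        counter.record("publisher-a");
        counter.record("publisher-a");
        counter.record("publisher-b");
        System.out.println("publisher-a: " + counter.count("publisher-a"));
        System.out.println("publisher-b: " + counter.count("publisher-b"));
    }
}
```

Nothing here depends on Java in particular. The same partitioning could be written in any general-purpose language, which is the point: the ability to add partitions comes from the design, not from the language.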

Performance, on the other hand, does derive, to a significant degree, from your language,[2] and that is why we use Java.

  1. Every language has a set of idioms and practices that it, and its community, implicitly encourage. This set of idioms and practices is what I mean by culture.

  2. I really wish this were not the case. I don’t think it has to be this way, but today Java is a lot faster than most of the languages I really like.

7 thoughts on “Java and Scalability”

  1. Languages in general don’t affect scalability all that much. Yes, the actual execution speed of the language can determine the amount of CPU power (machines) you need to accomplish the job, but it won’t determine whether or not your system is capable of running on that number of machines and delivering the required performance. Like you said, that’s part of the architecture.

    Java – however – is not just a language, it’s also the huge collection of frameworks. “Java” is the equivalent of both Ruby and Rails. And while neither the language nor the frameworks are perfect, they have been used to build extremely scalable systems for about a decade now. People call Java scalable not because it’s better suited for building scalable systems, but because how to build scalable systems with it is well understood and the components you need to do so are widely available. Not only are those components widely available, they are available from multiple vendors — some free, some commercial — so you’re never locked in, an aspect that’s extremely valuable when producing commercial software.

    I totally believe that you can build equally scalable applications using Ruby & Rails, but how to do that is not widely understood. Even if you can hire someone who understands it, you can’t replace the person if he leaves.

  2. scale is always such a relative term. so many factors have to be taken into account when “scaling” a system. requests/sec, effective use of CPU cycles, memory usage, horiz vs. vertical, etc etc.

    developer liquidity is a major factor as well. you have to look at the marketplace and grab a toolset/language/framework that lots of people know, so your env. can evolve with the ecosystem in which it lives.

    I often find folks talking about insert-language-here and “scale” doing so just as a conversational starting point rather than a definitive statement.

  3. Actually, Java applications are generally less rather than more scalable than applications in comparable languages, particularly C++. This is because as memory requirements exceed around 1 Gbyte (admittedly we are talking LARGE programs here) the garbage collection pauses … the time when all Java application threads are suspended to allow the JVM garbage collection threads to collect all references so they can defragment memory … grow into multiple minutes. Lots of bad things happen then (heartbeat detection assumes the system has failed and removes resources, web site customers call it a day, stock trade opportunities are missed, etc.).

    As a result, huge response latency times effectively put a limit on available memory, which in turn puts a limit on ultimate Java application scalability when compared to C++ (of course, with C++ you have to be careful to give the memory back). In fact, the Real Time Java API extensions were developed as a way of addressing this problem (creating what is in effect thread local heaps) but this requires code modification … and if done wrong, can actually slow down overall performance.

    One alternative (from Azul Systems) transparently redeploys unmodified Java applications to what is effectively a Java Compute Appliance, allowing them to scale to hundreds of Gbytes of memory with negligible GC pauses … but this is only possible because Azul exploits some special hardware instructions in their box. Another is to horizontally scale out an application (all stock trades A-C go to system #1, D-F go to system #2, etc.) so memory requirements for any one instance are under control. But this results in server sprawl, crashing instances when peak loading spikes (stock B split creating lots of activity), involved synchronization on multi-stock trades, special factoring of the data set into parallel instance caches, etc.

    Scaling Java applications past a certain point ain’t easy.

  4. Being an admittedly java-centric developer and a scalability nut, I had to chime in on this.

    I would say that the java language itself has nothing to do with scalability. (Ron’s comments about garbage collection are outdated; I haven’t seen those pauses in post-1.4 JVMs on multi-gigabyte production clusters.) I do agree that the J2EE server mindset guides people into design patterns that are at a medium level of opportunity for scalability.

    However, almost everyone accepts that old-school J2EE is dead and the unstoppable trend of inversion of control has caused a dismantling of the once immutable bundle known as “the server”.

    Specific to scalability, this destruction of the monolithic server has led to wider usage of such toolkits as: jgroups, ehcache and terracotta. Those toolkits encourage average architects to use the sophisticated networking design patterns formerly applied only by the highly experienced engineers who developed clustered server products. Today, jgroups is actually used internally by JBoss to implement their clustering model.

    I would say that the true force pushing scalability is the java community itself. Perhaps we have to thank Sun for fostering that community even though they did it for their own marketing purposes. In any case, java is merely a convenient and commonly spoken language that allows for the formation of a global community. Like it or not, Sun’s tight control of the java spec forces the community to stop debating the language internals and start debating sophisticated design patterns such as those that promote scalability.

  5. “No language will magically make your system be able, or unable, to handle an order of magnitude increase in the number of requests.”
    Other than Erlang of course. ;)
