Titanium is an explicitly parallel dialect of Java developed at UC Berkeley  to support high-performance scientific computing on large-scale multiprocessors, including massively parallel supercomputers and distributed-memory clusters with one or more processors per node. Other language goals include safety, portability, and support for building complex data structures.
The main additions  to Java are:
Titanium provides a global memory space abstraction (similar to other languages such as Split-C and UPC) whereby all data has a user-controllable processor affinity, but parallel processes may directly reference each other's memory to read and write values or arrange for bulk data transfers. A specific portability result is that Titanium programs can run unmodified on uniprocessors, shared memory machines, and distributed memory machines. Performance tuning may be necessary to arrange an application's data structures for distributed memory, but the functional portability allows for development on shared memory machines and uniprocessors.
Titanium is a superset of Java and inherits all the expressiveness, usability, and safety properties of that language. Titanium augments Java's safety features by providing checked synchronization that prevents certain classes of synchronization bugs. To support complex data structures, Titanium uses the object-oriented class mechanism of Java along with the global address space to allow for large shared structures. Titanium's multidimensional array facility adds support for high-performance hierarchical and adaptive grid-based computations.
Our compiler research focuses on the design of program analysis techniques and optimizing transformations for Titanium programs, and on developing a compiler and run-time system that exploit these techniques. Because Titanium is an explicitly parallel language, new analyses are needed even for standard code motion transformations. The compiler analyzes both synchronization constructs and shared variable accesses. Transformations include cache optimizations, overlapping communication, identifying references to objects on the local processor, and replacing runtime memory management overhead with static checking. Our current implementation translates Titanium programs entirely into C, where they are compiled to native binaries by a C compiler and then linked to the Titanium runtime libraries (there is no JVM).
The current implementation runs on a wide range of platforms, including uniprocessors, shared memory multiprocessors, distributed-memory clusters of uniprocessors or SMPs (CLUMPS), and a number of specific supercomputer architectures (Cray T3E, IBM SP, Origin 2000). The distributed memory back-ends can use a wide variety of high-performance network interconnects, including Active Messages, MPI, IBM LAPI, shmem, and UDP.
Titanium is especially well adapted for writing grid-based scientific parallel applications, and several such major applications have been written and continue to be further developed ([3,4], and many others).