
ND4J - Tensors in Java

ND4J is a tensor and n-dimensional array scientific computing library built for the JVM, and is part of the Deeplearning4j suite of software. It is meant to be used in production environments, so its routines are designed to run fast with minimal RAM requirements.

A usability gap has separated Java, Scala and Clojure programmers from the most powerful tools in data analysis, like NumPy or Matlab. Libraries like Breeze don’t support n-dimensional arrays, or tensors, which are necessary for deep learning and other tasks. Libraries like Colt and Parallel Colt use or depend on GPL-licensed code, which makes them unsuitable for many commercial uses. ND4J and ND4S are used by national laboratories such as NASA JPL for tasks such as climate modeling, which require computationally intensive simulations.

ND4J brings the intuitive scientific computing tools of the Python community to the JVM in an open source, distributed and GPU-enabled library. In structure, it is similar to SLF4J. ND4J gives engineers in production environments an easy way to port their algorithms and interface with other libraries in the Java and Scala ecosystems.

ND4J Features

Most ML algorithms (especially deep learning) and a lot of scientific computing need tensor operations. There are a number of aspects to consider:

  1. Speed
  2. Compute devices (GPUs, etc.)
  3. Zero-copy interop with high-performance libraries
  4. Built-in operations
  5. Large arrays
  6. Types not supported by the JVM

Speed. Java is generally faster than pure Python, but both are typically slower than optimized C++. ND4J’s C++ library, libnd4j, gives it vectorization (AVX, AVX2, AVX-512) and multi-threading via OpenMP. That is close to the best performance available on CPUs, and considerably faster than a pure Java solution.

Compute devices. Java doesn’t support GPUs directly. If you want GPUs, you need off-heap memory and C++ code for your operations. There’s no way around that. The same goes for other compute devices, such as TPUs and related chips. You can’t run Java on them, but you can run a Java process that offloads compute to them, which is what ND4J does. Offloading compute to any device requires off-heap memory for storage. Otherwise you pay serialization and memory copy costs on each operation, which kills performance. The ND4J team controls the underlying C++ code, and all the heavy lifting is done there, so ND4J can in principle support any type of hardware compute device. Even better, the Java code you write isn’t device specific in any way. You can take the exact same Java code, switch your backend (one dependency), and run it on any device ND4J supports.
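The idea of device-independent user code can be sketched in plain Java. This is only an illustration of the pattern, not ND4J’s actual API: `ComputeBackend`, `SequentialBackend`, `ParallelBackend` and `axpy` are hypothetical names, and both backends run on the CPU here as stand-ins for real device backends.

```java
import java.util.Arrays;
import java.util.stream.IntStream;

// Hypothetical interface: the only thing user code depends on.
interface ComputeBackend {
    double[] add(double[] a, double[] b);
    double[] scale(double[] a, double factor);
}

// Stand-in for one backend: plain single-threaded loops.
final class SequentialBackend implements ComputeBackend {
    public double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
        return out;
    }
    public double[] scale(double[] a, double factor) {
        double[] out = new double[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] * factor;
        return out;
    }
}

// Stand-in for a second backend: same contract, multi-threaded execution.
final class ParallelBackend implements ComputeBackend {
    public double[] add(double[] a, double[] b) {
        double[] out = new double[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> out[i] = a[i] + b[i]);
        return out;
    }
    public double[] scale(double[] a, double factor) {
        double[] out = new double[a.length];
        IntStream.range(0, a.length).parallel().forEach(i -> out[i] = a[i] * factor);
        return out;
    }
}

final class BackendSketch {
    // User code: computes alpha * x + y without knowing which backend runs it.
    static double[] axpy(ComputeBackend backend, double alpha, double[] x, double[] y) {
        return backend.add(backend.scale(x, alpha), y);
    }

    public static void main(String[] args) {
        double[] x = {1, 2, 3};
        double[] y = {10, 20, 30};
        // Swapping the backend changes where the work runs, not the calling code.
        System.out.println(Arrays.toString(axpy(new SequentialBackend(), 2.0, x, y)));
        System.out.println(Arrays.toString(axpy(new ParallelBackend(), 2.0, x, y)));
    }
}
```

Both calls print `[12.0, 24.0, 36.0]`. In a real setup the backend would be selected by the dependency on the classpath rather than constructed by hand.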

Zero-copy interop. If you host your data on the JVM heap (as a float[], double[], etc.) and you want to call a C++ library on that data (for example, BLAS), you need to make a copy, which adds unnecessary overhead. Avoiding that copy is especially important for inter-process communication, for example when you want to pass data between a Python process and the JVM. Zero-copy access is also required by Apache Arrow, which is fast becoming a widely adopted standard for data interchange in data science libraries. ND4J enables zero-copy interop.
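The difference between on-heap and off-heap storage can be seen with the JDK’s own NIO buffers. The following is a minimal sketch using only `java.nio`, not ND4J’s buffer classes; `OffHeapSketch` and `allocateOffHeap` are names made up for this illustration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.FloatBuffer;

final class OffHeapSketch {
    // Allocates room for n floats outside the JVM heap, in native byte order,
    // so that native code could read the memory in place.
    static FloatBuffer allocateOffHeap(int n) {
        return ByteBuffer.allocateDirect(n * Float.BYTES)
                .order(ByteOrder.nativeOrder())
                .asFloatBuffer();
    }

    public static void main(String[] args) {
        float[] onHeap = {1f, 2f, 3f, 4f};

        // Wrapping a float[] keeps the data on the JVM heap.
        FloatBuffer heapBacked = FloatBuffer.wrap(onHeap);

        // Moving heap data off-heap costs one explicit copy.
        FloatBuffer offHeap = allocateOffHeap(onHeap.length);
        offHeap.put(onHeap);

        // A duplicate is a second view of the same memory: no copy is made.
        FloatBuffer view = offHeap.duplicate();
        view.put(0, 42f); // absolute write through the view

        System.out.println(heapBacked.isDirect()); // false
        System.out.println(offHeap.isDirect());    // true
        System.out.println(offHeap.get(0));        // 42.0, the write is visible
    }
}
```

Once the data lives in a direct buffer, further views over it share the same native memory, which is the property that zero-copy interop relies on.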

Built-in operations. Deep learning and scientific computing are basically built out of a set of primitive operations: “elementwise addition”, “matrix multiplication”, “maximum along dimensions”, “softmax”, etc. For productivity, you need a wide array of these primitives built into a library, as building blocks that you can use. You don’t get 95 percent of those ops “out of the box” on the JVM. ND4J offers them.
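To show what hand-writing such primitives looks like without a library, here is a small pure-Java sketch of three of the operations named above. It is not ND4J code; the class and method names are hypothetical, and the implementations are naive loops without vectorization or multi-threading.

```java
import java.util.Arrays;

final class PrimitiveOpsSketch {
    // Elementwise addition of two equal-length vectors.
    static float[] add(float[] a, float[] b) {
        if (a.length != b.length) throw new IllegalArgumentException("length mismatch");
        float[] out = new float[a.length];
        for (int i = 0; i < a.length; i++) out[i] = a[i] + b[i];
        return out;
    }

    // Maximum along dimension 1: one maximum per row of the matrix.
    static float[] maxAlongRows(float[][] m) {
        float[] out = new float[m.length];
        for (int r = 0; r < m.length; r++) {
            float max = Float.NEGATIVE_INFINITY;
            for (float v : m[r]) max = Math.max(max, v);
            out[r] = max;
        }
        return out;
    }

    // Softmax, made numerically stable by subtracting the maximum first.
    static float[] softmax(float[] x) {
        float max = Float.NEGATIVE_INFINITY;
        for (float v : x) max = Math.max(max, v);
        float[] out = new float[x.length];
        float sum = 0f;
        for (int i = 0; i < x.length; i++) {
            out[i] = (float) Math.exp(x[i] - max);
            sum += out[i];
        }
        for (int i = 0; i < x.length; i++) out[i] /= sum;
        return out;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(add(new float[]{1f, 2f}, new float[]{3f, 4f})));
        System.out.println(Arrays.toString(maxAlongRows(new float[][]{{1f, 5f}, {7f, 2f}})));
        System.out.println(Arrays.toString(softmax(new float[]{1f, 2f, 3f})));
    }
}
```

Every additional op, data type and array rank multiplies this kind of boilerplate, which is the productivity argument for having the primitives built in.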

Large arrays. Java arrays are limited to about 2.1 billion elements (2^31 - 1) per array, or per dimension, because they are indexed by 32-bit signed integers. There’s no way around that using on-heap data (other than ugly hacks like splitting up arrays). In ND4J, arrays can have up to 2^63 elements (around 9.2 x 10^18), so the only practical restriction on array size is the amount of memory available in your machine.
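The arithmetic behind that limit is easy to check. The sketch below uses only the JDK and hypothetical helper names (`elementCount`, `flatOffset`); it shows a 50,000 x 50,000 shape overflowing 32-bit indexing while fitting comfortably in 64-bit indexing. It does not allocate the array.

```java
final class LargeIndexSketch {
    // Total element count of a shape, computed in 64-bit arithmetic.
    static long elementCount(long[] shape) {
        long n = 1;
        for (long d : shape) n = Math.multiplyExact(n, d); // throws if even a long overflows
        return n;
    }

    // Row-major (C-order) flat offset of a multi-dimensional index.
    static long flatOffset(long[] shape, long[] index) {
        long offset = 0;
        for (int d = 0; d < shape.length; d++) {
            offset = offset * shape[d] + index[d];
        }
        return offset;
    }

    public static void main(String[] args) {
        long[] shape = {50_000, 50_000};
        int side = 50_000;

        System.out.println(side * side);           // -1794967296: int arithmetic wraps
        System.out.println(elementCount(shape));   // 2500000000
        System.out.println(elementCount(shape) > Integer.MAX_VALUE); // true
        System.out.println(flatOffset(shape, new long[]{49_999, 49_999})); // 2499999999
    }
}
```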

Types not supported by the JVM. Java doesn’t have unsigned integer types (uint8, uint16, uint32, uint64). Nor does it have bfloat16 support, or quantized types. All have uses in deep learning and scientific computing. Many users can get away without them, but having them is definitely a plus, and is required in a few cases. ND4J either supports or will soon support all of these types: uint and bfloat types are already available, and quantized types will be added soon. Adding new data types is within ND4J’s control. For example, if we need other types that Java doesn’t support, such as complex numbers or 128-bit integers, that’s just a C++ change.
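For context, this is roughly what working around the missing types looks like in plain Java. The sketch uses only JDK methods; the class and method names are hypothetical, and the bfloat16 conversion truncates the low 16 bits of a float32, which is the simplest scheme (real converters often round instead). It is not how ND4J implements these types.

```java
final class UnsupportedTypesSketch {
    // Interprets a signed Java byte as an unsigned 8-bit value (0..255).
    static int uint8(byte b) {
        return Byte.toUnsignedInt(b);
    }

    // Interprets a signed Java int as an unsigned 32-bit value.
    static long uint32(int i) {
        return Integer.toUnsignedLong(i);
    }

    // float32 -> bfloat16 bits by truncation: keeps the sign bit, the 8 exponent
    // bits and the top 7 mantissa bits, and drops the low 16 bits.
    static short toBfloat16Bits(float f) {
        return (short) (Float.floatToIntBits(f) >>> 16);
    }

    // bfloat16 bits -> float32: the dropped mantissa bits come back as zeros.
    static float fromBfloat16Bits(short bits) {
        return Float.intBitsToFloat((bits & 0xFFFF) << 16);
    }

    public static void main(String[] args) {
        System.out.println(uint8((byte) 0xFF)); // 255
        System.out.println(uint32(-1));         // 4294967295
        System.out.println(fromBfloat16Bits(toBfloat16Bits(1.0f)));     // 1.0
        System.out.println(fromBfloat16Bits(toBfloat16Bits(3.14159f))); // 3.140625
    }
}
```

The values still have to be stored in wider or signed Java types, and every operation needs an explicit conversion, which is why native support for these types is convenient.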


  • Versatile n-dimensional array object
  • Multiplatform functionality including GPUs
  • Linear algebra and signal processing functions
  • Supports GPUs via CUDA
  • Integrates with Hadoop and Spark
  • ND4S’s API mimics the semantics of NumPy

Chris Nicholson

Chris Nicholson is the CEO of Skymind. He previously led communications and recruiting at the Sequoia-backed robo-advisor, FutureAdvisor, which was acquired by BlackRock. In a prior life, Chris spent a decade reporting on tech and finance for The New York Times, Businessweek and Bloomberg, among others.
