IBM Storage Ceph at 20 – a storage platform of choice for AI


IBM Storage Ceph is an open source Software Defined Storage (SDS) platform for block, file and object storage. One of its key roles is providing persistent storage for containerized environments. Since 2017, its built-in BlueStore storage back end has managed SSDs and HDDs directly, so Ceph no longer has to run on top of a conventional local file system.

The invention and evolution of Ceph

Ceph turns 20 in 2024 and is currently in the fifth phase of its development:

  • Research – in 2004 it began life as a research project funded by a US Department of Energy grant, in cooperation with Los Alamos, Lawrence Livermore and Sandia National Labs. Sage Weil and others, including graduate students at the University of California, Santa Cruz School of Engineering, worked on it.
  • Incubation – by DreamHost, which Weil co-founded. Ceph's client was merged into the Linux kernel (version 2.6.34) in 2010.
  • Commercialization – by Inktank, which Weil founded in 2012 to provide professional support for Ceph.
  • Red Hat – acquired Inktank in 2014, allowing Ceph to become production software for enterprises, complete with hotline support and continuous development. In 2015 Red Hat set up the Ceph Community Advisory Board (CCAB) to help the community steer Ceph's course; alongside Red Hat, its members included Intel, Canonical, CERN, Cisco, Fujitsu, SanDisk (now part of Western Digital) and SUSE.
  • IBM – agreed to acquire Red Hat in 2018 (completing the deal in 2019) and integrated Ceph into its storage division in 2023.

In 2018 the Linux Foundation launched the Ceph Foundation as a replacement for the CCAB; its members included China Mobile, DigitalOcean, OVH, ZTE and others. SUSE withdrew its support from Ceph in 2021, shifting to Longhorn instead.

IBM’s continued development of Ceph

Since its acquisition of Red Hat, IBM has added support for Ceph to other products, such as its Diamondback tape storage system, Storage Discover and Storage Protect Plus. Ceph is also an important part of IBM's Spectrum Fusion software and Spectrum Fusion HCI systems. In addition, it is the storage software for one of IBM's three types of pre-configured Storage Ready Nodes (see my Figure above): servers with internal drives, based on industry-standard x86 processors and delivered by IBM's partners.

An important new feature of IBM Storage Ceph 7.0, released in December 2023, is its NVMe over TCP (NVMe/TCP) storage gateway, added as a Technical Preview. The gateway exposes Ceph block storage to clients over standard TCP/IP networks using the NVMe protocol, combining Ceph's durability with the parallelism and high performance of NVMe. This enables high-performance block storage without the need for (and high costs of) dedicated Storage Area Networks (SANs).

Ceph can be deployed in large single-site and/or multi-site configurations, supporting hundreds of petabytes of data and tens of billions of objects for either:

  • Traditional workloads, such as those using MySQL and/or MongoDB on OpenShift/OpenStack, or
  • Newer generative AI workloads, such as IBM watsonx.data’s data ‘lakehouse’, which includes 768TB of Ceph data.

The largest publicized enterprise Ceph deployments to date include those at the research organization CERN and at the cloud providers OVH and DigitalOcean.

Ceph as a storage platform for AI

Ceph (or something like it) is necessary for larger-scale AI platforms, because the cost of keeping all data only in the nodes' memory (common with smaller models) becomes prohibitive. With larger models, GPU memory fills up quickly and its contents (for example, checkpoint images) need to be written rapidly to storage.

OpenAI's ChatGPT is currently creating huge revenue growth for itself, Nvidia (for GPUs and InfiniBand networking) and Microsoft (as Azure is its exclusive cloud platform). If IBM can make its own versions of Ceph the storage of choice for AI – and not just for its own watsonx platform – its revenues will grow significantly. To do so, because Ceph is open source (its client is even a standard component of the Linux kernel) and so freely available to competitors, IBM will need to keep offering solutions that integrate with current and evolving Large Language Models (LLMs), as well as data security, backup and archive services.