IBM dives into data lakes with the Elastic Storage System 5000

I very much enjoyed a briefing on IBM’s latest storage announcements ahead of today’s launch. As usual, we listened to and asked questions of Eric Herzog, IBM’s CMO and VP of Worldwide Storage Channels. I also had a briefing from Sam Werner, VP of Storage Offering Management.

IBM’s integrated infrastructure solution for ‘the journey to AI’ has three components (see my version of IBM’s figure above). Today’s announcement concerns the first two of these areas: Collect and Organize.

New hardware for file and object data lakes

For me the most important part of today’s announcement is the introduction of the Elastic Storage System 5000, an appliance based on IBM’s own POWER9 CPUs. It has a maximum capacity of 13.5PB in the largest (SC) configuration with 8 enclosures. This new offering follows on the heels of the ESS 3000, which was launched in October last year. The new machine differs in being based on spinning disk rather than solid-state storage, and in being designed to provide data lakes for data collection (Collect), whereas the all-flash ESS 3000 targets edge computing and high-performance analysis in a small 2U form factor (Analyze).

While the ESS 5000 allows IBM customers to build fast and extensive file-based data lakes, the new (3.15) version of the Cloud Object Storage software does something similar for object storage. The performance improvements include up to 300% faster ‘reads’, 150% faster ‘writes’, up to 30% lower latency and up to 55GB/second of throughput in a scalable 12-node configuration. IBM has also now qualified 18TB Shingled Magnetic Recording (SMR) hard disks, increasing the density of IBM Cloud Object Storage systems by 12% and making IBM the first object storage supplier to offer host-managed SMR. The result is that IBM has reduced the cost of these systems and vastly shortened the time to read from and write to them. Currently, IBM offers three ways for an end user to deploy Cloud Object Storage: embedded in the IBM Cloud Object Storage array, as on-premises software, or as a cloud configuration. When deployed as on-premises software, it can run on qualified hardware systems that sit alongside other suppliers’ arrays and AWS cloud-based approaches for data collection, through Spectrum Scale of course.
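For readers who want a feel for what consuming such an object store looks like from an application, here is a minimal sketch of writing and reading an object over IBM Cloud Object Storage’s S3-compatible interface using the standard boto3 client. The endpoint URL, bucket name and HMAC credentials below are placeholders of my own, not values from the announcement.

```python
# Minimal sketch: put and get an object via an S3-compatible endpoint such as
# IBM Cloud Object Storage. Endpoint, bucket and credentials are placeholders.
import boto3

cos = boto3.client(
    "s3",
    endpoint_url="https://s3.example-region.example-object-store.com",  # placeholder
    aws_access_key_id="YOUR_HMAC_ACCESS_KEY",      # placeholder HMAC key
    aws_secret_access_key="YOUR_HMAC_SECRET_KEY",  # placeholder HMAC secret
)

# Write a small JSON document into the data lake bucket ...
cos.put_object(
    Bucket="my-data-lake",
    Key="sensors/2020-07-14/readings.json",
    Body=b'{"temperature": 21.5}',
)

# ... and read it back to confirm the round trip.
obj = cos.get_object(Bucket="my-data-lake", Key="sensors/2020-07-14/readings.json")
print(obj["Body"].read())
```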

Software adjustments will spread the utility of data lakes

Today IBM also made a number of important announcements concerning its Spectrum range of storage software. In particular:

  • Spectrum Discover has an updated policy engine, adding RESTful APIs that enable the use of external (and usually cheaper) data movers, of which Moonwalk is the first to be certified (a hypothetical sketch of driving such an API appears after this list). Spectrum Discover can now also be deployed on Red Hat OpenShift.
  • Spectrum Scale is adding the ‘Data Acceleration for AI’ feature, which enhances data movement from object storage, improving performance and helping clients eliminate certain data silos.
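To make the data-mover integration more concrete, the sketch below shows what driving a metadata policy engine such as Spectrum Discover over REST could look like. It is purely illustrative: the host name, endpoint path, authentication scheme and JSON fields are my own assumptions, not the documented product API.

```python
# Hypothetical sketch of creating a tagging policy over a RESTful policy-engine
# API, in the spirit of Spectrum Discover's new interfaces. The base URL,
# endpoint path, token and JSON schema are illustrative assumptions only.
import requests

BASE_URL = "https://discover.example.com/api"      # placeholder host
HEADERS = {
    "Authorization": "Bearer REPLACE_WITH_TOKEN",  # placeholder credential
    "Content-Type": "application/json",
}

# Tag cold files under /scratch so an external data mover can archive them.
policy = {
    "name": "archive-cold-scratch-data",
    "filter": "atime < NOW() - 180 DAYS AND path LIKE '/scratch/%'",
    "action": "TAG",
    "tags": {"temperature": "cold", "target": "object-archive"},
}

resp = requests.post(f"{BASE_URL}/policies", json=policy, headers=HEADERS, timeout=30)
resp.raise_for_status()
print("Created policy:", resp.json())
```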

It also announced that Spectrum Protect Plus will become available on the AWS Marketplace at the end of July for customers who want to use it to protect their EC2 instances. It joins Spectrum Scale, which is already available on the AWS Marketplace.

IBM is sticking to its practical ambitions

Today’s announcement is about big data and AI, which applies mainly to general-purpose computing. Although IBM offers an integrated solution, with Cloud Object Storage and Spectrum Scale integrated into System z running both LinuxONE and z/OS, I expect it will add more mainframe storage features in future.

The current pandemic is highlighting the use of cloud computing and the conversion of many mission-critical applications from merely ‘computerized’ to ‘digitally transformed’, with increasing use of hybrid multi-cloud (where on-premises data centers co-work with services such as AWS, Microsoft Azure, Google Cloud Services and the IBM Cloud) and container-based applications, often including AI and ML techniques. However, we’re still at an early stage, with as little as 20% of relevant applications having been ‘containerized’ to date; the unstructured data (typically held in new formats) is often copied and held in multiple places, sometimes insecurely and made expensive by ingress and egress costs. In addition, the developers of new-style applications typically work for ‘born on the web’ companies and can be so focused on innovation that they sometimes overlook (or are perhaps even unaware of) the strictures of enterprise computing at many major organizations.

IBM is a highly innovative yet commercial supplier with demanding large-enterprise customers, many of them in the most highly regulated industry sectors such as Banking, Insurance, Government and Manufacturing. Its role is not just to invent new storage and data offerings (or master those of others), but also to provide cost-effective solutions to these clients as they strain to deploy these advanced, sophisticated and expensive new applications. It innovates and keeps up with other major players, while doing its best to simplify its messaging to customers and prospects. It has perhaps the clearest vision of the journey to AI we’re on and much of the secret sauce to get us there.