Jun 2, 2026 | 12 minute read

Performance Engineering at Elastic Path

written by Kristian van der Hoek

Updated May 2026: This post has been updated to reflect our latest performance results and infrastructure investments since the original 2022 publication. Our performance engineering practice and the principles described here remain unchanged. We've added new sections covering what we've built, what we've optimized, and what we've proven in the years since.

Elastic Path maintains a serious commitment to product performance and scalability. We support eCommerce applications for major enterprises like Intuit, T-Mobile, and Swisscom, processing hundreds of thousands of orders hourly during peak traffic. This isn't an accident. It is the result of a dedicated performance engineering practice that has been running continuously for nearly two decades, applied first to our legacy Commerce platform and then built into the Elastic Path Composable Commerce platform from the ground up.

This post describes how we think about performance engineering, the principles that guide our testing, and what our platform can actually do when pushed.

Our History

Performance engineering has been central to Elastic Path's operations for nearly two decades. We built our Commerce platform to handle hundreds of orders per second and have maintained close partnerships with enterprise clients throughout their scaling journeys, learning alongside them what it actually means to support commerce at scale.

A notable milestone came in 2010 when Elastic Path hosted the Olympic Winter Games merchandising store. Our infrastructure proved its resilience when Oprah Winfrey gifted each member of her studio audience a pair of the red Olympic mittens during a live episode, causing an immediate, massive surge in traffic that our systems absorbed without incident. You cannot replicate that kind of test in a lab, and passing it built a deep confidence in what a well-engineered commerce platform could withstand.

Over the past several years, we extended that performance expertise to Elastic Path Composable Commerce, our multi-tenant SaaS offering. Bringing the same rigor to a multi-tenant architecture introduced new challenges: shared infrastructure, noisy neighbours, and more complex database topologies. Through extensive testing and optimization, we scaled the platform's throughput by more than an order of magnitude from where it started.

Our Practice

Our performance engineering practice is not a discipline that gets activated before a major release and stands down afterward. It is a standing, full-time commitment, and we hold that standard because the cost of a slow or unavailable platform is measured directly in our customers' lost conversions and lost trust. That demands people whose sole focus is making sure the platform is never slow or unavailable.

The work demands a rare combination of disciplines: deep systems and infrastructure knowledge, load testing expertise, database performance intuition, familiarity with distributed systems failure modes, and the investigative persistence to chase a slow endpoint across a dozen microservices and three layers of infrastructure to find its root cause.

Our team's combined experience spans multiple decades, and that depth shows in the texture of the practice: in how we construct workloads, how we interpret anomalies, and how we recognize when a slow test result reflects a real regression versus environmental noise.

Core Principles

Test Early and Often

Performance testing at Elastic Path is integrated from the very beginning of the engineering process. Developers subject new code to hundreds of end-to-end tests that measure response times at the individual API endpoint level, not just aggregate throughput. Load and stress tests run in staging environments that mirror production clusters in both infrastructure configuration and dataset complexity, using large, realistic catalogs rather than toy datasets. Results are published to the entire company automatically, so engineering teams are never shielded from the performance characteristics of what they have shipped.

Long-running overnight tests, some running continuously for twelve hours or more, complement the shorter validation runs that execute on every code change. These extended tests uncover behaviors that only emerge over time and that a point-in-time test cannot show: memory drift, connection pool exhaustion, and slow degradation under sustained load.

A dedicated performance lab handles work that cannot fit into standard staging environments: massive-scale throughput tests, exploratory stress tests on new infrastructure configurations, and the kind of deep-dive investigations that require sustained load over hours or days. It is where we ran the throughput tests described later in this post.

Test Realistic Workloads

Realistic workload design is one of the most important investments a performance engineering team can make, and one of the easiest to shortcut. A test that does not model real user behavior will tell you almost nothing useful about how a platform behaves under real traffic.

We maintain over a dozen workload scenarios, each constructed from a combination of platform expertise and analysis of actual production traffic logs. These scripts are tunable across a range of parameters: conversion rates, product complexity, promotion usage, guest versus registered checkout flows, cache hit ratios, and the mix of concurrent merchandiser and publishing activity running alongside shopper traffic.

A typical low-conversion B2C guest checkout workload consists of 85% browsing, 10% cart additions, and 5% checkout completion, layered across multiple simultaneous stores with concurrent merchandiser operations that simulate the catalog editing, price updates, and publish activity that runs in production alongside real shopper traffic. That layering matters. A platform that performs well under isolated shopper load but degrades when a catalog publish is running in parallel is not behaving the way it will for a real customer.
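The shopper side of that mix can be sketched as a weighted random choice. This is a minimal illustration, not our load-testing tooling: the action names, the `sample_actions` helper, and the seed are assumptions made for the example, and the concurrent merchandiser and publishing activity is left out.

```python
import random
from collections import Counter

# Action mix from the low-conversion B2C guest checkout workload described above.
# The action names themselves are illustrative.
WORKLOAD_MIX = {"browse": 0.85, "add_to_cart": 0.10, "checkout": 0.05}


def sample_actions(n, seed=0):
    """Draw n shopper actions according to WORKLOAD_MIX (seeded for repeatability)."""
    rng = random.Random(seed)
    actions = list(WORKLOAD_MIX)
    weights = list(WORKLOAD_MIX.values())
    return rng.choices(actions, weights=weights, k=n)


def observed_mix(actions):
    """Return the fraction of each action actually present in the sample."""
    counts = Counter(actions)
    total = len(actions)
    return {action: counts[action] / total for action in WORKLOAD_MIX}


if __name__ == "__main__":
    # Over a large sample the observed fractions approach 85% / 10% / 5%.
    print(observed_mix(sample_actions(100_000)))
```

Keeping the ratios in one table makes the workload tunable: a high-conversion scenario only needs different weights, not a different script.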

As our B2B customer base has grown, we have extended our workload library to cover the scenarios that matter most for enterprise buyers: large carts with dozens to hundreds of line items, account-based checkout flows, and the high-volume order ingestion patterns typical of marketplace integrations and OMS-connected deployments.

Test Realistic Datasets

Our test catalogs are built to match the complexity of real customer production deployments, not to make the numbers look good. A performance test run against a small, simple catalog tells you almost nothing about how the platform will behave for an enterprise customer with a million-product catalog and a decade of order history.

Our standard load test environment includes datasets with up to:

  • 3.1 million products
  • 101,000 hierarchy containers
  • 100,000 promotions
  • 40 million promotion codes
  • 250 custom fields per product
  • 30,000 accounts
  • 250,000 registered users
  • 80 million orders

These dimensions were chosen based on the actual production configurations of our largest customers, and we revisit them regularly as customer usage patterns evolve.

Test for Reliability

We treat reliability as revenue-critical. Our engineering process follows Agile Development, Continuous Delivery, and DevOps principles. Code must pass thousands of automated quality checks, unit tests, end-to-end tests, security tests, and performance evaluations before it is eligible for production deployment. No exceptions.

Extended reliability tests, running twelve or more hours under sustained load, regularly demonstrate error rates below 0.01%. This is not a target we set and hope for; it is a bar we verify repeatedly.

Pre-production environments process over 15 million API calls daily, providing continuous validation of the platform under conditions that closely approximate production behavior. That volume of pre-production traffic means we surface problems early, in environments where they can be fixed before they affect customers.

Our production infrastructure carries a 99.99% uptime SLA. We have exceeded it, maintaining 100% uptime over the past two years.
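To put a 99.99% figure in perspective, the downtime it permits is simple arithmetic. The sketch below assumes the SLA is measured over a calendar period (30 or 365 days here); the actual measurement window is defined by the contract, not by this example.

```python
def allowed_downtime_minutes(sla_percent, period_days):
    """Minutes of downtime permitted by an uptime SLA over a period of whole days."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (1 - sla_percent / 100)


if __name__ == "__main__":
    # 99.99% over a 365-day year allows roughly 52.6 minutes of downtime.
    print(round(allowed_downtime_minutes(99.99, 365), 1))
    # Over a 30-day month, roughly 4.3 minutes.
    print(round(allowed_downtime_minutes(99.99, 30), 1))
```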

Constant Improvement

Improvement is never finished at Elastic Path.

When we first published this post in 2022, we had recently completed a multi-year journey from 130,000 to 300,000+ orders per hour, achieved while simultaneously scaling catalog complexity from 5,000-product test datasets to catalogs with over one million products. Autoscaling capabilities had been tuned to bring a full cluster to 10x capacity in under 5 minutes. We described it as a tenfold improvement in scalability over three years.

Since then, the work has continued on every dimension.

Infrastructure and architecture. We identified and resolved Kubernetes Horizontal Pod Autoscaler configuration issues that had been creating an artificial ceiling on our application tier's ability to scale under heavy load. The root cause was subtle: with the average CPU utilization threshold set too high, pods consumed more CPU than they had requested, leading to resource contention that prevented Kubernetes from triggering additional nodes. The cluster appeared to be scaling when it was actually saturating. Resolving this unlocked the full capacity of our database infrastructure and was a necessary prerequisite for the throughput results described below.
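The effect of a target that is set too high can be seen in the scaling rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current metric ÷ target metric), where CPU utilization is a percentage of each pod's CPU request and ratios within a tolerance (0.1 by default) trigger no action. The sketch below is a simplified model of that rule only. The replica counts and utilization figures are hypothetical, and it does not reproduce node-level behavior or our actual configuration.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     tolerance=0.1):
    """Simplified Horizontal Pod Autoscaler rule.

    Utilization values are percentages of the pod's CPU *request*, so a pod
    bursting above its request reports more than 100.
    """
    ratio = current_utilization / target_utilization
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas  # within tolerance: no scaling action
    return math.ceil(current_replicas * ratio)


if __name__ == "__main__":
    # Hypothetical: 10 pods running hot at 95% of their CPU request.
    print(desired_replicas(10, 95, 90))  # high target: stays at 10
    print(desired_replicas(10, 95, 60))  # lower target: scales to 16
```

With a 90% target, pods running at 95% of their request look almost healthy to the autoscaler. With a 60% target, the same load asks for six more pods.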

We validated and deployed MongoDB 8, which delivers measurably improved write performance under load, particularly relevant for high-throughput order scenarios where write operations are on the critical path. We also began migrating our compute workloads to ARM-based instances. Testing showed a 34% throughput improvement for our API gateway and 40% improvement for our Orders service compared to equivalent AMD instances, driven largely by the performance characteristics of newer ARM silicon. Full migration across all services is still in progress.

B2B and large cart performance. As enterprise and B2B workloads have become more central to our roadmap, we invested directly in the performance of larger, more complex transactions. A bulk files endpoint optimization delivered a 40% improvement in response time for 50-item carts, with gains that increase proportionally as cart size grows. A separate change replaced an O(n²) stock validation loop in the Orders service with an O(n) map-based approach, eliminating a CPU bottleneck that became significant under inventory-enabled large cart loads. Both changes are in production.
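The shape of the stock validation change can be illustrated with a small sketch. This is not the Orders service code: the `CartItem` and `StockLevel` types and both function names are hypothetical, and the example assumes each SKU appears at most once in the stock list.

```python
from collections import namedtuple

# Hypothetical types for illustration only.
CartItem = namedtuple("CartItem", "sku quantity")
StockLevel = namedtuple("StockLevel", "sku available")


def validate_stock_quadratic(items, stock_levels):
    """Scan the whole stock list for every cart item: O(n^2) for n items."""
    short = []
    for item in items:
        for level in stock_levels:
            if level.sku == item.sku:
                if level.available < item.quantity:
                    short.append(item.sku)
                break
        else:
            short.append(item.sku)  # no stock record found for this SKU
    return short


def validate_stock_linear(items, stock_levels):
    """Build a SKU -> available map once, then do one lookup per item: O(n)."""
    available = {level.sku: level.available for level in stock_levels}
    return [
        item.sku
        for item in items
        if item.sku not in available or available[item.sku] < item.quantity
    ]


if __name__ == "__main__":
    cart = [CartItem("A", 2), CartItem("B", 5), CartItem("C", 1)]
    stock = [StockLevel("B", 3), StockLevel("A", 10)]
    print(validate_stock_quadratic(cart, stock))
    print(validate_stock_linear(cart, stock))
```

Both functions return the same list of SKUs that cannot be fulfilled. For a 500-line cart the nested version performs up to 250,000 comparisons, while the map-based version performs 500 lookups after one pass to build the map.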

Observability. Our regression alerting algorithm was rebuilt from the ground up using robust Z-scoring, replacing a simpler percentage-deviation approach. The new algorithm calculates a Z-score from a rolling window of historical results and adjusts dynamically to the natural variability of each metric. The practical result is that we can now alert on individual API call response times across the full test suite, not just on aggregate throughput, giving us a much earlier and more precise signal when something regresses. We also extended distributed tracing through our full payment infrastructure stack, providing end-to-end visibility from the API gateway into third-party payment provider calls for the first time.
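One common formulation of a robust Z-score replaces the mean and standard deviation with the median and the median absolute deviation (MAD), so a single outlier in the history does not distort the baseline. The sketch below uses that formulation as an assumption; the window size, the 3.5 threshold, and the function names are illustrative choices, not the parameters of our alerting system.

```python
import math
import statistics

# Scales the MAD so it is comparable to a standard deviation for normal data.
MAD_SCALE = 1.4826


def robust_z_score(history, value):
    """Z-score of value against history using the median and MAD."""
    med = statistics.median(history)
    mad = statistics.median([abs(x - med) for x in history])
    if mad == 0:
        # Degenerate window with no variability: any deviation is infinite.
        return 0.0 if value == med else math.copysign(math.inf, value - med)
    return (value - med) / (MAD_SCALE * mad)


def is_regression(history, value, window=20, threshold=3.5):
    """Flag a response time that is unusually slow relative to a rolling window."""
    recent = history[-window:]
    return robust_z_score(recent, value) > threshold


if __name__ == "__main__":
    # Hypothetical response times in ms, including one noisy run (250).
    history = [100, 102, 98, 101, 99, 103, 97, 100, 250, 101]
    print(is_regression(history, 104))  # within normal variation
    print(is_regression(history, 120))  # flagged as a regression
```

Because the score is expressed in units of each metric's own variability, a naturally noisy endpoint and a very stable one can share a single threshold. A mean and standard deviation computed over the same history would be inflated by the 250 ms run and would miss the 120 ms result.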

Large catalog publishing. We also tested how the platform handles publishing at enterprise catalog scales, motivated by prospects with catalogs in the millions of SKUs. Testing across catalog sizes from 310,000 to 3.1 million products showed that publish times scale linearly with catalog size across that entire range. On an M50 instance with 8,000 IOPS, a catalog of roughly one million products publishes in under 30 minutes even under full production load, with concurrent shopper traffic, merchandiser activity, and publishing housekeeping all running simultaneously. Infrastructure sizing directly determines publish throughput, and the relationship is predictable and proportional.

Maximum throughput. In late 2025, we ran a series of tests to answer a question we hadn't fully answered before: if a customer needs to push our platform as hard as it will go, what is the actual ceiling? We designed two scenarios, each intended to answer a different version of that question.

Test 1 - Realistic workload, scaled infrastructure. We ran an 80% conversion rate workload including full cart flows, active promotions, and a complex catalog: the kind of load profile that reflects a real high-volume commerce deployment, not a synthetic benchmark stripped of everything that makes commerce hard. With MongoDB scaled to M60 instances at 16,000 IOPS and maximum replica counts increased to support the load, we achieved 255 orders per second per production deployment, approximately 918,000 orders per hour.

The infrastructure configuration required to reach this number is not exceptional. Scaling a database instance and provisioning additional replicas is the kind of capacity planning any enterprise operation undertakes ahead of a major peak season. What this result demonstrates is that when a customer needs headroom, the platform can provide it, and the relationship between infrastructure investment and throughput is direct and predictable.

Test 2 - Optimized workload, maximum hardware. We designed a second scenario for a different set of use cases: headless B2B commerce platforms, marketplace order ingestion pipelines, and OMS integrations where order volume is high but the cart and promotion complexity of a typical retail storefront is absent. With a streamlined 10,000-product catalog, no cart flows, no promotions, 100% conversion rate, and Kubernetes pre-scaled ahead of the test, we achieved 1,137 orders per second per production deployment, approximately 4 million orders per hour. This is not a general claim about what the platform does in production. It is a demonstration of what is possible for a specific architectural pattern that is increasingly common in B2B and marketplace contexts.

Both results point to the same architectural conclusion: the platform scales linearly with resources. There is no hidden ceiling, no architectural constraint that emerges at scale and caps what additional infrastructure investment can unlock.
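The per-hour figures quoted for the two tests follow directly from the measured per-second rates, assuming the rate is sustained for a full hour. A short check (the helper name is illustrative):

```python
SECONDS_PER_HOUR = 3600


def orders_per_hour(orders_per_second):
    """Convert a sustained per-second order rate to an hourly figure."""
    return orders_per_second * SECONDS_PER_HOUR


if __name__ == "__main__":
    print(orders_per_hour(255))   # Test 1: 918000
    print(orders_per_hour(1137))  # Test 2: 4093200, quoted as roughly 4 million
```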

Production Performance Monitoring

Testing environments tell you a great deal, but they cannot fully replicate the complexity of production. We invest heavily in production observability using best-in-class monitoring tooling that tracks SLA adherence, individual API response times, error rates, and system metrics across the full infrastructure stack.

Distributed tracing spans the complete request lifecycle, from the moment a request enters our API gateway, through the microservices that process it, and into third-party payment providers. This end-to-end visibility means that when latency appears somewhere in the system, we can identify exactly where it is and what it is waiting for, rather than reasoning backward from aggregate metrics.

AI-assisted anomaly detection monitors thousands of data points continuously, identifying patterns that do not cross a fixed alert threshold but deviate meaningfully from historical behavior. This catches a class of slowly developing issues, such as gradual drift in response times or subtle increases in database query latency, that rule-based alerting tends to miss until they become obvious. Alerts trigger responses from 24-hour on-call operations staff, though systems are designed to self-correct through automated health checks and service cycling before human intervention is required.

Results

Our 2026 benchmarks:

  • 30 ms average API response times (100 ms at the 95th percentile)
  • 99.99% SLA compliance, with 100% uptime over the past two years
  • Support for 3M+ SKU catalogs
  • 255+ orders per second per production deployment under realistic workloads (~918,000 orders per hour)
  • 1,137+ orders per second per production deployment under optimized workloads (~4M orders per hour)
  • 15M+ API calls processed per hour per deployment
  • Autoscale to 10x capacity in under 5 minutes
