Incident Report – Degradation in checkout performance
Date of Incident: Monday, 2nd June 2025
Time: 12:30 PM to 1:15 PM IST
Status: Resolved
Impact: Slowdown in cart and checkout services
Summary
On Monday, June 2nd, 2025, a service degradation between 12:30 PM and 1:15 PM IST affected the performance of the cart and checkout services. Early signs of slowness were detected at 12:10 PM IST, and a status update was issued immediately.
The incident was caused by a sharp slowdown in a database query that retrieves cart details. The slow query drove up database load, and because cart and checkout share the same database infrastructure, both were affected. Fallback mechanisms kept core functionality, including checkout, available during the disruption.
What Happened
- A critical database query that fetches cart details degraded significantly. On June 1st, it consistently executed in about 9 milliseconds; by June 2nd, its execution time had risen to about 45 milliseconds, a 5x increase, even though there were no code deployments, configuration changes, infrastructure updates, or traffic spikes during that period.
- The slower query caused high load on the database, which degraded both the cart and checkout services.
- To stabilize the system, we temporarily disabled the cart feature while maintaining access to the checkout.
- Customers were still able to complete purchases, although cart functionality was briefly paused.
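A jump like the one described above (about 9 ms on June 1st versus about 45 ms on June 2nd) can be expressed as a simple ratio against a baseline. The sketch below is illustrative only: the function names and the 3x alerting threshold are assumptions, not a description of our actual monitoring.

```python
from statistics import median

# Assumed alerting threshold: flag a query whose median latency grows 3x or more.
REGRESSION_RATIO = 3.0


def latency_ratio(baseline_ms: list[float], current_ms: list[float]) -> float:
    """Return the current median latency divided by the baseline median latency."""
    return median(current_ms) / median(baseline_ms)


def is_regression(
    baseline_ms: list[float],
    current_ms: list[float],
    threshold: float = REGRESSION_RATIO,
) -> bool:
    """True when the median latency has grown by at least `threshold` times."""
    return latency_ratio(baseline_ms, current_ms) >= threshold


# Approximate figures from this report (a real check would use many samples per day).
june_1_ms = [9.0]
june_2_ms = [45.0]

print(f"{latency_ratio(june_1_ms, june_2_ms):.1f}x")  # 5.0x
print(is_regression(june_1_ms, june_2_ms))  # True
```

Comparing medians rather than single samples keeps one slow outlier from triggering an alert while still catching a sustained shift like this one.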
What We Did to Fix It
- Temporarily disabled the cart feature to reduce database load immediately and protect the checkout process, so that transactions could still complete.
- Investigated and identified the slow database query, then optimized it by adding new database indexes.
- Brought the query's execution time down from 45 milliseconds to 0.9 milliseconds: about 50x faster than during the incident and 10x faster than the June 1st baseline of 9 milliseconds.
- Re-enabled the cart feature for all merchants after verifying database stability and performance, restoring full functionality.
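The effect of the index fix described above can be illustrated with a query plan before and after an index is added. This is a minimal sketch under stated assumptions: the report does not name the database engine or schema, so it uses SQLite (from Python's standard library) and a hypothetical `cart_items` table filtered by a hypothetical `cart_id` column. It shows the planner switching from a full-table scan to an index lookup, not real timings.

```python
import sqlite3

# Hypothetical schema: the real table and column names are not part of this report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cart_items (id INTEGER PRIMARY KEY, cart_id INTEGER, sku TEXT, qty INTEGER)"
)
conn.executemany(
    "INSERT INTO cart_items (cart_id, sku, qty) VALUES (?, ?, ?)",
    [(i % 1000, f"sku-{i}", 1) for i in range(10_000)],
)

QUERY = "SELECT sku, qty FROM cart_items WHERE cart_id = ?"


def query_plan(sql: str, params: tuple) -> str:
    """Return SQLite's EXPLAIN QUERY PLAN detail strings joined into one line."""
    rows = conn.execute(f"EXPLAIN QUERY PLAN {sql}", params).fetchall()
    return "; ".join(row[-1] for row in rows)  # last column holds the detail text


before = query_plan(QUERY, (42,))

# Index the filter column so each lookup no longer scans the whole table.
conn.execute("CREATE INDEX idx_cart_items_cart_id ON cart_items (cart_id)")
after = query_plan(QUERY, (42,))

print("before:", before)  # a full-table SCAN of cart_items
print("after: ", after)   # a SEARCH that uses idx_cart_items_cart_id
```

On a production database the same check would use that engine's own plan output (for example, its `EXPLAIN` variant), and the right index depends on the query's actual filter and sort columns.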
Preventive Measures
- Conduct a full audit of all high-impact database queries to identify and optimize potential bottlenecks.
- Set up a recurring review of query design and indexing practices to catch similar issues before they reach production.
- Explore service-level isolation between cart and checkout to limit the impact of shared infrastructure in the future.
- Enhance fallback strategies to maintain user experience during localized slowdowns.
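One common way to approach the isolation and fallback measures listed above is a bulkhead: cap how many concurrent database calls a feature may hold, and degrade that feature instead of queueing when the cap is reached. The sketch below is hypothetical; the `Bulkhead` class, the `load_cart` function, and the limit of 2 are illustrative assumptions, not our implementation.

```python
import threading
from contextlib import contextmanager


class Bulkhead:
    """Hypothetical helper capping how many concurrent DB calls one feature may hold."""

    def __init__(self, name: str, max_concurrent: int):
        self.name = name
        self._slots = threading.BoundedSemaphore(max_concurrent)

    @contextmanager
    def slot(self):
        # Fail fast instead of queueing: yields False when no slot is free.
        acquired = self._slots.acquire(blocking=False)
        try:
            yield acquired
        finally:
            if acquired:
                self._slots.release()


# Assumed limit: cart gets a small share so it cannot exhaust capacity checkout needs.
cart_bulkhead = Bulkhead("cart", max_concurrent=2)


def load_cart(cart_id: int) -> dict:
    with cart_bulkhead.slot() as ok:
        if not ok:
            # Fallback: return a degraded response rather than adding database load.
            return {"cart_id": cart_id, "items": None, "degraded": True}
        # Placeholder for the real cart query.
        return {"cart_id": cart_id, "items": [], "degraded": False}


# Simulate two in-flight cart calls holding both slots.
with cart_bulkhead.slot() as a, cart_bulkhead.slot() as b:
    print(load_cart(7)["degraded"])  # True
print(load_cart(7)["degraded"])  # False
```

Separate connection pools per service would give similar isolation at the database-driver level; the semaphore is simply the smallest self-contained illustration of the idea.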
If you need further details or have any questions, please reach out to us at support@shopflo.com.