Skip to content

Pause indexing completely in serverless when throttling #127173

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 34 commits into from
May 8, 2025

Conversation

ankikuma
Copy link
Contributor

@ankikuma ankikuma commented Apr 22, 2025

This PR seeks to address stability issues seen in some CSPs. See ES-11516 for details. Serverless PR#3801 associated with this change.

When throttling is enabled for indexing, we limit indexing to 1 thread per shard. However, this might not be sufficient throttling in serverless where we might have a large number of shards. With this change we pause indexing completely when throttling is enabled.

@elasticsearchmachine elasticsearchmachine added needs:triage Requires assignment of a team area label v9.1.0 serverless-linked Added by automation, don't add manually labels Apr 22, 2025
@ankikuma ankikuma added the :Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. label Apr 22, 2025
@elasticsearchmachine elasticsearchmachine added Team:Distributed Indexing Meta label for Distributed Indexing team and removed needs:triage Requires assignment of a team area label labels Apr 22, 2025
@elasticsearchmachine
Copy link
Collaborator

Pinging @elastic/es-distributed-indexing (Team:Distributed Indexing)

@ankikuma ankikuma requested a review from a team as a code owner May 2, 2025 03:01
@ankikuma ankikuma requested review from tlrx and henningandersen May 2, 2025 03:10
Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a suggestion.

private volatile ReleasableLock lock = NOOP_LOCK;

public Releasable acquireThrottle() {
return lock.acquire();
if (lock == pauseLockReference) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems counter to how the current structure is, special casing this pause lock. Can we do something like following instead? We could also go all in on the Semaphore and remove the Lock interface from the equation.

Subject: [PATCH] phase2 variable name
---
Index: server/src/main/java/org/elasticsearch/index/engine/Engine.java
IDEA additional info:
Subsystem: com.intellij.openapi.diff.impl.patch.CharsetEP
<+>UTF-8
===================================================================
diff --git a/server/src/main/java/org/elasticsearch/index/engine/Engine.java b/server/src/main/java/org/elasticsearch/index/engine/Engine.java
--- a/server/src/main/java/org/elasticsearch/index/engine/Engine.java	(revision 10a60a55f949e04fb0ea513a877e8bd0ea4f8baf)
+++ b/server/src/main/java/org/elasticsearch/index/engine/Engine.java	(date 1746355091402)
@@ -110,6 +110,7 @@
 import java.util.Set;
 import java.util.concurrent.CountDownLatch;
 import java.util.concurrent.ExecutionException;
+import java.util.concurrent.Semaphore;
 import java.util.concurrent.TimeUnit;
 import java.util.concurrent.atomic.AtomicBoolean;
 import java.util.concurrent.locks.Condition;
@@ -454,67 +455,31 @@
         private final CounterMetric throttleTimeMillisMetric = new CounterMetric();
         private volatile long startOfThrottleNS;
         private static final ReleasableLock NOOP_LOCK = new ReleasableLock(new NoOpLock());
-        private final ReleasableLock lockReference = new ReleasableLock(new ReentrantLock());
-        private final Lock pauseIndexingLock = new ReentrantLock();
-        private final Condition pauseCondition = pauseIndexingLock.newCondition();
-        private final ReleasableLock pauseLockReference = new ReleasableLock(pauseIndexingLock);
-        private volatile AtomicBoolean pauseIndexing = new AtomicBoolean();
+        private final PauseLock throttlingLock;
+        private final ReleasableLock lockReference;
         private volatile ReleasableLock lock = NOOP_LOCK;
 
+        public IndexThrottle(boolean pause) {
+            throttlingLock = new PauseLock(pause ? 0 : 1);
+            lockReference = new ReleasableLock(throttlingLock);
+        }
+
         public Releasable acquireThrottle() {
-            if (lock == pauseLockReference) {
-                pauseLockReference.acquire();
-                try {
-                    while (pauseIndexing.getAcquire()) {
-                        // System.out.println("Waiting on pause indexing lock");
-                        logger.trace("Waiting on pause indexing lock");
-                        pauseCondition.await();
-                    }
-                } catch (InterruptedException e) {
-                    Thread.currentThread().interrupt();
-                    throw new RuntimeException(e);
-                } finally {
-                    // System.out.println("Acquired pause indexing lock");
-                    logger.trace("Acquired pause indexing lock");
-                }
-                return pauseLockReference;
-            } else {
-                return lock.acquire();
-            }
+            return lock.acquire();
         }
 
         /** Activate throttling, which switches the lock to be a real lock */
         public void activate() {
             assert lock == NOOP_LOCK : "throttling activated while already active";
             startOfThrottleNS = System.nanoTime();
+            throttlingLock.throttle();
             lock = lockReference;
         }
 
-        public void activatePause() {
-            assert lock == NOOP_LOCK : "throttling activated while already active";
-            startOfThrottleNS = System.nanoTime();
-            // System.out.println("Activate pause");
-            pauseIndexing.setRelease(true);
-            lock = pauseLockReference;
-        }
-
         /** Deactivate throttling, which switches the lock to be an always-acquirable NoOpLock */
         public void deactivate() {
             assert lock != NOOP_LOCK : "throttling deactivated but not active";
-
-            if (lock == pauseLockReference) {
-                logger.trace("Deactivate index throttling pause");
-
-                // Signal the threads that are waiting on pauseCondition
-                pauseLockReference.acquire();
-                try {
-                    // System.out.println("Deactivate pause");
-                    pauseIndexing.setRelease(false);
-                    pauseCondition.signalAll();
-                } finally {
-                    pauseLockReference.close();
-                }
-            }
+            throttlingLock.unthrottle();
             lock = NOOP_LOCK;
 
             assert startOfThrottleNS > 0 : "Bad state of startOfThrottleNS";
@@ -604,6 +569,53 @@
         }
     }
 
+    protected static final class PauseLock implements Lock {
+        private final Semaphore semaphore = new Semaphore(Integer.MAX_VALUE);
+        private final int allowThreads;
+
+        public PauseLock(int allowThreads) {
+            this.allowThreads = allowThreads;
+        }
+
+        public void lock() {} {
+            semaphore.acquireUninterruptibly();
+        }
+
+        @Override
+        public void lockInterruptibly() throws InterruptedException {
+            semaphore.acquire();
+        }
+
+        @Override
+        public void unlock() {
+            semaphore.release();
+        }
+
+        @Override
+        public boolean tryLock() {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
+            throw new UnsupportedOperationException();
+        }
+
+        @Override
+        public Condition newCondition() {
+            throw new UnsupportedOperationException();
+        }
+
+        public void throttle() {
+            assert semaphore.availablePermits() == Integer.MAX_VALUE;
+            semaphore.acquireUninterruptibly(Integer.MAX_VALUE - allowThreads);
+        }
+
+        public void unthrottle() {
+            assert semaphore.availablePermits() <= allowThreads;
+            semaphore.release(Integer.MAX_VALUE - allowThreads);
+        }
+    }
     /**
      * Perform document index operation on the engine
      * @param index operation to perform

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I like this and it works so I can switch to this. I guess the only thing we loose with this is Re-entrancy, but I don't think we need that for the way we use this lock right now.

Copy link
Contributor

@henningandersen henningandersen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

It would be good to follow-up with a unittest - and also an integration test in this repo to ensure we have it tested here. But given the testing elsewhere we can merge this now to start experimenting.

@ankikuma ankikuma merged commit e29bee5 into elastic:main May 8, 2025
17 checks passed
ywangd pushed a commit to ywangd/elasticsearch that referenced this pull request May 9, 2025
This PR seeks to address stability issues seen in some CSPs. See ES-11516 for details. Serverless PR#3801 associated with this change.

When throttling is enabled for indexing, we limit indexing to 1 thread per shard. However, this might not be sufficient throttling in serverless where we might have a large number of shards. With this change we pause indexing completely when throttling is enabled.
jfreden pushed a commit to jfreden/elasticsearch that referenced this pull request May 12, 2025
This PR seeks to address stability issues seen in some CSPs. See ES-11516 for details. Serverless PR#3801 associated with this change.

When throttling is enabled for indexing, we limit indexing to 1 thread per shard. However, this might not be sufficient throttling in serverless where we might have a large number of shards. With this change we pause indexing completely when throttling is enabled.
arteam added a commit to arteam/elasticsearch that referenced this pull request May 13, 2025
`Engine.PauseLock#throttle` can be called when the lock is being throttled,
so we can't guarantee that all permits are available before throttling.

Resolve elastic#126359
See elastic#127173
arteam added a commit to arteam/elasticsearch that referenced this pull request May 13, 2025
`Engine.PauseLock#throttle` can be called when the lock is being throttled,
so we can't guarantee that all permits are available before throttling.

Resolve elastic#126359
See elastic#127173
arteam added a commit to arteam/elasticsearch that referenced this pull request May 13, 2025
`Engine.PauseLock#throttle` can be called when the lock is being throttled,
so we can't guarantee that all permits are available before throttling.

Resolve elastic#126359
See elastic#127173
ankikuma added a commit that referenced this pull request May 30, 2025
This PR changes the mechanism to pause indexing which was introduced in #127173. The original PR caused IndexStatsIT#testThrottleStats to fail. See #126359.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Distributed Indexing/Distributed A catch all label for anything in the Distributed Indexing Area. Please avoid if you can. >non-issue serverless-linked Added by automation, don't add manually Team:Distributed Indexing Meta label for Distributed Indexing team v9.1.0
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants