[FSSDK-11776] fixing memory leak with project #372

Open · wants to merge 4 commits into base: master
126 changes: 126 additions & 0 deletions .github/copilot-instructions.md
@@ -0,0 +1,126 @@
# Optimizely Ruby SDK - AI Coding Agent Instructions

## Architecture Overview

This is the **Optimizely Ruby SDK** for A/B testing and feature flag management. Key architectural patterns:

### Core Components
- **`Optimizely::Project`** (`lib/optimizely.rb`): Main client class, entry point for all SDK operations
- **`DecisionService`** (`lib/optimizely/decision_service.rb`): Core bucketing logic with 7-step decision flow (status→forced→whitelist→sticky→audience→CMAB→hash)
- **`ProjectConfig`** (`lib/optimizely/config/`): Manages datafile parsing and experiment/feature configuration
- **Config Managers** (`lib/optimizely/config_manager/`): Handle datafile fetching - `HTTPProjectConfigManager` for polling, `StaticProjectConfigManager` for static files

### Data Flow Patterns
- **User Context**: `OptimizelyUserContext` wraps user ID + attributes for decision APIs
- **Decision Pipeline**: All feature flags/experiments flow through `DecisionService.get_variation_for_feature()`
- **Event Processing**: `BatchEventProcessor` queues and batches impression/conversion events
- **CMAB Integration**: Contextual Multi-Armed Bandit system in `lib/optimizely/cmab/` for advanced optimization

## Development Workflows

### Testing
```bash
# Run all tests
bundle exec rake spec

# Run specific test file
bundle exec rspec spec/decision_service_spec.rb

# Run linting
bundle exec rubocop
```

### Build & Gem Management
```bash
# Build gem locally
rake build # outputs to /pkg/

# Run benchmarks
rake benchmark
```

## Project-Specific Conventions

### Ruby Patterns
- **Frozen string literals**: All files start with `# frozen_string_literal: true`
- **Module namespacing**: Everything under `Optimizely::` module
- **Struct for data**: Use `Struct.new()` for decision results (e.g., `DecisionService::Decision`)
- **Factory pattern**: `OptimizelyFactory` provides preset SDK configurations
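
The Struct-for-data convention above can be sketched in isolation. Field names here are illustrative, not the SDK's exact `DecisionService::Decision` definition:

```ruby
# Hypothetical sketch of the Struct-based decision-result convention;
# the real DecisionService::Decision may use different fields.
Decision = Struct.new(:experiment, :variation, :source)

decision = Decision.new('exp_key', 'variation_a', 'FEATURE_TEST')
puts decision.variation # => "variation_a"
```

Struct keeps decision results lightweight and value-like, with positional construction and named accessors for free.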

### Naming Conventions
- **Experiment/Feature keys**: String identifiers from Optimizely dashboard
- **User bucketing**: `bucketing_id` vs `user_id` (can be different for sticky bucketing)
- **Decision sources**: `'EXPERIMENT'`, `'FEATURE_TEST'`, `'ROLLOUT'` constants in `DecisionService`

### Error Handling
- **Graceful degradation**: Invalid configs return `nil`/default values, never crash
- **Logging levels**: Use `Logger::ERROR`, `Logger::INFO`, `Logger::DEBUG` consistently
- **User input validation**: `Helpers::Validator` validates all public API inputs

## Critical Integration Points

### Datafile Management
- **HTTPProjectConfigManager**: Polls Optimizely CDN every 5 minutes by default
- **Notification system**: Subscribe to `OPTIMIZELY_CONFIG_UPDATE` for datafile changes
- **ODP integration**: On datafile update, call `update_odp_config_on_datafile_update()`

### Event Architecture
- **BatchEventProcessor**: Default 10 events/batch, 30s flush interval
- **Thread safety**: SDK spawns background threads for polling and event processing
- **Graceful shutdown**: Always call `optimizely.close()` to flush events

### Feature Flag Decision API
- **New pattern**: Use the `decide()` API rather than the legacy `is_feature_enabled()`
- **Decision options**: `OptimizelyDecideOption` constants control behavior (exclude variables, include reasons, etc.)
- **Forced decisions**: Override bucketing via `set_forced_variation()` or user profile service

## Test Patterns
- **Spec location**: One spec file per class in `/spec/` directory
- **WebMock**: Mock HTTP requests in tests (`require 'webmock/rspec'`)
- **Test data**: Use `spec_params.rb` for shared test constants
- **Coverage**: Coveralls integration via GitHub Actions

## Common Gotchas
- **Thread spawning**: Initialize SDK after forking processes in web servers
- **Bucketing consistency**: Use same `bucketing_id` across SDK calls for sticky behavior
- **CMAB caching**: Context-aware ML decisions cache by user - use appropriate cache invalidation options
- **ODP events**: Require at least one identifier, auto-add default event data
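
In a forking web server such as Puma, the thread-spawning gotcha above is typically handled by creating the client in a worker-boot hook. A hedged `puma.rb` sketch (the hook names are Puma's; the global variable is one possible wiring):

```ruby
# Hypothetical puma.rb fragment: create the SDK client after the worker
# process forks, so its background threads belong to the worker, and
# close it on worker shutdown to flush pending events.
on_worker_boot do
  $optimizely = Optimizely::Project.new(sdk_key: ENV['OPTIMIZELY_SDK_KEY'])
end

on_worker_shutdown do
  $optimizely&.close
end
```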

## Critical Memory Leak Prevention

### **NEVER** Initialize Project Per Request
```ruby
# ❌ MEMORY LEAK - Creates new threads every request
def get_optimizely_client
Optimizely::Project.new(datafile: fetch_datafile)
end

# ✅ CORRECT - Singleton pattern with proper cleanup
class OptimizelyService
def self.instance
@instance ||= Optimizely::Project.new(datafile: fetch_datafile)
end

def self.reset!
@instance&.close # Critical: stops background threads
@instance = nil
end
end
```

### Thread Management
- **Background threads**: `Project.new()` spawns multiple threads (`BatchEventProcessor`, `OdpEventManager`, config polling)
- **Memory accumulation**: Each initialization creates new threads that persist until explicitly stopped
- **Proper cleanup**: Always call `optimizely_instance.close()` before dereferencing
- **Rails deployment**: Use singleton pattern or application-level initialization, never per-request
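
The finalizer-based cleanup this PR adds to `Project#initialize` can be sketched with plain Ruby. `Client` and its collaborators here are stand-ins; the key point is that the finalizer proc closes over the collaborators, never over `self`, otherwise the object could never be garbage collected:

```ruby
# Minimal sketch of finalizer-based thread cleanup. The proc captures
# only the processor, not the Client instance itself.
class Client
  def initialize(processor)
    @processor = processor
    ObjectSpace.define_finalizer(self, self.class.create_finalizer(processor))
  end

  # Builds the cleanup proc; runs when the Client is garbage collected
  # (or when invoked directly, which makes it easy to test).
  def self.create_finalizer(processor)
    proc { processor.stop! if processor.respond_to?(:stop!) }
  end
end
```

Because GC timing is nondeterministic, verifying this in tests is easiest by calling the proc directly, as the test below does.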

### Production Patterns
```ruby
# Application initialization (config/initializers/optimizely.rb)
OPTIMIZELY_CLIENT = Optimizely::Project.new(
sdk_key: ENV['OPTIMIZELY_SDK_KEY']
)

# Graceful shutdown (config/application.rb)
at_exit { OPTIMIZELY_CLIENT&.close }
```
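
The instance cache in the diff below keys entries by a digest of the datafile content plus selected options. The idea can be sketched standalone (the helper name is hypothetical; the PR's version hashes a specific subset of options):

```ruby
require 'digest'

# Hypothetical content-addressed cache key: identical datafile content
# and options always map to the same key.
def cache_key_for(datafile, options = {})
  content_hash = Digest::SHA256.hexdigest(datafile)
  options_hash = Digest::SHA256.hexdigest(options.sort.to_s)
  "#{content_hash}_#{options_hash}"
end

a = cache_key_for('{"version": "4"}')
b = cache_key_for('{"version": "4"}')
a == b # same datafile content yields the same key
```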
102 changes: 102 additions & 0 deletions lib/optimizely.rb
@@ -54,11 +54,90 @@ class Project
DEFAULT_CMAB_CACHE_TIMEOUT = (30 * 60 * 1000)
DEFAULT_CMAB_CACHE_SIZE = 1000

# Class-level instance cache to prevent memory leaks from repeated initialization
@instance_cache = {}
@cache_mutex = Mutex.new

class << self
attr_accessor :instance_cache, :cache_mutex
end

attr_reader :notification_center
# @api no-doc
attr_reader :config_manager, :decision_service, :error_handler, :event_dispatcher,
:event_processor, :logger, :odp_manager, :stopped

# Get or create a cached Project instance for static datafile configurations
# This prevents memory leaks when the same datafile is used repeatedly
#
# @param datafile [String] JSON string representing the project
# @param options - Hash of initialization options (optional)
# @return [Project] Cached or new Project instance
def self.get_or_create_instance(datafile: nil, **options)
# Only cache static datafile configurations (no sdk_key, no custom managers)
return new(datafile: datafile, **options) if should_skip_cache?(datafile, options)

cache_key = generate_cache_key(datafile, options)

cache_mutex.synchronize do
# Return existing instance if available and not stopped
return instance_cache[cache_key] if instance_cache[cache_key] && !instance_cache[cache_key].stopped

# Create new instance and cache it
instance = new(datafile: datafile, **options)
instance_cache[cache_key] = instance
instance
end
end

# Clear all cached instances and properly close them
def self.clear_instance_cache!
cache_mutex.synchronize do
# First stop all instances without removing them from cache to avoid deadlock
instance_cache.each_value do |instance|
next if instance.stopped

instance.instance_variable_set(:@stopped, true)
**Copilot AI review (Aug 22, 2025):** Using `instance_variable_set` to modify internal state bypasses encapsulation and could lead to inconsistent state. Consider adding a private method to handle this state change properly. Suggested change: replace `instance.instance_variable_set(:@stopped, true)` with `instance.send(:mark_stopped!)`.

instance.config_manager.stop! if instance.config_manager.respond_to?(:stop!)
instance.event_processor.stop! if instance.event_processor.respond_to?(:stop!)
instance.odp_manager.stop!
end
# Then clear the cache
instance_cache.clear
end
end

# Get count of cached instances (for testing/monitoring)
def self.cached_instance_count
cache_mutex.synchronize { instance_cache.size }
end

private_class_method def self.should_skip_cache?(datafile, options)
# Don't cache if using dynamic features that would make sharing unsafe
return true if options[:sdk_key] ||
options[:config_manager] ||
options[:event_processor] ||
options[:user_profile_service] ||
datafile.nil? || datafile.empty?

# Also don't cache if custom loggers or error handlers that might have state
return true if options[:logger] || options[:error_handler] || options[:event_dispatcher]

false
end

private_class_method def self.generate_cache_key(datafile, options)
# Create cache key from datafile content and relevant options
require 'digest'
content_hash = Digest::SHA256.hexdigest(datafile)
options_hash = {
skip_json_validation: options[:skip_json_validation],
default_decide_options: options[:default_decide_options]&.sort,
event_processor_options: options[:event_processor_options]
}
"#{content_hash}_#{Digest::SHA256.hexdigest(options_hash.to_s)}"
end

# Constructor for Projects.
#
# @param datafile - JSON string representing the project.
@@ -163,6 +242,21 @@ def initialize(
flush_interval: event_processor_options[:flush_interval] || BatchEventProcessor::DEFAULT_BATCH_INTERVAL
)
end

# Set up finalizer to ensure cleanup if close() is not called explicitly
ObjectSpace.define_finalizer(self, self.class.create_finalizer(@config_manager, @event_processor, @odp_manager))
end

# Create finalizer proc to clean up background threads
# This ensures cleanup even if close() is not explicitly called
def self.create_finalizer(config_manager, event_processor, odp_manager)
proc do
config_manager.stop! if config_manager.respond_to?(:stop!)
event_processor.stop! if event_processor.respond_to?(:stop!)
odp_manager.stop! if odp_manager.respond_to?(:stop!)
rescue
**Copilot AI review (Aug 22, 2025):** A bare `rescue` clause catches all exceptions, including `SystemExit` and `Interrupt`. Consider rescuing `StandardError` instead to avoid catching system-level exceptions. Suggested change: replace `rescue` with `rescue StandardError`.

# Suppress errors during finalization to avoid issues during GC
end
end

# Create a context of the user for which decision APIs will be called.
Expand Down Expand Up @@ -936,6 +1030,14 @@ def close
@config_manager.stop! if @config_manager.respond_to?(:stop!)
@event_processor.stop! if @event_processor.respond_to?(:stop!)
@odp_manager.stop!

# Remove this instance from the cache if it exists
# Note: we don't synchronize here to avoid deadlock when called from clear_instance_cache!
self.class.send(:remove_from_cache_unsafe, self)
end

private_class_method def self.remove_from_cache_unsafe(instance)
instance_cache.delete_if { |_key, cached_instance| cached_instance == instance }
end

def get_optimizely_config