diff --git a/.github/copilot-instructions.md b/.github/copilot-instructions.md new file mode 100644 index 00000000..37735bf4 --- /dev/null +++ b/.github/copilot-instructions.md @@ -0,0 +1,126 @@ +# Optimizely Ruby SDK - AI Coding Agent Instructions + +## Architecture Overview + +This is the **Optimizely Ruby SDK** for A/B testing and feature flag management. Key architectural patterns: + +### Core Components +- **`Optimizely::Project`** (`lib/optimizely.rb`): Main client class, entry point for all SDK operations +- **`DecisionService`** (`lib/optimizely/decision_service.rb`): Core bucketing logic with 7-step decision flow (status→forced→whitelist→sticky→audience→CMAB→hash) +- **`ProjectConfig`** (`lib/optimizely/config/`): Manages datafile parsing and experiment/feature configuration +- **Config Managers** (`lib/optimizely/config_manager/`): Handle datafile fetching - `HTTPProjectConfigManager` for polling, `StaticProjectConfigManager` for static files + +### Data Flow Patterns +- **User Context**: `OptimizelyUserContext` wraps user ID + attributes for decision APIs +- **Decision Pipeline**: All feature flags/experiments flow through `DecisionService.get_variation_for_feature()` +- **Event Processing**: `BatchEventProcessor` queues and batches impression/conversion events +- **CMAB Integration**: Contextual Multi-Armed Bandit system in `lib/optimizely/cmab/` for advanced optimization + +## Development Workflows + +### Testing +```bash +# Run all tests +bundle exec rake spec + +# Run specific test file +bundle exec rspec spec/decision_service_spec.rb + +# Run linting +bundle exec rubocop +``` + +### Build & Gem Management +```bash +# Build gem locally +rake build # outputs to /pkg/ + +# Run benchmarks +rake benchmark +``` + +## Project-Specific Conventions + +### Ruby Patterns +- **Frozen string literals**: All files start with `# frozen_string_literal: true` +- **Module namespacing**: Everything under `Optimizely::` module +- **Struct for data**: Use `Struct.new()` 
for decision results (e.g., `DecisionService::Decision`) +- **Factory pattern**: `OptimizelyFactory` provides preset SDK configurations + +### Naming Conventions +- **Experiment/Feature keys**: String identifiers from Optimizely dashboard +- **User bucketing**: `bucketing_id` vs `user_id` (can be different for sticky bucketing) +- **Decision sources**: `'EXPERIMENT'`, `'FEATURE_TEST'`, `'ROLLOUT'` constants in `DecisionService` + +### Error Handling +- **Graceful degradation**: Invalid configs return `nil`/default values, never crash +- **Logging levels**: Use `Logger::ERROR`, `Logger::INFO`, `Logger::DEBUG` consistently +- **User input validation**: `Helpers::Validator` validates all public API inputs + +## Critical Integration Points + +### Datafile Management +- **HTTPProjectConfigManager**: Polls Optimizely CDN every 5 minutes by default +- **Notification system**: Subscribe to `OPTIMIZELY_CONFIG_UPDATE` for datafile changes +- **ODP integration**: On datafile update, call `update_odp_config_on_datafile_update()` + +### Event Architecture +- **BatchEventProcessor**: Default 10 events/batch, 30s flush interval +- **Thread safety**: SDK spawns background threads for polling and event processing +- **Graceful shutdown**: Always call `optimizely.close()` to flush events + +### Feature Flag Decision API +- **New pattern**: Use `decide()` API, not legacy `is_feature_enabled()` +- **Decision options**: `OptimizelyDecideOption` constants control behavior (exclude variables, include reasons, etc.) 
+- **Forced decisions**: Override bucketing via `set_forced_variation()` on the client or `set_forced_decision()` on the user context
+
+## Test Patterns
+- **Spec location**: One spec file per class in `/spec/` directory
+- **WebMock**: Mock HTTP requests in tests (`require 'webmock/rspec'`)
+- **Test data**: Use `spec_params.rb` for shared test constants
+- **Coverage**: Coveralls integration via GitHub Actions
+
+## Common Gotchas
+- **Thread spawning**: Initialize SDK after forking processes in web servers
+- **Bucketing consistency**: Use same `bucketing_id` across SDK calls for sticky behavior
+- **CMAB caching**: Context-aware ML decisions cache by user - use appropriate cache invalidation options
+- **ODP events**: Require at least one identifier, auto-add default event data
+
+## Critical Memory Leak Prevention
+
+### **NEVER** Initialize Project Per Request
+```ruby
+# ❌ MEMORY LEAK - Creates new threads every request
+def get_optimizely_client
+  Optimizely::Project.new(datafile: fetch_datafile)
+end
+
+# ✅ CORRECT - Singleton pattern with proper cleanup
+class OptimizelyService
+  def self.instance
+    @instance ||= Optimizely::Project.new(datafile: fetch_datafile)
+  end
+
+  def self.reset!
+ @instance&.close # Critical: stops background threads + @instance = nil + end +end +``` + +### Thread Management +- **Background threads**: `Project.new()` spawns multiple threads (`BatchEventProcessor`, `OdpEventManager`, config polling) +- **Memory accumulation**: Each initialization creates new threads that persist until explicitly stopped +- **Proper cleanup**: Always call `optimizely_instance.close()` before dereferencing +- **Rails deployment**: Use singleton pattern or application-level initialization, never per-request + +### Production Patterns +```ruby +# Application initialization (config/initializers/optimizely.rb) +OPTIMIZELY_CLIENT = Optimizely::Project.new( + sdk_key: ENV['OPTIMIZELY_SDK_KEY'] +) + +# Graceful shutdown (config/application.rb) +at_exit { OPTIMIZELY_CLIENT&.close } +``` diff --git a/lib/optimizely.rb b/lib/optimizely.rb index aa50ce4e..054bca3d 100644 --- a/lib/optimizely.rb +++ b/lib/optimizely.rb @@ -54,11 +54,90 @@ class Project DEFAULT_CMAB_CACHE_TIMEOUT = (30 * 60 * 1000) DEFAULT_CMAB_CACHE_SIZE = 1000 + # Class-level instance cache to prevent memory leaks from repeated initialization + @instance_cache = {} + @cache_mutex = Mutex.new + + class << self + attr_accessor :instance_cache, :cache_mutex + end + attr_reader :notification_center # @api no-doc attr_reader :config_manager, :decision_service, :error_handler, :event_dispatcher, :event_processor, :logger, :odp_manager, :stopped + # Get or create a cached Project instance for static datafile configurations + # This prevents memory leaks when the same datafile is used repeatedly + # + # @param datafile [String] JSON string representing the project + # @param options - Hash of initialization options (optional) + # @return [Project] Cached or new Project instance + def self.get_or_create_instance(datafile: nil, **options) + # Only cache static datafile configurations (no sdk_key, no custom managers) + return new(datafile: datafile, **options) if should_skip_cache?(datafile, 
options) + + cache_key = generate_cache_key(datafile, options) + + cache_mutex.synchronize do + # Return existing instance if available and not stopped + return instance_cache[cache_key] if instance_cache[cache_key] && !instance_cache[cache_key].stopped + + # Create new instance and cache it + instance = new(datafile: datafile, **options) + instance_cache[cache_key] = instance + instance + end + end + + # Clear all cached instances and properly close them + def self.clear_instance_cache! + cache_mutex.synchronize do + # First stop all instances without removing them from cache to avoid deadlock + instance_cache.each_value do |instance| + next if instance.stopped + + instance.instance_variable_set(:@stopped, true) + instance.config_manager.stop! if instance.config_manager.respond_to?(:stop!) + instance.event_processor.stop! if instance.event_processor.respond_to?(:stop!) + instance.odp_manager.stop! + end + # Then clear the cache + instance_cache.clear + end + end + + # Get count of cached instances (for testing/monitoring) + def self.cached_instance_count + cache_mutex.synchronize { instance_cache.size } + end + + private_class_method def self.should_skip_cache?(datafile, options) + # Don't cache if using dynamic features that would make sharing unsafe + return true if options[:sdk_key] || + options[:config_manager] || + options[:event_processor] || + options[:user_profile_service] || + datafile.nil? || datafile.empty? 
+
+    # Also skip caching when a custom logger, error handler, or event
+    # dispatcher is supplied, since those objects may carry per-instance state
+    return true if options[:logger] || options[:error_handler] || options[:event_dispatcher]
+
+    false
+  end
+
+  private_class_method def self.generate_cache_key(datafile, options)
+    # Create cache key from datafile content and relevant options
+    require 'digest'
+    content_hash = Digest::SHA256.hexdigest(datafile)
+    options_hash = {
+      skip_json_validation: options[:skip_json_validation],
+      default_decide_options: options[:default_decide_options]&.sort,
+      event_processor_options: options[:event_processor_options]
+    }
+    "#{content_hash}_#{Digest::SHA256.hexdigest(options_hash.to_s)}"
+  end
+
   # Constructor for Projects.
   #
   # @param datafile - JSON string representing the project.
@@ -163,6 +242,21 @@ def initialize(
         flush_interval: event_processor_options[:flush_interval] || BatchEventProcessor::DEFAULT_BATCH_INTERVAL
       )
     end
+
+    # Set up finalizer to ensure cleanup if close() is not called explicitly
+    ObjectSpace.define_finalizer(self, self.class.create_finalizer(@config_manager, @event_processor, @odp_manager))
   end
+
+  # Create finalizer proc to clean up background threads
+  # This ensures cleanup even if close() is not explicitly called
+  def self.create_finalizer(config_manager, event_processor, odp_manager)
+    proc do
+      config_manager.stop! if config_manager.respond_to?(:stop!)
+      event_processor.stop! if event_processor.respond_to?(:stop!)
+      odp_manager.stop! if odp_manager.respond_to?(:stop!)
+    rescue StandardError
+      # Suppress errors during finalization to avoid issues during GC
+    end
+  end

   # Create a context of the user for which decision APIs will be called.
@@ -936,6 +1030,14 @@ def close
     @config_manager.stop! if @config_manager.respond_to?(:stop!)
     @event_processor.stop! if @event_processor.respond_to?(:stop!)
     @odp_manager.stop!
+
+    # Remove this instance from the cache if it exists
+    # Note: we don't synchronize here to avoid deadlock when called from clear_instance_cache!
+ self.class.send(:remove_from_cache_unsafe, self) + end + + private_class_method def self.remove_from_cache_unsafe(instance) + instance_cache.delete_if { |_key, cached_instance| cached_instance == instance } end def get_optimizely_config diff --git a/spec/memory_leak_prevention_spec.rb b/spec/memory_leak_prevention_spec.rb new file mode 100644 index 00000000..f0978f3e --- /dev/null +++ b/spec/memory_leak_prevention_spec.rb @@ -0,0 +1,210 @@ +# frozen_string_literal: true + +require 'spec_helper' + +describe 'Memory Leak Prevention' do + let(:datafile) { '{"version": "4", "experiments": [], "groups": [], "events": [], "featureFlags": []}' } + + before do + # Clean up any existing instances + Optimizely::Project.clear_instance_cache! + end + + after do + # Clean up after each test + Optimizely::Project.clear_instance_cache! + end + + describe 'Thread Creation Prevention' do + it 'should not create new threads when using get_or_create_instance repeatedly' do + initial_thread_count = Thread.list.size + + # Simulate the problematic pattern that was causing memory leaks + # In the real world, this would be called once per request + threads_created = [] + + 10.times do |i| + # Use the safe caching method + optimizely = Optimizely::Project.get_or_create_instance(datafile: datafile) + + # Make a decision to trigger thread creation if any + optimizely.create_user_context("user_#{i}") + + # Track thread count after each creation + threads_created << Thread.list.size + end + + final_thread_count = Thread.list.size + + # Should only have created one cached instance + expect(Optimizely::Project.cached_instance_count).to eq(1) + + # Thread count should not have grown significantly per instance + # Allow for some variance due to initialization of first instance + expect(final_thread_count).to be <= initial_thread_count + 5 + + # Verify that we're not creating more threads with each call + # After the first few calls, thread count should stabilize + stable_count = 
threads_created[3]
+      expect(threads_created.last).to eq(stable_count)
+    end
+
+    it 'demonstrates the memory leak that would occur with repeated Project.new calls' do
+      instances = []
+
+      # This is the problematic pattern that caused the memory leak: each
+      # Project.new spawns its own background threads, which accumulate until
+      # explicitly stopped. The instances are closed at the end of the test
+      # so the test suite itself does not leak.
+      5.times do
+        instances << Optimizely::Project.new(datafile: datafile)
+      end
+
+      # Each instance creates its own background threads; in the real
+      # memory-leak scenario these would accumulate indefinitely
+      expect(instances.size).to eq(5)
+      expect(instances.uniq.size).to eq(5) # All different instances
+
+      # Clean up instances to prevent an actual memory leak in the test
+      instances.each(&:close)
+    end
+  end
+
+  describe 'Cache Key Generation' do
+    it 'should create same cache key for identical configurations' do
+      instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile)
+      instance2 = Optimizely::Project.get_or_create_instance(datafile: datafile)
+
+      expect(instance1).to be(instance2)
+      expect(Optimizely::Project.cached_instance_count).to eq(1)
+    end
+
+    it 'should create different cache keys for different configurations' do
+      instance1 = Optimizely::Project.get_or_create_instance(
+        datafile: datafile,
+        skip_json_validation: true
+      )
+      instance2 = Optimizely::Project.get_or_create_instance(
+        datafile: datafile,
+        skip_json_validation: false
+      )
+
+      expect(instance1).not_to be(instance2)
+      expect(Optimizely::Project.cached_instance_count).to eq(2)
+    end
+  end
+
+  describe 'Resource Cleanup' do
+    it 'should properly stop background threads when instance is closed' do
+      instance = Optimizely::Project.get_or_create_instance(datafile: datafile)
+
+      # Trigger thread creation by making a decision
+      
instance.create_user_context('test_user') + + expect(instance.stopped).to be_falsy + + instance.close + + expect(instance.stopped).to be_truthy + expect(Optimizely::Project.cached_instance_count).to eq(0) + end + + it 'should cleanup all instances when cache is cleared' do + instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile) + instance2 = Optimizely::Project.get_or_create_instance( + datafile: '{"version": "4", "experiments": [{"id": "test"}], "groups": [], "events": [], "featureFlags": []}' + ) + + expect(Optimizely::Project.cached_instance_count).to eq(2) + expect(instance1.stopped).to be_falsy + expect(instance2.stopped).to be_falsy + + Optimizely::Project.clear_instance_cache! + + expect(Optimizely::Project.cached_instance_count).to eq(0) + expect(instance1.stopped).to be_truthy + expect(instance2.stopped).to be_truthy + end + end + + describe 'Production Usage Patterns' do + it 'should handle Rails-like request pattern efficiently' do + initial_thread_count = Thread.list.size + + # Simulate Rails controller pattern with cached datafile + cached_datafile = datafile + request_results = [] + + # Simulate 50 requests (what would cause significant memory growth before) + 50.times do |request_id| + # This is the safe pattern that should be used in production + optimizely = Optimizely::Project.get_or_create_instance(datafile: cached_datafile) + + # Simulate making decisions in the request + optimizely.create_user_context("user_#{request_id}") + + # Store result (in real app this would be returned to user) + request_results << { + request_id: request_id, + optimizely_instance_id: optimizely.object_id, + thread_count: Thread.list.size + } + end + + # Verify efficiency: + # 1. All requests should use the same instance + unique_instance_ids = request_results.map { |r| r[:optimizely_instance_id] }.uniq + expect(unique_instance_ids.size).to eq(1) + + # 2. 
Only one instance should be cached + expect(Optimizely::Project.cached_instance_count).to eq(1) + + # 3. Thread count should be stable after initial ramp-up + final_thread_counts = request_results.last(10).map { |r| r[:thread_count] } + expect(final_thread_counts.uniq.size).to be <= 2 # Allow for minimal variance + + # 4. No significant thread growth + final_thread_count = Thread.list.size + expect(final_thread_count).to be <= initial_thread_count + 10 + end + end + + describe 'Memory Safety Guarantees' do + it 'should not cache instances with dynamic configuration' do + # These should not be cached due to having dynamic config + instance_with_sdk_key = Optimizely::Project.get_or_create_instance( + datafile: datafile, + sdk_key: 'test_key' + ) + + instance_with_user_profile = Optimizely::Project.get_or_create_instance( + datafile: datafile, + user_profile_service: double('user_profile_service') + ) + + # Should have 0 cached instances since these shouldn't be cached + expect(Optimizely::Project.cached_instance_count).to eq(0) + + # Clean up the non-cached instances + instance_with_sdk_key.close + instance_with_user_profile.close + end + + it 'should handle finalizer cleanup gracefully' do + # Test that finalizers work when instances are not explicitly closed + Optimizely::Project.get_or_create_instance(datafile: datafile) + + expect(Optimizely::Project.cached_instance_count).to eq(1) + + # Force garbage collection to trigger finalizer + GC.start + + # The finalizer should have been called, but the instance might still be + # in cache until explicitly removed. This tests that the finalizer + # doesn't crash the system. 
+ expect(true).to be_truthy # Just verify we don't crash + end + end +end diff --git a/spec/project_caching_spec.rb b/spec/project_caching_spec.rb new file mode 100644 index 00000000..2c52b019 --- /dev/null +++ b/spec/project_caching_spec.rb @@ -0,0 +1,120 @@ +# frozen_string_literal: true + +require 'spec_helper' + +describe Optimizely::Project do + let(:datafile) { '{"version": "4", "experiments": [], "groups": [], "events": [], "featureFlags": []}' } + let(:different_datafile) { '{"version": "4", "experiments": [{"id": "test"}], "groups": [], "events": [], "featureFlags": []}' } + + describe '.get_or_create_instance' do + after do + # Clean up cache after each test + Optimizely::Project.clear_instance_cache! + end + + it 'returns the same instance for identical datafiles' do + instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile) + instance2 = Optimizely::Project.get_or_create_instance(datafile: datafile) + + expect(instance1).to be(instance2) + expect(Optimizely::Project.cached_instance_count).to eq(1) + end + + it 'returns different instances for different datafiles' do + instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile) + instance2 = Optimizely::Project.get_or_create_instance(datafile: different_datafile) + + expect(instance1).not_to be(instance2) + expect(Optimizely::Project.cached_instance_count).to eq(2) + end + + it 'does not cache instances with sdk_key' do + instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile, sdk_key: 'test_key') + instance2 = Optimizely::Project.get_or_create_instance(datafile: datafile, sdk_key: 'test_key') + + expect(instance1).not_to be(instance2) + expect(Optimizely::Project.cached_instance_count).to eq(0) + end + + it 'does not cache instances with custom config_manager' do + config_manager = double('config_manager') + allow(config_manager).to receive(:config) + allow(config_manager).to receive(:sdk_key) + + instance1 = 
Optimizely::Project.get_or_create_instance(datafile: datafile, config_manager: config_manager) + instance2 = Optimizely::Project.get_or_create_instance(datafile: datafile, config_manager: config_manager) + + expect(instance1).not_to be(instance2) + expect(Optimizely::Project.cached_instance_count).to eq(0) + end + + it 'removes instances from cache when closed' do + instance = Optimizely::Project.get_or_create_instance(datafile: datafile) + expect(Optimizely::Project.cached_instance_count).to eq(1) + + instance.close + expect(Optimizely::Project.cached_instance_count).to eq(0) + end + + it 'creates new instance if cached instance is stopped' do + instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile) + instance1.close + + instance2 = Optimizely::Project.get_or_create_instance(datafile: datafile) + expect(instance1).not_to be(instance2) + expect(Optimizely::Project.cached_instance_count).to eq(1) + end + + it 'considers different options when caching' do + instance1 = Optimizely::Project.get_or_create_instance( + datafile: datafile, + skip_json_validation: true + ) + instance2 = Optimizely::Project.get_or_create_instance( + datafile: datafile, + skip_json_validation: false + ) + + expect(instance1).not_to be(instance2) + expect(Optimizely::Project.cached_instance_count).to eq(2) + end + end + + describe '.clear_instance_cache!' do + it 'closes all cached instances and clears the cache' do + instance1 = Optimizely::Project.get_or_create_instance(datafile: datafile) + instance2 = Optimizely::Project.get_or_create_instance(datafile: different_datafile) + + expect(Optimizely::Project.cached_instance_count).to eq(2) + expect(instance1.stopped).to be_falsy + expect(instance2.stopped).to be_falsy + + Optimizely::Project.clear_instance_cache! 
+ + expect(Optimizely::Project.cached_instance_count).to eq(0) + expect(instance1.stopped).to be_truthy + expect(instance2.stopped).to be_truthy + end + end + + describe 'memory leak prevention' do + it 'prevents memory leaks from repeated Project creation' do + initial_thread_count = Thread.list.size + + # Simulate the problematic pattern - repeated Project.new calls + 100.times do + # Using get_or_create_instance should not create new threads each time + Optimizely::Project.get_or_create_instance(datafile: datafile) + end + + # Should only have created one cached instance + expect(Optimizely::Project.cached_instance_count).to eq(1) + + # Thread count should not have grown significantly + # (allowing for some variance due to test framework threads) + expect(Thread.list.size).to be < initial_thread_count + 10 + + Optimizely::Project.clear_instance_cache! + end + end +end
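
Reviewer note: the cache-key scheme introduced by the private `generate_cache_key` helper above can be exercised outside the SDK. The sketch below is stdlib-only; `cache_key_for` is a name invented here to mirror that helper, not part of the SDK's public API. It shows why identical datafile/option pairs collapse to a single cached instance while any change to a cache-relevant option yields a distinct key:

```ruby
# frozen_string_literal: true

require 'digest'

# Mirrors the shape of Project.generate_cache_key: digest the datafile body
# and the cache-relevant options separately, then join the two digests.
def cache_key_for(datafile, options = {})
  relevant = {
    skip_json_validation: options[:skip_json_validation],
    default_decide_options: options[:default_decide_options]&.sort,
    event_processor_options: options[:event_processor_options]
  }
  "#{Digest::SHA256.hexdigest(datafile)}_#{Digest::SHA256.hexdigest(relevant.to_s)}"
end

datafile = '{"version": "4", "experiments": []}'

a = cache_key_for(datafile, skip_json_validation: true)
b = cache_key_for(datafile, skip_json_validation: true)
c = cache_key_for(datafile, skip_json_validation: false)

puts a == b # => true:  same datafile + options, cache hit
puts a == c # => false: differing option, separate instance
```

One subtlety worth flagging in review: the options digest relies on `Hash#to_s`, which reflects insertion order, so the key is stable only because the relevant options are always assembled in the same literal order inside the helper.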