Commit 77cb0dc

Separate IOSource#ensure_buffer from IOSource#match. (#118)
## Why?

Doing a read check in `IOSource#match` on every call hurts performance, so this change separates the read processing out of `IOSource#match` into `IOSource#ensure_buffer`.

`IOSource#ensure_buffer` is called in the following cases, where `@source.buffer` may be empty:

1. At the start of `pull_event`.
2. After a trailing `'>'` pattern matches, as in `@source.match(/\s*>/um)`.

## Benchmark

```
RUBYLIB= BUNDLER_ORIG_RUBYLIB= /Users/naitoh/.rbenv/versions/3.3.0/bin/ruby -v -S benchmark-driver /Users/naitoh/ghq/github.com/naitoh/rexml/benchmark/parse.yaml
ruby 3.3.0 (2023-12-25 revision 5124f9ac75) [arm64-darwin22]
Calculating -------------------------------------
                         before       after  before(YJIT)  after(YJIT)
                 dom     10.278      10.986        16.430       16.941 i/s -     100.000 times in 9.729858s 9.102574s 6.086579s 5.902885s
                 sax     30.166      30.496        49.851       51.596 i/s -     100.000 times in 3.315008s 3.279069s 2.005961s 1.938123s
                pull     35.459      36.380        60.266       63.134 i/s -     100.000 times in 2.820181s 2.748745s 1.659301s 1.583928s
              stream     33.762      34.636        55.173       55.859 i/s -     100.000 times in 2.961948s 2.887131s 1.812485s 1.790218s

Comparison:
                              dom
         after(YJIT):        16.9 i/s
        before(YJIT):        16.4 i/s - 1.03x  slower
               after:        11.0 i/s - 1.54x  slower
              before:        10.3 i/s - 1.65x  slower

                              sax
         after(YJIT):        51.6 i/s
        before(YJIT):        49.9 i/s - 1.04x  slower
               after:        30.5 i/s - 1.69x  slower
              before:        30.2 i/s - 1.71x  slower

                             pull
         after(YJIT):        63.1 i/s
        before(YJIT):        60.3 i/s - 1.05x  slower
               after:        36.4 i/s - 1.74x  slower
              before:        35.5 i/s - 1.78x  slower

                           stream
         after(YJIT):        55.9 i/s
        before(YJIT):        55.2 i/s - 1.01x  slower
               after:        34.6 i/s - 1.61x  slower
              before:        33.8 i/s - 1.65x  slower
```

- YJIT=ON : 1.01x - 1.05x faster
- YJIT=OFF : 1.01x - 1.06x faster
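
The "faster" figures in the summary can be re-derived from the i/s columns of the benchmark table. The following is a minimal sketch of that arithmetic; the constant and method names (`RESULTS`, `SPEEDUPS`, `speedup`) are made up for illustration and are not part of the commit or of benchmark-driver. Computed from the rounded table values, the `dom` case without YJIT comes out at a little under 1.07x, so the upper bound quoted for YJIT=OFF appears to be truncated rather than rounded.

```ruby
# Iterations per second, copied from the benchmark table in the commit message.
RESULTS = {
  dom:    { before: 10.278, after: 10.986, before_yjit: 16.430, after_yjit: 16.941 },
  sax:    { before: 30.166, after: 30.496, before_yjit: 49.851, after_yjit: 51.596 },
  pull:   { before: 35.459, after: 36.380, before_yjit: 60.266, after_yjit: 63.134 },
  stream: { before: 33.762, after: 34.636, before_yjit: 55.173, after_yjit: 55.859 },
}.freeze

# Speedup is simply throughput after the change divided by throughput before it.
def speedup(before, after)
  after / before
end

SPEEDUPS = RESULTS.transform_values do |r|
  {
    yjit_off: speedup(r[:before], r[:after]),
    yjit_on:  speedup(r[:before_yjit], r[:after_yjit]),
  }
end.freeze

SPEEDUPS.each do |name, s|
  puts format("%-6s YJIT=OFF %.3fx  YJIT=ON %.3fx", name, s[:yjit_off], s[:yjit_on])
end
```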
1 parent d146162 commit 77cb0dc

2 files changed: +12 -1 lines changed

lib/rexml/parsers/baseparser.rb

Lines changed: 5 additions & 0 deletions
@@ -210,6 +210,8 @@ def pull_event
         return @stack.shift if @stack.size > 0
         #STDERR.puts @source.encoding
         #STDERR.puts "BUFFER = #{@source.buffer.inspect}"
+
+        @source.ensure_buffer
         if @document_status == nil
           start_position = @source.position
           if @source.match("<?", true)
@@ -236,6 +238,7 @@ def pull_event
               elsif @source.match(/\s*>/um, true)
                 id = [nil, nil, nil]
                 @document_status = :after_doctype
+                @source.ensure_buffer
               else
                 id = parse_id(base_error_message,
                               accept_external_id: true,
@@ -248,6 +251,7 @@ def pull_event
                   @document_status = :in_doctype
                 elsif @source.match(/\s*>/um, true)
                   @document_status = :after_doctype
+                  @source.ensure_buffer
                 else
                   message = "#{base_error_message}: garbage after external ID"
                   raise REXML::ParseException.new(message, @source)
@@ -646,6 +650,7 @@ def parse_attributes(prefixes, curr_ns)
               raise REXML::ParseException.new(message, @source)
             end
             unless scanner.scan(/.*#{Regexp.escape(quote)}/um)
+              @source.ensure_buffer
               match_data = @source.match(/^(.*?)(\/)?>/um, true)
               if match_data
                 scanner << "/" if closed
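
The call sites added in this file replace a check on every `match` call with a check at a few chosen points. The sketch below is a toy illustration of that trade-off, not REXML code: `CountingSource`, `TOKEN_PATTERNS`, and `pull_events` are hypothetical names, the tokenizer is far simpler than `pull_event`, and the "refill" is reduced to a counter so that only the number of emptiness checks is visible.

```ruby
require "strscan"

# Hypothetical, instrumented stand-in for a source object (not REXML code).
# It counts how often the "is the buffer empty?" check runs.
class CountingSource
  attr_reader :eos_checks

  def initialize(text, check_in_match:)
    @scanner = StringScanner.new(text)
    @check_in_match = check_in_match
    @eos_checks = 0
  end

  def check_in_match?
    @check_in_match
  end

  def done?
    @scanner.eos?
  end

  # A real IO-backed source would read more data here when the buffer is empty.
  def ensure_buffer
    @eos_checks += 1
    @scanner.eos?
  end

  def match(pattern)
    ensure_buffer if @check_in_match # "before": check on every match attempt
    @scanner.scan(pattern)
  end
end

TOKEN_PATTERNS = {
  close_tag: %r{</\w+>},
  open_tag: /<\w+>/,
  text: /[^<]+/,
}.freeze

# Toy event loop: each event may need several match attempts.
def pull_events(source)
  events = []
  until source.done?
    source.ensure_buffer unless source.check_in_match? # "after": once per event
    name, _pattern = TOKEN_PATTERNS.find { |_, pattern| source.match(pattern) }
    break unless name # nothing matched; give up on unparseable input
    events << name
  end
  events
end

before = CountingSource.new("<a>hi</a>", check_in_match: true)
after = CountingSource.new("<a>hi</a>", check_in_match: false)
pull_events(before)
pull_events(after)
puts "checks before: #{before.eos_checks}, after: #{after.eos_checks}"
```

For this input the loop produces three events from six match attempts, so the per-match variant performs six checks and the per-event variant three. How much this matters in REXML itself is what the benchmark in the commit message measures.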

lib/rexml/source.rb

Lines changed: 7 additions & 1 deletion
@@ -68,6 +68,9 @@ def encoding=(enc)
     def read
     end
 
+    def ensure_buffer
+    end
+
     def match(pattern, cons=false)
       if cons
         @scanner.scan(pattern).nil? ? nil : @scanner
@@ -165,11 +168,14 @@ def read
       end
     end
 
+    def ensure_buffer
+      read if @scanner.eos? && @source
+    end
+
     # Note: When specifying a string for 'pattern', it must not include '>' except in the following formats:
     # - ">"
     # - "XXX>" (X is any string excluding '>')
     def match( pattern, cons=false )
-      read if @scanner.eos? && @source
       while true
         if cons
           md = @scanner.scan(pattern)
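
The two `ensure_buffer` definitions above follow a simple pattern: a no-op on the in-memory source and a conditional refill on the IO-backed one. The sketch below reproduces that pattern with hypothetical `ToySource` and `ToyIOSource` classes built on `StringScanner` and `StringIO`. It is a simplification under stated assumptions: the chunked `read`, the `reads` counter, and the non-looping `match` are invented for the example and do not mirror REXML's actual implementation.

```ruby
require "strscan"
require "stringio"

# Hypothetical stand-in for an in-memory source: everything is already in
# the scanner, so there is never anything to refill.
class ToySource
  def initialize(text)
    @scanner = StringScanner.new(text)
  end

  def ensure_buffer
    # Intentionally empty.
  end

  # Simplified: unlike the real IOSource#match, this never reads more input.
  def match(pattern, cons = false)
    result = cons ? @scanner.scan(pattern) : @scanner.check(pattern)
    result.nil? ? nil : @scanner
  end
end

# Hypothetical stand-in for an IO-backed source that reads fixed-size chunks.
class ToyIOSource < ToySource
  attr_reader :reads

  def initialize(io, chunk_size)
    super(String.new) # mutable buffer that the scanner appends to
    @source = io
    @chunk_size = chunk_size
    @reads = 0
    read
  end

  def read
    chunk = @source.read(@chunk_size)
    if chunk
      @scanner << chunk
      @reads += 1
    else
      @source = nil # end of input: nothing left to refill from
    end
  end

  # Same condition as in the diff: refill only when the buffer is used up
  # and the underlying IO has not been exhausted.
  def ensure_buffer
    read if @scanner.eos? && @source
  end
end

source = ToyIOSource.new(StringIO.new("<a/><b/>"), 4)
source.match("<a/>", true)                    # consumes the whole first chunk
missed = source.match("<b/>", true)           # buffer is empty, so no match
source.ensure_buffer                          # refills from the IO
found = source.match("<b/>", true)            # now matches
puts "missed: #{missed.inspect}, found: #{!found.nil?}, reads: #{source.reads}"
```

The usage lines show why callers must invoke `ensure_buffer` at points where the buffer can be empty once `match` no longer does so itself.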
