feat(vllm): add vLLM integration #14732
Bootstrap import analysis
Comparison of import times between this PR and base.
Summary
The average import time from this PR is: 236 ± 1 ms.
The average import time from base is: 240 ± 3 ms.
The import time difference between this PR and base is: -3.9 ± 0.1 ms.
Import time breakdown
The following import paths have shrunk:
Performance SLOsComparing candidate alex/feat/vllm (3db04ff) with baseline main (57b137d) 📈 Performance Regressions (3 suites)📈 iast_aspects - 40/40✅ re_expand_aspectTime: ✅ 32.417µs (SLO: <40.000µs 📉 -19.0%) vs baseline: +2.1% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.7% ✅ re_expand_noaspectTime: ✅ 28.966µs (SLO: <40.000µs 📉 -27.6%) vs baseline: +1.4% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.8% ✅ re_findall_aspectTime: ✅ 2.901µs (SLO: <10.000µs 📉 -71.0%) vs baseline: -0.9% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.9% ✅ re_findall_noaspectTime: ✅ 1.412µs (SLO: <10.000µs 📉 -85.9%) vs baseline: -1.3% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_finditer_aspectTime: ✅ 4.408µs (SLO: <10.000µs 📉 -55.9%) vs baseline: -0.6% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_finditer_noaspectTime: ✅ 1.415µs (SLO: <10.000µs 📉 -85.8%) vs baseline: -0.6% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.0% ✅ re_fullmatch_aspectTime: ✅ 2.661µs (SLO: <10.000µs 📉 -73.4%) vs baseline: -0.8% Memory: ✅ 37.749MB (SLO: <39.000MB -3.2%) vs baseline: +5.0% ✅ re_fullmatch_noaspectTime: ✅ 1.295µs (SLO: <10.000µs 📉 -87.1%) vs baseline: -0.1% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_group_aspectTime: ✅ 3.137µs (SLO: <10.000µs 📉 -68.6%) vs baseline: +6.7% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.8% ✅ re_group_noaspectTime: ✅ 1.607µs (SLO: <10.000µs 📉 -83.9%) vs baseline: +0.2% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.7% ✅ re_groups_aspectTime: ✅ 3.283µs (SLO: <10.000µs 📉 -67.2%) vs baseline: +6.6% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_groups_noaspectTime: ✅ 1.690µs (SLO: <10.000µs 📉 -83.1%) vs baseline: -0.4% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.6% ✅ re_match_aspectTime: ✅ 3.199µs (SLO: <10.000µs 📉 -68.0%) vs baseline: 📈 +18.1% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +5.0% ✅ 
re_match_noaspectTime: ✅ 1.303µs (SLO: <10.000µs 📉 -87.0%) vs baseline: -0.1% Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +4.5% ✅ re_search_aspectTime: ✅ 2.552µs (SLO: <10.000µs 📉 -74.5%) vs baseline: -0.1% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_search_noaspectTime: ✅ 1.203µs (SLO: <10.000µs 📉 -88.0%) vs baseline: ~same Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.7% ✅ re_sub_aspectTime: ✅ 3.572µs (SLO: <10.000µs 📉 -64.3%) vs baseline: +4.8% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.6% ✅ re_sub_noaspectTime: ✅ 1.539µs (SLO: <10.000µs 📉 -84.6%) vs baseline: -0.5% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ re_subn_aspectTime: ✅ 3.683µs (SLO: <10.000µs 📉 -63.2%) vs baseline: +0.3% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +5.2% ✅ re_subn_noaspectTime: ✅ 1.616µs (SLO: <10.000µs 📉 -83.8%) vs baseline: +0.3% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +5.0% 📈 iastaspects - 118/118✅ add_aspectTime: ✅ 0.404µs (SLO: <10.000µs 📉 -96.0%) vs baseline: -0.8% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +4.0% ✅ add_inplace_aspectTime: ✅ 0.408µs (SLO: <10.000µs 📉 -95.9%) vs baseline: +0.5% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.7% ✅ add_inplace_noaspectTime: ✅ 0.318µs (SLO: <10.000µs 📉 -96.8%) vs baseline: -0.6% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.9% ✅ add_noaspectTime: ✅ 0.278µs (SLO: <10.000µs 📉 -97.2%) vs baseline: -0.8% Memory: ✅ 37.572MB (SLO: <39.000MB -3.7%) vs baseline: +4.6% ✅ bytearray_aspectTime: ✅ 1.358µs (SLO: <10.000µs 📉 -86.4%) vs baseline: +2.1% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.8% ✅ bytearray_extend_aspectTime: ✅ 1.517µs (SLO: <10.000µs 📉 -84.8%) vs baseline: +0.6% Memory: ✅ 37.788MB (SLO: <39.000MB -3.1%) vs baseline: +4.0% ✅ bytearray_extend_noaspectTime: ✅ 0.616µs (SLO: <10.000µs 📉 -93.8%) vs baseline: +0.4% Memory: ✅ 38.004MB (SLO: <39.000MB -2.6%) vs baseline: +5.1% ✅ 
bytearray_noaspectTime: ✅ 0.484µs (SLO: <10.000µs 📉 -95.2%) vs baseline: +0.6% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +3.5% ✅ bytes_aspectTime: ✅ 1.521µs (SLO: <10.000µs 📉 -84.8%) vs baseline: 📈 +17.2% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +3.9% ✅ bytes_noaspectTime: ✅ 0.494µs (SLO: <10.000µs 📉 -95.1%) vs baseline: +1.4% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.9% ✅ bytesio_aspectTime: ✅ 1.372µs (SLO: <10.000µs 📉 -86.3%) vs baseline: ~same Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.0% ✅ bytesio_noaspectTime: ✅ 0.502µs (SLO: <10.000µs 📉 -95.0%) vs baseline: +0.3% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +3.8% ✅ capitalize_aspectTime: ✅ 0.732µs (SLO: <10.000µs 📉 -92.7%) vs baseline: -1.3% Memory: ✅ 37.847MB (SLO: <39.000MB -3.0%) vs baseline: +4.1% ✅ capitalize_noaspectTime: ✅ 0.435µs (SLO: <10.000µs 📉 -95.7%) vs baseline: +0.4% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.9% ✅ casefold_aspectTime: ✅ 0.734µs (SLO: <10.000µs 📉 -92.7%) vs baseline: -1.1% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.7% ✅ casefold_noaspectTime: ✅ 0.372µs (SLO: <10.000µs 📉 -96.3%) vs baseline: +1.0% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.8% ✅ decode_aspectTime: ✅ 0.721µs (SLO: <10.000µs 📉 -92.8%) vs baseline: -0.4% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +3.8% ✅ decode_noaspectTime: ✅ 0.423µs (SLO: <10.000µs 📉 -95.8%) vs baseline: +1.6% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +3.8% ✅ encode_aspectTime: ✅ 0.712µs (SLO: <10.000µs 📉 -92.9%) vs baseline: +0.7% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.7% ✅ encode_noaspectTime: ✅ 0.403µs (SLO: <10.000µs 📉 -96.0%) vs baseline: ~same Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.8% ✅ format_aspectTime: ✅ 3.366µs (SLO: <10.000µs 📉 -66.3%) vs baseline: +0.3% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.2% ✅ format_map_aspectTime: ✅ 3.609µs (SLO: <10.000µs 
📉 -63.9%) vs baseline: -0.8% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +3.5% ✅ format_map_noaspectTime: ✅ 0.781µs (SLO: <10.000µs 📉 -92.2%) vs baseline: +1.0% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.7% ✅ format_noaspectTime: ✅ 0.594µs (SLO: <10.000µs 📉 -94.1%) vs baseline: -0.5% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +4.0% ✅ index_aspectTime: ✅ 0.361µs (SLO: <10.000µs 📉 -96.4%) vs baseline: +1.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.0% ✅ index_noaspectTime: ✅ 0.278µs (SLO: <10.000µs 📉 -97.2%) vs baseline: ~same Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +3.6% ✅ join_aspectTime: ✅ 1.370µs (SLO: <10.000µs 📉 -86.3%) vs baseline: -0.7% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.7% ✅ join_noaspectTime: ✅ 0.492µs (SLO: <10.000µs 📉 -95.1%) vs baseline: +0.5% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.7% ✅ ljust_aspectTime: ✅ 2.562µs (SLO: <20.000µs 📉 -87.2%) vs baseline: +2.0% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.8% ✅ ljust_noaspectTime: ✅ 0.413µs (SLO: <10.000µs 📉 -95.9%) vs baseline: +2.5% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.6% ✅ lower_aspectTime: ✅ 2.187µs (SLO: <10.000µs 📉 -78.1%) vs baseline: -0.4% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.7% ✅ lower_noaspectTime: ✅ 0.369µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -0.6% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +4.2% ✅ lstrip_aspectTime: ✅ 2.224µs (SLO: <20.000µs 📉 -88.9%) vs baseline: +0.7% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.9% ✅ lstrip_noaspectTime: ✅ 0.381µs (SLO: <10.000µs 📉 -96.2%) vs baseline: -0.5% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +3.9% ✅ modulo_aspectTime: ✅ 1.004µs (SLO: <10.000µs 📉 -90.0%) vs baseline: +0.7% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +3.6% ✅ modulo_aspect_for_bytearray_bytearrayTime: ✅ 1.537µs (SLO: <10.000µs 📉 -84.6%) vs baseline: ~same Memory: ✅ 38.004MB 
(SLO: <39.000MB -2.6%) vs baseline: +4.8% ✅ modulo_aspect_for_bytesTime: ✅ 0.986µs (SLO: <10.000µs 📉 -90.1%) vs baseline: ~same Memory: ✅ 37.768MB (SLO: <39.000MB -3.2%) vs baseline: +4.2% ✅ modulo_aspect_for_bytes_bytearrayTime: ✅ 1.239µs (SLO: <10.000µs 📉 -87.6%) vs baseline: +0.4% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +4.6% ✅ modulo_noaspectTime: ✅ 0.629µs (SLO: <10.000µs 📉 -93.7%) vs baseline: ~same Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +3.8% ✅ replace_aspectTime: ✅ 4.876µs (SLO: <10.000µs 📉 -51.2%) vs baseline: +0.7% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +3.7% ✅ replace_noaspectTime: ✅ 0.461µs (SLO: <10.000µs 📉 -95.4%) vs baseline: ~same Memory: ✅ 37.749MB (SLO: <39.000MB -3.2%) vs baseline: +4.0% ✅ repr_aspectTime: ✅ 0.910µs (SLO: <10.000µs 📉 -90.9%) vs baseline: ~same Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.0% ✅ repr_noaspectTime: ✅ 0.422µs (SLO: <10.000µs 📉 -95.8%) vs baseline: +0.7% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +5.0% ✅ rstrip_aspectTime: ✅ 1.926µs (SLO: <20.000µs 📉 -90.4%) vs baseline: +0.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.8% ✅ rstrip_noaspectTime: ✅ 0.380µs (SLO: <10.000µs 📉 -96.2%) vs baseline: +0.7% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.8% ✅ slice_aspectTime: ✅ 0.497µs (SLO: <10.000µs 📉 -95.0%) vs baseline: -0.3% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +4.0% ✅ slice_noaspectTime: ✅ 0.447µs (SLO: <10.000µs 📉 -95.5%) vs baseline: ~same Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.9% ✅ stringio_aspectTime: ✅ 1.573µs (SLO: <10.000µs 📉 -84.3%) vs baseline: +0.6% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.6% ✅ stringio_noaspectTime: ✅ 0.730µs (SLO: <10.000µs 📉 -92.7%) vs baseline: +1.0% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.7% ✅ strip_aspectTime: ✅ 2.204µs (SLO: <20.000µs 📉 -89.0%) vs baseline: +0.3% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: 
+3.7% ✅ strip_noaspectTime: ✅ 0.387µs (SLO: <10.000µs 📉 -96.1%) vs baseline: +1.8% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +3.8% ✅ swapcase_aspectTime: ✅ 2.415µs (SLO: <10.000µs 📉 -75.8%) vs baseline: ~same Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.8% ✅ swapcase_noaspectTime: ✅ 0.535µs (SLO: <10.000µs 📉 -94.7%) vs baseline: -1.3% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +3.9% ✅ title_aspectTime: ✅ 2.322µs (SLO: <10.000µs 📉 -76.8%) vs baseline: -0.6% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +3.7% ✅ title_noaspectTime: ✅ 0.503µs (SLO: <10.000µs 📉 -95.0%) vs baseline: +0.2% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +3.8% ✅ translate_aspectTime: ✅ 3.249µs (SLO: <10.000µs 📉 -67.5%) vs baseline: +0.9% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.7% ✅ translate_noaspectTime: ✅ 1.041µs (SLO: <10.000µs 📉 -89.6%) vs baseline: +0.6% Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +3.6% ✅ upper_aspectTime: ✅ 2.201µs (SLO: <10.000µs 📉 -78.0%) vs baseline: +0.4% Memory: ✅ 37.709MB (SLO: <39.000MB -3.3%) vs baseline: +3.8% ✅ upper_noaspectTime: ✅ 0.372µs (SLO: <10.000µs 📉 -96.3%) vs baseline: -0.3% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +3.7% 📈 iastaspectsospath - 24/24✅ ospathbasename_aspectTime: ✅ 4.189µs (SLO: <10.000µs 📉 -58.1%) vs baseline: +1.1% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.7% ✅ ospathbasename_noaspectTime: ✅ 1.072µs (SLO: <10.000µs 📉 -89.3%) vs baseline: ~same Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.9% ✅ ospathjoin_aspectTime: ✅ 6.124µs (SLO: <10.000µs 📉 -38.8%) vs baseline: +0.7% Memory: ✅ 37.591MB (SLO: <39.000MB -3.6%) vs baseline: +4.5% ✅ ospathjoin_noaspectTime: ✅ 2.289µs (SLO: <10.000µs 📉 -77.1%) vs baseline: -0.8% Memory: ✅ 37.729MB (SLO: <39.000MB -3.3%) vs baseline: +5.3% ✅ ospathnormcase_aspectTime: ✅ 3.418µs (SLO: <10.000µs 📉 -65.8%) vs baseline: -1.5% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: 
+4.6% ✅ ospathnormcase_noaspectTime: ✅ 0.569µs (SLO: <10.000µs 📉 -94.3%) vs baseline: +0.6% Memory: ✅ 37.650MB (SLO: <39.000MB -3.5%) vs baseline: +4.7% ✅ ospathsplit_aspectTime: ✅ 4.718µs (SLO: <10.000µs 📉 -52.8%) vs baseline: -0.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ ospathsplit_noaspectTime: ✅ 1.587µs (SLO: <10.000µs 📉 -84.1%) vs baseline: +0.4% Memory: ✅ 37.611MB (SLO: <39.000MB -3.6%) vs baseline: +4.7% ✅ ospathsplitdrive_aspectTime: ✅ 3.625µs (SLO: <10.000µs 📉 -63.8%) vs baseline: -0.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.9% ✅ ospathsplitdrive_noaspectTime: ✅ 0.692µs (SLO: <10.000µs 📉 -93.1%) vs baseline: +0.5% Memory: ✅ 37.670MB (SLO: <39.000MB -3.4%) vs baseline: +4.7% ✅ ospathsplitext_aspectTime: ✅ 5.142µs (SLO: <10.000µs 📉 -48.6%) vs baseline: 📈 +14.5% Memory: ✅ 37.690MB (SLO: <39.000MB -3.4%) vs baseline: +4.8% ✅ ospathsplitext_noaspectTime: ✅ 1.378µs (SLO: <10.000µs 📉 -86.2%) vs baseline: -1.0% Memory: ✅ 37.631MB (SLO: <39.000MB -3.5%) vs baseline: +4.9% 🟡 Near SLO Breach (4 suites)🟡 djangosimple - 30/30✅ appsecTime: ✅ 20.478ms (SLO: <22.300ms -8.2%) vs baseline: -0.1% Memory: ✅ 65.333MB (SLO: <67.000MB -2.5%) vs baseline: +4.5% ✅ exception-replay-enabledTime: ✅ 1.351ms (SLO: <1.450ms -6.8%) vs baseline: -0.7% Memory: ✅ 64.489MB (SLO: <67.000MB -3.7%) vs baseline: +4.9% ✅ iastTime: ✅ 20.563ms (SLO: <22.250ms -7.6%) vs baseline: +0.2% Memory: ✅ 65.274MB (SLO: <67.000MB -2.6%) vs baseline: +4.4% ✅ profilerTime: ✅ 15.266ms (SLO: <16.550ms -7.8%) vs baseline: -0.2% Memory: ✅ 53.669MB (SLO: <54.500MB 🟡 -1.5%) vs baseline: +4.6% ✅ resource-renamingTime: ✅ 20.623ms (SLO: <21.750ms -5.2%) vs baseline: ~same Memory: ✅ 65.477MB (SLO: <67.000MB -2.3%) vs baseline: +4.9% ✅ span-code-originTime: ✅ 26.213ms (SLO: <28.200ms -7.0%) vs baseline: ~same Memory: ✅ 67.541MB (SLO: <69.500MB -2.8%) vs baseline: +4.9% ✅ tracerTime: ✅ 20.488ms (SLO: <21.750ms -5.8%) vs baseline: ~same Memory: ✅ 65.303MB (SLO: <67.000MB -2.5%) vs 
baseline: +4.6% ✅ tracer-and-profilerTime: ✅ 22.027ms (SLO: <23.500ms -6.3%) vs baseline: -0.2% Memory: ✅ 66.689MB (SLO: <67.500MB 🟡 -1.2%) vs baseline: +5.0% ✅ tracer-dont-create-db-spansTime: ✅ 19.325ms (SLO: <21.500ms 📉 -10.1%) vs baseline: -0.5% Memory: ✅ 65.327MB (SLO: <66.000MB 🟡 -1.0%) vs baseline: +4.6% ✅ tracer-minimalTime: ✅ 16.603ms (SLO: <17.500ms -5.1%) vs baseline: -0.4% Memory: ✅ 65.352MB (SLO: <66.000MB 🟡 -1.0%) vs baseline: +4.7% ✅ tracer-nativeTime: ✅ 20.458ms (SLO: <21.750ms -5.9%) vs baseline: -0.3% Memory: ✅ 71.347MB (SLO: <72.500MB 🟡 -1.6%) vs baseline: +4.8% ✅ tracer-no-cachesTime: ✅ 18.479ms (SLO: <19.650ms -6.0%) vs baseline: +0.2% Memory: ✅ 65.284MB (SLO: <67.000MB -2.6%) vs baseline: +4.5% ✅ tracer-no-databasesTime: ✅ 18.841ms (SLO: <20.100ms -6.3%) vs baseline: ~same Memory: ✅ 65.274MB (SLO: <67.000MB -2.6%) vs baseline: +4.8% ✅ tracer-no-middlewareTime: ✅ 20.225ms (SLO: <21.500ms -5.9%) vs baseline: +0.4% Memory: ✅ 65.287MB (SLO: <67.000MB -2.6%) vs baseline: +4.6% ✅ tracer-no-templatesTime: ✅ 20.377ms (SLO: <22.000ms -7.4%) vs baseline: +0.3% Memory: ✅ 65.323MB (SLO: <67.000MB -2.5%) vs baseline: +4.6% 🟡 errortrackingdjangosimple - 6/6✅ errortracking-enabled-allTime: ✅ 18.025ms (SLO: <19.850ms -9.2%) vs baseline: -0.4% Memory: ✅ 65.330MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +4.9% ✅ errortracking-enabled-userTime: ✅ 18.389ms (SLO: <19.400ms -5.2%) vs baseline: +1.4% Memory: ✅ 65.294MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +4.8% ✅ tracer-enabledTime: ✅ 18.030ms (SLO: <19.450ms -7.3%) vs baseline: ~same Memory: ✅ 65.333MB (SLO: <66.500MB 🟡 -1.8%) vs baseline: +4.9% 🟡 flasksimple - 18/18✅ appsec-getTime: ✅ 4.587ms (SLO: <4.750ms -3.4%) vs baseline: +0.3% Memory: ✅ 61.991MB (SLO: <65.000MB -4.6%) vs baseline: +4.9% ✅ appsec-postTime: ✅ 6.565ms (SLO: <6.750ms -2.7%) vs baseline: -0.3% Memory: ✅ 61.853MB (SLO: <65.000MB -4.8%) vs baseline: +4.6% ✅ appsec-telemetryTime: ✅ 4.576ms (SLO: <4.750ms -3.7%) vs baseline: -0.5% Memory: ✅ 62.049MB 
(SLO: <65.000MB -4.5%) vs baseline: +5.1% ✅ debuggerTime: ✅ 1.855ms (SLO: <2.000ms -7.2%) vs baseline: -0.2% Memory: ✅ 45.338MB (SLO: <47.000MB -3.5%) vs baseline: +4.7% ✅ iast-getTime: ✅ 1.863ms (SLO: <2.000ms -6.9%) vs baseline: ~same Memory: ✅ 42.369MB (SLO: <49.000MB 📉 -13.5%) vs baseline: +5.0% ✅ profilerTime: ✅ 1.915ms (SLO: <2.100ms -8.8%) vs baseline: +0.2% Memory: ✅ 46.419MB (SLO: <47.000MB 🟡 -1.2%) vs baseline: +4.6% ✅ resource-renamingTime: ✅ 3.379ms (SLO: <3.650ms -7.4%) vs baseline: -0.2% Memory: ✅ 52.219MB (SLO: <53.500MB -2.4%) vs baseline: +4.9% ✅ tracerTime: ✅ 3.372ms (SLO: <3.650ms -7.6%) vs baseline: -0.2% Memory: ✅ 52.258MB (SLO: <53.500MB -2.3%) vs baseline: +4.9% ✅ tracer-nativeTime: ✅ 3.367ms (SLO: <3.650ms -7.8%) vs baseline: -0.2% Memory: ✅ 58.233MB (SLO: <60.000MB -2.9%) vs baseline: +4.7% 🟡 otelspan - 22/22✅ add-eventTime: ✅ 45.216ms (SLO: <47.150ms -4.1%) vs baseline: -0.4% Memory: ✅ 45.229MB (SLO: <47.000MB -3.8%) vs baseline: +4.7% ✅ add-metricsTime: ✅ 319.113ms (SLO: <344.800ms -7.4%) vs baseline: -0.2% Memory: ✅ 551.838MB (SLO: <562.000MB 🟡 -1.8%) vs baseline: +4.7% ✅ add-tagsTime: ✅ 290.429ms (SLO: <314.000ms -7.5%) vs baseline: ~same Memory: ✅ 554.144MB (SLO: <563.500MB 🟡 -1.7%) vs baseline: +4.8% ✅ get-contextTime: ✅ 83.893ms (SLO: <92.350ms -9.2%) vs baseline: ~same Memory: ✅ 40.366MB (SLO: <46.500MB 📉 -13.2%) vs baseline: +4.9% ✅ is-recordingTime: ✅ 42.892ms (SLO: <44.500ms -3.6%) vs baseline: -0.2% Memory: ✅ 44.596MB (SLO: <47.500MB -6.1%) vs baseline: +4.8% ✅ record-exceptionTime: ✅ 61.781ms (SLO: <67.650ms -8.7%) vs baseline: ~same Memory: ✅ 40.627MB (SLO: <47.000MB 📉 -13.6%) vs baseline: +4.7% ✅ set-statusTime: ✅ 48.822ms (SLO: <50.400ms -3.1%) vs baseline: +0.1% Memory: ✅ 44.620MB (SLO: <47.000MB -5.1%) vs baseline: +4.8% ✅ startTime: ✅ 42.337ms (SLO: <43.450ms -2.6%) vs baseline: +0.2% Memory: ✅ 44.646MB (SLO: <47.000MB -5.0%) vs baseline: +5.0% ✅ start-finishTime: ✅ 84.983ms (SLO: <88.000ms -3.4%) vs baseline: +0.3% 
Memory: ✅ 34.603MB (SLO: <46.500MB 📉 -25.6%) vs baseline: +4.9% ✅ start-finish-telemetryTime: ✅ 86.659ms (SLO: <89.000ms -2.6%) vs baseline: +0.2% Memory: ✅ 34.564MB (SLO: <46.500MB 📉 -25.7%) vs baseline: +4.7% ✅ update-nameTime: ✅ 44.241ms (SLO: <45.150ms -2.0%) vs baseline: +0.5% Memory: ✅ 44.955MB (SLO: <47.000MB -4.4%) vs baseline: +4.8%
@PROFeNoM probably worth updating the codeowners file as well to make llmobs the owner of this integration; that way fewer people will be required to review it (after the codeowners change is merged)
Description
This PR adds a new Datadog tracing integration for vLLM, targeting the V1 engine exclusively. V0 is deprecated and being removed from vLLM (see vLLM Q3 2025 Roadmap), so we're building for the future.
Request Flow and Instrumentation Points
The integration traces requests at the engine level rather than wrapping high-level APIs. This gives us a single integration point for all operations (completion, chat, embedding, classification) with complete access to internal engine metadata and enables profiling the engine process.
Here's how a request flows through vLLM V1 and where we instrument:
1. Engine Initialization (once per engine)
2. Request Submission (per request)
3. Output Processing (when request finishes)
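As a rough illustration of the submission and output-processing steps above, the sketch below wraps two methods on stand-in classes. Everything in it is hypothetical: `FakeProcessor`, `FakeOutputProcessor`, their method signatures, the `x-sketch-trace-id` header, and the `wrap_method` helper are invented for the example and do not reproduce vLLM's or the tracer's real APIs. The `__init__` wrapping is left out for brevity.

```python
import functools


# Stand-in engine classes. Real vLLM signatures differ and are NOT reproduced here.
class FakeProcessor:
    def process_inputs(self, request_id, prompt, trace_headers=None):
        return {
            "request_id": request_id,
            "prompt": prompt,
            "trace_headers": trace_headers or {},
        }


class FakeOutputProcessor:
    def __init__(self):
        self.request_states = {}

    def add_request(self, request):
        self.request_states[request["request_id"]] = request

    def process_outputs(self, outputs):
        # Return the ids of the requests that finished in this batch.
        return [out["request_id"] for out in outputs if out["finished"]]


ACTIVE_TRACE = {"trace_id": "abc123"}  # hypothetical "current trace" of the caller
FINISHED_SPANS = []  # stands in for spans sent to the tracer


def wrap_method(cls, name, wrapper):
    """Replace cls.name with a function that routes the call through wrapper."""
    original = getattr(cls, name)

    @functools.wraps(original)
    def patched(self, *args, **kwargs):
        return wrapper(original, self, args, kwargs)

    setattr(cls, name, patched)


def traced_process_inputs(original, instance, args, kwargs):
    # Context injection: attach the caller's trace id to the request headers.
    # Assumes trace_headers is passed by keyword (or not at all).
    headers = dict(kwargs.get("trace_headers") or {})
    headers["x-sketch-trace-id"] = ACTIVE_TRACE["trace_id"]
    kwargs["trace_headers"] = headers
    return original(instance, *args, **kwargs)


def traced_process_outputs(original, instance, args, kwargs):
    # Span creation: request state, output data, and parent context meet here.
    for out in args[0]:
        if out["finished"]:
            state = instance.request_states[out["request_id"]]
            FINISHED_SPANS.append({
                "name": "vllm.request",
                "parent_trace_id": state["trace_headers"].get("x-sketch-trace-id"),
                "input": state["prompt"],
                "output": out["text"],
            })
    return original(instance, *args, **kwargs)


wrap_method(FakeProcessor, "process_inputs", traced_process_inputs)
wrap_method(FakeOutputProcessor, "process_outputs", traced_process_outputs)

processor = FakeProcessor()
output_processor = FakeOutputProcessor()
request = processor.process_inputs("req-1", "Hello")
output_processor.add_request(request)
output_processor.process_outputs(
    [{"request_id": "req-1", "text": "Hi!", "finished": True}]
)
print(FINISHED_SPANS)
```

The point of the sketch is only the shape: the span is created once, at output time, from data that the input-side wrapper made available earlier.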
The key insight is that `OutputProcessor.process_outputs` has everything we need in one place: request metadata from `req_state`, output data from `engine_core_output`, and parent context from `trace_headers`. We wrap three specific points because each serves a distinct purpose: `__init__` for setup, `process_inputs` for context injection, and `process_outputs` for span creation.

Version Support
This integration requires vLLM >= 0.10.2 for V1 engine support. Version 0.10.2 includes vLLM PR #20372, which added the `trace_headers` parameter that we rely on for trace context propagation through the engine.

We don't support V0 at all. It's deprecated and being removed from vLLM. Even if we had supported v0.10.1 and earlier, we'd have to drop it in the next major tracer release anyway, so there's no point building and maintaining for a dead engine.
The integration includes a version check that gracefully skips instrumentation on older versions with a warning log, just in case a customer uses vLLM <= 0.10.1. The instrumentation would otherwise make their application raise an error because of the trace header injection.
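A minimal sketch of such a version gate, assuming a plain tuple comparison on the parsed version string. The names `MIN_VLLM_VERSION`, `parse_version`, and `should_patch` are hypothetical and are not taken from the PR's code.

```python
import logging
import re

log = logging.getLogger(__name__)

# Minimum version stated in the description above.
MIN_VLLM_VERSION = (0, 10, 2)


def parse_version(version):
    """Parse the leading 'X.Y[.Z]' of a version string into a tuple of ints."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", version)
    if match is None:
        return None
    return tuple(int(part or 0) for part in match.groups())


def should_patch(installed_version):
    """Return True if instrumentation should be applied for this version."""
    parsed = parse_version(installed_version)
    if parsed is None or parsed < MIN_VLLM_VERSION:
        log.warning(
            "vLLM %s is unsupported (requires >= %s); skipping instrumentation",
            installed_version,
            ".".join(str(part) for part in MIN_VLLM_VERSION),
        )
        return False
    return True


for candidate in ("0.10.1", "0.10.2", "0.11.0"):
    print(candidate, should_patch(candidate))
```

Suffixes such as `.dev1` or `.post1` are ignored in this sketch; a real check might prefer a full version parser.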
Metadata Captured
The following metadata is captured:
For chat requests where vLLM doesn't preserve the prompt string (only token IDs), we decode the token IDs back to text using the model's tokenizer to ensure `input_messages` are correctly captured.

Testing
Tests run on GPU hardware using the new `gpu:a10-amd64` runner tag in GitLab CI (internal docs: GPU Runners). These cannot be run locally on our Macs; we need actual GPU hardware. During dev and testing, I ssh'ed into a `g6.8xlarge` EC2 instance.

Tests:
The tests converge on the same instrumentation points (as shown in the request flow), so while we could add more operation combinations, the current coverage should be solid for a first release.
Test infrastructure notes:
Runners take ~5-10 minutes to start on the CI, making test iterations slow. I've added module-scoped fixtures that cache LLM instances to reduce overall test time; however, caching adds memory pressure, so I increased the Kubernetes memory allocation to 12 Gi to handle it.
On the EC2 instance, tests run in ~1 min.
Risks
V1 maturity: V1 is production-ready for most workloads but still evolving toward vLLM 1.0. The engine architecture is stabilizing, but future V1 changes may require integration updates. Our instrumentation points (`process_inputs` and `process_outputs`) are core to V1's design and unlikely to change significantly.

No V0 support: Customers still on V0 won't get tracing. However, V0 is deprecated and most production deployments have already migrated (V0 doesn't even support pooling models anymore).
Version requirement: Requiring 0.10.2+ may exclude some users, but 0.10.2 is the current latest release and the trace header propagation mechanism is essential to a simple, maintainable design tbh.
High span burst at startup in RAG scenarios: RAG applications that process large document collections can generate significant span volumes during initial indexing. For example, indexing 1000 document chunks creates 1000 `vllm.request` embedding spans. This is expected behavior (each embedding request to the engine is traced), but may impact:

We could add an integration-specific config like `DD_VLLM_TRACE_EMBEDDINGS=false` to selectively disable embedding span creation. However, for now, I believe we should monitor customer feedback and add operation-specific filtering in a follow-up if needed, rather than straight-up over-engineer a solution to a problem that may or may not exist.

Additional Notes
Code Architecture
`patch.py`: Main entry point, wraps vLLM engine methods, as per usual
`extractors.py`: Extracts request/response data from vLLM structures
`utils.py`: Span creation, context injection, metrics utilities
`llmobs/_integrations/vllm.py`: LLMObs-specific tagging and event building, as per usual