Code Version
2.1.2
Expected Behavior
The memory used by pyff is properly freed up after a request finishes.
Current Behavior
Each request that ends in an HTTP 500 error increases memory usage by roughly 300 MB, and that memory is not freed afterwards.
Possible Solution
To alleviate the issue, the parsed tree needs to be cleared explicitly, as shown in the diff below.
diff --git i/src/pyff/api.py w/src/pyff/api.py
index 1050efb..2f17438 100644
--- i/src/pyff/api.py
+++ w/src/pyff/api.py
@@ -4,6 +4,7 @@ from datetime import datetime, timedelta
from json import dumps
from typing import Any, Dict, Generator, Iterable, List, Mapping, Optional, Tuple
+import lxml.etree
import pkg_resources
import pyramid.httpexceptions as exc
import pytz
@@ -297,12 +298,18 @@ def process_handler(request: Request) -> Response:
except ResourceException as ex:
import traceback
+ if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+ r.clear()
+
log.debug(traceback.format_exc())
log.warning(f'Exception from processing pipeline: {ex}')
raise exc.exception_response(409)
except BaseException as ex:
import traceback
+ if isinstance(r, (lxml.etree._Element, lxml.etree._ElementTree)):
+ r.clear()
+
log.debug(traceback.format_exc())
log.error(f'Exception from processing pipeline: {ex}')
raise exc.exception_response(500)
Steps to Reproduce
The XML files stored under tmp/dynamic are 50 MB in total in our case, and that seems to lead to the high memory usage since pyff parses them into a Python representation using lxml. Each request results in a memory increase of roughly 300 MB which is then not freed up properly.
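For illustration, here is a minimal standalone sketch (not part of pyff) of the underlying mechanism: lxml keeps the whole parsed document tree in memory until it is explicitly cleared and dereferenced, which is what the r.clear() calls in the diff above are meant to do. The file path below is a placeholder, and psutil is assumed to be available for reading the process RSS.
import lxml.etree
import psutil


def rss_mb() -> float:
    # Resident set size of the current process in megabytes.
    return psutil.Process().memory_info().rss / (1024 * 1024)


print(f'before parse: {rss_mb():.1f} MB')

# Parsing allocates the full document tree inside libxml2; for large
# metadata files this is where the bulk of the memory goes.
tree = lxml.etree.parse('tmp/dynamic/metadata.xml')  # placeholder path
print(f'after parse:  {rss_mb():.1f} MB')

# Mirror the proposed fix: clear the tree before dropping the last reference.
tree.getroot().clear()
del tree

# RSS may only drop partially here, since the allocator does not always
# return freed pages to the operating system right away.
print(f'after clear:  {rss_mb():.1f} MB')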
To reproduce the issue, use the following pipeline file:
- when update:
    - load:
        - tmp/dynamic
        - tmp/static
- when request:
    - select:
    - pipe:
        - when accept application/samlmetadata+xml application/xml:
            - first
            - finalize:
                cacheDuration: PT12H
                validUntil: P10D
            - sign:
                key: tmp/default.key
                cert: tmp/default.crt
            - emit application/samlmetadata+xml
            - break
        - when accept application/json:
            - discojson
            - emit application/json
            - break
Run pyff with caching disabled:
PYFF_CACHING_ENABLED=False pyffd -f --frequency=1200 --loglevel=INFO -H 0.0.0.0 -P 8080 --pid_file $PWD/tmp/pyff.pid --dir=$PWD/tmp/ $PWD/tmp/mdx.fd
And run the following:
for i in `seq 1 20 `;
do
http --print hH 0.0.0.0:8080 'Accept: text/plain'
done
The high memory consumption is most likely related to lxml not freeing up the memory properly.
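One simple way to confirm the growth (my own suggestion, not something pyff ships) is to sample the daemon's RSS while the loop above runs, using the pid file written by the pyffd command; psutil is assumed to be installed:
import time

import psutil


def watch(pid_file: str = 'tmp/pyff.pid', interval: float = 1.0) -> None:
    # Read the pid written by pyffd (--pid_file above) and print its RSS
    # once per interval until interrupted.
    with open(pid_file) as fh:
        proc = psutil.Process(int(fh.read().strip()))
    while True:
        print(f'pyffd rss: {proc.memory_info().rss / (1024 * 1024):.0f} MB')
        time.sleep(interval)


if __name__ == '__main__':
    watch()
With the reproduction above, each 500 response shows up as a step of roughly 300 MB in the reported RSS that is never released.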