
Conversation

abhinavsingh
Owner

@abhinavsingh abhinavsingh commented Oct 4, 2020

Introduce a mechanism within the HttpParser class that maintains a running hash of the request being processed. The fingerprinting feature is currently only used by the cache plugin, so placing it within HttpParser might be overkill, since request hashes will be calculated even when the cache plugin is not enabled.

Fingerprinting removes the need to maintain an index structure. Here is how it will work (a sketch follows the list):

  1. The cache plugin uses a file named after the fingerprint of the request to store cached data.
  2. For a new incoming request, the cache file matching its fingerprint is looked up on disk. This lookup is O(1): either the cache file exists or it does not. If it exists, the request is served out of the cache.
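
A minimal sketch of the lookup flow, assuming a SHA-256 hash over the raw request bytes and a hypothetical cache_dir (names are illustrative, not the plugin's actual API):

import hashlib
import os
from typing import Optional

def request_fingerprint(raw_request: bytes) -> str:
    # Running hash of the raw request bytes, hex-encoded so it can double as a filename.
    return hashlib.sha256(raw_request).hexdigest()

def lookup_cached_response(cache_dir: str, raw_request: bytes) -> Optional[bytes]:
    # O(1) lookup: the cache file either exists under its fingerprint name or it does not.
    path = os.path.join(cache_dir, request_fingerprint(raw_request))
    if os.path.isfile(path):
        with open(path, 'rb') as f:
            return f.read()
    return None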

Note that serving out of the cache only works for:

  1. HTTP requests
  2. HTTPS requests with TLS interception

For the 2nd case, i.e. with TLS interception, the HTTPS request fingerprint is calculated based upon both the CONNECT request and the actual cached request.
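
For illustration, such a combined fingerprint could be derived roughly like this, assuming both the CONNECT request and the decrypted inner request are available as raw bytes (a sketch, not the actual implementation):

import hashlib

def https_fingerprint(connect_request: bytes, inner_request: bytes) -> str:
    # Fold both the CONNECT request and the intercepted inner request into one running hash.
    h = hashlib.sha256()
    h.update(connect_request)
    h.update(inner_request)
    return h.hexdigest()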

Additionally:

  1. Fix some tests due to change in default cache directory
  2. Fix event propagation using multiprocessing manager
  3. Add ReplayTestCase, which decouples replay functionality from the normal test case. This also removes the need to dynamically enable the cache plugin.

TODO:

  1. Revisit placement of the hash calculation in case it results in higher CPU/RAM consumption under high traffic load.
  2. Fix the cache plugin for TLS interception mode; it is currently broken.

1) Fix some tests due to change in default cache directory
2) Fix event propagation using multiprocessing manager

I/O error on closed resource is still an issue. It doesn't result in a
test error, but a warning is thrown (tested on Python 3.8).
@codecov

codecov bot commented Oct 4, 2020

Codecov Report

Merging #441 into cache-server will increase coverage by 0.53%.
The diff coverage is 100.00%.


@@               Coverage Diff                @@
##           cache-server     #441      +/-   ##
================================================
+ Coverage         81.51%   82.04%   +0.53%     
================================================
  Files                80       82       +2     
  Lines              3083     3046      -37     
================================================
- Hits               2513     2499      -14     
+ Misses              570      547      -23     
Impacted Files Coverage Δ
proxy/dashboard/plugin.py 75.00% <ø> (+5.76%) ⬆️
proxy/__init__.py 100.00% <100.00%> (ø)
proxy/common/flags.py 100.00% <100.00%> (ø)
proxy/http/parser.py 99.39% <100.00%> (+3.14%) ⬆️
proxy/http/proxy/server.py 78.14% <100.00%> (-0.81%) ⬇️
proxy/plugin/cache/base.py 92.30% <100.00%> (+7.49%) ⬆️
proxy/plugin/cache/cache_responses.py 100.00% <100.00%> (ø)
proxy/plugin/cache/store/base.py 100.00% <100.00%> (+22.72%) ⬆️
proxy/plugin/cache/store/disk.py 97.50% <100.00%> (+6.19%) ⬆️
proxy/testing/__init__.py 100.00% <100.00%> (ø)
... and 7 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update ff4026b...bcba970. Read the comment docs.

@abhinavsingh

This comment has been minimized.

@pasccom
Contributor

pasccom commented Oct 4, 2020

Hello, I think these warnings come from the fact that we output the message in the cache __del__() method

def close(self) -> None:
    if self.cache_file:
        logger.debug("Closing cache file")
which is executed after the test ends (when the garbage collector releases the cache).
When running isolated tests (using python -m unittest), it is not a problem, but it seems pytest does not like this.
I did not bother to fix this because it gives useful information when using python -m unittest.
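
For illustration, the pattern described above looks roughly like this (a sketch, not the exact plugin code). The finalizer runs whenever the garbage collector releases the object, which can be after pytest has already torn down its captured log streams, hence the I/O error on a closed resource:

import logging

logger = logging.getLogger(__name__)

class CacheStore:
    def __init__(self, cache_file=None):
        self.cache_file = cache_file

    def close(self) -> None:
        if self.cache_file:
            logger.debug('Closing cache file')
            self.cache_file.close()
            self.cache_file = None

    def __del__(self) -> None:
        # Runs at garbage collection time, possibly after the test (and its
        # log capture) has finished, so the debug message hits a closed stream.
        self.close()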

@abhinavsingh
Owner Author

When running isolated tests (using python -m unittest), it is not a problem, but it seems pytest does not like this.
I did not bother to fix this because it gives useful information when using python -m unittest.

Thank you for the context. Yes, I wasn't sure why these warnings were being thrown. IIRC, these warnings come from HTTP proxy plugin testing with TLS interception on. I don't see these warnings otherwise. Will dig further into it this week.

@abhinavsingh
Owner Author

abhinavsingh commented Oct 5, 2020

Need a cleaner way to dynamically enable / disable a plugin. The current approach used by the cache response plugin doesn't work universally. The problems are:

  1. A multiprocessing.Event declared as a global variable gets forked instead of being shared between parent and child. This only impacts Python 3.8 on macOS. It will likely fail on Windows too.

  2. Replacing multiprocessing.Event with multiprocessing.Manager().Event() fixes the issue, but then we start to run into the following runtime errors:

> python -m proxy --num-workers 1
2020-10-05 10:18:55,090 - pid:1685 [I] load_plugins:318 - Loaded plugin proxy.http.proxy.HttpProxyPlugin
2020-10-05 10:18:55,091 - pid:1685 [I] listen:116 - Listening on ::1:8899
2020-10-05 10:18:55,093 - pid:1685 [I] start_workers:139 - Started 1 workers
Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 116, in spawn_main
    exitcode = _main(fd, parent_sentinel)
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 126, in _main
    self = reduction.pickle.load(from_parent)
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/__init__.py", line 14, in <module>
    from .testing.test_case import TestCase
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/testing/test_case.py", line 19, in <module>
    from ..plugin import CacheResponsesPlugin
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/plugin/__init__.py", line 11, in <module>
    from .cache import CacheResponsesPlugin, BaseCacheResponsesPlugin
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/plugin/cache/__init__.py", line 11, in <module>
    from .base import BaseCacheResponsesPlugin
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/plugin/cache/base.py", line 44, in <module>
    class BaseCacheResponsesPlugin(HttpProxyBasePlugin):
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/plugin/cache/base.py", line 52, in BaseCacheResponsesPlugin
    enabled = Event()
  File "/Users/abhinavsingh/Dev/proxy.py/proxy/plugin/cache/base.py", line 29, in __init__
    self.manager = multiprocessing.Manager()
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 57, in Manager
    m.start()
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/managers.py", line 579, in start
    self._process.start()
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/process.py", line 121, in start
    self._popen = self._Popen(self)
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/context.py", line 283, in _Popen
    return Popen(process_obj)
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_fork.py", line 19, in __init__
    self._launch(process_obj)
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/popen_spawn_posix.py", line 42, in _launch
    prep_data = spawn.get_preparation_data(process_obj._name)
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 154, in get_preparation_data
    _check_not_importing_main()
  File "/usr/local/Cellar/[email protected]/3.8.3/Frameworks/Python.framework/Versions/3.8/lib/python3.8/multiprocessing/spawn.py", line 134, in _check_not_importing_main
    raise RuntimeError('''
RuntimeError: 
        An attempt has been made to start a new process before the
        current process has finished its bootstrapping phase.

        This probably means that you are not using fork to start your
        child processes and you have forgotten to use the proper idiom
        in the main module:

            if __name__ == '__main__':
                freeze_support()
                ...

        The "freeze_support()" line can be omitted if the program
        is not going to be frozen to produce an executable.

Because it is a global variable, the event tries to initialize itself as soon as the module is imported. Manager creates a process of its own, resulting in the freeze_support issue above.
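
A minimal illustration of the difference, assuming the spawn start method (the default on macOS under Python 3.8); the commented-out pattern mirrors the failure above, while the guarded version defers Manager creation until after bootstrapping:

import multiprocessing

# Problematic: evaluated at import time in every spawned child, so the Manager
# tries to start its own helper process during bootstrapping and multiprocessing
# raises the freeze_support RuntimeError shown above.
#
#   class BaseCacheResponsesPlugin:
#       enabled = multiprocessing.Manager().Event()

def main() -> None:
    # Safe: the Manager (and its helper process) is created only once, inside
    # the main-module guard, and the Event proxy is then passed to workers.
    manager = multiprocessing.Manager()
    enabled = manager.Event()
    enabled.set()
    print(enabled.is_set())

if __name__ == '__main__':
    main()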

@abhinavsingh
Owner Author

abhinavsingh commented Oct 5, 2020

@pasccom Some notable changes:

  1. Removed the need for dynamic plugin enable / disable functionality, which was flaky in some environments.
  2. Using the fingerprint, we can remove the need to maintain an index file. See the PR description. (Yet to deprecate usage of list.txt.)

1) Temporarily disable cache response plugin with TLS interception. These
   are broken due to fingerprint mismatch. For an https request, the fingerprint
   must capture both the CONNECT and the subsequent sub-request state.
@abhinavsingh abhinavsingh changed the title from "Add request fingerprint" to "Add request fingerprint as cached filename" Oct 5, 2020
@pasccom
Contributor

pasccom commented Oct 5, 2020

Hello @abhinavsingh, I introduced list.txt because I needed to have a means to know which requests were cached and which cache file corresponds to which request (sometimes upstream sends random server errors, and I need to remove them manually from the test cache).
On second thought, I think it is not a bad idea to get rid of it. I think I can make do with the log file. Maybe we will have to improve the logging messages, I do not know how accurate they are.

@abhinavsingh
Owner Author

Hello @abhinavsingh, I introduced list.txt because I needed to have a means to know which requests were cached and which cache file corresponds to which request (sometimes upstream sends random server errors, and I need to remove them manually from the test cache).

Agreed about the various server response codes. We'll definitely disable caching for certain HTTP status codes. For replay testing, caching can be enabled for all response codes.

On second thought, I think it is not a bad idea to get rid of it. I think I can make do with the log file. Maybe we will have to improve the logging messages, I do not know how accurate they are.

Yes, the log file can work very well here. We'll have to add the fingerprint as one of the logging format parameters.

Currently we are only caching responses, but we can also cache requests (most likely behind a flag), e.g. fingerprint.request and fingerprint.response. We can add some utility code to build an index out of the cached files, either offline or in real time using eventing.
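
For example, a hypothetical offline indexer could walk the cache directory and emit a fingerprint-to-file mapping (illustrative names only; this is not existing proxy.py code):

import os
from typing import Dict

def build_cache_index(cache_dir: str) -> Dict[str, str]:
    # Map each fingerprint to its cached response file, assuming files are
    # named '<fingerprint>.response' (and optionally '<fingerprint>.request').
    index: Dict[str, str] = {}
    for name in os.listdir(cache_dir):
        fingerprint, _, kind = name.partition('.')
        if kind == 'response':
            index[fingerprint] = os.path.join(cache_dir, name)
    return index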

@abhinavsingh
Owner Author

@pasccom Created #443 to track it separately. I don't think we can cleanly proceed forward for HTTPS + TLS interception mode. The problem, as you mentioned before, is that the core expects an active upstream connection. I have discussed the other problems arising with no upstream connection (see the issue description).

Overall, it's unclear how to approach this for now. Ideally, caching should work without an upstream connection; it kind of defeats the purpose of the cache if we have to re-connect to upstream even when serving out of the cache. At the same time, we have some edge cases to handle if the upstream connection is not made (see the issue description).

Merging this one for now.

@abhinavsingh abhinavsingh merged commit 2a37cc8 into cache-server Oct 6, 2020
@abhinavsingh abhinavsingh deleted the caching branch October 6, 2020 08:13