
bpo-40705: Fix use-after-free in _zoneinfo's module_free #20280

Merged May 22, 2020 (1 commit)

ammaraskar
Member

@ammaraskar ammaraskar commented May 21, 2020

initialize_caches currently seems to be designed to account for being called twice, but its only caller is the Py_mod_exec slot function. This might have been future planning for heap-allocated types, but I'll wait for a response from @pganssle on their intentions with this code. Until then, this seems like a straightforward fix.

Couldn't really think of an easy way to test this since it doesn't seem like there's a codepath where a reference to TIMEDELTA_CACHE or ZONEINFO_WEAK_CACHE could be held by someone outside the module.

https://bugs.python.org/issue40705

@pganssle
Member

Thanks for the report and the PR.

I am mildly uncomfortable with this way of solving the issue, because it seems like both the "module is created two times" and the "someone manages to use one of these globals after it is freed" code paths are basically theoretical, and in my judgement the "module exists twice" case is much more likely to occur in the near future. Since, as far as I can tell, this is not exploitable, I'd prefer to take a little time to come to a solution for both, if possible.

That said, I don't really understand how this can cause use-after-free, so I am having trouble coming up with alternate solutions. Would this work?

if (TIMEDELTA_CACHE != NULL && Py_REFCNT(TIMEDELTA_CACHE) == 1) {
    Py_CLEAR(TIMEDELTA_CACHE);
} else {
    Py_XDECREF(TIMEDELTA_CACHE);
}

We'd need to do some locking if we ever get to a state where the GIL is removed or where there are additional GILs, but that's not the current state of things, and even if it were, the failure mode is a small memory leak in very unusual circumstances rather than any sort of crash.

@tiran
Member

tiran commented May 21, 2020

The caches are static globals. Is it safe to mix multi-phase initialization with globals? I was under the impression that multi-phase initialization and multi-module instances require per-module state so objects cannot leak between subinterpreters.

@ammaraskar
Member Author

That said, I don't really understand how this can cause use-after-free, so I am having trouble coming up with alternate solutions. Would this work?

Yeah, that solution would work. Essentially, there is only one reference to TIMEDELTA_CACHE when _zoneinfo gets imported. When the module falls out of scope, this code first decrements the refcount, and since that was the only reference at this point, the object gets freed. Then it tries to read the refcount of the freed object, which is the use-after-free.
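
The ordering problem described above can be sketched with a toy reference-counted struct. This is a simplified model under stated assumptions, not CPython's API and not the actual _zoneinfo code: `toy_object`, `toy_new`, `toy_incref`, `toy_decref`, `toy_release_global`, and `toy_live_objects` are all hypothetical names. The unsafe ordering appears only as a comment, because executing it is undefined behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a reference-counted object; NOT CPython's PyObject. */
typedef struct {
    long refcnt;
} toy_object;

/* Number of toy objects allocated and not yet freed (for illustration). */
static int toy_live_objects = 0;

static toy_object *
toy_new(void)
{
    toy_object *obj = malloc(sizeof(*obj));
    if (obj == NULL) {
        return NULL;
    }
    obj->refcnt = 1;
    toy_live_objects++;
    return obj;
}

static void
toy_incref(toy_object *obj)
{
    obj->refcnt++;
}

/* Drop one reference; free the object when the count reaches zero. */
static void
toy_decref(toy_object *obj)
{
    assert(obj->refcnt > 0);
    if (--obj->refcnt == 0) {
        toy_live_objects--;
        free(obj);
    }
}

/*
 * Unsafe ordering (comment only, since running it is undefined behavior):
 *
 *     toy_decref(cache);            // may free cache
 *     if (cache->refcnt == 0) {     // reads memory that may be freed
 *         cache = NULL;
 *     }
 */

/* Safe ordering: inspect the count BEFORE dropping the reference. */
static void
toy_release_global(toy_object **slot)
{
    toy_object *obj = *slot;
    if (obj == NULL) {
        return;
    }
    if (obj->refcnt > 1) {
        /* Another holder remains, so the global keeps pointing at it. */
        toy_decref(obj);
    }
    else {
        /* Last reference: clear the global first, then release. */
        *slot = NULL;
        toy_decref(obj);
    }
}
```

The design point is that the decision about whether to null out the global is made while the object is still guaranteed to be alive.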

I think @tiran is right though, if you want to account for multiple modules then this cache needs to be at the module level instead of a global. See the example of posixmodule:

typedef struct {
    PyObject *billion;
    PyObject *DirEntryType;
    PyObject *ScandirIteratorType;
#if defined(HAVE_SCHED_SETPARAM) || defined(HAVE_SCHED_SETSCHEDULER) || defined(POSIX_SPAWN_SETSCHEDULER) || defined(POSIX_SPAWN_SETSCHEDPARAM)
    PyObject *SchedParamType;
#endif
    PyObject *StatResultType;
    PyObject *StatVFSResultType;
    PyObject *TerminalSizeType;
    PyObject *TimesResultType;
    PyObject *UnameResultType;
#if defined(HAVE_WAITID) && !defined(__APPLE__)
    PyObject *WaitidResultType;
#endif
#if defined(HAVE_WAIT3) || defined(HAVE_WAIT4)
    PyObject *struct_rusage;
#endif
    PyObject *st_mode;
} _posixstate;

static inline _posixstate*
get_posix_state(PyObject *module)
{
    void *state = PyModule_GetState(module);
    assert(state != NULL);
    return (_posixstate *)state;
}
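
To illustrate why per-module state matters when more than one module instance can exist, here is a minimal plain-C model. It deliberately avoids the CPython API: `toy_cache`, `toy_state`, `toy_module`, `toy_get_state`, `toy_module_new`, and `toy_module_free` are hypothetical names, and `toy_get_state` only imitates the shape of `get_posix_state` above.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-ins, NOT CPython types: a cache and the state that owns it. */
typedef struct {
    int entries;
} toy_cache;

typedef struct {
    toy_cache *timedelta_cache;  /* owned by one module instance, not global */
} toy_state;

typedef struct {
    toy_state *state;            /* imitates what PyModule_GetState() returns */
} toy_module;

/* Accessor shaped like get_posix_state(): a live module always has state. */
static toy_state *
toy_get_state(toy_module *module)
{
    assert(module->state != NULL);
    return module->state;
}

/* Create a module instance with its own freshly allocated state. */
static toy_module *
toy_module_new(void)
{
    toy_module *module = calloc(1, sizeof(*module));
    if (module == NULL) {
        return NULL;
    }
    module->state = calloc(1, sizeof(*module->state));
    if (module->state == NULL) {
        free(module);
        return NULL;
    }
    module->state->timedelta_cache = calloc(1, sizeof(toy_cache));
    if (module->state->timedelta_cache == NULL) {
        free(module->state);
        free(module);
        return NULL;
    }
    return module;
}

/* Tear down one instance; other instances are unaffected. */
static void
toy_module_free(toy_module *module)
{
    free(module->state->timedelta_cache);
    free(module->state);
    free(module);
}
```

Because each instance owns its cache, freeing one instance cannot leave another holding a dangling pointer, which is the hazard with a single static global shared by several instances.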

@pganssle
Member

The caches are static globals. Is it safe to mix multi-phase initialization with globals? I was under the impression that multi-phase initialization and multi-module instances require per-module state so objects cannot leak between subinterpreters.

Yeah, this code is essentially a weird intermediate stage where it has the form of multi-phase initialization, but it's not actually safe to use with multi-module instances. I gave storing the caches on the module a try, but it was annoyingly complicated and not important anyway, since zoneinfo depends intimately on datetime, which itself is not safe for subinterpreter use.

I think that when I originally wrote this, I needed PEP 573 to properly implement this in a subinterpreter-safe way, but I was targeting Python 3.8, because the reference implementation was destined to become a backport. Now that it's in the standard library we can migrate over to using module state (assuming we can do it without major performance regressions), but I think we'll have to target 3.10 for that (unless big refactors like that are allowed during the beta period).

When the module falls out of scope, this code first decrements the refcount, and since that was the only reference at this point, the object gets freed. Then it tries to read the refcount of the freed object, which is the use-after-free.

Ah, that's pretty obvious now that you say it.

In that case, then yes I think we should use the conditional, since I believe that — despite the fact that nothing else will ever have a reference to this at the moment — everything else assumes that these are ref-counted and that more than one reference to it can exist.

@tiran
Member

tiran commented May 21, 2020

The caches are static globals. Is it safe to mix multi-phase initialization with globals? I was under the impression that multi-phase initialization and multi-module instances require per-module state so objects cannot leak between subinterpreters.

Yeah, this code is essentially a weird intermediate stage where it has the form of multi-phase initialization, but it's not actually safe to use with multi-module instances. I gave storing the caches on the module a try, but it was annoyingly complicated and not important anyway, since zoneinfo depends intimately on datetime, which itself is not safe for subinterpreter use.

I think that when I originally wrote this, I needed PEP 573 to properly implement this in a subinterpreter-safe way, but I was targeting Python 3.8, because the reference implementation was destined to become a backport. Now that it's in the standard library we can migrate over to using module state (assuming we can do it without major performance regressions), but I think we'll have to target 3.10 for that (unless big refactors like that are allowed during the beta period).

At first glance it looked easy to port the new module to PEP 489 multi-phase init. After I found more than eight globals I pretty much gave up. The endeavor turned out to be a lengthy and dull task. I didn't even realize that datetime needs to be fixed first!

How about we use the safe singleton API PyState_FindModule() / PyModule_Create() in 3.9 and move to the new multi-phase init in 3.10? I'm using a similar approach in #20180

static struct PyModuleDef zoneinfomodule = {
    PyModuleDef_HEAD_INIT,
    .m_name = "_zoneinfo",
    .m_doc = "C implementation of the zoneinfo module",
    .m_size = 0,
    .m_methods = module_methods,
    .m_slots = NULL,
    .m_free = (freefunc)module_free
};

PyMODINIT_FUNC
PyInit__zoneinfo(void)
{
    PyObject *module;

    module = PyState_FindModule(&zoneinfomodule);
    if (module != NULL) {
        Py_INCREF(module);
        return module;
    }

    module = PyModule_Create(&zoneinfomodule);
    if (module == NULL) {
        return NULL;
    }

    if (zoneinfomodule_exec(module) < 0) {
        Py_DECREF(module);
        return NULL;
    }
    return module;
}
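
The find-or-create control flow of the snippet above can be modeled in plain C. This is only a sketch of the pattern, not CPython's implementation: `toy_module`, `toy_registered`, `toy_exec`, `toy_exec_calls`, and `toy_init` are hypothetical names. As a simplifying assumption, the toy registers the module inside `toy_init`, whereas in CPython the registration that PyState_FindModule() relies on happens outside the init function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for a module object; NOT CPython's PyObject. */
typedef struct {
    long refcnt;
} toy_module;

/* Imitates the registry that a lookup like PyState_FindModule() consults. */
static toy_module *toy_registered = NULL;

/* Counts how often the one-time setup ran (for illustration). */
static int toy_exec_calls = 0;

/* One-time setup, analogous to zoneinfomodule_exec(); returns 0 on success. */
static int
toy_exec(toy_module *module)
{
    (void)module;
    toy_exec_calls++;
    return 0;
}

/* Return the existing instance with a new reference, or create one. */
static toy_module *
toy_init(void)
{
    if (toy_registered != NULL) {
        /* The registry entry is borrowed, so hand out a new reference. */
        toy_registered->refcnt++;
        return toy_registered;
    }

    toy_module *module = malloc(sizeof(*module));
    if (module == NULL) {
        return NULL;
    }
    /* One reference for the caller, one held by the registry. */
    module->refcnt = 2;

    if (toy_exec(module) < 0) {
        free(module);
        return NULL;
    }
    toy_registered = module;
    return module;
}
```

With this shape, the setup that populates the globals runs once no matter how many times initialization is requested, which is what makes static globals tolerable under the singleton approach.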

@ammaraskar ammaraskar force-pushed the zoneinfo_use_after_free branch from f5bc283 to 296ec86 Compare May 21, 2020 20:13
@ammaraskar
Member Author

I updated the diff to be the short, minimal fix that accounts for multiple modules for now; one of the branches won't ever be taken right now. We could go with the singleton module approach and remove the in-limbo code for 3.9, but that might make it harder to backport changes from 3.10 in the future.

@pganssle pganssle merged commit 06a1b89 into python:master May 22, 2020
@miss-islington
Contributor

Thanks @ammaraskar for the PR, and @pganssle for merging it 🌮🎉.. I'm working now to backport this PR to: 3.9.
🐍🍒⛏🤖

@bedevere-bot

GH-20319 is a backport of this pull request to the 3.9 branch.

miss-islington pushed a commit to miss-islington/cpython that referenced this pull request May 22, 2020
TIMEDELTA_CACHE = NULL;
if (TIMEDELTA_CACHE != NULL && Py_REFCNT(TIMEDELTA_CACHE) > 1) {
Py_DECREF(TIMEDELTA_CACHE);
} else {
Member

Ugh, I just realized this is a PEP 7 violation, it should be:

    if (TIMEDELTA_CACHE != NULL && Py_REFCNT(TIMEDELTA_CACHE) > 1) {
        Py_DECREF(TIMEDELTA_CACHE);
    }
    else {
        Py_CLEAR(TIMEDELTA_CACHE);
    }

Since we're probably going to re-write this code soon-ish anyway, I guess we'll just leave it here and in the backport until someone touches it again.

Member Author

Aah sorry about that :(

@pganssle
Member

one of the branches won't ever be taken right now

Interestingly, it seems like it's the Py_CLEAR branch that's never hit, not the Py_DECREF branch. Not sure why that is. https://codecov.io/gh/pganssle/zoneinfo/src/d60d46e43b3e59d15e65d43527f977d8890a8edc/lib/zoneinfo_module.c#L2604...2609

@ammaraskar
Member Author

Huh, that's weird. Even in that situation I would expect the other branch to be hit at least once, when the last reference to the module is lost.

@ammaraskar
Member Author

The one on master shows the opposite coverage: https://codecov.io/gh/python/cpython/src/master/Modules/_zoneinfo.c#L2607

@pganssle
Member

@ammaraskar I think the difference is that because of the different module organization (_zoneinfo vs. zoneinfo._czoneinfo), the way I'm getting side-by-side versions of the module with and without the C extension is different. I guess this version only creates one copy of _czoneinfo whereas that version creates 2 or 3 versions. When I added a print statement to check the refcounts in the backport, I see that this gets called with 3 and 2 but not 1 — not sure if that is just the interpreter shutdown cutting some corners in freeing up memory or something else holding a reference to the timedelta cache.

Either way, I'm kinda glad we did it this way after all, because it seems likely that semi-insignificant refactoring could end up leading to segfaults in some situations. I'm going to merge the backport.

miss-islington added a commit that referenced this pull request May 24, 2020
arturoescaip pushed a commit to arturoescaip/cpython that referenced this pull request May 24, 2020