Conversation

@nicholascar (Member)

Removes requests from RDFLib requirements entirely.

Tidies requirements.txt & requirements-dev.txt

Closes #1122

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 75.398% when pulling 67b8058 on remove_requests into 56dc420 on master.

@coveralls

coveralls commented Oct 3, 2020

Coverage Status

Coverage decreased (-0.3%) to 75.402% when pulling c923eb5 on remove_requests into 56dc420 on master.

res = urlopen(Request(self.query_endpoint + qsa, headers=args["headers"]))
elif self.method == "POST":
args["headers"].update({"Content-Type": "application/sparql-query"})
args["data"] = params
Contributor
Why don't we need params in args["data"] anymore?

Member Author

The only parameter allowed for a POST SPARQL query is the query itself, as per the spec, so we don't allow any others. Auth credentials for secure endpoints are catered for with a separate auth parameter.
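For illustration, a minimal sketch of the SPARQL 1.1 "query via POST directly" form being described here, using urllib; the endpoint URL is hypothetical and this is not rdflib's exact code:

```python
from urllib.request import Request

# "Query via POST directly": the query string is the entire request
# body, so no other form parameters can travel alongside it.
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
req = Request(
    "http://example.org/sparql",  # hypothetical endpoint
    data=query.encode("utf-8"),
    headers={"Content-Type": "application/sparql-query"},
    method="POST",
)
```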

Member

We still need to submit default-graph-uri and named-graph-uri parameters. So the params is still required.

Also I think we should support both application/x-www-form-urlencoded and application/sparql-query content-types. But I'm unsure how we would determine which content-type is supported by the store we are speaking with.
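For contrast, a sketch of the URL-encoded POST form, which can carry graph parameters alongside the query in one form body; the endpoint URL is hypothetical and this is not rdflib's actual code:

```python
from urllib.parse import urlencode
from urllib.request import Request

# "Query via URL-encoded POST": query, default-graph-uri and
# named-graph-uri are form fields in a single urlencoded body,
# so the graph information survives the trip.
params = {
    "query": "SELECT * WHERE { ?s ?p ?o }",
    "default-graph-uri": "http://example.org/",
}
req = Request(
    "http://localhost:5000/sparql",  # hypothetical endpoint
    data=urlencode(params).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
```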

Member

The params attribute is still required or at least the information about the default-graph.

If you consider the following:

Load some context aware quad store (e.g. the QuitStore) with the following data:

graph <http://example.org/> {
    <http://example.org/ExampleInstance> a <http://example.org/Example>
}
graph <http://othergraph.org/> {
    <http://example.org/OtherInstance> a <http://example.org/Example>
}

The query endpoint is assumed to be available at http://localhost:5000/sparql

import unittest

from rdflib import ConjunctiveGraph, URIRef
from rdflib.plugins.stores.sparqlstore import SPARQLStore

class SPARQLStoreQuitStoreTestCase(unittest.TestCase):
    store_name = "SPARQLStore"
    path = "http://localhost:5000/sparql"
    create = False

    def setUp(self):
        store = SPARQLStore(query_endpoint=self.path, method="POST")
        self.conjunctivegraph = ConjunctiveGraph(store=store)

    def tearDown(self):
        self.conjunctivegraph.close()

    def test_Query(self):
        query = "select distinct ?inst where {?inst a <http://example.org/Example>}"
        graph = self.conjunctivegraph.get_context(URIRef("http://example.org/"))
        res = graph.query(query, initNs={})
        assert len(res) == 1, len(res)
        for i in res:
            assert type(i[0]) == URIRef, i[0].n3()
            assert i[0] == URIRef("http://example.org/ExampleInstance"), i[0].n3()

The query is executed as a POST request but does not convey the information about the default graph.

This issue is a combination of removing this line and sending the wrong Content-Type.
Actually, what the request was doing before this change and the merge of #1022 was sending a proper POST request with an application/x-www-form-urlencoded content type, according to the SPARQL 1.1 Protocol (query via URL-encoded POST; https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#query-operation).


res.raise_for_status()

def close(self):
Contributor

Why do we remove the close() method?

Member Author

Because the Store is actually stateless with each operation and there's nothing to close. I originally dropped open() too, but it's needed if the Store is invoked via Graph("SPARQLStore") as opposed to SPARQLStore() directly.

@ashleysommer (Contributor)

@nicholascar
Why did you put the Python3.6+ changes into this requests removal PR?

@nicholascar (Member Author)

Me being lazy: I saw Python 3.5 failures and wanted to jump over those! I thought we agreed to drop 3.5, so I was really just testing that out.

Also, since this PR is all about dropping dependencies, I wanted to reduce the encumbrances as much as possible.

Since there is a remaining Python 3.6 error, I'm going to have to address that in further edits here, so I could reintroduce Python 3.5 if you like.

We did agree to drop Python 3.5 right?

@nicholascar (Member Author)

Weird that it fails tests here, as it passed everything locally on Python 3.6: I ran on that version in particular to test. I think it's just a naming collision with json packages, so perhaps easily solvable in the week.

@ashleysommer (Contributor)

Yes, we did agree to drop Python 3.5 for RDFLib v6.0.0, with a minimum of Python 3.6. I just wasn't expecting that change to come as part of the "Remove requests" PR, but it should be fine.

@nicholascar (Member Author)

Regarding the single failure here, the message is OSError: [Errno 99] Cannot assign requested address. This is due to something in the setup of the testing machine, not the content of the RDFLib code, I think, as the error doesn't appear when running tests locally. It's something about setting ports or colliding with ports in use. Any thoughts on this one?

@white-gecko (Member) left a comment

It is hard for me to understand the complete pull request, as it mixes a lot of different things. Would it be possible for you to split this into several unrelated pull-requests?


def predicate_objects(self, subject=None):
"""A generator of (predicate, object) tuples for the given subject"""
for t, c in self.triples((subject, None, None)):
yield t[1], t[2]
Member

Why do we need these methods?

Member Author

These are all just convenience methods that are present in the normal Graph() interface, and they are commonly used, so I thought they should be present in the SPARQLStore too. This store returns quads, not triples, so I had to copy these methods into the store's own list of methods and re-implement them, else they would just break when someone called them.

@white-gecko (Member)

> Regarding the single failure here, the message is OSError: [Errno 99] Cannot assign requested address. This is due to something in the setup of the testing machine, not the content of the RDFLib code, I think, as the error doesn't appear when running tests locally. It's something about setting ports or colliding with ports in use. Any thoughts on this one?

This might show that there are too many open connections in too short a time: https://stackoverflow.com/questions/11981933/python-urllib2-cannot-assign-requested-address?noredirect=1

This could be a limitation on the Travis machine. Maybe we can also perform the tests with some local endpoint, though that might not be related to this issue.

@nicholascar (Member Author)

> Regarding the single failure here, the message is OSError: [Errno 99] Cannot assign requested address. This is due to something in the setup of the testing machine, not the content of the RDFLib code, I think, as the error doesn't appear when running tests locally. It's something about setting ports or colliding with ports in use. Any thoughts on this one?

> This might show that there are too many open connections in too short a time: https://stackoverflow.com/questions/11981933/python-urllib2-cannot-assign-requested-address?noredirect=1
>
> This could be a limitation on the Travis machine. Maybe we can also perform the tests with some local endpoint, though that might not be related to this issue.

When I perform the test locally, they all pass so I do think it is a Travis limitation.

@nicholascar (Member Author)

> We still need to submit default-graph-uri and named-graph-uri parameters.

OK, let me think about that! I might have missed something there.

@nicholascar (Member Author)

nicholascar commented Oct 5, 2020

> Would it be possible for you to split this into several unrelated pull-requests?

Not really: the focus here was just to get rid of requests, and requests is only used in two places in the entire codebase: a couple of tests and SPARQLStore. This PR is really only removing those references and then patching SPARQLStore to ensure that everything works.

The only optional things not about that are the convenience methods, such as Graph().subjects(), that you asked about in another comment.

Oh, and some very tiny updates to metadata, such as adding us to CONTRIBUTORS.rst, but those are really tiny one-liners.

@nicholascar (Member Author)

Open/Close to trigger Travis, to see if we can overcome the port-hogging test failure.

@nicholascar nicholascar closed this Oct 6, 2020
@nicholascar nicholascar reopened this Oct 6, 2020
@nicholascar (Member Author)

Here's a demo of default-graph-uri working, compared with the NG in the query:

from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
ENDPOINT = "http://cgi.surroundaustralia.com:7200/repositories/cgi-vocs"
NAMED_GRAPH = "http://resource.geosciml.org/classifierscheme/cgi/2016.01/classification-method-used"
# NG in query, Graph then SPARQLStore
q = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT (COUNT(?s) AS ?c)
    WHERE {{
        GRAPH <{}> {{
            ?s a skos:Concept .
        }}
    }}
    """.format(NAMED_GRAPH)

g = Graph("SPARQLStore")
g.open(ENDPOINT)
for r in g.query(q):
    assert int(r['c']) == 14

st = SPARQLStore(query_endpoint=ENDPOINT)
for r in st.query(q):
    assert int(r['c']) == 14

# NG separate from query, Graph then SPARQLStore
q2 = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT (COUNT(?s) AS ?c)
    WHERE {
        ?s a skos:Concept .
    }
    """

g = Graph("SPARQLStore", identifier=NAMED_GRAPH)
g.open(ENDPOINT)
for r in g.query(q2):
    assert int(r['c']) == 14

st = SPARQLStore(query_endpoint=ENDPOINT)
for r in st.query(q2, queryGraph=NAMED_GRAPH):
    assert int(r['c']) == 14

@white-gecko (Member)

Are you sure it is executed via POST?

@nicholascar (Member Author)

nicholascar commented Oct 6, 2020

> Are you sure it is executed via POST?

No: there seems to be no logic that allows SPARQLStore to ever use POST in SparqlConnector. SPARQLUpdateStore has a postAsEncoded variable, but this also seems never to be used. In testing, I manually set SparqlConnector's self.method value to POST to try it out - it works fine for most queries - but I haven't done as much testing as for GET.

I will write some POST tests next.

I think there were quite a few combinations of things not tested in the original SPARQLStore code, but fixing them all wasn't the purpose of this PR - just removing requests! I've had to fix lots of things to get it all working without requests though.

@nicholascar (Member Author)

@ashleysommer @white-gecko please can you approve this PR? The single error in Travis above is, again, urlopen error [Errno 99] Cannot assign requested address, which is a Travis machine capacity issue, not a "real" test failure.

There is an endless number of things that I could keep fixing in SPARQLStore, but that's not the goal of this PR - just the removal of requests.

Actually I'm using SPARQLStore in a production system now so will keep working on it. If this PR can be merged, I'll use master in the system (as opposed to 5.0.0) and that will let me keep testing SPARQLStore.

@ashleysommer (Contributor)

@nicholascar
I'm not concerned about the one failing Travis issue. That is something I'll have to look into going forward, as I'm pretty sure it is bad practice to stand up a real web server in a testing environment for running tests against, and it's bad practice to make real web requests in a testing environment too.

If you're confident the POST issue raised by @white-gecko above isn't related to these changes (and it was like that already when using requests) then we can accept and merge this.
We'll have to create a new issue for the POST-related problem.

@nicholascar (Member Author)

> it's bad practice to make real web requests in a testing environment too.

Yes, I think we are going to have to better mock up things like the SPARQL endpoints that are needed for some of the tests, so things like Fuseki don't need to be bundled into the tests.
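One way such mocking could look: patch urlopen so a test exercises the query path against canned SPARQL JSON results, with no server at all. This is only an illustrative sketch; run_query and the endpoint URL are made up, not rdflib internals:

```python
import io
import json
from unittest import mock

# Canned SPARQL 1.1 JSON results body, as a real endpoint would return.
fake_body = json.dumps(
    {"head": {"vars": ["s"]}, "results": {"bindings": []}}
).encode("utf-8")

def run_query(endpoint, query):
    # Stand-in for the store's query path; imported at call time so the
    # patched urlopen is picked up.
    from urllib.request import Request, urlopen
    with urlopen(Request(endpoint, data=query.encode("utf-8"))) as res:
        return json.loads(res.read())

with mock.patch("urllib.request.urlopen") as fake_urlopen:
    fake_urlopen.return_value.__enter__.return_value = io.BytesIO(fake_body)
    result = run_query("http://example.org/sparql", "SELECT ...")
```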

> If you're confident the POST issue raised by @white-gecko above isn't related to these changes (and it was like that already when using requests) then we can accept and merge this.

Yes, I am confident: that's a different thing, and part of the incomplete or now slightly broken SPARQLStore implementation that needs love and attention.

I'd like to raise a bunch of Issues for SPARQLStore and then fix them with some PRs straight after this one, which I'll discuss this arvo. Having this PR through will set a base layer for me to test behaviour with. I can think of a whole bunch more tests like the ones for GET I implemented, but for POST, and also things like transaction testing etc. I'm concerned that transaction rollback etc. isn't actually working, but we don't normally notice since the SPARQL services that I use are highly available and thus don't often disappear midway through a session.

@ashleysommer ashleysommer merged commit 7a53c61 into master Oct 8, 2020
@ashleysommer (Contributor)

@nicholascar
Did you run black on these files after these changes?
I just ran black on the codebase for my PR, and I'm seeing lots of black errors in plugins/stores/sparqlstore and in sparqlwrapper and other sparql-related files.

return self._session.__dict__[k]
if auth is not None:
assert type(auth) == tuple, "auth must be a tuple"
assert len(auth) == 2, "auth must be a tuple (user, password)"
Member

In my eyes it is not good practice to have assert in production code, so it should not be in a library.
The Python reference states:

> The current code generator emits no code for an assert statement when optimization is requested at compile time.

https://docs.python.org/3/reference/simple_stmts.html#assert

So I think it should be replaced by an if that raises an exception.
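A sketch of the suggested replacement (check_auth is a made-up name for illustration, not rdflib's actual code); unlike an assert, the raise survives running under python -O:

```python
def check_auth(auth):
    # Explicit validation that is not stripped when Python runs with
    # optimization enabled, unlike an assert statement.
    if auth is not None:
        if not isinstance(auth, tuple) or len(auth) != 2:
            raise TypeError("auth must be a (user, password) tuple")
    return auth
```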


@white-gecko (Member)

Are you sure the urlopen error [Errno 99] Cannot assign requested address is not introduced by removing the session? requests.Session() implements connection pooling (https://2.python-requests.org/en/master/user/advanced/). It might be that, by removing the session, there are too many open connections, which brings Travis to the limit of ports to be used. On your local machine you might have more open ports available. I'm unsure if this could also happen in some use cases where a lot of queries are sent.

Successfully merging this pull request may close these issues.

requests not included in setup.py