Conversation

@nicholascar (Member)

Removes requests from RDFLib requirements entirely.

Tidies requirements.txt & requirements-dev.txt

Closes #1122

@coveralls

Coverage Status

Coverage decreased (-0.3%) to 75.398% when pulling 67b8058 on remove_requests into 56dc420 on master.

@coveralls

coveralls commented Oct 3, 2020

Coverage Status

Coverage decreased (-0.3%) to 75.402% when pulling c923eb5 on remove_requests into 56dc420 on master.

res = urlopen(Request(self.query_endpoint + qsa, headers=args["headers"]))
elif self.method == "POST":
args["headers"].update({"Content-Type": "application/sparql-query"})
args["data"] = params
Contributor
Why don't we need params in args["data"] anymore?

Member Author

The only parameter allowed for a POST SPARQL query is the query itself, as per the spec, so we don't allow any others. Auth credentials for secure endpoints are catered for with a separate auth parameter.
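For illustration, a minimal sketch of the SPARQL 1.1 "query via POST directly" form being described here, using urllib; the endpoint URL is hypothetical and this is not rdflib's exact code:

```python
from urllib.request import Request

# "Query via POST directly": the query string is the entire request
# body, so no other form parameters can travel alongside it.
query = "SELECT * WHERE { ?s ?p ?o } LIMIT 1"
req = Request(
    "http://example.org/sparql",  # hypothetical endpoint
    data=query.encode("utf-8"),
    headers={"Content-Type": "application/sparql-query"},
    method="POST",
)
```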

Member

We still need to submit default-graph-uri and named-graph-uri parameters. So the params is still required.

Also I think we should support both application/x-www-form-urlencoded and application/sparql-query content-types. But I'm unsure how we would determine which content-type is supported by the store we are speaking with.
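For contrast, a sketch of the URL-encoded POST form, which can carry graph parameters alongside the query in one form body; the endpoint URL is hypothetical and this is not rdflib's actual code:

```python
from urllib.parse import urlencode
from urllib.request import Request

# "Query via URL-encoded POST": query, default-graph-uri and
# named-graph-uri are form fields in a single urlencoded body,
# so the graph information survives the trip.
params = {
    "query": "SELECT * WHERE { ?s ?p ?o }",
    "default-graph-uri": "http://example.org/",
}
req = Request(
    "http://localhost:5000/sparql",  # hypothetical endpoint
    data=urlencode(params).encode("utf-8"),
    headers={"Content-Type": "application/x-www-form-urlencoded"},
)
```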

Member

The params attribute is still required or at least the information about the default-graph.

If you consider the following:

Load some context aware quad store (e.g. the QuitStore) with the following data:

graph <http://example.org/> {
    <http://example.org/ExampleInstance> a <http://example.org/Example>
}
graph <http://othergraph.org/> {
    <http://example.org/OtherInstance> a <http://example.org/Example>
}

The query endpoint is assumed to be available at http://localhost:5000/sparql

import unittest

from rdflib import ConjunctiveGraph, URIRef
from rdflib.plugins.stores.sparqlstore import SPARQLStore

class SPARQLStoreQuitStoreTestCase(unittest.TestCase):
    store_name = "SPARQLStore"
    path = "http://localhost:5000/sparql"
    create = False

    def setUp(self):
        store = SPARQLStore(query_endpoint=self.path, method="POST")
        self.conjunctivegraph = ConjunctiveGraph(store=store)

    def tearDown(self):
        self.conjunctivegraph.close()

    def test_Query(self):
        query = "select distinct ?inst where {?inst a <http://example.org/Example>}"
        graph = self.conjunctivegraph.get_context(URIRef("http://example.org/"))
        res = graph.query(query, initNs={})
        assert len(res) == 1, len(res)
        for i in res:
            assert type(i[0]) == URIRef, i[0].n3()
            assert i[0] == URIRef("http://example.org/ExampleInstance"), i[0].n3()

The query is executed as a POST request but does not convey the information about the default graph.

This issue is a combination of removing this line and sending the wrong Content-Type.
Actually, what the request was doing before this change and the merge of #1022 was sending a proper POST request with an application/x-www-form-urlencoded content type, according to the SPARQL 1.1 Protocol (query via URL-encoded POST; https://www.w3.org/TR/2013/REC-sparql11-protocol-20130321/#query-operation).


res.raise_for_status()

def close(self):
Contributor

Why do we remove the close() method?

Member Author

Because the Store is actually stateless with each operation and there's nothing to close. I originally dropped open() too, but it's needed if the Store is invoked via Graph("SPARQLStore") as opposed to SPARQLStore() directly.

@ashleysommer (Contributor)

@nicholascar
Why did you put the Python3.6+ changes into this requests removal PR?

@nicholascar (Member Author)

Me being lazy: I saw Python 3.5 failures and wanted to jump over those! I thought we agreed to drop 3.5, so I was really just testing that out.

Also, since this PR is all about dropping dependencies, I wanted to reduce the encumbrances as much as possible.

Since there is a remaining Python 3.6 error, I'm going to have to address that in further edits here, so I could reintroduce Python 3.5 if you like.

We did agree to drop Python 3.5 right?

@nicholascar (Member Author)

Weird that it fails tests here, as it passed everything locally on Python 3.6: I ran on that version in particular to test. I think it's just a naming collision with json packages, so perhaps easily solvable in the week.

@ashleysommer (Contributor)

Yes, we did agree to drop Python 3.5 for RDFLib v6.0.0, with a minimum of Python 3.6. I just wasn't expecting that change to come as part of the "Remove requests" PR, but it should be fine.

@nicholascar (Member Author)

Regarding the single failure here, the message is OSError: [Errno 99] Cannot assign requested address. This is due to something in the setup of the testing machine, not the content of the RDFLib code, I think, as the error doesn't appear when running tests locally. It's something about setting ports or colliding with ports in use. Any thoughts on this one?

@white-gecko (Member) left a comment

It is hard for me to understand the complete pull request, as it mixes a lot of different things. Would it be possible for you to split this into several unrelated pull-requests?


def predicate_objects(self, subject=None):
"""A generator of (predicate, object) tuples for the given subject"""
for t, c in self.triples((subject, None, None)):
yield t[1], t[2]
Member

Why do we need these methods?

Member Author

These are all just convenience methods that are present in the normal Graph() interface, and they are commonly used, so I thought they should be present in the SPARQLStore too. This store returns quads, not triples, so I had to copy these methods into the store's own list of methods and re-implement them, else they would just break when someone called them.

@white-gecko (Member)

> Regarding the single failure here, the message is OSError: [Errno 99] Cannot assign requested address. This is due to something in the setup of the testing machine, not the content of the RDFLib code, I think, as the error doesn't appear when running tests locally. It's something about setting ports or colliding with ports in use. Any thoughts on this one?

This might show that there are too many open connections in too short a time: https://stackoverflow.com/questions/11981933/python-urllib2-cannot-assign-requested-address?noredirect=1

This could be a limitation on the Travis machine. Maybe we can also perform the tests with some local endpoint, though that might not be related to this issue.

@nicholascar (Member Author)

> Regarding the single failure here, the message is OSError: [Errno 99] Cannot assign requested address. This is due to something in the setup of the testing machine, not the content of the RDFLib code, I think, as the error doesn't appear when running tests locally. It's something about setting ports or colliding with ports in use. Any thoughts on this one?

> This might show that there are too many open connections in too short a time: https://stackoverflow.com/questions/11981933/python-urllib2-cannot-assign-requested-address?noredirect=1
>
> This could be a limitation on the Travis machine. Maybe we can also perform the tests with some local endpoint, though that might not be related to this issue.

When I perform the test locally, they all pass so I do think it is a Travis limitation.

@nicholascar (Member Author)

> We still need to submit default-graph-uri and named-graph-uri parameters.

OK, let me think about that! I might have missed something there.

@nicholascar (Member Author)

nicholascar commented Oct 5, 2020

> Would it be possible for you to split this into several unrelated pull-requests?

Not really: the focus here was just to get rid of requests, and requests is only used in two places in the entire codebase: a couple of tests and SPARQLStore. This PR is really only removing those references and then patching SPARQLStore to ensure that everything works.

The only optional things not about that are the convenience methods, such as Graph().subjects(), that you asked about in another comment.

Oh, and some very tiny updates to metadata, such as adding us to CONTRIBUTORS.rst, but those are really tiny one-liners.

@nicholascar (Member Author)

Open/Close to trigger Travis, to see if we can overcome the port-hogging test failure.

@nicholascar nicholascar closed this Oct 6, 2020
@nicholascar nicholascar reopened this Oct 6, 2020
@nicholascar (Member Author)

Here's a demo of default-graph-uri working, compared with the NG in the query:

from rdflib import Graph
from rdflib.plugins.stores.sparqlstore import SPARQLStore
ENDPOINT = "http://cgi.surroundaustralia.com:7200/repositories/cgi-vocs"
NAMED_GRAPH = "http://resource.geosciml.org/classifierscheme/cgi/2016.01/classification-method-used"
# NG in query, Graph then SPARQLStore
q = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT (COUNT(?s) AS ?c)
    WHERE {{
        GRAPH <{}> {{
            ?s a skos:Concept .
        }}
    }}
    """.format(NAMED_GRAPH)

g = Graph("SPARQLStore")
g.open(ENDPOINT)
for r in g.query(q):
    assert int(r['c']) == 14

st = SPARQLStore(query_endpoint=ENDPOINT)
for r in st.query(q):
    assert int(r['c']) == 14

# NG separate from query, Graph then SPARQLStore
q2 = """
    PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
    SELECT (COUNT(?s) AS ?c)
    WHERE {
        ?s a skos:Concept .
    }
    """

g = Graph("SPARQLStore", identifier=NAMED_GRAPH)
g.open(ENDPOINT)
for r in g.query(q2):
    assert int(r['c']) == 14

st = SPARQLStore(query_endpoint=ENDPOINT)
for r in st.query(q2, queryGraph=NAMED_GRAPH):
    assert int(r['c']) == 14

@white-gecko (Member)

Are you sure it is executed via POST?

@nicholascar (Member Author)

nicholascar commented Oct 6, 2020

> Are you sure it is executed via POST?

No: there seems to be no logic that allows SPARQLStore to ever use POST in SparqlConnector. SPARQLUpdateStore has a postAsEncoded variable, but this also seems never to be used. In testing, I manually set SparqlConnector's self.method value to POST to try it out - it works fine for most queries - but I haven't done as much testing as for GET.

I will write some POST tests next.

I think there were quite a few combinations of things not tested in the original SPARQLStore code, but fixing them all wasn't the purpose of this PR - just removing requests! I've had to fix lots of things to get it all working without requests though.

@nicholascar (Member Author)

@ashleysommer @white-gecko please can you approve this PR? The single error in Travis above is, again, urlopen error [Errno 99] Cannot assign requested address, which is a Travis machine capacity issue, not a "real" test failure.

There is an endless number of things that I could keep fixing in SPARQLStore, but that's not the goal of this PR - just the removal of requests.

Actually I'm using SPARQLStore in a production system now so will keep working on it. If this PR can be merged, I'll use master in the system (as opposed to 5.0.0) and that will let me keep testing SPARQLStore.

@ashleysommer (Contributor)

@nicholascar
I'm not concerned about the one failing Travis issue. That is something I'll have to look into going forward, as I'm pretty sure it is bad practice to stand up a real web server in a testing environment for running tests against, and it's bad practice to make real web requests in a testing environment too.

If you're confident the POST issue raised by @white-gecko above isn't related to these changes (and it was like that already when using requests) then we can accept and merge this.
We'll have to create a new issue for the POST-related problem.

@nicholascar (Member Author)

> it's bad practice to make real web requests in a testing environment too.

Yes, I think we are going to have to better mock up things like the SPARQL endpoints that are needed for some of the tests, so things like Fuseki don't need to be bundled into the tests.
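One way such mocking could look: patch urlopen so a test exercises the query path against canned SPARQL JSON results, with no server at all. This is only an illustrative sketch; run_query and the endpoint URL are made up, not rdflib internals:

```python
import io
import json
from unittest import mock

# Canned SPARQL 1.1 JSON results body, as a real endpoint would return.
fake_body = json.dumps(
    {"head": {"vars": ["s"]}, "results": {"bindings": []}}
).encode("utf-8")

def run_query(endpoint, query):
    # Stand-in for the store's query path; imported at call time so the
    # patched urlopen is picked up.
    from urllib.request import Request, urlopen
    with urlopen(Request(endpoint, data=query.encode("utf-8"))) as res:
        return json.loads(res.read())

with mock.patch("urllib.request.urlopen") as fake_urlopen:
    fake_urlopen.return_value.__enter__.return_value = io.BytesIO(fake_body)
    result = run_query("http://example.org/sparql", "SELECT ...")
```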

> If you're confident the POST issue raised by @white-gecko above isn't related to these changes (and it was like that already when using requests) then we can accept and merge this.

Yes, I am confident: that's a different thing, and part of the incomplete or now slightly broken SPARQLStore implementation that needs love and attention.

I'd like to raise a bunch of Issues for SPARQLStore and then fix them with some PRs straight after this one, which I'll discuss this arvo. Having this PR through will set a base layer for me to test behaviour with. I can think of a whole bunch more tests like the ones for GET I implemented, but for POST, and also things like transaction testing etc. I'm concerned that transaction rollback etc. isn't actually working, but we don't normally notice since the SPARQL services that I use are highly available and thus don't often disappear midway through a session.

@ashleysommer ashleysommer merged commit 7a53c61 into master Oct 8, 2020
@ashleysommer (Contributor)

@nicholascar
Did you run black on these files after these changes?
I just ran black on the codebase for my PR, and I'm seeing lots of black errors in plugins/stores/sparqlstore and in sparqlwrapper and other sparql-related files.

return self._session.__dict__[k]
if auth is not None:
assert type(auth) == tuple, "auth must be a tuple"
assert len(auth) == 2, "auth must be a tuple (user, password)"
Member

In my eyes it is not good practice to have assert in production code, so it should not be in a library.
The Python reference states:

> The current code generator emits no code for an assert statement when optimization is requested at compile time.

https://docs.python.org/3/reference/simple_stmts.html#assert

So I think it should be replaced by an if that raises an exception.
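A sketch of the suggested replacement (check_auth is a made-up name for illustration, not rdflib's actual code); unlike an assert, the raise survives running under python -O:

```python
def check_auth(auth):
    # Explicit validation that is not stripped when Python runs with
    # optimization enabled, unlike an assert statement.
    if auth is not None:
        if not isinstance(auth, tuple) or len(auth) != 2:
            raise TypeError("auth must be a (user, password) tuple")
    return auth
```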


@white-gecko (Member)

Are you sure the urlopen error [Errno 99] Cannot assign requested address is not introduced by removing the session? requests.Session() implements connection pooling (https://2.python-requests.org/en/master/user/advanced/). It might be that, by removing the session, there are too many open connections, which brings Travis to the limit of ports to be used. On your local machine you might have more open ports available. I'm unsure if this could also happen in some use cases where a lot of queries are sent.

Successfully merging this pull request may close these issues.

requests not included in setup.py