Skip to content

Conversation

@daspecster
Copy link
Contributor

Fixes #2956.

@daspecster daspecster added the api: storage Issues related to the Cloud Storage API. label Mar 16, 2017
@googlebot googlebot added the cla: yes This human has signed the Contributor License Agreement. label Mar 16, 2017
@dhermes
Copy link
Contributor

dhermes commented Mar 16, 2017

@daspecster I am 👎 on this change, not really sure what the right approach is but adding this check in the constructor makes EVERY constructed bucket pay the price


def __init__(self, name=None):
self.name = name
if name is None or (re.match(r'\w', name[0]) and

This comment was marked as spam.

This comment was marked as spam.

@daspecster
Copy link
Contributor Author

@dhermes you mean every existing bucket? I don't believe there could be any existing buckets that violate this rule? I think the API would have complained.

#2956 (comment)

@dhermes
Copy link
Contributor

dhermes commented Mar 16, 2017

@daspecster No I mean the local Bucket objects created by users of this library

@tseaver
Copy link
Contributor

tseaver commented Mar 16, 2017

Quoting my follow up to #2956:

If the API allows for / in the bucket name, then we shouldn't be prohibiting such names: instead, we should be URL-escaping the bucket name when constructing the API request URL.

@daspecster
Copy link
Contributor Author

@lukesneeringer @tseaver any alternative ideas?

ISTM that object creation would be the place to check this? Maybe there's a more efficient way to do it though?

@dhermes
Copy link
Contributor

dhermes commented Mar 16, 2017

I think we can "fix" by just more clearly documenting the acceptable values.

@tseaver
Copy link
Contributor

tseaver commented Mar 16, 2017

@dhermes If the API allows embedding / in a bucket name, then we have to honor that, which means that we need to be URL-escaping the bucket name when creating API URLs

@daspecster
Copy link
Contributor Author

@tseaver a user might be confused when then see their bucket names listed from some other library that may/maynot handle the urlencoding/urldecoding right?

@dhermes sure, but the error that is returned is a 404 which doesn't explain what the allowed values are.
I'll update the docs to be more clear about the allowable name values since that should be in here anyway.

@lukesneeringer
Copy link
Contributor

@tseaver, you are making an orthogonal point; in actuality, slashes are not permissible at the start or end of a bucket name (by the API), but the client library allows them to pass through.

@tseaver
Copy link
Contributor

tseaver commented Mar 16, 2017

@lukesneeringer the OP in #2956 reported having created a bucket with a trailing slash.

@lukesneeringer
Copy link
Contributor

lukesneeringer commented Mar 16, 2017

@tseaver Yes, in our library, and then the API dropped the slash for the bucket name.

@lukesneeringer
Copy link
Contributor

@daspecster I am 👎 on this change, not really sure what the right approach is but adding this check in the constructor makes EVERY constructed bucket pay the price

That seems fine to me. It is not terribly expensive.

Copy link
Contributor

@lukesneeringer lukesneeringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approved, pending @dhermes explaining his problem with it and resolving that.

@dhermes
Copy link
Contributor

dhermes commented Mar 17, 2017

My issue is just that RegEx is expensive. In terms of this specific approach, we don't even need a regex:

name[0].isalnum()
name[-1].isalnum()

Though this is a little strange with unicode (2) / str (3)

>>> u'\xff'.isalnum()
True

@lukesneeringer
Copy link
Contributor

lukesneeringer commented Mar 17, 2017

Sold. Change to that, and add len(name) <= 222.

EDIT: We should also add all([len(i) <= 63 for i in name.split('.')]).

That is good enough. The API can complain about the remaining errors (e.g. "not containing google or a common misspelling thereof").

@lukesneeringer
Copy link
Contributor

lukesneeringer commented Mar 17, 2017

@daspecster Suggestion:

class _PropertyMixin(object):
   def __init__(self, name=None):
        if name:
            self._validate_name(name)
        [...]

    def _validate_name(self, name):
        # Names must start and end with a letter or number.
        if not name[0].isalnum() or not name[-1].isalnum():
            raise ValueError('Bucket names must start and end with an alphanumeric character.')

        # Names must be between 3 and 222 total characters in length.
        if len(name) < 3:
            raise ValueError('Bucket names must be at least 3 characters.')
        if len(name) > 222:
            raise ValueError('Bucket names can not exceed 222 characters.')

        # Each bucket name component can not exceed 63 characters.
        if any([len(i) > 63 for i in name.split('.')]):
            raise ValueError('Each dot-separated component in a bucket name '
                             'can not exceed 63 characters.')

Copy link
Contributor

@lukesneeringer lukesneeringer left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated based on previous discussion.

@daspecster
Copy link
Contributor Author

@lukesneeringer where are you seeing the 63 char limit in the docs?

@lukesneeringer
Copy link
Contributor

Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.

@daspecster
Copy link
Contributor Author

@lukesneeringer found the link. Thanks!

@dhermes
Copy link
Contributor

dhermes commented Mar 17, 2017

There is just so much complexity here. Why can't we just punt to the server validation and document the rules in our docstring?

@daspecster
Copy link
Contributor Author

To echo @dhermes, there are more requirements for the names as well and I assume they're subject to change(with or without bumping the API version).

Your bucket names must meet the following requirements:

Bucket names must contain only lowercase letters, numbers, dashes (-), underscores (_), and dots (.). Names containing dots require verification.
Bucket names must start and end with a number or letter.
Bucket names must contain 3 to 63 characters. Names containing dots can contain up to 222 characters, but each dot-separated component can be no longer than 63 characters.
Bucket names cannot be represented as an IP address in dotted-decimal notation (for example, 192.168.5.4).
Bucket names cannot begin with the "goog" prefix.
Bucket names cannot contain "google" or close misspellings of "google".
Also, for DNS compliance and future compatibility, you should not use underscores (_) or have a period adjacent to another period or dash. For example, ".." or "-." or ".-" are not valid in DNS names.

@lukesneeringer
Copy link
Contributor

I guess what I was thinking was, "catch the easy ones, let the API fail on the rest".
The issue with the forward slash in particular is that it gets stripped by the API, which leads to non-obvious errors later. So, at minimum, we should catch the alphanumeric 0 and -1 indices.

I could go either way on the rest.

@daspecster daspecster force-pushed the add-bucket-name-exception branch from 99ce2e0 to 79f107d Compare March 17, 2017 19:56
@daspecster
Copy link
Contributor Author

@lukesneeringer did you have anything else for this?

@lukesneeringer lukesneeringer merged commit 1b5e35c into googleapis:master Mar 20, 2017
@daspecster daspecster deleted the add-bucket-name-exception branch March 20, 2017 18:05
richkadel pushed a commit to richkadel/google-cloud-python that referenced this pull request May 6, 2017
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

api: storage Issues related to the Cloud Storage API. cla: yes This human has signed the Contributor License Agreement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants