Skip to content

protoc-gen-go: consider applying M option to prefixes #992

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
bufdev opened this issue Nov 26, 2019 · 14 comments
Open

protoc-gen-go: consider applying M option to prefixes #992

bufdev opened this issue Nov 26, 2019 · 14 comments
Milestone

Comments

@bufdev
Copy link

bufdev commented Nov 26, 2019

I was testing out the v2 plugin and got this:

2019/11/26 11:17:11 WARNING: Missing 'go_package' option in "bufbuild/buf/image/v1beta1/image.proto", please specify:
	option go_package = "bufbuild/buf/image/v1beta1;bufbuild_buf_image_v1beta1";
A future release of protoc-gen-go will require this be specified.

Can this be discussed? I would strongly argue against requiring long-form go_package values in the Protobuf schema definition. This ties a Protobuf schema to a specific generated code location, effectively, which is undesirable in many situations. For example, a lot of existing customers generate code on the fly in individual repositories with different base Golang modules, and this will effectively outlaw that (not that doing so is a good decision, but it's a widely used pattern).

This seems very restrictive, and will many a bunch of existing build tools around Protobuf not work. Personally, I've always strongly recommended against using long-form go_package values because of the multitude of ways that users consume Protobuf definitions.

@neild
Copy link
Contributor

neild commented Nov 26, 2019

When one .proto file imports another, the code generator needs to know the Go package associated with each file.

For example, consider:

// a.proto
syntax = "proto3";
import "b.proto";
message A {
  B field_b = 1;
}
// b.proto
syntax = "proto3";
message B {}

When we generate a.pb.go, we need to know if the types A and B are in the same Go package or not. If they are in the same package, a.pb.go will contain:

type A {
  FieldB B
}

But if they are not in the same package, a.pb.go will contain:

import "example.com/package/b"
type A {
  FieldB b.B
}

But how do we tell what the Go package associated with a .proto file is?

The protobuf language has a concept of a package (e.g., package google.protobuf;), but protobuf packages don't map directly to Go packages. In particular, the protobuf language permits cycles between packages (but not .proto files), while the Go language forbids cycles. In addition, we have no way to derive the Go import path from the protobuf package.

Relying on the filename doesn't work, because there's no guarantee that the filesystem layout of the .proto files matches that of the Go packages. For example, the file google/protobuf/any.proto corresponds to the Go package "github.com/golang/protobuf/ptypes/any".

Currently, protoc-gen-go has a bunch of heuristics that it attempts to apply, but they're complicated, difficult to describe, and still get things wrong:
https://go.googlesource.com/protobuf/+/refs/heads/master/compiler/protogen/protogen.go#212

The best way to tell the code generator the Go package associated with a .proto file is to just tell it.

@bufdev
Copy link
Author

bufdev commented Nov 26, 2019

Yes for sure - I know it's a difficult problem. I was actually on the original issue that added go_package haha #139 and back in the day I supported it.

However, I think while the original intention was good, it's had some really bad unintended consequences that effectively have led to many (including myself) not using long-term go_package values and recommending against them entirely.

The Protobuf schema shouldn't be used to dictate this so strongly, there should be flexibility in generation. However, I recognize that I'm saying this is a problem without proposing a solution. I think we should come up with something better in the long term.

Just as a simple idea, one option that I think covers many use cases is a variation on the M option (also could be thought of as a variation on the import_prefix option), of the form Mdir=prefix, where all .proto files inside DIR are prefixed with `PREFIX. For example:

protoc -I proto --go_out=gen/go --go_opt Mproto=github.com/foo/bar/gen/go $(find proto -name '*.proto')`

This doesn't handle the github.com/golang/protobuf/ptypes/any case, but this is the exception, not the rule, and generally packages should match the directory structure of the Protobuf files as a rule. For those who need to break glass, go_package is still available.

@dsnet
Copy link
Member

dsnet commented Nov 26, 2019

When it comes to deriving the Go package paths for generated packages, there's primarily two ways to sensibly do it: 1) encode it in the .proto file (which is what the go_package option is for), or 2) encode it in the command-line when calling protoc (which is what the -M flag is for). Both approaches have advantages and disadvantages.

The intention of the warning is to push the world towards either of those two. Inside Google, we primarily use the -M approach since we have a mono-repo. IIRC, the warning should not be present if the -M flag is present. If it is, I don't think that was my intention.

Also, the -M flag could be improved, but I think that's a separate discussion.

@bufdev
Copy link
Author

bufdev commented Nov 26, 2019

I'll double check if the warning is present with with M flag or not - I have not tested that. Yes, I primarily use the M flag as well.

Note that it would be worth having that discussion - the M flag is a pain to set up on an automated basis, and for large, single protoc invocations, it can actually hit an issue where the maximum length of a protoc argument list is 30,000 characters (I think, it's something around 30,000, doing this from memory). So for 1000 files with long paths, it can hit that limit (I've hit it before). The Mdir=prefix option would get around that.

@neild
Copy link
Contributor

neild commented Nov 26, 2019

I'd want to think about it, but Mdir=prefix sounds reasonable.

I definitely feel the pain of trying to use the M flag manually. It's really only practical if you've got a build tool like bazel handling it for you.

@bufdev
Copy link
Author

bufdev commented Nov 26, 2019

That would be great. I'm happy to lend a hand if you're open to PRs.

Another related note, but probably worthy of a new issue: I would love to be able to generate assets produced byprotoc-gen-go-grpc to a different Golang package than the assets produced by protoc-gen-go, so that the grpc-go dependency is not enforced for packages that do not use gRPC. This might already be done/this might already have been one of the driving factors behind splitting up the plugins, I haven't tested it yet, but if, for example, the protoc-gen-go-grpc plugin took a package path that it is being generated to (or if it could auto-derive this optimally), and then if the Golang package of the .proto files that contain the request/response types is different than this package, it did the import...this would be great.

Having this ability to specify the location of the core Golang types from protoc-gen-go as part of compiler/protogen would be great for arbitrary Golang protoc plugins as well, i.e. protoc-gen-validate, protoc-gen-twirp, protoc-gen-grpc-gateway, etc...I have a lot of hopes that compiler/protogen can be the base for a lot of plugins in the future (which I think is the intention).

@neild neild changed the title APIv2: Long-form go_package requirement protoc-gen-go: consider applying M option to prefixes Mar 3, 2020
gopherbot pushed a commit to protocolbuffers/protobuf-go that referenced this issue Mar 20, 2020
Add a generator option that strips a prefix from the generated
filenames.

Consider this case: We have google/protobuf/empty.proto, with a
go_package option of "google.golang.org/protobuf/types/known/emptypb".
We want to generate the code for this file, placing it into the
appropriate directory of our repository.

In the default mode used by the code generator (paths=import),
the generator outputs the file:

	google.golang.org/protobuf/types/known/emptypb/empty.pb.go

This is close to what we want, but has an unnecessary
"google.golang.org/protobuf/" prefix. In the GOPATH world, we could pass
--go_out=$GOPATH to protoc and get a generated file in the desired
location, but this path is not useful in the modules world.

The 'module' option allows us to strip off the module prefix, generating
the desired filename (types/known/emptypb/empty.pb.go):

	protoc --go_out=. --go_opt=module=google.golang.org/protobuf google/protobuf/empty.proto

The module name must be an exact, character-for-character match. This
matches protoc's file handling in general.

Default to and require the paths=import option when module= is
specified, since it only makes sense when combined with it.

Updates golang/protobuf#992.

Change-Id: Idbfe4826b6c0ece30d64dbc577131a4f16391936
Reviewed-on: https://go-review.googlesource.com/c/protobuf/+/219298
Reviewed-by: Joe Tsai <[email protected]>
@neild
Copy link
Contributor

neild commented Mar 23, 2020

A couple thoughts:

  1. What happens if you specify a prefix of ""? It seems like someone might want to do this to specify that files in the root directory are part of a certain Go package, but this would result in the prefix being applied to every .proto file including well-known ones like google/protobuf/timestamp.proto.

    Perhaps we could disallow the root prefix, but the general problem of a prefix possibly applying to more files than intended seems like an issue.

  2. The problem of per-file M options overrunning the maximum argument list size seems like it might be better solved by providing the import path mapping in a file. Using prefixes to work around argument length limits doesn't scale (what if you have too many prefixes?). The Objective C generator offers precedent for this.

@bufdev
Copy link
Author

bufdev commented May 8, 2020

Just to revitalize this discussion: where did we land on this?

Requiring go_package on the same major release, even if documented, is going to break tens of thousands of developers using golang/protobuf today, I don't think that can be done to the golang/protobuf community IMO

@bufdev
Copy link
Author

bufdev commented May 8, 2020

A couple thoughts:

1. What happens if you specify a prefix of ""? It seems like someone might want to do this to specify that files in the root directory are part of a certain Go package, but this would result in the prefix being applied to every `.proto` file including well-known ones like `google/protobuf/timestamp.proto`.
   Perhaps we could disallow the root prefix, but the general problem of a prefix possibly applying to more files than intended seems like an issue.

I think I'm missing something on prefix of "", do you mean M""=prefix or Mdir=""? I think you are referring to the former (correct me if I'm wrong). Otherwise, would dir=prefix affect google/protobuf/* files?

I feel like this only comes into play if you had google/protobuf vendored right? ie I had protoc -I proto --go_out=Mproto=github.com/foo/bar/gen/go:gen/go and then i.e. proto/google/protobuf/timestamp.proto existed. I think in this case, it would be on the user to have a separate vendor/proto or third_party/proto directory for such files, and then protoc -I vendor/proto -I proto.

2. The problem of per-file `M` options overrunning the maximum argument list size seems like it might be better solved by providing the import path mapping in a file. Using prefixes to work around argument length limits doesn't scale (what if you have too many prefixes?). The Objective C generator offers [precedent](https://github.com/protocolbuffers/protobuf/blob/master/src/google/protobuf/compiler/objectivec/objectivec_generator.cc#L108) for this. 

Might be nice, I still think dir=prefix solves the general case though - I've had to program around this in a former career, heh.

@neild
Copy link
Contributor

neild commented May 8, 2020

I mean M""=prefix.

The proto compiler doesn't provide local filesystem paths to protoc-gen-go. Let's say you run

protoc -Imyprotos --go_opt=M=prefix --go_out=. myprotos/foo.proto

If myprotos/foo.proto imports google/protobuf/timestamp.proto, we (protoc-gen-go) can't tell whether that came from myprotos/google/protobuf/timestamp.proto, /usr/share/include/google/protobuf/timestamp.proto, or some other location.

@bufdev
Copy link
Author

bufdev commented May 8, 2020

Yea I wasn't thinking clearly, definitely true re: local fs paths (and the bane of my existence).

Some other options:

  • Straight up exclude the WKTs for dir=prefix. This might be interesting if combined with protoc-gen-go also automatically implicitly setting i.e. Mgoogle/protobuf/timestamp.proto=google.golang.org/protobuf/types/known/timestamppb (it might be doing this already?). Then, if a user really wants to set a golang path for this (for example, say gogo/protobuf is updated to v2), if they explicitly provide Mgoogle/protobuf/timestamp.proto=github.com/my/fork/timestamppb, then it uses that.
  • Mimport_path=prefix instead, ie if you have -I proto, and proto/foo directory, using foo instead of proto/foo. This is probably better as a new option though, so there's no confusion.

@dcow
Copy link

dcow commented Sep 9, 2020

I whipped up https://go-review.googlesource.com/c/protobuf/+/240238 when sketching out a fix for #1158 a few months ago. @dsnet pointed me over here to this issue. It might not be the full solution being discussed here, but I'm interested if people think it's headed in the right direction, or if we need extensive rework to make sense of things.

@AgentCoop
Copy link

AgentCoop commented Feb 5, 2021

An expansion of environment variables would solve the problem:

option go_package = "$PKG/foo/bar";
protoc -e PKG=somedir ...

This mechanism could be used for providing values for other options as well.

By the way, for the time being I solve this problem using an old-school technique:

rm -rf $PROTO_BUILD/*
cp -r $PROTO_ROOT/* $PROTO_BUILD/
find $PROTO_BUILD -name '*.proto' -exec sed -r -i "s@option go_package = \"(.*)\";@option go_package = \"$GITHUB_PKG/\1\";@g" {} \;

Hope that might help someone.

@dsnet dsnet added this to the unplanned milestone Mar 29, 2021
@Jrenk
Copy link

Jrenk commented May 11, 2022

Is there a solution now that is something like the one @bufdev proposed Mdir=prefix?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

6 participants