Filters sort and uniq #7
Comments
Yes, I'd like that too! Could you provide a short example program of the kind that you'd like to write in |
The use case is normally to process some kind of log file where I try to find out the frequency of something. So in
The |
Great! Can you turn this into a rough example of the Go program you'd like to write using |
One challenge is, that the unix tools like I think, something like this would be great (already combined with the functional options, which are placed in distinct packages for better readability): script.Stdin().Match("the log lines of interest").Col(" ", 5).Sort().Uniq(uniq.WithCount()).Sort(sort.Reverse(), sort.Numeric()).Stdout() |
I'm completely in sympathy with your ideas, but one of the design principles for script.Stdin().Match("the log lines of interest").Column(5).Uniq().Sort().CountLines().Stdout() What do you think? |
@bitfield In my opinion, the sample you have put together does not do the same as my command, but there is also one simplification that I like.
The simplification
For the unix command
The differences
Column(5)
I can live with the fact that the default behaviour of
Uniq
The default behaviour of the unix command
So if you want to have a "simple" API, you would need something like
Sort
The default behaviour of the unix command
CountLines
This does not work, because I am not interested in the number of lines after
Alternative Proposal
So this boils down to the question of what exactly is considered a "simple" API. I feel like removing the possibility for options/flags leads to a very broad (but still not very flexible) API. Alone the case for
You can imagine yourself where this leads with more options added. The unix command
Bottom line
I personally prefer the API with the functional options. But this is your package, so you have to give the guidelines. |
Very thoughtful contribution, thank you! You make an excellent point: that if you try to consider every possible way someone might want to use your functionality, you end up with a very complicated API (one way or the other). Either you have to provide lots of variants of no-config methods, or methods with lots of config. This is exactly why I'm being such a hardass about use cases. I've rejected a lot of my own feature proposals on the basis that they're just something I thought would be cool, without having an actual, real requirement for them in a production program. It sounds like the use case you're proposing is something roughly like this:
(That's quite complex! What specific real-world thing is it you're trying to do here? It's always helpful to know the concrete details.) Now, we don't necessarily have to do this in a one-liner. That's the nice thing about a Go library; we can just use Go for anything the library doesn't have built in. But if we were to do this entirely in a
(It's worth noting that I'm not particularly interested in following the API of specific Unix commands, except where it's obvious enough that it doesn't matter if you're familiar with the Unix equivalent.) So here's a counter-counter-proposal, for five specific features:
Here's how your use case might be implemented using these: script.Stdin().Match("foo").Column(5).CountFreq().SortNumeric().Reverse().First(10).Stdout() |
By the way, one thing I often do which is similar to this is to analyse webserver logs to find the IP address generating the most requests. For example, I |
PS: maybe |
My real-world example, which I had in mind when you asked me about a specific use case, is pretty similar to the use case you outline in your comment above. In my case it was the log of an ssh daemon, and I was interested in the source IP addresses from which the most failed login attempts had been made. I obviously solved this use case with a bash one-liner. If I use a particular bash one-liner often, I put it in a script. Maybe later, some additional logic is added to process the logs in different ways. In the end I have a bash script that would be better implemented in something like Go. So this is the path it normally takes in my case. Therefore I would prefer an API that is similar to the Unix commands I already use, because translating an existing bash one-liner or script into a Go program with this package would become easier. That being said, your counter-counter-proposal covers my use case pretty well. Based on your last comment, it could be simplified to: script.Stdin().Match("foo").Column(5).CountFreq().First(10).Stdout() where |
Yes, understood. By and large the Unix API is clear, straightforward, and unambiguous, and where that's so, I'm happy to adopt it. Some things or options are called by weird names, for historical reasons, and where there's an alternative that's clearer to people who don't write shell scripts, I'll use that instead. Clarity first, Unix familiarity second. For example, |
script.Stdin().Match("foo").Column(5).CountFreq().First(10).Stdout() This looks very good to me. Are you interested in working on implementation of any of these? I might start putting a PR together with a log-line counting example, and the code to make it work. |
Hi, as a side note: the name
So I think |
I understand why you might feel that way, but here's another way of looking at it: The primary thing that the method does is to count frequency. As a side effect of this it reduces the input to unique lines. But if you just wanted unique lines, you wouldn't use this method, because it prepends frequency counts. I can quite see that one might want a method which produces unique lines, and that method should be called |
Looks good to me now. |
@breml thanks for all the help on this one! Do you have everything you want now in v0.9.0? |
LGTM, thanks. |
First of all, this is a really interesting package, thanks for the effort.
In my scripts I often use the shell commands sort and uniq, so I feel these two would be good additions to this package.