
Improvements to vim plugin and LSP server #1144


Merged (36 commits, Aug 27, 2023)
Commits:

- `aa73166` Initial proof of concept Vim plugin (AustinMroz, Jul 23, 2023)
- `40991c1` Support $WHISPER_CPP_HOME environment variable (AustinMroz, Jul 23, 2023)
- `e693528` Initial progress on LSP implementation (AustinMroz, Jul 27, 2023)
- `829a031` Rewrite audio windowing of guided transcription (AustinMroz, Jul 27, 2023)
- `5ef3b49` Add unguided_transcription. Cleanup. (AustinMroz, Jul 27, 2023)
- `208b980` Fix compilation. (AustinMroz, Jul 27, 2023)
- `708593c` Functional unguided_transcription (AustinMroz, Jul 27, 2023)
- `506eb07` Functional guided_transcription (AustinMroz, Jul 27, 2023)
- `95f582e` Minor changes before time fix (AustinMroz, Jul 28, 2023)
- `639adce` Swap timekeeping to use std::chrono (AustinMroz, Jul 28, 2023)
- `443b402` Add work in progress lsp backed whisper.vim plugin (AustinMroz, Jul 28, 2023)
- `da3a29a` Reworked vim plugin command loop (AustinMroz, Jul 29, 2023)
- `642e73f` Fix change inside (AustinMroz, Jul 29, 2023)
- `838bd1c` Forcibly set commandset_index to 0 after subinsert (AustinMroz, Jul 30, 2023)
- `fabacbb` Fix upper (AustinMroz, Jul 30, 2023)
- `4073197` Fix formatting (AustinMroz, Jul 30, 2023)
- `e38827d` Remove obsolete vim plugin (AustinMroz, Jul 30, 2023)
- `fce2768` Add json.hpp library (AustinMroz, Jul 30, 2023)
- `d2796a3` Minor cleanups (AustinMroz, Aug 2, 2023)
- `9caa032` Fix indentation. Fallback for subTranscription (AustinMroz, Aug 4, 2023)
- `b437941` Move audio polling logic to a subfunction (AustinMroz, Aug 5, 2023)
- `06e828a` Test for voice over subchunks if backlog > 1s (AustinMroz, Aug 5, 2023)
- `456942d` Limit the maximum length of audio input. (AustinMroz, Aug 6, 2023)
- `9b905d5` Unguided timestamp tracking, cleanup (AustinMroz, Aug 6, 2023)
- `cdc5b81` By default, maintain mode. (AustinMroz, Aug 7, 2023)
- `f088117` Add undo breaks before subtranscriptions (AustinMroz, Aug 7, 2023)
- `df05a4c` Append instead of insert for new undo sequence (AustinMroz, Aug 8, 2023)
- `19d26f1` Move undo sequence breaks to command execution (AustinMroz, Aug 8, 2023)
- `d4706ee` Fix repeat. Add space, carrot, dollar commands (AustinMroz, Aug 9, 2023)
- `f6b63ca` Return error on duplicate in commandset (AustinMroz, Aug 10, 2023)
- `fc0fcd6` Add support for user-defined commands (AustinMroz, Aug 10, 2023)
- `5bbdb8a` Add readme, update cmake (AustinMroz, Aug 12, 2023)
- `6270105` Add area commandset. Refactor spoken_dict (AustinMroz, Aug 25, 2023)
- `fae41de` Add mark, jump. Fix change under visual. (AustinMroz, Aug 25, 2023)
- `846ed47` Accommodate ignorecase. Fix change. (AustinMroz, Aug 26, 2023)
- `8a149f1` Support registers. Fix README typo (AustinMroz, Aug 26, 2023)
1 change: 1 addition & 0 deletions .gitignore

```diff
@@ -24,6 +24,7 @@ build-sanitize-thread/
 /talk-llama
 /bench
 /quantize
+/lsp
 
 arm_neon.h
 sync.sh
```
5 changes: 4 additions & 1 deletion Makefile

```diff
@@ -280,7 +280,7 @@ libwhisper.so: ggml.o $(WHISPER_OBJ)
 	$(CXX) $(CXXFLAGS) -shared -o libwhisper.so ggml.o $(WHISPER_OBJ) $(LDFLAGS)
 
 clean:
-	rm -f *.o main stream command talk talk-llama bench quantize libwhisper.a libwhisper.so
+	rm -f *.o main stream command talk talk-llama bench quantize lsp libwhisper.a libwhisper.so
 
 #
 # Examples
@@ -307,6 +307,9 @@ stream: examples/stream/stream.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHIS
 command: examples/command/command.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
 	$(CXX) $(CXXFLAGS) examples/command/command.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o command $(CC_SDL) $(LDFLAGS)
 
+lsp: examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
+	$(CXX) $(CXXFLAGS) examples/lsp/lsp.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o lsp $(CC_SDL) $(LDFLAGS)
+
 talk: examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ)
 	$(CXX) $(CXXFLAGS) examples/talk/talk.cpp examples/talk/gpt-2.cpp $(SRC_COMMON) $(SRC_COMMON_SDL) ggml.o $(WHISPER_OBJ) -o talk $(CC_SDL) $(LDFLAGS)
 
```
1 change: 1 addition & 0 deletions examples/CMakeLists.txt

```diff
@@ -69,4 +69,5 @@ else()
 add_subdirectory(quantize)
 add_subdirectory(talk)
 add_subdirectory(talk-llama)
+add_subdirectory(lsp)
 endif()
```
9 changes: 9 additions & 0 deletions examples/lsp/CMakeLists.txt

```diff
@@ -0,0 +1,9 @@
+if (WHISPER_SDL2)
+# stream
+set(TARGET lsp)
+add_executable(${TARGET} lsp.cpp)
+
+include(DefaultTargetOptions)
+
+target_link_libraries(${TARGET} PRIVATE common common-sdl whisper ${CMAKE_THREAD_LIBS_INIT})
+endif ()
```
104 changes: 104 additions & 0 deletions examples/lsp/README.md
@@ -0,0 +1,104 @@
# Language Server

This example consists of a simple language server that exposes both unguided
and guided (command) transcription by exchanging JSON messages over stdin/stdout,
as well as a fairly robust Vim plugin that makes use of the language server.
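
This README does not spell out how messages are framed on stdin/stdout. The sketch below assumes LSP-style framing (a `Content-Length` header, a blank line, then a JSON-RPC body), which is what Vim's `lsp` channel mode produces; treat that framing, and the `jsonrpc`/`id` fields, as assumptions to verify against `lsp.cpp`. The helpers `encode_message` and `decode_message` are hypothetical names for illustration.

```python
import json


def encode_message(payload: dict) -> bytes:
    """Frame one JSON-RPC message with a Content-Length header (assumed LSP-style framing)."""
    body = json.dumps(payload).encode("utf-8")
    header = f"Content-Length: {len(body)}\r\n\r\n".encode("ascii")
    return header + body


def decode_message(data: bytes) -> tuple:
    """Parse one framed message from the start of `data`.

    Returns (message, leftover_bytes) so several concatenated messages can be read in turn.
    """
    header, sep, rest = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("incomplete header")
    length = None
    for line in header.split(b"\r\n"):
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            length = int(value.strip())
    if length is None or len(rest) < length:
        raise ValueError("missing Content-Length or incomplete body")
    return json.loads(rest[:length]), rest[length:]


if __name__ == "__main__":
    # Round-trip a hypothetical registerCommandset request through the framing helpers.
    request = {"jsonrpc": "2.0", "id": 1, "method": "registerCommandset", "params": ["yank", "delete"]}
    message, leftover = decode_message(encode_message(request))
    print(message == request, leftover == b"")
```

Returning the leftover bytes lets a reader loop over a stdout buffer that may hold more than one response.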

## Vim plugin quick start

Compile the language server with

```bash
make lsp
```
Install the plugin itself by copying or symlinking `whisper.vim` into `~/.vim/autoload/`.

In your vimrc, set the path to your whisper.cpp directory and, optionally, add some keybindings.

```vim
let g:whisper_dir = expand("~/whisper.cpp")
" Start listening for commands when Ctrl-G is pressed in normal mode
nnoremap <C-G> <Cmd>call whisper#requestCommands()<CR>
" Start unguided transcription when Ctrl-G is pressed in insert mode
inoremap <C-G> <Cmd>call whisper#doTranscription()<CR>
```

## Vim plugin usage

The Vim plugin is designed to closely follow Vim's mnemonics.

`s:spoken_dict` is used to translate keys to their spoken form.


Keys that map to a string use that spoken value normally and when a motion is expected, but use the key itself when a character is expected.
Keys that map to a dict, like `i`, can be given a separate spoken form for each commandset. The dict is indexed by commandset:

- `0`: normal mode (`i` is spoken as "insert")
- `1`: a motion (`i` is spoken as "inside")
- `2`: a single literal key, as after "till" (`i` is spoken as "i")
- `3`: an area selection (`s` is spoken as "sentence", as in "around sentence")
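
To make the indexing concrete, here is a minimal Python model of how a key could be resolved to the word the plugin listens for. The real `s:spoken_dict` is script-local Vimscript in the plugin; the dictionary contents, the constant names, and the treatment of commandset 3 for string entries below are assumptions for illustration only.

```python
from typing import Dict, Optional, Union

# Commandset indices as listed above.
NORMAL, MOTION, CHARACTER, AREA = 0, 1, 2, 3

# Hypothetical, heavily simplified stand-in for the plugin's s:spoken_dict.
SPOKEN_DICT: Dict[str, Union[str, Dict[int, str]]] = {
    "w": "word",  # string entry
    "i": {NORMAL: "insert", MOTION: "inside", CHARACTER: "i"},  # per-commandset entry
    "s": {AREA: "sentence"},
}


def spoken_form(key: str, commandset: int) -> Optional[str]:
    """Return the word to listen for when `key` is usable in `commandset`, else None."""
    entry = SPOKEN_DICT.get(key)
    if entry is None:
        return None
    if isinstance(entry, str):
        # A string entry is spoken as-is, except where a literal character is expected.
        # Treating AREA like NORMAL/MOTION here is an assumption of this sketch.
        return key if commandset == CHARACTER else entry
    # A dict entry only offers the commandsets it defines.
    return entry.get(commandset)
```

For example, `spoken_form("i", MOTION)` gives `"inside"`, while `spoken_form("w", CHARACTER)` falls back to the key itself, `"w"`.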

Some punctuation keys, like `-`, are explicitly given pronunciations to prevent them from being picked as punctuation instead of an actual command word.

Not all commands tokenize to a single token, and this can interfere with interpretation. For example, "yank" takes multiple tokens, so detection is more accurate when only its first part, "ya", is spoken. It could be replaced with a word that is a single token (such as "copy"), but preserving Vim's mnemonics was given priority.

Commands that would normally move the editor into insert mode (insert, append, open, change) begin unguided transcription.
Unguided transcription ends when a speech segment ends with the word "exit".
Whether punctuation is added before it can be controlled by pausing, or not pausing, between the previous speech segment and "exit".
Transcription only stops if "exit" is the last word, so "Take the first exit on your right" does not end it.
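
A rough Python sketch of the end-of-transcription rule: a segment only finishes transcription when "exit" is its final word, ignoring case and trailing punctuation. The function name and the regular expression are illustrative assumptions, not the plugin's actual code.

```python
import re
from typing import Tuple

# "exit" (any case), optionally followed by punctuation, at the very end of the segment.
_TRAILING_EXIT = re.compile(r"\s*\bexit\W*$", re.IGNORECASE)


def split_exit(segment: str) -> Tuple[str, bool]:
    """Return (text_to_insert, stop) for one transcribed speech segment.

    Transcription stops only when "exit" is the final word, so mid-sentence uses are kept.
    """
    match = _TRAILING_EXIT.search(segment)
    if match is None:
        return segment, False
    # Drop the trailing "exit" (and the whitespace before it) from the inserted text.
    return segment[: match.start()], True
```

With this rule, `" Hello world. Exit."` inserts `" Hello world."` and stops, while `" Take the first exit on your right."` is inserted unchanged and listening continues.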

After a command is evaluated, the plugin will continue listening for the next command.

While in command mode, "Exit" will end listening.

The plugin makes a best-effort attempt to keep audio that is recorded while a previous chunk is still processing and to interpret it immediately afterwards. However, the current voice detection still needs a fairly sizable pause to determine when a command has been spoken.

Log information is sent to a special `whisper_log` buffer and can be accessed with
```vim
:e whisper_log
```

## Vim plugin configuration

`g:whisper_dir`
A full path to the whisper.cpp repo. It can be expanded in the definition like so:
```vim
let g:whisper_dir = expand("~/whisper.cpp/")
```
(The `WHISPER_CPP_HOME` environment variable is also checked, for users of the existing whisper.nvim script.)

`g:whisper_lsp_path`
Can be used to manually set the path to the language server.
If not defined, it is inferred from `g:whisper_dir`.

`g:whisper_model_path`
A full path to the model to load. If not defined, it defaults to `ggml-base.en.bin`.

`g:whisper_user_commands`
A dictionary of spoken commands that correspond to either strings or funcrefs.
This can be used to create connections with other user plugins, for example
```vim
let g:whisper_user_commands = {"gen": "llama#doLlamaGen"}
```
will trigger the llama.cpp plugin to begin generation when "gen" is spoken.

## Language server methods

`registerCommandset`

`params` is a list of strings that should be checked for with this commandset. The server prepends a space to these strings before tokenizing.

Responds with:
- `result.index`: an integer index for the registered commandset, which should be included when initiating a guided transcription to select this commandset.

Returns an error if any of the commands in the commandset have duplicate tokenizations.
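
A client may want to catch this error before sending the request. The sketch below mirrors the rule as described (prepend a space, tokenize, reject identical token sequences) using a pluggable `tokenize` callable. `toy_tokenize` is a deliberately crude placeholder, not whisper's tokenizer, so which commands actually collide depends on the real model vocabulary; both function names are hypothetical.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def find_duplicate_tokenizations(
    commands: Sequence[str], tokenize: Callable[[str], Sequence]
) -> List[Tuple[str, str]]:
    """Return (first, later) pairs of commands whose token sequences are identical.

    As the server does, a space is prepended to each command before tokenizing.
    """
    seen: Dict[Tuple, str] = {}
    collisions: List[Tuple[str, str]] = []
    for command in commands:
        tokens = tuple(tokenize(" " + command))
        if tokens in seen:
            collisions.append((seen[tokens], command))
        else:
            seen[tokens] = command
    return collisions


def toy_tokenize(text: str) -> Tuple[str, ...]:
    # Stand-in for illustration only: lowercases and splits on whitespace.
    return tuple(text.lower().split())
```

Under this toy tokenizer, `["yank", "Yank", "delete"]` reports the pair `("yank", "Yank")`, which is the kind of commandset the server would reject.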

`guided`

- `params.commandset_index`: an index returned by a corresponding commandset registration. If not set, the most recently registered commandset is used.
- `params.timestamp`: a positive integer designating the point in time from which audio processing should begin. If left blank, processing starts at the moment the message is received. Leave this blank unless you have a timestamp from a previous response.

Responds with:
- `result.command_index`: the numerical index (starting from 0) of the detected command in the selected commandset.
- `result.command_text`: a string containing the command as provided in the commandset.
- `result.timestamp`: a positive integer designating the point in time at which audio processing stopped. Pass this timestamp back in a subsequent message to mask the latency of transcription.
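
The timestamp hand-off is easiest to see as a request/response loop. This Python sketch builds `guided` requests and extracts the fields a client needs from a response; the `jsonrpc`/`id` envelope and the response values in the usage lines are assumptions for illustration, and the helper names are hypothetical.

```python
from typing import Optional, Tuple


def guided_request(request_id: int, commandset_index: Optional[int] = None,
                   timestamp: Optional[int] = None) -> dict:
    """Build a `guided` request; omitted fields fall back to the server defaults."""
    params = {}
    if commandset_index is not None:
        params["commandset_index"] = commandset_index
    if timestamp is not None:
        params["timestamp"] = timestamp
    return {"jsonrpc": "2.0", "id": request_id, "method": "guided", "params": params}


def handle_guided_response(response: dict) -> Tuple[int, str, int]:
    """Return (command_index, command_text, timestamp) from a `guided` response."""
    result = response["result"]
    return result["command_index"], result["command_text"], result["timestamp"]


if __name__ == "__main__":
    # First request: no timestamp, so audio is processed from when the message arrives.
    first = guided_request(1, commandset_index=0)
    # Hypothetical response values, for illustration only.
    response = {"jsonrpc": "2.0", "id": 1,
                "result": {"command_index": 1, "command_text": "delete", "timestamp": 123456}}
    index, text, ts = handle_guided_response(response)
    # Chain the timestamp so speech captured while the first request was processing is not lost.
    second = guided_request(2, commandset_index=0, timestamp=ts)
    print(index, text, second["params"])
```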

`unguided`

- `params.no_context`: sets the corresponding whisper `no_context` parameter. Defaults to true. Setting it to false for every call after the first in a run of consecutive unguided transcriptions may give more accurate results.
- `params.prompt`: if provided, sets the initial prompt used during transcription.
- `params.timestamp`: a positive integer designating the point in time from which audio processing should begin. If left blank, processing starts at the moment the message is received. Leave this blank unless you have a timestamp from a previous response.

Responds with:
- `result.transcription`: a string containing the transcribed text. N.B. This will almost always start with a space due to how text is tokenized.
- `result.timestamp`: a positive integer designating the point in time at which audio processing stopped. Pass this timestamp back in a subsequent message to mask the latency of transcription.
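
One way a client might chain `unguided` calls is sketched below: `no_context` is true only for the first call in a run, following the suggestion above; each later call would pass the previous `result.timestamp`; and the leading space is trimmed when it would produce leading or doubled whitespace. The helper names and the trimming policy are assumptions, not the plugin's documented behavior.

```python
from typing import Optional


def unguided_params(first: bool, timestamp: Optional[int] = None,
                    prompt: Optional[str] = None) -> dict:
    """Build `unguided` params, keeping no_context true only for the first call in a run."""
    params: dict = {"no_context": first}
    if timestamp is not None:
        params["timestamp"] = timestamp
    if prompt is not None:
        params["prompt"] = prompt
    return params


def append_transcription(buffer_text: str, result: dict) -> str:
    """Append result.transcription, dropping its leading space where one is not wanted."""
    text = result["transcription"]
    if not buffer_text or buffer_text[-1].isspace():
        text = text.lstrip()
    return buffer_text + text
```

For a hypothetical response `{"transcription": " Hello", "timestamp": 1000}`, a client would append `"Hello"` to an empty buffer and then send `unguided_params(first=False, timestamp=1000)` for the next segment.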