RNN word language model example #4
self.decoder.bias.data.fill_(0)
self.decoder.weight.data.uniform_(-initrange, initrange)

def __call__(self, hidden, input):
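For readers without the full diff: the two initialization lines above zero the decoder bias and draw the decoder weights uniformly from [-initrange, initrange]. A rough plain-Python sketch of that scheme follows. The helper name `init_decoder`, the nested-list layout, and the seed argument are assumptions for illustration, not code from this PR.

```python
import random

def init_decoder(num_out, num_in, initrange, seed=0):
    """Return (weight, bias) for a linear decoder.

    Hypothetical helper: weights are drawn uniformly from
    [-initrange, initrange]; the bias is filled with zeros.
    """
    rng = random.Random(seed)
    weight = [[rng.uniform(-initrange, initrange) for _ in range(num_in)]
              for _ in range(num_out)]
    bias = [0.0] * num_out
    return weight, bias

weight, bias = init_decoder(num_out=4, num_in=3, initrange=0.1)
```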
return loss / data.size(0)

# simple gradient clipping, using the total norm of the gradient
def clipGradient(model, clip):
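The body of `clipGradient` is not visible in this excerpt. As a sketch of what "clipping using the total norm of the gradient" usually means: compute one L2 norm over all parameter gradients together, and if it exceeds `clip`, scale every gradient by the same factor. The version below uses plain Python lists instead of tensors, and `clip_by_total_norm` is a hypothetical name, so treat it as an illustration rather than the PR's implementation.

```python
import math

def clip_by_total_norm(grads, clip):
    """Scale gradients in place so their combined L2 norm is at most `clip`.

    `grads` is a list of lists of floats, one inner list per parameter.
    Returns the total norm measured before clipping.
    """
    total_norm = math.sqrt(sum(g * g for grad in grads for g in grad))
    if total_norm > clip:
        scale = clip / total_norm
        for grad in grads:
            for i in range(len(grad)):
                grad[i] *= scale
    return total_norm

grads = [[3.0, 0.0], [0.0, 4.0]]
norm_before = clip_by_total_norm(grads, clip=1.0)
norm_after = math.sqrt(sum(g * g for grad in grads for g in grad))
```

Because every gradient is scaled by the same factor, the direction of the overall update is preserved and only its length changes.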
FYI, the perplexity for this model isn't as good as for an "equivalent" rnnlib3 model. I'm looking into why now; it probably has to do with different initialization.
My latest commit addresses @colesbury's and @apaszke's comments and fixes some other bugs. I also fixed a few discrepancies with the torch model (removed biases, fixed lr), so it now reaches the same perplexity as the torch model (~114). Requires pytorch/pytorch#106.
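For context on the ~114 figure: perplexity is the exponential of the average per-word negative log-likelihood. The sketch below assumes the loss is a natural-log cross-entropy summed over words, which this excerpt does not confirm; `perplexity` is a hypothetical helper name.

```python
import math

def perplexity(total_nll, num_words):
    """Perplexity from a summed negative log-likelihood (natural log)."""
    return math.exp(total_nll / num_words)

# An average loss of ln(114), about 4.74 nats per word, gives perplexity 114.
avg_loss = math.log(114)
ppl = perplexity(total_nll=avg_loss * 1000, num_words=1000)
```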
b1b72fe to c8ca445
I updated this to use the torch RNN library (the monolithic one that uses cuDNN). It's within 5% of the speed of the lua-torch version under the standard parameters. Ready to merge. P.S. Should I also include the version that builds an RNN from scratch? It might be instructive for people who want to do something other than what the monolith supports.
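To illustrate what "building an RNN from scratch" can look like, here is a minimal Elman-style recurrence, h' = tanh(W_ih x + W_hh h), unrolled over a sequence in plain Python. It has no biases and no LSTM gates, and the function names are hypothetical. It is a sketch of the general idea, not the from-scratch version mentioned above.

```python
import math

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]

def rnn_step(w_ih, w_hh, x, h):
    """One recurrence step: h' = tanh(W_ih x + W_hh h)."""
    pre = [a + b for a, b in zip(matvec(w_ih, x), matvec(w_hh, h))]
    return [math.tanh(p) for p in pre]

def run_rnn(w_ih, w_hh, inputs, h0):
    """Unroll the cell over a sequence; return all hidden states and the last."""
    h = h0
    outputs = []
    for x in inputs:
        h = rnn_step(w_ih, w_hh, x, h)
        outputs.append(h)
    return outputs, h

w_ih = [[0.5, -0.5], [0.1, 0.2]]
w_hh = [[0.0, 0.0], [0.0, 0.0]]  # zero recurrence keeps the example easy to check
outputs, final_h = run_rnn(w_ih, w_hh, [[1.0, 1.0], [2.0, 0.0]], [0.0, 0.0])
```

Writing the step function explicitly is what makes it easy to swap in a different cell, which is harder to do with a fused library implementation.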
Depends on pytorch/pytorch#129
@apaszke for your perusal.
If you uncomment the profiling code, you can then run:
CUDA_LAUNCH_BLOCKING=1 python main.py -model LSTM -cuda -nlayers 2
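Setting `CUDA_LAUNCH_BLOCKING=1` makes CUDA kernel launches synchronous, so time measured on the host is attributed to the call that actually did the work. The profiling code in `main.py` is not shown here. One generic way to profile a Python training loop is the standard-library `cProfile`, sketched below with a stand-in `train_step` function, which is an assumption and not the script's real code.

```python
import cProfile
import io
import pstats

def train_step():
    # Stand-in workload; a real script would run a forward/backward pass here.
    return sum(i * i for i in range(10000))

profiler = cProfile.Profile()
profiler.enable()
for _ in range(5):
    train_step()
profiler.disable()

# Collect the report in a string, sorted by cumulative time per function.
buffer = io.StringIO()
pstats.Stats(profiler, stream=buffer).sort_stats("cumulative").print_stats(5)
report = buffer.getvalue()
print(report)
```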