The Wayback Machine - http://web.archive.org/web/20210926112241/https://github.com/google/trax/issues/1078

Bidirectional RNN #1078

Open

jonatasgrosman opened this issue Oct 7, 2020 · 7 comments


@jonatasgrosman jonatasgrosman commented Oct 7, 2020

Is there a way to train a bidirectional RNN (like LSTM or GRU) on trax nowadays?


@lukaszkaiser lukaszkaiser commented Oct 16, 2020

I believe they should be easy to implement. We don't have them by default yet - a PR adding them would be welcome!
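Independent of trax's layer API, the bidirectional idea itself is simple: run one RNN over the sequence left-to-right, run a second RNN over the reversed sequence, flip its outputs back so time steps align, and concatenate the two hidden-state sequences. A minimal numpy sketch of that idea (the `rnn_forward` cell and all names here are illustrative stand-ins, not trax code):

```python
import numpy as np

def rnn_forward(x, W_x, W_h, b):
    """Run a simple tanh RNN over one sequence.
    x: (seq_len, d_in); returns all hidden states, (seq_len, d_hidden)."""
    h = np.zeros(W_h.shape[0])
    outputs = []
    for t in range(x.shape[0]):
        h = np.tanh(x[t] @ W_x + h @ W_h + b)
        outputs.append(h)
    return np.stack(outputs)

def bidirectional(x, fwd_params, bwd_params):
    """Concatenate forward states with backward states.
    The backward pass runs over the reversed sequence and its
    outputs are flipped back so time steps line up."""
    fwd = rnn_forward(x, *fwd_params)
    bwd = rnn_forward(x[::-1], *bwd_params)[::-1]
    return np.concatenate([fwd, bwd], axis=-1)

rng = np.random.default_rng(0)
d_in, d_hidden, seq_len = 4, 8, 5
make_params = lambda: (rng.normal(size=(d_in, d_hidden)),
                       rng.normal(size=(d_hidden, d_hidden)),
                       np.zeros(d_hidden))
out = bidirectional(rng.normal(size=(seq_len, d_in)),
                    make_params(), make_params())
print(out.shape)  # (5, 16): the hidden dimension doubles
```

In trax terms this would presumably be expressed with combinators (branch into forward and reversed paths, then concatenate), which is why the feature is a natural fit for a small PR.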


@narayanacharya6 narayanacharya6 commented Oct 21, 2020

Hey @lukaszkaiser, mind if I take a stab at this?


@zvikinoza zvikinoza commented Nov 3, 2020

If nobody is currently working on this, I'll submit a PR. (@narayanacharya6)


@narayanacharya6 narayanacharya6 commented Nov 3, 2020

I haven't started yet, so go for it @zvikinoza


@manifest manifest commented May 30, 2021

I've just made a PR for the issue.

I wasn't sure where to place it, so I just added it to trax.layers.combinators :-)
If it should be in trax.layers.rnn, I can move it.


@manifest manifest commented May 30, 2021

In the PR I use copy.deepcopy to create the backward_layer as a copy of the forward_layer. Is that an appropriate way to copy layer instances?
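For plain Python objects, copy.deepcopy does yield a fully independent copy, including nested mutable attributes, so the forward and backward layers would not share weights. Whether that is the right choice for a trax layer depends on whether any of its state should be shared; the sketch below only demonstrates the deepcopy semantics, using a stand-in `Layer` class rather than trax's:

```python
import copy

class Layer:
    """Stand-in for a layer object holding mutable nested state."""
    def __init__(self, n_units):
        self.n_units = n_units
        self.weights = [0.0] * n_units  # mutable, would be shared by a shallow copy

forward_layer = Layer(3)
backward_layer = copy.deepcopy(forward_layer)

# Mutating the copy's nested state leaves the original untouched.
backward_layer.weights[0] = 1.0
print(forward_layer.weights[0])   # 0.0
print(backward_layer.weights[0])  # 1.0
```

By contrast, copy.copy (a shallow copy) would have left both objects pointing at the same `weights` list, so updates to one would silently affect the other.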


@manifest manifest commented May 30, 2021

I also have a question about the RNN implementation in Trax. Why do we initialize the hidden state of the GRU and LSTM layers proportionally to the dimension of their inputs? Shouldn't we pass n_units to MakeZeroState and get (batch_size, n_units) as the shape of their hidden states?
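The shape question can be stated concretely: for a recurrent cell with n_units hidden units, the conventional zero initial state has shape (batch_size, n_units), independent of the input feature dimension. A small numpy sketch of that convention (`make_zero_state` here is an illustrative stand-in, not trax's MakeZeroState):

```python
import numpy as np

def make_zero_state(batch_size, n_units):
    """Zero initial hidden state, sized by the cell's unit count
    rather than by the input feature dimension."""
    return np.zeros((batch_size, n_units), dtype=np.float32)

batch_size, d_in, n_units = 2, 300, 128
x = np.zeros((batch_size, 7, d_in))       # inputs: (batch, time, d_in)
h0 = make_zero_state(batch_size, n_units)
print(h0.shape)  # (2, 128): depends on n_units, not on d_in
```

Tying the state size to the input dimension instead would only be correct in the special case where the input and hidden dimensions happen to match.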

5 participants