Hi. I think the recurrent cells in Flux need a dropout option for their internal states, to improve training in larger networks. For example:

mycell = GRU(10, 10, 0.1)

where 0.1 is the dropout probability. Or is there an easy way to implement this manually?