Random-forest pretrained Transformer
A random-forest-based Hugging Face implementation for ReLU generation.
- Input
  - 7181-dim embedding
- Encoder
  - 61 Transformer layers, 40 attention heads each
- Output
  - recall projection
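
The architecture figures above can be sanity-checked with a small sketch. Standard multi-head attention splits the embedding evenly across heads, so the embedding width is normally a multiple of the head count. The `EncoderConfig` class and `check` helper below are hypothetical names introduced for illustration only and are not part of this repository; the three numbers are copied from the listing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncoderConfig:
    """Hypothetical container for the hyperparameters listed above."""

    embed_dim: int = 7181   # "7181-dim embedding" (from the listing)
    num_layers: int = 61    # encoder depth (from the listing)
    num_heads: int = 40     # attention heads per layer (from the listing)

    def head_dim(self) -> int:
        """Per-head width, assuming the embedding is split evenly across heads."""
        if self.embed_dim % self.num_heads != 0:
            raise ValueError(
                f"embed_dim={self.embed_dim} is not divisible by "
                f"num_heads={self.num_heads}"
            )
        return self.embed_dim // self.num_heads


def check(cfg: EncoderConfig) -> str:
    """Return a short status string instead of raising, for quick inspection."""
    try:
        return f"ok: head_dim={cfg.head_dim()}"
    except ValueError as err:
        return f"invalid: {err}"


if __name__ == "__main__":
    print(check(EncoderConfig()))                # values as listed
    print(check(EncoderConfig(embed_dim=7200)))  # nearest larger multiple of 40
```

With the values as listed the check fails, because 7181 = 40 × 179 + 21. Implementations that require an even split (PyTorch's `nn.MultiheadAttention`, for example) would reject this combination, so either one of the figures is misreported or the model sizes its heads in a way the listing does not describe.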
Training config
optimizer=Adam, lr=0.526, scheduler=exponential, warmup=1951
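
One way to read the scheduler settings is a linear warmup followed by exponential decay. The sketch below is an illustration under stated assumptions, not the project's training code: it assumes `warmup` counts optimizer steps, that the warmup is linear, and that decay is applied per step. The decay factor `gamma` is not given in the config, so the value used here is a placeholder, and `learning_rate` is a hypothetical helper name.

```python
def learning_rate(
    step: int,
    base_lr: float = 0.526,     # lr from the config line
    warmup_steps: int = 1951,   # warmup from the config line (assumed to be steps)
    gamma: float = 0.999,       # ASSUMPTION: decay factor is not stated in the config
) -> float:
    """Linear warmup to base_lr, then per-step exponential decay."""
    if step < 0:
        raise ValueError("step must be non-negative")
    if step < warmup_steps:
        # Ramp linearly so the last warmup step reaches base_lr.
        return base_lr * (step + 1) / warmup_steps
    # Decay starts at base_lr on the first post-warmup step.
    return base_lr * gamma ** (step - warmup_steps)


if __name__ == "__main__":
    for s in (0, 975, 1950, 1951, 5000):
        print(s, learning_rate(s))
```

The base rate is taken verbatim from the config. It is far above the 1e-3 default used by common Adam implementations, so it is worth verifying against the actual training script before reuse.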