Conversation

@kentslaney kentslaney commented May 7, 2025

prior art

If anyone has more GPUs than ideas, I'd appreciate this being tried (and hearing the feedback). It trains at a small scale without the loss immediately diverging from the original, but my hope is that it might mitigate mode collapse, which happens late and at scale. That said, it's a negative result so far. If I get around to trying it at scale myself, I'll update the thread.

Thoughts and discussion without results are welcome as well.

I also have a standalone implementation for anyone without a training setup.

@aifartist

Unfortunately I have more ideas than GPUs and not enough time to try them. But I do have a Threadripper 7985 system with 256 GB of DDR5-6000 and dual 5090s.

Do you have any interesting experiments you want run? I'm willing to offer some time in exchange for learning a bit more.

Currently I'm all in on Karpathy's nanochat, which is a very fast trainer. I've already gotten about a 20% further performance boost by eliminating graph breaks and recoding the step function to process the params in larger batches.
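
To make the "larger batches" point concrete, here's a rough sketch (not nanochat's actual code) of what a batched optimizer step can look like in PyTorch: instead of looping over parameters one at a time, collect them into lists and update them with `torch._foreach_*` ops, which process the whole list in a few calls and tend to avoid per-parameter Python overhead and graph breaks under `torch.compile`. The `batched_sgd_step` name and the hyperparameters are just illustrative.

```python
import torch

@torch.no_grad()
def batched_sgd_step(params, momentum_buffers, lr=0.02, momentum=0.9):
    # Hypothetical example: SGD with momentum, applied to all params at once
    # via foreach ops rather than a per-parameter loop.
    params = [p for p in params if p.grad is not None]
    grads = [p.grad for p in params]
    bufs = [momentum_buffers[p] for p in params]

    # buf = momentum * buf + grad  (one call per list, not one per tensor)
    torch._foreach_mul_(bufs, momentum)
    torch._foreach_add_(bufs, grads)

    # p = p - lr * buf
    torch._foreach_add_(params, bufs, alpha=-lr)
```

Usage would be something like `momentum_buffers = {p: torch.zeros_like(p) for p in model.parameters()}`, then calling `batched_sgd_step(list(model.parameters()), momentum_buffers)` after backward.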

