Merging ViTs and LLMs using a pretrained (graph) neural net.
15 min read · 2026
Merging experts in Mixture-of-Experts (MoE) LLMs to compress a 235B LLM.
35 min read · 2026
Explaining our ICLR 2025 paper and visualizing neuron permutation symmetry.
15 min read · 2025
22 min read · August 12, 2019 · medium.com
24 min read · August 04, 2019 · medium.com