Composition of Experts (CoE) Model Scaling
Situation
The Composition of Experts (CoE) project was one of SambaNova's largest and most ambitious initiatives: a Mixture of Experts setup in which a single LLaMA 7B router dispatched requests across 150 LLaMA 7B expert models. While the ML team had built a compelling proof of concept, it was far from production-ready: the codebase needed refactoring, interfaces were inconsistent, and artifact management was ad hoc.
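At a high level, the CoE pattern pairs a lightweight routing model with a pool of specialized expert models. The sketch below is a minimal illustration of that dispatch flow, assuming a simple one-expert-per-prompt policy; the class and function names are hypothetical and do not reflect SambaNova's actual implementation.

```python
# Minimal sketch of the CoE routing pattern: a router LLM picks one expert
# per prompt and delegates generation to it. Names are hypothetical.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class Expert:
    name: str
    generate: Callable[[str], str]  # the expert LLM's text-generation entry point


class CoERouter:
    """Routes each prompt to a single expert, then delegates generation."""

    def __init__(self, route: Callable[[str], str], experts: Dict[str, Expert]):
        self.route = route      # router LLM: prompt -> name of the chosen expert
        self.experts = experts  # e.g. ~150 domain-specialized 7B experts

    def __call__(self, prompt: str) -> str:
        expert_name = self.route(prompt)
        expert = self.experts[expert_name]
        return expert.generate(prompt)
```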
Task
I was brought in by the CoE team lead to help transform this experimental setup into a production-grade system. My primary responsibilities were refining the codebase, standardizing interfaces, and establishing sustainable practices for the model registry, artifact management, and deployment.
Actions
- Engaged in deep technical discussions with the CoE team to identify current challenges and architectural bottlenecks.
- Refactored the codebase for modularity and maintainability, creating a structure that could scale and be owned long-term.
- Standardized model interfaces and implemented reliable artifact management, enabling a consistent model registry and smoother deployment cycles (see the sketch after this list).
- Defined robust packaging and release practices to streamline integration with the SambaStudio platform.
- Coordinated closely with cross-functional teams to prioritize urgent dependencies and keep delivery on track given the project's high priority.
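To make the interface-standardization and registry work concrete, here is a minimal sketch of the kind of artifact manifest such a registry typically relies on; the schema fields and validation rule are assumptions for illustration, not the actual SambaStudio registry format.

```python
# Hypothetical registry entry for a packaged expert model; field names are
# illustrative, not SambaNova's actual schema.
from dataclasses import dataclass, field
from typing import Dict


@dataclass(frozen=True)
class ModelArtifact:
    name: str               # registry key, e.g. "coe-expert-finance"
    version: str            # version of the packaged artifact
    checkpoint_uri: str     # location of the expert's weights
    tokenizer_uri: str      # tokenizer, shared or per-expert
    metadata: Dict[str, str] = field(default_factory=dict)  # owner, eval notes, etc.


def validate(artifact: ModelArtifact) -> None:
    """Reject incomplete registry entries before they reach deployment."""
    required = ("name", "version", "checkpoint_uri", "tokenizer_uri")
    missing = [f for f in required if not getattr(artifact, f)]
    if missing:
        raise ValueError(f"artifact is missing required fields: {missing}")
```

A uniform manifest like this lets every expert be registered, versioned, and deployed through the same code path, which is what makes a large pool of experts manageable in practice.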
Result
We successfully deployed the first CoE model onto SambaStudio as a production-ready endpoint. The resulting pipeline enabled several compelling customer demos, demonstrating the scalability and effectiveness of the CoE-based solution.