
Composition of Experts (CoE) Model Scaling

MoE · Production ML · SambaStudio
Transformed an experimental 150-expert LLaMA 7B Mixture-of-Experts proof-of-concept into a production-ready endpoint on SambaStudio, standardizing interfaces, artifact management, and deployment practices.

Situation

The Composition of Experts (CoE) project was one of SambaNova's largest and most ambitious initiatives — a Mixture-of-Experts setup comprising 150 LLaMA 7B models as experts and a single LLaMA 7B router. While the ML team had built a compelling proof-of-concept, it was far from production-ready: the codebase needed refactoring, interfaces were inconsistent, and artifact management was ad hoc.
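At a high level, the architecture dispatches each incoming prompt to one of many specialized experts chosen by the router. The sketch below illustrates that top-1 routing idea only; the expert names, scoring function, and `route` helper are hypothetical stand-ins, not SambaNova's actual implementation.

```python
from dataclasses import dataclass
from typing import Callable, Dict

@dataclass
class Expert:
    name: str
    handler: Callable[[str], str]  # stand-in for a LLaMA 7B expert endpoint

def route(prompt: str, experts: Dict[str, Expert],
          score: Callable[[str, str], float]) -> str:
    """Dispatch the prompt to the single best-scoring expert (top-1 routing)."""
    best = max(experts.values(), key=lambda e: score(prompt, e.name))
    return best.handler(prompt)

# Toy router score: keyword overlap between prompt and expert domain.
# A real router here would be a learned model (a LLaMA 7B in the CoE setup).
def keyword_score(prompt: str, domain: str) -> float:
    return sum(word in prompt.lower() for word in domain.split("_"))

experts = {
    "code": Expert("code", lambda p: f"[code expert] {p}"),
    "legal": Expert("legal", lambda p: f"[legal expert] {p}"),
}

print(route("review this code diff", experts, keyword_score))
# → [code expert] review this code diff
```

With 150 experts, the appeal of this design is that only the router and one expert run per request, so serving cost stays close to a single 7B model.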

Task

I was brought in by the CoE team lead to help transform this experimental setup into a production-grade system. My primary responsibilities were refining the codebase, standardizing interfaces, and putting in place sustainable practices for model registry, artifact management, and deployment.

Actions

  • Engaged in deep technical discussions with the CoE team to identify current challenges and architectural bottlenecks.
  • Refactored the codebase for modularity and maintainability, creating a structure that could scale and be owned long-term.
  • Standardized model interfaces and implemented reliable artifact management, enabling a consistent model registry and smoother deployment cycles.
  • Defined robust packaging and release practices to streamline integration with the SambaStudio platform.
  • Coordinated closely with cross-functional teams to unblock urgent dependencies and keep pace with the project's high-priority timeline.

Result

We successfully deployed the first CoE model onto SambaStudio as a production-ready endpoint. The optimized pipeline enabled several compelling customer demos, demonstrating the scalability and effectiveness of our CoE-based solution.

© 2026 Kuan Zhou. Crafted using Gatsby framework.