Projects

Key engineering projects across my career.

PhD & Before

NMR-Based Quantum Computing Simulation

Quantum Computing, NMR, Computational Physics, Simulation

Undergraduate thesis project: developed computational simulations of NMR-based quantum computing systems, modeling qubit dynamics and gate operations under realistic physical conditions.
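
As a rough illustration of what "gate operations" means here (this is not the thesis code), an idealized resonant RF pulse acts on a single spin-1/2 qubit as a rotation about the x axis. The sketch below ignores relaxation and dephasing entirely, and the helper names `rx`, `apply_gate`, and `populations` are assumptions made for the example.

```python
import math

def rx(theta):
    """2x2 matrix for a rotation by angle theta about the x axis (idealized RF pulse)."""
    c = math.cos(theta / 2)
    s = -1j * math.sin(theta / 2)
    return [[c, s], [s, c]]

def apply_gate(gate, state):
    """Multiply a 2x2 gate into a two-component state vector."""
    return [
        gate[0][0] * state[0] + gate[0][1] * state[1],
        gate[1][0] * state[0] + gate[1][1] * state[1],
    ]

def populations(state):
    """Measurement probabilities for the spin-up and spin-down states."""
    return [abs(amplitude) ** 2 for amplitude in state]

ground = [1 + 0j, 0j]                          # spin aligned with the static field
flipped = apply_gate(rx(math.pi), ground)      # pi pulse: full inversion (a NOT gate up to phase)
half = apply_gate(rx(math.pi / 2), ground)     # pi/2 pulse: equal superposition

print(populations(flipped))
print(populations(half))
```

A pi pulse moves essentially all population to the second state, while a pi/2 pulse splits it evenly. A more realistic model would add relaxation (T1, T2) and pulse imperfections on top of this.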

Electronic Properties Modeling of 2D Materials

2D Materials, Computational Physics, DFT, Graphene, MoS2

PhD thesis: modeled the electronic properties of two-dimensional materials including graphene and MoS2, investigating transport behavior, band structure, and interface effects using ab initio and tight-binding methods.
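
For readers unfamiliar with tight-binding methods, the textbook nearest-neighbor model of graphene gives a compact example. This is a minimal sketch, not the thesis implementation: the hopping energy of 2.7 eV and the carbon-carbon distance of 1.42 Å are commonly quoted values assumed here, and the function name `bands` is made up for the example.

```python
import cmath
import math

T_HOP = 2.7   # eV, assumed nearest-neighbor hopping energy
A_CC = 1.42   # angstrom, assumed carbon-carbon distance

# Lattice vectors of the honeycomb lattice in units of angstrom
A1 = (1.5 * A_CC, math.sqrt(3) / 2 * A_CC)
A2 = (1.5 * A_CC, -math.sqrt(3) / 2 * A_CC)

def bands(kx, ky, t=T_HOP):
    """Return (valence, conduction) energies in eV at wave vector (kx, ky)."""
    f = (
        1
        + cmath.exp(1j * (kx * A1[0] + ky * A1[1]))
        + cmath.exp(1j * (kx * A2[0] + ky * A2[1]))
    )
    energy = t * abs(f)
    return (-energy, energy)

GAMMA = (0.0, 0.0)
K_POINT = (2 * math.pi / (3 * A_CC), 2 * math.pi / (3 * math.sqrt(3) * A_CC))

print(bands(*GAMMA))    # bands are farthest apart at the zone center
print(bands(*K_POINT))  # bands touch at the Dirac point
```

In this model the two bands are separated by 6t at the zone center and meet at the K point, which is the origin of graphene's linear dispersion near the Fermi level.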

Kaggle Competitions

Machine Learning, Deep Learning, Competition, Data Science

Competed in 14 Kaggle competitions spanning computer vision, NLP, time series, and tabular data, earning Competition Expert status with 2 silver and 2 bronze medals. Peak global rank: 666 out of 200,000+ competitors.

SambaNova Systems

PyPEF Parser Development

C/C++, Python, pybind

Developed a C/C++ parser for an internal binary format (PEF) during my first week at SambaNova and bridged it to Python via pybind. The PR was merged by the end of that week.

HIPNN Model Optimization

ML Optimization, Profiling, Compiler, Kernels

Led integration and optimization of the HIPNN machine learning model to run 3x faster on SambaNova hardware than on NVIDIA V100, coordinating across compiler, kernels, and AI framework teams.

PyPEF-Based Runtime Validation

Python, Reliability, Validation

Designed and built a Python-based dynamic PEF validator that proactively catches symbol mismatches before they reach hardware, eliminating system crashes that previously required reboots.

Gradient Sync Overlapping (GSO)

Distributed Training, Patent, BERT, GPT-3

Optimized distributed training by overlapping gradient synchronization with compute via pipeline parallelism, achieving a 7% performance boost on BERT and GPT-3. The work led to a US patent filing.

Modelbox Pipeline Bringup

Kubernetes, gRPC, Docker, Golang, CI/CD

Redesigned an unstable ML platform into a production-grade Kubernetes and gRPC-based deployment pipeline, enabling 10+ model deployments for Accenture and onboarding the first generative AI model.

Composition of Experts (CoE) Model Scaling

MoE, Production ML, SambaStudio

Transformed an experimental 150-expert LLaMA 7B Mixture-of-Experts proof-of-concept into a production-ready endpoint on SambaStudio, standardizing interfaces, artifact management, and deployment practices.

Dynamic Batching for Enhanced Hardware Utilization

LLaMA 2, gRPC, Redis, Performance

Integrated dynamic batching into the app server for LLaMA 2 (7B and 70B), achieving notable improvements in Time-to-First-Token, throughput, and hardware utilization for a major customer release.
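
The general idea behind dynamic batching can be sketched in a few lines: requests are held briefly and dispatched together once a batch is full or the oldest waiting request has waited long enough. The policy below is a simplified simulation under assumed parameters, not the production app server; `form_batches`, `max_batch_size`, and `max_wait_ms` are hypothetical names chosen for the example.

```python
def form_batches(arrivals_ms, max_batch_size, max_wait_ms):
    """Group sorted request arrival times (in ms) into dispatch batches.

    A batch closes when it is full, or when a new request arrives more than
    max_wait_ms after the first request in the batch.
    """
    batches = []
    current = []
    for t in arrivals_ms:
        batch_full = len(current) == max_batch_size
        waited_too_long = bool(current) and t - current[0] > max_wait_ms
        if batch_full or waited_too_long:
            batches.append(current)
            current = []
        current.append(t)
    if current:
        batches.append(current)
    return batches

# A burst of four requests, then a quiet period, then two more
arrivals = [0, 10, 20, 30, 500, 510]
print(form_batches(arrivals, max_batch_size=3, max_wait_ms=50))
# → [[0, 10, 20], [30], [500, 510]]
```

The two limits trade off against each other: a larger batch size improves hardware utilization and throughput, while a shorter wait bound protects Time-to-First-Token for requests that arrive during quiet periods.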

Artifact Dashboard for Streamlined Model Deployment

Next.js, Material UI, Frontend, MLOps

Built a self-service artifact dashboard in Next.js and Material UI that enabled product, PM, and SRE teams to independently track and manage model checkpoints, accelerating deployment workflows.

Modelbox Testing Infrastructure Improvement

Bazel, CI/CD, Jenkins, Testing

Overhauled the modelbox testing framework with mock integration tests and Bazel-based caching, cutting app server + model integration test time from over an hour to seconds.
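
The speedup from mock integration tests comes from exercising the server logic without loading a real model. A minimal sketch of the pattern using Python's standard `unittest.mock` follows; the `AppServer` class and its `handle` method are hypothetical stand-ins, not the modelbox code.

```python
from unittest import mock

class AppServer:
    """Hypothetical app server that forwards prompts to a model backend."""

    def __init__(self, model):
        self.model = model

    def handle(self, prompt):
        if not prompt:
            return {"status": 400, "error": "empty prompt"}
        return {"status": 200, "text": self.model.generate(prompt)}

def run_mock_test():
    # Replace the slow model with a mock that returns a canned completion
    fake_model = mock.Mock()
    fake_model.generate.return_value = "stub completion"
    server = AppServer(fake_model)

    ok = server.handle("hello")
    bad = server.handle("")

    # The model is called exactly once; the empty prompt never reaches it
    fake_model.generate.assert_called_once_with("hello")
    return ok, bad

ok_response, bad_response = run_mock_test()
print(ok_response, bad_response)
```

Because nothing in the test depends on a checkpoint or an accelerator, its result is also easy for a build system to cache across runs.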

Transformers Version Compatibility Consulting

Debugging, Root Cause Analysis, HuggingFace Transformers

Unblocked a critical release that had been stalled for several days by a Transformers 4.45.1 tokenizer regression: led root cause analysis, proposed a rollback to 4.43.2, and enabled on-time QA and release.

Argument Management with Hydra + Pydantic

Hydra, Pydantic, MLOps, Architecture

Solved a growing argument management pain point across ML models by designing a Hydra + Pydantic hybrid configuration system, adopted as a best practice across the ML stack.

Amazon Web Services

Security Data Management Platform

Data Platform, Security, AWS, AI Infrastructure

Architected and implemented key components of a security data management platform to analyze data from AI products across AWS, enabling secure data handling, access control, and insights at scale.

AWS DevOps Agent Evaluation & Improvement

AI Agents, DevOps, Evaluation, LLM

Built out evaluation infrastructure and drove performance and accuracy improvements for AWS DevOps AI agents, establishing robust benchmarks and identifying key areas for model and system-level gains.

© 2026 Kuan Zhou. Crafted using Gatsby framework.