Kuan Zhou
I am an engineer focusing on ML/AI systems, including agentic AI systems (coding agents and data security), distributed training (with US patents), inference service performance, Kubernetes-based AI platform engineering, and MLOps. I also have a keen interest in building applications that leverage generative AI and in understanding the mathematics and physics behind neural networks.
Before immersing myself in AI systems, I worked in physics research. I developed mathematical analysis, research, and programming skills during my undergraduate studies in Physics (thesis: computational simulation for NMR-based quantum computing systems, advised by Prof. Xinhua Peng and Prof. Jiangfeng Du) at the University of Science and Technology of China, and during my PhD in Computational Physics (thesis: electronic properties modeling of two-dimensional materials, advised by Prof. Roger Lake) at the LATTE lab at the University of California, Riverside.
My journey from Physics to ML/AI started with reading news about ML/AI, attending ML/AI seminars in Prof. Linli Xu's group, taking ML courses in the CS department, participating in Kaggle competitions, and completing the Insight Data Science bootcamp. My passion for math and physics was ignited in high school by reading inspiring stories about Albert Einstein and Richard Feynman and by participating in Math and Physics Olympiads.
In my spare time, I enjoy films, music, hiking, camping, biking, traveling, trying new foods, and spending time with my family and our two cats, Gemma (orange tabby) and Nova (ragdoll).
Passion
Exploring the synergy between science and technology, building AI applications, understanding the math and physics behind neural networks.
Tech Stack
Proficient in, familiar with, or able to contribute after a brief learning period
Programming Languages
Python, Golang, C/C++, Java, Kotlin, JavaScript/TypeScript, Rust
AI Frameworks
PyTorch, HF Transformers, JAX, TensorFlow, Triton, CUDA
Distributed Systems
Torch Distributed, Megatron-LM, DeepSpeed
ML Platforms
Docker, gRPC, Kubernetes, Istio, OpenTelemetry, Kubebuilder
MLOps
MLflow, Weights & Biases, BentoML, Flyte, Kubeflow, Hydra
ML Compilers
MLIR, LLVM, TVM
Service Serving
vLLM, Triton Inference Server, Text Generation Inference
AI applications
Electron, Swift/SwiftUI, Streamlit
Frontend
React, NextJS, Material UI, TailwindCSS, FastAPI
Databases
PostgreSQL, DynamoDB, BoltDB, SQL
Scientific Tools
Mathematica, Julia, Matlab, LaTeX
Others
Bazel, AWS CDK, Mermaid, Pybind, Pydantic, JsonSchema, Spark, Hadoop, ORTools, Numba
Experience
Software Dev Engineer II, Amazon Web Services
March 2025 - Present, Seattle, WA
- Led the design and implementation of a scalable data management system with lakehouse architecture, enhancing security, accelerating processing, and improving data availability and onboarding
- Developed and delivered an approved data egress pipeline with automated redaction, leveraging Amazon Comprehend and Bedrock to enable research on highly sensitive data; processed terabytes of data to uncover insights driving product improvements
- Mentored an intern in building a Git-like synchronization VSCode IDE extension, accelerating developer workflows and reducing code integration friction
- Drove internal dogfooding initiatives, identifying pain points and proposing enhancements that improved service reliability and application usability
Staff Engineer - Machine Learning, SambaNova Systems
April 2020 - February 2025, Palo Alto, CA
- Tech lead for containerizing and deploying generative AI models onto the Kubernetes platform SambaStudio
  - Led a team of 5+ engineers to deploy foundation-model-based solutions to business customers
  - Prototyped the generative AI model deployment pipeline and Kubernetes platform
  - Built general and extensive infrastructure for continuous model integration and deployment
  - Standardized the model bring-up and integration procedure by refactoring ML applications
- Co-designed and co-developed distributed learning infrastructure for extremely large models
  - Overlapping gradient synchronization in machine learning
  - System for executing an application on heterogeneous reconfigurable processors
  - System of heterogeneous reconfigurable processors for the data parallel execution of applications
- Contributed to core features of the SambaNova AI framework
  - Designed, implemented, and maintained a binary data extractor as a bridge between compiler and runtime
  - Refactored and upgraded the AI framework codebase to support functional-programming-style dataflow execution
  - Implemented various deep learning operators end to end, from low-level compiler kernels to the AI framework
  - Optimized performance of deep learning models (HIPNN etc.) based on the SambaNova AI framework and dataflow architecture
  - Integrated TensorBoard into the SambaNova AI framework as a visualization and accuracy debugging tool
Software Engineer - Machine Learning, Petuum Inc.
February 2019 - March 2020, Sunnyvale, CA
- Leveraged OCR engines and deep learning models to process logistics bills automatically with 0.87 accuracy
- Collaborated on the implementation of various anomaly detection models for equipment health prediction
- Contributed to machine learning pipeline refactoring and model improvement across various use cases
Artificial Intelligence Fellow, Insight Data Science
June 2018 - September 2018, San Francisco, CA
- Architected SketchTML, which takes in several hand-drawn sketches and produces an interactive HTML website
- Leveraged the pix2code framework to build a more robust image captioning model with different styles
- Improved the BLEU score to as high as 0.88 through inventive data augmentation methods and weighted loss functions
Publications
US patents
- Auto-discovery module for the discovery of reconfigurable processors in a pool of heterogeneous reconfigurable processors, August 2024
- System for executing an application on heterogeneous reconfigurable processors, May 2024
- Overlapping gradient synchronization in machine learning, August 2023
- System of heterogeneous reconfigurable processors for the data parallel execution of applications, July 2023
Journal articles
- Tunable Lifshitz transitions and multiband transport in tetralayer graphene, Physical Review Letters, February 2018
- Interlayer resistance of misoriented MoS2, Phys. Chem. Chem. Phys., Royal Society of Chemistry, March 2017
Conferences
- Exploiting electron-hole asymmetry at a misoriented MoS2 interface in bipolar device design, TECHCON, September 2018
- Edge states of ABA trilayer Graphene nanoribbons, APS, March 2018
- The interlayer resistance of a misoriented bilayer MoS2 interface, APS, March 2016
- Transport properties across misoriented bilayer MoS2 using Ab-initio calculations, APS, March 2015
Education
PhD in Computational Physics, University of California, Riverside
September 2013 - December 2018, Riverside, CA
BSc in Physics, University of Science and Technology of China
Zhongyao Zhao Applied Physics Elite Class
August 2009 - June 2013, Hefei, China