
HIPNN Model Optimization

ML Optimization · Profiling · Compiler · Kernels
Led integration and optimization of the HIPNN machine learning model to run 3x faster on SambaNova hardware than on an NVIDIA V100, coordinating across the compiler, kernels, and AI framework teams.

Situation

My first opportunity as a lead engineer was the HIPNN project. The goal was to optimize a machine learning model to run 3x faster on our hardware stack than on an NVIDIA V100, a demanding benchmark. The challenge was compounded by the fact that I was new to the internal stack, and that key tools, including the compiler, custom kernels, and the AI framework, were still under active development.

Task

Successfully integrate and optimize the HIPNN model by working across teams to fill gaps in the stack, and deliver performance that would meet customer expectations.

Actions

  • Collaborated closely with both the compiler and kernels teams, leveraging their domain expertise while keeping everyone aligned on shared goals.
  • Generated a Graphviz-based neural network graph annotated with compiler attributes and exported it as a PDF — this visual became a key tool for tracking milestones and coordinating across teams.
  • Led kernel bring-up and model profiling, with a particular focus on performance-critical operations like index_add and index_select.
  • Resolved precision issues by implementing multiprecision handling, and implemented custom backward operators in autograd where the framework did not yet provide them.
  • Continuously tested and profiled the model throughout, ensuring every optimization was validated under realistic conditions.
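The annotated graph visual mentioned above can be sketched roughly as follows: a small Python script that emits Graphviz DOT source, folding per-op compiler attributes into each node's label so they appear in the rendered PDF. The op names and attributes here are illustrative placeholders, not the actual HIPNN graph or internal attribute names.

```python
# Sketch: emit Graphviz DOT for a model graph annotated with compiler
# attributes. All op names, attributes, and statuses below are made up
# for illustration; the real graph and attribute set were internal.

def to_dot(ops):
    """Render a list of (name, attrs, inputs) tuples as DOT source."""
    lines = ["digraph model {", "  node [shape=record];"]
    for name, attrs, inputs in ops:
        # Fold compiler attributes into the node label so they show
        # up in the rendered PDF next to each op.
        attr_str = r"\n".join(f"{k}={v}" for k, v in attrs.items())
        lines.append(f'  {name} [label="{{{name}|{attr_str}}}"];')
        for src in inputs:
            lines.append(f"  {src} -> {name};")
    lines.append("}")
    return "\n".join(lines)

ops = [
    ("embed",        {"precision": "fp32", "status": "done"},    []),
    ("index_select", {"precision": "fp32", "status": "wip"},     ["embed"]),
    ("dense0",       {"precision": "bf16", "status": "done"},    ["index_select"]),
    ("index_add",    {"precision": "fp32", "status": "blocked"}, ["dense0"]),
]

dot = to_dot(ops)
print(dot)  # pipe into `dot -Tpdf -o graph.pdf` to produce the shareable PDF
```

Keeping the milestone status inside the graph itself meant one artifact could serve both as a technical map of the model and as a cross-team progress tracker.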
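The performance-critical ops called out above, index_add and index_select, are scatter-add and gather operations. Their reference semantics can be illustrated in NumPy (np.add.at is the unbuffered scatter-add analog of a framework index_add); this is only a semantics sketch, not the optimized hardware kernel.

```python
import numpy as np

# Reference semantics for the two performance-critical ops, sketched
# with NumPy analogs. These are illustrative, not the actual kernels.

def index_select(x, indices):
    """Gather rows of x at the given indices."""
    return x[indices]

def index_add(x, indices, values):
    """Scatter-add rows of values into x at indices, out of place.
    np.add.at accumulates correctly even when indices repeat."""
    out = x.copy()
    np.add.at(out, indices, values)
    return out

x = np.zeros((4, 2))
idx = np.array([0, 2, 0])          # note the repeated index 0
vals = np.ones((3, 2))

y = index_add(x, idx, vals)        # row 0 accumulates two contributions
g = index_select(y, np.array([0])) # gather row 0
```

The repeated-index case is what makes scatter-add expensive to parallelize, and is exactly where kernel bring-up and profiling effort tends to concentrate.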
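The multiprecision handling mentioned above typically amounts to storing values in a narrow format while accumulating in a wider one. A toy NumPy illustration of why this matters (the actual precision strategy on the hardware was internal and is not reproduced here):

```python
import numpy as np

# Toy illustration of mixed-precision accumulation: summing many small
# fp16 values with an fp16 accumulator stalls once the running sum is
# large enough that 0.1 falls below the rounding step; accumulating in
# fp32 and casting back at the end stays accurate.

vals = np.full(10000, 0.1, dtype=np.float16)

naive = np.float16(0.0)
for v in vals:
    naive = np.float16(naive + v)   # fp16 accumulator: rounds badly

mixed = np.float16(vals.astype(np.float32).sum())  # fp32 accumulator

# naive stalls far below the true sum (~1000); mixed stays close.
```

The same reasoning applies per op: accumulation-heavy operators such as index_add are kept in a wider precision while the surrounding compute can stay narrow.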

Result

We delivered the HIPNN model at the targeted 3x performance improvement over the V100. Customers were highly satisfied with the result, and the project demonstrated our team's capability to handle advanced AI workloads on novel hardware.

© 2026 Kuan Zhou. Crafted using Gatsby framework.