Argument Management with Hydra + Pydantic

Tags: Hydra, Pydantic, MLOps, Architecture

Solved a growing argument management pain point across ML models by designing a Hydra + Pydantic hybrid configuration system, adopted as a best practice across the ML stack.

Situation

When I joined the team, all ML models shared a single argument parser. As the number of models grew, this created a web of cross-dependencies — a change for one model could break another, and navigating configurations became increasingly painful and error-prone for the entire team.
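To make the coupling concrete, here is a minimal sketch of one way a shared parser can break, using only the standard library's argparse. The flag name, the defaults, and the two helper functions are hypothetical illustrations, not the team's actual code.

```python
import argparse


def build_shared_parser() -> argparse.ArgumentParser:
    """Hypothetical shared parser that every model registers its flags on."""
    parser = argparse.ArgumentParser()
    # Model A registered this flag first (hypothetical flag and default).
    parser.add_argument("--hidden-size", type=int, default=768)
    return parser


def add_model_b_args(parser: argparse.ArgumentParser) -> None:
    """Model B wants the same flag with a different default (hypothetical)."""
    parser.add_argument("--hidden-size", type=int, default=1024)


parser = build_shared_parser()
try:
    add_model_b_args(parser)
    conflict = None
except argparse.ArgumentError as err:
    # argparse rejects a second registration of the same option string.
    conflict = str(err)

print(conflict)
```

Working around a clash like this usually means renaming or prefixing flags, and that in turn touches every script that already passes the old name. This is the kind of cross-model breakage described above.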

Task

Research, design, and implement a scalable, modular configuration solution that would eliminate cross-dependencies, improve developer experience, and align with broader industry standards.

Actions

Evaluated the configuration approaches most widely used in the ML community: dataclasses (as used in Hugging Face Transformers), Hydra by Meta, Pydantic, and JSON Schema. Compared them on documentation quality, flexibility, and community adoption.
Built prototypes of each approach to assess real-world practicality. Hydra stood out for its YAML-based composition, autocomplete support, and configuration inheritance — but lacked internal reflection capabilities we needed.
Found a design pattern combining Hydra and Pydantic that addressed the reflection gap. Built a working prototype on BERT Large and documented the design in detail.
Presented the design to the team, highlighting key benefits: modular per-model configurations, schema validation via Pydantic, and alignment with industry best practices. Created full design docs for transparency and reproducibility.
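The validation half of the pattern can be sketched roughly as follows. This is an illustration under stated assumptions, not the production design: the class names, field names, and defaults are hypothetical, and a plain dict stands in for the configuration that Hydra would compose from YAML files. In a Hydra application that dict would typically be obtained with `OmegaConf.to_container(cfg, resolve=True)` inside a function decorated with `@hydra.main`.

```python
from typing import Any, Dict

from pydantic import BaseModel, Field, ValidationError


class OptimizerConfig(BaseModel):
    """Hypothetical optimizer schema; field names are illustrative."""
    name: str = "adamw"
    lr: float = 1e-4


class BertLargeConfig(BaseModel):
    """Hypothetical per-model schema, kept separate from other models."""
    hidden_size: int = 1024
    num_layers: int = 24
    optimizer: OptimizerConfig = Field(default_factory=OptimizerConfig)


def validate_config(raw: Dict[str, Any]) -> BertLargeConfig:
    """Validate a composed config dict against the model's schema.

    Assumption: `raw` is what Hydra produced after composing YAML files
    and applying command-line overrides, converted to a plain dict.
    """
    return BertLargeConfig(**raw)


# A valid override: the nested dict is parsed into OptimizerConfig.
good = validate_config({"hidden_size": 1024, "optimizer": {"lr": 3e-5}})

# An invalid override is rejected before any training code runs.
try:
    validate_config({"hidden_size": "large"})
    error_count = 0
except ValidationError as err:
    error_count = len(err.errors())
```

Because each model owns its schema class, a field added for one model cannot silently change another model's arguments. The schema can also be inspected at runtime, which is one plausible reading of the reflection gap mentioned above.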

Result

The Hydra + Pydantic configuration system was adopted across modelbox, ML applications, and AI infrastructure. It eliminated cross-model dependencies, strengthened configuration validation, and simplified the workflow for integrating new models. The team recognized the approach as a best practice.