Ask HN: How does your data science or machine learning team handle DevOps?
Machine learning teams often face operational needs not seen in many other domains. Some examples:

- instrumenting observability that monitors not only data quality and upstream ETL job status, but also domain-specific considerations of training ML models, like overfitting, confusion matrices, business use-case accuracy or validation checks, ROC curves, and more (all needing to be customized and centrally reported per model-training task).

- standardizing end-to-end tooling for special resources, e.g. queueing and batching to keep utilization high on production GPU systems, high-RAM use cases like approximate nearest neighbor indexes, and run-of-the-mill tasks like taking a trained model and deploying it behind a microservice in a way that bakes in logging, tracing, alerting, and more.

Machine learning engineers and data scientists tend to have a comparative advantage when they can focus on understanding the data, running experiments to decide which models are best, pairing with
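To make the first bullet concrete, here is a minimal sketch of per-run training metrics bundled into one report that every model-training task could ship to a central monitoring store. The function names, report keys, and the `accuracy_floor` validation threshold are all illustrative assumptions, not any particular team's schema:

```python
# Sketch: compute validation metrics per training run and bundle them
# into a single report dict for central monitoring. All names here are
# hypothetical; a real setup would push this to a metrics backend.

def confusion_matrix(y_true, y_pred):
    """2x2 confusion matrix for binary labels: [[tn, fp], [fn, tp]]."""
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return [[tn, fp], [fn, tp]]

def training_report(model_name, y_true, y_pred, accuracy_floor=0.9):
    """One report per model-training task, including a hard
    business validation check rather than just raw numbers."""
    (tn, fp), (fn, tp) = confusion_matrix(y_true, y_pred)
    accuracy = (tn + tp) / len(y_true)
    return {
        "model": model_name,
        "confusion_matrix": [[tn, fp], [fn, tp]],
        "accuracy": accuracy,
        # fail the run outright if the business floor is not met
        "passed_validation": accuracy >= accuracy_floor,
    }

report = training_report("churn-v3", [1, 0, 1, 1, 0], [1, 0, 0, 1, 0])
# report["accuracy"] -> 0.8, so "passed_validation" is False
```

The point of the report dict is that every training job, regardless of model type, emits the same shape of payload, which is what makes central alerting on overfitting or validation failures tractable.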
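The queueing-and-batching point from the second bullet can be sketched as simple micro-batching: requests accumulate in a queue and are drained in fixed-size batches so each forward pass on the GPU amortizes its launch overhead. The batch size and `drain_batches` helper are assumptions for illustration only:

```python
import queue

# Sketch of micro-batching to keep an accelerator busy: callers enqueue
# individual requests, and a worker drains them in fixed-size batches.
# "max_batch" and the helper name are illustrative, not a real API.

def drain_batches(q, max_batch=4):
    """Pull everything currently queued and group it into batches
    of at most max_batch items each."""
    items = []
    while True:
        try:
            items.append(q.get_nowait())
        except queue.Empty:
            break
    return [items[i:i + max_batch] for i in range(0, len(items), max_batch)]

q = queue.Queue()
for request_id in range(10):
    q.put(request_id)

batches = drain_batches(q, max_batch=4)
# batches -> [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```

A production version would add a timeout so a half-full batch still flushes under low traffic, which is the usual latency/utilization trade-off.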
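For the "deploy a trained model behind a microservice with logging baked in" case, one common shape is a wrapper that every deployed predict function passes through, so latency logging (and, in a real system, tracing and alert hooks) comes for free. The `serve` wrapper and the stand-in model below are hypothetical:

```python
import logging
import time

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("model-service")

def serve(predict):
    """Wrap a predict callable so every request emits a latency log line.
    A real service wrapper would also attach tracing spans and alerting;
    this just shows the 'bake it in once' pattern."""
    def handler(features):
        start = time.perf_counter()
        result = predict(features)
        log.info("predict latency_ms=%.2f", (time.perf_counter() - start) * 1000)
        return result
    return handler

# stand-in model: any callable taking features and returning a score
handler = serve(lambda xs: sum(xs) / len(xs))
score = handler([1.0, 2.0, 3.0])
# score -> 2.0
```

Because the wrapper is applied at deploy time rather than written per model, individual data scientists never have to remember to add logging themselves.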