scitix
Repositories
- sichek Public
Sichek is a tool for detecting and diagnosing node-level issues in AI environments, ensuring the reliability and high performance of GPU-intensive workloads. It proactively identifies hardware and software problems and promptly triggers automated corrective actions, including task retries and operational maintenance.
- netpulse Public
- aegis Public
Aegis is an LLM-powered autonomous operations system for AI clusters, focused on intelligent capabilities such as fault diagnosis, self-healing, root cause analysis, cluster inspection, and alert optimization.
- Megatron-LM Public (forked from NVIDIA/Megatron-LM)
Ongoing research training transformer models at scale
- deep_learning_examples Public (forked from sallylxl/deep_learning_examples)
Contains example scripts for deep learning
People
This organization has no public members. You must be a member to see who’s a part of this organization.