DeepNVMe update #966

Merged 50 commits on Jun 9, 2025
Commits
8106bb8
Fast model checkpointing
tjruwase Dec 30, 2021
761e4e5
Support both legacy and serialized formats
tjruwase Dec 31, 2021
5967c79
Add io_buffer_mb option
tjruwase Jan 3, 2022
d96f1f6
Bug fix
tjruwase Jan 3, 2022
bbd96f2
Force flush
tjruwase Jan 3, 2022
3a16127
More model options; Refactor common codes
tjruwase Jan 4, 2022
c3df495
--gpu option
tjruwase Jan 5, 2022
315f02a
--half and more flexible options
tjruwase Jan 5, 2022
a41ba08
Add deepspeed.save_checkpoint()
tjruwase Jan 8, 2022
4fcb060
Free ds memory
tjruwase Jan 8, 2022
a49c542
Improve repro
tjruwase Jan 8, 2022
233b9e9
Double I/O buffer (#56)
tjruwase Feb 22, 2022
b1f02b2
Double I/O buffer (#60)
tjruwase Mar 11, 2022
a16ac9e
Add checkpoint comparison (#62)
jerryyangli Mar 15, 2022
b945adc
save_checkpoint perf monitoring
tjruwase Mar 19, 2022
2c7a5ed
Merge branch 'staging-fast-model-checkpoint-v2' of github.com:microso…
tjruwase Mar 19, 2022
64a8f75
Disable checkpoint save on exit
tjruwase Mar 22, 2022
44b8664
Perf statistics for save_checkpoint (#64)
tjruwase Mar 22, 2022
ff4bd69
add logs for a100-80
GuanhuaWang Sep 21, 2022
e4817a1
add torch* error log with half flag but without fused flag
GuanhuaWang Sep 22, 2022
b297e17
log for error
GuanhuaWang Sep 22, 2022
f05dab1
local rank arg
tjruwase Oct 5, 2022
fc4291f
Merge branch 'staging-fast-model-checkpoint-v2' of github.com:microso…
tjruwase Oct 5, 2022
db295f1
Merge branch 'staging-fast-model-checkpoint-v2' of github.com:microso…
tjruwase Oct 5, 2022
1aa971a
Handle local_rank arg (#78)
tjruwase Oct 5, 2022
98b2f8a
Single writer option
tjruwase Oct 5, 2022
2e42285
Single writer option (#79)
tjruwase Oct 5, 2022
09dbd8a
Merge branch 'staging-fast-model-checkpoint-v3' of github.com:microso…
tjruwase Oct 7, 2022
a567adf
Allow missing folder
tjruwase Oct 12, 2022
65793bd
DP writer refactor
tjruwase Feb 10, 2023
5bfdf04
Update for DS; Add GDS
tjruwase Feb 12, 2025
9a27914
Integrate GDS into deepspeed_model_save
tjruwase Feb 20, 2025
53572f8
Rebase fast persist
tjruwase Feb 25, 2025
515dded
Rebase fast persist (#184)
tjruwase Feb 25, 2025
d01aa27
Move folder
tjruwase Mar 26, 2025
e5a316f
Merge branch 'olruwase/fast_persist' of github.com:microsoft/DeepSpee…
tjruwase Mar 26, 2025
4059f80
Remove folder
tjruwase Mar 26, 2025
1c3a54c
More cleanup
tjruwase Mar 26, 2025
9a8540b
torch changes
tjruwase Mar 27, 2025
ee2f081
sglang+zero_inference
tjruwase Apr 7, 2025
ad81cec
Remove file
tjruwase Apr 7, 2025
dff5274
Add offload configs
tjruwase Apr 8, 2025
d84bb56
Add pin_memory
tjruwase Apr 8, 2025
db3b32b
Cleanup scripts
tjruwase Apr 8, 2025
6ee91cb
SGLang README
tjruwase Apr 12, 2025
e283b74
Remove file
tjruwase Apr 12, 2025
54872e1
Merge branch 'master' into olruwase/fast_persist
tjruwase Apr 14, 2025
d971d84
Merge branch 'master' into olruwase/fast_persist
loadams May 15, 2025
3decf3d
Merge branch 'master' into olruwase/fast_persist
hwchen2017 May 23, 2025
0512775
Merge branch 'master' into olruwase/fast_persist
PKUWZP Jun 9, 2025
Merge branch 'master' into olruwase/fast_persist
loadams authored May 15, 2025
commit d971d841e168db47f107e0c581efbe7f65977eb9

This merge commit was added into this branch cleanly and shows no new changes.