
Commit 9e244f5

d-netto authored and RAI CI (GitHub Action Automation) committed
Run GC on multiple threads (JuliaLang#48600)
1 parent b5691ab commit 9e244f5

33 files changed, with 1540 additions and 1563 deletions.

NEWS.md

Lines changed: 3 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -38,6 +38,7 @@ Compiler/Runtime improvements
3838
* All uses of the `@pure` macro in `Base` have been replaced with the now-preferred `Base.@assume_effects` ([#44776]).
3939
* `invoke(f, invokesig, args...)` calls to a less-specific method than would normally be chosen
4040
for `f(args...)` are no longer spuriously invalidated when loading package precompile files ([#46010]).
41+
* The mark phase of the Garbage Collector is now multi-threaded ([#48600]).
4142

4243
Command-line option changes
4344
---------------------------
@@ -49,6 +50,8 @@ Command-line option changes
4950
number of interactive threads to create (`auto` currently means 1) ([#42302]).
5051
* New option `--heap-size-hint=<size>` suggests a size limit to invoke garbage collection more eagerly.
5152
The size may be specified in bytes, kilobytes (1000k), megabytes (300M), or gigabytes (1.5G) ([#45369]).
53+
* New option `--gcthreads` to set how many threads will be used by the Garbage Collector ([#48600]).
54+
The default is set to `N/2` where `N` is the amount of worker threads (`--threads`) used by Julia.
5255

5356
Multi-threading changes
5457
-----------------------

base/options.jl

Lines changed: 1 addition & 0 deletions
@@ -11,6 +11,7 @@ struct JLOptions
1111
cpu_target::Ptr{UInt8}
1212
nthreadpools::Int16
1313
nthreads::Int16
14+
ngcthreads::Int16
1415
nthreads_per_pool::Ptr{Int16}
1516
nprocs::Int32
1617
machine_file::Ptr{UInt8}

base/threadingconstructs.jl

Lines changed: 7 additions & 0 deletions
@@ -136,6 +136,13 @@ function threadpooltids(pool::Symbol)
136136
end
137137
end
138138

139+
"""
140+
Threads.ngcthreads() -> Int
141+
142+
Returns the number of GC threads currently configured.
143+
"""
144+
ngcthreads() = Int(unsafe_load(cglobal(:jl_n_gcthreads, Cint))) + 1
145+
139146
function threading_run(fun, static)
140147
ccall(:jl_enter_threaded_region, Cvoid, ())
141148
n = threadpoolsize()
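
The `+ 1` in `Threads.ngcthreads()` is easy to miss. Read together with the `jl_n_gcthreads != 0` checks in `src/gc-debug.c`, it suggests that `jl_n_gcthreads` holds the number of additional GC threads, so a value of 0 means single-threaded collection and the reported total is one higher. A minimal C sketch of that convention follows; the global and both helpers are hypothetical stand-ins, not runtime code.

```c
/* Hypothetical stand-in for the runtime global that the Julia code reads
 * with cglobal(:jl_n_gcthreads, Cint). Assumed to count the GC threads
 * beyond the first one. */
static int n_extra_gc_threads = 0;

/* Mirrors the Julia definition: stored value plus one. */
static int reported_gc_threads(void)
{
    return n_extra_gc_threads + 1;
}

/* Mirrors the guard used by gc_verify: zero extra threads means the
 * mark phase runs single-threaded. */
static int gc_is_single_threaded(void)
{
    return n_extra_gc_threads == 0;
}
```

Under this reading, `Threads.ngcthreads()` never returns less than 1.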

doc/man/julia.1

Lines changed: 5 additions & 0 deletions
@@ -118,6 +118,11 @@ supported (Linux and Windows). If this is not supported (macOS) or
118118
process affinity is not configured, it uses the number of CPU
119119
threads.
120120

121+
.TP
122+
--gcthreads <n>
123+
Enable n GC threads; If unspecified is set to half of the
124+
compute worker threads.
125+
121126
.TP
122127
-p, --procs {N|auto}
123128
Integer value N launches N additional local worker processes `auto` launches as many workers

doc/src/base/multi-threading.md

Lines changed: 1 addition & 0 deletions
@@ -10,6 +10,7 @@ Base.Threads.nthreads
1010
Base.Threads.threadpool
1111
Base.Threads.nthreadpools
1212
Base.Threads.threadpoolsize
13+
Base.Threads.ngcthreads
1314
```
1415

1516
See also [Multi-Threading](@ref man-multithreading).

doc/src/manual/command-line-interface.md

Lines changed: 3 additions & 2 deletions
@@ -106,8 +106,9 @@ The following is a complete list of command-line switches available when launchi
106106
|`-e`, `--eval <expr>` |Evaluate `<expr>`|
107107
|`-E`, `--print <expr>` |Evaluate `<expr>` and display the result|
108108
|`-L`, `--load <file>` |Load `<file>` immediately on all processors|
109-
|`-t`, `--threads {N\|auto`} |Enable N threads; `auto` tries to infer a useful default number of threads to use but the exact behavior might change in the future. Currently, `auto` uses the number of CPUs assigned to this julia process based on the OS-specific affinity assignment interface, if supported (Linux and Windows). If this is not supported (macOS) or process affinity is not configured, it uses the number of CPU threads.|
110-
|`-p`, `--procs {N\|auto`} |Integer value N launches N additional local worker processes; `auto` launches as many workers as the number of local CPU threads (logical cores)|
109+
|`-t`, `--threads {N\|auto}` |Enable N threads; `auto` tries to infer a useful default number of threads to use but the exact behavior might change in the future. Currently, `auto` uses the number of CPUs assigned to this julia process based on the OS-specific affinity assignment interface, if supported (Linux and Windows). If this is not supported (macOS) or process affinity is not configured, it uses the number of CPU threads.|
110+
| `--gcthreads {N}` |Enable N GC threads; If unspecified is set to half of the compute worker threads.|
111+
|`-p`, `--procs {N\|auto}` |Integer value N launches N additional local worker processes; `auto` launches as many workers as the number of local CPU threads (logical cores)|
111112
|`--machine-file <file>` |Run processes on hosts listed in `<file>`|
112113
|`-i` |Interactive mode; REPL runs and `isinteractive()` is true|
113114
|`-q`, `--quiet` |Quiet startup: no banner, suppress REPL warnings|

doc/src/manual/environment-variables.md

Lines changed: 8 additions & 0 deletions
@@ -315,6 +315,14 @@ then spinning threads never sleep. Otherwise, `$JULIA_THREAD_SLEEP_THRESHOLD` is
315315
interpreted as an unsigned 64-bit integer (`uint64_t`) and gives, in
316316
nanoseconds, the amount of time after which spinning threads should sleep.
317317

318+
### [`JULIA_NUM_GC_THREADS`](@id env-gc-threads)
319+
320+
Sets the number of threads used by Garbage Collection. If unspecified is set to
321+
half of the number of worker threads.
322+
323+
!!! compat "Julia 1.10"
324+
The environment variable was added in 1.10
325+
318326
### `JULIA_EXCLUSIVE`
319327

320328
If set to anything besides `0`, then Julia's thread policy is consistent with

doc/src/manual/multi-threading.md

Lines changed: 9 additions & 0 deletions
@@ -72,6 +72,15 @@ julia> Threads.threadid()
7272
three processes have 2 threads enabled. For more fine grained control over worker
7373
threads use [`addprocs`](@ref) and pass `-t`/`--threads` as `exeflags`.
7474

75+
### Multiple GC Threads
76+
77+
The Garbage Collector (GC) can use multiple threads. The amount used is either half the number
78+
of compute worker threads or configured by either the `--gcthreads` command line argument or by using the
79+
[`JULIA_NUM_GC_THREADS`](@ref env-gc-threads) environment variable.
80+
81+
!!! compat "Julia 1.10"
82+
The `--gcthreads` command line argument requires at least Julia 1.10.
83+
7584
## [Threadpools](@id man-threadpools)
7685

7786
When a program's threads are busy with many tasks to run, tasks may experience
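
The documentation hunks above name three sources for the GC thread count: the `--gcthreads` option, the `JULIA_NUM_GC_THREADS` environment variable, and a default of half the worker threads. The sketch below shows one way such a lookup could be ordered. It is not the runtime's code: the helper name, the precedence of the option over the environment variable, the use of 0 for "unspecified", the integer division, and the floor of one thread are all assumptions made for illustration.

```c
#include <stdlib.h>

/* Hypothetical helper, not part of Julia's runtime.
 * cli_value: value passed to --gcthreads, or 0 if the option was absent
 *            (assumed convention).
 * n_worker_threads: the number of worker threads (--threads). */
static int resolve_gc_threads(int cli_value, int n_worker_threads)
{
    /* Assumption: an explicit command-line value wins. */
    if (cli_value > 0)
        return cli_value;

    /* Otherwise consult the environment variable named in the docs. */
    const char *env = getenv("JULIA_NUM_GC_THREADS");
    if (env != NULL) {
        char *end;
        long n = strtol(env, &end, 10);
        if (end != env && n > 0)
            return (int)n;
    }

    /* Default: half of the worker threads. Rounding down and never
     * going below one thread are assumptions. */
    int half = n_worker_threads / 2;
    return half > 0 ? half : 1;
}
```

With these assumptions, 8 worker threads and no explicit setting give 4 GC threads, and a single worker thread gives 1.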

src/Makefile

Lines changed: 1 addition & 1 deletion
@@ -99,7 +99,7 @@ ifeq ($(USE_SYSTEM_LIBUV),0)
9999
UV_HEADERS += uv.h
100100
UV_HEADERS += uv/*.h
101101
endif
102-
PUBLIC_HEADERS := $(BUILDDIR)/julia_version.h $(wildcard $(SRCDIR)/support/*.h) $(addprefix $(SRCDIR)/,julia.h julia_assert.h julia_threads.h julia_fasttls.h julia_locks.h julia_atomics.h jloptions.h)
102+
PUBLIC_HEADERS := $(BUILDDIR)/julia_version.h $(wildcard $(SRCDIR)/support/*.h) $(addprefix $(SRCDIR)/,work-stealing-queue.h julia.h julia_assert.h julia_threads.h julia_fasttls.h julia_locks.h julia_atomics.h jloptions.h)
103103
ifeq ($(OS),WINNT)
104104
PUBLIC_HEADERS += $(addprefix $(SRCDIR)/,win32_ucontext.h)
105105
endif

src/gc-debug.c

Lines changed: 44 additions & 153 deletions
@@ -198,21 +198,32 @@ static void restore(void)
198198

199199
static void gc_verify_track(jl_ptls_t ptls)
200200
{
201-
jl_gc_mark_cache_t *gc_cache = &ptls->gc_cache;
201+
// `gc_verify_track` is limited to single-threaded GC
202+
if (jl_n_gcthreads != 0)
203+
return;
202204
do {
203-
jl_gc_mark_sp_t sp;
204-
gc_mark_sp_init(gc_cache, &sp);
205+
jl_gc_markqueue_t mq;
206+
jl_gc_markqueue_t *mq2 = &ptls->mark_queue;
207+
ws_queue_t *cq = &mq.chunk_queue;
208+
ws_queue_t *q = &mq.ptr_queue;
209+
jl_atomic_store_relaxed(&cq->top, 0);
210+
jl_atomic_store_relaxed(&cq->bottom, 0);
211+
jl_atomic_store_relaxed(&cq->array, jl_atomic_load_relaxed(&mq2->chunk_queue.array));
212+
jl_atomic_store_relaxed(&q->top, 0);
213+
jl_atomic_store_relaxed(&q->bottom, 0);
214+
jl_atomic_store_relaxed(&q->array, jl_atomic_load_relaxed(&mq2->ptr_queue.array));
215+
arraylist_new(&mq.reclaim_set, 32);
205216
arraylist_push(&lostval_parents_done, lostval);
206217
jl_safe_printf("Now looking for %p =======\n", lostval);
207218
clear_mark(GC_CLEAN);
208-
gc_mark_queue_all_roots(ptls, &sp);
209-
gc_mark_queue_finlist(gc_cache, &sp, &to_finalize, 0);
210-
for (int i = 0; i < gc_n_threads; i++) {
219+
gc_mark_queue_all_roots(ptls, &mq);
220+
gc_mark_finlist(&mq, &to_finalize, 0);
221+
for (int i = 0; i < gc_n_threads;i++) {
211222
jl_ptls_t ptls2 = gc_all_tls_states[i];
212-
gc_mark_queue_finlist(gc_cache, &sp, &ptls2->finalizers, 0);
223+
gc_mark_finlist(&mq, &ptls2->finalizers, 0);
213224
}
214-
gc_mark_queue_finlist(gc_cache, &sp, &finalizer_list_marked, 0);
215-
gc_mark_loop(ptls, sp);
225+
gc_mark_finlist(&mq, &finalizer_list_marked, 0);
226+
gc_mark_loop_serial_(ptls, &mq);
216227
if (lostval_parents.len == 0) {
217228
jl_safe_printf("Could not find the missing link. We missed a toplevel root. This is odd.\n");
218229
break;
@@ -246,22 +257,35 @@ static void gc_verify_track(jl_ptls_t ptls)
246257

247258
void gc_verify(jl_ptls_t ptls)
248259
{
249-
jl_gc_mark_cache_t *gc_cache = &ptls->gc_cache;
250-
jl_gc_mark_sp_t sp;
251-
gc_mark_sp_init(gc_cache, &sp);
260+
// `gc_verify` is limited to single-threaded GC
261+
if (jl_n_gcthreads != 0) {
262+
jl_safe_printf("Warn. GC verify disabled in multi-threaded GC\n");
263+
return;
264+
}
265+
jl_gc_markqueue_t mq;
266+
jl_gc_markqueue_t *mq2 = &ptls->mark_queue;
267+
ws_queue_t *cq = &mq.chunk_queue;
268+
ws_queue_t *q = &mq.ptr_queue;
269+
jl_atomic_store_relaxed(&cq->top, 0);
270+
jl_atomic_store_relaxed(&cq->bottom, 0);
271+
jl_atomic_store_relaxed(&cq->array, jl_atomic_load_relaxed(&mq2->chunk_queue.array));
272+
jl_atomic_store_relaxed(&q->top, 0);
273+
jl_atomic_store_relaxed(&q->bottom, 0);
274+
jl_atomic_store_relaxed(&q->array, jl_atomic_load_relaxed(&mq2->ptr_queue.array));
275+
arraylist_new(&mq.reclaim_set, 32);
252276
lostval = NULL;
253277
lostval_parents.len = 0;
254278
lostval_parents_done.len = 0;
255279
clear_mark(GC_CLEAN);
256280
gc_verifying = 1;
257-
gc_mark_queue_all_roots(ptls, &sp);
258-
gc_mark_queue_finlist(gc_cache, &sp, &to_finalize, 0);
259-
for (int i = 0; i < gc_n_threads; i++) {
281+
gc_mark_queue_all_roots(ptls, &mq);
282+
gc_mark_finlist(&mq, &to_finalize, 0);
283+
for (int i = 0; i < gc_n_threads;i++) {
260284
jl_ptls_t ptls2 = gc_all_tls_states[i];
261-
gc_mark_queue_finlist(gc_cache, &sp, &ptls2->finalizers, 0);
285+
gc_mark_finlist(&mq, &ptls2->finalizers, 0);
262286
}
263-
gc_mark_queue_finlist(gc_cache, &sp, &finalizer_list_marked, 0);
264-
gc_mark_loop(ptls, sp);
287+
gc_mark_finlist(&mq, &finalizer_list_marked, 0);
288+
gc_mark_loop_serial_(ptls, &mq);
265289
int clean_len = bits_save[GC_CLEAN].len;
266290
for(int i = 0; i < clean_len + bits_save[GC_OLD].len; i++) {
267291
jl_taggedvalue_t *v = (jl_taggedvalue_t*)bits_save[i >= clean_len ? GC_OLD : GC_CLEAN].items[i >= clean_len ? i - clean_len : i];
@@ -500,7 +524,7 @@ int jl_gc_debug_check_other(void)
500524
return gc_debug_alloc_check(&jl_gc_debug_env.other);
501525
}
502526

503-
void jl_gc_debug_print_status(void)
527+
void jl_gc_debug_print_status(void) JL_NOTSAFEPOINT
504528
{
505529
uint64_t pool_count = jl_gc_debug_env.pool.num;
506530
uint64_t other_count = jl_gc_debug_env.other.num;
@@ -509,7 +533,7 @@ void jl_gc_debug_print_status(void)
509533
pool_count + other_count, pool_count, other_count, gc_num.pause);
510534
}
511535

512-
void jl_gc_debug_critical_error(void)
536+
void jl_gc_debug_critical_error(void) JL_NOTSAFEPOINT
513537
{
514538
jl_gc_debug_print_status();
515539
if (!jl_gc_debug_env.wait_for_debugger)
@@ -1264,139 +1288,6 @@ int gc_slot_to_arrayidx(void *obj, void *_slot) JL_NOTSAFEPOINT
12641288
return (slot - start) / elsize;
12651289
}
12661290

1267-
// Print a backtrace from the bottom (start) of the mark stack up to `sp`
1268-
// `pc_offset` will be added to `sp` for convenience in the debugger.
1269-
NOINLINE void gc_mark_loop_unwind(jl_ptls_t ptls, jl_gc_mark_sp_t sp, int pc_offset)
1270-
{
1271-
jl_jmp_buf *old_buf = jl_get_safe_restore();
1272-
jl_jmp_buf buf;
1273-
jl_set_safe_restore(&buf);
1274-
if (jl_setjmp(buf, 0) != 0) {
1275-
jl_safe_printf("\n!!! ERROR when unwinding gc mark loop -- ABORTING !!!\n");
1276-
jl_set_safe_restore(old_buf);
1277-
return;
1278-
}
1279-
void **top = sp.pc + pc_offset;
1280-
jl_gc_mark_data_t *data_top = sp.data;
1281-
sp.data = ptls->gc_cache.data_stack;
1282-
sp.pc = ptls->gc_cache.pc_stack;
1283-
int isroot = 1;
1284-
while (sp.pc < top) {
1285-
void *pc = *sp.pc;
1286-
const char *prefix = isroot ? "r--" : " `-";
1287-
isroot = 0;
1288-
if (pc == gc_mark_label_addrs[GC_MARK_L_marked_obj]) {
1289-
gc_mark_marked_obj_t *data = gc_repush_markdata(&sp, gc_mark_marked_obj_t);
1290-
if ((jl_gc_mark_data_t *)data > data_top) {
1291-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1292-
break;
1293-
}
1294-
jl_safe_printf("%p: Root object: %p :: %p (bits: %d)\n of type ",
1295-
(void*)data, (void*)data->obj, (void*)data->tag, (int)data->bits);
1296-
jl_((void*)data->tag);
1297-
isroot = 1;
1298-
}
1299-
else if (pc == gc_mark_label_addrs[GC_MARK_L_scan_only]) {
1300-
gc_mark_marked_obj_t *data = gc_repush_markdata(&sp, gc_mark_marked_obj_t);
1301-
if ((jl_gc_mark_data_t *)data > data_top) {
1302-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1303-
break;
1304-
}
1305-
jl_safe_printf("%p: Queued root: %p :: %p (bits: %d)\n of type ",
1306-
(void*)data, (void*)data->obj, (void*)data->tag, (int)data->bits);
1307-
jl_((void*)data->tag);
1308-
isroot = 1;
1309-
}
1310-
else if (pc == gc_mark_label_addrs[GC_MARK_L_finlist]) {
1311-
gc_mark_finlist_t *data = gc_repush_markdata(&sp, gc_mark_finlist_t);
1312-
if ((jl_gc_mark_data_t *)data > data_top) {
1313-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1314-
break;
1315-
}
1316-
jl_safe_printf("%p: Finalizer list from %p to %p\n",
1317-
(void*)data, (void*)data->begin, (void*)data->end);
1318-
isroot = 1;
1319-
}
1320-
else if (pc == gc_mark_label_addrs[GC_MARK_L_objarray]) {
1321-
gc_mark_objarray_t *data = gc_repush_markdata(&sp, gc_mark_objarray_t);
1322-
if ((jl_gc_mark_data_t *)data > data_top) {
1323-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1324-
break;
1325-
}
1326-
jl_safe_printf("%p: %s Array in object %p :: %p -- [%p, %p)\n of type ",
1327-
(void*)data, prefix, (void*)data->parent, ((void**)data->parent)[-1],
1328-
(void*)data->begin, (void*)data->end);
1329-
jl_(jl_typeof(data->parent));
1330-
}
1331-
else if (pc == gc_mark_label_addrs[GC_MARK_L_obj8]) {
1332-
gc_mark_obj8_t *data = gc_repush_markdata(&sp, gc_mark_obj8_t);
1333-
if ((jl_gc_mark_data_t *)data > data_top) {
1334-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1335-
break;
1336-
}
1337-
jl_datatype_t *vt = (jl_datatype_t*)jl_typeof(data->parent);
1338-
uint8_t *desc = (uint8_t*)jl_dt_layout_ptrs(vt->layout);
1339-
jl_safe_printf("%p: %s Object (8bit) %p :: %p -- [%d, %d)\n of type ",
1340-
(void*)data, prefix, (void*)data->parent, ((void**)data->parent)[-1],
1341-
(int)(data->begin - desc), (int)(data->end - desc));
1342-
jl_(jl_typeof(data->parent));
1343-
}
1344-
else if (pc == gc_mark_label_addrs[GC_MARK_L_obj16]) {
1345-
gc_mark_obj16_t *data = gc_repush_markdata(&sp, gc_mark_obj16_t);
1346-
if ((jl_gc_mark_data_t *)data > data_top) {
1347-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1348-
break;
1349-
}
1350-
jl_datatype_t *vt = (jl_datatype_t*)jl_typeof(data->parent);
1351-
uint16_t *desc = (uint16_t*)jl_dt_layout_ptrs(vt->layout);
1352-
jl_safe_printf("%p: %s Object (16bit) %p :: %p -- [%d, %d)\n of type ",
1353-
(void*)data, prefix, (void*)data->parent, ((void**)data->parent)[-1],
1354-
(int)(data->begin - desc), (int)(data->end - desc));
1355-
jl_(jl_typeof(data->parent));
1356-
}
1357-
else if (pc == gc_mark_label_addrs[GC_MARK_L_obj32]) {
1358-
gc_mark_obj32_t *data = gc_repush_markdata(&sp, gc_mark_obj32_t);
1359-
if ((jl_gc_mark_data_t *)data > data_top) {
1360-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1361-
break;
1362-
}
1363-
jl_datatype_t *vt = (jl_datatype_t*)jl_typeof(data->parent);
1364-
uint32_t *desc = (uint32_t*)jl_dt_layout_ptrs(vt->layout);
1365-
jl_safe_printf("%p: %s Object (32bit) %p :: %p -- [%d, %d)\n of type ",
1366-
(void*)data, prefix, (void*)data->parent, ((void**)data->parent)[-1],
1367-
(int)(data->begin - desc), (int)(data->end - desc));
1368-
jl_(jl_typeof(data->parent));
1369-
}
1370-
else if (pc == gc_mark_label_addrs[GC_MARK_L_stack]) {
1371-
gc_mark_stackframe_t *data = gc_repush_markdata(&sp, gc_mark_stackframe_t);
1372-
if ((jl_gc_mark_data_t *)data > data_top) {
1373-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1374-
break;
1375-
}
1376-
jl_safe_printf("%p: %s Stack frame %p -- %d of %d (%s)\n",
1377-
(void*)data, prefix, (void*)data->s, (int)data->i,
1378-
(int)data->nroots >> 1,
1379-
(data->nroots & 1) ? "indirect" : "direct");
1380-
}
1381-
else if (pc == gc_mark_label_addrs[GC_MARK_L_module_binding]) {
1382-
// module_binding
1383-
gc_mark_binding_t *data = gc_repush_markdata(&sp, gc_mark_binding_t);
1384-
if ((jl_gc_mark_data_t *)data > data_top) {
1385-
jl_safe_printf("Mark stack unwind overflow -- ABORTING !!!\n");
1386-
break;
1387-
}
1388-
jl_safe_printf("%p: %s Module (bindings) %p (bits %d) -- [%p, %p)\n",
1389-
(void*)data, prefix, (void*)data->parent, (int)data->bits,
1390-
(void*)data->begin, (void*)data->end);
1391-
}
1392-
else {
1393-
jl_safe_printf("Unknown pc %p --- ABORTING !!!\n", pc);
1394-
break;
1395-
}
1396-
}
1397-
jl_set_safe_restore(old_buf);
1398-
}
1399-
14001291
static int gc_logging_enabled = 0;
14011292

14021293
JL_DLLEXPORT void jl_enable_gc_logging(int enable) {
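
The `gc_verify` and `gc_verify_track` hunks above reset the `top` and `bottom` fields of two `ws_queue_t` values (`chunk_queue` and `ptr_queue`), but the contents of `work-stealing-queue.h` do not appear on this page. For readers unfamiliar with the pattern, here is a minimal sketch of a classic Chase-Lev work-stealing deque using C11 atomics. It illustrates the general technique only and is not Julia's implementation: the `wsq_*` names and the fixed-capacity slot array are assumptions (the real struct stores an `array` field instead).

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define WSQ_CAPACITY 64 /* hypothetical fixed capacity */

typedef struct {
    _Atomic int64_t top;    /* thieves take items from this end */
    _Atomic int64_t bottom; /* the owning thread pushes and pops here */
    _Atomic(void *) slots[WSQ_CAPACITY];
} wsq_t;

static void wsq_init(wsq_t *q)
{
    atomic_store_explicit(&q->top, 0, memory_order_relaxed);
    atomic_store_explicit(&q->bottom, 0, memory_order_relaxed);
}

/* Owner only. Returns 1 on success, 0 if the queue is full. */
static int wsq_push(wsq_t *q, void *item)
{
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed);
    int64_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    if (b - t >= WSQ_CAPACITY)
        return 0;
    atomic_store_explicit(&q->slots[b % WSQ_CAPACITY], item, memory_order_relaxed);
    /* Publish the slot before making it visible through bottom. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    return 1;
}

/* Owner only. Returns the most recently pushed item, or NULL if empty. */
static void *wsq_pop(wsq_t *q)
{
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_relaxed) - 1;
    atomic_store_explicit(&q->bottom, b, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);
    int64_t t = atomic_load_explicit(&q->top, memory_order_relaxed);
    void *item = NULL;
    if (t <= b) {
        item = atomic_load_explicit(&q->slots[b % WSQ_CAPACITY], memory_order_relaxed);
        if (t == b) {
            /* Last item: compete with thieves for it. */
            if (!atomic_compare_exchange_strong_explicit(
                    &q->top, &t, t + 1,
                    memory_order_seq_cst, memory_order_relaxed))
                item = NULL;
            atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
        }
    }
    else {
        /* Queue was empty: restore bottom. */
        atomic_store_explicit(&q->bottom, b + 1, memory_order_relaxed);
    }
    return item;
}

/* Any thread. Returns the oldest item, or NULL if empty or the race was lost. */
static void *wsq_steal(wsq_t *q)
{
    int64_t t = atomic_load_explicit(&q->top, memory_order_acquire);
    atomic_thread_fence(memory_order_seq_cst);
    int64_t b = atomic_load_explicit(&q->bottom, memory_order_acquire);
    void *item = NULL;
    if (t < b) {
        item = atomic_load_explicit(&q->slots[t % WSQ_CAPACITY], memory_order_relaxed);
        if (!atomic_compare_exchange_strong_explicit(
                &q->top, &t, t + 1,
                memory_order_seq_cst, memory_order_relaxed))
            return NULL;
    }
    return item;
}
```

In this layout an empty queue is one where `top` equals `bottom`, which matches the way the verification code above starts its private queues by storing 0 to both fields.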
