Hacker Newsnew | past | comments | ask | show | jobs | submitlogin
Show HN: A pure reference counting GC in Go (github.com/sendilkumarn)
57 points by sendilkumarn on Jan 15, 2020 | hide | past | favorite | 39 comments



By the way concurrent algorithm described in paper has couple of bugs that were found and corrected by dcodeIO during prototyping this GC/RC for AssemblyScript:

https://github.com/dcodeIO/purerc


Yeah I think this one has that fix. Infact, it is inspired from purerc :)


Just curious: Does this do anything to avoid the need to explicitly close resources in a garbage collected environment? (IE, the Disposable pattern in C# / .Net)

Specifically, when I work in a reference-counted environment, if an object has a resource, like an open file, or a socket, I don't need to do anything special when I'm done with that object. The reference count predictably drops when I'm done with the object, and its destructor cleans up its resources.

In contrast, because garbage collection is non-deterministic, in a garbage collected environment, I have to use extra special care when an object uses a resource (file, socket, ect,) that needs to be released at a predictable moment.

Because: If, in a garbage collected environment, the rule of thumb is to "make sure you don't have circular references to your resources, (files, sockets, ect,)" this would dramatically simplify resource (not memory) management!


Even in a reference counted or RAII environment, you still have to think about this. Your C++ function might do something with a file (or other resource) and then a bunch of other stuff, or vice-versa. If you care about holding that resource open for as little time as possible, you need to consciously introduce a new anonymous scope where the resource object and operations are contained (or, of course, you could break it out into another function).


Quite cool. But I miss the part where you can actually use the objects themselves to store something in. What's your goal here? To have a real usable gc, becuase if so, it should probably allocate outside the Go heap, and directly from the OS? Also the fields should probably be packed:

   Name      string
   Rc        int
   Crc       int
   Color     string
   Buffered  bool
   Children  []*Object
   Freed     bool
   isAcyclic bool
as those seem to have quite a bit of overhead for an object.


I completely agree. My goal here is to learn and debug what is happening. (intention is to backport this into tinygo)

There are a plenty of steps to make it into usable gc.


Ah that makes sense. I did the "non thread safe" version of this paper for our webassembly support (Because at the time, Boehm didn't support it) https://github.com/remobjects/IslandRTL/blob/master/Source/S...

But the threadsafe one does have a lot of raw edges, and I couldn't get that one working.


Thats cool.

I think assemblyscript has implemented it. Something you might be interested to explore


yeah. But for multithreaded I figure there's little chance I can do better than Boehm (especially considering the testing involved in this).


The code doesn't build and is extremely racy.

edit: I fixed the import cycles and got it to build, but now the tests don't pass. The more of this code I read ... the worse it gets. Why is this on top of HN?


This is quite cool. I love seeing new research into reference counting systems since they seem to have fallen out of favor vs mark/sweep and such (ARC notwithstanding).


They were never in favor, because any reference counting implementation that matches tracing GC in performance, specially on multi-core machines, ends up being a tracing GC in disguise.


One of the biggest OSes today is running nearly entirely on reference counting, so I think it's safe to say they are actually in favour.


If you mean iOS, it is more a side effect of the project failure to add GC to Objective-C than anything else.

Also reference counting only applies to a tiny subset of Objective-C data structures, classes derived from NSObject using Cocoa retain/release patterns.

If you add Swift to the soup, there is still quite some work to do regarding performance,

https://github.com/ixy-languages/ixy.swift/blob/master/perfo...

The other, even bigger OS, is running on tracing GCs.

And Windows is running on a mix of reference counting and tracing GC.


Isn’t one of the benefits of RC better (lower and more predictable) latency than with GC? Basically, you can presict when (and how much) garbage will be collected, and if it turns out that’s too much, you can fix your code... good luck doingthat with GC.

The downside is, of course, that it requires much more care (to avoid cycles)


On single core without threading.

If you add multi-threading and multi-core, there is lock contention for every single memory access to update the related counter.

Additionally, if there is a big data structure in use, reaching zero will trigger a cascade deletion events reaching stop-the-world like effect.

Which, depending on how the deletion was implemented, might trigger a stack overflow in very deep structures.

Herb Sutter has a very interesting talk about the bad issues of reference counting and how to work around them.

CppCon 2016: Herb Sutter “Leak-Freedom in C++... By Default.”

https://www.youtube.com/watch?v=JfmTagWcqoE

There are multiple ways to optimize reference counting, like deferred counting, cycle collectors and a couple of others, which when you are done with all them, ends up being quite similar to having a tracing GC anyway.


> If you add multi-threading and multi-core, there is lock contention for every single memory access to update the related counter.

You only have to update the counter when creating (or destroying) new "owning" references that might outlive any other reference to the object. This is a rare operation, and doing it potentially from multiple threads in a way that requires atomic update is even rarer.

> Additionally, if there is a big data structure in use, reaching zero will trigger a cascade deletion events reaching stop-the-world like effect.

This is a side effect of deterministic disposal, of course. And it only happens when freeing up large object subgraphs, not just any and all "large data structures".

Yes, C++ implements reference counting in a non-ideal way. Swift is also quite bad. But one can do better.


When one does better it usually ends up with some form of tracing GC implementation, as per most CS research papers, it just gets to call it reference counting with optimizations to feel better.


That's definitely true of some enhanced variations on reference counting. The changes I mentioned though do not involve any tracing of live data at runtime. (Although once one introduces the possibility of general cycles, it seems hard to avoid the requirement of tracing some live data.)


> Additionally, if there is a big data structure in use, reaching zero will trigger a cascade deletion events reaching stop-the-world like effect.

Even a deletion cascade won't stop the world: only the single thread which reached zero will be "stopped". While in tracing GC, a "stop-the-world" stops all threads, even the ones which aren't doing any allocation or deallocation at the moment.


There are concurrent GC's that don't pause all threads while running garbage collection. Plus the whole point of GC is that it's what you use when you just don't know at runtime which operations should involve "allocation or deallocation" of objects. From the POV of tracing GC, any "new" reference to an object might be the reference that extends its lifetime and prevents it being collected, and conversely when removing a reference.


Those GCs are special exceptions. Most modern GCs just aim to have bearably low pause times.


High performance tracing GC algorithms don't stop all threads.

Also there are tracing GC that operate with local heaps.


People routinely profile and optimize the amount of memory churn in GC languages, and many non-RC GCs provide good telemetry about it.

In RC systems you don't escape latency spikes from cascading deallocations and nondeterministic malloc/free call latencies. The malloc impl underlying RC still needs to occasionally work quite a lot to recycle freed memory, just with additional constraints compared to non-RC GCs.


Apple didn't fail to add GC. They successfully added it and then took it out again. Why? Because ARC gets you 99% of the way there while consuming less CPU and being strictly deterministic -- and the biggest Objective-C platform in the world has to run without a hitch or glitch on tiny, battery-powered devices.

Adding GC to the Android platform was a stupid move. The Android UI is still janky compared to iOS, and GC is not the only reason why, but it's a significant reason. Android devices also eat battery like it's going out of style compared to iPhones and iPads. Again, GC is not the only reason for this but it factors in.


Oh they did fail indeed, Objective-C libraries were crashing left and right due to underlying C semantics and mixing frameworks compiled with and without GC enabled.

So they did the only possible exit, automate the manual retain / release Cocoa patterns on the compiler, and sell a marketing story about ARC and how GC is bad.

Pity that Borland and Microsoft haven't done the same about their compiler support for COM RC, done years before.

Android UI is janky due to bad coding practices, specially doing too much on UI thread.

Our phones are super computers compared with Lisp Machines.

If you want to start counting devices sold, it is pretty clear that tracing GC has won, specially when we had any kind of device with a web browser on it.


Pausless GCs are real and exist.


Oh wow, thanks for that link. I knew ARC was slow, I didn’t know it was THAT slow. Swift code reviewed by three programmers, no dynamic dispatch, no low-hanging fruit to optimize, and still much slower than even Golang.


I don’t feel like investigating that, nor do I have the hardware, but I think spending ¾ of time in reference count updates is really high.

https://iacoma.cs.uiuc.edu/iacoma-papers/pact18.pdf has 42% on average, and halves that using their modification.


Are you talking about linux? I thought the kernel used refcounting?


Android


Reference counting may have flexibility advantages compared to tracing GC. Obligate reference counting (ala ObjC/Swift) vs. obligate tracing GC is not the only relevant comparison.


The only flexibility is that is easier to implement, whence why it is usually the first GC algorithm one learns, and among the initial chapters of GC algorithms overview books.


The algorithm used here is a combination of both rc and tracing, it's not simple.


Thus making the point that in the end it always ends up being some form of tracing GC when performance matters.


Right. To make RC work like a tracing GC (performant and without failing on circular references); it's going to be a combination. That said, there are a lot of interesting ideas in this that might be quite worth it. Like the article referenced, doesn't do a full heap scan, but makes a note of what is worth looking at.


Has this been proven?

If not, there's room for more research.



Multiple times, you just need to search for a couple of CS papers, including one I link below.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: