First of all, this is a wonderful project and lots of fun to play with — thanks for all the hard work that went into it!
I’m wondering what determines total memory use, particularly on iPhones. At the moment this implementation works well on iOS, as seen in the Objective-C example. But the memory use (> 500 MB for the base model) is obviously on the high side for anything but professional apps (i.e. apps like Photoshop, which are more likely to be running on iPad anyway, where users work for long periods on specific pieces of work and are more forgiving if all their other apps get terminated by the system).
On a high level, what are the constraints on total memory usage? Is it basically a fixed quantity relating to the work the encoder has to do? Is there any prospect of it coming down much in future, using quantisation or other techniques? Would future use of GPUs (or perhaps even the Apple Neural Engine) reduce the memory requirement, or would that only relate to a speedup in processing time? I’m really just trying to get a rough idea of what levers exist to be pulled, if any.
Thanks again!
We currently use a total of 506 MB, but we really only need ~140 MB to store the model and ~23 MB to store the KV cache (i.e. memory). The remaining ~340 MB currently goes to storing the intermediate tensors that ggml creates during inference, because we keep the entire computation graph in memory. But technically, we don't need to.
It will take some modifications to ggml to support this. It's probably not an easy task at the moment for anyone other than me, due to the lack of good documentation of how the library works.
But yes - in theory, the memory usage can be reduced.
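To illustrate the intermediate-tensor point, here is a minimal C sketch of the buffer-reuse idea. The names and sizes are hypothetical (this is not the actual ggml API): since each encoder layer only needs its input and output activations at any given moment, the layers can ping-pong between two fixed scratch buffers instead of keeping every node of the computation graph alive.

```c
// Hypothetical sketch (not the actual ggml API): instead of allocating a fresh
// buffer for every intermediate tensor in the graph, the encoder layers
// ping-pong between two fixed scratch buffers, since only the current layer's
// input and output activations need to be alive at any one time.

#include <stdlib.h>
#include <string.h>

#define SCRATCH_SIZE (16u * 1024u * 1024u) // illustrative size, not measured

typedef struct {
    float *data;
    size_t n;   // number of floats currently stored
} tensor_buf;

// Placeholder for one encoder layer: reads from `in`, writes to `out`.
// A real implementation would run attention + MLP here.
static void run_layer(const tensor_buf *in, tensor_buf *out) {
    out->n = in->n;
    memcpy(out->data, in->data, in->n * sizeof(float)); // stand-in for real compute
}

static void run_encoder(const tensor_buf *input, int n_layers) {
    // Two reusable scratch buffers instead of one allocation per graph node.
    tensor_buf scratch[2];
    for (int i = 0; i < 2; ++i) {
        scratch[i].data = malloc(SCRATCH_SIZE);
        scratch[i].n    = 0;
    }

    const tensor_buf *src = input;
    tensor_buf *dst = &scratch[0];

    for (int l = 0; l < n_layers; ++l) {
        run_layer(src, dst);
        // The previous layer's activations are no longer needed, so the other
        // scratch buffer can be overwritten on the next iteration.
        src = dst;
        dst = (dst == &scratch[0]) ? &scratch[1] : &scratch[0];
    }

    // ... copy `src` out as the encoder result before freeing ...
    free(scratch[0].data);
    free(scratch[1].data);
}
```

With a scheme like this, the peak memory for intermediates scales with the two largest activations rather than with the total number of nodes in the graph, which is where most of the ~340 MB gap above comes from.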