Skip to content

Memory for fetched message attachments is never released which results in memory leak #531

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
robkleinee opened this issue Nov 17, 2024 · 7 comments

Comments

@robkleinee
Copy link

robkleinee commented Nov 17, 2024

A long-running script which fetches and processes messages with attachments from a mailbox with this IMAP package results eventually in a 'memory size of ... bytes exhausted'.

I think this has something to do with the fetched attachments. It seems that these attachments somehow are never released from memory?

A workaround is to set manually the attachments to an empty collection after the attachments are processed.

Code to duplicate the bug (be sure to have emails with attachments in the mailbox)

Package version: 5.5.0
PHP: 8.1.x

$inboxFolder = $client->getFolder('inbox');

while (true) {
    echo 'Memory usage: ' . memory_get_usage() . PHP_EOL;

    $messages = $inboxFolder
        ->messages()
        ->all()
        ->limit(1)
        ->get();

    // Process the messages
}

Workaround (Extended on 2025-05-08)

$inboxFolder = $client->getFolder('inbox');

while (true) {
    echo 'Memory usage: ' . memory_get_usage() . PHP_EOL;

    $messages = $inboxFolder
        ->messages()
        ->all()
        ->limit(1)   // The workaround only works when only one message is retrieved and processed at at time
        ->get();

    $messages->each(
        function ($message) {
            // TODO: Process the message

            // Optionally move the message to another folder/mailbox
            $movedMessage = $message->move('name_of_mailbox');

            // Workaround to fix memory issue, for both message and the moved message
            $movedMessage->setAttachments(AttachmentCollection::make([]));
            $message->setAttachments(AttachmentCollection::make([]));
        }
    );
}

Am I doing something wrong? Or is there a memory leak?

Thanks in advance!

PS: It seems like this package is abandoned?

@veroca88
Copy link

I have the same issue!
Package version: 4.1.2
PHP: 8.1.x

@wurst-hans
Copy link
Contributor

@robkleinee @veroca88 I also have the problem that Webklex uses a lot of memory. I have created an IMAP account for a test. It contains 7 e-mails, 3 of which have attachments with a total size of 20 MB (1x 10MB, 2x 5MB). This is enough to exceed a memory limit of 128 MB in PHP.

$manager = new ClientManager([]);
$transport = $manager->make([
    'host' => $settings->getIncomingHost(),
    'port' => $settings->getIncomingPort(),
    'encryption' => $encryption,
    'validate_cert' => true,
    'username' => $settings->getIncomingUser(),
    'password' => $settings->getIncomingPassword(),
    'protocol' => $settings->getInboxType(),
    'timeout' => self::TIMEOUT,
]);
@$transport->connect();

$folder = $transport->getFolderByPath('INBOX');
$messages = $folder->query()->whereAll()->get();
dd(memory_get_usage(), memory_get_peak_usage());

This returns 134MB for memory_get_usage() and 145MB for memory_get_peak_usage(). This is definitely too much. This is only a small mailbox.
I have now spent many hours trying to find the cause. I found the suggestion on #531, but it only brings a minimal improvement.

Memory consumption only seems to be a problem with (larger) attachments, because the data is stored multiple times in memory (for example as property $raw and $content). I have found two approaches to massively reduce memory consumption.

At the end of the constructor of the Attachment class, I discard the local property $message and $part:

class Attachment {
    /**
     * Attachment constructor.
     * @param Message $message
     * @param Part $part
     * @throws DecoderNotFoundException
     */
    public function __construct(Message $message, Part $part) {
        [...]
        unset($this->message);
        unset($this->part);
    }
}

I know that this is not "clean" and can cause further problems. At least in my case it doesn't cause any problems. With this adjustment I get only 68MB for memory_get_usage() and 130MB for memory_get_peak_usage().

Furthermore, I do not retrieve messages all at once, but iterate over the messages with the following code:

$query = $folder->query();
$list = $folder->overview($sequence = "1:*");
foreach (array_keys($list) as $uid) {
    $message = $query->getMessageByUid($uid);
}

This results in 13MB for memory_get_usage() and 117MB for memory_get_peak_usage(). However, this only works in combination with the adaptation of the attachment class. Without adapting the class, nothing changes in terms of memory consumption. So, although each message is retrieved individually, there are probably so many cross-references between message and attachment that everything is still kept in memory.

I have no idea how best to simplify all this, but the library definitely needs to be customized and can no longer hold such large amounts of data in memory. And it's hard to apply the patches on my own (it requires extending a lot of classes).

@Webklex
Copy link
Owner

Webklex commented May 7, 2025

Hi @wurst-hans thanks for investigating and sharing what you've found. If you would like to tackle the core issue, please feel free to do so, even if it involves changes to several classes. I believe the benefits would be rather substantial - perhaps something for a major version jump to v7?

@wurst-hans
Copy link
Contributor

I've been trying to understand the code better for a few hours now and I'm making progress, but slowly. I don't have much time at the moment so I'm focusing on optimizing my problems. I am very grateful for your library and of course try to give something back to the community. However, my customizations are very experimental and I still don't understand PHP IMAP well enough.

The recommended customization of #531 does more than I thought. However, it makes a decisive difference whether you retrieve all messages at once with

$messages = $folder->query()->whereAll()->get();

or whether you only fetch the headers and retrieve each message individually with

$list = $folder->overview($sequence = "1:*");
foreach (array_keys($list) as $uid) {
    $message = $query->getMessageByUid($uid);
    if ($message->hasAttachments()) {
        // do something
    }
    $message->setAttachments(AttachmentCollection::make());
}

This reduces memory consumption considerably. You only have to empty the attachments once after each access to an e-mail so that the memory consumption remains low.

My suggestion is to check whether all properties/methods are really being used. I could imagine that you don't really need a $message/getMessage() on the attachment. With regard to the large file attachments, the Part/Structure class is also a big problem because it stores $raw and $content, among other things. This class is also referenced in the attachment as $part. This means that a file attachment of an e-mail is stored at least three times in the memory and probably even a fourth time in the Message class as $structure.

Of course it's nice to have as much raw data as possible on the objects, but not in the case of large attachments (or hundreds of emails with small attachments, which in total also leads to problems). If I come up with something better, I'll get back to you.

@robkleinee
Copy link
Author

robkleinee commented May 8, 2025

@wurst-hans Thanks for your investigation.

I think there are two seperate problems concerning memory usage:

  1. A high amount of memory is used when processing mails with attachments.
  2. Some objects regarding the attachments are never destroyed, which eventually leads to a crash due to the memory leak.

I opened this issue because of problem number 2. I think that's a bigger problem than problem number 1.

The workaround I suggested in my openings post is working for me. It's running for months now in a production environment, without crashes.

I think the problem has something to do with a circular dependency between some attachment objects, so the garbage collector can't destroy them when the message is destroyed. But at the moment I don't have the time to dive into it.

Some more information about the workaround:

  • The workaround only works when only one message is retrieved and processed at a time (that's why the limit(1) is there).
  • When optionally moving the message to another folder, the workaround should also be applied on the moved message.

So in total (I also editted my openings post):

$inboxFolder = $client->getFolder('inbox');

while (true) {
    echo 'Memory usage: ' . memory_get_usage() . PHP_EOL;

    $messages = $inboxFolder
        ->messages()
        ->all()
        ->limit(1)   // The workaround only works when only one message is retrieved and processed at at time
        ->get();

    $messages->each(
        function ($message) {
            // TODO: Process the message

            // Optionally move the message to another folder/mailbox
            $movedMessage = $message->move('name_of_mailbox');

            // Workaround to fix memory issue, for both message and the moved message
            $movedMessage->setAttachments(AttachmentCollection::make([]));
            $message->setAttachments(AttachmentCollection::make([]));
        }
    );
}

@wurst-hans
Copy link
Contributor

wurst-hans commented May 8, 2025

Thanks for clarification. I still prefer using overview() now, because I'm not going to process each message. And even with option setFetchBody(false) the basic information retrieval via ->overview() is much faster than ->messages()->all().

I agree, that this might have to do with some circular references not being cleaned up. Nevertheless, the message data is kept at least 4 times in memory (sometimes as raw data, sometimes as already decoded data) for every message. But, yes, problem number two might have higher criticality. And I still haven't found the reason for that, really crazy.

@robkleinee
Copy link
Author

I think the most simple (lame?) solution could be to make a destructor on 'Message' class and clear the attachments of the message. That should fix the memory leak I think.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants