Skip to content

Confusing behaviour when ListFields are set to empty list #267

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
gareth-lloyd opened this issue Apr 3, 2013 · 27 comments · May be fixed by #2517
Open

Confusing behaviour when ListFields are set to empty list #267

gareth-lloyd opened this issue Apr 3, 2013 · 27 comments · May be fixed by #2517

Comments

@gareth-lloyd
Copy link
Contributor

import mongoengine
class TestDoc(mongoengine.Document):
    list_field = mongoengine.ListField()

mongoengine.connect('test')
test_doc = TestDoc.objects.create()

After this operation, in spite of the fact that list_field was not given any value, the BSON document in the collection has a corresponding field initialized to an empty list:

cluster-1:PRIMARY> use test
switched to db test
cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "list_field" : [ ], "_cls" : "TestDoc" }   

I understand why initializing to empty list could be a good idea - it allows subsequent list operations on that field to succeed. However, in the light of this, the following behaviour seems wrong to me:

test_doc.list_field = []
test_doc.save()

This call causes mongoengine to issue an $unset command, removing list_field from the BSON document:

cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "_cls" : "TestDoc" }

So ListFields, unlike other field types, get initialized to empty lists when no value is given, but setting a ListField to empty list causes it to be removed from the underlying BSON document.

@tarass
Copy link

tarass commented Apr 3, 2013

I would much rather lists not be auto serialized as empties into the database. Though, it is a bit of a consistency issue with how mongo handles lists, especially as they apply to indexes.
Currently mongo treats empty array as A value and adds it to even a sparse index. You don't need an array to exist before you $push to an array, but $pull leaves an empty array in place.

That said I would still rather have a non-existant entry until content is added.

@rozza
Copy link
Contributor

rozza commented Apr 11, 2013

This issue seesaws alot - once 0.8 is out I will review for 0.9

@mmellison
Copy link
Contributor

I would also like to see this changed to be more consistent. There are situations where I am removing entries from a list, but require an empty list to be kept in the database.

As of right now, I have to detect if I am removing the last element from the list and if so, use the update method to set the field as an empty list.

@MRigal
Copy link
Member

MRigal commented Jun 11, 2015

@seglberg I'll remove the 0.10 milestone on this one, when you have time, you're very welcome to submit something on the topic

@MRigal MRigal removed this from the 0.10 milestone Jun 11, 2015
@MRigal
Copy link
Member

MRigal commented Jun 17, 2015

A similar case is also the issue #938

@wojcikstefan
Copy link
Member

This is not an issue anymore - test_doc.list_field = [] doesn't unset the list in the database:

In [3]: class Doc(Document):
        nums = ListField(IntField())
   ...:

In [4]: doc = Doc.objects.create()

In [5]: Doc._get_collection().find_one({ '_id': doc.pk })
Out[5]: {u'_id': ObjectId('58448163c6e47b87154ecc6c'), u'nums': []}

In [6]: doc.nums = []

In [7]: doc.save()
Out[7]: <Doc: Doc object>

In [8]: Doc._get_collection().find_one({ '_id': doc.pk })
Out[8]: {u'_id': ObjectId('58448163c6e47b87154ecc6c'), u'nums': []}

@andrewsosa
Copy link

andrewsosa commented Feb 25, 2017

Hello! Is this fix included in release v0.11.0? I'm still having this issue on the latest release, also with flask-mongoengine 0.9.2. Thanks!

I'm using a ListField of ReferenceFields.

@wojcikstefan
Copy link
Member

@andrewsosa001 could you provide some code that reliably reproduces your issue?

@andrewsosa
Copy link

andrewsosa commented Feb 28, 2017

@wojcikstefan Here's a sample script I wrote to demonstrate:

from app import db # Import MongoEngine instance from Flask app instance. 

class Doc(db.Document):
    refs = db.ListField(db.ReferenceField('Doc'))

# Create some docs
d0 = Doc().save()
d1 = Doc().save()
d2 = Doc().save()

# How many Docs have refs = []
print Doc.objects.filter(refs__size=0).count() # prints 3

# Add references for d1 and d2 to d0.
d0.refs.append(d1)
d0.refs.append(d2)
d0.save()

# Check reference counts in DB
print Doc.objects.filter(refs__size=0).count() # prints 2
print Doc.objects.filter(refs__size=2).count() # prints 1

# Remove the references from d0
d0.refs.pop()
d0.refs.pop()
d0.save()

# Check reference count in DB
print Doc.objects.filter(refs__size=0).count()       # prints 2
print Doc.objects.filter(refs__exists=False).count() # prints 1

I've also visually inspected my MongoDB instance using Robomongo to confirm my findings.

Thanks!

@wojcikstefan
Copy link
Member

Ah, thanks @andrewsosa001, that makes it clearer. Proof I have shown in #267 (comment) is invalid then - it only worked because I was changing the value from an implicit [] to an explicit [], which doesn't result in a changed field and hence doesn't update it in the MongoDB doc. A quick test reveals that any operation that actually updates the list to be empty unsets it from the MongoDB doc:

In [35]: from mongoengine import *

In [36]: connect('testdb')
Out[36]: MongoClient('localhost', 27017)

In [37]: class Doc(Document):
    ...:     nums = ListField(IntField())
    ...:

In [38]: doc = Doc.objects.create()

In [39]: Doc._get_collection().find_one({ '_id': doc.pk })  # empty list as expected
Out[39]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': []}

In [40]: doc.nums.append(1)

In [41]: doc.save()
Out[41]: <Doc: Doc object>

In [42]: Doc._get_collection().find_one({ '_id': doc.pk })  # non-empty list as expected
Out[42]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': [1]}

In [43]: doc.nums = []

In [44]: doc.save()
Out[44]: <Doc: Doc object>

In [45]: Doc._get_collection().find_one({ '_id': doc.pk })  # WRONG, expected `'nums': []`
Out[45]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d')}

In [46]: doc.nums.append(1)

In [47]: doc.save()
Out[47]: <Doc: Doc object>

In [48]: Doc._get_collection().find_one({ '_id': doc.pk })  # non-empty list as expected
Out[48]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': [1]}

In [49]: doc.nums.pop()
Out[49]: 1

In [50]: doc.save()
Out[50]: <Doc: Doc object>

In [51]: Doc._get_collection().find_one({ '_id': doc.pk })  # WRONG, expected `'nums': []`
Out[51]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d')}

In [53]: doc.nums.append(1)

In [54]: doc.save()
Out[54]: <Doc: Doc object>

In [55]: Doc._get_collection().find_one({ '_id': doc.pk })  # non-empty list as expected
Out[55]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d'), u'nums': [1]}

In [56]: doc.nums.remove(1)

In [57]: doc.save()
Out[57]: <Doc: Doc object>

In [58]: Doc._get_collection().find_one({ '_id': doc.pk })  # WRONG, expected `'nums': []`
Out[58]: {u'_id': ObjectId('58b5922fc6e47bbb11794c6d')}

@wojcikstefan wojcikstefan reopened this Feb 28, 2017
@wojcikstefan
Copy link
Member

Now, one last question we have to answer is why we expect an empty list to be stored in the database in the cases above.

An argument can be made for auto-unsetting of the field actually being beneficial in terms of storage. However, in such case we should also handle document creation correctly so that it doesn't set nums: [] by default when using Doc.objects.create(). Creation w/o the field is much more important here, because updates unsetting the field don't really make that extra space salvageable on the storage level until database compaction. There's also a certain cleanliness to the approach where your document only contains non-empty fields and it fits nicely with the non-relational paradigm of MongoDB.

On the other hand, always storing an empty list makes for easier querying where you can always be sure that querying by Doc.objects(nums=[]) or by Doc.objects(nums__size=0) will return the expected results. It also is more intuitive when you inspect your raw docs in MongoDB ("why is it empty" is a much easier question to reason about than "why doesn't it exist").

I'm personally leaning towards the latter argument of always storing the empty lists. What do you think @lafrech @thedrow @andrewsosa001 @gukoff ?

@andrewsosa
Copy link

@wojcikstefan I think it actually makes more sense to have empty lists automatically removed (what is currently happening). You made good points in your argument for that behavior.

It would be easy to adjust for in terms of querying, instead of the Docs.objects(nums=0), you would instead be looking for Docs.objects(nums__exists=False). On the flip side, it would be more consistent, e.g. Doc.objects(nums__size=n), where n could be 0, 1, 2, etc... .

Ultimately, I think as long as the approach is consistent document creation behavior (i.e., right now new documents with empty list fields initialize to []), either approach would be viable.

@gukoff
Copy link
Contributor

gukoff commented Mar 4, 2017

An argument can be made for auto-unsetting of the field actually being beneficial in terms of storage.

This is true. However, there's always a tradeoff between simplicity and performance, and for sure one should keep away of the premature optimization. This heuristic seems like one, because we improve the memory issue without being explicitly asked to. And this improvement implies the simplicity/maintainability/transparency degradation, because you have to maintain the introduced complexity.

There's also a certain cleanliness to the approach where your document only contains non-empty fields and it fits nicely with the non-relational paradigm of MongoDB.

Is it? Implicitly having the same field in 2 possible states with different behaviour {non-existent/non-empty array} reminds me only of the nullable references aka the Billion Dollar Mistake.

I think that transparency is important. Mongoengine isn't a service, which uses the database as its private backend. It doesn't even hide the raw MongoDB queries to the full. It is a tool, which helps a user work with the database. Not the database helps a user work with mongoengine(which might be a rather convenient scenario). Therefore, all the tricks with the storage should be explicit.

A hyperbolized example of a similar heuristic can be a compression of the large strings. It would make storage more efficient, but the string indexes useless.

@wojcikstefan
Copy link
Member

This is wonderfully put, thank you for taking the time to write it @gukoff ! I agree with all of your points.

Clearly current implementation is already causing confusion, as proven by this issue. We could try to tweak it and fix some of its problems, but it will always have certain edge cases. For example, Doc.objects.update(pull__nums=last_remaining_value_in_a_given_doc) will leave you with an empty list and attempting to unset it in a "smart" implicit way would be way too hacky and auto-magical. We're just asking for trouble/confusion/code complexity.

In the future, we should go with the more explicit (and thus more intuitive) solution. The only "magic" I'm a fan of is to default to an empty list in the MongoEngine object if the document we loaded from the database doesn't contain a specific list field or if that field is set to null. This way, we always ensure that it's possible to iterate over that field w/o doing any extra checks. This is quite intuitive and consistent with past behavior.

@wojcikstefan
Copy link
Member

Small note wrt consistency among different field types: When you assign a None value to a particular field, we unset it from the MongoDB doc (see the code below). This behavior seems pretty natural to me and I'd probably want to preserve it. However, stuff like doc.list_field.pop()/remove() is already one step further in terms of implicitness. The question that remains is: Does it really make sense to set empty complex values (e.g. lists or dicts) in the doc upon creation? I'm leaning towards a yes because, unless a field is explicitly unset, it ensures we'll always be able to query for docs with empty lists via db.doc.find({nums: [] }).

@HakuG
Copy link

HakuG commented Aug 26, 2018

Did this ever get resolved? I feel like there should at least be a way to make sure the field is not unset, preferably when defining the document's fields.

@anlauren
Copy link

Same opinion here, was that ever resolved? As it turns out, unsetting the fields seems to not cascade on the CachedReferenceFields as well

@alaminKutumbita
Copy link

alaminKutumbita commented Mar 2, 2019

I found a nice little work around this issue. You can use this
test_doc.save()
test_doc.update(add_to_set__list_field = [])
it will update the reference with the empty list field

@HakuG
Copy link

HakuG commented Mar 4, 2019 via email

@anlauren
Copy link

anlauren commented Mar 4, 2019

The problem @alaminKutumbita is that no post_save hooks is triggered on update, moreover it doesnt update the cached reference fields :/

@neilgoldman
Copy link

Having a similar issue when setting a DictionaryField to an empty object/dict, the field becomes unset. But the field has required=True so now the document fails validation on loading. I'm able to get around it by using pymongo functions, but am worried that later something will save the document and cause mongoengine to unset it again.

@zellsun
Copy link

zellsun commented Jul 23, 2019

You'll get different results when you query a collection or call .first() at this query.

For the whole collection, those unset fields are missing.
But magically, .first() will return you default values for those unset fields.

I rely on this behavior as my workaround, that is I loop the whole results, and for every single iteration the default values are there. (However, the unset fields have been unset and no data in db actually.)

@vivektweeny
Copy link

Hi, In the model definition add default value like this "users = fields.ListField(default=['text/html'])",
it will set empty list [ ] at time of updating the document.

@vivektweeny
Copy link

import mongoengine
class TestDoc(mongoengine.Document):
    list_field = mongoengine.ListField()

mongoengine.connect('test')
test_doc = TestDoc.objects.create()

After this operation, in spite of the fact that list_field was not given any value, the BSON document in the collection has a corresponding field initialized to an empty list:

cluster-1:PRIMARY> use test
switched to db test
cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "list_field" : [ ], "_cls" : "TestDoc" }   

I understand why initializing to empty list could be a good idea - it allows subsequent list operations on that field to succeed. However, in the light of this, the following behaviour seems wrong to me:

test_doc.list_field = []
test_doc.save()

This call causes mongoengine to issue an $unset command, removing list_field from the BSON document:

cluster-1:PRIMARY> db.test_doc.find()                                                                                              
{ "_id" : ObjectId("515c24a4c02c1c58d59f02c0"), "_types" : [ "TestDoc" ], "_cls" : "TestDoc" }

So ListFields, unlike other field types, get initialized to empty lists when no value is given, but setting a ListField to empty list causes it to be removed from the underlying BSON document.

Hi, In the model definition add default value like this "users = fields.ListField(default=['text/html'])",
it will set empty list [ ] at time of updating the document.

@bagerard
Copy link
Collaborator

FYI I started working on fixing this so that MongoEngine won't unset list/dict fields as soon as the list/dict is empty

@hieubz
Copy link

hieubz commented May 8, 2021

I got the same issue at v0.23.1.

@zoebriois
Copy link

Hello, any updates ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.