Skip to content

Apply defaults to fields with None value at 'set' time. #349

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 1 commit into from
Jun 6, 2013

Conversation

nigelmcnie
Copy link
Contributor

If a field has a default, and you explicitly set it to None, the behaviour before this patch was very confusing:

class Person(Document):
    created = DateTimeField(default=datetime.datetime.utcnow)

>>> p = Person(created=None)
>>> p.created
datetime.datetime(2013, 5, 30, 0, 18, 20, 242628)
>>> p.created
datetime.datetime(2013, 5, 30, 0, 18, 20, 995248)
>>> p.created
datetime.datetime(2013, 5, 30, 0, 18, 21, 370578)

It would be stored as None, and then at 'get' time, the default would be applied. As you can see, if the default is a generator, this leads to some crazy behaviour.

There's an argument that if I asked it to be set to None, why not respect that? But I don't think that's how the rest of mongoengine seems to work (for example, setting a field to None seems to mean it doesn't even get set in mongo - as opposed to being set but with a 'null' value). Besides, as the code shows above, you'd expect p.created to return None. So clearly, mongoengine is already expecting None to mean 'default' where a default is available.

This bug also interacts nastily with required=True - if you're forcibly setting the field to None, then at validation time, the None will fail validation despite a perfectly valid default being available.

With this patch, when the field is set, the default is immediately applied. This means any generation happens once, the getter always returns the same value, and 'required' validation always respects the default.

Note: this breakage seems to be new since mongoengine 0.8.

If a field has a default, and you explicitly set it to None, the
behaviour before this patch was very confusing:

    class Person(Document):
        created = DateTimeField(default=datetime.datetime.utcnow)

    >>> p = Person(created=None)
    >>> p.created
    datetime.datetime(2013, 5, 30, 0, 18, 20, 242628)
    >>> p.created
    datetime.datetime(2013, 5, 30, 0, 18, 20, 995248)
    >>> p.created
    datetime.datetime(2013, 5, 30, 0, 18, 21, 370578)

It would be stored as None, and then at 'get' time, the default would be
applied. As you can see, if the default is a generator, this leads to some
crazy behaviour.

There's an argument that if I asked it to be set to None, why not respect that?
But I don't think that's how the rest of mongoengine seems to work (for
example, setting a field to None seems to mean it doesn't even get set in mongo
- as opposed to being set but with a 'null' value). Besides, as the code shows
above, you'd expect p.created to return None. So clearly, mongoengine is
already expecting None to mean 'default' where a default is available.

This bug also interacts nastily with required=True - if you're forcibly setting
the field to None, then at validation time, the None will fail validation
despite a perfectly valid default being available.

With this patch, when the field is set, the default is immediately applied.
This means any generation happens once, the getter always returns the same
value, and 'required' validation always respects the default.

Note: this breakage seems to be new since mongoengine 0.8.
@amcgregor
Copy link
Contributor

The difficulty here is having a non-None default does not imply, to me or anyone I've asked, that required is True, so None should be a perfectly valid value.

My suggestion here is that when you del the attribute the default should be set. This is far more semantic: setting the value to None, a value, is far different than "deleting the existing value and having no value", an opportune moment to apply a default. I do agree that the default should not be repeatedly applied, but explicitly set at the time the value is deleted, though.

Sure None classically represents "empty". In databases I typically use NULL to imply "not applicable" rather than "empty". (I.e. a string value that is empty is an empty string, not NULL.)

Edited to add: this follows with my concept of document instance initialization. If I do not pass the keyword value, the value is missing, not None, and thus the default is applied.

@nigelmcnie
Copy link
Contributor Author

You've said "None should be a perfectly valid value", and "setting the value to None, a value, is far different than "deleting the existing value and having no value"". I can see how that's one valid way of handling None, and I acknowledge that my patch handles None differently - namely by assuming that None is exactly equal to not specifying a value.

The reason I advance this patch, is because currently, mongoengine does not distinguish between None and "not set". Thus, I see this patch as merely continuing existing behaviour.

When I say it does not distinguish between None and "not set", what I mean is that when you set a value to None in mongoengine currently, it completely omits the field from being stored in mongo (unless a default is present). This means that when the record is read again, we have no way of telling whether someone saved a None or did not specify the field at all.

This script helps demonstrate this behaviour - after it's run, in mongo you end up with these records, and as you can see, there's no distinction between setting testfield to None and never setting it.

The problem with trying to treat None as different from "not set" come when you amend your class to add a new field that has a default (when documents already exist in mongo). Upon reading a document, mongoengine can't tell the difference between an explicitly set None value, and a missing value.

Another problem is when you come to read the attribute value. Currently, if you never set it, the attribute getter will return None for it. There's no way to tell this apart from "it was set explicitly to None".

It seems to me like there's only two ways this can work sensibly:

  1. None is treated equally to empty - which is current behaviour, and what this patch continues.
  2. None is treated differently to empty - in which case, mongoengine has to make this distinction at the mongo level, else all sorts of breakage will occur when defaults are applied (and possibly in other cases too).

Number 2 sounds like the greater of two evils - all existing databases using mongoengine would have to somehow have their explicit None values stored in mongo, and I can't think of a sane way you could do an upgrade that made this work.

@nigelmcnie
Copy link
Contributor Author

A clarification: above, I said this:

"when you set a value to None in mongoengine currently, it completely omits the field from being stored in mongo (unless a default is present)"

What actually happens depends on your mongoengine version.

In 0.7, when you set a field with a default to None explicitly, the default is applied.

In 0.8, when you set a field with a default to None explicitly, the default is not applied.

This is the behaviour change that lead me to write this patch in the first place, as we have been using mongoengine since 0.7.5, and thus have much code that relies on the "None means 'default'" behaviour - because that's what it always used to mean (and it still does when you get down to the mongo level in 0.8, as outlined in my previous comment).

@rozza
Copy link
Contributor

rozza commented Jun 3, 2013

There are two issues here.

Setting of a default value and setting a value to None.

I think when the callable is called eg it should probably set the value explicitly in _data - to stop it being called all the time.

I think if you set something to be None - it should be set to None - even if it wont validate later - I just dont know how much work that is...

@nigelmcnie
Copy link
Contributor Author

@rozza if you treat "setting to None" differently to "never setting it", how would you propose differentiating between the two at read time?

I can see how you could store them differently in mongo itself - an explicit None could be a null, and "not set" can just not exist. But when reading them out again.. how do you tell the code using mongoengine which is which?

And if you do decide to do it this way, there's all the existing code out there that assumes that None is the same as not set. I see that the version number is 0.X, so you probably still reserve the right to make changes like this - but I would suggest that if you do, the behaviour (and change) should be documented.

FYI, one example of where I've found the None == "not set" behaviour useful, is when developing webapps where a page manages the documents in a collection, with standard CRUD functionality. When the page is saved, we send back JSON representing the documents. The existing documents send back their IDs, and the new ones have no ID. Then on the server:

questions = json.loads(form.data['questions'])
survey.questions = []
for question in questions:
    survey.questions.append(m.Survey.Question(
        id         = question.get('id'),
        type       = question['type'],
        label      = question['label'],
        label_text = question['label_text'],
        options    = question['options'],
        required   = question['required'],
        show_other = question['show_other'],
    ))

In 0.7, this worked nicely, and was quite concise. If there's no ID, we technically did a "set ID to None", but then at save time, the default was applied.

And an FYI for others working around this issue, we changed the line above to this as a way to get back 0.7 behaviour:

        id         = question.get('id', m.Survey.Question.id.default()),

@honzajavorek
Copy link

This bug also interacts nastily with required=True - if you're forcibly setting the field to None, then at validation time, the None will fail validation despite a perfectly valid default being available.

Aargh, I just experienced this! It took me a lot of time to find this issue and understand what the behavior is. I consider it to be very confusing.

@rozza
Copy link
Contributor

rozza commented Jun 5, 2013

Interestingly I'm not seeing a difference in behavour for 0.7 and 0.8 with this test - eg p.created is generated each time when called if created is past in as None.

Recap

As theres lots of discussion in this thread - lets clarify the broken behaviours with a recap.

class Person(Document):
    created = DateTimeField(default=datetime.datetime.utcnow)

>>> p = Person(created=None)
>>> test = p.created
>>> test2 = p.created
>>> assert p._data['created'] == None 
>>> assert test == test2  # FAILS on all versions

However this succeeds - so this approach sounds broken to me:

    >>> p = Person()
    >>> test = p.created
    >>> test2 = p.created
    >>> assert p._data['created'] != None
    >>> assert test == test2

What is Broken?

The storage of data:

>>> Person.drop_collection()

>>> p = Person(created=None)
>>> test = p.created
>>> test2 = p.created
>>> p.save()

>>> assert p._data['created'] == None
>>> assert "created" in Person._get_collection().find_one()  #  Works in 0.7 and breaks in 0.8

Compare that to:

>>> Person.drop_collection()

>>> p = Person()  # No set created
>>> test = p.created
>>> test2 = p.created
>>> p.save()

>>> assert p._data['created'] == None
>>> assert "created" in Person._get_collection().find_one()  #  Works in both 0.7 and 0.8

Solutions?

  1. Fix p.created == p.created

  2. Choose:

    A) Revert back to 0.7 behaviour and if set to None set it as default2
    B) Make setting to None mean it and that the default isn't called

  3. Document the changes.

Have I missed anything? I thinking reverting back to 0.7 behaviour (A) is the best choice and your solution @nigelmcnie looks good. I wonder if we should pass the responsibility to the field itself on how to handle being set to None?

@rozza
Copy link
Contributor

rozza commented Jun 5, 2013

  • Pick a Solution
  • Update the Documentation
  • Add testcase for the required=True issue

@amcgregor
Copy link
Contributor

Things are even stranger than they seem. Apologies for the massive wall of text and code that follows.

Test Class

The following is my test case's model. The fields are named logically based on the field definition itself.

from random import randint
from mongoengine import Document, StringField

class Test(Document):
class Test(Document):
    meta = dict(allow_inheritance=False)

    srd = StringField(required=True, default='d')  # string required default
    src = StringField(required=True, default=lambda: str(randint(1000000, 9999999)))  # string required callable-default

    snd = StringField(required=False, default='d')  # string not-required default
    snc = StringField(required=False, default=lambda: str(randint(1000000, 9999999)))  # string not-required callable-default

Object Creation

This works as expected:

>>> a = Test()
>>> a._data  # Omission looks correct; callable defaults are immediately evaluated.
{'id': None, 'srd': u'd', 'src': u'6495825', 'snd': u'd', 'snc': u'3364130'}

>>> a.src == a.src and a.snc == a.snc  # Because these have values, the defaults shouldn't regenerate.
True

Explicit Reassignment

This illustrates one of the problems: if _data[key] is missing or None, the defaults are generated on attribute access but never recorded.

>>> a.src = None  # Explicitly set.
>>> a._data['src']  # None
>>> a.src == a.src  # Generates a new value on each access!
False
>>> a._data['src']  # None; new values never recorded

Additionally, even though attribute access generates a new value, if the field is required validation fails early as the value is None, even though for non-required fields it (almost) correctly assigns a generated default on save.

Explicit Delete

This is simply unfortunate. By defining this behaviour to mean "regenerate the default" we can still allow None as a valid value for non-requried fields that also happen to have defaults. If no default is available, this should remove the key from _data. (It doesn't do that either right now.)

>>> del a.snc  # Does nothing!
>>> 'snc' in a._data and bool(a._data['snc'])
True

Explicit Construction

The problem outlined above is consistent between reassignment and initial construction:

>>> b = Test(srd=None, src=None, snd=None, snc=None)
>>> b.src == b.src
False

Non-Required Defaults and Saving

This is where things go from "well, that's odd" into bizarro land. ;)

>>> c = Test(snc=None)  # Set the non-required defaulted field.
>>> c._data['snc']  # None
>>> c.save()
>>> c._data['snc']  # Still None!
>>> c.snc == c.snc  # As above, generates a new value…
False

Now, what was actually saved to the database?

>>> d = Test.objects.get(id=c.id)
>>> d._data['snc']  # NOT NONE!  WUT?!
'6047255'
>>> d.snc == d.snc
True
>>> c.snc == d.snc  # NOT THE SAME (c.snc regenerates on each access)
False

If a non-required field with a default value is set to None (either at construction time or through reassignment) and the record saved, a new default is generated and committed to the database but the original object is not updated. This can potentially happen several times during the lifetime of a record, ignoring that such a pattern of behaviour is not exactly efficient.

I've also verified that None is committed to the DB by pymongo as null, making it wholly separate from the key being missing.

@nigelmcnie
Copy link
Contributor Author

@amcgregor I ran through your testing, I noticed one difference from what you wrote. del a.snc doesn't quite do nothing, it will remove a value if set, such that the next access to the field will generate a new value.

>>> a.snc
'4564664'
>>> del a.snc
>>> a.snc
'6068006'

However, this doesn't matter too much - I think we (and anyone reading this pull request in future) have a reasonable handle on the problems now, which is good.

I have tried your series of tests with my patch applied, and it seems that the behaviour quirks and outright craziness you found are all resolved:

>>> a = Test()
>>> a._data
{'src': '2462786', 'snd': 'd', 'snc': '7965970', 'srd': 'd', 'id': None}
>>> a.src == a.src and a.snc == a.snc
True

# Generates new default, which is saved in _data
>>> a.src = None
>>> a._data['src']
'4215481'
>>> a.src == a.src
True
>>> a.src
'4215481'

# And again, just to check
>>> a.src = None
>>> a.src
'9044154'
>>> a.src
'9044154'

# Sensible even when assigned on init
>>> b = Test(srd=None, src=None, snd=None, snc=None)
>>> b.src == b.src
True

# Crazy behaviour now gone...
>>> c = m.Test(snc=None)
>>> c._data['snc']
'3219701'
>>> c.save()
<Test: Test object>
>>> c._data['snc']
'3219701'
>>> c.snc == c.snc
True

# ...including when read from database
>>> d = Test.objects.get(id=c.id)
>>> d._data['snc']
u'3219701'
>>> d.snc = d.snc
>>> d = m.Test.objects.get(id=c.id)
>>> d._data['snc']
u'3219701'
>>> d.snc == d.snc
True
>>> c.snc == d.snc
True

# 'del' behaviour. My patch, if merged, will mean we assign new default
>>> del c.snc
>>> c._data['snc']
'3731136'

So, regarding solutions...

I suggest that we pick (A) (of course I do ;)).

I also think there could be some merit in the idea of allowing the field itself to define how it wants to treat None - that has none of the backward compatibility problems that I could see by picking option B, and would allow @amcgregor to use None in the way she wants. A separate patch from this one could carefully implement it, taking care of all the behaviour changes that would result (e.g. I guess you might want accessors to throw a KeyError if there's no value stored in mongo, vs None if there's a null in mongo, so code using mongoengine can tell the difference).

rozza added a commit that referenced this pull request Jun 6, 2013
Now field defaults are king, unsetting or setting to None on a field
with a default means the default is reapplied.
@rozza rozza merged commit 4c9e907 into MongoEngine:master Jun 6, 2013
@rozza
Copy link
Contributor

rozza commented Jun 6, 2013

Ok - I've cleaned up a little and added those testcases!

Thanks for all your help @nigelmcnie and @amcgregor :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants