-
Notifications
You must be signed in to change notification settings - Fork 183
Issue #227: Allow charset encoding in content-type field #233
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@@ -403,4 +406,20 @@ def reduce(self, *args): | |||
mr.add(self.bucket.name, self.key) | |||
return mr.reduce(*args) | |||
|
|||
def _parse_content_type(self, value, oldcharset): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this method on RiakObject? Since it has content_type
and charset
properties, shouldn't it be the responsibility of the transport to marshal those into the proper places? Protobuffs already separates the two.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The RiakObject knows what its content-type and charset is. I see what you are saying. We could maybe put some of this logic into the RiakClient, or in the transport via the client, but we have to do it before the encoder/decoder is identified.
A question I had is do we want to worry about the charset being in the content-type instead of in charset. If so, then I think it belongs in the RiakObject. If we don't want to mess with it and only want the proper encoder/decoder, then we could push it down as you suggest.
@seancribbs Moved out of RiakObject and into RiakClient. This solution checks just before the encoder and decoder are looked up. Maybe should be in bucket? When the transport put() or put_new() is called, the object is serialized via the RiakObject.encoded_data property. RiakObject._serialize() gets its encoder from the bucket or the client, but the transport does not really have any input at that stage. Pros:
We could just toss the content subtype info when data is read in from Riak in the HTTP transport, if that is all you are looking for. |
@javajolt So, this resolves the issue of the encoder/decoder, but does not resolve the issue of assigning the charset on the RiakObject from the transport. I'm happy to have that be part of a separate stream of work, however. Last concern: having a single Message on the client seems prone to race conditions. Could you instead make a method/function that has no side-effects (i.e. creates a new Message each time and throws it away), or put a critical section around the modification? |
Yet another hair-brained approach. This time the RiakObject manages its own content-type and charset. This supports changing the charset in the content-type option used to construct a RiakObject and by bucket.new(). This may be too invasive so perhaps I should just keep it very simple and revert back to the HTTP-only encoding change. Upon further reflection, I think I will look at a the HTTP-only approach in the morning. |
Went back to the much simpler plan in which the only real change is in the HTTP transport layer, as we discussed. I had been confused because I thought the charset can ONLY be set via the content-type string, but realized that the RiakObject can just be manipulated independently. |
@@ -542,8 +541,12 @@ def _build_put_headers(self, robj): | |||
"""Build the headers for a POST/PUT request.""" | |||
|
|||
# Construct the headers... | |||
if robj.charset is not None: | |||
content_type = "%s; charset=%s" % (robj.content_type, robj.charset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that sub-type parameters are usually quoted, e.g. '%s; charset="%s"'
Good work @javajolt 👍 to merge |
Issue #227: Allow charset encoding in content-type field for HTTP
Closes Issue #227.
Checks in serialization/deserialization for "charset" encoding inside the character-set field. Also when HTTP object is read, this same check is made.
Problem from #227 below:
Sample error (from master):
johndoe_dict = johndoe.data
File "build/bdist.macosx-10.7-intel/egg/riak/riak_object.py", line 80, in _get_data
File "build/bdist.macosx-10.7-intel/egg/riak/riak_object.py", line 141, in _deserialize
TypeError: No decoder for content type "application/json; charset=UTF-8"
See also http://stackoverflow.com/questions/15489167/objects-stored-by-riak-java-client-end-up-as-raw-json-when-read-by-riak-python-c
The transports need to remove the charset sub-type parameter from the content-type in order for this to work correctly (and possibly set it on the payload or as metadata).