Skip to content

Issue #227: Allow charset encoding in content-type field #233

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 7 commits into from
Apr 4, 2013

Conversation

hazen
Copy link

@hazen hazen commented Apr 2, 2013

Closes Issue #227.

Checks in serialization/deserialization for "charset" encoding inside the character-set field. Also when HTTP object is read, this same check is made.

Problem from #227 below:

Sample error (from master):

johndoe_dict = johndoe.data
  File "build/bdist.macosx-10.7-intel/egg/riak/riak_object.py", line 80, in _get_data
  File "build/bdist.macosx-10.7-intel/egg/riak/riak_object.py", line 141, in _deserialize
TypeError: No decoder for content type "application/json; charset=UTF-8"
See also http://stackoverflow.com/questions/15489167/objects-stored-by-riak-java-client-end-up-as-raw-json-when-read-by-riak-python-c

The transports need to remove the charset sub-type parameter from the content-type in order for this to work correctly (and possibly set it on the payload or as metadata).

@@ -403,4 +406,20 @@ def reduce(self, *args):
mr.add(self.bucket.name, self.key)
return mr.reduce(*args)

def _parse_content_type(self, value, oldcharset):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this method on RiakObject? Since it has content_type and charset properties, shouldn't it be the responsibility of the transport to marshal those into the proper places? Protobuffs already separates the two.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The RiakObject knows what its content-type and charset is. I see what you are saying. We could maybe put some of this logic into the RiakClient, or in the transport via the client, but we have to do it before the encoder/decoder is identified.

A question I had is do we want to worry about the charset being in the content-type instead of in charset. If so, then I think it belongs in the RiakObject. If we don't want to mess with it and only want the proper encoder/decoder, then we could push it down as you suggest.

@ghost ghost assigned hazen Apr 3, 2013
@hazen
Copy link
Author

hazen commented Apr 3, 2013

@seancribbs Moved out of RiakObject and into RiakClient. This solution checks just before the encoder and decoder are looked up. Maybe should be in bucket?

When the transport put() or put_new() is called, the object is serialized via the RiakObject.encoded_data property. RiakObject._serialize() gets its encoder from the bucket or the client, but the transport does not really have any input at that stage.

Pros:

  • Pretty small change
  • Can create objects with content subtypes
    Cons:
  • Cannot create a special en/decoder for content subtypes
  • Logic is not in the transport

We could just toss the content subtype info when data is read in from Riak in the HTTP transport, if that is all you are looking for.

@ghost ghost assigned seancribbs Apr 3, 2013
@seancribbs
Copy link

@javajolt So, this resolves the issue of the encoder/decoder, but does not resolve the issue of assigning the charset on the RiakObject from the transport. I'm happy to have that be part of a separate stream of work, however.

Last concern: having a single Message on the client seems prone to race conditions. Could you instead make a method/function that has no side-effects (i.e. creates a new Message each time and throws it away), or put a critical section around the modification?

@hazen
Copy link
Author

hazen commented Apr 4, 2013

Yet another hair-brained approach. This time the RiakObject manages its own content-type and charset. This supports changing the charset in the content-type option used to construct a RiakObject and by bucket.new(). This may be too invasive so perhaps I should just keep it very simple and revert back to the HTTP-only encoding change.

Upon further reflection, I think I will look at a the HTTP-only approach in the morning.

@hazen
Copy link
Author

hazen commented Apr 4, 2013

Went back to the much simpler plan in which the only real change is in the HTTP transport layer, as we discussed. I had been confused because I thought the charset can ONLY be set via the content-type string, but realized that the RiakObject can just be manipulated independently.

@@ -542,8 +541,12 @@ def _build_put_headers(self, robj):
"""Build the headers for a POST/PUT request."""

# Construct the headers...
if robj.charset is not None:
content_type = "%s; charset=%s" % (robj.content_type, robj.charset)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that sub-type parameters are usually quoted, e.g. '%s; charset="%s"'

@seancribbs
Copy link

Good work @javajolt 👍 to merge

hazen pushed a commit that referenced this pull request Apr 4, 2013
Issue #227: Allow charset encoding in content-type field for HTTP
@hazen hazen merged commit 338bf9d into master Apr 4, 2013
@seancribbs seancribbs deleted the bch-content-charset branch April 4, 2013 18:14
@seancribbs seancribbs removed their assignment May 8, 2015
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants