-
Notifications
You must be signed in to change notification settings - Fork 1k
Fix issue in umqtt when MQTT CONNECT packet is greater than 127 bytes #163
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
This module is written to avoid useful allocations and especially reallocations (fragment memory, cause issue with low-heap systems), and your patch brings up bunch of that. |
Hi Paul, |
Note that this patch only works for messages up to 255 bytes length, if it's larger than this then msg[1] contains a truncated size to start with.
One way is to support only messages up to 255 bytes length and always use 2 bytes to store the remaining length. If the length is 127 or less then the first byte is just 0x80. This seems to be allowed by the spec. |
255 bytes is enough for my use case, but would be nice to fix this once and for all.
|
This seems to be not disallowed by spec, but unfortunately, interoperability behavior may vary. See 4a5965b |
@dmascord : Also please see patch above for an idea how to do it. General idea is that you should precalculate message size first and allocate needed bytearray once. The calculation can be conservatively higher (i.e. you may allocate buffer to be able to hold 2 bytes of msg length), but encoding should use minimal no. of bytes needed (and they you will use 2nd arg of .write() to send out exact no. of bytes needed). |
What about the latest committed approach ? |
umqtt.simple/umqtt/simple.py
Outdated
@@ -53,24 +54,45 @@ def set_last_will(self, topic, msg, retain=False, qos=0): | |||
|
|||
def connect(self, clean_session=True): | |||
self.sock = socket.socket() | |||
self.addr = socket.getaddrinfo(self.server, self.port)[0][-1] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm afraid these and above changes are unrelated to the topic of this PR and should be submitted separately.
umqtt.simple/umqtt/simple.py
Outdated
if self.lw_topic: | ||
sz += 2 + len(self.lw_topic) + 2 + len(self.lw_msg) | ||
|
||
if sz < 128: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nope, this ladder doesn't look good. The idea to save memory, any memory. Here you try to avoid reallocations of heap, but by storing 4 times more than needed in code constants area (which is usually still in heap). Optimization is a subtle matter.
As I mentioned, you should allocate bytearray once, conservatively large enough, but fill it in in dynamic way. If there's a static trailer, you can fill it in with slice assignment syntax, as long as on the left and right side there's the same amount of data. Then it's essentially a memcpy(), not insertion or deletion. E.g.:
foo[3:4] = b"fo"
You replace 2 bytes with another 2 bytes, that's ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... so, is there a way for me to see the memory usage in the different areas? So I can write the code changes, and test to see if I am doing it memory efficiently ?
Ie, If I allocate bytearray(b"\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\0"), (ie, the largest amount), and then fill that dynamically (ie, don't create a new sz variable, or enc_byte), then once I know that I won't need the extra bytes, can I right trim() to remove the excess ?
I'm sorry if I am asking very basic questions, but I need a bit of guidance to get this patch acceptable for the environment that it is running on.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not reduce the bytearray to bytearray(b"x04MQTT\x04\x02\0\0"), and then write the size to the socket as it is being calculated ? That way we can just have the allocation of the int, and not have to do any reallocations or writing into the bytearray ? How expensive is writing to the socket (ie, the function call) vs having an extra byte that will be thrown away ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK... so, is there a way for me to see the memory usage in the different areas? So I can write the code changes, and test to see if I am doing it memory efficiently ?
There're some, but you don't need to go that deep. Speculative reasoning (well, reasoning at all) about the changes you make should be generally enough (assuming you know common-sense principles of memory allocation, etc.)
then once I know that I won't need the extra bytes, can I right trim() to remove the excess ?
You would need to do that if that was a large buffer. For something of dozen of bytes, no need to bother.
If you want a crash course into this stuff, http://sqlite.org/malloc.html may be a good one, especially section 4. But again, you don't need all that, except perhaps for these important conclusions:
In other words, unless all memory allocations are of exactly the same size (n=1) then the system needs access to more memory than it will ever use at one time. Furthermore, we see that the amount of surplus memory required grows rapidly as the ratio of largest to smallest allocations increases, and so there is strong incentive to keep all allocations as near to the same size as possible.
That shows the idea - instead of allocating something of 7, 8, 9, or 10 bytes at different times, it's better always allocate 10.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not reduce the bytearray to bytearray(b"x04MQTT\x04\x02\0\0"), and then write the size to the socket as it is being calculated ?
This shows exactly the kind of thinking we (MicroPython maintainers) would like to provoke in contributors to the project. On a first glance, viable and good idea, I'll leave it to you to prove it.
How expensive is writing to the socket (ie, the function call) vs having an extra byte that will be thrown away ?
Function calls are expensive, but the most pressing resource in uPy systems in uPy systems in memory. The heuristic of that is that you should avoid writing output byte by byte, but writing a million integers one by one is better than storing them in the array and writing at once (the latter won't work as there simply won't that much memory on most of uPy systems).
OK, have reworked it again, but now I have a commit chain to clean up :) |
@dmascord , thanks for updates, it may take me some time to look thru them, sorry. |
No worries, take your time, I am running my patches in my environment anyway. |
Just +1ing the need for this patch. Losant's broker uses a 24 byte client ID, 36 byte user ID and 64 byte password. |
@martyvis : Do you +1 the need for this patch, or your having tested it? |
What's the easiest way to test? Currently running esp8266 1.8.7 stable release (esp8266-20170108-v1.8.7.bin) Can I get a bin file with the patch? |
I'm afraid only you may know what is the easiest way for you to test it. But unless people who experience issues get together and work on resolving them (like - testing each other's patches), we'll have very slow progress ;-(. |
I am happy to test, but haven't explored building my own binary. Is it in the daily builds or can I get it from somewhere else? |
Hi martyvis, I have just built the latest micropython binary based on the current git (as of 868453d3d83ea8d3ed161bd20f2f6cb59a396562), with the umqtt/simple.py "frozen". http://tusker.org/firmware-combined-umqtt-20170426.bin For your comfort level, at the moment, git diff shows the following:
|
Yes this works nicely, Damien! Losant authenticates the connection and accepts the published message. ( Changing last 8 digits of strings to 8s)
And in Losant
At least for me it would be good to see it merged with the master branch. (I actually ended up working around this by using Mosquitto as a bridge). Nothing is in production - I'm just trying to get my head around Losant and hoping that micropython proves to be a good way of prototyping IoT edge devices |
@martyvis : Ok, thanks for testing! @dmascord : For future cases, while you're waiting for review, feel free to squash incremental changes together and check compliance with https://github.com/micropython/micropython-lib/wiki/ContributorGuidelines. (No need to change anything now, I'll handle that myself.) |
using logic from http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc385349213
Please see sample code which works (with this code fix) connecting to Azure IoT: