
Fix issue in umqtt when MQTT CONNECT packet is greater than 127 bytes #163


Closed
dmascord wants to merge 4 commits

Conversation


@dmascord dmascord commented Mar 30, 2017

Using the encoding logic from http://docs.oasis-open.org/mqtt/mqtt/v3.1.1/os/mqtt-v3.1.1-os.html#_Toc385349213

Please see sample code which works (with this code fix) connecting to Azure IoT:

from umqtt.simple import MQTTClient
import utime
import ubinascii
import uhashlib
import network

try:
    import usocket as socket
except:
    import socket
try:
    import ustruct as struct
except:
    import struct

# (date(2000, 1, 1) - date(1900, 1, 1)).days * 24*60*60
NTP_DELTA = 3155673600 
UTIME_DELTA = 946684800

host = "pool.ntp.org"
wifi_ssid = "<ssid>"
wifi_password = "<password>"
deviceId = '<Azure IoT device Id>'
url = '<Azure IoT Hub Hostname>.azure-devices.net'
deviceKey = '<Azure IoT device key>'

sha256_blocksize = 64

def time():
    NTP_QUERY = bytearray(48)
    NTP_QUERY[0] = 0x1b
    addr = socket.getaddrinfo(host, 123)[0][-1]
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    s.settimeout(1)
    res = s.sendto(NTP_QUERY, addr)
    msg = s.recv(48)
    s.close()
    val = struct.unpack("!I", msg[40:44])[0]
    return val - NTP_DELTA

# There's currently no timezone support in MicroPython, so
# utime.localtime() will return UTC time (as if it was .gmtime())
def settime():
    t = time()
    import machine
    import utime
    tm = utime.localtime(t)
    tm = tm[0:3] + (0,) + tm[3:6] + (0,)
    machine.RTC().datetime(tm)
    #print(utime.time())
    print(utime.localtime())


def strxor(a, b):
    c = bytearray(len(a))
    for i in range(len(a)):
        c[i] = a[i] ^ b
    return c

def hmac_sha256(key, message):
    # HMAC construction per RFC 2104, using SHA-256
    if len(key) > sha256_blocksize:
        key = uhashlib.sha256(key).digest()  # keys longer than blocksize are hashed first
    if len(key) < sha256_blocksize:
        key = key + b'\x00' * (sha256_blocksize - len(key))  # shorter keys are zero-padded

    o_key_pad = strxor(key, 0x5C)
    i_key_pad = strxor(key, 0x36)

    hasher = uhashlib.sha256(i_key_pad)
    hasher.update(message)
    inner_hash = hasher.digest()

    hasher2 = uhashlib.sha256(o_key_pad)
    hasher2.update(inner_hash)
    return hasher2.digest()
	
# SAS generator from https://github.com/bechynsky/AzureIoTDeviceClientPY/blob/master/DeviceClient.py
def generate_sas_token(uri, deviceid, key, ttl):
    urlToSign = uri.replace("'", "%27").replace("/", "%2F").replace("=", "%3D").replace("+", "%2B")
    sign_key = "%s\n%d" % (urlToSign, int(ttl))

    h = hmac_sha256(ubinascii.a2b_base64(key), sign_key.encode('utf-8'))
    signature = (ubinascii.b2a_base64(h).decode("utf-8")
                 .replace("'", "%27").replace("/", "%2F")
                 .replace("=", "%3D").replace("+", "%2B").replace("\n", ""))

    return "SharedAccessSignature sr={0}&sig={1}&se={2}".format(urlToSign, signature, ttl)
	
# Test reception e.g. with:
# mosquitto_sub -t foo_topic

def do_connect():
    import network
    sta_if = network.WLAN(network.STA_IF)
    if not sta_if.isconnected():
        print('connecting to network...')
        sta_if.active(True)
        sta_if.connect(wifi_ssid, wifi_password)
        while not sta_if.isconnected():
            pass
    print('network config:', sta_if.ifconfig())

def main():
    do_connect() 
    sta_if = network.WLAN(network.STA_IF)
    while not sta_if.isconnected():
        utime.sleep(1)
    # ensure that epoch is correct
    settime()
    
    ttl = int(utime.time()) + 36000 + UTIME_DELTA
    sas_token = generate_sas_token(url, deviceId, deviceKey, ttl)
    c = MQTTClient(client_id=deviceId, server=url, port=8883, ssl=True,
                   keepalive=10000, user=(url + '/' + deviceId), password=sas_token)
    c.connect()
    c.publish(b"foo_topic", b"hello")
    c.disconnect()

if __name__ == "__main__":
    main()

@dmascord dmascord changed the title Fix issue when CONNECT packet is greater than 127 bytes Fix issue in umqtt when MQTT CONNECT packet is greater than 127 bytes Mar 30, 2017
@pfalcon
Contributor

pfalcon commented Mar 30, 2017

This module is written to avoid useless allocations and especially reallocations (they fragment memory and cause issues on low-heap systems), and your patch brings in a bunch of those.

@dmascord
Author

Hi Paul,
Understood. How else can we address this issue? Can we allocate an array of size 13 first, write the correct size into it, and then throw away the unused 0-3 bytes depending on the total message size? Otherwise, is there a way for us to know the total size to write, and then allocate 10-13 bytes depending on that total size?
Cheers,
Damien
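Editor's note: the "know the total size to write" question can be answered up front, since the MQTT Remaining Length field uses 7 bits of length per byte. A minimal sketch (the helper name `header_len` is mine, not from umqtt):

```python
def header_len(remaining):
    # Number of bytes the MQTT Remaining Length field needs for a
    # given remaining length: 7 bits of length per encoded byte.
    n = 1
    while remaining > 0x7F:
        remaining >>= 7
        n += 1
    return n
```

With this, the fixed header size is known before any buffer is allocated.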

@dpgeorge
Member

Note that this patch only works for messages up to 255 bytes in length; if it's larger than this then msg[1] contains a truncated size to start with.

How else can we address this issue?

One way is to support only messages up to 255 bytes in length and always use 2 bytes to store the remaining length. If the length is 127 or less then the first byte just has the 0x80 continuation bit set. This seems to be allowed by the spec.
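Editor's note: the always-two-bytes idea might be sketched like this (function name is mine, not umqtt's). Note it produces a non-minimal encoding for lengths of 127 or less, which not every broker implementation may accept:

```python
def encode_len_2bytes(n):
    # Always emit two Remaining Length bytes: the low 7 bits with the
    # 0x80 continuation bit set, then the high 7 bits.
    assert n <= 16383  # two bytes can encode at most 127 + 127 * 128
    return bytes([(n & 0x7F) | 0x80, n >> 7])
```

A decoder following the spec reads this as (b0 & 0x7F) + 128 * b1, so the value round-trips correctly even when the second byte is zero.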

@dmascord
Author

255 bytes is enough for my use case, but would be nice to fix this once and for all.
Is it possible to do the following:

  1. Allocate 13 bytes for the header.
  2. As you are adding up the remaining-length field, do the % 128 operation each time it passes 128.
  3. Write msg[1], then msg[2] as needed based on the size.
  4. Write the remaining 8 bytes of the header into the remaining indexes.
  5. Write the body starting at position 11, 12, or 13 depending on the total size.

Sorry if I am missing something obvious, or misunderstanding the allocation strategies that MicroPython uses!
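Editor's note: for reference, the general Remaining Length algorithm from the MQTT 3.1.1 spec (section 2.2.3) linked at the top of this PR, sketched in Python:

```python
def encode_varlen(n):
    # MQTT 3.1.1 Remaining Length: 7 bits per byte, least significant
    # group first; the 0x80 bit flags that more bytes follow.
    out = bytearray()
    while True:
        b = n % 128
        n //= 128
        if n > 0:
            b |= 0x80
        out.append(b)
        if n == 0:
            return bytes(out)
```

This yields 1 to 4 bytes for lengths from 0 to 268435455, which is the variability the discussion above is trying to accommodate without reallocations.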

@pfalcon
Contributor

pfalcon commented Apr 2, 2017

One way is to support only messages up to 255 bytes in length and always use 2 bytes to store the remaining length. If the length is 127 or less then the first byte just has the 0x80 continuation bit set. This seems to be allowed by the spec.

This seems to be not disallowed by spec, but unfortunately, interoperability behavior may vary. See 4a5965b

@pfalcon
Contributor

pfalcon commented Apr 2, 2017

@dmascord : Also please see the patch above for an idea of how to do it. The general idea is that you should precalculate the message size first and allocate the needed bytearray once. The calculation can be conservatively higher (i.e. you may allocate the buffer to be able to hold 2 bytes of msg length), but the encoding should use the minimal number of bytes needed (and then you will use the 2nd arg of .write() to send out the exact number of bytes needed).
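Editor's note: a sketch of that approach (names are mine, not umqtt's): allocate the worst-case fixed header once, encode the length minimally into it, and report how many bytes were actually used so the caller sends only those:

```python
def fill_fixed_header(buf, msg_type, remaining_len):
    # buf is preallocated to the worst case (1 type byte + up to 4
    # Remaining Length bytes). Returns the number of bytes filled, so
    # a caller can send exactly that many, e.g. sock.write(buf, n) on
    # MicroPython streams.
    buf[0] = msg_type
    i = 1
    while remaining_len > 0x7F:
        buf[i] = (remaining_len & 0x7F) | 0x80
        remaining_len >>= 7
        i += 1
    buf[i] = remaining_len
    return i + 1
```

One allocation of 5 bytes covers every message size; unused trailing bytes simply never hit the socket.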

@dmascord
Author

dmascord commented Apr 4, 2017

What about the latest committed approach ?

@@ -53,24 +54,45 @@ def set_last_will(self, topic, msg, retain=False, qos=0):

def connect(self, clean_session=True):
self.sock = socket.socket()
self.addr = socket.getaddrinfo(self.server, self.port)[0][-1]
Contributor

I'm afraid these and above changes are unrelated to the topic of this PR and should be submitted separately.

if self.lw_topic:
sz += 2 + len(self.lw_topic) + 2 + len(self.lw_msg)

if sz < 128:
Contributor

Nope, this ladder doesn't look good. The idea is to save memory, any memory. Here you try to avoid reallocations on the heap, but by storing 4 times more than needed in the code constants area (which is usually still in the heap). Optimization is a subtle matter.

As I mentioned, you should allocate the bytearray once, conservatively large enough, but fill it in dynamically. If there's a static trailer, you can fill it in with slice assignment syntax, as long as the left and right sides have the same amount of data. Then it's essentially a memcpy(), not an insertion or deletion. E.g.:

foo[3:5] = b"fo"

You replace 2 bytes with another 2 bytes; that's ok.
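Editor's note: the equal-length slice assignment point can be illustrated directly (buffer contents here are arbitrary examples):

```python
buf = bytearray(b"\x10\x00\x00\x00\x00")
buf[1:3] = b"\xc1\x02"   # 2 bytes replace 2 bytes: an in-place copy, no resize
assert len(buf) == 5

grow = bytearray(b"abc")
grow[1:2] = b"XY"        # 2 bytes replace 1 byte: the bytearray must grow
assert len(grow) == 4
```

Only the second form can trigger the reallocation being warned about.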

Author

OK... so, is there a way for me to see the memory usage in the different areas? So I can write the code changes, and test to see if I am doing it memory efficiently ?

I.e., if I allocate bytearray(b"\x10\0\0\0\0\0\0\0\0\0\0\0\0\0\0") (i.e., the largest amount) and then fill that dynamically (i.e., don't create a new sz variable, or enc_byte), then once I know that I won't need the extra bytes, can I right-trim to remove the excess?

I'm sorry if I am asking very basic questions, but I need a bit of guidance to get this patch acceptable for the environment that it is running on.

Author

Why not reduce the bytearray to bytearray(b"\x04MQTT\x04\x02\0\0"), and then write the size to the socket as it is being calculated? That way we just have the allocation of the int, and don't have to do any reallocations or writing into the bytearray. How expensive is writing to the socket (i.e., the function call) vs having an extra byte that will be thrown away?

Contributor

OK... so, is there a way for me to see the memory usage in the different areas? So I can write the code changes, and test to see if I am doing it memory efficiently ?

There are some, but you don't need to go that deep. Speculative reasoning (well, reasoning at all) about the changes you make should generally be enough (assuming you know common-sense principles of memory allocation, etc.).

then once I know that I won't need the extra bytes, can I right-trim to remove the excess?

You would need to do that if it were a large buffer. For something of a dozen bytes, no need to bother.

If you want a crash course into this stuff, http://sqlite.org/malloc.html may be a good one, especially section 4. But again, you don't need all that, except perhaps for these important conclusions:

In other words, unless all memory allocations are of exactly the same size (n=1) then the system needs access to more memory than it will ever use at one time. Furthermore, we see that the amount of surplus memory required grows rapidly as the ratio of largest to smallest allocations increases, and so there is strong incentive to keep all allocations as near to the same size as possible.

That shows the idea: instead of allocating something of 7, 8, 9, or 10 bytes at different times, it's better to always allocate 10.

Contributor

Why not reduce the bytearray to bytearray(b"\x04MQTT\x04\x02\0\0"), and then write the size to the socket as it is being calculated?

This shows exactly the kind of thinking we (MicroPython maintainers) would like to provoke in contributors to the project. At first glance a viable and good idea; I'll leave it to you to prove it.

How expensive is writing to the socket (i.e., the function call) vs having an extra byte that will be thrown away?

Function calls are expensive, but the most pressing resource on uPy systems is memory. The heuristic is that you should avoid writing output byte by byte, but writing a million integers one by one is better than storing them in an array and writing them out at once (the latter simply won't work, as there won't be that much memory on most uPy systems).

@dmascord
Author

dmascord commented Apr 9, 2017

OK, have reworked it again, but now I have a commit chain to clean up :)

@pfalcon
Contributor

pfalcon commented Apr 13, 2017

@dmascord , thanks for updates, it may take me some time to look thru them, sorry.

@dmascord
Author

No worries, take your time, I am running my patches in my environment anyway.

@martyvis

Just +1ing the need for this patch. Losant's broker uses a 24 byte client ID, 36 byte user ID and 64 byte password.

@pfalcon
Contributor

pfalcon commented Apr 26, 2017

@martyvis : Do you +1 the need for this patch, or are you +1'ing having tested it?

@martyvis

What's the easiest way to test? Currently running the esp8266 1.8.7 stable release (esp8266-20170108-v1.8.7.bin). Can I get a bin file with the patch?

@pfalcon
Contributor

pfalcon commented Apr 26, 2017

I'm afraid only you can know the easiest way for you to test it. But unless people who experience issues get together and work on resolving them (like testing each other's patches), we'll have very slow progress ;-(.

@martyvis

I am happy to test, but haven't explored building my own binary. Is it in the daily builds or can I get it from somewhere else?

@dmascord
Author

Hi martyvis,

I have just built the latest micropython binary based on the current git (as of 868453d3d83ea8d3ed161bd20f2f6cb59a396562), with the umqtt/simple.py "frozen".

http://tusker.org/firmware-combined-umqtt-20170426.bin

For your comfort level, at the moment, git diff shows the following:

diff --git a/esp8266/esp8266.ld b/esp8266/esp8266.ld
index 960c751..e5292ce 100644
--- a/esp8266/esp8266.ld
+++ b/esp8266/esp8266.ld
@@ -5,7 +5,7 @@ MEMORY
     dport0_0_seg : org = 0x3ff00000, len = 0x10
     dram0_0_seg :  org = 0x3ffe8000, len = 0x14000
     iram1_0_seg :  org = 0x40100000, len = 0x8000
-    irom0_0_seg :  org = 0x40209000, len = 0x87000
+    irom0_0_seg :  org = 0x40209000, len = 0xa7000
 }

 /* define common sections and symbols */
diff --git a/esp8266/modesp.c b/esp8266/modesp.c
index 432fd5a..ce75c1c 100644
--- a/esp8266/modesp.c
+++ b/esp8266/modesp.c
@@ -163,7 +163,7 @@ STATIC mp_obj_t esp_flash_user_start(void) {
     if (IS_OTA_FIRMWARE()) {
         return MP_OBJ_NEW_SMALL_INT(0x3c000 + 0x90000);
     } else {
-        return MP_OBJ_NEW_SMALL_INT(0x90000);
+        return MP_OBJ_NEW_SMALL_INT(0xb0000);
     }
 }
 STATIC MP_DEFINE_CONST_FUN_OBJ_0(esp_flash_user_start_obj, esp_flash_user_start);

@martyvis

Yes, this works nicely, Damien! Losant authenticates the connection and accepts the published message. (I changed the last 8 digits of the strings to 8s.)



>>> device_id='5901b3ce0d10990088888888'
>>> access_key=b'077e3118-1154-4691-901b-38fd88888888'
>>> access_secret=b'1cbf42ccc2b44b9df601b10278a224bb1ce898e951da9b9674fbb9b988888888'
>>> topic="losant/{0}/state".format(device_id)
>>> topic
'losant/5901b3ce0d10990088888888/state'
>>> c=MQTTClient(device_id,server,user=access_key,password=access_secret)
>>> c.connect()
0
>>> data['count']=1.23456
>>> msg
{'data': {'count': 1.23456}}
>>> c.publish(topic,ujson.dumps(msg))

And in Losant

Reported By Device at 04/27/2017 7:27:57 PM { "count": 1.23456 }

At least for me it would be good to see it merged with the master branch. (I actually ended up working around this by using Mosquitto as a bridge.) Nothing is in production; I'm just trying to get my head around Losant, and hoping that MicroPython proves to be a good way of prototyping IoT edge devices.

@pfalcon
Contributor

pfalcon commented Apr 27, 2017

@martyvis : Ok, thanks for testing!

@dmascord : For future cases, while you're waiting for review, feel free to squash incremental changes together and check compliance with https://github.com/micropython/micropython-lib/wiki/ContributorGuidelines. (No need to change anything now, I'll handle that myself.)

@pfalcon
Contributor

pfalcon commented Apr 27, 2017

@dmascord : Great work, cleaned up and merged in 2164c88, thanks!
