Correctly block IPv6 domains in http.cookiejar #135768

Open
@LamentXU123

Bug report

Bug description:

First, let's start a Flask app:

from flask import Flask, make_response

app = Flask(__name__)

@app.route('/')
def set_cookie():

    response = make_response("Cookie has been set!")
    response.set_cookie(
        'foo',
        value='bar',   
    )

    return response

if __name__ == '__main__':
    app.run()

This web app sets a cookie foo=bar. Then we use http.cookiejar to process it:

import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy

policy = DefaultCookiePolicy(blocked_domains=['']) # no blockers
cj = CookieJar(policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://127.0.0.1:5000")
for item in cj:
   print('Name = %s' % item.name)
   print('Value = %s' % item.value)

# this should return 

'''
Cookie has been set!
Name = foo
Value = bar
'''

blocked_policy = DefaultCookiePolicy(blocked_domains=["127.0.0.1"]) # block cookies
cj = CookieJar(blocked_policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://127.0.0.1:5000")
for item in cj:
   print('Name = %s' % item.name)
   print('Value = %s' % item.value)
# this should return 

'''
Cookie has been set!
'''

Everything works as expected so far. But if we run the Flask app on an IPv6 host:

from flask import Flask, make_response

app = Flask(__name__)

@app.route('/')
def set_cookie():

    response = make_response("Cookie has been set!")
    response.set_cookie(
        'foo',
        value='bar',   
    )

    return response

if __name__ == '__main__':
    app.run(host='::1')

Then we use http.cookiejar to process it again:

import urllib.request
from http.cookiejar import CookieJar, DefaultCookiePolicy

policy = DefaultCookiePolicy(blocked_domains=['']) # no blockers
cj = CookieJar(policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://[::1]:5000")
for item in cj:
   print('Name = %s' % item.name)
   print('Value = %s' % item.value)

# this should return 

'''
Cookie has been set!
Name = foo
Value = bar
'''

blocked_policy = DefaultCookiePolicy(blocked_domains=["[::1]"]) # block cookies
cj = CookieJar(blocked_policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://[::1]:5000")
for item in cj:
   print('Name = %s' % item.name)
   print('Value = %s' % item.value)
# the cookie should be blocked here, but this actually returns

'''
Cookie has been set!
Name = foo
Value = bar
'''

blocked_policy = DefaultCookiePolicy(blocked_domains=["::1"]) # block cookies
cj = CookieJar(blocked_policy)
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(cj))
r = opener.open("http://[::1]:5000")
for item in cj:
   print('Name = %s' % item.name)
   print('Value = %s' % item.value)
# the cookie should be blocked here, but this actually returns

'''
Cookie has been set!
Name = foo
Value = bar
'''

No cookies are blocked in either case.

I traced the problem to the method http.cookiejar.DefaultCookiePolicy.is_blocked:

    def is_blocked(self, domain):
        for blocked_domain in self._blocked_domains:
            if user_domain_match(domain, blocked_domain):
                return True
        return False

It uses the function user_domain_match, shown below:

def user_domain_match(A, B):
    """For blocking/accepting domains.

    A and B may be host domain names or IP addresses.

    """
    A = A.lower()
    B = B.lower()
    if not (liberal_is_HDN(A) and liberal_is_HDN(B)):
        if A == B:
            # equal IP addresses
            return True
        return False
    initial_dot = B.startswith(".")
    if initial_dot and A.endswith(B):
        return True
    if not initial_dot and A == B:
        return True
    return False

user_domain_match uses the liberal_is_HDN function to check whether A and B are HDNs (host domain names) or IP addresses. That function is shown below:

def liberal_is_HDN(text):
    """Return True if text is a sort-of-like a host domain name.

    For accepting/blocking domains.

    """
    if IPV4_RE.search(text):
        return False
    return True

And this is the IPV4_RE regex:

IPV4_RE = re.compile(r"\.\d+$", re.ASCII)

Since the code only checks for IPv4, an IPv6 address is always treated as an HDN, which is wrong. As a result, user_domain_match always returns False here: the blocked entry has no initial dot, so only an exact string match would succeed, and that match fails.
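To make the misclassification concrete, here is a minimal sketch that copies IPV4_RE and liberal_is_HDN from the snippets quoted above into a standalone script (rather than importing them, so the result does not depend on the installed Python version). The sample host strings are only illustrative.

```python
import re

# Copied from the http.cookiejar snippets quoted above.
IPV4_RE = re.compile(r"\.\d+$", re.ASCII)


def liberal_is_HDN(text):
    """Return True if text looks like a host domain name."""
    if IPV4_RE.search(text):
        return False
    return True


# Classify a few sample hosts: True means "treated as an HDN".
results = {
    host: liberal_is_HDN(host)
    for host in ("127.0.0.1", "example.com", "[::1]", "::1")
}

for host, is_hdn in results.items():
    print(f"{host!r}: treated as HDN = {is_hdn}")
```

Only the IPv4 address ends in a dot followed by digits, so it is the only input recognised as an IP address. Both spellings of the IPv6 loopback address fall through to the HDN branch.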

Besides blocked_domains, there is also allowed_domains, which uses the same matching logic, so an IPv6 address never matches an entry there either.

Why does it return False? Because the IPv6 address is treated as an unusual HDN, .local is appended to it. So when user_domain_match is called, A is [::1].local and B is [::1].
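The comparison can be replayed outside the library. The sketch below copies user_domain_match from the snippet quoted earlier; effective_host is a hypothetical, simplified stand-in for the step that appends .local to a dotless host, written here as an assumption for illustration and not taken from the module itself.

```python
import re

IPV4_RE = re.compile(r"\.\d+$", re.ASCII)


def liberal_is_HDN(text):
    # Anything that does not end in ".<digits>" counts as a host domain name.
    return not IPV4_RE.search(text)


def user_domain_match(A, B):
    # Same logic as the function quoted in this report.
    A = A.lower()
    B = B.lower()
    if not (liberal_is_HDN(A) and liberal_is_HDN(B)):
        return A == B  # at least one side is an IP address: exact match only
    initial_dot = B.startswith(".")
    if initial_dot and A.endswith(B):
        return True
    if not initial_dot and A == B:
        return True
    return False


def effective_host(host):
    # Hypothetical simplification (an assumption): a host without any dot
    # that is not recognised as IPv4 gets ".local" appended.
    if "." not in host and not IPV4_RE.search(host):
        return host + ".local"
    return host


ipv4_blocked = user_domain_match(effective_host("127.0.0.1"), "127.0.0.1")
ipv6_bracket_blocked = user_domain_match(effective_host("[::1]"), "[::1]")
ipv6_bare_blocked = user_domain_match(effective_host("[::1]"), "::1")

print(effective_host("[::1]"))
print(ipv4_blocked, ipv6_bracket_blocked, ipv6_bare_blocked)
```

Under these assumptions the IPv4 entry matches, while neither spelling of the IPv6 entry does, because the left-hand side has become [::1].local before the comparison.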

That is, DefaultCookiePolicy allows every IPv6 address no matter what blocked_domains is set to.

Since DefaultCookiePolicy is mainly concerned with privacy, this could allow a bypass, so I think this is quite serious.

This issue was previously discussed in:

#135500
https://discuss.python.org/t/support-ipv6-in-http-cookiejar-when-deciding-whether-a-string-is-a-hdn-or-a-ip-addr/95439

My previous solution, in #135502, used ipaddress.ip_address() to identify IP addresses. That approach does not work as-is, because the IPv6 address is wrapped in []. I am writing the test script and completing the PR now.
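For illustration only, one way to get around the bracket problem is to strip a surrounding [] pair before handing the text to ipaddress.ip_address(). The helper name is_ip_literal is hypothetical, and this is a sketch of the idea, not the code from either PR.

```python
import ipaddress


def is_ip_literal(text):
    """Hypothetical helper: True if text is an IPv4 or IPv6 address,
    with or without the [] that URLs put around IPv6 literals."""
    if text.startswith("[") and text.endswith("]"):
        text = text[1:-1]  # drop the URL-style brackets
    try:
        ipaddress.ip_address(text)
    except ValueError:
        return False
    return True


for candidate in ("[::1]", "::1", "127.0.0.1", "example.com", "[::1].local"):
    print(f"{candidate!r}: {is_ip_literal(candidate)}")
```

Note that a host that has already had .local appended, such as [::1].local, is not recognised by this helper, so a check like this would have to run before that suffix is added.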

@ericvsmith Thanks!

CPython versions tested on:

3.14

Operating systems tested on:

Windows

Labels: stdlib (Python modules in the Lib dir), type-bug (An unexpected behavior, bug, or error)
