Files @ aa51aca7fd1a

Location: kallithea/scripts/source_format.py

Valentin Kleibel
controller: Handle UnicodeDecodeError from webob decoding invalid URLs

webob will try to utf-8 decode all %-encoded bytes in URL-parameters, but will
not handle Unicode errors ... and neither did Kallithea. Visiting a URL like
http://localhost:5000/?%AD would thus give an unhandled exception showing
"Internal Server Error" to the user, and logging the full traceback and:

WebApp Error: UnicodeDecodeError: 'utf-8' codec can't decode byte 0xad in position 0: invalid start byte

This has been seen a lot recently from attackers probing for a PHP
vulnerability
https://devco.re/blog/2024/06/06/security-alert-cve-2024-4577-php-cgi-argument-injection-vulnerability-en/ .

Now handle these exceptions more nicely and reject with "400 Bad Request".
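
The failure mode and the "400 Bad Request" response described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the actual Kallithea change: `decode_query_string` and `reject_invalid_urls` are hypothetical names, and the strict decode stands in for the decoding that webob performs.

```python
from urllib.parse import unquote_to_bytes


def decode_query_string(raw_query):
    """Percent-decode a raw query string and decode it strictly as UTF-8.

    Raises UnicodeDecodeError for input such as '%AD', where the decoded
    byte 0xad is not a valid UTF-8 start byte.
    """
    return unquote_to_bytes(raw_query).decode('utf-8')


def reject_invalid_urls(app):
    """Hypothetical WSGI middleware: answer 400 instead of letting the
    UnicodeDecodeError escape as an unhandled exception."""
    def wrapper(environ, start_response):
        try:
            decode_query_string(environ.get('QUERY_STRING', ''))
        except UnicodeDecodeError:
            body = b'Bad Request: URL is not valid UTF-8\n'
            start_response('400 Bad Request', [
                ('Content-Type', 'text/plain; charset=utf-8'),
                ('Content-Length', str(len(body))),
            ])
            return [body]
        # The query string decodes cleanly; pass the request through.
        return app(environ, start_response)
    return wrapper
```

Wrapping an application as `reject_invalid_urls(app)` leaves well-formed requests untouched, while a request for `/?%AD` gets a short 400 response and no traceback in the log.
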
#!/usr/bin/env python3

# hg files 'set:!binary()&grep("^#!.*python")' 'set:**.py' | xargs scripts/source_format.py

import re
import sys


filenames = sys.argv[1:]

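for fn in filenames:
    with open(fn) as f:
        org_content = f.read()

    # Derive the dotted module name from the path:
    # 'kallithea/lib/__init__.py' -> 'kallithea.lib'
    mod_name = fn[:-3] if fn.endswith('.py') else fn
    mod_name = mod_name[:-9] if mod_name.endswith('/__init__') else mod_name
    mod_name = mod_name.replace('/', '.')

    def header(m):
        # Module name followed by a '~' underline of matching length
        return '"""\n%s\n%s\n' % (mod_name, '~' * len(mod_name))

    # Rewrite the first docstring header that names a kallithea module
    new_content = re.sub(r'^"""\n(kallithea\..*\n)(~+\n)?', header, org_content, count=1, flags=re.MULTILINE)

    # Only write the file back if something changed
    if new_content != org_content:
        with open(fn, 'w') as f:
            f.write(new_content)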
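
To see what the substitution does without touching any files, the same logic can be applied to an in-memory string. This is a small sketch: `normalize_header` is a hypothetical helper that mirrors the loop body above, and the sample docstring is made up for illustration.

```python
import re


def normalize_header(filename, content):
    """Return content with its docstring header rewritten to match filename."""
    mod_name = filename[:-3] if filename.endswith('.py') else filename
    mod_name = mod_name[:-9] if mod_name.endswith('/__init__') else mod_name
    mod_name = mod_name.replace('/', '.')
    header = '"""\n%s\n%s\n' % (mod_name, '~' * len(mod_name))
    # A callable replacement avoids any escape processing of the header text.
    return re.sub(r'^"""\n(kallithea\..*\n)(~+\n)?', lambda m: header,
                  content, count=1, flags=re.MULTILINE)


# Made-up sample: stale module name and an underline of the wrong length.
sample = '"""\nkallithea.lib.old_name\n~~~~\n\nSome description.\n"""\n'
print(normalize_header('kallithea/lib/utils.py', sample))
```

The stale name and its underline are replaced by `kallithea.lib.utils` and a matching row of `~`, and the rest of the docstring is left alone. Content whose docstring does not start with a `kallithea.` line comes back unchanged, which is why the script only rewrites files that differ.
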