controllers: don’t pass start=0 to BaseRepository.get_changesets()
MercurialRepository.get_changesets() can fail when passed start=0 if revision 0 is not in self.revisions. That can happen when revision 0 is not in the visible subset of the revisions in the repository. Before Kallithea changeset 7c43e15fb8bc7a73f17f577e59a4698589b6809d, it worked by chance because start=0 was treated like start=None in the relevant places (GitRepository.get_changesets still does that).
The intention of passing start=0 was seemingly to not limit the start. Therefore passing start=None (or nothing, as it’s the default value) should be correct.
I got the following traceback before this change:
Traceback (most recent call last):
  File "~/vcs/kallithea/kallithea/controllers/changelog.py", line 117, in index
    collection = c.db_repo_scm_instance.get_changesets(start=0, end=revision,
  File "~/vcs/kallithea/kallithea/lib/vcs/backends/hg/repository.py", line 529, in get_changesets
    start_pos = None if start is None else self.revisions.index(start_raw_id)
ValueError: '4257f758b3eaacaebb6956d1aefc019afab956b8' is not in list
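The failure mode can be reproduced with a minimal sketch. The function below is a simplified, hypothetical stand-in for the lookup shown in the traceback, not the actual MercurialRepository code, and the revision names are made up:

```python
# Hypothetical, simplified stand-in for the start/end lookup in get_changesets().
def slice_visible(revisions, start=None, end=None):
    """Return the part of `revisions` from start to end (inclusive)."""
    start_pos = None if start is None else revisions.index(start)
    end_pos = None if end is None else revisions.index(end) + 1
    return revisions[start_pos:end_pos]

# Visible subset of a repository whose first revision ("a0") is hidden.
visible = ["b1", "c2", "d3"]

# start=None means "do not limit the start" and always works.
unbounded = slice_visible(visible, start=None, end="c2")

# Asking for the hidden first revision fails in list.index().
try:
    slice_visible(visible, start="a0", end="c2")
    failed = False
except ValueError:
    failed = True
```

With start=None the slice simply begins at the first visible revision, which is what the callers wanted.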
Running "kallithea-cli front-end-build" with npm 7.21.1 gave:
npm WARN old lockfile The package-lock.json file was created with an old version of npm,
npm WARN old lockfile so supplemental metadata must be fetched from the registry.
npm WARN old lockfile
npm WARN old lockfile This is a one-time fix-up, please be patient...
files: fix raw download of repo files with unicode code points above 255 in their name
Raw download had apparently only been tested with non-ascii characters that were latin1. That was apparently a (too) simple case that worked without crashing.
Files with unicode code points above 255 in their name would fail to download. Waitress failed like this when trying to get a real byte string by encoding WSGI headers to latin1:
UnicodeEncodeError: 'latin-1' codec can't encode characters in position 84-85: ordinal not in range(256)
HTTP headers are of course byte strings on the network, but Python 3 WSGI unfortunately exposes them neither as bytes nor as unicode strings to be encoded as utf-8. Instead, it uses unicode strings with byte values encoded as code points 0-255. That is achieved by decoding the utf-8 encoded bytes as latin1.
For raw downloads, the recommended download filename is provided in the Content-Disposition header. The problem is that it was provided as a real unicode string.
Fixed by applying the "proper" latin1-decoding of a utf8-encoding.
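A minimal sketch of that round trip, using only the standard library. The helper name and the file name are hypothetical and not taken from the Kallithea code:

```python
def wsgi_header_value(text):
    """Encode text as UTF-8, then present those bytes as code points 0-255."""
    return text.encode("utf-8").decode("latin-1")

# Hypothetical file name containing code points above 255.
filename = "\u65e5\u672c\u8a9e.txt"
header = wsgi_header_value(filename)

# The WSGI server can now recover the real bytes by encoding to latin1 ...
wire_bytes = header.encode("latin-1")

# ... while the untransformed name would raise UnicodeEncodeError.
try:
    filename.encode("latin-1")
    raw_name_failed = False
except UnicodeEncodeError:
    raw_name_failed = True
```

The bytes that reach the network are the plain utf-8 encoding of the file name.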
api: fix repo group permission check for repo creation
hg.create.repository only controls whether all users have write access at the top level. For other repo locations, check for write permission to the repo group.
Note: This also covers "repo creation" by forking or by moving another repo.
front-end: use 'bin' path for node commands instead of '.bin'
license-checker uses relative paths for importing other modules. That worked fine when .bin/license-checker was a symlink, but not on filesystems without symlink support.
git: make sure _check_url only accepts the protocols accepted by is_valid_repo_uri
Avoid unnecessary flexibility, ambiguity, and complexity.
The file protocol was never used. But when cloning existing managed repos, is_valid_repo_uri would be skipped and _check_url would be called with absolute paths.
repo_groups: fix select of parent group when adding repo group
h.select was passed a list of repo groups where group_id was integer, but parent_group in the request was a string - thus no match.
Do as in repos controller create_repository (and in error handling): leave it to htmlfill to patch up the generated HTML using defaults ... but make sure we always have a default.
repos: extra HTML escaping of repo and repo group names shown in DataTables
These names will already have been "slugged" and can thus not contain anything that can be used for any attack. But let's be explicitly safe and escape them anyway.
raw_name without escaping would cause XSS *if* it was possible to create unsafe repo names.
just_name must be escaped in order to make search work correctly - for example if searching for '<' ... *if* it was possible for names to contain that.
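As an illustration of the escaping itself, here is a sketch using html.escape from the standard library as a stand-in for the escaping helper Kallithea actually uses. The render_name_cell function and the markup are hypothetical:

```python
import html

def render_name_cell(raw_name):
    # Hypothetical helper: escape the name before embedding it in table markup.
    return '<span class="name">%s</span>' % html.escape(raw_name)

# A name that would be dangerous if it were ever possible to create it.
cell = render_name_cell("<script>alert(1)</script>")

# Escaped search key for a name containing '<'.
just_name = html.escape("a<b")
```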
It is checked earlier that git_command is one of two string constants, and with py3, things are much simpler and we don't have to consider string coercion.
git: fix pull request deletion - don't crash on deleting refs to PR heads
The ref name was passed as a unicode string, and that would cause a failure like:
  File ".../site-packages/dulwich/repo.py", line 720, in __delitem__
    if name.startswith(b"refs/") or name == b"HEAD":
TypeError: startswith first arg must be str or a tuple of str, not bytes
Fixed by correctly passing the ref name as bytes, as we do when creating the PR refs.
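The type mismatch is easy to reproduce without dulwich. The check below has the same shape as the line quoted in the traceback; the ref name is a hypothetical example:

```python
def is_ref(name):
    # Same shape as the quoted dulwich check: the prefixes are bytes.
    return name.startswith(b"refs/") or name == b"HEAD"

ref_name = "refs/pull/1/head"  # hypothetical ref name, as a str

# A str name cannot be compared against a bytes prefix.
try:
    is_ref(ref_name)
    str_failed = False
except TypeError:
    str_failed = True

# Passing the name as bytes works.
bytes_ok = is_ref(ref_name.encode("ascii"))
```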
Git is so different from Mercurial that it is easier to use separate tests. Some of the tests that are relevant for Mercurial don't apply to the Git support.
Also fix a crash in an odd corner case of creating PRs from a repo that has no branches at all.
Commit eca0cb56a822 attempted to fix a type inconsistency, which caused failure in the 'kallithea-api' tool when using '--save-config', but this unfortunately did not fix the problem completely.
The following error still appeared:
Traceback (most recent call last):
  File ".../bin/kallithea-api", line 33, in <module>
    sys.exit(load_entry_point('Kallithea', 'console_scripts', 'kallithea-api')())
  File ".../bin/kallithea_api.py", line 84, in main
    'apihost': args.apihost})
  File ".../bin/base.py", line 104, in __init__
    self.make_config(config)
  File ".../bin/base.py", line 132, in make_config
    ext_json.dump(config, f, indent=4)
  File "/usr/lib/python3.7/json/__init__.py", line 180, in dump
    fp.write(chunk)
TypeError: a bytes-like object is required, not 'str'
File "kallithea/bin/base.py", line 133, in make_config: Function BinaryIO.write was called with the wrong arguments [wrong-arg-types]
  Expected: (self, s: Union[bytearray, bytes, memoryview])
  Actually passed: (self, s: str)
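The underlying mismatch can be shown with the standard json module and in-memory files. This is only a sketch: the config content is hypothetical, json stands in for ext_json, and writing to a text-mode file is an assumed remedy rather than a description of the actual change:

```python
import io
import json

config = {"apihost": "http://localhost:5000"}  # hypothetical config content

# json.dump() produces str chunks, which a binary file object rejects.
binary_file = io.BytesIO()
try:
    json.dump(config, binary_file, indent=4)
    binary_failed = False
except TypeError:
    binary_failed = True

# A text-mode file object accepts the same call.
text_file = io.StringIO()
json.dump(config, text_file, indent=4)
```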
...
E     File ".../kallithea/lib/utils.py", line 256, in is_valid_repo_uri
E       GitRepository._check_url(url)
E     File ".../kallithea/lib/vcs/backends/git/repository.py", line 183, in _check_url
E       passmgr.add_password(*authinfo)
E     File "/usr/lib/python3.7/urllib/request.py", line 848, in add_password
E       self.reduce_uri(u, default_port) for u in uri)
E     File "/usr/lib/python3.7/urllib/request.py", line 848, in <genexpr>
E       self.reduce_uri(u, default_port) for u in uri)
E     File "/usr/lib/python3.7/urllib/request.py", line 875, in reduce_uri
E       host, port = splitport(authority)
E     File "/usr/lib/python3.7/urllib/parse.py", line 1022, in splitport
E       match = _portprog.fullmatch(host)
E   TypeError: cannot use a string pattern on a bytes-like object
The authinfo tuple is obtained via mercurial.util.url, which unfortunately returns a tuple of bytes whereas urllib expects strings. It seems that mercurial internally has some more hacking around urllib as urllibcompat.py, which we don't use.
Therefore, transform the bytes into strings before passing authinfo to urllib. As the realm can be None, we need to check it specifically; otherwise safe_str would return the string 'None'.
A basic test that catches the mentioned problem is added, even though it does not test that cloning with auth info will actually work (it only tests that it fails cleanly if the URI is not reachable).
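A sketch of such a conversion, with assumptions stated: to_str is a hypothetical stand-in for safe_str, and the authinfo tuple is assumed to have the shape (realm, uris, user, password) with made-up values:

```python
import urllib.request

def to_str(value):
    """Hypothetical stand-in for safe_str: decode bytes, stringify the rest."""
    return value.decode("utf-8") if isinstance(value, bytes) else str(value)

# Assumed shape of authinfo, with bytes values and a realm of None.
authinfo = (None, (b"http://example.com/repo", b"example.com"), b"user", b"secret")

realm, uris, user, passwd = authinfo
converted = (
    None if realm is None else to_str(realm),  # keep None, avoid the string 'None'
    tuple(to_str(u) for u in uris),
    to_str(user),
    to_str(passwd),
)

passmgr = urllib.request.HTTPPasswordMgrWithDefaultRealm()
passmgr.add_password(*converted)
found = passmgr.find_user_password(None, "http://example.com/repo")
```

Keeping the realm as None matters because HTTPPasswordMgrWithDefaultRealm treats None as the catch-all realm.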
Additionally, one use of 'test_uri' in hg/repository.py still needed to be transformed from bytes to string. For git this was already ok.
setup: exclude celery 4.4.4 which is broken due to unexpressed dependency
Celery 4.4.4 introduced the use of the 'future' package but forgot to express it in its dependencies.
We could add the missing dependency on 'future' in Kallithea, but since the problem is already fixed upstream shortly after 4.4.4 was released [1], we can be sure that the next release (presumably 4.4.5) will contain the fix.
The kallithea-config sources were removed in commit 213085032127e941a3bd93d0e510828a9d87bf32 but an entry point was still created by setup.py. Moreover, the ini file still referenced this, instead of kallithea-cli (config-create).
tests: actually test something useful in test_edit for gists (Issue #376)
Even though there was a test for editing gists, it did not catch the basic loading problem reported in issue #376. In fact, the test just loaded the edit page, but since no user was actually logged in, it just loaded the login screen. As a result, no real gist editing code was tested at all.
Instead, explicitly check the redirection to a login screen, then proceed with logging in and check that the edit page can be loaded.
Additionally, don't rely on the magic gist id '1' but create an actual gist first.
docs: clarify the installation steps and how things fit together
Hint that create-db user information is for creating the initial local user and that the database connection (which also might use some kind of user) is something else.
feeds: fix failure getting feed for Git repos (Issue #372)
GitChangeset.diff() returned ''.join(self.repository.get_diff(...)) even though get_diff returned a string. It worked, but was unnecessary and inefficient.
That fails in py3: get_diff returns bytes ... and iterating doesn't give characters but integers and we would get: TypeError: sequence item 0: expected a bytes-like object, int found
Fixed by dropping the unnecessary iteration and joining.
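The py3 behaviour can be seen in isolation. In this sketch the diff content is hypothetical, and a bytes join is used because it produces the error message quoted above:

```python
diff = b"diff --git a/f b/f\n"  # hypothetical get_diff() result as bytes

# Iterating over bytes yields integers, not one-byte strings.
first_item = next(iter(diff))

# Joining those integers fails.
try:
    b"".join(diff)
    join_failed = False
except TypeError:
    join_failed = True

# The value is already complete, so it can be returned as it is.
result = diff
```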