kallithea Files · docs/usage/statistics.rst

mads

mysql: create database with explicit UTF-8 character set and collation

A spin-off from Issue #378.

In MySQL, the character sets for server, database, tables, and connection are
set independently. Ideally, they should all use UTF-8, but systems tend to use
latin1 as default encoding, for example:

character_set_server = latin1
collation_server = latin1_swedish_ci

Databases would thus by default be created as:

character_set_database = latin1
collation_database = latin1_swedish_ci

To make things work consistently anyway, we have so far specified the utf8mb4
charset explicitly when creating tables, but there is no corresponding simple
option for specifying the collation for tables. We need a better solution.

If necessary and possible, the system charset and collation should be set to
UTF-8. Some systems already have these defaults - see
https://mariadb.com/kb/en/differences-in-mariadb-in-debian-and-ubuntu/ .
The defaults can be changed as described on
https://mariadb.com/kb/en/setting-character-sets-and-collations/#example-changing-the-default-character-set-to-utf-8
to give something like:

character_set_server = utf8mb4
collation_server = utf8mb4_unicode_ci

Databases will then by default be created as:

character_set_database = utf8mb4
collation_database = utf8mb4_unicode_ci

and there is thus no longer any need for specifying the charset when creating
tables.

To be reasonably resilient across all systems without relying on system
defaults, we will now start specifying the charset and collation when creating
the database, but drop the specification of charset when creating tables.
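
As a minimal sketch of what such a statement could look like, the snippet below builds a ``CREATE DATABASE`` statement with an explicit charset and collation. The helper name ``create_database_sql`` and the database name ``kallithea`` are hypothetical and only serve as illustration; this is not necessarily how Kallithea itself issues the statement.

```python
def create_database_sql(db_name, charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Build a CREATE DATABASE statement with explicit charset and collation.

    Hypothetical helper for illustration only.
    """
    # MySQL quotes identifiers with backticks; an embedded backtick is doubled.
    quoted = "`" + db_name.replace("`", "``") + "`"
    return f"CREATE DATABASE {quoted} CHARACTER SET {charset} COLLATE {collation}"


# "kallithea" is an assumed database name.
print(create_database_sql("kallithea"))
```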

For existing databases, it is recommended to change encoding (and collation) by
altering the database and each of the tables inside it as described on
https://stackoverflow.com/questions/6115612/how-to-convert-an-entire-mysql-database-characterset-and-collation-to-utf-8 .
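
A rough sketch of that conversion, assuming the table names are already known: the hypothetical helper below only generates the ``ALTER DATABASE`` and ``ALTER TABLE ... CONVERT TO CHARACTER SET`` statements and does not execute them. The database and table names are placeholders. Take a backup before running anything like this against a real database.

```python
def conversion_statements(db_name, table_names,
                          charset="utf8mb4", collation="utf8mb4_unicode_ci"):
    """Return SQL statements converting a database and its tables.

    Hypothetical helper for illustration; it only builds strings.
    """
    def quote(name):
        # Backtick-quote an identifier, doubling embedded backticks.
        return "`" + name.replace("`", "``") + "`"

    # First change the database default, then convert each existing table.
    statements = [
        f"ALTER DATABASE {quote(db_name)} CHARACTER SET {charset} COLLATE {collation}"
    ]
    for table in table_names:
        statements.append(
            f"ALTER TABLE {quote(db_name)}.{quote(table)} "
            f"CONVERT TO CHARACTER SET {charset} COLLATE {collation}"
        )
    return statements


# Placeholder database and table names.
for statement in conversion_statements("kallithea", ["users", "repositories"]):
    print(statement + ";")
```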

Note the use of utf8mb4_unicode_ci instead of utf8mb4_general_ci - see
https://stackoverflow.com/questions/766809/whats-the-difference-between-utf8-general-ci-and-utf8-unicode-ci .

For investigation of these issues, consider the output from:

    show variables like '%char%';
    show variables like '%collation%';
    show create database `KALLITHEA_DB_NAME`;
    SELECT * FROM information_schema.SCHEMATA
        WHERE schema_name = "KALLITHEA_DB_NAME";
    SELECT * FROM information_schema.TABLES T,
        information_schema.COLLATION_CHARACTER_SET_APPLICABILITY CCSA
        WHERE CCSA.collation_name = T.table_collation
        AND T.table_schema = "KALLITHEA_DB_NAME";

.. _statistics:

=====================
Repository statistics
=====================

Kallithea has a *repository statistics* feature, disabled by default. When
enabled, the number of commits per committer is visualized in a timeline. This
feature can be enabled using the ``Enable statistics`` checkbox on the
repository ``Settings`` page.

The statistics system makes heavy demands on server resources. To balance
usability and performance, statistics are cached in the database and gathered
incrementally.

When Celery is disabled:

  On the first visit to the summary page, a set of 250 commits is parsed and
  added to the statistics cache. This incremental gathering also happens on each
  visit to the statistics page, until all commits are fetched.

  Statistics are kept cached until additional commits are added to the
  repository. In such a case Kallithea will only fetch the new commits when
  updating its statistics cache.
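
The incremental behavior without Celery can be illustrated with a small sketch. It is not Kallithea's actual implementation: ``StatsCache`` and ``update_stats`` are hypothetical names, the cache lives in memory rather than in the database, and a repository is reduced to a list of commit author names.

```python
from collections import Counter

BATCH_SIZE = 250  # commits parsed per visit, as described above


class StatsCache:
    """Hypothetical in-memory stand-in for the cached statistics."""

    def __init__(self):
        self.parsed = 0                      # commits parsed so far
        self.commits_per_author = Counter()  # author name -> commit count


def update_stats(cache, commit_authors, batch_size=BATCH_SIZE):
    """Parse at most one batch of unseen commits; return True when caught up."""
    batch = commit_authors[cache.parsed:cache.parsed + batch_size]
    cache.commits_per_author.update(batch)
    cache.parsed += len(batch)
    return cache.parsed >= len(commit_authors)


# Assumed example data: 600 commits by two authors.
history = ["alice", "bob"] * 300
cache = StatsCache()
visits = 0
done = False
while not done:
    # Each loop iteration models one page visit.
    done = update_stats(cache, history)
    visits += 1
print(visits, cache.parsed)

# New commits arrive later; only those are parsed on the next visit.
history += ["carol"] * 10
update_stats(cache, history)
print(cache.commits_per_author["carol"])
```

Because the cache remembers how many commits it has already parsed, a later visit only pays for commits added since the previous update.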

When Celery is enabled:

  On the first visit to the summary page, Kallithea creates tasks that run on
  Celery workers. These tasks keep gathering statistics until all commits are
  parsed. Each task parses 250 commits, then launches a new task.
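
The chaining pattern, where each task handles one batch and then schedules its successor, can be sketched without Celery. In the illustration below a ``collections.deque`` stands in for the Celery queue, and ``gather_stats_task`` and ``run_worker`` are hypothetical names; real Celery tasks would be declared and dispatched differently.

```python
from collections import Counter, deque

BATCH_SIZE = 250  # commits parsed per task, as described above


def gather_stats_task(queue, stats, commit_authors, start):
    """Parse one batch starting at `start`, then enqueue a follow-up if needed."""
    batch = commit_authors[start:start + BATCH_SIZE]
    stats.update(batch)
    next_start = start + len(batch)
    if next_start < len(commit_authors):
        # More commits remain: launch the next task in the chain.
        queue.append((gather_stats_task, next_start))


def run_worker(commit_authors):
    """Simulate a single worker draining the queue (stand-in for Celery)."""
    queue = deque()
    stats = Counter()
    queue.append((gather_stats_task, 0))  # created by the first page visit
    tasks_run = 0
    while queue:
        task, start = queue.popleft()
        task(queue, stats, commit_authors, start)
        tasks_run += 1
    return stats, tasks_run


# Assumed example data: 600 commits by two authors.
stats, tasks_run = run_worker(["alice", "bob"] * 300)
print(tasks_run, sum(stats.values()))
```

Keeping each task to a fixed batch size bounds the time any single task occupies a worker, while the chain as a whole still processes every commit.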