Changeset - 5553ecc962e0
[Not reviewed]
default
Mads Kiilerich (mads) - 2019-09-04 22:54:49
mads@kiilerich.com
Grafted from: 4bd3514004ab
scripts/i18n: introduce --merge-pot-file to control normalization

There are actually *two* kinds of normalization:

- in main branches, where we just want the translations, not any trivially
  derived information or temporary or unstructured data.
- in i18n branches, where we want the trivially derived information, and also
  want to preserve any other information there might be in the .po files.

If no pot file is specified, normalize as on the main branches and strip
everything but actual translations. This mode will primarily be used when
grafting or rebasing changes from i18n branches.
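The stripping mode keeps only translations. Below is a simplified Python sketch of that idea, assuming entries are separated by blank lines as in msgmerge output. `strip_po` and the sample catalog are hypothetical illustrations, not the `_normalize_po` function from `i18n_utils`, and unlike that function the sketch does not normalize the header.

```python
def strip_po(raw_content: str) -> str:
    """Keep the header plus entries that carry an actual translation.

    Simplified sketch, assuming entries are separated by blank lines:
    drops all '#' comment lines (source references, flags, translator
    notes), fuzzy entries, and entries whose msgstr lines are all empty.
    """
    kept = []
    for chunk in raw_content.split('\n\n'):
        lines = [line for line in chunk.splitlines()
                 if line and not line.startswith('#')]
        if not lines:
            continue
        # The header entry is 'msgid ""' immediately followed by 'msgstr ""'.
        is_header = lines[:2] == ['msgid ""', 'msgstr ""']
        if not is_header:
            # Fuzzy flags live in '#,' lines such as '#, fuzzy, python-format'.
            if any(line.startswith('#,') and 'fuzzy' in line
                   for line in chunk.splitlines()):
                continue
            msgstr_idx = [i for i, line in enumerate(lines)
                          if line.startswith('msgstr')]
            # Skip entries where every line from the first msgstr on is empty.
            if msgstr_idx and all(line.endswith(' ""')
                                  for line in lines[msgstr_idx[0]:]):
                continue
        kept.append('\n'.join(lines))
    return '\n\n'.join(kept) + '\n'


# Made-up sample catalog for illustration.
sample = '''# Danish translations for Kallithea.
#, fuzzy
msgid ""
msgstr ""
"Content-Type: text/plain; charset=UTF-8\\n"

#: kallithea/a.py:1
msgid "None"
msgstr "Ingen"

#: kallithea/b.py:2
msgid "Untranslated"
msgstr ""

#, fuzzy
msgid "Guess"
msgstr "Gaet"
'''

print(strip_po(sample))
```

Only the header and the "None"/"Ingen" entry survive: the reference comments, the untranslated entry and the fuzzy guess are all dropped.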

When a pot file is specified, run GNU msgmerge with it on the po files. The pot
file should ideally be fully updated (as done by extract_messages). That will
establish a common baseline, leaving only the essential changes to be merged.
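A minimal Python sketch of the msgmerge step follows. The helper names are hypothetical. The flags are standard GNU msgmerge options; the `_normalize_po_file` change in this diff additionally passes `--width=76`.

```python
import shutil
import subprocess


def msgmerge_cmd(po_file: str, pot_file: str) -> list:
    """Build a GNU msgmerge command that updates po_file in place from pot_file."""
    return [
        'msgmerge',
        '--update',       # rewrite po_file in place instead of printing to stdout
        '--backup=none',  # do not leave a po_file~ backup behind
        '--previous',     # keep the previous msgid of entries that become fuzzy
        '--quiet',
        po_file,
        pot_file,
    ]


def merge_with_pot(po_file: str, pot_file: str) -> None:
    """Run msgmerge; raises CalledProcessError if msgmerge fails."""
    if shutil.which('msgmerge') is None:
        raise RuntimeError('GNU gettext msgmerge not found on PATH')
    subprocess.run(msgmerge_cmd(po_file, pot_file), check=True)
```

For example, `merge_with_pot('kallithea/i18n/da/LC_MESSAGES/kallithea.po', 'kallithea/i18n/kallithea.pot')` would refresh one catalog. The .po path is an assumed example.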

If merging from default branches to i18n, it is better to skip .po and .pot
files in the first 'hg merge' pass, while resolving everything else. Then, with
the merge still uncommitted, run 'extract_messages', and merge the .po files
using --merge-pot-file kallithea/i18n/kallithea.pot .
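The merge workflow above could be scripted roughly as follows. This is a sketch under several assumptions. It only builds argv lists for `subprocess.run`. It uses Mercurial's `[merge-patterns]` section with the internal `:fail` tool to leave .po/.pot files unresolved in the first pass, which is one possible way to skip them and not necessarily how the author works. It passes the `[merge-tools]` settings from the `normalized-merge` docstring as `--config` overrides. The helper names and the example .po path are hypothetical.

```python
POT_FILE = 'kallithea/i18n/kallithea.pot'


def hg_config(key: str, value: str) -> list:
    """One-off config override, e.g. --config merge-patterns.**.po=:fail."""
    return ['--config', '%s=%s' % (key, value)]


def first_pass_merge(rev: str) -> list:
    # The internal ':fail' merge tool leaves matching files unresolved
    # instead of merging them, so .po/.pot can be handled later.
    return (['hg', 'merge', rev]
            + hg_config('merge-patterns.**.po', ':fail')
            + hg_config('merge-patterns.**.pot', ':fail'))


def resolve_po_files(po_files: list, script: str = 'scripts/i18n') -> list:
    # Re-merge the .po files through the i18n merge tool, passing the
    # regenerated .pot so msgmerge gives all sides a common baseline.
    args = ('normalized-merge --merge-pot-file %s '
            '$local $base $other $output' % POT_FILE)
    return (['hg', 'resolve', '--tool', 'i18n']
            + hg_config('merge-tools.i18n.executable', script)
            + hg_config('merge-tools.i18n.args', args)
            + list(po_files))
```

Between the two steps, extract_messages regenerates kallithea.pot, which can then be marked resolved with `hg resolve --mark kallithea/i18n/kallithea.pot`.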

(Actually, these two different modes could perhaps be auto-detected ...)
2 files changed with 41 insertions and 19 deletions:
scripts/i18n
 #!/usr/bin/env python3

 # -*- coding: utf-8 -*-
 # This program is free software: you can redistribute it and/or modify
 # it under the terms of the GNU General Public License as published by
 # the Free Software Foundation, either version 3 of the License, or
 # (at your option) any later version.
 #
 # This program is distributed in the hope that it will be useful,
 # but WITHOUT ANY WARRANTY; without even the implied warranty of
 # MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 # GNU General Public License for more details.
 #
 # You should have received a copy of the GNU General Public License
 # along with this program.  If not, see <http://www.gnu.org/licenses/>.

 import os
 import shutil
 import sys

 import click

 import i18n_utils


 """
 Tool for maintenance of .po and .pot files

 Normally, the i18n-related files contain for each translatable string a
 reference to all the source code locations where this string is found. This
 meta data is useful for translators to assess how strings are used, but is not
 relevant for normal development nor for running Kallithea. Such meta data, or
 derived data like kallithea.pot, will inherently be outdated, and create
 unnecessary churn and repository growth, making it harder to spot actual and
 important changes.
 """

 @click.group()
 @click.option('--debug/--no-debug', default=False)
 def cli(debug):
     if (debug):
         i18n_utils.do_debug = True
     pass

 @cli.command()
 @click.argument('po_files', nargs=-1)
-def normalize_po_files(po_files):
+@click.option('--merge-pot-file', default=None)
+@click.option('--strip/--no-strip', default=False)
+def normalize_po_files(po_files, merge_pot_file, strip):
     """Normalize the specified .po and .pot files.

-    Only actual translations and essential headers will be preserved.
+    By default, only actual translations and essential headers will be
+    preserved, just as we want it in the main branches with minimal noise.
+
+    If a .pot file is specified, the po files will instead be updated by
+    running GNU msgmerge with this .pot file, thus updating source code
+    references and preserving comments and outdated translations.
     """
     for po_file in po_files:
-        i18n_utils._normalize_po_file(po_file, strip=True)
+        i18n_utils._normalize_po_file(po_file, merge_pot_file=merge_pot_file, strip=strip)

 @cli.command()
 @click.argument('local')
 @click.argument('base')
 @click.argument('other')
 @click.argument('output')
-def normalized_merge(local, base, other, output):
+@click.option('--merge-pot-file', default=None)
+@click.option('--strip/--no-strip', default=False)
+def normalized_merge(local, base, other, output, merge_pot_file, strip):
     """Merge tool for use with 'hg merge/rebase/graft --tool'

-    Merging i18n files with a standard merge tool could yield merge conflicts
-    when one side is normalized and the other is not. In such cases, it may be
-    better to first normalize all sides, then proceed with a standard merge.
-    This command does exactly that, and can be used as 'merge-tool' in
-    Mercurial commands like merge, rebase and graft.
+    i18n files are partially manually editored original source of content, and
+    partially automatically generated and updated. That create a lot of churn
+    and often cause a lot of merge conflicts.
+
+    To avoid that, this merge tool wrapper will normalize .po content before
+    running the merge tool.
+
+    By default, only actual translations and essential headers will be
+    preserved, just as we want it in the main branches with minimal noise.
+
+    If a .pot file is specified, the po files will instead be updated by
+    running GNU msgmerge with this .pot file, thus updating source code
+    references and preserving comments and outdated translations.

     Add the following to your user or repository-specific .hgrc file to use it:
         [merge-tools]
         i18n.executable = /path/to/scripts/i18n
         i18n.args = normalized-merge $local $base $other $output

     and then invoke merge/rebase/graft with the additional argument '--tool i18n'.
     """
     from mercurial import (
         context,
         simplemerge,
         ui as uimod,
     )

     print('i18n normalized-merge: merging file %s' % output)

-    i18n_utils._normalize_po_file(local, strip=True)
-    i18n_utils._normalize_po_file(base, strip=True)
-    i18n_utils._normalize_po_file(other, strip=True)
-    i18n_utils._normalize_po_file(output, strip=True)
+    i18n_utils._normalize_po_file(local, merge_pot_file=merge_pot_file, strip=strip)
+    i18n_utils._normalize_po_file(base, merge_pot_file=merge_pot_file, strip=strip)
+    i18n_utils._normalize_po_file(other, merge_pot_file=merge_pot_file, strip=strip)
+    i18n_utils._normalize_po_file(output, merge_pot_file=merge_pot_file, strip=strip)

     # simplemerge will write markers to 'local' if it fails, keep a copy without markers
     localkeep = local + '.keep'
     shutil.copyfile(local, localkeep)

     ret = simplemerge.simplemerge(uimod.ui.load(),
          context.arbitraryfilectx(local.encode('utf-8')),
          context.arbitraryfilectx(base.encode('utf-8')),
          context.arbitraryfilectx(other.encode('utf-8'))
     )
     shutil.copyfile(local, output)  # simplemerge wrote to local
     if ret:
         basekeep = base + '.keep'
         otherkeep = other + '.keep'
         shutil.copyfile(base, basekeep)
         shutil.copyfile(other, otherkeep)
         sys.stderr.write("Error: simple merge failed. Run a merge tool manually to resolve conflicts, then use 'hg resolve -m'.\n")
         sys.stderr.write('Resolve with e.g.: kdiff3 %s %s %s -o %s\n' % (basekeep, localkeep, otherkeep, output))
         sys.exit(ret)

     os.remove(localkeep)

 @cli.command()
 @click.argument('file1')
 @click.argument('file2')
-def normalized_diff(file1, file2):
+@click.option('--merge-pot-file', default=None)
+@click.option('--strip/--no-strip', default=False)
+def normalized_diff(file1, file2, merge_pot_file, strip):
     """Compare two files while transparently normalizing them."""
-    sys.exit(i18n_utils._normalized_diff(file1, file2, strip=True))
+    sys.exit(i18n_utils._normalized_diff(file1, file2, merge_pot_file=merge_pot_file, strip=strip))

 if __name__ == '__main__':
     cli()
scripts/i18n_utils.py
@@ -114,72 +114,75 @@ def _normalize_po(raw_content):
     msgstr "Ingen"
     <BLANKLINE>
     line 2
     <BLANKLINE>
     msgid "Specialist"
     msgstr ""
     "Expert"
     <BLANKLINE>
     msgid "%d minute"
     msgid_plural "%d minutes"
     msgstr[0] "minut"
     msgstr[1] "minutter"
     msgstr[2] ""
     ^^^
     """
     header_start = raw_content.find('\nmsgid ""\n') + 1
     header_end = raw_content.find('\n\n', header_start) + 1 or len(raw_content)
     chunks = [
         header_comment_strip_re.sub('', raw_content[0:header_start])
             .strip(),
         '',
         header_normalize_re.sub('', raw_content[header_start:header_end])
             .strip(),
         '']  # preserve normalized header
     # all chunks are separated by empty line
     for raw_chunk in raw_content[header_end:].split('\n\n'):
         if '\n#, fuzzy' in raw_chunk:  # might be like "#, fuzzy, python-format"
             continue  # drop crazy auto translation that is worse than useless
         # strip all comment lines from chunk
         chunk_lines = [
             line
             for line in raw_chunk.splitlines()
             if line
             and not line.startswith('#')
         ]
         if not chunk_lines:
             continue
         # check lines starting from first msgstr, skip chunk if no translation lines
         msgstr_i = [i for i, line in enumerate(chunk_lines) if line.startswith('msgstr')]
         if (
             chunk_lines[0].startswith('msgid') and
             msgstr_i and
             all(line.endswith(' ""') for line in chunk_lines[msgstr_i[0]:])
         ):  # skip translation chunks that doesn't have any actual translations
             continue
         chunks.append('\n'.join(chunk_lines) + '\n')
     return '\n'.join(chunks)

-def _normalize_po_file(po_file, strip=False):
+def _normalize_po_file(po_file, merge_pot_file=None, strip=False):
+    if merge_pot_file:
+        runcmd(['msgmerge', '--width=76', '--backup=none', '--previous',
+                '--update', po_file, '-q', merge_pot_file])
     if strip:
         po_tmp = po_file + '.tmp'
         with open(po_file, 'r') as src, open(po_tmp, 'w') as dest:
             raw_content = src.read()
             normalized_content = _normalize_po(raw_content)
             dest.write(normalized_content)
         os.rename(po_tmp, po_file)

-def _normalized_diff(file1, file2, strip=False):
+def _normalized_diff(file1, file2, merge_pot_file=None, strip=False):
     # Create temporary copies of both files
     temp1 = tempfile.NamedTemporaryFile(prefix=os.path.basename(file1))
     temp2 = tempfile.NamedTemporaryFile(prefix=os.path.basename(file2))
     debug('normalized_diff: %s -> %s / %s -> %s' % (file1, temp1.name, file2, temp2.name))
     shutil.copyfile(file1, temp1.name)
     shutil.copyfile(file2, temp2.name)
     # Normalize them in place
-    _normalize_po_file(temp1.name, strip=strip)
-    _normalize_po_file(temp2.name, strip=strip)
+    _normalize_po_file(temp1.name, merge_pot_file=merge_pot_file, strip=strip)
+    _normalize_po_file(temp2.name, merge_pot_file=merge_pot_file, strip=strip)
     # Now compare
     try:
         runcmd(['diff', '-u', temp1.name, temp2.name])
     except subprocess.CalledProcessError as e:
         return e.returncode