Commit 55ddbfc6 authored by Khaled Hosny, committed by Caolán McNamara

tdf#106755: Fix script type for combining marks

We classify characters in the “Combining Diacritical Marks” Unicode
block as ScriptType::LATIN, but these are combining marks that can
combine with any script, so they should be ScriptType::WEAK. Removing
the block from the Latin range in scriptList is enough, because we then
fall back to getting the script classification from the Unicode script
property.
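
To illustrate the reasoning above, here is a minimal, self-contained sketch
of a block-range table with a fallback. Everything in it is a hypothetical
stand-in, not the LibreOffice code: the Block enum only mimics the ordering
of the ICU block codes, blockOf() and scriptFromProperty() are toy
replacements for the real Unicode lookups, and a first-match linear scan of
the table is assumed.

```cpp
#include <cstddef>

// Hypothetical stand-ins for ICU's UBlockCode and the ScriptType constants.
// Only the relative order of the blocks matters for this sketch.
enum Block : int {
    NO_BLOCK = 0, BASIC_LATIN, LATIN_1_SUPPLEMENT, LATIN_EXTENDED_A,
    LATIN_EXTENDED_B, IPA_EXTENSIONS, SPACING_MODIFIER_LETTERS,
    COMBINING_DIACRITICAL_MARKS, GREEK, CYRILLIC, ARMENIAN, OTHER_BLOCK
};
enum class Script { Latin, Asian, Complex, Weak };

struct BlockRange { Block from; Block to; Script script; };

// Before the change: one range spans Basic Latin through Armenian,
// which silently includes Combining Diacritical Marks.
constexpr BlockRange kOldTable[] = {
    {NO_BLOCK, NO_BLOCK, Script::Weak},
    {BASIC_LATIN, ARMENIAN, Script::Latin},
};

// After the change: the range is split so the combining marks fall in the gap.
constexpr BlockRange kNewTable[] = {
    {NO_BLOCK, NO_BLOCK, Script::Weak},
    {BASIC_LATIN, SPACING_MODIFIER_LETTERS, Script::Latin},
    {GREEK, ARMENIAN, Script::Latin},
};

// Toy block lookup covering only the code points this sketch needs.
constexpr Block blockOf(char32_t c) {
    if (c <= 0x007F) return BASIC_LATIN;
    if (c <= 0x00FF) return LATIN_1_SUPPLEMENT;
    if (c <= 0x017F) return LATIN_EXTENDED_A;
    if (c <= 0x024F) return LATIN_EXTENDED_B;
    if (c <= 0x02AF) return IPA_EXTENSIONS;
    if (c <= 0x02FF) return SPACING_MODIFIER_LETTERS;
    if (c <= 0x036F) return COMBINING_DIACRITICAL_MARKS;
    if (c <= 0x03FF) return GREEK;
    if (c <= 0x04FF) return CYRILLIC;
    if (c >= 0x0530 && c <= 0x058F) return ARMENIAN;
    return OTHER_BLOCK;
}

// Toy stand-in for the script-property fallback. Simplification: the whole
// U+0300..U+036F block is treated as weak; the Latin default for everything
// else is a placeholder, since the real fallback is not part of this diff.
constexpr Script scriptFromProperty(char32_t c) {
    return (c >= 0x0300 && c <= 0x036F) ? Script::Weak : Script::Latin;
}

// First matching range wins; otherwise fall back to the property lookup.
template <std::size_t N>
constexpr Script classify(const BlockRange (&table)[N], char32_t c) {
    const Block b = blockOf(c);
    for (const BlockRange& r : table)
        if (b >= r.from && b <= r.to) return r.script;
    return scriptFromProperty(c);
}

// U+0301 COMBINING ACUTE ACCENT: Latin with the old table, weak with the new.
static_assert(classify(kOldTable, 0x0301) == Script::Latin, "old: LATIN");
static_assert(classify(kNewTable, 0x0301) == Script::Weak, "new: WEAK");
// Letters on both sides of the gap keep their classification.
static_assert(classify(kNewTable, 0x0061) == Script::Latin, "a stays LATIN");
static_assert(classify(kNewTable, 0x03B1) == Script::Latin, "alpha stays LATIN");
```

The point of the sketch is that no explicit WEAK entry is needed: once no
range matches the block, the fallback decides the classification.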

Change-Id: I3577f4b03360a1c8e094a207f01b6bbb6abbaf30
Reviewed-on: https://gerrit.libreoffice.org/35811
Tested-by: Jenkins <ci@libreoffice.org>
Reviewed-by: Caolán McNamara <caolanm@redhat.com>
Tested-by: Caolán McNamara <caolanm@redhat.com>
parent 79982456
@@ -759,6 +759,10 @@ void TestBreakIterator::testWeak()
 {
     0x0001, 0x0002,
     0x0020, 0x00A0,
+    0x0300, 0x036F, //Combining Diacritical Marks
+    0x1AB0, 0x1AFF, //Combining Diacritical Marks Extended
+    0x1DC0, 0x1DFF, //Combining Diacritical Marks Supplement
+    0x20D0, 0x20FF, //Combining Diacritical Marks for Symbols
     0x2150, 0x215F, //Number Forms, fractions
     0x2160, 0x2180, //Number Forms, roman numerals
     0x2200, 0x22FF, //Mathematical Operators
@@ -442,7 +442,8 @@ struct UBlock2Script
 static const UBlock2Script scriptList[] =
 {
     {UBLOCK_NO_BLOCK, UBLOCK_NO_BLOCK, ScriptType::WEAK},
-    {UBLOCK_BASIC_LATIN, UBLOCK_ARMENIAN, ScriptType::LATIN},
+    {UBLOCK_BASIC_LATIN, UBLOCK_SPACING_MODIFIER_LETTERS, ScriptType::LATIN},
+    {UBLOCK_GREEK, UBLOCK_ARMENIAN, ScriptType::LATIN},
     {UBLOCK_HEBREW, UBLOCK_MYANMAR, ScriptType::COMPLEX},
     {UBLOCK_GEORGIAN, UBLOCK_GEORGIAN, ScriptType::LATIN},
     {UBLOCK_HANGUL_JAMO, UBLOCK_HANGUL_JAMO, ScriptType::ASIAN},