Commit 52dc3d1

authored
Merge branch 'unicode-org:main' into main
2 parents 873db46 + 500bcd2 commit 52dc3d1

37 files changed (+1487 / -31 lines)

documents/how_to_add_new_language.md

Lines changed: 5 additions & 2 deletions
@@ -10,6 +10,9 @@ The following steps with help you identify files that need to be added or change
 NOTE: Take a look at [PR #40](https://github.com/unicode-org/inflection/pull/40) and [PR #111](https://github.com/unicode-org/inflection/pull/111) for example on how to add initial language support based on dictionary lookup only.
 In general, to bootstrap your progress look for grammatically similar language that's already supported, e.g. if you are adding Serbian look for existing Russian implementation.
 This will help you find most of the files you need to add/change and will speed up implementation of the rules and lexicons.
+We recommend you spend around a week researching the language and all the different components of the language before even beginning to modify and add the files below. Look at all the files in the project such as tokenizers, configuration files, grammar files, and different lookup functions to see what you need. This will save you a lot of time in the end. We highly suggest you stray away from hardcoded logic and rely on the Dictionary Lookup. Look at all the grammemes, tokenizer logic, and multi-word phrase handling.
+
+Before you add new language support, go to the README.md in the inflection subfolder (inflection/inflection/README.md), build the project, and make sure all the tests run on your computer.
 
 ## Mark your language as supported
 * UPDATE: inflection/src/inflection/util/LocaleUtils.hpp
@@ -29,13 +32,13 @@ TODO: We need to expand what each of these do.
 * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.hpp
 * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer.cpp
 * ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
-* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.hpp
+* ADD: inflection/src/inflection/grammar/synthesis/*Xx*GrammarSynthesizer_*Xx*DisplayFunction.cpp
 * UPDATE: inflection/src/inflection/grammar/synthesis/GrammarSynthesizerFactory.cpp
 * UPDATE: inflection/src/inflection/grammar/synthesis/fwd.hpp
 
 ## Add language specific properties for lists, quantities and related topics
 * ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.hpp
-* ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.hpp
+* ADD: inflection/src/inflection/dialog/language/*Xx*CommonConceptFactory.cpp
 * UPDATE: inflection/src/inflection/dialog/language/fwd.hpp
 
 ## Define and create lexion

inflection/resources/org/unicode/inflection/dictionary/.gitattributes

Lines changed: 2 additions & 0 deletions
@@ -8,6 +8,7 @@ dictionary_he.lst filter=lfs diff=lfs merge=lfs -text
 dictionary_hi.lst filter=lfs diff=lfs merge=lfs -text
 dictionary_it.lst filter=lfs diff=lfs merge=lfs -text
 dictionary_ko.lst filter=lfs diff=lfs merge=lfs -text
+dictionary_ml.lst filter=lfs diff=lfs merge=lfs -text
 dictionary_nb.lst filter=lfs diff=lfs merge=lfs -text
 dictionary_nl.lst filter=lfs diff=lfs merge=lfs -text
 dictionary_pt.lst filter=lfs diff=lfs merge=lfs -text
@@ -23,6 +24,7 @@ inflectional_fr.xml filter=lfs diff=lfs merge=lfs -text
 inflectional_he.xml filter=lfs diff=lfs merge=lfs -text
 inflectional_hi.xml filter=lfs diff=lfs merge=lfs -text
 inflectional_it.xml filter=lfs diff=lfs merge=lfs -text
+inflectional_ml.xml filter=lfs diff=lfs merge=lfs -text
 inflectional_nb.xml filter=lfs diff=lfs merge=lfs -text
 inflectional_nl.xml filter=lfs diff=lfs merge=lfs -text
 inflectional_pt.xml filter=lfs diff=lfs merge=lfs -text
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:6bda9371a2aa17c08328381e678b77e769269f4ee74749dd4f9e0bd5890cf59c
+size 53958746
Lines changed: 3 additions & 0 deletions
@@ -0,0 +1,3 @@
+version https://git-lfs.github.com/spec/v1
+oid sha256:1868dab352ff2648c2ba495bc08a3877409eadf177f573817fd03ae07174b12f
+size 613479

inflection/resources/org/unicode/inflection/features/grammar.xml

Lines changed: 91 additions & 0 deletions
@@ -1624,6 +1624,97 @@
 </category>
 </grammar>
 </language>
+<language id="ml">
+<grammar>
+<category name="case">
+<grammeme name="nominative"/> <!-- no explicit marker; subject form -->
+<grammeme name="accusative"/> <!-- -യെ, -ായെ, marks direct object -->
+<grammeme name="genitive"/> <!-- -ന്റെ, -യുടെ (possessive) -->
+<grammeme name="dative"/> <!-- -ക്ക്, -ന് (to/for) -->
+<grammeme name="instrumental"/> <!-- -ആല് (by means of) -->
+<grammeme name="locative"/> <!-- -യില് (in/at) -->
+<grammeme name="sociative"/> <!-- -ഓടു് (along with) -->
+</category>
+<category name="number">
+<grammeme name="singular"/>
+<grammeme name="plural"/>
+</category>
+<category name="animacy">
+<grammeme name="animate"/>
+<grammeme name="inanimate"/>
+<grammeme name="human"/>
+</category>
+<category name="person">
+<restrictions>
+<restriction name="pos" value="pronoun"/>
+<restriction name="pos" value="verb"/>
+</restrictions>
+<grammeme name="first"/>
+<grammeme name="second"/>
+<grammeme name="third"/>
+</category>
+<category name="gender">
+<restrictions>
+<restriction name="pos" value="pronoun"/>
+<restriction name="pos" value="verb"/>
+<restriction name="pos" value="noun"/>
+</restrictions>
+<grammeme name="masculine"/>
+<grammeme name="feminine"/>
+<grammeme name="neuter"/> <!-- e.g. for objects or animals -->
+</category>
+<category name="tense">
+<restrictions>
+<restriction name="pos" value="verb"/>
+</restrictions>
+<grammeme name="past"/>
+<grammeme name="present"/>
+<grammeme name="future"/>
+</category>
+<category name="form">
+<grammeme name="infinitive"/>
+<grammeme name="participle"/>
+</category>
+<category name="determination">
+<restrictions>
+<restriction name="pos" value="pronoun"/>
+<restriction name="case" value="genitive"/>
+</restrictions>
+<grammeme name="independent"/> <!-- e.g. mine -->
+<grammeme name="dependent"/> <!-- e.g. my {object} -->
+</category>
+<category name="mood">
+<restrictions>
+<restriction name="pos" value="verb"/>
+</restrictions>
+<grammeme name="indicative"/>
+<grammeme name="imperative"/>
+<grammeme name="subjunctive"/>
+</category>
+<category name="pronounType">
+<restrictions>
+<restriction name="pos" value="pronoun"/>
+</restrictions>
+<grammeme name="personal"/> <!-- regular pronouns like ഞാൻ, നീ -->
+<grammeme name="reflexive"/> <!-- e.g. താൻ, തങ്ങൾ -->
+</category>
+<category name="formality">
+<restrictions>
+<restriction name="pos" value="verb"/>
+<restriction name="pos" value="pronoun"/>
+</restrictions>
+<grammeme name="formal"/>
+<grammeme name="informal"/>
+</category>
+<category name="clusivity">
+<restrictions>
+<restriction name="pos" value="pronoun"/>
+</restrictions>
+<grammeme name="inclusive"/>
+<grammeme name="exclusive"/>
+</category>
+</grammar>
+</language>
 <language id="ms">
 <grammar>
 <category name="clusivity">
Lines changed: 75 additions & 0 deletions
@@ -0,0 +1,75 @@
+അവൻ,third,singular,nominative,masculine
+അവൾ,third,singular,nominative,feminine
+അത്,third,singular,nominative,neuter
+അവനെ,third,singular,accusative,masculine
+അവന്റെ,third,singular,genitive,masculine,determination=dependent
+അവന്റെത്,third,singular,genitive,masculine,determination=independent
+അവളെ,third,singular,accusative,feminine
+അവളുടെ,third,singular,genitive,feminine,determination=dependent
+അവളുടേതു്,third,singular,genitive,feminine,determination=independent
+അതിനെ,third,singular,accusative,neuter
+അതിന്റെ,third,singular,genitive,neuter,determination=dependent
+അതിന്റേതു്,third,singular,genitive,neuter,determination=independent
+അവനിൽ,third,singular,locative,masculine
+അവനാൽ,third,singular,instrumental,masculine
+അവനോടു്,third,singular,sociative,masculine
+അവളിൽ,third,singular,locative,feminine
+അവളാൽ,third,singular,instrumental,feminine
+അവളോടു്,third,singular,sociative,feminine
+അതിൽ,third,singular,locative,neuter
+അതാൽ,third,singular,instrumental,neuter
+അതോടു്,third,singular,sociative,neuter
+അവർ,third,plural,nominative
+അവരെ,third,plural,accusative
+അവരുടെ,third,plural,genitive,determination=dependent
+അവരുടേതു്,third,plural,genitive,determination=independent
+അവരിൽ,third,plural,locative
+അവരാൽ,third,plural,instrumental
+അവരോടു്,third,plural,sociative
+നീ,second,singular,nominative,informal
+താങ്കൾ,second,singular,nominative,formal
+നിനക്ക്,second,singular,dative,informal
+താങ്കൾക്ക്,second,singular,dative,formal
+നിനെ,second,singular,accusative,informal
+താങ്കളെ,second,singular,accusative,formal
+നിന്റെ,second,singular,genitive,informal,determination=dependent
+നിന്റേതു്,second,singular,genitive,informal,determination=independent
+താങ്കളുടെ,second,singular,genitive,formal,determination=dependent
+താങ്കളുടേതു്,second,singular,genitive,formal,determination=independent
+നിനിൽ,second,singular,locative,informal
+നിനാൽ,second,singular,instrumental,informal
+നിനോടു്,second,singular,sociative,informal
+താങ്കളിൽ,second,singular,locative,formal
+താങ്കളാൽ,second,singular,instrumental,formal
+താങ്കളോടു്,second,singular,sociative,formal
+നിങ്ങൾ,second,plural,nominative,formal
+നിങ്ങളെ,second,plural,accusative,formal
+നിങ്ങൾക്ക്,second,plural,dative,formal
+നിങ്ങളുടെ,second,plural,genitive,formal,determination=dependent
+നിങ്ങളുടേതു്,second,plural,genitive,formal,determination=independent
+നിങ്ങളിൽ,second,plural,locative,formal
+നിങ്ങളാൽ,second,plural,instrumental,formal
+നിങ്ങളോടു്,second,plural,sociative,formal
+ഞാൻ,first,singular,nominative,exclusive
+എനിക്ക്,first,singular,dative
+നമുക്ക്,first,plural,dative,inclusive
+എന്നെ,first,singular,accusative,exclusive
+നമ്മെ,first,plural,accusative,inclusive
+എന്റെ,first,singular,genitive,determination=dependent,exclusive
+എന്റേത്,first,singular,genitive,determination=independent,exclusive
+എന്നിൽ,first,singular,locative
+എന്നാൽ,first,singular,instrumental
+എന്നോടു്,first,singular,sociative
+ഞങ്ങൾ,first,plural,nominative,exclusive
+നാം,first,plural,nominative,inclusive
+ഞങ്ങളെ,first,plural,accusative,exclusive
+ഞങ്ങൾക്ക്,first,plural,dative,exclusive
+ഞങ്ങളുടെ,first,plural,genitive,exclusive,determination=dependent
+ഞങ്ങളുടേത്,first,plural,genitive,exclusive,determination=independent
+നമ്മുടെ,first,plural,genitive,inclusive,determination=dependent
+നമ്മുടേതു്,first,plural,genitive,inclusive,determination=independent
+ഞങ്ങളിലു്,first,plural,locative,exclusive
+ഞങ്ങളാൽ,first,plural,instrumental,exclusive
+ഞങ്ങളോടു്,first,plural,sociative,exclusive
+താൻ,third,singular,nominative,reflexive
+തങ്ങൾ,third,plural,nominative,formal,reflexive

inflection/resources/org/unicode/inflection/locale/supported-locales.properties

Lines changed: 1 addition & 0 deletions
@@ -15,6 +15,7 @@ locale.group.it=it_IT,it_CH
 locale.group.ja=ja_JP
 locale.group.ko=ko_KR
 locale.group.ms=ms_MY
+locale.group.ml=ml_IN
 locale.group.nb=nb_NO
 locale.group.nl=nl_NL,nl_BE
 locale.group.pt=pt_BR,pt_PT
Lines changed: 7 additions & 0 deletions
@@ -0,0 +1,7 @@
+#
+# Copyright 2025 Unicode Incorporated and others. All rights reserved.
+#
+tokenizer.implementation.class=DefaultTokenizer
+tokenizer.nonDecompound.file=/org/unicode/inflection/tokenizer/ml/nondecompound.tok
+tokenizer.decompound=(ശ്രീ)(.+?)(ഗുരു|സര്‍ക്കാര്‍)|(.+?)(ഗുരു|സര്‍ക്കാര്‍|ഉണ്ട്|ആണ്|ഇല്ല|ഒടൊപ്പം|ഉടൻ|ഓടെ|ഓട്|ഒപ്പം|തന്നെ|പോലും|പോലെ|ഉം|യ്|കളുടെ|ങ്ങളുടെ|ത്തിന്റെ|ൻ്റെ|ന്റെ|യുടേ|യുടെ|യാൽ|യിൽ|ഇൽ|ല്|ൽ|ക്ക്|മാർ|ങ്ങൾ|കൾ|നെ|യെ)
+
Lines changed: 35 additions & 0 deletions
@@ -0,0 +1,35 @@
+അമ്മ
+അച്ഛൻ
+അച്ഛി
+അമ്മൻ
+മകൻ
+മകൾ
+കുട്ടി
+കുട്ടികൾ
+ആൺകുട്ടി
+ആൺകുട്ടികൾ
+പെൺകുട്ടി
+പെൺകുട്ടികൾ
+കഥ
+ചിത്രം
+ചിത്രങ്ങൾ
+ഗ്രന്ഥം
+ഗ്രന്ഥങ്ങൾ
+മക്കൾ
+ഞാൻ
+നീ
+നിങ്ങൾ
+അവൻ
+അവൾ
+അവ
+അവർ
+ഇത്
+അത്
+ഇവ
+അവ
+ശ്രീ
+നാരായണ
+ഗുരു
+കേരളം
+സര്‍ക്കാര്‍
+കേരളസര്‍ക്കാര്‍

inflection/src/inflection/dialog/PronounConcept.cpp

Lines changed: 1 addition & 1 deletion
@@ -228,7 +228,7 @@ PronounConcept::PronounConcept(const SemanticFeatureModel& model, std::u16string
 for (int32_t idx = 0; idx < pronounData->numValues(); idx++) {
 const auto& pronounEntry = pronounData->getPronounEntry(idx);
 std::u16string_view displayString(pronounEntry.first);
-if (displayString.back() == u' ') {
+if (!displayString.empty() && displayString.back() == u' ') {
 displayString.remove_suffix(1);
 }
 auto status = U_ZERO_ERROR;
