Skip to content

Conversation

@nciric
Copy link
Contributor

@nciric nciric commented Aug 12, 2025

Fully implements group 3 (all nouns ending with -a).

This removes a need for a large number of nouns to be added to Wikidata.

Resolves part of #172 .

@nciric nciric requested a review from grhoten August 12, 2025 03:35
Comment on lines +224 to +230
static constexpr auto suffix_sg = ::std::to_array<::std::u16string_view>({u"а", u"е", u"и", u"у", u"а", u"ом", u"и"});
static constexpr auto suffix_pl = ::std::to_array<::std::u16string_view>({u"е", u"а", u"ама", u"е", u"е", u"ама", u"ама"});

::std::u16string base = lemma;
// Remove trailing a and apply suffix.
base.pop_back();
base = applySuffix(base, suffix_sg, suffix_pl, number, targetCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this kind of mapping, you may be inspired by Arabic, German or Italian. They convert a string to a numeric key (makeLookupKey) containing multiple grammemes, and they map the key to a string. This mapping is initialized in the constructor instead of at runtime.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the concern the runtime size increase (static constexpr)? If yes, I can remove the static (creating these arrays is cheap).
Otherwise the current approach looks simpler. I will look into refactoring this code as I add more cases, potentially implementing Arabic like approach.

WDYT?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to make it more scalable, but this is fine too.

Copy link
Member

@grhoten grhoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look fine. Optional comments to consider where also provided.

<test><source case="genitive" number="plural" gender="feminine" pos="noun">конзерва</source><result>конзерви</result></test>
<test><source case="genitive" number="plural" gender="feminine" pos="noun">гошћа</source><result>гошћа</result></test>
<test><source case="genitive" number="plural" gender="feminine" pos="noun">двојка</source><result>двојака</result></test>
<test><source case="genitive" number="plural" gender="feminine" pos="noun">битка</source><result>битака</result></test>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You have a lot of fully fleshed out constraints. Most of the other languages only change specific grammemes. Sometimes you only specify the case, number or gender. The other tests usually specify less. The other languages usually default to noun. These tests are currently fine, but common usage starts from any surface form (ideally a unique surface form), and then you modify just the relevant grammemes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can probably remove noun info, but I do need a case, gender and number for rule based approach to work.
I also assume nominative input (lemma) - otherwise the rules would be more complex, or would need dictionary support to implement them.

Comment on lines +224 to +230
static constexpr auto suffix_sg = ::std::to_array<::std::u16string_view>({u"а", u"е", u"и", u"у", u"а", u"ом", u"и"});
static constexpr auto suffix_pl = ::std::to_array<::std::u16string_view>({u"е", u"а", u"ама", u"е", u"е", u"ама", u"ама"});

::std::u16string base = lemma;
// Remove trailing a and apply suffix.
base.pop_back();
base = applySuffix(base, suffix_sg, suffix_pl, number, targetCase);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was to make it more scalable, but this is fine too.

@nciric nciric merged commit 60043f1 into main Aug 19, 2025
7 checks passed
@nciric nciric deleted the cira-sr branch August 19, 2025 20:06
@nciric nciric restored the cira-sr branch August 19, 2025 20:13
@nciric nciric deleted the cira-sr branch August 19, 2025 20:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants