Get Gender From Noun Using NLTK With German Corpora
Solution 1:
I don't believe NLTK can do that out of the box for German. However, there are freely available morphological taggers for German which can do that for you, for example RFTagger:
http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/
It gives output like this:
Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Indef.Nom.Sg.Masc
Testsatz N.Reg.Nom.Sg.Masc
. SYM.Pun.Sent
However, it is not written in Python, so you would have to call it using subprocess. Another option would be to obtain a corpus with nouns tagged for German gender, such as the Tiger corpus:
http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html
and train an NLTK tagger to recognize the genders, but I would expect RFTagger to be the quicker and more accurate solution.
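Calling RFTagger through subprocess and reading the gender off its tags could look roughly like the sketch below. The binary and parameter-file paths (bin/rft-annotate, lib/german.par), the one-token-per-line input file, and the helper names parse_rftagger_output and tag_file are assumptions for illustration; check them against your RFTagger installation. The parser only relies on the "word TAG" format shown above, where a noun tag starts with N and contains Masc, Fem or Neut.

```python
import subprocess

# Assumed locations inside an unpacked RFTagger directory; adjust as needed.
RFT_BINARY = "bin/rft-annotate"
RFT_PARAMS = "lib/german.par"

# Gender markers as they appear in the tagger output shown above.
GENDERS = {"Masc": "masculine", "Fem": "feminine", "Neut": "neuter"}


def parse_rftagger_output(text):
    """Return (word, gender) pairs for every noun in 'word TAG' output."""
    nouns = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, tag = parts
        fields = tag.split(".")
        if fields[0] != "N":
            continue  # keep nouns only
        # First field that names a gender, or None if the tag has none.
        gender = next((GENDERS[f] for f in fields if f in GENDERS), None)
        nouns.append((word, gender))
    return nouns


def tag_file(path):
    """Run RFTagger on a tokenized file (assumed: one token per line)
    and return whatever it prints to standard output."""
    result = subprocess.run(
        [RFT_BINARY, RFT_PARAMS, path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


# The sample output from this answer, used here instead of a real tagger run.
sample = """Das PRO.Dem.Subst.-3.Nom.Sg.Neut
ist VFIN.Sein.3.Sg.Pres.Ind
ein ART.Indef.Nom.Sg.Masc
Testsatz N.Reg.Nom.Sg.Masc
. SYM.Pun.Sent"""

print(parse_rftagger_output(sample))  # [('Testsatz', 'masculine')]
```

With the tagger installed, parse_rftagger_output(tag_file("tokens.txt")) would give the same kind of list for your own text, where tokens.txt is a hypothetical input file.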
Solution 2:
Pattern purports to predict German noun gender with ~75% accuracy:
>>> from pattern.de import gender, MALE, FEMALE, NEUTRAL
>>> print gender('Katze')
FEMALE
Unfortunately, at the time of writing it was only available for Python 2.x.
Solution 3:
I just found this project (https://github.com/aakhundov/deep-german), which sounds promising for this question. They predict at the character level, which probably makes sense in a language like German. Although gender is not as easily detectable as in languages like Spanish, there are some regularities.
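To give a feel for the kind of regularities meant here, the following is a hand-written suffix lookup. It is not how the linked project works (that one learns from characters with a neural model); it only encodes a few well-known German suffix tendencies, each of which has exceptions. The table and the function name guess_gender are made up for illustration.

```python
# A few common suffix tendencies; illustrative and far from exhaustive.
SUFFIX_GENDER = [
    ("chen", "neuter"),       # Mädchen
    ("lein", "neuter"),       # Fräulein
    ("ung", "feminine"),      # Zeitung
    ("heit", "feminine"),     # Freiheit
    ("keit", "feminine"),     # Möglichkeit
    ("schaft", "feminine"),   # Freundschaft
    ("ling", "masculine"),    # Lehrling
]


def guess_gender(noun):
    """Guess grammatical gender from the noun's ending, or return None."""
    lowered = noun.lower()
    # Check longer suffixes first so the most specific rule wins.
    for suffix, gender in sorted(SUFFIX_GENDER, key=lambda pair: -len(pair[0])):
        if lowered.endswith(suffix):
            return gender
    return None  # no rule applies; a trained model or lexicon is needed


for word in ["Mädchen", "Zeitung", "Lehrling", "Katze"]:
    print(word, guess_gender(word))
```

Words like Katze fall through all the rules, which is exactly where a learned character-level model or a tagged lexicon has to take over.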
Another approach that would work is relational parsing: find the pronouns that refer to the noun you want to classify and then check whether they are feminine, masculine, or neuter. Maybe have a look at spaCy for that, too.