
Get Gender From Noun Using NLTK With German Corpora

I'm experimenting with NLTK. My question is whether the library can detect the gender of a noun in German. I want this information in order to determine if a text is written

Solution 1:

I don't believe NLTK can do that out of the box for German. However, there are freely available morphological taggers for German which can do that for you, for example RFTagger:

http://www.cis.uni-muenchen.de/~schmid/tools/RFTagger/

It gives output like this:

Das     PRO.Dem.Subst.-3.Nom.Sg.Neut 
ist     VFIN.Sein.3.Sg.Pres.Ind 
ein     ART.Indef.Nom.Sg.Masc 
Testsatz    N.Reg.Nom.Sg.Masc 
.   SYM.Pun.Sent 

However, it is not written in Python, so you would have to call it using subprocess. Another option would be to obtain a corpus with nouns tagged for German gender, such as the Tiger corpus:

http://www.ims.uni-stuttgart.de/forschung/ressourcen/korpora/tiger.en.html

and train NLTK to recognize the genders, but I would expect RFTagger to be a quicker and more accurate solution.
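If you go the RFTagger route, the gender can be read straight out of the tag string. Here is a minimal sketch that parses output in the format shown above and keeps only the nouns. The helper names (`parse_rftagger_output`, `tag_file`) are my own, and `tag_file` assumes you pass in whatever command line your RFTagger installation documents; I have not hard-coded one because I don't want to guess at the exact binary and parameter file paths.

```python
import subprocess

# Gender markers as they appear in the tags shown above.
GENDERS = {"Masc": "masculine", "Fem": "feminine", "Neut": "neuter"}


def parse_rftagger_output(output):
    """Return (word, gender) pairs for every noun in RFTagger-style output."""
    nouns = []
    for line in output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        word, tag = parts
        fields = tag.split(".")
        if fields[0] != "N":
            continue  # not a noun tag
        gender = next((GENDERS[f] for f in fields if f in GENDERS), None)
        nouns.append((word, gender))
    return nouns


def tag_file(command, path):
    """Run the tagger on a file and parse its output.

    `command` is a list such as [tagger_binary, parameter_file]; the
    actual values depend on your installation (assumption, not checked).
    """
    result = subprocess.run(
        command + [path], capture_output=True, text=True, check=True
    )
    return parse_rftagger_output(result.stdout)


sample = (
    "Das\tPRO.Dem.Subst.-3.Nom.Sg.Neut\n"
    "ist\tVFIN.Sein.3.Sg.Pres.Ind\n"
    "ein\tART.Indef.Nom.Sg.Masc\n"
    "Testsatz\tN.Reg.Nom.Sg.Masc\n"
    ".\tSYM.Pun.Sent\n"
)
print(parse_rftagger_output(sample))  # [('Testsatz', 'masculine')]
```

The parser only looks at tags whose first field is `N`, so articles and pronouns that also carry a gender are ignored.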

Solution 2:

Pattern purports to predict German noun gender with ~75% accuracy:

>>> from pattern.de import gender, MALE, FEMALE, NEUTRAL
>>> print gender('Katze')
FEMALE

Unfortunately, at the time of writing it was only available for Python 2.x.

Solution 3:

I just found this project, which sounds promising for this question: https://github.com/aakhundov/deep-german. It predicts the gender at the character level, which probably makes sense in a language like German. Although gender is not as easily detectable as in languages like Spanish, there are some regularities.
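To give an idea of the kind of regularities I mean, here is a rough sketch based on a few well-known noun endings. To be clear, this is not how deep-german works (that project trains a neural model on characters); it is just a hand-written suffix lookup, the function name `guess_gender` is my own, and the suffix list is deliberately tiny. It will also get exceptions wrong, for example "Sprung" and "Kuchen" are masculine despite their endings.

```python
# A few common German noun endings and the gender they usually signal.
# Illustrative only: the list is incomplete and has exceptions.
SUFFIX_GENDER = [
    ("schaft", "feminine"),
    ("heit", "feminine"),
    ("keit", "feminine"),
    ("ung", "feminine"),
    ("chen", "neuter"),
    ("lein", "neuter"),
    ("ling", "masculine"),
    ("ismus", "masculine"),
]


def guess_gender(noun):
    """Guess the gender from the noun's ending, or return None if unsure."""
    lowered = noun.lower()
    for suffix, gender in SUFFIX_GENDER:
        # Require at least one character of stem before the suffix.
        if lowered.endswith(suffix) and len(lowered) > len(suffix):
            return gender
    return None


for word in ["Zeitung", "Freiheit", "Mädchen", "Lehrling", "Tisch"]:
    print(word, guess_gender(word))
```

Words without a listed ending, such as "Tisch", come back as `None`, which is exactly the gap a trained character-level model is meant to fill.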

Another approach that would work is relational parsing: find the pronouns that refer to the noun you want to classify, then check whether they are feminine, masculine, or neuter. Maybe have a look at spaCy for that, too.
