
Processing Non-English Text

I have a Python file that reads a file given by the user, processes it, and asks questions in flash-card format. The program works fine with an English txt file but I encounter erro

Solution 1:

This is a common question. It seems you are running the program in cmd, which does not support Unicode, so the error occurs when the output is translated to the encoding your cmd session uses. Unicode covers a much wider character set than that encoding, so any character the encoding cannot represent raises an error.
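To see the failure in isolation, here is a minimal sketch. It assumes the console uses code page cp437 (yours may differ), and the helper names can_encode and safe_for_console are made up for illustration.

```python
# Assumption: the console encoding is cp437; substitute your own code page.
CONSOLE_ENCODING = "cp437"


def can_encode(text, encoding=CONSOLE_ENCODING):
    """Return True if every character in text exists in the given encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        # Same error that print() raises when the console cannot show a character.
        return False


def safe_for_console(text, encoding=CONSOLE_ENCODING):
    """Replace characters the encoding cannot represent with '?'."""
    return text.encode(encoding, errors="replace").decode(encoding)


print(can_encode("hello"))         # plain ASCII fits in any code page
print(can_encode("日本語"))         # not representable in cp437
print(safe_for_console("日本語"))   # lossy fallback, but no exception
```

The replace fallback only hides the problem: the characters are lost, so it is a workaround for display rather than a fix.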

IDLE is built on top of tkinter's Text widget, which fully supports Python's Unicode strings.

Finally, when you open a file, the open function assumes it is in the platform default encoding (per locale.getpreferredencoding()). If your file's encoding differs, specify it explicitly with the encoding keyword argument to open.
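A minimal sketch of passing the encoding explicitly. It assumes the flash-card file is saved as UTF-8, and read_cards is a hypothetical helper, not part of the original program.

```python
import locale


def read_cards(path, encoding="utf-8"):
    """Read non-empty lines from a text file using an explicit encoding.

    Assumption: the file is UTF-8; pass another encoding if yours differs.
    """
    with open(path, encoding=encoding) as f:
        return [line.rstrip("\n") for line in f if line.strip()]


# What open() would assume if no encoding were given on this machine.
print(locale.getpreferredencoding(False))
```

If the reported default is not the encoding the file was saved in, reading without the encoding argument either fails or silently produces garbled text.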


Solution 2:

The Windows console does not natively support Unicode (despite what people say about chcp 65001). It is designed to be backwards compatible, so it only supports 8-bit character sets.

Use win-unicode-console instead. It talks to the console at a lower level, which allows all Unicode characters to be printed and, importantly, entered.

The best way to enable it is in your usercustomize script, so that it is enabled by default on your machine.
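A sketch of what such a usercustomize.py might look like. It assumes the package was installed with pip install win-unicode-console and that the file sits in your user site-packages directory (python -m site --user-site prints the path). The wrapper function enable_unicode_console is hypothetical. The platform check and the import guard are additions so that the script does nothing on other systems or when the package is missing.

```python
# usercustomize.py -- imported automatically at interpreter startup
import sys


def enable_unicode_console():
    """Enable win-unicode-console if possible; return True when enabled."""
    if sys.platform != "win32":
        return False  # only relevant for the Windows console
    try:
        import win_unicode_console
    except ImportError:
        return False  # package not installed; leave the console untouched
    win_unicode_console.enable()
    return True


enable_unicode_console()
```

The same win_unicode_console.enable() call can also be placed at the top of a single script if you would rather not change the behaviour for every Python session.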

