Python 2 Assumes Different Source Code Encodings
The -c and -m switches ultimately run the supplied code with the exec statement or the compile() function, both of which take Latin-1 source code. From the documentation for the exec statement:

"The first expression should evaluate to either a Unicode string, a Latin-1 encoded string, an open file object, a code object, or a tuple."
That -c and -m end up on this code path is not documented; it is an implementation detail that may or may not be considered a bug.
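You don't need Python 2 to see what the Latin-1 assumption does in practice. The following is a minimal sketch in Python 3 that imitates it. It assumes the command was typed in a UTF-8 terminal. The raw bytes are decoded as Latin-1 before compiling, which is how the Python 2 path treats such a bytestring. The result is then compared with Python 3's own compile(), which decodes bytes source as UTF-8. The helper name compile_as_latin1 is hypothetical and exists only for this illustration.

```python
# Imitates, in Python 3, Python 2's assumption that a bytestring handed to
# exec/compile() holds Latin-1 source. compile_as_latin1 is a hypothetical
# helper for illustration, not part of any library.

def compile_as_latin1(source: bytes):
    # Latin-1 maps every byte 0x00-0xFF to the code point with the same value,
    # so decoding never fails, but multi-byte UTF-8 sequences are split apart.
    return compile(source.decode("latin-1"), "<string>", "exec")

def run(code_obj) -> dict:
    # Execute a code object in a fresh namespace and return that namespace
    namespace: dict = {}
    exec(code_obj, namespace)
    return namespace

# The bytes a UTF-8 terminal passes along for:  python -c "s = u'é'"
raw = "s = u'é'".encode("utf-8")  # b"s = u'\xc3\xa9'"

latin1_result = run(compile_as_latin1(raw))["s"]
utf8_result = run(compile(raw, "<string>", "exec"))["s"]  # Python 3 reads bytes as UTF-8

print(ascii(latin1_result))  # '\xc3\xa9', i.e. two characters displayed as 'Ã©'
print(ascii(utf8_result))    # '\xe9', i.e. the single character 'é'
```

Pure ASCII input decodes to the same text under either codec. The two results only diverge once non-ASCII bytes appear.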
I don't think it is worth fixing, however, and Latin-1 is a superset of ASCII, so little is lost. How code from -c and -m is handled has been cleaned up in Python 3 and is much more consistent there: code passed in with -c is decoded using the current locale, and modules loaded with the -m switch default to UTF-8, as usual.
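One quick way to check the Python 3 behaviour is to start a child interpreter both ways. This sketch assumes a UTF-8 locale, which is the common default on current systems, and a writable temporary directory. The module name enc_demo is arbitrary. For -m, the sketch relies on the interpreter putting the current directory at the front of sys.path, which it does unless that is disabled (for example with -P, -I or PYTHONSAFEPATH).

```python
import pathlib
import subprocess
import sys
import tempfile

# A non-ASCII literal; ascii() keeps the child's output pure ASCII
SNIPPET = "print(ascii(u'é'))"

def run_with_c(code: str) -> str:
    # On POSIX, Python 3 decodes the -c argument using the locale encoding
    result = subprocess.run([sys.executable, "-c", code],
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()

def run_with_m(source: str, modname: str = "enc_demo") -> str:
    # Source files found via -m are read as UTF-8 unless they declare another coding
    with tempfile.TemporaryDirectory() as tmp:
        pathlib.Path(tmp, modname + ".py").write_text(source, encoding="utf-8")
        result = subprocess.run([sys.executable, "-m", modname], cwd=tmp,
                                capture_output=True, text=True, check=True)
    return result.stdout.strip()

print(run_with_c(SNIPPET))  # '\xe9' under a UTF-8 locale
print(run_with_m(SNIPPET))  # '\xe9'
```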
If you want to know the exact implementations used, start at the Py_Main() function in Modules/main.c, which handles both -c and -m as follows:
```c
if (command) {
    /* -c: run the command string */
    sts = PyRun_SimpleStringFlags(command, &cf) != 0;
    free(command);
} else if (module) {
    /* -m: hand the module name over to runpy */
    sts = RunModule(module, 1);
    free(module);
}
```
Code passed with -c is executed through the PyRun_SimpleStringFlags() function, which in turn calls PyRun_StringFlags(). When you use exec, a bytestring object is also passed to PyRun_StringFlags(), and the source code is then assumed to contain Latin-1-encoded bytes.

-m uses the RunModule() function to pass the module name to the private function _run_module_as_main() in the runpy module. That function uses pkgutil.get_loader() to load the module metadata and fetches the module code object with the loader.get_code() method of the PEP 302 loader. If no cached bytecode is available, the code object is produced by calling the compile() function with the mode set to "exec".
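The same -m steps can be imitated from Python 3 code. The sketch below approximates them and is not runpy's actual implementation. It uses importlib.util.find_spec() in place of pkgutil.get_loader(), which has been deprecated since Python 3.12. It then asks the loader for the code object with get_code() and executes it in a fresh namespace named __main__. The temporary module and its name, enc_demo2, are hypothetical. In Python 3 the loader decodes the source as UTF-8, so the non-ASCII literal survives intact.

```python
import importlib.util
import pathlib
import sys
import tempfile

def load_main_code(modname: str):
    """Roughly what runpy does for -m: locate the module, then fetch its code object."""
    spec = importlib.util.find_spec(modname)
    if spec is None or spec.loader is None:
        raise ImportError(f"No module named {modname!r}")
    # For a plain .py file this is a SourceFileLoader: get_code() uses valid
    # cached bytecode if present, otherwise it compiles the source in 'exec' mode.
    return spec.loader.get_code(modname)

with tempfile.TemporaryDirectory() as tmp:
    pathlib.Path(tmp, "enc_demo2.py").write_text("s = u'é'\n", encoding="utf-8")
    sys.path.insert(0, tmp)
    try:
        code = load_main_code("enc_demo2")
    finally:
        sys.path.remove(tmp)

# Run the code object the way a main module would be run
namespace = {"__name__": "__main__"}
exec(code, namespace)
print(ascii(namespace["s"]))  # '\xe9': the file was decoded as UTF-8
```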