What Is The Default Encoding Method For Code Assumed By Python Interpreter?

September 07, 2022

Some people use the following to declare the encoding method for the text of their Python source code: # -*- coding: utf-8 -*- Back in 2001, it is said the default encoding method

Solution 1:

Without any explicit encoding declaration, the assumed encoding for your source code will be

ascii for Python 2.x
utf-8 for Python 3.x

See PEP 0263 and Using source code encoding for Python 2.x, and PEP 3120 for the new default of utf-8 for Python 3.x.

So the default encoding assumed for source code depends directly on the version of the Python interpreter, and it is not configurable.
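
For Python 3, the default can be checked with the standard library's tokenize.detect_encoding, which applies the PEP 0263 rules to the first lines of a file. This is only a minimal sketch; the helper name and the sample bytes are made up for illustration:

```python
import io
import tokenize


def detect_source_encoding(source_bytes):
    """Return the encoding Python 3 would assume for raw source bytes."""
    readline = io.BytesIO(source_bytes).readline
    encoding, _consumed_lines = tokenize.detect_encoding(readline)
    return encoding


# UTF-8 encoded source without any declaration.
plain = b"TEST_DATA = 'B\xc3\xa4r'\n"
# The same statement saved as Latin-1, with an explicit declaration.
declared = b"# -*- coding: latin-1 -*-\nTEST_DATA = 'B\xe4r'\n"

print(detect_source_encoding(plain))     # utf-8
print(detect_source_encoding(declared))  # iso-8859-1 (normalized name for latin-1)
```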

Note that the source code encoding is entirely separate from dealing with non-ASCII characters as part of your data in strings.

There are two distinct cases where you may encounter non-ASCII characters:

As part of your program's data, at runtime
As part of your source code (and since Python 2 doesn't allow non-ASCII characters in identifiers, that usually means hard-coded string data in your source code, or comments).

The source code encoding declaration affects what encoding your source code will be interpreted with - so it's only needed if you decide to directly put non-ASCII characters in your source code.

So, the following code will eventually have to deal with the fact that there might be non-ASCII characters in data.txt:

with open('data.txt') as f:
    for line in f:
        pass  # do something with `line`

But it doesn't contain any non-ASCII characters in the source code, so it doesn't need an encoding declaration at the top of the file. It will, however, need to properly decode line if it wants to turn it into unicode. In Python 2, simply doing unicode(line) will use the system default encoding, which is ascii (a separate setting from the default source encoding, which happens to also be ascii). So to explicitly decode the string using utf-8 you'd need to do line.decode('utf-8').
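
In Python 3 the same distinction holds, but str is already unicode and the decoding is usually requested when the file is opened. A minimal sketch, assuming data.txt is UTF-8 encoded; the helper names and the temporary file are only there for illustration:

```python
import os
import tempfile


def read_lines_text(path):
    """Let open() decode each line as UTF-8."""
    with open(path, encoding='utf-8') as f:
        return [line.rstrip('\n') for line in f]


def read_lines_bytes(path):
    """Read raw bytes and decode each line explicitly."""
    with open(path, 'rb') as f:
        return [raw.decode('utf-8').rstrip('\n') for raw in f]


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'data.txt')
    with open(path, 'wb') as f:
        f.write(b'B\xc3\xa4r\n')  # UTF-8 bytes for "B", a-umlaut, "r"
    text_lines = read_lines_text(path)
    byte_lines = read_lines_bytes(path)

# Both approaches yield the same decoded text.
print(text_lines == byte_lines)
```

Note that the source above is pure ASCII, so it needs no encoding declaration even though the data it handles is not ASCII.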

This code however does contain non-ASCII characters directly in its source code:

TEST_DATA = 'Bär'    # <--- non-ASCII character on this line
print TEST_DATA

And it will fail with a SyntaxError similar to this, unless you declare an explicit source code encoding:

SyntaxError: Non-ASCII character '\xc3' in file foo.py on line 1, but no encoding declared;
see http://www.python.org/peps/pep-0263.html for details

So assuming your text editor is configured to save files in utf-8, you'd need to put the line

# -*- coding: utf-8 -*-

at the top of the file (on the first or second line, per PEP 0263) for Python to interpret the source code correctly.

My advice, however, would be to generally avoid putting non-ASCII characters in your source code, precisely because it depends on your and your co-workers' editor and terminal settings whether it will be written and read correctly.

Instead you can use escape sequences to safely enter non-ASCII characters in your code (in Python 2, the string below holds the UTF-8 bytes for Bär):

TEST_DATA = 'B\xc3\xa4r'
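
Note that the escaped literal above is a Python 2 byte string holding UTF-8 bytes. As a sketch of a Python 3 equivalent (where an unprefixed literal is unicode text), the character can be written with a \u escape, or the bytes can be kept in a bytes literal and decoded:

```python
# ASCII-only source that still produces the a-umlaut character.
as_text = 'B\u00e4r'                        # unicode escape in a str literal
from_bytes = b'B\xc3\xa4r'.decode('utf-8')  # UTF-8 bytes, decoded explicitly

print(as_text == from_bytes)  # True
print(len(as_text))           # 3

# In a Python 3 str literal, '\xc3\xa4' means two separate code points
# (U+00C3 and U+00A4), not the UTF-8 encoding of one character.
mojibake = 'B\xc3\xa4r'
print(len(mojibake))          # 4
```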

Solution 2:

By default, Python source files are treated as encoded in UTF-8. In that encoding, characters of most languages in the world can be used simultaneously in string literals, identifiers and comments, although the standard library only uses ASCII characters for identifiers, a convention that any portable code should follow. To display all these characters properly, the editor must recognize that the file is UTF-8, and it must use a font that supports all the characters in the file.

It is also possible to specify a different encoding for source files. To do this, put a special comment line like the one below at the top of the file, replacing encoding with the name of a codec supported by Python:

# -*- coding: encoding -*-

https://docs.python.org/dev/tutorial/interpreter.html
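
As a rough illustration in Python 3, compile() accepts raw source bytes and honors the declaration, so the effect can be observed without writing any files. The Latin-1 sample bytes, the helper name and the foo.py file name are assumptions chosen for this demo:

```python
def load_test_data(source_bytes):
    """Compile raw source bytes and return the TEST_DATA they define."""
    namespace = {}
    exec(compile(source_bytes, 'foo.py', 'exec'), namespace)
    return namespace['TEST_DATA']


# Latin-1 encodes a-umlaut as the single byte 0xE4, which is not valid UTF-8 here.
undeclared = b"TEST_DATA = 'B\xe4r'\n"
declared = b"# -*- coding: latin-1 -*-\n" + undeclared

try:
    load_test_data(undeclared)
    failed = False
except SyntaxError:
    failed = True  # the bytes are not valid under the default UTF-8

print(failed)
print(load_test_data(declared) == 'B\u00e4r')
```

The only difference between the two inputs is the declaration line, which is what tells the interpreter how to turn the bytes on disk into text.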
