How To Encode A Unicode String (ones From Json) To 'utf-8' In Python?
Solution 1:
As you said in comments, you solved your issue using:
jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))
But it's important to understand why this works. Flask gives you unicode objects for strings containing non-ASCII characters and str objects for ASCII-only strings. Dumping the result with json.dumps() gives you consistent results because it abstracts away the internal Python representation, just as if you only had unicode objects.
Python 2
In Python 2 (the Python version you're using), you don't need .encode('utf-8') because the default value of the ensure_ascii parameter of json.dumps() is True. When you pass non-ASCII data to json.dumps(), it uses JSON escape sequences so that the output is pure ASCII: there is no need to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", you could specify ensure_ascii even though it already defaults to True:
jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
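To see the escaping described above, here is a minimal sketch. The payload dictionary is a hypothetical stand-in for request.json, and the snippet is written so that it should run unchanged on both Python 2.7 and Python 3.3+:

```python
import json

# Hypothetical payload standing in for request.json; the value
# contains one non-ASCII character (e with an acute accent).
payload = {"name": u"caf\u00e9"}

# ensure_ascii=True is the default; it is spelled out here for clarity.
dumped = json.dumps(payload, ensure_ascii=True)

# The non-ASCII character is written as a JSON escape sequence,
# so every character of the result is in the ASCII range.
is_ascii = all(ord(ch) < 128 for ch in dumped)

print(dumped)    # {"name": "caf\u00e9"}
print(is_ascii)  # True
```

Decoding the result with json.loads() gives back the original text, so nothing is lost by the escaping.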
Python 3
In Python 3, however, this no longer works. Indeed, json.dumps() returns a str (Unicode text) in Python 3, even if every character in the string is ASCII, while hashlib.sha1 only accepts bytes. You need to make the conversion explicit, even if the ASCII encoding is all you need:
jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))
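As a rough illustration of the point above, the following Python 3 sketch shows hashlib rejecting a str and accepting the encoded bytes. The payload is again a hypothetical stand-in for request.json, and the variable names are only for illustration:

```python
import hashlib
import json

# Hypothetical payload standing in for request.json.
payload = {"name": "caf\u00e9"}

json_content = json.dumps(payload, ensure_ascii=True)

# In Python 3, hashlib refuses str input: it must be bytes.
try:
    hashlib.sha1(json_content)
    raised_type_error = False
except TypeError:
    raised_type_error = True

# Encoding explicitly works. Because ensure_ascii=True leaves only
# ASCII characters, 'ascii' and 'utf-8' produce identical bytes here.
ascii_digest = hashlib.sha1(json_content.encode('ascii')).hexdigest()
utf8_digest = hashlib.sha1(json_content.encode('utf-8')).hexdigest()

print(raised_type_error)            # True
print(ascii_digest == utf8_digest)  # True
```

This is also why the .encode('utf-8') call from the original fix is harmless: on ASCII-only text the two encodings agree byte for byte.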
This is why Python 3 is a better language: it forces you to be more explicit about the text you use, whether it is str (Unicode) or bytes. This avoids many, many problems down the road.