
How To Encode A Unicode String (ones From Json) To 'utf-8' In Python?

I am creating a REST API using Flask (Python). One of the URLs (/uploads) takes a POST HTTP request with a JSON body '{'src':'void', 'settings':'my settings'}'. I can individually ext

Solution 1:

As you said in comments, you solved your issue using:

jsonContent = json.dumps(request.json)
guid_object = hashlib.sha1(jsonContent.encode('utf-8'))

But it's important to understand why this works. Flask gives you unicode objects for non-ASCII strings and str objects for ASCII-only ones. Dumping the result with json.dumps() gives you consistent results, since it abstracts away the internal Python representation, just as if you only had unicode objects.

Python 2

In Python 2 (the Python version you're using), you don't need .encode('utf-8'), because ensure_ascii defaults to True in json.dumps(). When you pass non-ASCII data to json.dumps(), it writes those characters as JSON \uXXXX escape sequences, so the output is pure ASCII and there is nothing left to encode to UTF-8. Also, since the Zen of Python says that "Explicit is better than implicit", you could specify ensure_ascii even though it is already True:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent)
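
To make the escaping behaviour concrete, here is a minimal sketch that runs outside Flask. The payload is a made-up example (not from the original question), and the u"" prefix is only there so the same lines run on both Python 2 and Python 3:

```python
import json

# Hypothetical payload with one non-ASCII character (e with an acute accent).
payload = {"settings": u"caf\u00e9"}

# ensure_ascii=True is the default; it is spelled out here for clarity.
escaped = json.dumps(payload, ensure_ascii=True)
print(escaped)  # {"settings": "caf\u00e9"}

# The dumped text contains only ASCII characters, so encoding it as ASCII
# or as UTF-8 produces exactly the same bytes.
assert escaped.encode("ascii") == escaped.encode("utf-8")
```

Because the non-ASCII character has already been replaced by an escape sequence, the choice between 'ascii' and 'utf-8' makes no difference to the resulting hash input.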

Python 3

In Python 3, however, this no longer works. Indeed, json.dumps() returns str (Unicode text) in Python 3, even if every character in the string is ASCII, and hashlib.sha1 only accepts bytes. You need to make the conversion explicit, even if the ASCII encoding is all you need:

jsonContent = json.dumps(request.json, ensure_ascii=True)
guid_object = hashlib.sha1(jsonContent.encode('ascii'))
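
The following Python 3 sketch illustrates both halves of that statement without needing a Flask request. It assumes a plain dict standing in for request.json, using the JSON body quoted in the question; the variable names are illustrative only:

```python
import hashlib
import json

# Stand-in for request.json (assumption: the body from the question, already parsed).
payload = {"src": "void", "settings": "my settings"}

json_content = json.dumps(payload, ensure_ascii=True)

# Passing str straight to hashlib fails in Python 3.
error = None
try:
    hashlib.sha1(json_content)
except TypeError as exc:
    error = exc
    print("hashlib rejected str:", exc)

# Encoding to bytes first makes the call valid.
digest = hashlib.sha1(json_content.encode("ascii")).hexdigest()
print(digest)
```

Since the dumped text is pure ASCII, encoding with 'utf-8' instead of 'ascii' would give the same digest; 'ascii' simply states the expectation more strictly.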

This is why Python 3 is a better language: it forces you to be more explicit about the text you use, whether it is str (Unicode) or bytes. This avoids many, many problems down the road.
