Utf-8 Coding In Python
I have a UTF-8 character whose bytes are written as hex with '_' in between, e.g., '_ea_b4_80'. I'm trying to convert it into the actual UTF-8 character using the replace method, but I can't get the correct encoding.
Solution 1:
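\x is only meaningful in string literals, so you can't use replace to add it. The escape is processed when Python parses the source code; a backslash and an x inserted at runtime are just two ordinary characters.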
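As a rough illustration of the point above (the variable names here are made up for the example), compare the string produced by replace with a real string literal that uses \x escapes:

```python
r = '_ea_b4_80'

# replace() inserts a literal backslash followed by 'x';
# nothing re-parses the result as an escape sequence.
replaced = r.replace('_', '\\x')
print(replaced)       # the 12 ordinary characters \xea\xb4\x80
print(len(replaced))  # 12

# In a literal, each \xNN escape becomes a single character at parse time.
literal = '\xea\xb4\x80'
print(len(literal))   # 3
```

The two strings look alike in source form but are different values, which is why the replace approach never yields the decoded character.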
To get your desired result, convert to bytes, then decode:
import binascii
r = '_ea_b4_80'
rhexonly = r.replace('_', '')            # Returns 'eab480'
rbytes = binascii.unhexlify(rhexonly)    # Returns b'\xea\xb4\x80'
rtext = rbytes.decode('utf-8')           # Returns '관' (unicode if Py2, str if Py3)
print(rtext)
which should print 관, as you desire.
If you're using modern Py3, you can avoid the import by using the bytes.fromhex class method in place of binascii.unhexlify. This assumes r is in fact a str: bytes.fromhex, unlike binascii.unhexlify, only takes str inputs, not bytes inputs.
rbytes = bytes.fromhex(rhexonly) # Returns b'\xea\xb4\x80'
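If this conversion is needed in more than one place, the steps above can be wrapped in a small function. This is only a sketch for Python 3: the name decode_underscore_hex and its encoding parameter are hypothetical, not part of any library, and it assumes the input is a str containing only '_' separators and hex digits.

```python
def decode_underscore_hex(s, encoding='utf-8'):
    """Hypothetical helper: turn '_ea_b4_80' into the text it encodes."""
    hex_only = s.replace('_', '')    # 'eab480'
    raw = bytes.fromhex(hex_only)    # b'\xea\xb4\x80'
    return raw.decode(encoding)      # decode the raw bytes to str


result = decode_underscore_hex('_ea_b4_80')
print(result)
```

For malformed input, bytes.fromhex raises ValueError on non-hex characters, and decode raises UnicodeDecodeError if the bytes are not valid in the chosen encoding, so callers may want to catch both.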