Python3 – bytes and str

1 Overview
2 ASCII strings
- 2.1 encode/decode
- 2.2 Hexadecimal representation
3 UTF-8/Shift_JIS
- 3.1 encode/decode
- 3.2 Hexadecimal representation

Overview

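To convert a Python string (str) to a byte sequence (bytes), use the encode method; to convert a byte sequence back to a string, use the decode method. Both take the name of a character encoding as their argument (the default is 'utf-8').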

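When a byte sequence is passed to Python's print function, it is displayed in the form b'...', and characters such as newlines are shown as escape sequences (\n and so on).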
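
As a minimal sketch of the round trip described above (the variable names text, data and restored are chosen here only for illustration), the following encodes a string, decodes it again, and shows how print displays the bytes object.

```python
# Round trip: str -> bytes -> str (variable names are illustrative)
text = 'ABC\nDEF'
data = text.encode('utf-8')      # str -> bytes
restored = data.decode('utf-8')  # bytes -> str

print(type(text).__name__, type(data).__name__)
# str bytes
print(data)  # bytes are displayed as a literal, with the newline escaped
# b'ABC\nDEF'
print(restored == text)
# True
```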

ASCII strings

encode/decode

For ASCII strings, pass 'ascii' as the argument to encode/decode.

# Encode: str -> bytes
s = 'ABC\nDEF'
print(s.encode('ascii'))
# b'ABC\nDEF'

# Decode: bytes -> str
b = b'ABC\nDEF'
print(b.decode('ascii'))
# ABC
# DEF
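
A related case, sketched here with a sample string chosen for illustration: the 'ascii' codec only covers code points 0 to 127, so encoding text that contains other characters raises UnicodeEncodeError.

```python
# 'ascii' cannot represent characters outside the ASCII range
s = 'ABCあ'  # sample string (an assumption for this sketch)
try:
    s.encode('ascii')
    failed = False
except UnicodeEncodeError as e:
    failed = True
    # e.start is the index of the first character that could not be encoded
    print(e.start, e.object[e.start])
    # 3 あ
```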

For text made up only of ASCII characters, passing 'utf-8' or 'shift_jis' instead gives the same result.
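
A quick check of this, as a sketch using the same sample string (the names encoded and all_same are illustrative):

```python
# Encode the same ASCII-only text with three different codecs
s = 'ABC\nDEF'
encoded = {name: s.encode(name) for name in ('ascii', 'utf-8', 'shift_jis')}
for name, data in encoded.items():
    print(name, data)
# ascii b'ABC\nDEF'
# utf-8 b'ABC\nDEF'
# shift_jis b'ABC\nDEF'

# All three byte sequences are identical
all_same = len(set(encoded.values())) == 1
print(all_same)
# True
```

This does not hold for every encoding: 'utf-16', for instance, uses two bytes per ASCII character.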

Hexadecimal representation

Passing a byte sequence to bytes.hex (equivalently, calling b.hex() on a bytes object b) returns its hexadecimal representation as a string.

print(bytes.hex(b'ABC\nDEF'))
# 4142430a444546
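
As a small sketch, the instance-method form and the reverse conversion with bytes.fromhex look like this (the variable names are illustrative):

```python
data = b'ABC\nDEF'
hex_text = data.hex()  # same result as bytes.hex(data)
print(hex_text)
# 4142430a444546

# bytes.fromhex converts the hex string back to bytes
print(bytes.fromhex(hex_text))
# b'ABC\nDEF'
```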

UTF-8/Shift_JIS

encode/decode

For multibyte characters, use the same character encoding for encode and decode.

# Encode: str -> bytes
s = 'あい\nうえ'
print(s.encode('utf-8'))
# b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'

# Decode: bytes -> str
b = b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'
print(b.decode('utf-8'))
# あい
# うえ
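
For comparison, a sketch of the same string encoded as Shift_JIS (the names utf8 and sjis are illustrative). The hiragana take three bytes each in UTF-8 and two bytes each in Shift_JIS, so the two byte sequences differ completely.

```python
s = 'あい\nうえ'
utf8 = s.encode('utf-8')
sjis = s.encode('shift_jis')

print(len(utf8), utf8)
# 13 b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'
print(len(sjis), sjis)
# 9 b'\x82\xa0\x82\xa2\n\x82\xa4\x82\xa6'

# Decoding with the matching encoding restores the original string
print(sjis.decode('shift_jis') == s)
# True
```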

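If encode and decode are given different encodings, decode (with the default errors='strict') raises an error as soon as it meets a byte sequence that is invalid in the specified encoding, rather than returning garbled text. Here, UTF-8 bytes are decoded as Shift_JIS and decoding fails at byte 0x86. (If every byte happens to be valid in the other encoding, garbled text is returned with no error.)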

b = b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'
print(b.decode('shift_jis'))
# Traceback (most recent call last):
#  File "bytestring.py", line 5, in <module>
#    print(b.decode('shift_jis'))
# UnicodeDecodeError: 'shift_jis' codec can't decode byte 0x86 in position 9: illegal multibyte sequence
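
When the encoding of incoming bytes is not certain, the error can be caught instead of letting the script stop. A minimal sketch (the names text and error_info are illustrative):

```python
b = b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'  # UTF-8 bytes
try:
    text = b.decode('shift_jis')
except UnicodeDecodeError as e:
    text = None
    # The exception records the codec, the failing position and the reason
    error_info = (e.encoding, e.start, e.reason)
    print(error_info)
    # ('shift_jis', 9, 'illegal multibyte sequence')
```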

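However, if the errors argument is changed from its default 'strict' (for example to 'replace' or 'ignore'), decode returns a string, possibly garbled, instead of raising an error.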
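
A sketch of two of the standard error handlers, 'replace' and 'ignore', applied to the same mismatched bytes (the variable names are illustrative). Neither raises, but neither recovers the original text.

```python
b = b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'  # UTF-8 bytes for 'あい\nうえ'

replaced = b.decode('shift_jis', errors='replace')  # undecodable bytes become U+FFFD
ignored = b.decode('shift_jis', errors='ignore')    # undecodable bytes are dropped

print('\ufffd' in replaced, '\ufffd' in ignored)
# True False
print(replaced == 'あい\nうえ', ignored == 'あい\nうえ')
# False False
```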

Hexadecimal representation

The hexadecimal representation of multibyte characters is simply the \x.. byte values of the byte sequence written out in order.

print(bytes.hex(b'\xe3\x81\x82\xe3\x81\x84\n\xe3\x81\x86\xe3\x81\x88'))
# e38182e381840ae38186e38188
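
To see which bytes belong to which character, the string can be encoded one character at a time. A sketch (the list name rows is illustrative), showing the UTF-8 and Shift_JIS hex side by side:

```python
s = 'あい\nうえ'
# One row per character: (character, UTF-8 hex, Shift_JIS hex)
rows = [(ch, ch.encode('utf-8').hex(), ch.encode('shift_jis').hex()) for ch in s]
for ch, u8, sj in rows:
    print(repr(ch), u8, sj)
# 'あ' e38182 82a0
# 'い' e38184 82a2
# '\n' 0a 0a
# 'う' e38186 82a4
# 'え' e38188 82a6
```

Joining the UTF-8 column gives e38182e381840ae38186e38188, the same value as the whole-string result.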

TauStation
