# UnicodeDecodeError: 'charmap' codec can't decode byte
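The Python "UnicodeDecodeError: 'charmap' codec can't decode byte in position" error occurs when we specify an incorrect encoding or don't explicitly set the `encoding` keyword argument when opening a file. When the argument is omitted, Python falls back to the platform's default encoding, which on Windows is often a legacy code page such as cp1252.
To solve the error, specify the correct encoding, e.g. `utf-8`.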
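To see which encoding `open()` would fall back to on your machine when the `encoding` argument is omitted, you can ask the `locale` module. This is a minimal sketch; the variable name `default_encoding` is only illustrative, and the value differs between systems and Python configurations.

```python
import locale

# The encoding open() typically uses when no encoding argument is passed.
# On many Windows setups this is a legacy code page such as cp1252.
default_encoding = locale.getpreferredencoding(False)

print(default_encoding)
```

If the printed value is not the encoding the file was written in, passing `encoding` explicitly is the safer choice.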
Here is an example of how the error occurs.
I have a file called `example.txt` with the following contents.
example.txt
```
𝘈Ḇ𝖢𝕯٤ḞԍНǏ
hello world
```
And here is the code that tries to decode the contents of `example.txt`.
main.py
```python
# ⛔️ UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
with open('example.txt', 'r', encoding='cp856') as f:
    lines = f.readlines()

    print(lines)
```
The error is caused because the `example.txt` file doesn't use the specified encoding (`cp856`).
# Specifying the correct encoding when opening the file
If you know the encoding the file uses, make sure to specify it using the `encoding` keyword argument.
Otherwise, the first thing you can try is setting the encoding to `utf-8`.
main.py
```python
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)
```
The `utf-8` encoding is capable of encoding all 1,112,064 valid character code points in Unicode, using between one and four bytes per character.
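As a quick illustration of how UTF-8 covers such a large range, the sketch below encodes four characters (three of them taken from the example file) and records how many bytes each one needs. The names `samples` and `byte_lengths` are only illustrative.

```python
# Characters written as escapes so the source file itself stays ASCII.
samples = [
    'h',           # U+0068, plain ASCII
    '\u041d',      # U+041D, Cyrillic capital letter En
    '\u1e06',      # U+1E06, Latin capital letter B with line below
    '\U0001d608',  # U+1D608, mathematical sans-serif italic capital A
]

for char in samples:
    encoded = char.encode('utf-8')
    print(f'U+{ord(char):04X} -> {len(encoded)} byte(s)')

byte_lengths = [len(char.encode('utf-8')) for char in samples]
print(byte_lengths)  # [1, 2, 3, 4]
```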
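The same approach can be used if you use the `open()` function directly instead of using the `with` statement. In that case, remember to close the file yourself.
main.py
```python
my_file = open('example.txt', 'r', encoding='utf-8')

lines = my_file.readlines()

print(lines)  # ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']

# Without a with statement the file is not closed automatically
my_file.close()
```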
You can view all of the standard encodings in this table of the official docs.
Some of the common encodings are `ascii`, `latin-1` and `utf-32`.
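If you are unsure whether an encoding name is spelled the way Python expects, `codecs.lookup()` can check it before you open the file. The helper name `is_known_encoding` below is hypothetical and only used for illustration.

```python
import codecs


def is_known_encoding(name):
    """Return True if Python has a codec registered under this name."""
    try:
        codecs.lookup(name)
        return True
    except LookupError:
        # Raised when no codec matches the given name
        return False


print(is_known_encoding('utf-8'))                # True
print(is_known_encoding('latin-1'))              # True
print(is_known_encoding('not-a-real-encoding'))  # False
```

Note that a valid name only means the codec exists. It says nothing about whether the file was actually written with it.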
# Specifying an encoding when using the pathlib module
If you use the `pathlib` module, specify an encoding when calling the specific method.
main.py
```python
from pathlib import Path

text = Path('example.txt').read_text(encoding='utf-8')

# 𝘈Ḇ𝖢𝕯٤ḞԍНǏ
# hello world
print(text)
```
You can pass the encoding when calling methods such as `Path.read_text` or `Path.write_text`.
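The sketch below shows a full round trip with `pathlib`: writing with an explicit encoding and reading back with the same one. It uses a temporary directory so it can run anywhere; the file name and contents are only illustrative. `Path.open()` accepts the same `encoding` keyword argument.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    file_path = Path(tmp_dir) / 'example.txt'

    # Write and read with the same explicit encoding
    file_path.write_text('\u1e06\nhello world', encoding='utf-8')
    text = file_path.read_text(encoding='utf-8')

    # Path.open() takes the encoding argument as well
    with file_path.open('r', encoding='utf-8') as f:
        lines = f.readlines()

# ascii() keeps the output printable on any console
print(ascii(lines))
```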
# Ignoring characters that cannot be decoded
If the error persists, you could set the `errors` keyword argument to `ignore` to skip the characters that cannot be decoded.
Note that ignoring characters that cannot be decoded can lead to data loss.
main.py
```python
# 👇️ Set errors to ignore
with open('example.txt', 'r', encoding='utf-8', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['𝘈Ḇ𝖢𝕯٤ḞԍНǏ\n', 'hello world']
    print(lines)
```
Opening the file with an incorrect encoding with `errors` set to `ignore` won't raise a `UnicodeDecodeError`.
main.py
```python
with open('example.txt', 'r', encoding='cp856', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['\xadרט©ז\xadצ\xadץ»┘©×םן\n', 'hello world']
    print(lines)
```
The characters that cannot be decoded are simply ignored.
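To make the data loss visible, the sketch below decodes the same bytes with three of Python's standard error handlers. Only `ignore` appears in the article; `replace` and `backslashreplace` are shown here as alternatives that leave a trace of the bytes that failed. The sample word is just an illustration.

```python
# 'café' encoded as UTF-8 gives b'caf\xc3\xa9', which is not valid ASCII
data = 'caf\u00e9'.encode('utf-8')

# ignore: the undecodable bytes silently disappear
ignored = data.decode('ascii', errors='ignore')  # 'caf'

# replace: each undecodable byte becomes U+FFFD (the replacement character)
replaced = data.decode('ascii', errors='replace')  # 'caf\ufffd\ufffd'

# backslashreplace: each undecodable byte is kept as an escape sequence
escaped = data.decode('ascii', errors='backslashreplace')  # 'caf\\xc3\\xa9'

print(ignored)
print(escaped)
```

The same handler names can be passed as the `errors` argument of `open()`.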
# Opening the file in binary mode
If you don't need to interact with the contents of the file, you can open it in binary mode without decoding it.
main.py
```python
with open('example.txt', 'rb') as f:
    lines = f.readlines()

    # ✅ [b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f\n', b'hello world']
    print(lines)
```
We opened the file in binary mode (using the `rb` mode, read binary), so the `lines` list contains bytes objects.
You can use this approach if you need to upload the file to a remote server and don't need to decode it.
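As a small illustration of moving a file around without ever decoding it, the sketch below copies raw bytes from one file to another. The file names are hypothetical and a temporary directory is used so the example is self-contained.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp_dir:
    source = Path(tmp_dir) / 'example.txt'
    target = Path(tmp_dir) / 'copy.txt'
    source.write_bytes(b'\xf0\x9d\x98\x88\nhello world')

    # 'rb' and 'wb' move raw bytes, so no codec is involved
    # and no UnicodeDecodeError can be raised
    with open(source, 'rb') as src, open(target, 'wb') as dst:
        payload = src.read()
        dst.write(payload)

    copied = target.read_bytes()

print(type(payload).__name__, len(payload))
```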
Encoding is the process of converting a `string` to a `bytes` object and decoding is the process of converting a `bytes` object to a `string`.
When decoding a bytes object, we have to use the same encoding that was used to encode the string to a bytes object.
# Try using the cp437 encoding
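If the error persists, try to use the `cp437` encoding when opening the file.
main.py
```python
with open('example.txt', 'r', encoding='cp437') as f:
    lines = f.readlines()

    # ✅ ['≡¥ÿêß╕å≡¥ûó≡¥ò»┘ñß╕₧╘ì╨¥╟Å\n', 'hello world']
    print(lines)
```
The Code page 437 encoding is the character set of the original IBM personal computer and includes all printable ASCII characters as well as some accented letters.
Because `cp437` assigns a character to every byte value, decoding doesn't fail, but as the output above shows, the result is only legible if the file was actually written using `cp437`.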
If you still get an error, set the `errors` keyword argument to `ignore` in the call to the `open()` function.
main.py
```python
with open('example.txt', 'r', encoding='cp437', errors='ignore') as f:
    lines = f.readlines()

    # ✅ ['≡¥ÿêß╕å≡¥ûó≡¥ò»┘ñß╕₧╘ì╨¥╟Å\n', 'hello world']
    print(lines)
```
The characters that cannot be decoded are simply ignored, which may cause data loss.
If the error persists, try other encodings such as `utf-16`, `utf-32`, `latin-1`, etc.
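Trying encodings one by one can be automated. The sketch below is one possible approach, not a definitive one: `read_with_fallback` is a hypothetical helper and the default list of candidates is an assumption you should adapt to your data. Because `latin-1` accepts every byte, it goes last, otherwise it would hide all the other candidates.

```python
import os
import tempfile


def read_with_fallback(path, encodings=('utf-8', 'cp1252', 'latin-1')):
    """Hypothetical helper: try each encoding in order, return (text, encoding)."""
    last_error = None
    for encoding in encodings:
        try:
            with open(path, 'r', encoding=encoding) as f:
                return f.read(), encoding
        except UnicodeError as error:
            # Remember the failure and move on to the next candidate
            last_error = error
    raise ValueError(f'could not decode {path!r} with any of {encodings}') from last_error


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, 'example.txt')
    with open(path, 'wb') as f:
        # b'caf\xe9' is valid cp1252 but not valid UTF-8
        f.write('caf\u00e9'.encode('cp1252'))

    text, used_encoding = read_with_fallback(path)

print(used_encoding)
```

A successful decode only means the bytes were valid in that encoding, so it is still worth checking that the returned text is legible.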
# Trying to find the encoding of the file
You can try to figure out what the encoding of the file is by using the `file` command.
The command is available on macOS and Linux, but can also be used on Windows if you have Git and Git Bash installed. Make sure to run the command in Git Bash if on Windows.
Open your shell in the directory that contains the file and run the following command.
shell
```shell
file *
```
In this example, the output of the command shows that the file uses the ASCII encoding.
This is the encoding you should specify when opening the file.
main.py
```python
with open('example.txt', 'r', encoding='ascii') as f:
    lines = f.readlines()

    print(lines)
```
If you are on Windows, you can also:
- Open the file in the basic version of Notepad.
- Click on "Save as".
- Look at the selected encoding right next to the "Save" button.
In this example, the selected encoding for the file is UTF-8, so that's what we have to specify when calling the `open()` function.
main.py
```python
with open('example.txt', 'r', encoding='utf-8') as f:
    lines = f.readlines()

    print(lines)
```
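If neither the `file` command nor Notepad is available, a rough check can be done in Python itself. The sketch below is a heuristic under stated assumptions, not a reliable detector: `guess_encoding` is a hypothetical helper that looks for a byte order mark (BOM) and otherwise tests whether the bytes are valid UTF-8. It returns `None` when neither check succeeds.

```python
import codecs

# Longer BOMs come first: the UTF-32 LE BOM begins with the UTF-16 LE BOM
BOMS = (
    (codecs.BOM_UTF32_LE, 'utf-32'),
    (codecs.BOM_UTF32_BE, 'utf-32'),
    (codecs.BOM_UTF8, 'utf-8-sig'),
    (codecs.BOM_UTF16_LE, 'utf-16'),
    (codecs.BOM_UTF16_BE, 'utf-16'),
)


def guess_encoding(data):
    """Hypothetical helper: guess an encoding for raw bytes, or return None."""
    for bom, encoding in BOMS:
        if data.startswith(bom):
            return encoding
    try:
        # Plain ASCII is also valid UTF-8, so ASCII files land here too
        data.decode('utf-8')
        return 'utf-8'
    except UnicodeDecodeError:
        return None


print(guess_encoding('hello'.encode('utf-16')))  # utf-16
print(guess_encoding(b'hello world'))            # utf-8
print(guess_encoding(b'caf\xe9'))                # None
```

To use it on a file, read the whole file in `rb` mode and pass the bytes to the function.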
# Try using the latin-1 encoding
If the error persists, try to use the `latin-1` encoding when opening the file.
main.py
```python
with open('example.txt', 'r', encoding='latin-1') as f:
    lines = f.readlines()

    # ['ð\x9d\x98\x88á¸\x86ð\x9d\x96¢ð\x9d\x95¯Ù¤á¸\x9eÔ\x8dÐ\x9dÇ\x8f\n', 'hello world']
    print(lines)
```
The `latin-1` encoding maps every possible byte to a character, so it never raises a `UnicodeDecodeError`. Make sure to check if you get legible results when using it, because a successful decode doesn't mean the text is correct.
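The sketch below illustrates why `latin-1` never fails and why its output still needs checking. It decodes UTF-8 bytes as `latin-1`, which produces garbled text, and then reverses the step to get the original back. The variable names are only illustrative.

```python
original = '\u041d'             # Cyrillic capital letter En, from the example file
raw = original.encode('utf-8')  # b'\xd0\x9d'

# latin-1 maps every byte to a character, so this never raises
garbled = raw.decode('latin-1')

# The mapping is one-to-one, so the mistake can be undone
recovered = garbled.encode('latin-1').decode('utf-8')

# All 256 byte values decode without an error
all_bytes = bytes(range(256)).decode('latin-1')

# ascii() keeps the output printable on any console
print(ascii(garbled))    # '\xd0\x9d'
print(ascii(recovered))  # '\u041d'
print(len(all_bytes))    # 256
```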
# Using a different encoding causes the error
Here is an example that shows how using a different encoding to encode a string to bytes than the one used to decode the bytes object causes the error.
main.py
```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# ⛔️ UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 1: character maps to <undefined>
my_text_again = my_binary_data.decode('cp856')
```
We can solve the error by using the `utf-8` encoding to decode the bytes object.
main.py
```python
my_text = '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'

my_binary_data = my_text.encode('utf-8')

# 👉️ b'\xf0\x9d\x98\x88\xe1\xb8\x86\xf0\x9d\x96\xa2\xf0\x9d\x95\xaf\xd9\xa4\xe1\xb8\x9e\xd4\x8d\xd0\x9d\xc7\x8f'
print(my_binary_data)

# ✅ Specify the correct encoding
my_text_again = my_binary_data.decode('utf-8')

print(my_text_again)  # 👉️ '𝘈Ḇ𝖢𝕯٤ḞԍНǏ'
```
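When the cause isn't obvious, the exception object itself tells you which byte failed and where. The sketch below catches the error and reads its attributes. It uses `cp1252` instead of the `cp856` codec from the example above, as an assumption, because `cp1252` is a common default on Windows; the variable names are only illustrative.

```python
# The first character of the example file, encoded as UTF-8
data = '\U0001d608'.encode('utf-8')  # b'\xf0\x9d\x98\x88'

try:
    data.decode('cp1252')
except UnicodeDecodeError as error:
    failed_codec = error.encoding                    # codec name used in the message
    position = error.start                           # index of the first bad byte
    bad_bytes = error.object[error.start:error.end]  # the bytes that failed
    reason = error.reason

    # charmap b'\x9d' 1 character maps to <undefined>
    print(failed_codec, bad_bytes, position, reason)
```

The name `charmap` in the message refers to the table-based codec family that code pages such as `cp1252` and `cp856` belong to, which is why the error doesn't name the specific code page.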