programing

UnicodeDecodeError : 'ascii'코덱이 위치 1의 바이트 0xef를 디코딩 할 수 없습니다.

nasanasas 2020. 8. 15. 09:21
반응형

UnicodeDecodeError : 'ascii'코덱이 위치 1의 바이트 0xef를 디코딩 할 수 없습니다.


문자열을 UTF-8로 인코딩하는 데 몇 가지 문제가 있습니다. string.encode('utf-8')사용을 포함하여 여러 가지를 시도했지만 unicode(string)오류가 발생합니다.

UnicodeDecodeError : 'ascii'코덱은 위치 1에서 0xef 바이트를 디코딩 할 수 없습니다. 서 수가 범위에 없습니다 (128).

이것은 내 문자열입니다.

(。・ω・。)ノ

무슨 일이 일어나고 있는지 모르겠어요.

편집 : 문제는 문자열을 그대로 인쇄하는 것이 제대로 표시되지 않는다는 것입니다. 또한 변환하려고 할 때이 오류 :

Python 2.7.1+ (r271:86832, Apr 11 2011, 18:13:53)
[GCC 4.5.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)

이것은 UTF-8로 설정되지 않은 터미널의 인코딩과 관련이 있습니다. 여기 내 터미널입니다

$ echo $LANG
en_GB.UTF-8
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(。・ω・。)ノ
>>> 

내 터미널에서 예제는 위와 함께 작동하지만 LANG설정을 제거 하면 작동하지 않습니다.

$ unset LANG
$ python
Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
UnicodeEncodeError: 'ascii' codec can't encode characters in position 1-5: ordinal not in range(128)
>>> 

이 변경 사항을 영구적으로 만드는 방법을 알아 보려면 Linux 변형 문서를 참조하십시오.


시험:

string.decode('utf-8')  # or:
unicode(string, 'utf-8')

edit:

'(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'.decode('utf-8') gives u'(\uff61\uff65\u03c9\uff65\uff61)\uff89', which is correct.

so your problem must be at some oter place, possibly if you try to do something with it were there is an implicit conversion going on (could be printing, writing to a stream...)

to say more we'll need to see some code.


My +1 to mata's comment at https://stackoverflow.com/a/10561979/1346705 and to the Nick Craig-Wood's demonstration. You have decoded the string correctly. The problem is with the print command as it converts the Unicode string to the console encoding, and the console is not capable to display the string. Try to write the string into a file and look at the result using some decent editor that supports Unicode:

import codecs

s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
s1 = s.decode('utf-8')
f = codecs.open('out.txt', 'w', encoding='utf-8')
f.write(s1)
f.close()

Then you will see (。・ω・。)ノ.


Try setting the system default encoding as utf-8 at the start of the script, so that all strings are encoded using that.

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

If you are working on a remote host, look at /etc/ssh/ssh_config on your local PC.

When this file contains a line:

SendEnv LANG LC_*

comment it out with adding # at the head of line. It might help.

With this line, ssh sends language related environment variables of your PC to the remote host. It causes a lot of problems.


No problems with my terminal. The above answers helped me looking in the right directions but it didn't work for me until I added 'ignore':

fix_encoding = lambda s: s.decode('utf8', 'ignore')

As indicated in the comment below, this may lead to undesired results. OTOH it also may just do the trick well enough to get things working and you don't care about losing some characters.


It's fine to use the below code in the top of your script as Andrei Krasutski suggested.

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

But I will suggest you to also add # -*- coding: utf-8 -* line at very top of the script.

Omitting it throws below error in my case when I try to execute basic.py.

$ python basic.py
  File "01_basic.py", line 14
SyntaxError: Non-ASCII character '\xd9' in file basic.py on line 14, but no encoding declared; see http://python.org/dev/peps/pep-0263/ for details

The following is the code present in basic.py which throws above error.

code with error

from pylatex import Document, Section, Subsection, Command, Package
from pylatex.utils import italic, NoEscape

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

def fill_document(doc):
    with doc.create(Section('ِش سثؤفهخى')):
        doc.append('إخع ساخعمي شمصشغس سحثشن فاث فقعفا')
        doc.append(italic('فشمهؤ ؤخىفثىفس شقث شمسخ ىهؤث'))

        with doc.create(Subsection('آثص ٍعلاسثؤفهخى')):
            doc.append('بشةخعس ؤقشئغ ؤاشقشؤفثقس: $&#{}')


if __name__ == '__main__':
    # Basic document
    doc = Document('basic')
    fill_document(doc)

Then I added # -*- coding: utf-8 -*- line at very top and executed. It worked.

code without error

# -*- coding: utf-8 -*-
from pylatex import Document, Section, Subsection, Command, Package
from pylatex.utils import italic, NoEscape

import sys
reload(sys)
sys.setdefaultencoding('utf-8')

def fill_document(doc):
    with doc.create(Section('ِش سثؤفهخى')):
        doc.append('إخع ساخعمي شمصشغس سحثشن فاث فقعفا')
        doc.append(italic('فشمهؤ ؤخىفثىفس شقث شمسخ ىهؤث'))

        with doc.create(Subsection('آثص ٍعلاسثؤفهخى')):
            doc.append('بشةخعس ؤقشئغ ؤاشقشؤفثقس: $&#{}')


if __name__ == '__main__':
    # Basic document
    doc = Document('basic')
    fill_document(doc)

Thanks.


It looks like your string is encoded to utf-8, so what exactly is the problem? Or what are you trying to do here..?

Python 2.7.3 (default, Apr 20 2012, 22:39:59) 
[GCC 4.6.3] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = '(\xef\xbd\xa1\xef\xbd\xa5\xcf\x89\xef\xbd\xa5\xef\xbd\xa1)\xef\xbe\x89'
>>> s1 = s.decode('utf-8')
>>> print s1
(。・ω・。)ノ
>>> s2 = u'(。・ω・。)ノ'
>>> s2 == s1
True
>>> s2
u'(\uff61\uff65\u03c9\uff65\uff61)\uff89'

this works for ubuntu 15.10:

sudo locale-gen "en_US.UTF-8"
sudo dpkg-reconfigure locales

In my case, it was caused by my Unicode file being saved with a "BOM". To solve this, I cracked open the file using BBEdit and did a "Save as..." choosing for encoding "Unicode (UTF-8)" and not what it came with which was "Unicode (UTF-8, with BOM)"


I was getting the same type of error, and I found that the console is not capable of displaying the string in another language. Hence I made the below code changes to set default_charset as UTF-8.

data_head = [('\x81\xa1\x8fo\x89\xef\x82\xa2\x95\xdb\x8f\xd8\x90\xa7\x93x\x81\xcb3\x8c\x8e\x8cp\x91\xb1\x92\x86(\x81\x86\x81\xde\x81\x85)\x81\xa1\x8f\x89\x89\xf1\x88\xc8\x8aO\x81A\x82\xa8\x8b\xe0\x82\xcc\x90S\x94z\x82\xcd\x88\xea\x90\xd8\x95s\x97v\x81\xa1\x83}\x83b\x83v\x82\xcc\x82\xa8\x8e\x8e\x82\xb5\x95\xdb\x8c\xaf\x82\xc5\x8fo\x89\xef\x82\xa2\x8am\x92\xe8\x81\xa1', 'shift_jis')]
default_charset = 'UTF-8' #can also try 'ascii' or other unicode type
print ''.join([ unicode(lin[0], lin[1] or default_charset) for lin in data_head ])

BOM, it's so often BOM for me

vi the file, use

:set nobomb

and save it. That nearly always fixes it in my case


This is the best answer: https://stackoverflow.com/a/4027726/2159089

in linux:

export PYTHONIOENCODING=utf-8

so sys.stdout.encoding is OK.


I had the same error, with URLs containing non-ascii chars (bytes with values > 128)

url = url.decode('utf8').encode('utf-8')

Worked for me, in Python 2.7, I suppose this assignment changed 'something' in the str internal representation--i.e., it forces the right decoding of the backed byte sequence in url and finally puts the string into a utf-8 str with all the magic in the right place. Unicode in Python is black magic for me. Hope useful


i solve that problem changing in the file settings.py with 'ENGINE': 'django.db.backends.mysql', don´t use 'ENGINE': 'mysql.connector.django',


Just convert the text explicitly to string using str(). Worked for me.

참고URL : https://stackoverflow.com/questions/10561923/unicodedecodeerror-ascii-codec-cant-decode-byte-0xef-in-position-1

반응형