
Resource u'tokenizers/punkt/english.pickle' not found.

nasanasas 2020. 9. 4. 07:33



My code:

import nltk.data
tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')

Error message:

[ec2-user@ip-172-31-31-31 sentiment]$ python mapper_local_v1.0.py
Traceback (most recent call last):
  File "mapper_local_v1.0.py", line 16, in <module>
    tokenizer = nltk.data.load('nltk:tokenizers/punkt/english.pickle')
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 774, in load
    opened_resource = _open(resource_url)
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 888, in _open
    return find(path_, path + ['']).open()
  File "/usr/lib/python2.6/site-packages/nltk/data.py", line 618, in find
    raise LookupError(resource_not_found)
LookupError:
Resource u'tokenizers/punkt/english.pickle' not found.  Please
use the NLTK Downloader to obtain the resource:

    >>>nltk.download()

Searched in:
  - '/home/ec2-user/nltk_data'
  - '/usr/share/nltk_data'
  - '/usr/local/share/nltk_data'
  - '/usr/lib/nltk_data'
  - '/usr/local/lib/nltk_data'
  - u''

I am trying to run this program on a Unix machine.

Following the error message, I logged into the Python shell on the Unix machine and ran the commands below:

import nltk
nltk.download()

Then I used the d (download) and l (list) options and downloaded everything available, but the problem still persists.

I did my best to find a solution on the internet, but every answer I found suggested the same steps described above.


To add to alvas' answer, you can download just the punkt corpus:

nltk.download('punkt')

Downloading all sounds like overkill to me, unless that is what you want.


If you only want to download the punkt model:

import nltk
nltk.download('punkt')

If you're unsure which data/model you need, you can install the popular datasets, models and taggers from NLTK:

import nltk
nltk.download('popular')

With the above command, there is no need to use the GUI to download the datasets.


I got the solution:

import nltk
nltk.download()

once the NLTK Downloader starts

d) Download l) List u) Update c) Config h) Help q) Quit

Downloader> d

Download which package (l=list; x=cancel)? Identifier> punkt


From the shell you can execute:

sudo python -m nltk.downloader punkt 

If you want to install the popular NLTK corpora/models:

sudo python -m nltk.downloader popular

If you want to install all NLTK corpora/models:

sudo python -m nltk.downloader all

To list the resources you have downloaded:

python -c 'import os, nltk; print(os.listdir(nltk.data.find("corpora")))'
python -c 'import os, nltk; print(os.listdir(nltk.data.find("tokenizers")))'

The same thing happened to me recently; you just need to download the "punkt" package and it should work.

When you execute "list" (l) after having "downloaded all the available things", is everything marked with an asterisk, like the following line?

[*] punkt............... Punkt Tokenizer Models

If you see this line with the star, it means you have it, and nltk should be able to load it.
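That same check can be scripted; a minimal sketch that asks NLTK directly whether the resource is on its search path:

```python
import nltk

# nltk.data.find raises LookupError when the resource is absent.
try:
    nltk.data.find('tokenizers/punkt')
    print("punkt is installed")
except LookupError:
    print("punkt is missing -- run nltk.download('punkt')")
```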


import nltk
nltk.download('punkt')

Open the Python prompt and run the above statements.

The sent_tokenize function uses an instance of PunktSentenceTokenizer from the nltk.tokenize.punkt module. This instance has already been trained and works well for many European languages. So it knows what punctuation and characters mark the end of a sentence and the beginning of a new sentence.


Go to the python console by typing

$ python

in your terminal. Then, type the following commands in your python shell to install the respective packages:

>>> import nltk
>>> nltk.download('punkt')
>>> nltk.download('averaged_perceptron_tagger')

This solved the issue for me.


My issue was that I called nltk.download('all') as the root user, but the process that eventually used nltk ran as another user who didn't have access to /root/nltk_data, where the content was downloaded.

So I simply recursively copied everything from the download location to one of the paths where NLTK was looking to find it like this:

cp -R /root/nltk_data/ /home/ubuntu/nltk_data

  1. Execute the following code:

    import nltk
    nltk.download()

  2. After this, the NLTK downloader will pop up.
  3. Select the All Packages tab.
  4. Download punkt.

A simple nltk.download() will not solve this issue. I tried the below and it worked for me:

In the nltk_data folder, create a tokenizers folder and copy your punkt folder into that tokenizers folder.

This works because NLTK expects the folder structure nltk_data/tokenizers/punkt.


You need to rearrange your folders: move your tokenizers folder into the nltk_data folder. This doesn't work if your nltk_data folder contains a corpora folder that in turn contains the tokenizers folder.


For me, none of the above worked, so I downloaded all the files by hand from the web site http://www.nltk.org/nltk_data/ and placed them, also by hand, in a "tokenizers" folder inside the "nltk_data" folder. Not a pretty solution, but still a solution.


I faced the same issue. After downloading everything, the 'punkt' error was still there. I looked for the package on my Windows machine at C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers and could see 'punkt.zip' present there. I realized that the zip had somehow not been extracted into C:\Users\vaibhav\AppData\Roaming\nltk_data\tokenizers\punkt. Once I extracted the zip, it worked like a charm.
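That manual-extraction fix can be scripted with nothing but the standard library. A minimal sketch (ensure_punkt_extracted is a hypothetical helper name, and the path in the usage comment is just an example):

```python
import os
import zipfile

def ensure_punkt_extracted(tokenizers_dir):
    """Extract tokenizers_dir/punkt.zip if the punkt folder is missing."""
    zip_path = os.path.join(tokenizers_dir, "punkt.zip")
    punkt_dir = os.path.join(tokenizers_dir, "punkt")
    if os.path.exists(zip_path) and not os.path.isdir(punkt_dir):
        with zipfile.ZipFile(zip_path) as zf:
            zf.extractall(tokenizers_dir)  # the archive contains a punkt/ folder
    return os.path.isdir(punkt_dir)

# Example: point it at your nltk_data/tokenizers directory.
# ensure_punkt_extracted(os.path.expanduser("~/nltk_data/tokenizers"))
```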

Reference URL: https://stackoverflow.com/questions/26570944/resource-utokenizers-punkt-english-pickle-not-found
