Reading a file from an AWS S3 bucket using Node fs

nasanasas 2020. 11. 23. 08:14

I am trying to read a file that is in an AWS S3 bucket using the following:

fs.readFile(file, function (err, contents) {
  var myLines = contents.Body.toString().split('\n')
})

I have been able to download and upload a file using the Node aws-sdk, but I am at a loss as to how to simply read the file and parse its contents.

Here is an example of how I am reading the file from S3:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myKey.csv'}
var s3file = s3.getObject(params)

You have a couple of options. You can pass a callback as the second argument, which will be invoked with any error and the object. This example is taken straight from the AWS documentation:

s3.getObject(params, function(err, data) {
  if (err) console.log(err, err.stack); // an error occurred
  else     console.log(data);           // successful response
});

Alternatively, you can convert the output to a stream. The AWS documentation has an example of this as well:

var s3 = new AWS.S3({apiVersion: '2006-03-01'});
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');
s3.getObject(params).createReadStream().pipe(file);

This will do it:

new AWS.S3().getObject({ Bucket: this.awsBucketName, Key: keyName }, function(err, data)
{
    if (!err)
        console.log(data.Body.toString());
});
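
To get from the response body to the array of lines the question asks for, the body can be decoded and split. The following is a minimal sketch, assuming `data.Body` arrives as a Buffer holding UTF-8 text; `bodyToLines` is a hypothetical helper name and not part of the SDK. Note that this approach holds the whole object in memory.

```javascript
// Hypothetical helper: decode an S3 object body and split it into lines.
// Assumes `body` is a Buffer or string holding UTF-8 text.
function bodyToLines(body) {
  const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
  // Split on both Unix (\n) and Windows (\r\n) line endings.
  const lines = text.split(/\r?\n/);
  // Drop the empty entry produced by a trailing newline.
  if (lines.length > 0 && lines[lines.length - 1] === '') {
    lines.pop();
  }
  return lines;
}

// Intended usage inside the getObject callback (not executed here):
//   s3.getObject(params, function (err, data) {
//     if (err) return console.error(err);
//     const myLines = bodyToLines(data.Body);
//   });
```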

It sounds like you want to process an S3 text file line by line. Here is a Node version that uses the standard readline module and AWS's createReadStream():

const readline = require('readline');

const rl = readline.createInterface({
    input: s3.getObject(params).createReadStream()
});

rl.on('line', function(line) {
    console.log(line);
})
.on('close', function() {
});
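
If the lines are needed as an array once the stream has ended, the same readline interface can be consumed with `for await`. This is a sketch under assumptions: `collectLines` is a hypothetical helper, and it expects any Node readable stream, such as the one returned by `createReadStream()` above.

```javascript
const readline = require('readline');
const { Readable } = require('stream');

// Hypothetical helper: read a readable stream line by line and
// resolve with all lines once the stream has ended.
async function collectLines(input) {
  const rl = readline.createInterface({
    input: input,
    crlfDelay: Infinity // treat \r\n as a single line break
  });
  const lines = [];
  for await (const line of rl) {
    lines.push(line);
  }
  return lines;
}

// Intended usage (not executed here):
//   collectLines(s3.getObject(params).createReadStream())
//     .then(function (lines) { console.log(lines.length); });
// For a local trial without AWS, Readable.from([Buffer.from('a\nb')])
// can stand in for the S3 stream.
```

Collecting every line gives up the memory advantage of streaming, so for very large files it may be preferable to process each line inside the loop instead.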

I have not figured out why yet, but the createReadStream/pipe approach did not work for me. I was trying to download a large CSV file (300MB+) and got duplicated lines. It seemed to be a random issue: the final file size varied with each download attempt.

I ended up using a different approach, based on the AWS JS SDK examples:

var s3 = new AWS.S3();
var params = {Bucket: 'myBucket', Key: 'myImageFile.jpg'};
var file = require('fs').createWriteStream('/path/to/file.jpg');

s3.getObject(params).
    on('httpData', function(chunk) { file.write(chunk); }).
    on('httpDone', function() { file.end(); }).
    send();

This way, it worked like a charm.

Here is the example I used to retrieve and parse JSON data from S3.

    var params = {Bucket: BUCKET_NAME, Key: KEY_NAME};
    new AWS.S3().getObject(params, function(err, json_data)
    {
      if (!err) {
        // Buffer.from replaces the deprecated new Buffer() constructor
        var json = JSON.parse(Buffer.from(json_data.Body).toString("utf8"));

        // PROCESS JSON DATA
        ......
      }
    });
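
`JSON.parse` throws on malformed input, and an exception thrown inside the callback is easy to miss. A minimal sketch of a guarded variant follows; `parseJsonBody` is a hypothetical helper name, and it assumes the body is a Buffer or string containing UTF-8 JSON.

```javascript
// Hypothetical helper: decode an S3 object body as UTF-8 and parse it as JSON.
// Returns { error, value } instead of throwing, so a malformed object
// does not raise an exception inside the getObject callback.
function parseJsonBody(body) {
  try {
    const text = Buffer.isBuffer(body) ? body.toString('utf8') : String(body);
    return { error: null, value: JSON.parse(text) };
  } catch (error) {
    return { error: error, value: null };
  }
}

// Intended usage inside the getObject callback (not executed here):
//   var result = parseJsonBody(json_data.Body);
//   if (result.error) { /* handle malformed JSON */ }
```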

I had exactly the same issue when downloading very large files from S3.

The example solution from the AWS docs just does not work:

var file = fs.createWriteStream(options.filePath);
        file.on('close', function(){
            if(self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath );
            return callback(null,done);
        });
        s3.getObject({ Key:  documentKey }).createReadStream().on('error', function(error) {
            if(self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, error);
            return callback(error);
        }).pipe(file);

While this solution does work:

    var file = fs.createWriteStream(options.filePath);
    s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
    .on('error', function(error) {
        if(self.logger) self.logger.error("S3Dataset download error key:%s error:%@", options.fileName, error);
        return callback(error);
    })
    .on('httpData', function(chunk) { file.write(chunk); })
    .on('httpDone', function() { 
        file.end(); 
        if(self.logger) self.logger.info("S3Dataset file download saved to %s", options.filePath );
        return callback(null,done);
    })
    .send();

For some reason, the createReadStream attempt just does not fire the end, close, or error callback. The original answer linked to further discussion of this.
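
For comparison, Node's built-in `stream.pipeline` reports completion and errors from either end of a pipe through a single callback. The sketch below shows that wiring with a hypothetical helper, `saveStreamToFile`; whether it avoids the missing-event behaviour described above has not been verified here.

```javascript
const fs = require('fs');
const { pipeline } = require('stream');

// Hypothetical helper: copy any readable stream to a file and resolve only
// after the file has been fully written; rejects on an error from either side.
function saveStreamToFile(readable, destPath) {
  return new Promise(function (resolve, reject) {
    pipeline(readable, fs.createWriteStream(destPath), function (error) {
      if (error) {
        reject(error);
      } else {
        resolve(destPath);
      }
    });
  });
}

// Intended usage (not executed here):
//   saveStreamToFile(s3.getObject(params).createReadStream(), '/path/to/file.jpg')
//     .then(function (savedPath) { console.log('saved', savedPath); })
//     .catch(function (error) { console.error('download failed', error); });
```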

I also use that solution to download gzip archives and decompress them, since the first one (the AWS example) does not work in this case either:

        var gunzip = zlib.createGunzip();
        var file = fs.createWriteStream( options.filePath );

        s3.getObject({ Bucket: this._options.s3.Bucket, Key: documentKey })
        .on('error', function (error) {
            if(self.logger) self.logger.error("%@",error);
            return callback(error);
        })
        .on('httpData', function (chunk) {
            file.write(chunk);
        })
        .on('httpDone', function () {

            file.end();

            if(self.logger) self.logger.info("downloadArchive downloaded %s", options.filePath);

            fs.createReadStream( options.filePath )
            .on('error', (error) => {
                return callback(error);
            })
            .on('end', () => {
                if(self.logger) self.logger.info("downloadArchive unarchived %s", options.fileDest);
                return callback(null, options.fileDest);
            })
            .pipe(gunzip)
            .pipe(fs.createWriteStream(options.fileDest))
        })
        .send();
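
One detail in the snippet above: the final callback fires on the read stream's `end` event, which can happen before the decompressed output has been fully written to disk. A sketch of the decompression step using `stream.pipeline`, whose callback runs only after the last stream has finished, is shown below. `gunzipFile` is a hypothetical helper name.

```javascript
const fs = require('fs');
const zlib = require('zlib');
const { pipeline } = require('stream');

// Hypothetical helper: decompress the gzip file at srcPath into destPath.
// The callback runs after the output file is fully written, or with the
// first error raised by any of the three streams.
function gunzipFile(srcPath, destPath, callback) {
  pipeline(
    fs.createReadStream(srcPath),
    zlib.createGunzip(),
    fs.createWriteStream(destPath),
    function (error) {
      if (error) return callback(error);
      callback(null, destPath);
    }
  );
}

// Intended usage in the 'httpDone' handler (not executed here):
//   gunzipFile(options.filePath, options.fileDest, callback);
```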

If you want to save memory and obtain each row as a JSON object, you can use fast-csv to create a read stream and read each row as an object, as follows:

const csv = require('fast-csv');
const AWS = require('aws-sdk');

const credentials = new AWS.Credentials("ACCESSKEY", "SECRETEKEY", "SESSIONTOKEN");
AWS.config.update({
    credentials: credentials, // credentials required for local execution
    region: 'your_region'
});
const dynamoS3Bucket = new AWS.S3();
const stream = dynamoS3Bucket.getObject({ Bucket: 'your_bucket', Key: 'example.csv' }).createReadStream();

var parser = csv.fromStream(stream, { headers: true }).on("data", function (data) {
    parser.pause();  //can pause reading using this at a particular row
    parser.resume(); // to continue reading
    console.log(data);
}).on("end", function () {
    console.log('process finished');
});
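
To illustrate what the `headers: true` option does conceptually, the sketch below maps a header line onto each following row using only plain JavaScript. It is a naive illustration and not how fast-csv is implemented: `rowsToObjects` is a hypothetical helper, and it assumes simple comma-separated values with no quoted fields.

```javascript
// Naive illustration only: splits on commas and does not handle quoted
// fields, escaped commas, or embedded newlines the way a CSV parser does.
function rowsToObjects(lines) {
  if (lines.length === 0) return [];
  // The first line supplies the property names.
  const headers = lines[0].split(',');
  return lines.slice(1).map(function (line) {
    const cells = line.split(',');
    const row = {};
    headers.forEach(function (name, index) {
      row[name] = cells[index];
    });
    return row;
  });
}

console.log(rowsToObjects(['id,name', '1,alpha', '2,beta']));
```

For real CSV data, a proper parser such as the fast-csv stream shown above remains the safer choice.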

I prefer Buffer.from(data.Body).toString('utf8') because it accepts an encoding parameter. With other AWS services (e.g. Kinesis Streams), you may want to replace the 'utf8' encoding with 'base64'.

new AWS.S3().getObject(
  { Bucket: this.awsBucketName, Key: keyName }, 
  function(err, data) {
    if (!err) {
      const body = Buffer.from(data.Body).toString('utf8');
      console.log(body);
    }
  }
);
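
As a small illustration of the encoding parameter, the same bytes can be rendered as UTF-8 text or as base64. The literal `'hello'` below is only a stand-in for `data.Body`.

```javascript
// The same bytes rendered with two different encodings.
const body = Buffer.from('hello', 'utf8'); // stand-in for data.Body
const asText = body.toString('utf8');
const asBase64 = body.toString('base64');
console.log(asText);
console.log(asBase64);

// Base64 text can be decoded back into the original bytes.
const roundTrip = Buffer.from(asBase64, 'base64').toString('utf8');
console.log(roundTrip === asText);
```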

Reference URL: https://stackoverflow.com/questions/27299139/read-file-from-aws-s3-bucket-using-node-fs
