데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

programing

데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

nasanasas 2020. 8. 13. 23:27

데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

레코드를 실제로 또는 물리적으로 삭제하는 것과 반대로 레코드의 논리적 / 소프트 삭제 (즉, 레코드가 삭제되었음을 나타내는 플래그 설정)를 수행하면 어떤 이점이 있습니까?

이것이 일반적인 관행입니까?

안전한가요?

장점은 기록을 유지하고 (감사에 적합), 삭제하려는 행을 참조하는 데이터베이스의 다양한 다른 테이블을 통해 계단식으로 삭제하는 것에 대해 걱정할 필요가 없다는 것입니다. 단점은 플래그를 고려하기 위해보고 / 표시 방법을 코딩해야한다는 것입니다.

그것이 일반적인 관행 인 한-나는 예라고 말하고 싶지만, 사용 여부는 비즈니스 요구에 달려 있습니다.

편집 : 또 다른 단점에 대한 생각-테이블에 고유 인덱스가있는 경우 삭제 된 레코드는 여전히 "하나"레코드를 차지하므로 그 가능성을 중심으로 코딩해야합니다 (예 : 고유 인덱스가있는 사용자 테이블). 사용자 이름; 삭제 된 레코드는 새 레코드에 대한 삭제 된 사용자 사용자 이름을 계속 차단합니다.이 문제를 해결하려면 삭제 된 사용자 이름 열에 GUID를 추가 할 수 있지만 권장하지 않는 매우 엉뚱한 해결 방법입니다. 아마도 그런 상황에서는 그렇게 될 것입니다. 사용자 이름이 사용되면 절대로 바꿀 수 없다는 규칙을 갖는 것이 좋습니다.)

논리적 삭제가 일반적인 관행입니까? 예, 여러 곳에서 이것을 보았습니다. 안전한가요? 그것은 정말로 당신이 그것을 삭제하기 전의 데이터보다 덜 안전합니까?

제가 기술 책임자 였을 때 저는 우리 팀이 모든 데이터를 보관할 것을 요구했습니다. 당시에는 요구 사항이 무엇인지 몰랐지만 그 모든 데이터를 사용하여 다양한 BI 애플리케이션을 구축 할 것이라는 것을 당시에는 알고있었습니다. 있다. 감사, 문제 해결 및보고의 관점에서는 좋았지 만 (B2B 거래를위한 전자 상거래 / 도구 사이트였으며 누군가 도구를 사용했다면 나중에 계정이 꺼져도 기록하고 싶었습니다), 몇 가지 단점이있었습니다.

단점은 다음과 같습니다 (이미 언급 된 다른 항목 제외).

모든 데이터를 보관하는 성능 시사점, 다양한 아카이빙 전략을 개발합니다. 예를 들어 애플리케이션의 한 영역은 일주일에 약 1Gb의 데이터를 생성하는 데 가까워졌습니다.
데이터 유지 비용은 시간이 지남에 따라 증가하지만 디스크 공간은 저렴하지만 온라인과 오프라인 모두에서 테라 바이트 단위의 데이터를 유지하고 관리하기위한 인프라의 양은 많습니다. 중복성을 위해 많은 디스크가 필요하고 백업이 신속하게 이동하는지 확인하는 데 사람들의 시간이 걸립니다.

논리적, 물리적 삭제 또는 보관을 사용하기로 결정할 때 다음과 같은 질문을합니다.

테이블에 다시 삽입해야 할 수있는 데이터입니다. 예를 들어 사용자 계정을 활성화하거나 비활성화 할 수 있으므로 사용자 계정이이 범주에 해당합니다. 이 경우 논리적 삭제가 가장 적합합니다.
데이터 저장에 내재적 가치가 있습니까? 그렇다면 얼마나 많은 데이터가 생성 될 것입니다. 이것에 따라 논리적 삭제를 수행하거나 보관 전략을 구현합니다. 논리적으로 삭제 된 레코드는 항상 보관할 수 있습니다.

조금 늦을 수도 있지만 모든 사람에게 논리적 / 소프트 삭제에 대한 Pinal Dave의 블로그 게시물 을 확인하는 것이 좋습니다 .

나는 이런 종류의 디자인 [소프트 삭제]이 전혀 마음에 들지 않습니다. 저는 필요한 데이터 만 단일 테이블에 있어야하고 쓸모없는 데이터는 보관 된 테이블로 이동해야하는 아키텍처를 굳게 믿습니다. isDeleted 열을 따르는 대신 두 개의 다른 테이블을 사용하는 것이 좋습니다. 하나는 주문이 있고 다른 하나는 삭제 된 주문입니다. 이 경우 두 테이블을 모두 유지해야하지만 실제로 유지 관리가 매우 쉽습니다. isDeleted 열에 UPDATE 문을 쓸 때 INSERT INTO를 다른 테이블에 쓰고 원래 테이블에서 DELETE하십시오. 상황이 롤백 인 경우 다른 INSERT INTO 및 DELETE를 역순으로 작성하십시오. 실패한 트랜잭션이 걱정된다면이 코드를 TRANSACTION으로 감싼다.

위에서 설명한 상황에서 작은 테이블 대 큰 테이블의 장점은 무엇입니까?

더 작은 테이블은 유지하기 쉽습니다.

인덱스 다시 작성 작업이 훨씬 빠릅니다.

아카이브 데이터를 다른 파일 그룹으로 이동하면 기본 파일 그룹의로드가 감소합니다 (모든 파일 그룹이 다른 시스템에 있다는 점을 고려). 이렇게하면 백업 속도도 빨라집니다.

더 작은 크기로 인해 통계가 자주 업데이트되며 이는 리소스 집약적입니다.

인덱스의 크기가 작아집니다.

테이블 크기가 작을수록 테이블 성능이 향상됩니다.

저는 NoSQL 개발자이고, 마지막 직장에서 누군가에게 항상 중요한 데이터로 작업했으며, 생성 된 당일에 실수로 삭제 된 경우 마지막 백업에서 찾을 수 없었습니다. 어제! 그런 상황에서 소프트 삭제는 항상 그날을 저장했습니다.

타임 스탬프를 사용하여 소프트 삭제를 수행하고 문서가 삭제 된 날짜를 등록했습니다.

IsDeleted = 20150310  //yyyyMMdd

매주 일요일에는 프로세스가 데이터베이스를 탐색하고 IsDeleted현장을 확인했습니다 . 현재 날짜와 타임 스탬프의 차이가 N 일보다 크면 문서가 영구 삭제 된 것입니다. 일부 백업에서 문서를 계속 사용할 수 있다는 점을 고려하면 안전하게 수행 할 수 있습니다.

편집 : 이 NoSQL 사용 사례는 데이터베이스에서 생성되는 대용량 문서에 대한 것입니다. 매일 수십 또는 수백 개가 생성되지만 수천 또는 수백만 개는 아닙니다. 일반적으로 워크 플로 프로세스의 상태, 데이터 및 첨부 파일이있는 문서였습니다. 이것이 사용자가 중요한 문서를 삭제할 가능성이있는 이유입니다. 이 사용자는 관리자 권한이있는 사람이거나 문서의 소유자 일 수 있습니다.

TL; DR 내 사용 사례는 빅 데이터가 아닙니다. 이 경우 다른 접근 방식이 필요합니다.

내가 사용한 한 가지 패턴은 미러 테이블을 만들고 기본 테이블에 트리거를 연결하는 것이므로 모든 삭제 (및 원하는 경우 업데이트)가 미러 테이블에 기록됩니다.

이를 통해 삭제 / 변경된 레코드를 "재구성"할 수 있으며 기본 테이블에서 하드 삭제하고 "정리"상태로 유지할 수 있습니다. 또한 "실행 취소"기능을 생성 할 수 있으며 날짜, 시간을 기록 할 수도 있습니다. , 미러 테이블에서 작업을 수행 한 사용자 (마녀 사냥 상황에서 매우 유용함).

다른 장점은 미러 테이블의 레코드를 포함하는 문제를 고의로 수행하지 않는 한 기본에서 쿼리 할 때 실수로 삭제 된 레코드를 포함 할 가능성이 없다는 것입니다 (라이브 및 삭제 된 레코드를 표시 할 수 있음).

또 다른 장점은 미러 테이블을 독립적으로 제거 할 수 있다는 것입니다. 실제 외래 키 참조가 없어야하기 때문에 소프트 삭제를 사용하지만 여전히 다른 테이블에 대한 참조 연결이있는 기본 테이블에서 제거하는 것과 비교할 때 비교적 간단한 작업입니다.

다른 장점은 무엇입니까? -프로젝트를 진행하는 코더가 많고, 기술과 세부 수준에 대한주의가 혼합 된 데이터베이스에서 읽기를 수행하는 경우, 그중 한 명이 삭제 된 것을 포함하지 않기를 바라며 밤을 지날 필요가 없습니다. 기록 (웃음, 삭제 된 기록을 포함하지 않음 = True). 고객이 사용 가능한 현금 포지션을 과장해서 말한 다음 (즉, 거래 시스템에서와 같이) 주식을 사러가는 것과 같은 결과가 발생합니다. 초기 "오버 헤드"가 약간 더 많더라도 강력한 솔루션의 가치를 매우 빠르게 찾을 수 있습니다.

예외 :
-가이드로 사용자, 카테고리 등과 같은 "참조"데이터에 대해서는 소프트 삭제를 사용하고 "사실"유형 데이터 (예 : 트랜잭션 내역)에 대해서는 미러 테이블에 하드 삭제를 사용하십시오.

저는 일반적으로 논리적 삭제를 사용합니다. '삭제 된'데이터를 보관 된 테이블 (필요한 경우 검색 할 수 있음)에 간헐적으로 보관하여 애플리케이션 성능에 영향을 미치지 않을 때 잘 작동합니다.

감사를받는 경우에도 데이터가 남아 있기 때문에 잘 작동합니다. 물리적으로 삭제하면 사라집니다 !

I'm a big fan of the logical delete, especially for a Line of Business application, or in the context of user accounts. My reasons are simple: often times I don't want a user to be able to use the system anymore (so the account get's marked as deleted), but if we deleted the user, we'd lose all their work and such.

Another common scenario is that the users might get re-created a while after having been delete. It's a much nicer experience for the user to have all their data present as it was before they were deleted, rather than have to re-create it.

I usually think of deleting users more as "suspending" them indefinitely. You never know when they'll legitimately need to be back.

I almost always soft delete and here's why:

you can restore deleted data if a customer asks you to do so. More happy customers with soft deletes. Restoring specific data from backups is complex
checking for isdeleted everywhere is not an issue, you have to check for userid anyway (if the database contains data from multiple users). You can enforce the check by code, by placing those two checks on a separate function (or use views)
graceful delete. Users or processes dealing with deleted content will continue to "see" it until they hit the next refresh. This is a very desirable feature if a process is processing some data which is suddenly deleted
synchronization: if you need to design a synchronization mechanism between a database and mobile apps, you'll find soft deletes much easier to implement

Re: "Is this secure?" - that depends on what you mean.

If you mean that by doing physical delete, you'll prevent anyone from ever finding the deleted data, then yes, that's more or less true; you're safer in physically deleting the sensitive data that needs to be erased, because that means it's permanently gone from the database. (However, realize that there may be other copies of the data in question, such as in a backup, or the transaction log, or a recorded version from in transit, e.g. a packet sniffer - just because you delete from your database doesn't guarantee it wasn't saved somewhere else.)

If you mean that by doing logical delete, your data is more secure because you'll never lose any data, that's also true. This is good for audit scenarios; I tend to design this way because it admits the basic fact that once data is generated, it'll never really go away (especially if it ever had the capability of being, say, cached by an internet search engine). Of course, a real audit scenario requires that not only are deletes logical, but that updates are also logged, along with the time of the change and the actor who made the change.

If you mean that the data won't fall into the hands of anyone who isn't supposed to see it, then that's totally up to your application and its security structure. In that respect, logical delete is no more or less secure than anything else in your database.

Logical deletions if are hard on referential integrity.

It is the right think to do when there is a temporal aspect of the table data (are valid FROM_DATE - TO_DATE).

Otherwise move the data to an Auditing Table and delete the record.

On the plus side:

It is the easier way to rollback (if at all possible).

It is easy to see what was the state at a specific point in time.

It's fairly standard in cases where you'd like to keep a history of something (e.g. user accounts as @Jon Dewees mentions). And it's certainly a great idea if there's a strong chance of users asking for un-deletions.

If you're concerned about the logic of filtering out the deleted records from your queries getting messy and just complicating your queries, you can just build views that do the filtering for you and use queries against that. It'll prevent leakage of these records in reporting solutions and such.

I strongly disagree with logical delete because you are exposed to many errors.

First of all queries, each query must take care the IsDeleted field and the possibility of error becomes higher with complex queries.

Second the performance: imagine a table with 100000 recs with only 3 active, now multiply this number for the tables of your database; another performance problem is a possible conflict with new records with old (deleted records).

The only advantage I see is the history of records, but there are other methods to achieve this result, for example you can create a logging table where you can save info: TableName,OldValues,NewValues,Date,User,[..] where *Values can be varchar and write the details in this form fieldname : value; [..] or store the info as xml.

All this can be achieved via code or Triggers but you are only ONE table with all your history. Another options is to see if the specified database engine are native support for tracking change, for example on SQL Server database there are SQL Track Data Change.

There are requirements beyond system design which need to be answered. What is the legal or statutory requirement in the record retention? Depending on what the rows are related to, there may be a legal requirement that the data be kept for a certain period of time after it is 'suspended'.

On the other hand, the requirement may be that once the record is 'deleted', it is truly and irrevocably deleted. Before you make a decision, talk to your stakeholders.

Mobile apps that depend on synchronisation might impose the use of logical rather than physical delete: a server must be able to indicate to the client that a record has been (marked as) deleted, and this might not be possible if records were physically deleted.

I used to do soft-delete, just to keep old records. I realized that users don't bother to view old records as often as I thought. If users want to view old records, they can just view from archive or audit table, right? So, what's the advantage of soft-delete? It only leads to more complex query statement, etc.

Following are the things i've implemented, before I decided to not-soft-delete anymore:

implement audit, to record all activities (add,edit,delete). Ensure that there's no foreign key linked to audit, and ensure this table is secured and nobody can delete except administrators.
identify which tables are considered "transactional table", which very likely that it will be kept for long time, and very likely user may want to view the past records or reports. For example; purchase transaction. This table should not just keep the id of master table (such as dept-id), but also keep the additional info such as the name as reference (such as dept-name), or any other necessary fields for reporting.
Implement "active/inactive" or "enable/disable" or "hide/show" record of master table. So, instead of deleting record, the user can disable/inactive the master record. It is much safer this way.

Just my two cents opinion.

They don't let the database perform as it should rendering such things as the cascade functionality useless.

For simple things such as inserts, in the case of re-inserting, then the code behind it doubles.

You can't just simply insert, instead you have to check for an existence and insert if it doesn't exist before or update the deletion flag if it does whilst also updating all other columns to the new values. This is seen as an update to the database transaction log and not a fresh insert causing inaccurate audit logs.

They cause performance issues because tables are getting glogged with redundant data. It plays havock with indexing especially with uniqueness.

I'm not a big fan of logical deletes.

To reply to Tohid's comment, we faced same problem where we wanted to persist history of records and also we were not sure whether we wanted is_deleted column or not.

I am talking about our python implementation and a similar use-case we hit.

We encountered https://github.com/kvesteri/sqlalchemy-continuum which is an easy way to get versioning table for your corresponding table. Minimum lines of code and captures history for add, delete and update.

This serves more than just is_deleted column. You can always backref version table to check what happened with this entry. Whether entry got deleted, updated or added.

This way we didn't need to have is_deleted column at all and our delete function was pretty trivial. This way we also don't need to remember to mark is_deleted=False in any of our api's.

Soft Delete is a programming practice that being followed in most of the application when data is more relevant. Consider a case of financial application where a delete by the mistake of the end user can be fatal. That is the case when soft delete becomes relevant. In soft delete the user is not actually deleting the data from the record instead its being flagged as IsDeleted to true (By normal convention).

In EF 6.x or EF 7 onward Softdelete is Added as an attribute but we have to create a custom attribute for the time being now.

I strongly recommend SoftDelete In a database design and its a good convention for the programming practice.

Most of time softdeleting is used because you don't want to expose some data but you have to keep it for historical reasons (A product could become discontinued, so you don't want any new transaction with it but you still need to work with the history of sale transaction). By the way, some are copying the product information value in the sale transaction data instead of making a reference to the product to handle this.

In fact it looks more like a rewording for a visible/hidden or active/inactive feature. Because that's the meaning of "delete" in business world. I'd like to say that Terminators may delete people but boss just fire them.

This practice is pretty common pattern and used by a lot of application for a lot of reasons. As It's not the only way to achieve this, so you will have thousand of people saying that's great or bullshit and both have pretty good arguments.

From a point of view of security, SoftDelete won't replace the job of Audit and it won't replace the job of backup too. If you are afraid of "the insert/delete between two backup case", you should read about Full or Bulk recovery Models. I admit that SoftDelete could make the recovery process more trivial.

Up to you to know your requirement.

To give an alternative, we have users using remote devices updating via MobiLink. If we delete records in the server database, those records never get marked deleted in the client databases.

So we do both. We work with our clients to determine how long they wish to be able to recover data. For example, generally customers and products are active until our client say they should be deleted, but history of sales is only retained for 13 months and then deletes automatically. The client may want to keep deleted customers and products for two months but retain history for six months.

So we run a script overnight that marks things logically deleted according to these parameters and then two/six months later, anything marked logically deleted today will be hard deleted.

We're less about data security than about having enormous databases on a client device with limited memory, such as a smartphone. A client who orders 200 products twice a week for four years will have over 81,000 lines of history, of which 75% the client doesn't care if he sees.

It all depends on the use case of the system and its data.

For example, if you are talking about a government regulated system (e.g. a system at a pharmaceutical company that is considered a part of the quality system and must follow FDA guidelines for electronic records), then you darned well better not do hard deletes! An auditor from the FDA can come in and ask for all records in the system relating to product number ABC-123, and all data better be available. If your business process owner says the system shouldn't allow anyone to use product number ABC-123 on new records going forward, use the soft-delete method instead to make it "inactive" within the system, while still preserving historical data.

However, maybe your system and its data has a use case such as "tracking the weather at the North Pole". Maybe you take temperature readings once every hour, and at the end of the day aggregate a daily average. Maybe the hourly data will no longer ever be used after aggregation, and you'd hard-delete the hourly readings after creating the aggregate. (This is a made-up, trivial example.)

The point is, it all depends on the use case of the system and its data, and not a decision to be made purely from a technological standpoint.

Well! As everyone said, it depends on the situation.

If you have an index on a column like UserName or EmailID - and you never expect the same UserName or EmailID to be used again; you can go with a soft delete.

That said, always check if your SELECT operation uses the primary key. If your SELECT statement uses a primary key, adding a flag with the WHERE clause wouldn't make much difference. Let's take an example (Pseudo):

Table Users (UserID [primary key], EmailID, IsDeleted)

SELECT * FROM Users where UserID = 123456 and IsDeleted = 0

This query won't make any difference in terms of performance since the UserID column has a primary key. Initially, it will scan the table based on PK and then execute the next condition.

Cases where soft deletes cannot work at all:

Sign-up in majorly all websites take EmailID as your unique identification. We know very well, once an EmailID is used on a website like facebook, G+, it cannot be used by anyone else.

There comes a day when the user wants to delete his/her profile from the website. Now, if you make a logical delete, that user won't be able to register ever again. Also, registering again using the same EmailID wouldn't mean to restore the entire history. Everyone knows, deletion means deletion. In such scenarios, we have to make a physical delete. But in order to maintain the entire history of the account, we should always archive such records in either archive tables or deleted tables.

Yes, in situations where we have lots of foreign tables, handling is quite cumbersome.

Also keep in mind that soft/logical deletes will increase your table size, so the index size.

I have already answered in another post. However, I think my answer more fit to the question here.

My practical solution for soft-delete is archiving by creating a new table with following columns: original_id, table_name, payload, (and an optional primary key `id).

Where original_id is the original id of deleted record, table_name is the table name of the deleted record ("user" in your case), payload is JSON-stringified string from all columns of the deleted record.

I also suggest making an index on the column original_id for latter data retrievement.

By this way of archiving data. You will have these advantages

Keep track of all data in history

Have only one place to archive records from any table, regardless of the deleted record's table structure

No worry of unique index in the original table

No worry of checking foreign index in the original table

No more WHERE clause in every query to check for deletion

The is already a discussion here explaining why soft-deletion is not a good idea in practice. Soft-delete introduces some potential troubles in future such as counting records, ...

참고URL : https://stackoverflow.com/questions/378331/physical-vs-logical-soft-delete-of-database-record

'programing' 카테고리의 다른 글

특정 디렉토리에서 어떻게 git diff를 수행합니까? (0)	2020.08.13
MongoDB에서 인덱스 방향이 중요한 이유는 무엇입니까? (0)	2020.08.13
C #에서 작은 코드 샘플을 벤치마킹하면이 구현을 개선 할 수 있습니까? (0)	2020.08.13
Kotlin : 함수를 매개 변수로 다른 함수에 전달하는 방법은 무엇입니까? (0)	2020.08.13
Tkinter에서 창 닫기 이벤트를 어떻게 처리합니까? (0)	2020.08.13

현재글데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

nasanasa

데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

'programing' 카테고리의 다른 글

'programing'의 다른글

티스토리툴바

데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

데이터베이스 레코드의 물리적 대 논리적 / 소프트 삭제?

'programing' 카테고리의 다른 글

'programing'의 다른글

관련글

티스토리툴바