Saturating subtract/add for unsigned bytes
Imagine I have two unsigned bytes b and x. I need to calculate bsub as b - x and badd as b + x. However, I don't want underflow/overflow to occur during these operations. For example (pseudo-code):
b = 3; x = 5;
bsub = b - x; // bsub must be 0, not 254
and
b = 250; x = 10;
badd = b + x; // badd must be 255, not 4
The obvious way to do this involves branching:
bsub = b - min(b, x);
badd = b + min(255 - b, x);
I wonder if there are better ways to do this, i.e. with some hacky bit manipulation?
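For reference, the min-based pseudocode above can be written as plain C. This is only a minimal sketch: the names min_u8, sat_sub_min, and sat_add_min are hypothetical and do not come from the question.

```c
#include <stdint.h>

/* Hypothetical helper: the smaller of two bytes. */
static uint8_t min_u8(uint8_t a, uint8_t b)
{
    return a < b ? a : b;
}

/* b - x clamped at 0: never subtract more than b itself. */
static uint8_t sat_sub_min(uint8_t b, uint8_t x)
{
    return (uint8_t)(b - min_u8(b, x));
}

/* b + x clamped at 255: never add more than the headroom 255 - b. */
static uint8_t sat_add_min(uint8_t b, uint8_t x)
{
    return (uint8_t)(b + min_u8((uint8_t)(255 - b), x));
}
```

Whether the ternary inside min_u8 becomes a branch or a conditional move is up to the compiler, which is what the question is really asking about.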
The article Branchfree Saturating Arithmetic provides strategies for this.
Its addition solution is as follows:
u32b sat_addu32b(u32b x, u32b y)
{
    u32b res = x + y;
    res |= -(res < x);
    return res;
}
Modified for uint8_t:
uint8_t sat_addu8b(uint8_t x, uint8_t y)
{
    uint8_t res = x + y;
    res |= -(res < x);
    return res;
}
Its subtraction solution is as follows:
u32b sat_subu32b(u32b x, u32b y)
{
    u32b res = x - y;
    res &= -(res <= x);
    return res;
}
Modified for uint8_t:
uint8_t sat_subu8b(uint8_t x, uint8_t y)
{
    uint8_t res = x - y;
    res &= -(res <= x);
    return res;
}
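To see why the mask trick works: the comparison yields 0 or 1, and negating 1 gives a value with every bit set. OR-ing with that value forces the result to 255, and AND-ing with 0 forces it to 0. The sketch below spells out the intermediate mask; the function names are hypothetical.

```c
#include <stdint.h>

static uint8_t sat_add_mask(uint8_t x, uint8_t y)
{
    uint8_t res = (uint8_t)(x + y);     /* wraps modulo 256 */
    uint8_t wrapped = res < x;          /* 1 if the sum wrapped, else 0 */
    uint8_t mask = (uint8_t)-wrapped;   /* 0xFF if wrapped, else 0x00 */
    return res | mask;                  /* forced to 255 on overflow */
}

static uint8_t sat_sub_mask(uint8_t x, uint8_t y)
{
    uint8_t res = (uint8_t)(x - y);     /* wraps modulo 256 */
    uint8_t ok = res <= x;              /* 1 if no wrap, else 0 */
    uint8_t mask = (uint8_t)-ok;        /* 0xFF keeps res, 0x00 clears it */
    return res & mask;                  /* forced to 0 on underflow */
}
```

The explicit casts only make the integer promotions visible; the arithmetic is the same as in the one-line versions.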
A simple method is to detect the overflow and reset the value accordingly:
bsub = b - x;
if (bsub > b)
{
    bsub = 0;
}
badd = b + x;
if (badd < b)
{
    badd = 255;
}
GCC can optimize the overflow check into a conditional assignment when compiling with -O2.
I measured how this compares with the other solutions. Over 1000000000+ operations on my PC, this solution and that of @ShafikYaghmour averaged 4.2 seconds, and that of @chux averaged 4.8 seconds. This solution is also more readable.
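Timings like these depend on the compiler, flags, and hardware. A rough harness along the following lines could be used to repeat the comparison. It is a sketch only: the function names, the input pattern, and the checksum scheme are assumptions, not the setup that was used for the numbers above.

```c
#include <stdint.h>
#include <time.h>

/* The compare-and-reset addition from this answer. */
static uint8_t sat_add_check(uint8_t b, uint8_t x)
{
    uint8_t sum = (uint8_t)(b + x);
    if (sum < b)
        sum = 255;
    return sum;
}

/* Runs `iterations` saturating adds and returns the elapsed CPU seconds.
 * The accumulated checksum is written out so the results are actually used. */
static double time_sat_add(unsigned long iterations, unsigned long *checksum)
{
    unsigned long acc = 0;
    clock_t start = clock();
    for (unsigned long i = 0; i < iterations; i++)
        acc += sat_add_check((uint8_t)i, (uint8_t)(i >> 8));
    clock_t stop = clock();
    *checksum = acc;
    return (double)(stop - start) / CLOCKS_PER_SEC;
}
```

Swapping sat_add_check for another variant and calling time_sat_add with the same iteration count gives a like-for-like comparison on one machine.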
For subtraction:
diff = (a - b)*(a >= b);
Addition:
sum = (a + b) | -(a > (255 - b));
Evolution
// sum = (a + b)*(a <= (255-b)); this fails
// sum = (a + b) | -(a <= (255 - b)); fails too
Thanks to @R_Kapp
Thanks to @NathanOliver
This exercise shows the value of simply coding.
sum = b + min(255 - b, a);
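A sketch of these expressions as functions, with the first discarded attempt kept to show why it fails: multiplying by the comparison can only produce the true sum or 0, never 255. The function names are hypothetical.

```c
#include <stdint.h>

static uint8_t sat_sub_mul(uint8_t a, uint8_t b)
{
    /* (a >= b) is 1 or 0, so the product is a - b or 0. */
    return (uint8_t)((a - b) * (a >= b));
}

static uint8_t sat_add_or(uint8_t a, uint8_t b)
{
    /* -(1) has every bit set, so the OR forces the low byte to 0xFF on overflow. */
    return (uint8_t)((a + b) | -(a > (255 - b)));
}

/* The discarded first attempt: on overflow it yields 0 instead of 255. */
static uint8_t sat_add_mul_broken(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b) * (a <= (255 - b)));
}
```

With a = 250 and b = 10, sat_add_or returns 255 while sat_add_mul_broken returns 0.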
If you are using a recent enough version of gcc or clang (maybe also some others) you could use built-ins to detect overflow.
if (__builtin_add_overflow(a, b, &c))
{
    c = UINT_MAX; // saturate; for an unsigned char c, UCHAR_MAX is the matching limit
}
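Applied to unsigned bytes, the built-ins could be used roughly as follows. This sketch assumes GCC 5 or later, or a recent clang, since __builtin_add_overflow and __builtin_sub_overflow are compiler extensions; the wrapper names are hypothetical. The overflow check is made against the type that the third argument points to, so a uint8_t destination reports overflow whenever the true result does not fit in a byte.

```c
#include <stdint.h>

static uint8_t sat_add_builtin(uint8_t a, uint8_t b)
{
    uint8_t c;
    if (__builtin_add_overflow(a, b, &c))   /* true if a + b > 255 */
        c = UINT8_MAX;
    return c;
}

static uint8_t sat_sub_builtin(uint8_t a, uint8_t b)
{
    uint8_t c;
    if (__builtin_sub_overflow(a, b, &c))   /* true if a - b < 0 */
        c = 0;
    return c;
}
```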
For addition:
unsigned temp = a+b; // temp>>8 will be 1 if overflow else 0
unsigned char c = temp | -(temp >> 8);
For subtraction:
unsigned temp = a-b; // on underflow the low byte of temp>>8 is 0xFF, else temp>>8 is 0
unsigned char c = temp & ~(temp >> 8);
No comparison operators or multiplies required.
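Wrapped as functions, the wider-temporary approach might look like this. It is a sketch with hypothetical names, and it assumes a and b are unsigned bytes.

```c
#include <stdint.h>

static uint8_t sat_add_wide(uint8_t a, uint8_t b)
{
    unsigned temp = (unsigned)a + b;         /* 0..510; bit 8 is the carry */
    /* temp >> 8 is 1 on overflow; negating it sets every bit, so the
     * low byte of the OR becomes 0xFF. Otherwise the OR changes nothing. */
    return (uint8_t)(temp | -(temp >> 8));
}

static uint8_t sat_sub_wide(uint8_t a, uint8_t b)
{
    unsigned temp = (unsigned)a - b;         /* wraps to a huge value if a < b */
    /* On underflow all bits above bit 7 are set, so ~(temp >> 8) has a
     * zero low byte and the AND clears the result. Otherwise temp >> 8
     * is 0, its complement is all ones, and temp passes through. */
    return (uint8_t)(temp & ~(temp >> 8));
}
```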
If you are willing to use assembly or intrinsics, I think I have an optimal solution.
For subtraction:
We can use the sbb instruction.
In MSVC we can use the intrinsic function _subborrow_u64 (also available in other bit sizes).
Here is how it is used:
// *c = a - (b + borrow)
// borrow_flag is set to 1 if (a < (b + borrow))
borrow_flag = _subborrow_u64(borrow_flag, a, b, c);
Here is how we could apply it to your situation
uint64_t sub_no_underflow(uint64_t a, uint64_t b){
    uint64_t result;
    unsigned char borrow_flag = _subborrow_u64(0, a, b, &result);
    return result * !borrow_flag;
}
For addition:
We can use the adcx instruction.
In MSVC we can use the intrinsic function _addcarry_u64 (also available in other bit sizes).
Here is how it is used:
// *c = a + b + carry
// carry_flag is set to 1 if there is a carry bit
carry_flag = _addcarry_u64(carry_flag, a, b, c);
Here is how we could apply it to your situation
uint64_t add_no_overflow(uint64_t a, uint64_t b){
    uint64_t result;
    unsigned char carry_flag = _addcarry_u64(0, a, b, &result);
    return !carry_flag * result - carry_flag;
}
I don't like this one as much as the subtraction one, but I think it is pretty nifty.
If the add overflows, carry_flag = 1. Not-ing carry_flag yields 0, so !carry_flag * result = 0 when there is overflow. And since 0 - 1 will set the unsigned integral value to its max, the function will return the result of the addition if there is no carry and return the max of the chosen integral value if there is carry.
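The arithmetic in the return statements can be checked without the intrinsics. The sketch below emulates the carry and borrow flags with a comparison. That is an assumption made for portability and is not the _addcarry_u64 or _subborrow_u64 intrinsic itself; the function names are hypothetical.

```c
#include <stdint.h>

static uint64_t add_no_overflow_emulated(uint64_t a, uint64_t b)
{
    uint64_t result = a + b;        /* wraps modulo 2^64 */
    uint64_t carry = result < a;    /* 1 if the add carried out, else 0 */
    /* carry == 0: 1 * result - 0 = result
     * carry == 1: 0 * result - 1 = UINT64_MAX (unsigned wraparound) */
    return !carry * result - carry;
}

static uint64_t sub_no_underflow_emulated(uint64_t a, uint64_t b)
{
    uint64_t result = a - b;        /* wraps modulo 2^64 */
    uint64_t borrow = a < b;        /* 1 if the subtraction borrowed, else 0 */
    return result * !borrow;        /* result, or 0 on underflow */
}
```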
What about this:
bsum = a + b;
bsum = (bsum < a || bsum < b) ? 255 : bsum;
bsub = a - b;
bsub = (bsub > a) ? 0 : bsub;
All can be done in unsigned byte arithmetic
// Addition without overflow
return (b > 255 - a) ? 255 : a + b;
// Subtraction without underflow
return (b > a) ? 0 : a - b;
If you want to do this with two bytes, use the simplest code possible.
If you want to do this with twenty billion bytes, check what vector instructions are available on your processor and whether they can be used. You might find that your processor can do 32 of these operations with a single instruction.
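As one concrete case, x86 processors with SSE2 have saturating unsigned byte add and subtract instructions, exposed in C as _mm_adds_epu8 and _mm_subs_epu8, which handle 16 bytes at a time. The sketch below uses them when the compiler defines __SSE2__ (an assumption that holds for GCC and clang on x86-64) and otherwise falls back to a scalar loop; the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* dst[i] = min(a[i] + b[i], 255) for n bytes. */
static void sat_add_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_adds_epu8(va, vb));
    }
#endif
    for (; i < n; i++) {                    /* scalar tail, or full fallback */
        unsigned sum = (unsigned)a[i] + b[i];
        dst[i] = (uint8_t)(sum > 255 ? 255 : sum);
    }
}

/* dst[i] = max(a[i] - b[i], 0) for n bytes. */
static void sat_sub_bytes(uint8_t *dst, const uint8_t *a, const uint8_t *b, size_t n)
{
    size_t i = 0;
#if defined(__SSE2__)
    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        _mm_storeu_si128((__m128i *)(dst + i), _mm_subs_epu8(va, vb));
    }
#endif
    for (; i < n; i++)                      /* scalar tail, or full fallback */
        dst[i] = (uint8_t)(a[i] > b[i] ? a[i] - b[i] : 0);
}
```

Wider vector extensions follow the same pattern with more bytes per instruction.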
You could also use the safe numerics library at Boost Library Incubator. It provides drop-in replacements for int, long, etc... which guarantee that you'll never get an undetected overflow, underflow, etc.
If you will call those methods a lot, the fastest way is probably not bit manipulation but a look-up table. Define an array of length 511 for each operation. Example for minus (subtraction):
static unsigned char maxTable[511];
memset(maxTable, 0, 255); // If smaller, emulates cutoff at zero
maxTable[255]=0; // If equal - return zero
for (int i=0; i<256; i++)
maxTable[255+i] = i; // If greater - return the difference
The array is static and initialized only once. Now your subtraction can be defined as inline method or using pre-compiler:
#define MINUS(A,B) maxTable[(A)-(B)+255]
How does it work? You want to pre-calculate all possible subtractions for unsigned chars. The results range from -255 to +255, a total of 511 different results. We define an array of all possible results, but because the array itself cannot be indexed with negative values in C, we add 255 (in [A-B+255]). You can remove this addition by defining a pointer to the center of the array:
const unsigned char *result = maxTable+255;
#define MINUS(A,B) result[(A)-(B)]
Use it like:
bsub = MINUS(13,15); // i.e. 13-15 with zero cutoff as requested
Note that the execution is very fast: only one subtraction and one pointer dereference to get the result, with no branching. The static arrays are very short, so they will fit entirely in the CPU's cache, which further speeds up the calculation.
The same works for addition, but with a slightly different table (the first 256 elements are the indices, and the last 255 elements are all equal to 255 to emulate the cutoff beyond 255).
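Putting both tables together, a complete version might look like the sketch below. The table and function names are hypothetical, and small functions are used in place of the macros so the arguments are evaluated exactly once. init_sat_tables must be called once before any lookup.

```c
#include <stdint.h>

static uint8_t sub_table[511];   /* index = a - b + 255, range 0..510 */
static uint8_t add_table[511];   /* index = a + b,       range 0..510 */

/* Fill both tables; call once at startup. */
static void init_sat_tables(void)
{
    for (int i = 0; i < 511; i++) {
        int diff = i - 255;                          /* a - b, from -255 to 255 */
        sub_table[i] = (uint8_t)(diff < 0 ? 0 : diff);
        add_table[i] = (uint8_t)(i > 255 ? 255 : i); /* cutoff beyond 255 */
    }
}

/* One subtraction plus one table load, no comparison at lookup time. */
static uint8_t sat_sub_lut(uint8_t a, uint8_t b)
{
    return sub_table[a - b + 255];
}

/* One addition plus one table load. */
static uint8_t sat_add_lut(uint8_t a, uint8_t b)
{
    return add_table[a + b];
}
```

The comparisons only run while the tables are being filled, so the lookups themselves stay branch-free.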
If you insist on bit operations, the answers that use (a>b) are not reliably branch-free: the comparison might still be implemented as a branch. Use the sign-bit technique:
// (num1>num2) ? 1 : 0
#define is_int_biggerNotEqual( num1,num2) ((((__int32)((num2)-(num1)))&0x80000000)>>31)
Now you can use it for calculation of subtraction and addition.
If you want to emulate the functions max(), min() without branching use:
inline __int32 MIN_INT(__int32 x, __int32 y){ __int32 d=x-y; return y+(d&(d>>31)); }
inline __int32 MAX_INT(__int32 x, __int32 y){ __int32 d=x-y; return x-(d&(d>>31)); }
My examples above use 32-bit integers. You can change them to 64-bit, though I believe 32-bit calculations run a bit faster. Up to you.
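Applied to the original problem, the branchless min and max can clamp a wider intermediate result. This sketch uses int32_t from stdint.h in place of the MSVC-specific __int32, and it assumes that >> on a negative signed value is an arithmetic shift. That behaviour is implementation-defined in C, although mainstream compilers provide it. The function names are hypothetical.

```c
#include <stdint.h>

/* Assumes d >> 31 is an arithmetic shift: -1 if d is negative, else 0. */
static int32_t min_i32(int32_t x, int32_t y)
{
    int32_t d = x - y;
    return y + (d & (d >> 31));   /* y + d = x when x < y, else y */
}

static int32_t max_i32(int32_t x, int32_t y)
{
    int32_t d = x - y;
    return x - (d & (d >> 31));   /* x - d = y when x < y, else x */
}

/* Clamp the exact 32-bit sum to at most 255. */
static uint8_t sat_add_signbit(uint8_t a, uint8_t b)
{
    return (uint8_t)min_i32((int32_t)a + b, 255);
}

/* Clamp the exact 32-bit difference to at least 0. */
static uint8_t sat_sub_signbit(uint8_t a, uint8_t b)
{
    return (uint8_t)max_i32((int32_t)a - b, 0);
}
```

Because the byte operands are widened first, x - y inside the helpers cannot overflow for these inputs.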
Reference URL: https://stackoverflow.com/questions/33481295/saturating-subtract-add-for-unsigned-bytes