programing

nullable 값이있는 구조체의 HashSet이 엄청나게 느린 이유는 무엇입니까?

nasanasas 2020. 11. 5. 08:10
반응형

nullable 값이있는 구조체의 HashSet이 엄청나게 느린 이유는 무엇입니까?


성능 저하를 조사하고 HashSets를 늦추기 위해 추적했습니다.
기본 키로 사용되는 nullable 값이있는 구조체가 있습니다. 예를 들면 :

public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value)
    {
        _value = value;
    }
}

를 만드는 HashSet<NullableLongWrapper>것이 유난히 느립니다.

다음은 BenchmarkDotNet을 사용한 예입니다 . ( Install-Package BenchmarkDotNet)

using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Jobs;
using BenchmarkDotNet.Running;

public class Program
{
    static void Main()
    {
        BenchmarkRunner.Run<HashSets>();
    }
}

public class Config : ManualConfig
{
    public Config()
    {
        Add(Job.Dry.WithWarmupCount(1).WithLaunchCount(3).WithTargetCount(20));
    }
}

public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value)
    {
        _value = value;
    }

    public long? Value => _value;
}

public struct LongWrapper
{
    private readonly long _value;

    public LongWrapper(long value)
    {
        _value = value;
    }

    public long Value => _value;
}

[Config(typeof (Config))]
public class HashSets
{
    private const int ListSize = 1000;

    private readonly List<long?> _nullables;
    private readonly List<long> _longs;
    private readonly List<NullableLongWrapper> _nullableWrappers;
    private readonly List<LongWrapper> _wrappers;

    public HashSets()
    {
        _nullables = Enumerable.Range(1, ListSize).Select(i => (long?) i).ToList();
        _longs = Enumerable.Range(1, ListSize).Select(i => (long) i).ToList();
        _nullableWrappers = Enumerable.Range(1, ListSize).Select(i => new NullableLongWrapper(i)).ToList();
        _wrappers = Enumerable.Range(1, ListSize).Select(i => new LongWrapper(i)).ToList();
    }

    [Benchmark]
    public void Longs() => new HashSet<long>(_longs);

    [Benchmark]
    public void NullableLongs() => new HashSet<long?>(_nullables);

    [Benchmark(Baseline = true)]
    public void Wrappers() => new HashSet<LongWrapper>(_wrappers);

    [Benchmark]
    public void NullableWrappers() => new HashSet<NullableLongWrapper>(_nullableWrappers);
}

결과:

           방법 | 중앙값 | 축척
----------------- | ---------------- | ---------
            롱스 | 22.8682 미국 | 0.42
    NullableLongs | 39.0337 미국 | 0.62
         포장기 | 62.8877 미국 | 1.00
 NullableWrappers | 231,993.7278 미국 | 3,540.34

a가있는 구조체에 Nullable<long>비해 a 구조체를 사용하면 long3540 배 느립니다!
제 경우에는 800ms와 <1ms 사이의 차이를 만들었습니다.

BenchmarkDotNet의 환경 정보는 다음과 같습니다.

OS = Microsoft Windows NT 6.1.7601 서비스 팩 1
프로세서 = Intel (R) Core (TM) i7-5600U CPU 2.60GHz, ProcessorCount = 4
주파수 = 2536269 틱, 해상도 = 394.2799ns, 타이머 = TSC
CLR = MS.NET 4.0 .30319.42000, Arch = 64 비트 릴리스 [RyuJIT]
GC = 동시 워크 스테이션
JitModules = clrjit-v4.6.1076.0

성능이 이렇게 좋지 않은 이유는 무엇입니까?


이는의 모든 요소 _nullableWrappers가에서 반환 한 동일한 해시 코드 가지기 때문에 발생 하며 GetHashCode(), 이로 인해 해싱이 O (1)이 아닌 O (N) 액세스로 퇴화됩니다.

모든 해시 코드를 인쇄하여이를 확인할 수 있습니다.

구조체를 이렇게 수정하면 :

public struct NullableLongWrapper
{
    private readonly long? _value;

    public NullableLongWrapper(long? value)
    {
        _value = value;
    }

    public override int GetHashCode()
    {
        return _value.GetHashCode();
    }

    public long? Value => _value;
}

훨씬 더 빠르게 작동합니다.

자, 명백한 질문은 왜 모든 NullableLongWrapper것이 같은 해시 코드인지입니다 .

이에 대한 답 은이 스레드에서 논의됩니다 . 그러나 Hans의 대답은 해시 코드를 계산할 때 선택할 필드가 두 개있는 구조체를 중심으로 돌아 가기 때문에 질문에 대한 대답이 아닙니다.하지만이 코드에서는 선택할 필드가 하나 뿐이며 값 유형입니다. (a struct).

그러나이 이야기의 교훈은 다음과 같습니다. 값 유형 에 대해 기본값 GetHashCode()의존하지 마십시오 !


추가

I thought that perhaps what was happening was related to Hans' answer in the thread I linked - maybe it was taking the value of the first field (the bool) in the Nullable<T> struct), and my experiments indicate that it may be related - but it's complicated:

Consider this code and its output:

using System;

public class Program
{
    static void Main()
    {
        var a = new Test {A = 0, B = 0};
        var b = new Test {A = 1, B = 0};
        var c = new Test {A = 0, B = 1};
        var d = new Test {A = 0, B = 2};
        var e = new Test {A = 0, B = 3};

        Console.WriteLine(a.GetHashCode());
        Console.WriteLine(b.GetHashCode());
        Console.WriteLine(c.GetHashCode());
        Console.WriteLine(d.GetHashCode());
        Console.WriteLine(e.GetHashCode());
    }
}

public struct Test
{
    public int A;
    public int B;
}

Output:

346948956
346948957
346948957
346948958
346948959

Note how the second and third hash codes (for 1/0 and 0/1) are the same, but the others are all different. I find this strange because clearly changing A changes the hash code, as does changing B, but given two values X and Y, the same hash code is generated for A=X, B=Y and A=Y, B=X.

(That sounds like some XOR stuff is happening behind the scenes, but that's guess.)

Incidentally, this behaviour where BOTH fields can be shown to contribute to the hash code proves that the comment in the reference source for ValueType.GetHashType() is inaccurate or wrong:

Action: Our algorithm for returning the hashcode is a little bit complex. We look for the first non-static field and get it's hashcode. If the type has no non-static fields, we return the hashcode of the type. We can't take the hashcode of a static member because if that member is of the same type as the original type, we'll end up in an infinite loop.

If that comment was true, then four of the five hash codes in the example above would be the same, since A has the same value, 0, for all those. (That assumes A is the first field, but you get the same results if you swap the values around: Both fields clearly contribute to the hash code.)

Then I tried changing the first field to be a bool:

using System;

public class Program
{
    static void Main()
    {
        var a = new Test {A = false, B = 0};
        var b = new Test {A = true,  B = 0};
        var c = new Test {A = false, B = 1};
        var d = new Test {A = false, B = 2};
        var e = new Test {A = false, B = 3};

        Console.WriteLine(a.GetHashCode());
        Console.WriteLine(b.GetHashCode());
        Console.WriteLine(c.GetHashCode());
        Console.WriteLine(d.GetHashCode());
        Console.WriteLine(e.GetHashCode());
    }
}

public struct Test
{
    public bool A;
    public int  B;
}

Output

346948956
346948956
346948956
346948956
346948956

Wow! So making the first field a bool makes all the hash codes come out the same, regardless of the values of ANY of the fields!

This still looks like some kind of bug to me.

The bug has been fixed in .NET 4, but only for Nullable. Custom types still yield the bad behavior. source


This is due to struct GetHashCode() behavior. If it finds reference types - it tries to get hash from first non-reference type field. In your case it WAS found, and Nullable<> is also struct, so it just poped it's private boolean value (4 bytes)

참고URL : https://stackoverflow.com/questions/39391107/why-are-hashsets-of-structs-with-nullable-values-incredibly-slow

반응형