
Benchmarking small code samples in C#: can this implementation be improved?




Quite often I find myself benchmarking small chunks of code to see which implementation is fastest.

Quite often I see comments that benchmarking code does not take jitting or the garbage collector into account.

I have the following simple benchmarking function, which has slowly evolved:

static void Profile(string description, int iterations, Action func) {
    // warm up
    func();
    // clean up
    GC.Collect();

    var watch = new Stopwatch();
    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.ElapsedMilliseconds);
}

Usage:

Profile("a descriptions", how_many_iterations_to_run, () =>
{
   // ... code being profiled
});

Does this implementation have any flaws? Is it good enough to show that implementation X is faster than implementation Y over Z iterations? Can you think of any ways to improve this?

EDIT: It's pretty clear that a time-based approach (as opposed to an iteration count) is preferred. Does anyone have any implementations where the time checks do not impact performance?


Here is the modified function: as recommended by the community, feel free to amend this as a community wiki.

static double Profile(string description, int iterations, Action func) {
    //Run at highest priority to minimize fluctuations caused by other processes/threads
    Process.GetCurrentProcess().PriorityClass = ProcessPriorityClass.High;
    Thread.CurrentThread.Priority = ThreadPriority.Highest;

    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
    return watch.Elapsed.TotalMilliseconds;
}

Make sure you compile in Release with optimizations enabled, and run the tests outside of Visual Studio. This last part is important because the JIT stints its optimizations with a debugger attached, even in Release mode.


Finalization won't necessarily have completed before GC.Collect returns. Finalization work is queued and then executed on a separate thread, and that thread could still be active during your tests, affecting the results.

If you want to ensure that finalization has completed before starting your tests, you can call GC.WaitForPendingFinalizers, which blocks until the finalization queue is cleared:

GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

If you want to take GC interactions out of the equation, you may want to run your 'warm up' call after the GC.Collect calls, not before. That way you know .NET will already have enough memory allocated from the OS for the working set of your function.
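
A sketch of that ordering (a rearrangement of the function above, not the author's exact code):

// clean up first...
GC.Collect();
GC.WaitForPendingFinalizers();
GC.Collect();

// ...then warm up, so the warm-up call both jits the code and grows
// the working set before the timed loop starts
func();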

Keep in mind that you get a non-inlined method call for each iteration, so make sure you compare the thing you're testing against an empty body. You also have to accept that you can only reliably time things that take several times longer than a method call.
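
In practice that means measuring an empty delegate with the same harness and treating its time as the call overhead; a sketch using the Profile function above (SomeImplementationX is a placeholder, not a real method):

// baseline: the cost of the non-inlined delegate call itself
Profile("empty body", iterations, () => { });

// only meaningful if it takes several times longer than the baseline
Profile("implementation X", iterations, () => SomeImplementationX());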

Also, depending on what kind of thing you're profiling, you may want to base your timing on a fixed amount of time rather than a fixed number of iterations; otherwise you tend to need a very short run for the best implementation and/or a very long one for the worst.
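
A minimal sketch of such a time-based variant, assuming a fixed measurement window (to keep the time check itself cheap, one could also poll the clock only every N iterations):

static double ProfileFor(string description, TimeSpan duration, Action func)
{
    // warm up and clean up as before
    func();
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    long iterations = 0;
    var watch = Stopwatch.StartNew();
    while (watch.Elapsed < duration)  // note: this check adds a little overhead per iteration
    {
        func();
        iterations++;
    }
    watch.Stop();

    double nsPerCall = watch.Elapsed.TotalMilliseconds * 1_000_000 / iterations;
    Console.WriteLine("{0}: {1} iterations in {2} ms ({3:F1} ns/call)",
        description, iterations, watch.ElapsedMilliseconds, nsPerCall);
    return nsPerCall;
}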


I would avoid passing the delegate at all:

  1. A delegate invocation is roughly a virtual method call, which is not cheap: on the order of 25% of the smallest memory allocation in .NET. If you're interested in the details, see this link.
  2. Anonymous delegates can lead to the use of closures without you even noticing, and accessing a closure field is noticeably more expensive than, e.g., accessing a variable on the stack.

Example code that leads to closure usage:

public void Test()
{
    int someNumber = 1;
    Profiler.Profile("Closure access", 1000000,
        () => { var sum = someNumber + someNumber; });  // captures someNumber in a closure
}

If you're not aware of closures, take a look at this method in .NET Reflector.
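
For illustration, this is roughly the kind of class the compiler generates for the lambda above (the names here are invented; the real generated class has a compiler-mangled name such as <>c__DisplayClass0_0):

// a hand-written approximation of the compiler-generated closure class
sealed class DisplayClass
{
    public int someNumber;  // the captured local is hoisted to a heap-allocated field

    public void Lambda()
    {
        var sum = someNumber + someNumber;  // field access instead of a stack slot
    }
}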


I think the most difficult problem to overcome with benchmarking methods like this is accounting for edge cases and the unexpected. For example - "How do the two code snippets work under high CPU load/network usage/disk thrashing/etc." They're great for basic logic checks to see if a particular algorithm works significantly faster than another. But to properly test most code performance you'd have to create a test that measures the specific bottlenecks of that particular code.

I'd still say that testing small blocks of code often has little return on investment and can encourage using overly complex code instead of simple maintainable code. Writing clear code that other developers, or myself 6 months down the line, can understand quickly will have more performance benefits than highly optimized code.


I'd call func() several times for the warm-up, not just once.
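
For example (a sketch; the iteration count is arbitrary):

// warm up with several calls; on newer runtimes, tiered compilation means
// the first few invocations may still run unoptimized code
for (int i = 0; i < 100; i++) {
    func();
}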


Suggestions for improvement

  1. Detecting if the execution environment is good for benchmarking (such as detecting if a debugger is attached or if JIT optimization is disabled, either of which would result in incorrect measurements).

  2. Measuring parts of the code independently (to see exactly where the bottleneck is).

  3. Comparing different versions/components/chunks of code (In your first sentence you say '... benchmarking small chunks of code to see which implementation is fastest.').

Regarding #1:

  • To detect if a debugger is attached, read the property System.Diagnostics.Debugger.IsAttached (remember to also handle the case where the debugger is initially not attached, but is attached after some time); a combined check is sketched after the snippet below.

  • To detect if JIT optimization is disabled, read the property DebuggableAttribute.IsJITOptimizerDisabled of the relevant assemblies:

    private bool IsJitOptimizerDisabled(Assembly assembly)
    {
        return assembly.GetCustomAttributes(typeof (DebuggableAttribute), false)
            .Select(customAttribute => (DebuggableAttribute) customAttribute)
            .Any(attribute => attribute.IsJITOptimizerDisabled);
    }
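
Combining the two, a minimal sketch of an environment check (the method name ValidateEnvironment is invented, and which assembly to inspect depends on where the benchmarked code lives):

    private void ValidateEnvironment()
    {
        // a debugger attached at any point can skew results
        if (System.Diagnostics.Debugger.IsAttached)
            throw new InvalidOperationException("Benchmark invalid: debugger attached.");

        // a Debug build (or Release with optimizations disabled) cripples the JIT
        if (IsJitOptimizerDisabled(typeof(Profiler).Assembly))
            throw new InvalidOperationException("Benchmark invalid: JIT optimizer disabled.");
    }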
    

Regarding #2:

This can be done in many ways. One way is to allow several delegates to be supplied and then measure those delegates individually.
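
As a rough sketch (this signature is invented, not a published API), one could accept named delegates and profile each part in turn:

static void ProfileParts(int iterations, params (string Name, Action Part)[] parts)
{
    foreach (var (name, part) in parts)
        Profile(name, iterations, part);  // reuses the single-delegate Profile above
}

// usage: ProfileParts(1000000, ("parse", Parse), ("transform", Transform));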

Regarding #3:

This could also be done in many ways, and different use-cases would demand very different solutions. If the benchmark is invoked manually, then writing to the console might be fine. However if the benchmark is performed automatically by the build system, then writing to the console is probably not so fine.

One way to do this is to return the benchmark result as a strongly typed object that can easily be consumed in different contexts.
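
A minimal sketch of such a result type (all names invented):

public sealed class BenchmarkResult
{
    public string Description { get; }
    public int Iterations { get; }
    public TimeSpan Elapsed { get; }
    public double MillisecondsPerIteration => Elapsed.TotalMilliseconds / Iterations;

    public BenchmarkResult(string description, int iterations, TimeSpan elapsed)
    {
        Description = description;
        Iterations = iterations;
        Elapsed = elapsed;
    }
}

A console front-end can format this for humans, while a build system might serialize it to JSON and track it over time.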


Etimo.Benchmarks

Another approach is to use an existing component to perform the benchmarks. Actually, at my company we decided to release our benchmark tool to the public domain. At its core, it manages the garbage collector, jitter, warm-ups, etc., just like some of the other answers here suggest. It also has the three features I suggested above. It manages several of the issues discussed in Eric Lippert's blog.

This is an example output where two components are compared and the results are written to the console. In this case the two components compared are called 'KeyedCollection' and 'MultiplyIndexedKeyedCollection':

[Image: Etimo.Benchmarks - sample console output]

There is a NuGet package and a sample NuGet package, and the source code is available on GitHub. There is also a blog post.

If you're in a hurry, I suggest you get the sample package and simply modify the sample delegates as needed. If you're not in a hurry, it might be a good idea to read the blog post to understand the details.


You must also run a "warm up" pass prior to the actual measurement to exclude the time the JIT compiler spends jitting your code.


Depending on the code you are benchmarking and the platform it runs on, you may need to account for how code alignment affects performance. To do so would probably require an outer wrapper that runs the test multiple times (in separate app domains or processes?), some of the time first calling "padding code" to force it to be JIT-compiled, so as to cause the code being benchmarked to be aligned differently. A complete test result would give the best-case and worst-case timings for the various code alignments.


If you're trying to eliminate the impact of garbage collection from the benchmark completely, is it worth setting GCSettings.LatencyMode?

If not, and you want the impact of garbage created in func to be part of the benchmark, then shouldn't you also force collection at the end of the test (inside the timer)?
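
For the first option, a sketch (SustainedLowLatency requires .NET 4.5 or later; on newer runtimes GC.TryStartNoGCRegion is another possibility):

// suppress blocking collections while measuring, then restore the old mode
GCLatencyMode oldMode = GCSettings.LatencyMode;
try
{
    GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
    Profile("low-GC run", iterations, func);
}
finally
{
    GCSettings.LatencyMode = oldMode;
}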


The basic problem with your question is the assumption that a single measurement can answer all your questions. You need to measure multiple times to get an effective picture of the situation, especially in a garbage-collected language like C#.

Another answer gives an okay way of measuring the basic performance.

static void Profile(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

However, this single measurement does not account for garbage collection. A proper profile additionally accounts for the worst-case performance of garbage collection spread out over many calls (this number is somewhat artificial, since the VM can terminate without ever collecting leftover garbage, but it is still useful for comparing two different implementations of func):

static void ProfileGarbageMany(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();
    }
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

And one might also want to measure the worst case performance of garbage collection for a method that is only called once.

static void ProfileGarbage(string description, int iterations, Action func) {
    // warm up 
    func();

    var watch = new Stopwatch(); 

    // clean up
    GC.Collect();
    GC.WaitForPendingFinalizers();
    GC.Collect();

    watch.Start();
    for (int i = 0; i < iterations; i++) {
        func();

        GC.Collect();
        GC.WaitForPendingFinalizers();
        GC.Collect();
    }
    watch.Stop();
    Console.Write(description);
    Console.WriteLine(" Time Elapsed {0} ms", watch.Elapsed.TotalMilliseconds);
}

But more important than recommending any specific possible additional measurements to profile is the idea that one should measure multiple different statistics and not just one kind of statistic.
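
A sketch of putting that into practice: repeat each measurement several times and report more than one statistic (ProfileOnce is assumed to be a variant of Profile that returns elapsed milliseconds instead of printing them):

// requires: using System; using System.Collections.Generic;
static void Report(string description, int iterations, Action func, int runs = 5)
{
    var samples = new List<double>();
    for (int r = 0; r < runs; r++)
        samples.Add(ProfileOnce(iterations, func));

    samples.Sort();
    Console.WriteLine("{0}: min {1:F1} ms, median {2:F1} ms, max {3:F1} ms",
        description, samples[0], samples[samples.Count / 2], samples[samples.Count - 1]);
}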

Source: https://stackoverflow.com/questions/1047218/benchmarking-small-code-samples-in-c-can-this-implementation-be-improved
