MSK DOT NET #5
2016-12-07
In this talk Adam will describe how the latest changes in .NET affect performance.
Adam wants to go through:
C# 7: ref locals and ref returns, ValueTuples.
.NET Core: Spans, Buffers, ValueTasks.
And how all of these things help build zero-copy streams, aka Channels/Pipelines, which are going to be a game changer in the next year.
4. ValueTuple: sample
(double min, double max, double avg, double sum) GetStats(double[] numbers)
{
    double min = double.MaxValue, max = double.MinValue, sum = 0;
    for (int i = 0; i < numbers.Length; i++)
    {
        if (numbers[i] > max) max = numbers[i];
        if (numbers[i] < min) min = numbers[i];
        sum += numbers[i];
    }
    double avg = numbers.Length != 0 ? sum / numbers.Length : double.NaN;
    return (min, max, avg, sum);
}
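A minimal sketch of how a caller might consume the returned tuple, either by deconstructing it into locals or by using the element names. The class name StatsDemo and the helper SumAndRange are hypothetical; GetStats repeats the method from the slide so the sketch is self-contained.

```csharp
using System;

static class StatsDemo
{
    // Same logic as GetStats on the slide, repeated so the sketch compiles on its own.
    public static (double min, double max, double avg, double sum) GetStats(double[] numbers)
    {
        double min = double.MaxValue, max = double.MinValue, sum = 0;
        foreach (double n in numbers)
        {
            if (n > max) max = n;
            if (n < min) min = n;
            sum += n;
        }
        double avg = numbers.Length != 0 ? sum / numbers.Length : double.NaN;
        return (min, max, avg, sum);
    }

    // Hypothetical caller showing both ways of reading the result.
    public static (double range, double sum) SumAndRange(double[] numbers)
    {
        // Deconstruction: each tuple element lands in its own local.
        var (min, max, _, _) = GetStats(numbers);

        // Keeping the tuple whole: elements are reachable by name.
        var stats = GetStats(numbers);

        return (max - min, stats.sum);
    }
}
```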
5. ValueTuple
• A tuple that is a Value Type:
• less space
• better data locality
• NO GC
• deterministic deallocation for stack-allocated Value Types
You need a reference to System.ValueTuple.dll.
6. Value Types: the disadvantages?!
• They are expensive to copy!
• You need to study the CIL and profiles to find out when a copy happens! For example, a call on a readonly struct field:
int result = readOnlyStructField.Method();
is converted to:
var copy = readOnlyStructField;
int result = copy.Method();
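The hidden copy is easiest to observe when the method mutates the struct. The sketch below uses hypothetical types (Counter, Holder) that are not from the talk: the call on the readonly field runs against a temporary copy, so the field never changes.

```csharp
struct Counter
{
    public int Value;

    // Mutates 'this', which makes the defensive copy observable.
    public void Increment() => Value++;
}

class Holder
{
    public readonly Counter ReadOnlyCounter = new Counter();
    public Counter MutableCounter = new Counter();

    public void IncrementBoth()
    {
        ReadOnlyCounter.Increment(); // runs on a hidden copy; the field stays at 0
        MutableCounter.Increment();  // runs in place; the field becomes 1
    }
}
```

After calling IncrementBoth once, ReadOnlyCounter.Value is still 0 while MutableCounter.Value is 1.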
7. ref returns and locals: sample
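ref int Max(
    ref int first, ref int second, ref int third)
{
    // Assigning "max = second" through a ref local would overwrite first
    // instead of re-pointing the reference, so return the winner directly.
    if (first >= second && first >= third) return ref first;
    if (second >= third) return ref second;
    return ref third;
}
// The ref local lives at the call site and aliases the largest variable:
// ref int max = ref Max(ref a, ref b, ref c);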
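A sketch of the call site of a ref-returning method, using a hypothetical two-argument helper named Larger (not from the talk). It shows the difference between binding the result to a ref local, which aliases the original variable, and assigning it to an ordinary local, which copies the value.

```csharp
static class RefReturnDemo
{
    // Hypothetical helper: returns a reference to the larger of two variables.
    public static ref int Larger(ref int a, ref int b)
    {
        if (a >= b) return ref a;
        return ref b;
    }

    public static (int x, int y) Demo()
    {
        int x = 1, y = 5;

        ref int max = ref Larger(ref x, ref y); // ref local: an alias of y
        max = 0;                                // writes through to y

        int copy = Larger(ref x, ref y);        // no 'ref': a plain copy of x
        copy = 42;                              // x and y are not affected

        return (x, y);
    }
}
```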
8. ref locals: Benchmarks: initialization
public void ByValue() {
    for (int i = 0; i < array.Length; i++) {
        BigStruct value = array[i];
        value.Int1 = 1;
        value.Int2 = 2;
        value.Int3 = 3;
        value.Int4 = 4;
        value.Int5 = 5;
        array[i] = value;
    }
}
public void ByReference() {
    for (int i = 0; i < array.Length; i++) {
        ref BigStruct reference = ref array[i];
        reference.Int1 = 1;
        reference.Int2 = 2;
        reference.Int3 = 3;
        reference.Int4 = 4;
        reference.Int5 = 5;
    }
}
struct BigStruct { public int Int1, Int2, Int3, Int4, Int5; }
11. Safe vs Unsafe with RyuJit
Method                                Jit     Mean         Scaled
ByValue                               RyuJit  742.4910 ns  4.56
ByReference                           RyuJit  162.8368 ns  1.00
ByReferenceOldWay                     RyuJit  170.0255 ns  1.04
ByReferenceUnsafeImplicit             RyuJit  201.4584 ns  1.24
ByReferenceUnsafeExplicit             RyuJit  200.7698 ns  1.23
ByReferenceUnsafeExplicitExtraMethod  RyuJit  171.3973 ns  1.05
Executing unsafe code requires full trust. It can be a "no go" for the cloud!
No need for pinning!
13. Allocation, deallocation, usage
Managed, < 85 KB:
• Allocation: very cheap (NextObjPtr)
• Deallocation: non-deterministic, expensive! GC: stop the world
• Usage: very easy, common, safe
Managed, LOH:
• Allocation: acceptable cost (free list management)
• Deallocation: the same as above, plus fragmentation (LOH); LOH = Gen 2 = full GC
Native, stackalloc:
• Allocation: very cheap
• Deallocation: deterministic, very cheap
• Usage: unsafe, not common, limited
Native, Marshal:
• Allocation: acceptable cost (free list management)
• Deallocation: deterministic, very cheap, on demand
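To illustrate the "deterministic, on demand" deallocation of native memory obtained through Marshal, here is a minimal sketch. The class and method names (NativeMemoryDemo, RoundTrip) are hypothetical; the Marshal calls are the standard ones from System.Runtime.InteropServices.

```csharp
using System;
using System.Runtime.InteropServices;

static class NativeMemoryDemo
{
    public static int RoundTrip(int value)
    {
        // Allocate 4 bytes of native memory; the GC neither tracks nor frees it.
        IntPtr block = Marshal.AllocHGlobal(sizeof(int));
        try
        {
            Marshal.WriteInt32(block, value);
            return Marshal.ReadInt32(block);
        }
        finally
        {
            // Released exactly here, on demand; forgetting this call leaks the block.
            Marshal.FreeHGlobal(block);
        }
    }
}
```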
14. Span (Slice)
It provides a uniform API for working with:
• Unmanaged memory buffers
• Arrays and subarrays
• Strings and substrings
It’s fully type-safe and memory-safe.
Almost no overhead.
It’s a Value Type.
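A short sketch of subarrays and substrings as views without copying. It assumes the Span API in the shape it later shipped (System.Memory / .NET Core 2.1 and newer), which may differ from the prototype shown in the talk; SpanDemo and its methods are hypothetical names.

```csharp
using System;

static class SpanDemo
{
    public static int[] ZeroMiddle()
    {
        int[] numbers = { 1, 2, 3, 4, 5 };

        // A view over elements 1..3 of the array; nothing is copied.
        Span<int> middle = new Span<int>(numbers, 1, 3);
        middle.Clear(); // writes through to the underlying array

        return numbers;
    }

    public static int SecondWordLength(string text)
    {
        ReadOnlySpan<char> chars = text.AsSpan();
        int space = chars.IndexOf(' ');

        // A "substring" that does not allocate a new string.
        ReadOnlySpan<char> second = chars.Slice(space + 1);
        return second.Length;
    }
}
```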
16. Single method in the API is enough
unsafe void Handle(byte* buffer, int length) { }
void Handle(byte[] buffer) { }
void Handle(Span<byte> buffer) { }
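A sketch of why one Span-based signature can replace the pointer and array overloads: the same method accepts a whole array, a subarray, and stack memory. It assumes C# 7.2 or newer (stackalloc into a Span in safe code) and the released Span API; HandleDemo, Sum and SumAll are hypothetical names.

```csharp
using System;

static class HandleDemo
{
    // One signature covers every source of contiguous bytes.
    public static int Sum(Span<byte> buffer)
    {
        int sum = 0;
        for (int i = 0; i < buffer.Length; i++) sum += buffer[i];
        return sum;
    }

    public static int SumAll()
    {
        byte[] array = { 1, 2, 3 };

        Span<byte> onStack = stackalloc byte[2];
        onStack[0] = 10;
        onStack[1] = 20;

        return Sum(array)               // whole array, via implicit conversion
             + Sum(array.AsSpan(1, 2))  // subarray { 2, 3 }
             + Sum(onStack);            // stack-allocated memory
    }
}
```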
17. Uniform access to any kind of contiguous memory
public void Enumeration<T>(Span<T> buffer)
{
    for (int i = 0; i < buffer.Length; i++)
    {
        Use(buffer[i]);
    }
    foreach (T item in buffer)
    {
        Use(item);
    }
}
18. Span for new runtimes
• CoreCLR 1.2
• CLR 4.6.3? 4.6.4?
25. ArrayPool
• System.Buffers package
• Provides a resource pool that enables reusing instances of T[]
• Arrays are allocated on the managed heap with the new operator
• The default maximum length of each array in the pool is 2^20 (1024 * 1024 = 1 048 576)
26. ArrayPool: Sample
var pool = ArrayPool<byte>.Shared;
byte[] buffer = pool.Rent(minLength);
try
{
    Use(buffer);
}
finally
{
    pool.Return(buffer);
}
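One detail worth making explicit: Rent returns an array of at least the requested length, so the caller has to track how many elements it actually uses. A minimal sketch, with hypothetical names PoolDemo and SumFirst:

```csharp
using System;
using System.Buffers;

static class PoolDemo
{
    public static int SumFirst(int count)
    {
        ArrayPool<int> pool = ArrayPool<int>.Shared;
        int[] buffer = pool.Rent(count); // buffer.Length >= count, possibly larger
        try
        {
            // Touch only the first 'count' elements; the rest may hold stale data
            // left behind by a previous renter.
            for (int i = 0; i < count; i++) buffer[i] = i + 1;

            int sum = 0;
            for (int i = 0; i < count; i++) sum += buffer[i];
            return sum;
        }
        finally
        {
            // clearArray wipes the contents before the array goes back to the pool.
            pool.Return(buffer, clearArray: true);
        }
    }
}
```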
31. Async on hotpath
Task<T> SmallMethodExecutedVeryVeryOften()
{
    if (CanRunSynchronously()) // true most of the time
    {
        return Task.FromResult(ExecuteSynchronous());
    }
    return ExecuteAsync();
}
32. Async on hotpath: consuming method
while (true)
{
    var result = await SmallMethodExecutedVeryVeryOften();
    Use(result);
}
33. ValueTask<T>: the idea
• Wraps a TResult and Task<TResult>, only one of which is used
• It should not replace Task, but help in some scenarios when:
• method returns Task<TResult>
• and very frequently returns synchronously (fast)
• and is invoked so often that the cost of allocating Task<TResult> is a problem
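A sketch of a method that fits these criteria: a reader that needs real asynchronous work only on the first call and answers from a cache afterwards. CachedReader and its members are hypothetical, and Task.Delay stands in for actual I/O.

```csharp
using System.Threading.Tasks;

class CachedReader
{
    private int? _cached;

    public ValueTask<int> ReadAsync()
    {
        if (_cached.HasValue)
        {
            // Synchronous path: wraps the value, no Task<int> is allocated.
            return new ValueTask<int>(_cached.Value);
        }

        // Asynchronous path: wraps the Task<int> produced by the slow operation.
        return new ValueTask<int>(LoadAsync());
    }

    private async Task<int> LoadAsync()
    {
        await Task.Delay(10); // placeholder for real I/O
        _cached = 42;
        return 42;
    }
}
```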
34. Sample implementation of ValueTask usage
ValueTask<T> SampleUsage()
{
    if (IsFastSynchronousExecutionPossible())
    {
        // Wrap the result explicitly; no Task<T> is allocated on this path.
        return new ValueTask<T>(ExecuteSynchronous()); // INLINEABLE!!!
    }
    return new ValueTask<T>(ExecuteAsync());
}
T ExecuteSynchronous() { }
Task<T> ExecuteAsync() { }
35. How to consume ValueTask
var valueTask = SampleUsage(); // INLINEABLE
if (valueTask.IsCompleted)
{
    Use(valueTask.Result);
}
else
{
    Use(await valueTask.AsTask()); // NO INLINING
}
36. ValueTask<T>: usage && gains
• Sample usage:
• Sockets (already used in ASP.NET Core)
• File Streams
• ADO.NET Data readers
• Gains:
• Fewer heap allocations
• Method inlining is possible!
• Facts:
• Skynet: 146 ns for Task, 16 ns for ValueTask
• TechEmpower (Plaintext): +2.6%
39. Pipelines (Channels)
• "high performance zero-copy buffer-pool-managed asynchronous message pipes" – Marc Gravell from Stack Overflow
• Pipeline pushes data to you rather than having you pull.
• When writing to a pipeline, the caller allocates memory from the pipeline directly.
• No new memory is allocated. Only pooled memory buffer is used.
40. Simplified Flow
Asks for a memory buffer.
Writes the data to the buffer.
Returns pooled memory.
Starts awaiting the data.
Reads the data from the buffer.
Uses low-allocating Span-based APIs (parsing etc.).
Returns the memory to the pool when done.
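The flow above can be sketched with the System.IO.Pipelines package, which is an assumption here: it is the API as it later shipped and may differ from the Channels prototype discussed in the talk. PipeDemo and RoundTripAsync are hypothetical names, and the final GetString call allocates; a real reader would parse the spans directly.

```csharp
using System;
using System.Buffers;
using System.IO.Pipelines;
using System.Text;
using System.Threading.Tasks;

static class PipeDemo
{
    public static async Task<string> RoundTripAsync(string message)
    {
        var pipe = new Pipe();

        // Writer side: ask the pipe for a pooled buffer and write into it.
        byte[] payload = Encoding.UTF8.GetBytes(message);
        Memory<byte> memory = pipe.Writer.GetMemory(payload.Length);
        payload.AsSpan().CopyTo(memory.Span);
        pipe.Writer.Advance(payload.Length); // report how many bytes were written
        await pipe.Writer.FlushAsync();      // make the data visible to the reader
        pipe.Writer.Complete();

        // Reader side: await the data, consume it, then hand the buffers back.
        ReadResult result = await pipe.Reader.ReadAsync();
        ReadOnlySequence<byte> buffer = result.Buffer;
        string text = Encoding.UTF8.GetString(buffer.ToArray());
        pipe.Reader.AdvanceTo(buffer.End);   // consumed memory goes back to the pool
        pipe.Reader.Complete();

        return text;
    }
}
```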
41. System.Runtime.CompilerServices.Unsafe
T As<T>(object o) where T : class;
void* AsPointer<T>(ref T value);
void Copy<T>(void* destination, ref T source);
void Copy<T>(ref T destination, void* source);
void CopyBlock(void* destination, void* source, uint byteCount);
void InitBlock(void* startAddress, byte value, uint byteCount);
T Read<T>(void* source);
int SizeOf<T>();
void Write<T>(void* destination, T value);
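A brief sketch of two of these members, assuming the static System.Runtime.CompilerServices.Unsafe class is available (via its NuGet package or in-box on newer runtimes). UnsafeDemo and its methods are hypothetical; BigStruct repeats the struct from the benchmark slide.

```csharp
using System.Runtime.CompilerServices;

struct BigStruct { public int Int1, Int2, Int3, Int4, Int5; }

static class UnsafeDemo
{
    // Size in bytes of the struct: five 4-byte ints.
    public static int SizeOfBigStruct() => Unsafe.SizeOf<BigStruct>();

    public static string Reinterpret(object o)
    {
        // No type check is performed: the caller must guarantee that 'o'
        // really is a string, otherwise the behavior is undefined.
        return Unsafe.As<string>(o);
    }
}
```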