Saturday 25 March 2017

C# Evolution

(and why you should care)

I may be a bit of a programming language nut. I find different languages interesting not just because they are semantically different or because they offer different features or even because they just look pretty (I'm lookin' at you, Python), but because they can teach us new things about the languages we already use.

I'm of the opinion that any technology with real staying power has some value to offer. In other words, if the tech has managed to stay around for some time, there must be something there. You can hate on VB6 as much as you want, but there has to be something there to have made it dominate desktop programming for the years that it did. That's not the focus of this discussion, though.

Similarly, when new languages emerge, instead of just rolling my eyes and uttering something along the lines of "Another programming language? Why? And who honestly cares?", I prefer to take a step back and have a good long look. Creating a new language and the ecosystem required to interpret or compile that language is a non-trivial task. It takes a lot of time and effort, so even when a language seems completely impractical, I like to consider why someone might spend the time creating it. Sure, there are outliers where the only reason is "because I could" (I'm sure Shakespeare has to be one of those) or "because I hate everyone else" (Brainfuck, Whitespace and my favorite to troll on, Perl -- I jest, because Perl is a powerhouse; I just haven't ever seen a program written in Perl which didn't make me cringe, though Larry promises that is supposed to change with Perl 6).

Most languages are born because someone found the language they were dealing with was getting in the way of getting things done. A great example is Go, which was dreamed up by a bunch of smart programmers who had been doing this programming thing for a while and really just wanted to make an ecosystem which would help them to get stuff done in a multi-core world without having to watch out for silly shit like deadlocks and parallel memory management. Not that you can't hurt yourself in even the most well-designed language (indeed, if you definitely can't hurt yourself in the language you're using, you're probably bound and gagged by it -- but I'm not here to judge what you're into).

Along the same lines, it's interesting to watch the evolution of languages, especially as languages evolve out of fairly mundane, suit-and-tie beginnings. I feel like C# has done that -- and will continue to do so. Yes, a lot of cool features aren't unique or even original -- deconstruction in C#7 has been around for ages in Python, for example -- but that doesn't make them any less valuable.

I'm going to skip some of the earlier iterations and start where I think things get interesting: C#5. Please note that this post is (obviously) opinion. Whilst I'll try to cover all of the feature changes that I can dig up, I'm most likely going to focus on the ones which I think provide the programmer the most benefit.

C#5

C#5 brought async features and caller information. Let's examine the latter before I offer up what will probably be an unpopular opinion on the former.

Caller information allowed providing attributes on optional function parameters such that, if the caller didn't provide a value, the called code could glean:
  • Caller file path
  • Caller line number
  • Caller member name
This is a bonus for logging and debugging of complex systems, but it also allowed tricks whereby the name of a property could be automagically passed into, for example, an MVVM framework method call in your WPF app. This makes refactor-renaming easier and removes some magic strings, not to mention making debug logging way, way simpler. Not ground-breaking, but [CallerMemberName] certainly became a friend of the WPF developer. A better solution for the exact problem of property names and framework calls came with nameof in C#6, but the Caller* attributes ([CallerFilePath], [CallerLineNumber] and [CallerMemberName]) are still a good thing.
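Here is the WPF case made concrete. This is a minimal sketch of a view model that raises PropertyChanged without a single magic string. PlumberViewModel is a made-up class. INotifyPropertyChanged and [CallerMemberName] are the real framework pieces. The sketch sticks to C#5 syntax, so it uses no string interpolation and no ?. operator.

```csharp
using System;
using System.ComponentModel;
using System.Runtime.CompilerServices;

// Hypothetical view model, for illustration only.
public class PlumberViewModel : INotifyPropertyChanged
{
    public event PropertyChangedEventHandler PropertyChanged;

    private string _name;
    public string Name
    {
        get { return _name; }
        set
        {
            _name = value;
            // No "Name" string here: the compiler passes the calling member's name.
            OnPropertyChanged();
        }
    }

    // [CallerMemberName] only fills in the value when the caller omits the argument.
    protected void OnPropertyChanged([CallerMemberName] string propertyName = null)
    {
        var handler = PropertyChanged;
        if (handler != null)
            handler(this, new PropertyChangedEventArgs(propertyName));
    }
}

public static class CallerInfoDemo
{
    public static void Main()
    {
        var vm = new PlumberViewModel();
        vm.PropertyChanged += (sender, e) => Console.WriteLine("Changed: " + e.PropertyName);
        vm.Name = "Mario"; // prints "Changed: Name"
    }
}
```

If you rename the Name property, the notification follows along automatically. A hard-coded "Name" string would quietly go stale instead.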



C#5 also brought async/await. On the surface, this seems like a great idea. If only C#'s async/await worked anything like the async/await in TypeScript, where, under the hood, there's just a regular old promise and no hidden bullshit. C#'s async/await looks like it's just going to be Tasks and some compiler magic under the hood, but there's the captured synchronization context, which comes back to bite asses more often than a mosquito with a scat fetish. Sometimes things will do exactly what you expect and other times they won't, just because you don't know enough about the underlying system.

That's the biggest problem with async/await, in my opinion: it looks like syntactic sugar to take away the pain of parallel computing from the unwashed masses but ends up just being trickier than having to learn how to do actual multi-threading. There's also the fickle task scheduler which may decide 8 cores doesn't mean 8 parallel tasks -- but that's OK: you can swap that out for your own task scheduler... as long as you understand enough of the underlying system, again (as this test code demonstrates).

Like many problems that arise out of async / parallel programming, tracking down the cause of sporadic issues is non-trivial. I had a bit of code which would always fail the first time and then work like a charm. Eventually I figured out that the synchronization context was causing the issue, so I forced a null context and the problem stopped. The developer has to start learning about task continuation options and start caring about how external libraries do things. And many developers aren't even aware of when it's appropriate to use async/await. They opt to use it everywhere and just add overhead to something which really didn't need to be async, like most Web API controllers. Async/await makes a lot of sense in GUI applications, though.

Having async/await around IO-bound work in Web API calls may help with high volumes of web requests, because it allows the worker thread to be re-assigned to handle another request while the IO completes. Still, I have yet to see an actual benchmark showing better web performance from simply switching to async/await for all calls.
The time you probably most want to use it is for concurrent IO, to shorten the overall time taken by a single request. Some thought has to go into this, though: handling too many requests concurrently may just end up with many requests timing out instead of a few requests being given a quick 503, indicating that the application needs some help with scaling. In other words, simply peppering your code with async/await could result in no net gain, especially if the code being awaited is hitting the same resource (e.g., your database).

Which leads to the second part of C#'s async/await that I hate: the async zombie apocalypse. Blocking on the result of an async method with .Wait() (or .Result) is suicide whenever there's a synchronization context around, as in GUI or classic ASP.NET apps (I hope you know this; please, please, please don't do that). As a result, async/await patterns tend to propagate throughout code until everything is awaiting some async function. It's the Walking Dead, traipsing through your code, leaving little async/await turds wherever they go.




You can use ConfigureAwait(false) to avoid deadlocking on the context captured by your async code, but you must remember to apply it to every await if you're trying to "un-async" a block of code. You can also set the synchronization context yourself (I suggest the useful value of null). Like the former workaround, it's hardly elegant.
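Here are both workarounds, sketched on the assumption that some caller insists on blocking on async code. The method names are made up. ConfigureAwait, SynchronizationContext.SetSynchronizationContext and GetAwaiter().GetResult() are the real APIs. A console app has no synchronization context to begin with, so this shows the shape of each fix rather than reproducing the deadlock.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class UnAsyncDemo
{
    // Workaround 1: ConfigureAwait(false) on *every* await, so no continuation
    // tries to resume on the caller's captured context.
    public static async Task<int> ComputeWithConfigureAwaitAsync()
    {
        await Task.Delay(10).ConfigureAwait(false);
        await Task.Delay(10).ConfigureAwait(false);
        return 21;
    }

    // Ordinary async code with no ConfigureAwait calls.
    public static async Task<int> ComputeAsync()
    {
        await Task.Delay(10);
        return 21;
    }

    // Workaround 2: clear the synchronization context while blocking, so the
    // awaits inside ComputeAsync capture nothing. Restore it afterwards.
    public static int ComputeBlocking()
    {
        var previous = SynchronizationContext.Current;
        SynchronizationContext.SetSynchronizationContext(null);
        try
        {
            return ComputeAsync().GetAwaiter().GetResult();
        }
        finally
        {
            SynchronizationContext.SetSynchronizationContext(previous);
        }
    }

    public static void Main()
    {
        var total = ComputeBlocking() + ComputeWithConfigureAwaitAsync().GetAwaiter().GetResult();
        Console.WriteLine(total); // 42
    }
}
```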

As much as I hate (fear?) async/await, there are places where it makes the code clearer and easier to deal with. Mostly in desktop applications, under event-handling code. It's not all bad, but the concept has been over-hyped and over-used (and poorly at that -- like EF's SaveChangesAsync(), which you might think, being async, is thread-safe, and you'd be DEAD WRONG).

Let's leave it here: use async/await when it provides obvious value. Question its use at every turn. For every person who loves async/await, there's a virtual alphabet soup of blogs explaining how to get around some esoteric failure brought on by the feature. As with multi-threading in C/C++: "use with caution".



C#6

Where C#5 brought only a few feature changes (one of which was, in my opinion, among the most damaging), C#6 brought a smorgasbord of incremental changes. There was something there for everyone:

Read-only auto properties made writing read-only properties just a little quicker, especially when the property values came from the constructor. So code like:

public class VideoGamePlumber
{
  public string Name { get { return _name; } }
  private string _name;

  public VideoGamePlumber(string name)
  {
    _name = name;
  }
}
became:

public class VideoGamePlumber
{
  public string Name { get; private set; }

  public VideoGamePlumber(string name)
  {
    Name = name;
  }
}
but that still leaves the Name property open for change within the VideoGamePlumber class, so better would be the C#6 variant:

public class VideoGamePlumber
{
  public string Name { get; }

  public VideoGamePlumber(string name)
  {
    Name = name;
  }
}
The Name property can only be set from within the constructor. Unless, of course, you resort to reflection, since the mutability of Name is enforced by the compiler, not the runtime. But I didn't tell you that.



Auto-property initializers seem quite convenient, but I'll admit that I rarely use them. I think that's primarily because, when I want to initialize outside of the constructor, I generally want a private (or protected) field, and when I want to set a property at construction time, it's probably getting its value from a constructor parameter. I don't hate the feature (at all); I just don't use it much. Still, if you wanted to:
public class Doggie
{
  public string Name { get; set; }

  public Doggie()
  {
    Name = "Rex"; // set default dog name
  }
}
becomes:
public class Doggie
{
  public string Name { get; set; } = "Rex";
}
You can combine this with the read-only property syntax if you like:
public class Doggie
{
  public string Name { get; } = "Rex";
}
but then all doggies are called Rex (which is quite presumptuous) and you really should have just used a constant, which you can't modify through reflection.



Expression-bodied function members can provide a succinct syntax for a read-only, calculated property. However, I use them sparingly, because anything beyond a very simple "calculation", e.g.:

public class Person
{
  public string FirstName { get; }
  public string LastName { get; }
  public string FullName => $"{FirstName} {LastName}";

  public Person(string firstName, string lastName)
  {
    FirstName = firstName;
    LastName = lastName;
  }
}
starts to get long and more difficult to read; though that argument is simply countered by having a property like:

public class Business
{
  public string StreetAddress { get; }
  public string Suburb { get; }
  public string Town { get; }
  public string PostalCode { get; }
  public string PostalAddress => GetPostalAddress();

  public Business(string streetAddress, string suburb, string town, string postalCode)
  {
    StreetAddress = streetAddress;
    Suburb = suburb;
    Town = town;
    PostalCode = postalCode;
  }

  private string GetPostalAddress()
  {
    return string.Join("\n", new[]
    {
      StreetAddress,
      Suburb,
      Town,
      PostalCode
    });
  }
}
Here the logic for generating the full address from the many involved bits of data is tucked away in a more readable method. The property itself becomes syntactic sugar, looking less clunky than just exposing the GetPostalAddress method.

Index initializers provide a neater syntax for initializing Dictionaries. For example:
var webErrors = new Dictionary<int, string>()
{
  { 404, "Page Not Found" },
  { 302, "Page moved, but left a forwarding address" },
  { 500, "The web server can't come out to play today" }
};

Can be written as:
var webErrors = new Dictionary<int, string>
{
  [404] = "Page Not Found",
  [302] = "Page moved, but left a forwarding address",
  [500] = "The web server can't come out to play today"
};

It's not ground-breaking, but I find it a little more pleasing on the eye.
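There is one difference worth knowing about beyond looks. The first form calls Add() for each entry, while the index initializer calls the indexer setter. Here is a small sketch (the method names are just for illustration) showing what each form does with a duplicate key.

```csharp
using System;
using System.Collections.Generic;

public static class InitializerDemo
{
    // Index initializer: each entry is an indexer assignment,
    // so a repeated key silently overwrites the earlier value.
    public static Dictionary<int, string> WithIndexInitializer()
    {
        return new Dictionary<int, string>
        {
            [404] = "Page Not Found",
            [404] = "Nope, still not found"
        };
    }

    // Collection initializer: each entry is an Add() call,
    // which throws ArgumentException when the key already exists.
    public static Dictionary<int, string> WithCollectionInitializer()
    {
        return new Dictionary<int, string>
        {
            { 404, "Page Not Found" },
            { 404, "Nope, still not found" }
        };
    }

    public static void Main()
    {
        Console.WriteLine(WithIndexInitializer()[404]); // Nope, still not found
        try
        {
            WithCollectionInitializer();
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine($"Add() refused the duplicate: {ex.Message}");
        }
    }
}
```

So the index-initializer form won't catch a copy-pasted duplicate key for you. It just quietly keeps the last one.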

Other stuff that I hardly use includes:

Extension Add methods for collection initializers let the collection initializer syntax work with types whose Add method is an extension method, so custom collections can be initialized like standard ones. Not a feature I've ever used, because I haven't had the need to write a custom collection.

Improved overload resolution reduced the number of times I shook my fist at the complainer compiler. OK, so I technically use this, but it's one of those features that, when it's working, you don't even realise it's there.

Exception filters made exception handling more expressive and easier to read.
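Here is a quick sketch of an exception filter, using a hypothetical ApiException that carries a status code. The when clause is evaluated before the catch block is entered. If it's false, the exception carries on up the stack as if that catch weren't there, and the stack isn't unwound first, which is handy when debugging.

```csharp
using System;

// Hypothetical exception type carrying an HTTP-style status code.
public class ApiException : Exception
{
    public int StatusCode { get; }

    public ApiException(int statusCode)
        : base($"API call failed with status {statusCode}")
    {
        StatusCode = statusCode;
    }
}

public static class ExceptionFilterDemo
{
    public static string Handle(int statusCode)
    {
        try
        {
            throw new ApiException(statusCode);
        }
        // Only entered when the filter is true; otherwise the exception propagates.
        catch (ApiException ex) when (ex.StatusCode == 404)
        {
            return "not found";
        }
        catch (ApiException ex) when (ex.StatusCode >= 500)
        {
            return "server error, try again later";
        }
    }

    public static void Main()
    {
        Console.WriteLine(Handle(404)); // not found
        Console.WriteLine(Handle(503)); // server error, try again later
        try
        {
            Handle(400); // no filter matches, so the ApiException escapes Handle
        }
        catch (ApiException ex)
        {
            Console.WriteLine($"Not handled by the filters: {ex.StatusCode}");
        }
    }
}
```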

await in catch and finally blocks allows the async/await zombies to stumble into your exception handling. Yay.



On to the good bits (that I regularly use) though:

using static made using static functions so much neater -- as if they were part of the class you were currently working in. I don't push static functions in general because using them means that testing anything which uses them has to test them too, but there are places where they make sense. One is in RandomValueGen from PeanutButter.RandomGenerators, a class which provides functions to generate random data for testing purposes. A static import means you no longer have to mention the RandomValueGen class throughout your test code:

using NUnit.Framework;
using PeanutButter.RandomGenerators;

namespace Bovine.Tests
{
  [TestFixture]
  public class TestCows
  {
    [Test]
    public void Moo_ShouldBeAnnoying()
    {
      // Arrange
      var cow = new Cow()
      {
        Name = RandomValueGen.GetRandomString(),
        Gender = RandomValueGen.GetRandom<Gender>(), // assuming a Gender enum
        Age = RandomValueGen.GetRandomInt()
      };
      // ...
    }
  }
}

Can become:
using NUnit.Framework;
using static PeanutButter.RandomGenerators.RandomValueGen;

namespace Bovine.Tests
{
  [TestFixture]
  public class TestCows
  {
    [Test]
    public void Moo_ShouldBeAnnoying()
    {
      // Arrange
      var cow = new Cow()
      {
        Name = GetRandomString(),
        Gender = GetRandom<Gender>(), // assuming a Gender enum
        Age = GetRandomInt()
      };
      // ...
    }
  }
}


This is way more readable, simply because there's less unnecessary cruft in there. At the point of reading (and writing) the test, the source library and class for random values are not only irrelevant and unnecessary -- they're just plain noisy and ugly.


Null conditional operators. Transforming fugly multi-step checks for null into neat code:

if (thing != null &&
    thing.OtherThing != null &&
    thing.OtherThing.FavoriteChild != null &&
    // ... and so on, and so forth, turtles all the way down
    //     until, eventually
    thing.OtherThing.FavoriteChild.Dog.Collar.Spike.Metal.Manufacturer.Name != null)
{
  return thing.OtherThing.FavoriteChild.Dog.Collar.Spike.Metal.Manufacturer.Name;
}
return "Unknown manufacturer";

becomes:
return thing
  ?.OtherThing
  ?.FavoriteChild
  ?.Dog
  ?.Collar
  ?.Spike
  ?.Metal
  ?.Manufacturer
  ?.Name
  ?? "Unknown manufacturer";

and kitties everywhere rejoiced.
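Here is a runnable cut-down version. It uses a hypothetical three-level Spike/Metal/Manufacturer graph instead of the full turtle stack above. It shows that any null link short-circuits the whole chain to null, which ?? then replaces with the fallback.

```csharp
using System;

// Hypothetical, trimmed-down object graph standing in for the long chain above.
public class Manufacturer { public string Name { get; set; } }
public class Metal { public Manufacturer Manufacturer { get; set; } }
public class Spike { public Metal Metal { get; set; } }

public static class NullConditionalDemo
{
    // Each ?. evaluates to null if the value on its left is null;
    // ?? then supplies the fallback.
    public static string ManufacturerName(Spike spike)
    {
        return spike?.Metal?.Manufacturer?.Name ?? "Unknown manufacturer";
    }

    public static void Main()
    {
        var spike = new Spike { Metal = new Metal { Manufacturer = new Manufacturer { Name = "Acme" } } };
        Console.WriteLine(ManufacturerName(spike));       // Acme
        Console.WriteLine(ManufacturerName(new Spike())); // Unknown manufacturer
        Console.WriteLine(ManufacturerName(null));        // Unknown manufacturer
    }
}
```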


String interpolation helps you to turn disaster-prone code like this:

public void PrintHello(string salutation, string firstName, string lastName)
{
  Console.WriteLine("Hello, " + salutation + " " + firstName + " of the house " + lastName);
}

or even the less disaster-prone, more efficient, but not that amazing to read:
public void PrintHello(string salutation, string firstName, string lastName)
{
  Console.WriteLine(String.Join(" ", new[]
  {
    "Hello,",
    salutation,
    firstName,
    "of the house",
    lastName
  }));
}

Into the safe, readable:
public void PrintHello(string salutation, string firstName, string lastName)
{
  Console.WriteLine($"Hello, {salutation} {firstName} of the house {lastName}");
}
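Interpolation also accepts the same alignment and format specifiers as string.Format, which is handy for quick tabular output. Here is a small sketch (Receipt is a made-up helper). It uses FormattableString.Invariant so that the decimal separator doesn't change with the machine's culture.

```csharp
using System;

public static class InterpolationDemo
{
    // {value,N} right-aligns in N characters, {value,-N} left-aligns,
    // and {value:format} applies a format string, just like string.Format.
    public static string Receipt(string item, decimal price, int quantity)
    {
        return FormattableString.Invariant(
            $"{item,-8}|{quantity,3} x {price:0.00} = {price * quantity:0.00}");
    }

    public static void Main()
    {
        Console.WriteLine(Receipt("Widget", 2.5m, 4)); // Widget  |  4 x 2.50 = 10.00
    }
}
```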



nameof is also pretty cool, not just for making your constructor null-checks impervious to refactoring:

public class Person
{
  public Person(string name)
  {
    if (name == null) throw new ArgumentNullException(nameof(name));
  }
}


(if you're into constructor-time null-checks) but also for using test case sources in NUnit:
[Test, TestCaseSource(nameof(DivideCases))]
public void DivideTest(int n, int d, int q)
{
  Assert.AreEqual( q, n / d );
}

static object[] DivideCases =
{
  new object[] { 12, 3, 4 },
  new object[] { 12, 2, 6 },
  new object[] { 12, 4, 3 } 
};
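Back to the constructor null-check. This is a small runnable sketch using a cut-down, hypothetical version of the Person class above. It shows that the string produced by nameof lands in the exception's ParamName, and that nameof also works on members such as Person.Name.

```csharp
using System;

public class Person
{
    public string Name { get; }

    public Person(string name)
    {
        // nameof(name) is checked by the compiler: rename the parameter and
        // this string follows along instead of silently going stale.
        if (name == null) throw new ArgumentNullException(nameof(name));
        Name = name;
    }
}

public static class NameofDemo
{
    public static void Main()
    {
        Console.WriteLine(nameof(Person.Name)); // Name
        try
        {
            new Person(null);
        }
        catch (ArgumentNullException ex)
        {
            Console.WriteLine(ex.ParamName); // name
        }
    }
}
```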

C#7

This iteration of the language brings some neat features for making code more succinct and easier to grok at a glance.
Inline declaration of out variables makes using methods with out variables a little prettier. This is not a reason to start using out parameters: I still think that there's normally a better way to do whatever it is that you're trying to achieve with them and use of out and ref parameters is, for me, a warning signal in the code of a place where something unexpected could happen. In particular, using out parameters for methods can make the methods really clunky because you have to set them before returning, making quick returns less elegant. Part of me would have liked them to be set to the default value of the type instead, but I understand the rationale behind the compiler not permitting a return before setting an out parameter: it's far too easy to forget to set it and end up with strange behavior.

I think I can count on one hand the number of times I've written a method with an out or ref parameter and I couldn't even point you at any of them. I totally see the point of ref parameters for high-performance code where it makes sense (like manipulating sound or image data). I just really think that when you use out or ref, you should always ask yourself "is there another way to do this?". Anyway, my opinions on the subject aside, there are times when you have to interact with code not under your control and that code uses out params, for example:

public void PrintNumeric(string input)
{
  int result;
  if (int.TryParse(input, out result))
  {
    Console.WriteLine($"Your number is: {input}");
  }
  else
  {
    Console.WriteLine($"'{input}' is not a number )':");
  }
}

becomes:
public void PrintNumeric(string input)
{
  if (int.TryParse(input, out int result))
  {
    Console.WriteLine($"Your number is: {input}");
  }
  else
  {
    Console.WriteLine($"'{input}' is not a number )':");
  }
}

It's a subtle change, but if you have to use out parameters often enough, it becomes a lot more convenient and less noisy.

Similarly ref locals and returns, where refs are appropriate, can make code much cleaner. The general use-case for these is to return a reference to a non-reference type for performance reasons, for example when you have a large set of ints, bytes, or structs and would like to pass off to another method to find the element you're interested in before modifying it. Instead of the finder returning an index and the outer call re-referencing into the array, the finder method can simply return a reference to the found element so the outer call can do whatever manipulations it needs to. I can see the use case for performance reasons in audio and image processing as well as large sets of structs of data. The example linked above is quite comprehensive and the usage is for more advanced code, so I'm not going to rehash it here.
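That said, the core idea fits in a few lines. This is a minimal sketch of my own, not the linked example. A hypothetical FindFirstNegative returns a reference into the array, so the caller can write through it without knowing the index.

```csharp
using System;

public static class RefReturnDemo
{
    // Returns a reference *into* the array, not a copy of the element.
    public static ref int FindFirstNegative(int[] values)
    {
        for (int i = 0; i < values.Length; i++)
        {
            if (values[i] < 0)
                return ref values[i];
        }
        throw new InvalidOperationException("No negative values");
    }

    public static void Main()
    {
        var samples = new[] { 3, -7, 5, -2 };
        ref int found = ref FindFirstNegative(samples);
        found = 0; // writes straight back into samples[1]
        Console.WriteLine(string.Join(", ", samples)); // 3, 0, 5, -2
    }
}
```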

Tuples have been available in .NET for a long time, but they've been unfortunately cumbersome. The new syntax in C#7 is changing that. Python tuples have always been elegant and now a similar elegance comes to C#:
public static class TippleTuple
{
  public static void Main()
  {
    // good
    (int first, int second) = MakeAnIntsTuple();
    Console.WriteLine($"first: {first}, second: {second}");
    // better
    var (flag, message) = MakeAMixedTuple();
    Console.WriteLine($"first: {flag}, second: {message}");
  }

  public static (int first, int second) MakeAnIntsTuple()
  {
    return (1, 2);
  }

  public static (bool flag, string name) MakeAMixedTuple()
  {
    return (true, "Moo");
  }
}

Some notes, though. The release notes say that you should only need to install the System.ValueTuple NuGet package to support this feature in VS2015 and lower, but I found that I needed to install the package on VS2017 too. The package is required whenever you target a framework that predates the built-in ValueTuple types, such as .NET Framework versions before 4.7. ReSharper still doesn't have a clue about the feature as of 2016.3.2, so usages within the interpolated strings above are highlighted as errors. Still, the program above compiles and is way more elegant than using the Tuple<> generic types. It's very clever that language features can be delivered by NuGet packages, though.
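Tuples shine most when a method naturally produces more than one value, which is exactly where out parameters used to creep in. Here is a minimal sketch (MinMax is a made-up helper), plus the Python-style swap that deconstruction gives you for free.

```csharp
using System;

public static class TupleDemo
{
    // Returns two values without an out parameter or a Tuple<int, int>.
    public static (int min, int max) MinMax(int[] values)
    {
        if (values.Length == 0)
            throw new ArgumentException("values must not be empty", nameof(values));
        var min = values[0];
        var max = values[0];
        foreach (var v in values)
        {
            if (v < min) min = v;
            if (v > max) max = v;
        }
        return (min, max);
    }

    public static void Main()
    {
        var (lo, hi) = MinMax(new[] { 4, -1, 9, 3 });
        Console.WriteLine($"{lo} {hi}"); // -1 9

        // Python-style swap via tuple deconstruction.
        (lo, hi) = (hi, lo);
        Console.WriteLine($"{lo} {hi}"); // 9 -1
    }
}
```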

Local functions provide a mechanism for keeping little bits of re-used code local within a method. In the past, I've used a Func<string> variable where I've had a little piece of re-usable logic:
public static NameValueCollection GetSomeNamedNumericStrings()
{
  var result = new NameValueCollection();
  Func<string> randInt = () => RandomValueGen.GetRandomInt().ToString();
  result["MaxSendAttempts"] = randInt();
  result["BackoffIntervalInMinutes"] = randInt();
  result["BackoffMultiplier"] = randInt();
  result["PurgeMessageWithAgeInDays"] = randInt();
  return result;
}

which can become:
public static NameValueCollection GetSomeNamedNumericStrings()
{
  var result = new NameValueCollection();
  result["MaxSendAttempts"] = RandInt();
  result["BackoffIntervalInMinutes"] = RandInt();
  result["BackoffMultiplier"] = RandInt();
  result["PurgeMessageWithAgeInDays"] = RandInt();
  return result;

  string RandInt () => GetRandomInt().ToString();
}

Which, I think, is neater. There may also be a small performance benefit. The first implementation calls its logic through a delegate, and if the lambda captured any local variables, a closure and a fresh delegate would be allocated on every call. The local function, on the other hand, is compiled into an ordinary method that is called directly. Whilst I wouldn't put it past the JIT to do clever optimisations for the first implementation, I wouldn't expect it.

Throw expressions also offer a neatening to your code:

public int ThrowIfNullResult(Func<int?> source)
{
  var result = source();
  if (result == null)
    throw new InvalidOperationException("Null result is not allowed");
  return result.Value;
}
can become:
public int ThrowIfNullResult(Func<int?> source)
{
  return source() ??
    throw new InvalidOperationException("Null result is not allowed");
}

So now you can actually write a program where every method has one return statement and nothing else.
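Throw expressions are probably most at home in constructor guards and conditional expressions. Here is a short sketch with a hypothetical Plumber class. The expression-bodied constructor is itself one of the C#7 additions.

```csharp
using System;

// Hypothetical class, for illustration only.
public class Plumber
{
    public string Name { get; }

    // Throw expression on the right of ?? inside an expression-bodied constructor.
    public Plumber(string name) =>
        Name = name ?? throw new ArgumentNullException(nameof(name));

    // Throw expression as one arm of the conditional operator.
    public static int ParseLives(string input) =>
        int.TryParse(input, out var lives) && lives >= 0
            ? lives
            : throw new ArgumentException($"'{input}' is not a valid number of lives", nameof(input));

    public static void Main()
    {
        Console.WriteLine(new Plumber("Mario").Name); // Mario
        Console.WriteLine(ParseLives("3"));           // 3
        try
        {
            ParseLives("-1");
        }
        catch (ArgumentException ex)
        {
            Console.WriteLine(ex.ParamName); // input
        }
    }
}
```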



Time for the big one: Pattern Matching. This is a language feature native to F# and anyone who has done some C#-F# crossover has whined about how it's missing from C#, including me. Pattern matching not only elevates the rather bland C# switch statement from having constant-only cases to non-constant ones:

public static string InterpretInput(string input)
{
  switch (input)
  {
    case string now when now == DateTime.Now.ToString("yyyy/MM/dd"):
      return "Today's date";
    default:
      return "Unknown input";
  }
}

It allows type matching:

public static void Main()
{
  var animals = new List<Animal>()
  {
    new Dog() {Name = "Rover"},
    new Cat() {Name = "Grumplestiltskin"},
    new Lizard(),
    new MoleRat()
  };
  foreach (var animal in animals)
    PrintAnimalName(animal);
}

public static void PrintAnimalName(Animal animal)
{
  switch (animal)
  {
    case Dog dog:
      Console.WriteLine($"{dog.Name} is a {dog.GetType().Name}");
      break;
    case Cat cat:
      Console.WriteLine($"{cat.Name} is a {cat.GetType().Name}");
      break;
    case Lizard _:
      Console.WriteLine("Lizards have no name");
      break;
    default:
      Console.WriteLine($"{animal.GetType().Name} is mystery meat");
      break;
  }
}
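Patterns can also carry when guards and match null explicitly, and the same type pattern works with the is operator outside a switch. Here is a sketch using a hypothetical Shape hierarchy.

```csharp
using System;

// Hypothetical shape hierarchy, for illustration only.
public abstract class Shape { }

public class Circle : Shape
{
    public int Radius { get; }
    public Circle(int radius) { Radius = radius; }
}

public class Rectangle : Shape
{
    public int Width { get; }
    public int Height { get; }
    public Rectangle(int width, int height) { Width = width; Height = height; }
}

public static class ShapeDescriber
{
    public static string Describe(Shape shape)
    {
        switch (shape)
        {
            // Pattern cases are tested in order, so the guarded cases come first.
            case Circle c when c.Radius == 0:
                return "a point";
            case Circle c:
                return $"a circle of radius {c.Radius}";
            case Rectangle r when r.Width == r.Height:
                return $"a square of side {r.Width}";
            case Rectangle r:
                return $"a {r.Width}x{r.Height} rectangle";
            case null:
                return "nothing at all";
            default:
                return "an unknown shape";
        }
    }

    // The same type pattern, outside a switch, via the is operator.
    public static bool IsBigCircle(Shape shape) => shape is Circle c && c.Radius > 10;

    public static void Main()
    {
        Console.WriteLine(Describe(new Circle(0)));       // a point
        Console.WriteLine(Describe(new Rectangle(2, 2))); // a square of side 2
        Console.WriteLine(Describe(new Rectangle(2, 3))); // a 2x3 rectangle
        Console.WriteLine(Describe(null));                // nothing at all
        Console.WriteLine(IsBigCircle(new Circle(20)));   // True
    }
}
```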

Other new features include generalized async return types, numeric literal syntax improvements and more expression-bodied members.

Updates in C# 7.1 and 7.2

The full changes are here (7.1) and here (7.2). Whilst the changes are perhaps not mind-blowing, there are some nice ones:
  • Async Main() method now supported
  • Default literals, e.g.: Func<string> foo = default;
  • Inferred tuple element names, so you can construct tuples like you would with anonymous objects
  • Reference semantics with value types allow specifying modifiers:
    • in, which specifies that the argument is passed in by reference but may not be modified by the called code
    • ref readonly, which specifies the reverse: that a result returned by reference may not be modified by the caller
    • readonly struct, which declares the struct immutable, so it can be passed by reference (e.g. as an in argument) without defensive copies
    • ref struct, which specifies that a struct type must always be allocated on the stack and can never be boxed or stored on the heap (Span<T> is the canonical example)
  • More allowance for _ in numeric literals (e.g. a leading underscore after a 0x or 0b prefix, as in 0x_FF)
  • The private protected modifier, which is like protected internal except that it restricts usage to derived types within the same assembly
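Here are a few of these in one place, as a sketch with made-up names. Vector3 is my own struct here, not System.Numerics.Vector3. The project's language version needs to be 7.2 or later.

```csharp
using System;

// readonly struct (C# 7.2): every member is readonly, so the compiler can pass
// it as an 'in' argument without making defensive copies.
public readonly struct Vector3
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3(double x, double y, double z)
    {
        X = x;
        Y = y;
        Z = z;
    }
}

public static class CSharp71And72Demo
{
    // 'in' (C# 7.2): passed by reference, but the callee cannot modify it.
    public static double LengthSquared(in Vector3 v) => v.X * v.X + v.Y * v.Y + v.Z * v.Z;

    public static string InferredTupleNames()
    {
        int count = 3;
        string animal = "cow";
        var herd = (count, animal); // C# 7.1 infers the element names count and animal
        return $"{herd.count} x {herd.animal}";
    }

    public static int DefaultLiteral()
    {
        int answer = default; // C# 7.1: the type comes from the target, so this is default(int)
        return answer;
    }

    public static void Main()
    {
        Console.WriteLine(LengthSquared(new Vector3(1, 2, 2))); // 9
        Console.WriteLine(InferredTupleNames());               // 3 x cow
        Console.WriteLine(DefaultLiteral());                   // 0
        Console.WriteLine(0x_FF);                              // 255
    }
}
```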

Wrapping it up

I guess the point of this illustrated journey is that you really should keep up to date with language features as they emerge:
  • Many features save you time, replacing tedious code with shorter, yet more expressive code
  • Some features provide a level of safety (e.g. string interpolation)
  • Some features give you more power
So if you're writing C# like it's still .NET 2.0, please take a moment to evolve, as the language already has.
