Monday, July 21, 2008

Why I Am Sick Of Hearing About Deferred Execution

Since the announcement of LINQ we've heard plenty about "deferred execution", a term that gets tossed around as if it's some sort of LINQ magic feature.  Personally, I think I need to come up with my own term and claim it's something awesome too.  I'm really tired of hearing about it.

On Wednesday, July 15th I went to a Great Lakes Area .NET Users Group talk by Bill Wagner where he talked about Extension Methods and how to make proper use of them.  Now, don't get me wrong, I have a lot of respect for Bill, and I don't mean to criticize him in any way.  So Bill, if you read this, I really don't mean any disrespect.  It was simply your use of the term that made me recall my feelings on this topic.

Bill was doing a demo in which he walked through various LINQ extension methods and showed that by making use of them we were able to harness the power of DEFERRED EXECUTION!

The first example Bill showed was Enumerable.Range(Int32, Int32), which returns an IEnumerable<Int32>.  Bill then showed that when he called the Take() extension method it only iterated through the first x items in the range, not the full list of items identified by the range.  Ok yes, this is true.  We didn't have to create a new list and populate it with a million items just to pull the first 5 items.
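To make that concrete, here is roughly what that part of the demo boils down to (my reconstruction, not Bill's exact code):

```csharp
using System;
using System.Linq;

class RangeTakeDemo
{
    static void Main()
    {
        // Enumerable.Range produces its values lazily, one at a time.
        // Take(5) means only the first five values are ever generated;
        // no million-item list is built up front.
        var firstFive = Enumerable.Range(0, 1000000).Take(5);

        foreach (var i in firstFive)
        {
            Console.WriteLine(i); // prints 0 through 4
        }
    }
}
```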

Bill later went on to discuss how if you use a LINQ query with variables, you can change those variables after you have defined the query.  His code looked something like the following:


var range = Enumerable.Range(0, 1000000);

var maxValue = 40;

var items = from r in range
            where r < maxValue
            select r;

var takenItems = items.Take(30);

maxValue = 20;

foreach (var i in takenItems)
{
    Console.WriteLine(i);
}



Output:

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Now yes, you define your LINQ query, change your variable after the fact, and then consume the query.  Yes, it takes the change to your variable into account.  Yes, this occurs after you defined your query, so deferred execution is a term that makes sense.

Ok, I'll give in a bit: I'm ok with the term, just not the way it's talked about.  The magic isn't LINQ, and understanding what is going on is not just about understanding LINQ.  It's the fundamentals of how LINQ works that people should really understand.

I'm going to say this one more time before I move on: "Deferred Execution is not a LINQ feature."  It's a closure feature/implementation pattern.

First let me try to explain the implementation pattern piece by creating my own "Deferred Execution" code which works exactly the same way as the Range method Bill demonstrated.  (Note that this is not necessarily built with production quality in mind.)



using System;
using System.Collections;
using System.Collections.Generic;

public class MyRange : IEnumerable<int>
{
    private class RangeEnumerator : IEnumerator<int>
    {
        private int? _current;
        private bool _complete = false;
        private readonly int _minValue;
        private readonly int _maxValue;

        public void Dispose()
        {
        }

        public bool MoveNext()
        {
            if (_current == null)
            {
                _current = _minValue;
                return true;
            }

            if (_current < _maxValue)
            {
                _current += 1;
                return true;
            }
            else
            {
                _complete = true;
                return false;
            }
        }

        public void Reset()
        {
            _current = null;
            _complete = false;
        }

        public int Current
        {
            get
            {
                if (_current == null || _complete)
                {
                    throw new InvalidOperationException();
                }

                return _current.Value;
            }
        }

        object IEnumerator.Current
        {
            get
            {
                return Current;
            }
        }

        public RangeEnumerator(int minValue, int maxValue)
        {
            _minValue = minValue;
            _maxValue = maxValue;
        }
    }

    private readonly int _from;
    private readonly int _to;

    public IEnumerator<int> GetEnumerator()
    {
        return new RangeEnumerator(_from, _to);
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }

    public MyRange(int from, int to)
    {
        _from = from;
        _to = to;
    }
}

That's actually really simple code, isn't it?  There is nothing revolutionary in that code.  Any one of us could have implemented that pattern in C# 1.0.
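As an aside, C# 2.0 iterators let the compiler generate this same pattern for you.  Here's a quick sketch; note that unlike Enumerable.Range, which takes a start and a count, this mirrors MyRange's inclusive from/to bounds:

```csharp
using System.Collections.Generic;

public static class MyRangeHelper
{
    // The compiler turns this iterator block into a state-machine enumerator,
    // much like RangeEnumerator above.  Nothing runs until the caller starts
    // enumerating, so the "deferred execution" falls out for free.
    public static IEnumerable<int> Range(int from, int to)
    {
        for (var i = from; i <= to; i++)
        {
            yield return i;
        }
    }
}
```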

Now, let's look at a case with closures.  LINQ internally uses closures (via lambda expressions) to perform its queries.  So let's say I write my own closure.



var range = new MyRange(0, 1000000);

var maxValue = 40;

Func<int, bool> expression = i => i < maxValue;

maxValue = 20;

foreach (var i in range)
{
    if (!expression(i))
    {
        break;
    }
    else
    {
        Console.WriteLine(i);
    }
}


Output:

0
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19

Huh, wouldn't you know it, it also shows this magical LINQ "deferred execution" behavior.

So what's the point of all this?  First, I'm probably too easily set off on topics like this.  Second, we shouldn't look at "deferred execution" as some sort of LINQ magic, but rather as a pattern that can provide many benefits in our own code.  Deferred execution allows us to enhance the performance and flexibility of our applications.  This is something we can all make use of in our algorithms, even if we aren't utilizing LINQ.

And in regard to the Extension Methods talk, Bill, I really enjoyed it.  It was simple enough for people to learn about new C# 3.0 features, you talked about it well, and you gave good examples.  I'm just frustrated that people seem to write this stuff off as magic even though these are simple concepts.  Plus, this term seemingly just appeared with LINQ even though the concept has been around for a long time.

Saturday, July 12, 2008

Subtle Bugs When Dealing With Threads

Pop quiz, what's wrong with the following code?


public void Unsubscribe()
{
    if (_request != null)
    {
        ThreadPool.Enqueue(() => _service.Unsubscribe(_request));
    }
}

public void Subscribe(string key)
{
    Unsubscribe();

    if (!String.IsNullOrEmpty(key))
    {
        _request = new Request(key, handler);
        ThreadPool.Enqueue(() => _service.Subscribe(_request));
    }
}


Does everyone see the issue? There is a critical bug in the above code which isn't always readily apparent.



Try to find it...



I actually wrote code like this today (same concept, different implementation) and immediately saw some serious defects.  Honestly, I'm lucky the issues popped up right away; these sorts of things tend not to appear immediately, but jump up to bite you at a later point.


In this case the issue is the use of closures.  A closure doesn't copy values into the lambda expression; it captures the variables themselves (and since _request is a field, the lambda captures this and reads _request when it runs).  So in the above case the Unsubscribe lambda gets executed on a new thread (from the pool), but by the time it actually executes, _request has already been changed.
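A stripped-down illustration of the capture behavior (hypothetical names, nothing to do with the subscription code):

```csharp
using System;

class CaptureDemo
{
    static void Main()
    {
        var value = 1;

        // The lambda captures the variable itself, not a snapshot of its value.
        Func<int> read = () => value;

        value = 2;

        Console.WriteLine(read()); // prints 2, not 1
    }
}
```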



In this case, you're actually unsubscribing from a request that most likely hasn't even been subscribed yet.  And to top it off, you haven't unsubscribed from the old request either.  Obviously the above is a race condition where the exact outcome isn't guaranteed.  There is a chance it works perfectly (though that's doubtful with a true thread pool).  There is also a chance the new request is subscribed first and then immediately unsubscribed.



The simplest way to resolve this issue is to change the captured variable from one which is shared between both closures to one that is unique to each closure, as shown here:



public void Unsubscribe()
{
    if (_request != null)
    {
        var localRequest = _request;
        ThreadPool.Enqueue(() => _service.Unsubscribe(localRequest));
    }
}

public void Subscribe(string key)
{
    Unsubscribe();

    if (!String.IsNullOrEmpty(key))
    {
        _request = new Request(key, handler);
        var localRequest = _request;
        ThreadPool.Enqueue(() => _service.Subscribe(localRequest));
    }
}



However, to solve this type of problem going forward, what I really want is to develop something which is process aware, much like the Saga in NServiceBus.  Of course my goal is not a long-running, persistable process like the NServiceBus Saga, but the process portion is what I'm looking at.
