Data comprehension syntaxes
The order of clauses in SQL is as follows:
SELECT expression FROM data_source WHERE condition
Python features a similar syntax in its list comprehensions. (In this article I don’t discuss syntax like Haskell’s list comprehensions or Scala’s for-comprehensions.)
expression for item in data_source if condition
You can see that these two are roughly parallel. The order of clauses is: the expression, the data source, and then the filtering condition.
But the LINQ syntax in C# orders these clauses differently:
from data_source where condition select expression
Why is it different from the established standard? It turns out it’s a deliberate design decision by the C# team.
It’s a better mental model
One reason I’ve heard given by C# team members is that the SQL syntax doesn’t make very much sense. It might be intuitive from a natural-language standpoint, but it fails to accurately model how we can think about the data processing.
When doing the database scan, of course we have to first identify where we are getting data from. Then we have to scan each row and determine if it meets the condition, and then we add some expression in terms of the row to the set of results that we are going to return.
It’s arguably intuitive to go from more general (the entire aggregate data source) to more specific (the exact transformations on the specific data we want from the data source). This is reflected in how one would write out these loops by hand:
for foo in data_source: if foo + 1 > 10: yield foo
It’s better for tooling
One problem with the SQL syntax is that when you’ve written this:
you have a hard time of autocompleting the word starting with “f”. It could be a column name from any table in the database!
C# doesn’t have this problem because you only write the filter and expression after you’ve written the data source:
from data_source where f // easily autocompleted from data_source where foo + 1 > 10 select f // likewise
The usefulness of autocomplete in exploratory programming shouldn’t be understated; this alone is a compelling reason to use a different order for these clauses.