.NET Programming Technologies

Gergely Kovásznai, Csaba Biró

Eszterházy Károly College

Query Operators

In this section, we walk through the major query operators, divided into categories. But first, let us show an interesting aspect of them (and of LINQ queries in general), which is called deferred execution. With a few exceptions, query operators are executed not at the moment when the query is constructed, but rather when the result of the query is iterated through. Consider the following query as an example:

List<int> numbers = new List<int>() { 1, 2 };
IEnumerable<int> query = numbers.Select(n => n * 10);
numbers.Add(3);
foreach (int n in query)
    textBlock.Text += string.Format("{0} ", n);

The text in the textBlock will be "10 20 30"; i.e., the number 3 that we “sneaked” into the list also appears in the result of the query, since the query is only executed when it is iterated through by foreach.

All the query operators we are going to introduce in the subsequent sections provide deferred execution, except for the ones in the section Nondeferred Operators.
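The difference between deferred and immediate execution can be sketched by contrasting the query above with a copy made by ToList, one of the nondeferred operators. This is only an illustration, written as top-level statements (assuming C# 9 or later); the variable names snapshot, deferredResult, and snapshotResult are ours.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

List<int> numbers = new List<int> { 1, 2 };

// Deferred: nothing is computed yet, the query only remembers its source.
IEnumerable<int> query = numbers.Select(n => n * 10);

// Immediate: ToList() runs the query now and stores a copy of the result.
List<int> snapshot = query.ToList();

numbers.Add(3);

// The deferred query is evaluated here, over the current content 1, 2, 3.
string deferredResult = string.Join(" ", query);
// The snapshot was taken before 3 was added.
string snapshotResult = string.Join(" ", snapshot);

Console.WriteLine(deferredResult);  // 10 20 30
Console.WriteLine(snapshotResult);  // 10 20
```

Note that a deferred query is evaluated again each time it is enumerated, so iterating it twice may yield different results if the source changes in between.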

Filtering

Some operators are for filtering the elements of collections. In examples for comprehension syntax, we have already met the where keyword (which corresponds to the Where extension method), which is for returning those elements of a collection that fulfill a given condition. Let us see one more example (in comprehension syntax)!

IEnumerable<Book> selBooks = from b in books
                             where b.ReleaseDate.Year >= 2000
                             select b;

Overall, one can use the following extension methods for filtering:

Filtering operators

Operator    Parameter   Description
Where       T=>bool     Returns the elements that fulfill a condition.
Distinct    -           Returns only distinct elements.
Take        int         Returns the first n elements.
TakeWhile   T=>bool     Returns the leading elements until reaching an element that does not fulfill the condition.
Skip        int         Skips the first n elements and returns the rest.
SkipWhile   T=>bool     Skips the leading elements until reaching an element that does not fulfill the condition, and returns the rest (including that element).
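To illustrate how TakeWhile and SkipWhile differ from Where, here is a minimal sketch on a small array of our own choosing (the values and variable names are only for illustration). Both operators stop testing the condition at the first element that fails it, whereas Where tests every element.

```csharp
using System;
using System.Linq;

int[] values = { 1, 2, 5, 3, 8, 1 };

// Takes elements from the front while the condition holds: stops at 5.
int[] leadingSmall = values.TakeWhile(v => v < 5).ToArray();

// Skips elements from the front while the condition holds: starts at 5.
int[] rest = values.SkipWhile(v => v < 5).ToArray();

// Tests every element, regardless of its position.
int[] allSmall = values.Where(v => v < 5).ToArray();

Console.WriteLine(string.Join(" ", leadingSmall));  // 1 2
Console.WriteLine(string.Join(" ", rest));          // 5 3 8 1
Console.WriteLine(string.Join(" ", allSmall));      // 1 2 3 1
```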

The Take and Skip operators might be quite useful in real-world applications, since, by using them, one can split the result of a query into smaller chunks, e.g., for displaying only 20 elements at a time:

IEnumerable<Book> selBooks = books
    .Where(b => b.Title.Contains("world war"))
    .Take(20);
...
selBooks = books
    .Where(b => b.Title.Contains("world war"))
    .Skip(20)
    .Take(20);
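The paging pattern above can be generalized. The following sketch assumes a helper of our own, Page, with a zero-based page index; it is not part of LINQ, only a thin wrapper around Skip and Take.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: returns the page with the given zero-based index.
IEnumerable<T> Page<T>(IEnumerable<T> source, int pageIndex, int pageSize)
{
    return source.Skip(pageIndex * pageSize).Take(pageSize);
}

// Sample data: the numbers 1..50 stand in for the result of a query.
IEnumerable<int> items = Enumerable.Range(1, 50);

List<int> firstPage = Page(items, 0, 20).ToList();  // elements 1..20
List<int> lastPage = Page(items, 2, 20).ToList();   // elements 41..50

Console.WriteLine(firstPage.Count);  // 20
Console.WriteLine(lastPage.Count);   // 10
Console.WriteLine(lastPage[0]);      // 41
```

Take does not fail when fewer elements remain than requested, so the last page is simply shorter.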

The Distinct operator is very useful in any application that uses databases (Section XVII). Let us now show a rather unconventional example, which filters the distinct letters out of the characters of a string (treated as a collection) and, furthermore, sorts them in alphabetical order (cf. the next section):

IEnumerable<char> letters = "Hello World"
    .Where(c => char.IsLetter(c))
    .Distinct()
    .OrderBy(c => char.ToUpper(c));

Ordering

Although we have already shown several examples of ordering, one can additionally specify the direction of sorting and more than one level of sorting. In comprehension syntax, one can use the orderby keyword, which is capable of multi-level sorting, and the descending keyword, for sorting in descending order. For instance, let us sort persons’ data in alphabetical order (primarily by first name and secondarily by last name), and, as a third level, in descending order by date of birth!

IEnumerable<Person> selPersons = from p in persons
                                 orderby p.FirstName, p.LastName,
                                         p.DateOfBirth descending
                                 select p;

Overall, the following extension methods can be used for ordering:

Ordering operators

Operator                              Parameter   Description
OrderBy, ThenBy                       T=>TKey     Ascending order.
OrderByDescending, ThenByDescending   T=>TKey     Descending order.
Reverse                               -           Reverse order.

As can be seen, the lambda expressions used as parameters must select a “key” (TKey) to sort by. The above example can also be written in another way:

IEnumerable<Person> selPersons = persons
    .OrderBy(p => p.FirstName)
    .ThenBy(p => p.LastName)
    .ThenByDescending(p => p.DateOfBirth);

Projection

In each example we have given for comprehension syntax so far, the select keyword stands at the end of the query. By using select, one can actually choose the data to be included in the query result (as a collection). In the above examples, this kind of expression appeared only in the form of "select x", where x was a variable that occurred in the query. However, right after select one can use an arbitrary expression; this expression might contain x (and, of course, it usually does). Let us see a few examples:

IEnumerable<string> firstNames = from p in persons
                                 select p.FirstName;

IEnumerable<string> personNames = from p in persons
                                  select p.FirstName + " " + p.LastName;

IEnumerable<int> personAges = from p in persons
                              select (DateTime.MinValue +
                                      DateTime.Now.Subtract(p.DateOfBirth)
                                     ).Year - 1;

In the last example, a fairly complex expression stands right after select, in order to compute the age of a given person. Adding the elapsed time span to DateTime.MinValue (whose year is 1) yields a date whose Year is one more than the number of full years elapsed, hence the subtraction of 1.

In each of the above examples, only a single value stands right after select. What to do if one would like to select more than one value and return them all together? This can only be done by wrapping the selected data in an object. The next example selects the identifier and the name of each person; therefore, they are wrapped in an instance of a class (PersonSimple), defined by us elsewhere.

IEnumerable<PersonSimple> selPersons = from p in persons
                                       select new PersonSimple
                                       {
                                           Id = p.Id,
                                           Name = p.FirstName + " " + p.LastName
                                       };

Let us imagine that, in our application, there exist many different queries related to the Person class! In one of them persons’ identifiers and names are selected, in another one names and ages (such as in the example below), in a third one identifiers, jobs, and dates of birth, and so on. It is very complicated and tedious to define a separate “wrapper” class for each kind of selection. Anonymous classes in C# provide a convenient solution to this problem.

For each anonymous instantiation written in the source code, the compiler automatically defines an anonymous class. If two instantiations contain properties of the same type and of the same name (and in the same order), then the same anonymous class will be instantiated.

var selPersons = from p in persons
                 select new
                 {
                     Name = p.FirstName + " " + p.LastName,
                     // DateTime.MinValue is in year 1, hence the subtraction of 1.
                     Age = (DateTime.MinValue + DateTime.Now.Subtract(p.DateOfBirth)
                           ).Year - 1
                 };

Since, in the above example, the collection resulting from the query consists of instances of an anonymous class (defined by the compiler), one cannot write down the explicit type of the selPersons variable (one does not know what class name to write right after IEnumerable). The var keyword was introduced in C# 3.0 exactly for such cases.

The type of a variable declared by the var keyword is defined precisely, even if developers do not always realize this fact. This type is inferred by the compiler at compile time, and is the same as that of the right-hand-side expression used for initializing the variable. For example, the type of x will be double in the following initialization:

var x = 15 / 2.3;

Besides satisfying the “laziness” of developers, the usage of var is unavoidable when using anonymous classes. We have already seen such an example above, but another good example could be a loop that traverses a collection (selPersons) containing instances of an anonymous class:

foreach (var p in selPersons) { ... }

Altogether, two extension methods can be used for projection:

Projecting operators

Operator     Parameter                 Description
Select       T=>TResult                Projects each element to a TResult object.
SelectMany   T=>IEnumerable<TResult>   Projects each element to a collection of TResult objects.
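The difference between the two projecting operators can be sketched on a small example of our own (the sample strings and variable names are hypothetical). Select keeps one result per source element, while SelectMany, as implemented in LINQ, concatenates the projected collections into a single flat sequence.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

string[] sentences = { "deferred execution", "query operators in LINQ" };

// Select: one result per sentence, i.e., a collection of word arrays.
List<string[]> nested = sentences.Select(s => s.Split(' ')).ToList();

// SelectMany: the word arrays are flattened into one sequence of words.
List<string> words = sentences.SelectMany(s => s.Split(' ')).ToList();

Console.WriteLine(nested.Count);              // 2
Console.WriteLine(words.Count);               // 6
Console.WriteLine(string.Join(", ", words));  // deferred, execution, query, operators, in, LINQ
```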

Due to space constraints, we are not going to dig deep into the projecting operators (especially not into SelectMany); instead, we recommend the related literature listed in the References section. Among the various possibilities detailed in the literature, we would like to mention only one related to select, namely that of nested subqueries. The point is the following: an expression right after select is allowed to include even another query (which is also allowed to include further ones). In the next example, a list of non-system directories is retrieved, together with the non-hidden files within each directory:

System.IO.DirectoryInfo[] dirs = ...;
var query = from d in dirs
            where (d.Attributes & System.IO.FileAttributes.System) == 0
            select new
            {
                DirectoryName = d.FullName,
                Created = d.CreationTime,
                Files = from f in d.GetFiles()
                        where (f.Attributes & System.IO.FileAttributes.Hidden) == 0
                        select new
                        {
                            Filename = f.Name,
                            Length = f.Length
                        }
            };

Note that the result of the query will be a collection of anonymous class objects, each of which describes a directory. It is an especially interesting feature that this anonymous class has a Files property, which is a collection of instances of another anonymous class.

Grouping

In certain queries, one might want to split a collection into smaller chunks, by considering a certain criterion. In comprehension syntax, one can use the keywords group...by for this purpose. For instance, let us group the employees of a company by sections!

List<Person> persons = ...;
var personGroups = from p in persons
                   group p.FirstName + " " + p.LastName by p.Section;
foreach (var pGroup in personGroups)
{
    Console.WriteLine("Section: {0}", pGroup.Key);
    foreach (var p in pGroup)
        Console.WriteLine("\t{0}", p);
}

As can be seen, in the result of the query, the keys of distinct groups can be accessed via the Key property; in the above example, the key is the name of a section.

Of course, an arbitrary expression can be used between the group and by keywords (e.g., the instantiation of an anonymous class); in this regard, group...by follows exactly the same rules as select does.[10] Let us show an example of grouping files by extension!

System.IO.FileInfo[] files = ...;
var query = from f in files
            group new
            {
                Name = f.Name.ToUpper(),
                Date = f.CreationTime
            } by f.Extension;

The sole grouping operator is implemented by the following extension method:

Grouping operator

Operator   Parameters                Description
GroupBy    T=>TKey [,T=>TResult]     Groups the elements (and transforms them into TResult objects).

Notice that the second parameter of the GroupBy method, which is for customizing projection, is optional.
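As a sketch of calling GroupBy directly with both parameters, the grouping of employees by section could look as follows. The sample data (tuples with hypothetical section names) stands in for the Person class, which is defined elsewhere.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data; the sections are made up for this illustration.
var persons = new[]
{
    (FirstName: "Mary", LastName: "Butcher", Section: "Sales"),
    (FirstName: "Victor", LastName: "Hugo", Section: "Research"),
    (FirstName: "Sándor", LastName: "Kovács", Section: "Sales")
};

// First parameter: the key selector; second (optional): the element projection.
var personGroups = persons.GroupBy(
    p => p.Section,
    p => p.FirstName + " " + p.LastName);

List<string> lines = personGroups
    .Select(g => g.Key + ": " + string.Join(", ", g))
    .ToList();

foreach (string line in lines)
    Console.WriteLine(line);
// Sales: Mary Butcher, Sándor Kovács
// Research: Victor Hugo
```

Without the second parameter, each group would contain the original elements instead of the concatenated names.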

It is also worth mentioning the into keyword, which makes a grouping accessible via an identifier later in the query.

The into keyword is for “saving” a projection, i.e., the resulting collection of a projection can be accessed via the identifier given right after into. The projection can be either a select or a group...by clause.

In the following example, groups of such files are returned whose extensions do not exceed 10 characters, and, furthermore, the resulting groups are sorted by the number of elements, in ascending order.

var query = from f in files
            group new
            {
                Name = f.Name.ToUpper(),
                Date = f.CreationTime
            } by f.Extension
            into g
            where g.Key.Length <= 10
            orderby g.Count()
            select g;

Join

In applications that use databases (Section XVII), it is essential to join tables. There exist several kinds of joins, e.g., inner join, left join, right join, cross join etc. Besides the tools that have already been introduced in the previous sections, LINQ provides an additional opportunity for connecting collections with each other, and this is available through the join...on...equals triple keyword in the comprehension syntax. Let us see an example that joins a collection of persons (persons) and a collection of travels (travels), and generates a list about who travelled where:

var query = from p in persons
            join t in travels on p.Id equals t.PersonId
            select string.Format("{0} {1} travelled to {2}",
                                 p.FirstName, p.LastName, t.Destination);

This query returns a collection of such strings:

Mary Butcher travelled to New Zealand

Mary Butcher travelled to Prague

Victor Hugo travelled to Naples

Sándor Kovács travelled to Zalaegerszeg

The above join can be considered typical, since the elements of the two collections are matched by comparing an identifier of one (p.Id) with the corresponding reference in the other (t.PersonId). By the way, the above join is an inner join, meaning that persons who did not travel anywhere will not occur in the output.

It is also possible to realize joins on multiple keys, as follows:

var query = from x in seqX
            join y in seqY on new { K1 = x.Prop1, K2 = x.Prop2 }
                       equals new { K1 = y.Prop3, K2 = y.Prop4 }
            select new { x, y };

Here we exploit the fact that the same properties (having the same names and types) are used in both instantiations, therefore the compiler will instantiate the same anonymous class. Thus, equality checking will work as expected.
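The equality behaviour that the multi-key join relies on can be checked directly. The following sketch uses made-up sample data (tuples with the property names of the example, plus hypothetical Name and Label fields): two anonymous instantiations with the same property names and types share one compiler-generated class, whose Equals compares property values.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var a = new { K1 = 1, K2 = "x" };
var b = new { K1 = 1, K2 = "x" };

bool sameClass = a.GetType() == b.GetType();  // same anonymous class
bool sameValue = a.Equals(b);                 // compared property by property

// Sample sequences; only the first pair agrees on both keys.
var seqX = new[] { (Prop1: 1, Prop2: "a", Name: "x1"), (Prop1: 2, Prop2: "b", Name: "x2") };
var seqY = new[] { (Prop3: 1, Prop4: "a", Label: "y1"), (Prop3: 2, Prop4: "z", Label: "y2") };

List<string> matches = (from x in seqX
                        join y in seqY on new { K1 = x.Prop1, K2 = x.Prop2 }
                                   equals new { K1 = y.Prop3, K2 = y.Prop4 }
                        select x.Name + "-" + y.Label).ToList();

Console.WriteLine(sameClass);                   // True
Console.WriteLine(sameValue);                   // True
Console.WriteLine(string.Join(", ", matches));  // x1-y1
```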

Comprehension syntax supports even left joins, meaning (in terms of the previous example) that every person will occur in the output, even those who have not travelled anywhere. In order to realize this, we need to write an into clause right after the join (cf. the previous section). The previous example can be altered accordingly:

var query = from p in persons
            join t in travels on p.Id equals t.PersonId
            into personTravels
            select new
            {
                PersonName = string.Format("{0} {1}", p.FirstName, p.LastName),
                Travels = personTravels
            };

foreach (var pt in query)
{
    Console.WriteLine(pt.PersonName);
    if (pt.Travels.Count() == 0)
        Console.WriteLine(" did not travel anywhere");
    else
    {
        Console.WriteLine(" travelled to:");
        foreach (var t in pt.Travels)
            Console.WriteLine("\t{0}", t.Destination);
    }
}

In the above source code, we traverse the query result by a loop, and check each person whether he or she has travelled to at least one place.

All the joins that have been introduced above are implemented by the following two extension methods:

Join operators

Operator    Parameters                                                          Description
Join        IEnumerable<T2>, T=>TKey, T2=>TKey, (T,T2)=>TResult                 Joins a collection of elements of type T and a collection of elements of type T2, and returns a collection of elements of type TResult.
GroupJoin   IEnumerable<T2>, T=>TKey, T2=>TKey, (T,IEnumerable<T2>)=>TResult    Same as Join, but the result is further grouped by the elements of type T.
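The join...into query shown earlier corresponds to a direct call of GroupJoin. The following sketch uses simplified sample data of our own (tuples instead of the Person and travel classes; the person "Anna Smith" is hypothetical and has no travels).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var persons = new[]
{
    (Id: 1, Name: "Mary Butcher"),
    (Id: 2, Name: "Victor Hugo"),
    (Id: 3, Name: "Anna Smith")   // hypothetical person without travels
};
var travels = new[]
{
    (PersonId: 1, Destination: "New Zealand"),
    (PersonId: 1, Destination: "Prague"),
    (PersonId: 2, Destination: "Naples")
};

// The last lambda receives each person together with all matching travels.
var query = persons.GroupJoin(
    travels,
    p => p.Id,
    t => t.PersonId,
    (p, personTravels) => new
    {
        PersonName = p.Name,
        Travels = personTravels.Select(t => t.Destination).ToList()
    });

List<string> summary = query
    .Select(pt => pt.PersonName + ": " + pt.Travels.Count)
    .ToList();

foreach (string line in summary)
    Console.WriteLine(line);
// Mary Butcher: 2
// Victor Hugo: 1
// Anna Smith: 0
```

Every person appears in the result exactly once, with an empty collection when there is no match; this is what makes the construct usable as a left join.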

Finally, let us show an example that performs multiple joins![11] Let us first use comprehension syntax: besides joining persons and travels, we also connect a collection of travel expenses (expenses) to the query, in order to list who spent how much and where.

var query = from p in persons
            join t in travels on p.Id equals t.PersonId
            join e in expenses on t.Id equals e.TravelId
            select new
            {
                PersonName = string.Format("{0} {1}", p.FirstName, p.LastName),
                Amount = e.Amount,
                Place = t.Destination
            };

By directly using extension methods, the same result could be achieved, as follows:

var query = persons
    .Join(travels, p => p.Id, t => t.PersonId, (p, t) => new
    {
        Person = p,
        Travel = t
    })
    .Join(expenses, pt => pt.Travel.Id, e => e.TravelId, (pt, e) => new
    {
        PersonName = string.Format("{0} {1}",
                                   pt.Person.FirstName, pt.Person.LastName),
        Amount = e.Amount,
        Place = pt.Travel.Destination
    });

This example illustrates pretty well how much easier and more intuitive it is to use comprehension syntax, in most cases. On the other hand, calling extension methods directly gives more flexibility.

Nondeferred Operators

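As mentioned before, all the operators introduced in the previous sections provide deferred execution. In a real-world application, however, it is sooner or later necessary to “back up” some part of the current query result, such as the whole result or only a single element. The operators in this section serve this purpose: they execute the query immediately, at the moment they are called.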

Conversion. The whole result of a query can be “backed up” into a collection. That is, whatever happens to the source of the query in the future, the current query result will be preserved in the exported collection (e.g., an array or a list). Let us see an example:

List<Car> selCars = (from c in cars
                     where c.Manufacturer == "Suzuki"
                     orderby c.ManufactureDate descending
                     select c
                    ).ToList();

The following extension methods can be used for exporting/converting:

Conversion operators

Operator       Parameter   Description
ToArray        -           Converts an IEnumerable<T> collection into a T[] array.
ToList         -           Converts an IEnumerable<T> collection into a List<T> list.
ToDictionary   T=>TKey     Converts an IEnumerable<T> collection into a Dictionary<TKey,T> dictionary.

As can be seen, when converting into a Dictionary (i.e., hash table), one must specify what to consider as the key of elements.
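A minimal sketch of ToDictionary follows, assuming hypothetical car data keyed by a made-up PlateNumber field (the Car class of the examples is defined elsewhere, so tuples are used here).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Sample data; PlateNumber is a hypothetical unique identifier.
var cars = new[]
{
    (PlateNumber: "ABC-123", Manufacturer: "Suzuki"),
    (PlateNumber: "XYZ-789", Manufacturer: "Opel")
};

// The lambda selects the key; the elements themselves become the values.
var carsByPlate = cars.ToDictionary(c => c.PlateNumber);

string manufacturer = carsByPlate["ABC-123"].Manufacturer;
bool hasUnknown = carsByPlate.ContainsKey("QQQ-000");

Console.WriteLine(manufacturer);  // Suzuki
Console.WriteLine(hasUnknown);    // False
```

The selected keys must be unique: ToDictionary throws an ArgumentException if two elements produce the same key.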

Element. If one does not want to export a whole collection but rather a single element, then the following extension methods can be used:

Element operators

Operator    Parameter   Description
First       [T=>bool]   Returns the first element (that fulfills the optional condition).
Last        [T=>bool]   Returns the last element (that fulfills the optional condition).
ElementAt   int         Returns the element at the given index.
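The element operators can be tried out on a small array of our own (the numbers and variable names are only for illustration). The sketch also shows that First throws an InvalidOperationException when no element fulfills the condition.

```csharp
using System;
using System.Linq;

int[] numbers = { 4, 8, 15, 16, 23, 42 };

int first = numbers.First();                    // 4
int firstOdd = numbers.First(n => n % 2 == 1);  // 15
int lastEven = numbers.Last(n => n % 2 == 0);   // 42
int third = numbers.ElementAt(2);               // 15 (indices start at 0)

// No element is greater than 100, so First cannot return anything.
bool threw = false;
try
{
    numbers.First(n => n > 100);
}
catch (InvalidOperationException)
{
    threw = true;
}

Console.WriteLine("{0} {1} {2} {3} {4}", first, firstOdd, lastEven, third, threw);
// 4 15 42 15 True
```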

Aggregate. It is often necessary to “aggregate” a collection in order to extract certain data, such as the average of the elements or the number of elements. The following aggregating extension methods can be used:

Aggregating operators

Operator       Parameter      Description
Count          [T=>bool]      Returns the number of elements (that fulfill the optional condition).
Min, Max       [T=>TResult]   Returns the minimal and the maximal element, respectively. A projecting expression can also be specified.
Sum, Average   [T=>TResult]   Returns the sum and the average of the elements, respectively. A projecting expression can also be specified.

It is worth understanding the usefulness of the optional T=>TResult parameter, which belongs to certain aggregating operators. Let us assume that our query returns car objects and we would like to calculate their average price. One way to do this:

double average = (from c in cars
                  where c.ManufactureDate.Year >= 2005
                  select c
                 ).Average(c => c.Price);
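The other aggregating operators follow the same pattern. Here is a sketch with made-up car data (tuples with hypothetical Model, Year, and Price fields, the price being a double):

```csharp
using System;
using System.Linq;

var cars = new[]
{
    (Model: "A", Year: 2003, Price: 1000.0),
    (Model: "B", Year: 2006, Price: 2000.0),
    (Model: "C", Year: 2010, Price: 4000.0)
};

// The optional condition of Count replaces a separate Where call.
int recentCount = cars.Count(c => c.Year >= 2005);

// The projecting lambda selects the value to aggregate.
double recentAverage = cars.Where(c => c.Year >= 2005).Average(c => c.Price);
double maxPrice = cars.Max(c => c.Price);
double total = cars.Sum(c => c.Price);

Console.WriteLine(recentCount);    // 2
Console.WriteLine(recentAverage);  // 3000
Console.WriteLine(maxPrice);       // 4000
Console.WriteLine(total);          // 7000
```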

Quantifiers. For certain tasks, it might be necessary to check whether (all) the elements of a collection fulfill a given condition. This can be done by using the following extension methods:

Quantifier operators

Operator   Parameter   Description
Contains   T           Checks whether a given element can be found in the collection.
Any        [T=>bool]   Is there any element (that fulfills the optional condition)?
All        T=>bool     Do all the elements fulfill the condition?
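A short sketch of the quantifiers on sample data of our own follows. Note the behaviour on an empty collection: Any returns false, while All returns true, since there is no element that violates the condition.

```csharp
using System;
using System.Linq;

int[] years = { 1998, 2004, 2011 };
int[] empty = new int[0];

bool has2004 = years.Contains(2004);         // true
bool anyRecent = years.Any(y => y >= 2010);  // true (2011)
bool allRecent = years.All(y => y >= 2000);  // false (1998)

bool emptyAny = empty.Any();                 // false: no elements at all
bool emptyAll = empty.All(y => y > 0);       // true: nothing violates the condition

Console.WriteLine("{0} {1} {2} {3} {4}", has2004, anyRecent, allRecent, emptyAny, emptyAll);
// True True False False True
```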



[10] In fact, group...by is also a projecting operator (like select). However, the resulting collection is not “flat”, but rather two-level.

[11] Comprehension syntax supports arbitrarily many joins, one after the other.