Writings of Urda

Using LINQ to Extract Information from XML in C#

Yesterday I talked about using C# to extract information from a simple XML file. Well today we can take it one step further. Instead of using the regular XML library and commands, we can use LINQ to build a query to extract the information we desire, and place it into our object list.

In case you are not familiar with LINQ, here is a quick overview of it:

Language Integrated Query (LINQ, pronounced “link”) is a Microsoft .NET Framework component that adds native data querying capabilities to .NET languages.

LINQ defines a set of method names (called standard query operators, or standard sequence operators), along with translation rules from so-called query expressions to expressions using these method names, lambda expressions and anonymous types. These can, for example, be used to project and filter data in arrays, enumerable classes, XML (XLINQ), relational database, and third party data sources. Other uses, which utilize query expressions as a general framework for readably composing arbitrary computations, include the construction of event handlers or monadic parsers.

Source: Language Integrated Query - Wikipedia

So if we modify the previous program to use a LINQ statement instead, we can use logic and syntax that look a lot like a SQL statement. But instead of accessing a SQL database, we are querying an array or some other in-memory chunk of data. In our case we are using LINQ to run a query against a chunk of XML data.
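To make the "SQL against an array" idea concrete, here is a minimal sketch that is not part of the original program. The class name LinqOverArray and the method EvenNumbersAscending are made up for illustration; the query keywords (from, where, orderby, select) are standard C# query syntax.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper class, used only for this illustration.
public static class LinqOverArray
{
    // Returns the even numbers from the input array, smallest first.
    public static List<int> EvenNumbersAscending(int[] numbers)
    {
        // Reads much like SQL: pick a source, filter it, sort it, project it.
        var query = from n in numbers
                    where n % 2 == 0
                    orderby n
                    select n;

        // The query only runs when it is enumerated; ToList() forces that.
        return query.ToList();
    }
}
```

Calling LinqOverArray.EvenNumbersAscending(new[] { 7, 4, 1, 10, 2 }) gives back a list holding 2, 4, and 10. The XML version later in this post follows the same shape, only the data source changes.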

So let’s cut to the chase. Here is the modified source code from the last post:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

namespace UsingXML
{
    // We establish a generic Person Class, and define necessary methods
    class PersonObject
    {
        public string fname { get; set; }
        public string lname { get; set; }
        public int age { get; set; }
        public char gender { get; set; }

        public PersonObject()
        {
            this.fname = null;
            this.lname = null;
            this.age = 0;
            this.gender = '0';
        }
        public PersonObject(string f, string l, int a, char g)
        {
            this.fname = f;
            this.lname = l;
            this.age = a;
            this.gender = g;
        }
    }

    class ReadAndLoad
    {
        // Declare a public XDocument for use
        public static XDocument XDoc;

        static void Main(string[] args)
        {
            // Prompt the user for file path, provide the current local
            // directory if the user wants to use a relative path.
            Console.Write("Current Local Path: ");
            Console.WriteLine(Environment.CurrentDirectory);
            Console.Write("Path to file? > ");
            string UserPath = Console.ReadLine();

            // Try to open the XML file
            try
            {
                Console.WriteLine("\nNow Loading: {0}\n", UserPath);
                XDoc = XDocument.Load(@UserPath);
            }
            // Catch "File Not Found" errors
            catch (System.IO.FileNotFoundException)
            {
                Console.WriteLine("No file found!");
                Environment.Exit(1);
            }
            // Catch Argument Exceptions
            catch (System.ArgumentException)
            {
                Console.WriteLine("Invalid path detected!");
                Environment.Exit(1);
            }
            // Catch all other errors, and print them to console.
            catch (Exception err)
            {
                Console.WriteLine("An Exception has been caught:");
                Console.WriteLine(err);
                Environment.Exit(1);
            }

            // Define a new List, to store the objects we pull out of the XML
            List<PersonObject> PersonList = new List<PersonObject>();

            // Build a LINQ query, and run through the XML building
            // the PersonObjects
            var query = from xml in XDoc.Descendants("Person")
                        select new PersonObject
                        {
                            fname = (string)xml.Element("FirstName"),
                            lname = (string)xml.Element("LastName"),
                            age = (int)xml.Element("Age"),
                            gender = ((string)xml.Element("Gender") == "M" ?
                                'M' :
                                'F')
                        };
            PersonList = query.ToList();

            // How many PersonObjects did we find in the XML?
            int ListSize = PersonList.Count;

            // Handle statements for 0, 1, or many PersonObjects
            if (ListSize == 0)
            {
                Console.WriteLine("File contains no PersonObjects.\n");
                Environment.Exit(0);
            }
            else if (ListSize == 1)
                Console.WriteLine("Contains 1 PersonObject:\n");
            else
                Console.WriteLine("Contains {0} PersonObjects:\n", ListSize);

            // Loop through the list, and print all the PersonObjects to screen
            for (int i = 0; i < ListSize; i++)
            {
                Console.WriteLine(" PersonObject {0}", i);
                Console.WriteLine("------------------------------------------");
                Console.WriteLine(" First Name : {0}", PersonList[i].fname);
                Console.WriteLine(" Last Name  : {0}", PersonList[i].lname);
                Console.WriteLine(" Age ...... : {0}", PersonList[i].age);
                Console.WriteLine(" Gender ... : {0}", PersonList[i].gender);
                Console.Write("\n");
            }

            // ...and we are done!
            Environment.Exit(0);
        }
    }
}

But here is where the real magic is:

// Build a LINQ query, and run through the XML building
// the PersonObjects
var query = from xml in XDoc.Descendants("Person")
    select new PersonObject
    {
        fname = (string)xml.Element("FirstName"),
        lname = (string)xml.Element("LastName"),
        age = (int)xml.Element("Age"),
        gender = ((string)xml.Element("Gender") == "M" ? 'M' : 'F')
    };
PersonList = query.ToList();

Looks a lot like SQL, huh? This statement grabs every element in the XML named “Person”. For each one it creates a new PersonObject through the parameterless constructor and uses an object initializer to set each property. The conditional (ternary) expression inside the query sets the char for the gender based on the string retrieved from the XML, since a string cannot be cast directly to a char. Note that any value other than “M”, including a missing Gender element, ends up as 'F'.
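The query syntax is translated by the compiler into ordinary method calls, so the same extraction can be written with Select and a lambda. The sketch below is an illustration only: the class MethodSyntaxSketch and the method DescribePeople are hypothetical names, and it returns formatted strings instead of PersonObject instances so that it stands alone.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class, used only for this illustration.
public static class MethodSyntaxSketch
{
    // Returns one "FirstName LastName (gender)" string per Person element.
    public static List<string> DescribePeople(string xmlText)
    {
        XDocument doc = XDocument.Parse(xmlText);

        // Method-syntax equivalent of "from p in ... select ...".
        return doc.Descendants("Person")
                  .Select(p => string.Format("{0} {1} ({2})",
                      (string)p.Element("FirstName"),
                      (string)p.Element("LastName"),
                      // Same gender logic as the post: "M" maps to 'M',
                      // anything else (or a missing element) maps to 'F'.
                      (string)p.Element("Gender") == "M" ? 'M' : 'F'))
                  .ToList();
    }
}
```

Which style to use is mostly a matter of taste. Query syntax tends to read better for simple projections like this one, while method syntax is handy when you want to chain operators that have no query keyword.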

Now for comparison, let’s look at the difference between the two sources:

--- ReadAndLoad.cs	2010-08-31 22:34:53.082425983 -0400
+++ ReadAndLoad02.cs	2010-08-31 22:34:50.682829175 -0400
@@ -1,6 +1,7 @@
 using System;
 using System.Collections.Generic;
-using System.Xml;
+using System.Linq;
+using System.Xml.Linq;

 namespace UsingXML
 {
@@ -12,6 +13,13 @@
         public int age { get; set; }
         public char gender { get; set; }

+        public PersonObject()
+        {
+            this.fname = null;
+            this.lname = null;
+            this.age = 0;
+            this.gender = '0';
+        }
         public PersonObject(string f, string l, int a, char g)
         {
             this.fname = f;
@@ -23,6 +31,9 @@

     class ReadAndLoad
     {
+        // Declare a public XDocument for use
+        public static XDocument XDoc;
+
         static void Main(string[] args)
         {
             // Prompt the user for file path, provide the current local
@@ -31,15 +42,12 @@
             Console.WriteLine(Environment.CurrentDirectory);
             Console.Write("Path to file? > ");
             string UserPath = Console.ReadLine();
-
-            // Declare a new XML Document
-            XmlDocument XmlDoc = new XmlDocument();
-
+
             // Try to open the XML file
             try
             {
                 Console.WriteLine("\nNow Loading: {0}\n", UserPath);
-                XmlDoc.Load(UserPath);
+                XDoc = XDocument.Load(@UserPath);
             }
             // Catch "File Not Found" errors
             catch (System.IO.FileNotFoundException)
@@ -61,26 +69,23 @@
                 Environment.Exit(1);
             }

-            // Declare the xpath for finding objects inside the XML file
-            XmlNodeList XmlDocNodes = XmlDoc.SelectNodes("/People/Person");
-
             // Define a new List, to store the objects we pull out of the XML
             List<PersonObject> PersonList = new List<PersonObject>();

-            // Loop through the nodes, extracting Person information.
-            // We can then define a person object and add it to the list.
-            foreach (XmlNode node in XmlDocNodes)
-            {
-                int TempAge = int.Parse(node["Age"].InnerText);
-                char TempGender = node["Gender"].InnerText[0];
-
-                PersonObject obj = new PersonObject(node["FirstName"].InnerText,
-                                                    node["LastName"].InnerText,
-                                                    TempAge,
-                                                    TempGender);
-                PersonList.Add(obj);
-            }
-
+            // Build a LINQ query, and run through the XML building
+            // the PersonObjects
+            var query = from xml in XDoc.Descendants("Person")
+                        select new PersonObject
+                        {
+                            fname = (string)xml.Element("FirstName"),
+                            lname = (string)xml.Element("LastName"),
+                            age = (int)xml.Element("Age"),
+                            gender = ((string)xml.Element("Gender") == "M" ?
+                                'M' :
+                                'F')
+                        };
+            PersonList = query.ToList();
+
             // How many PersonObjects did we find in the XML?
             int ListSize = PersonList.Count;

And just in case you were unsure about the gender logic part of the query, I ran this dataset:

<?xml version="1.0" encoding="utf-8" ?>
<People>
  <Person>
    <Name>Urda</Name>
    <Age>21</Age>
    <Gender>M</Gender>
  </Person>
  <Person>
    <Name>White</Name>
    <Age>30</Age>
    <Gender>M</Gender>
  </Person>
  <Person>
    <Name>Smith</Name>
    <Age>25</Age>
    <Gender>F</Gender>
  </Person>
</People>

Pretty sweet huh? So in review, we built a query to run against XML, and used LINQ statements and methods to retrieve each portion of data that we wanted. All of this was then shoved into a list at the end for storing. So hopefully this gives you a quick intro to start messing around with LINQ if you are inclined to do so.
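One thing worth knowing if you do start messing around: the casts used in the query behave differently when an element is missing. Casting a missing element to string gives null, but casting it to int throws an exception, while casting it to the nullable int? gives null instead. The sketch below shows one way to supply fallback values; the class MissingElementSketch and its method names are hypothetical and not part of the program above.

```csharp
using System.Xml.Linq;

// Hypothetical helper class, used only for this illustration.
public static class MissingElementSketch
{
    // Returns the Age of the given Person element, or the fallback
    // value when there is no Age child element.
    public static int AgeOrDefault(XElement person, int fallback)
    {
        // (int?) yields null for a missing element instead of throwing.
        int? age = (int?)person.Element("Age");
        return age ?? fallback;
    }

    // Returns the FirstName of the given Person element, or the
    // fallback value when there is no FirstName child element.
    public static string FirstNameOrDefault(XElement person, string fallback)
    {
        // (string) yields null for a missing element.
        return (string)person.Element("FirstName") ?? fallback;
    }
}
```

For a Person element that has an Age of 21 but no FirstName, AgeOrDefault returns 21 and FirstNameOrDefault returns whatever fallback you passed in.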

Extracting Information From XML With C#

XML is a wonderful way to store information that needs to be read in by a machine or piece of software. It is simple to follow, and you can use it to store and transmit your custom data structures and information across an internet connection or in between bits of software on a local machine. C# has methods built in that can read and write XML files. So today I have put together a little program that will extract a few objects from an XML file in C# for you to see.

Say we want to store simple C# objects that describe a person. A person in our case has a first name, last name, age, and gender. So we can structure our XML file as shown below.

2People.xml:

<?xml version="1.0" encoding="utf-8" ?>
<People>
  <Person>
    <FirstName>Test</FirstName>
    <LastName>Urda</LastName>
    <Age>21</Age>
    <Gender>M</Gender>
  </Person>
  <Person>
    <FirstName>Joe</FirstName>
    <LastName>White</LastName>
    <Age>30</Age>
    <Gender>M</Gender>
  </Person>
</People>

This XML is easy to follow, and it would be very simple for a person to just extract the information by hand in this case. But what if we had 100 Person Objects? 1,000? 1,000,000,000? It would be much easier if we could write some software to do the extraction of this information for us. If you were to write such software, it could very well look a lot like the following:

using System;
using System.Collections.Generic;
using System.Xml;

namespace UsingXML
{
    // We establish a generic Person Class, and define necessary methods
    class PersonObject
    {
        public string fname { get; set; }
        public string lname { get; set; }
        public int age { get; set; }
        public char gender { get; set; }

        public PersonObject(string f, string l, int a, char g)
        {
            this.fname = f;
            this.lname = l;
            this.age = a;
            this.gender = g;
        }
    }

    class ReadAndLoad
    {
        static void Main(string[] args)
        {
            // Prompt the user for file path, provide the current local
            // directory if the user wants to use a relative path.
            Console.Write("Current Local Path: ");
            Console.WriteLine(Environment.CurrentDirectory);
            Console.Write("Path to file? > ");
            string UserPath = Console.ReadLine();

            // Declare a new XML Document
            XmlDocument XmlDoc = new XmlDocument();

            // Try to open the XML file
            try
            {
                Console.WriteLine("\nNow Loading: {0}\n", UserPath);
                XmlDoc.Load(UserPath);
            }
            // Catch "File Not Found" errors
            catch (System.IO.FileNotFoundException)
            {
                Console.WriteLine("No file found!");
                Environment.Exit(1);
            }
            // Catch Argument Exceptions
            catch (System.ArgumentException)
            {
                Console.WriteLine("Invalid path detected!");
                Environment.Exit(1);
            }
            // Catch all other errors, and print them to console.
            catch (Exception err)
            {
                Console.WriteLine("An Exception has been caught:");
                Console.WriteLine(err);
                Environment.Exit(1);
            }

            // Declare the xpath for finding objects inside the XML file
            XmlNodeList XmlDocNodes = XmlDoc.SelectNodes("/People/Person");

            // Define a new List, to store the objects we pull out of the XML
            List<PersonObject> PersonList = new List<PersonObject>();

            // Loop through the nodes, extracting Person information.
            // We can then define a person object and add it to the list.
            foreach (XmlNode node in XmlDocNodes)
            {
                int TempAge = int.Parse(node["Age"].InnerText);
                char TempGender = node["Gender"].InnerText[0];

                PersonObject obj = new PersonObject(node["FirstName"].InnerText,
                                                    node["LastName"].InnerText,
                                                    TempAge,
                                                    TempGender);
                PersonList.Add(obj);
            }

            // How many PersonObjects did we find in the XML?
            int ListSize = PersonList.Count;

            // Handle statements for 0, 1, or many PersonObjects
            if (ListSize == 0)
            {
                Console.WriteLine("File contains no PersonObjects.\n");
                Environment.Exit(0);
            }
            else if (ListSize == 1)
                Console.WriteLine("Contains 1 PersonObject:\n");
            else
                Console.WriteLine("Contains {0} PersonObjects:\n", ListSize);

            // Loop through the list, and print all the PersonObjects to screen
            for (int i = 0; i < ListSize; i++)
            {
                Console.WriteLine(" PersonObject {0}", i);
                Console.WriteLine("------------------------------------------");
                Console.WriteLine(" First Name : {0}", PersonList[i].fname);
                Console.WriteLine(" Last Name  : {0}", PersonList[i].lname);
                Console.WriteLine(" Age ...... : {0}", PersonList[i].age);
                Console.WriteLine(" Gender ... : {0}", PersonList[i].gender);
                Console.Write("\n");
            }

            // ...and we are done!
            Environment.Exit(0);
        }
    }
}

In a nutshell this program follows a specific routine:

1. Declare a PersonObject class
2. Prompt user for a path, attempt to open the file or handle any exceptions
3. Declare a C# list for storing the PersonObjects
4. Read in each node that matches from the XML document
5. Extract the key-value pairs and load them into the appropriate variables for each PersonObject
6. Display the results to the user, and exit

C# has plenty more methods for reading and handling XML files and XML based information. This is just a very basic example, but it does help a lot if you are just starting to learn C# programming and XML handling. If you are looking for some further reading, you may want to read up on the XmlDocument Class over at MSDN.
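As one example of what else XmlDocument can do, the XPath passed to SelectNodes can filter as well as locate nodes. The sketch below is an assumption-laden illustration, not part of the program above: the class XPathFilterSketch and the method LastNamesOlderThan are made-up names, and the XML is loaded from a string with LoadXml rather than from a file.

```csharp
using System.Collections.Generic;
using System.Xml;

// Hypothetical helper class, used only for this illustration.
public static class XPathFilterSketch
{
    // Returns the LastName of every Person whose Age is greater than minAge.
    public static List<string> LastNamesOlderThan(string xmlText, int minAge)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xmlText);

        // The [Age > N] predicate keeps only the Person nodes that match;
        // the trailing /LastName selects that child of each match.
        string xpath = "/People/Person[Age > " + minAge + "]/LastName";

        List<string> names = new List<string>();
        foreach (XmlNode node in doc.SelectNodes(xpath))
        {
            names.Add(node.InnerText);
        }
        return names;
    }
}
```

Run against the contents of 2People.xml with a minAge of 25, this returns a list holding only "White", since the other person is 21.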

Lambda Expressions and Delegates in C#

In a previous post I discussed chaining C# delegates together. In the source code example, I created a generic DelegateMath class to house a few basic math operations. This time we will replace those functions with simpler and shorter lambda expressions.

So what exactly is a lambda expression? What does it have to do with C#? Our friends over at MSDN have this to say:

A lambda expression is an anonymous function that can contain expressions and statements, and can be used to create delegates or expression tree types.

All lambda expressions use the lambda operator =>, which is read as “goes to”. The left side of the lambda operator specifies the input parameters (if any) and the right side holds the expression or statement block. The lambda expression x => x * x is read “x goes to x times x.”

Source: Lambda Expressions (C# Programming Guide)

If we wanted to build a simple delegate using a lambda expression, it could look something like this:

delegate int del(int i);
del ADelegate = x => x * x;
int j = ADelegate(5); // j = 25

So why use lambda expressions with delegates at all? First of all, they can be used anywhere an anonymous delegate can be used. Their defining characteristic is that they can also be converted to expression trees (which LINQ providers such as LINQ to SQL use to translate queries), while anonymous delegates cannot.
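Here is a small sketch of that difference. The same lambda text is assigned once to a delegate type and once to an Expression type; the class ExpressionTreeSketch and its member names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Linq.Expressions;

// Hypothetical helper class, used only for this illustration.
public static class ExpressionTreeSketch
{
    // Assigned to a delegate type, the lambda becomes executable code.
    public static readonly Func<int, int> SquareDelegate = x => x * x;

    // Assigned to Expression<...>, the same lambda becomes a data
    // structure that describes the code and can be inspected.
    public static readonly Expression<Func<int, int>> SquareExpression = x => x * x;

    // Turns the expression tree back into a delegate and invokes it.
    public static int SquareViaExpression(int value)
    {
        Func<int, int> compiled = SquareExpression.Compile();
        return compiled(value);
    }
}
```

Both SquareDelegate(5) and SquareViaExpression(5) produce 25, but only the expression version lets you look inside: SquareExpression.Body.NodeType is ExpressionType.Multiply. Trying to assign an anonymous delegate written with the delegate keyword to the Expression field does not compile.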

If we take the same example from the previous post using delegates, we can modify it to use lambda expressions instead…

using System;

namespace DelegateChainWithLambda
{
    // Declare the delegate type
    delegate void UrdaDelegate(ref int x);

    public class DelegateChainExampleWithLambda
    {
        public static void Main(string[] args)
        {
            // Create delegate objects from lambda expressions.
            UrdaDelegate delegate01 = (ref int a) => { a = a + 1; };
            UrdaDelegate delegate02 = (ref int b) => { b = b * 2; };
            UrdaDelegate delegate03 = (ref int c) => { c = c + 3; };

            // Chain the delegates together. The variable manipulation will
            // start with delegate01, then delegate02, and end with delegate03.
            UrdaDelegate DelegateChain = delegate01 + delegate02 + delegate03;

            // Define our value, and build the expected result from it
            int value = 5;
            int expected = ((value + 1) * 2) + 3;

            // Build a string for explaining the output
            string ExplanationString = "\n";
            ExplanationString += "DelegateChain(5) should produce " + expected;
            ExplanationString += " since ((" + value + " + 1) * 2) + 3 = ";
            ExplanationString += expected;

            // Pass the value into the delegate chain
            DelegateChain(ref value);

            // ...and write the explanation and result to console!
            Console.WriteLine(ExplanationString);
            Console.WriteLine("RESULT: " + value + "\n");
        }
    }
}

…and when we run the program we get this output:

DelegateChain(5) should produce 15 since ((5 + 1) * 2) + 3 = 15
RESULT: 15

We still get the exact same output (cool story, I know). However, we have removed the need for the math class and made the code a little easier to follow.

Delegate Chain of Command

Another cool thing about delegates is the ability to chain them together. Say for example you have an object modification process, and you need a given object to be manipulated in a very specific order. Well you could use a delegate chain to accomplish that. For a simple example I have written up a C# delegate chain program that evaluates a mathematical expression following the order of operations by using a delegate chain.

using System;

namespace DelegateChain
{
    // Declare the delegate type
    delegate void UrdaDelegate(ref int x);

    // A general class to store methods for use by delegates
    public class DelegateMath
    {
        public static void AddOne(ref int a)
        {
            a = a + 1;
        }
        public static void TimesTwo(ref int b)
        {
            b = b * 2;
        }
        public static void PlusThree(ref int c)
        {
            c = c + 3;
        }
    }

    public class DelegateChainExample
    {
        public static void Main(string[] args)
        {
            // Create delegate objects based on our created methods.
            UrdaDelegate delegate01 = new UrdaDelegate(DelegateMath.AddOne);
            UrdaDelegate delegate02 = new UrdaDelegate(DelegateMath.TimesTwo);
            UrdaDelegate delegate03 = new UrdaDelegate(DelegateMath.PlusThree);

            // Chain the delegates together. The variable manipulation will
            // start with delegate01, then delegate02, and end with delegate03.
            UrdaDelegate DelegateChain = delegate01 + delegate02 + delegate03;

            // Define our value, and build the expected result from it
            int value = 5;
            int expected = ((value + 1) * 2) + 3;

            // Build a string for explaining the output
            string ExplanationString = "\n";
            ExplanationString += "DelegateChain(5) should produce " + expected;
            ExplanationString += " since ((" + value + " + 1) * 2) + 3 = ";
            ExplanationString += expected;

            // Pass the value into the delegate chain
            DelegateChain(ref value);

            // ...and write the explanation and result to console!
            Console.WriteLine(ExplanationString);
            Console.WriteLine("RESULT: " + value + "\n");
        }
    }
}

…and when we run the program we get this output:

DelegateChain(5) should produce 15 since ((5 + 1) * 2) + 3 = 15
RESULT: 15

Sure, you could have just evaluated the mathematical expression by itself without writing all of that code. But the point is that you can link delegates together and manipulate a given variable at each function in a specific order. This can be extended to more complex methods, where ordered execution is mission critical.
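To see that the order really matters, here is a minimal sketch that chains the same two steps in both orders. It is an illustration under assumed names: StepDelegate, ChainOrderSketch, and its methods are hypothetical, and the steps are written as lambdas to keep the example short.

```csharp
using System;

// Hypothetical delegate type, mirroring the shape of UrdaDelegate.
public delegate void StepDelegate(ref int x);

// Hypothetical helper class, used only for this illustration.
public static class ChainOrderSketch
{
    // Chains the given steps in the order supplied and runs them on start.
    public static int Run(int start, params StepDelegate[] steps)
    {
        StepDelegate chain = null;
        foreach (StepDelegate step in steps)
        {
            chain += step; // appends to the end of the invocation list
        }

        int value = start;
        if (chain != null)
        {
            chain(ref value); // invokes every step, first added first
        }
        return value;
    }

    // (start + 1) * 2
    public static int AddThenDouble(int start)
    {
        StepDelegate addOne = (ref int v) => { v = v + 1; };
        StepDelegate timesTwo = (ref int v) => { v = v * 2; };
        return Run(start, addOne, timesTwo);
    }

    // (start * 2) + 1
    public static int DoubleThenAdd(int start)
    {
        StepDelegate addOne = (ref int v) => { v = v + 1; };
        StepDelegate timesTwo = (ref int v) => { v = v * 2; };
        return Run(start, timesTwo, addOne);
    }
}
```

With a starting value of 5, AddThenDouble gives 12 and DoubleThenAdd gives 11, so swapping two links in the chain changes the result.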

Testing and You

Today marked the beginning of my real co-op experience and work tasks here at Mercer. The morning started off with a conference call with dev-team members from China, India, and the rest of us here in Louisville via teleconferencing. The reason for the meeting was to demo the new features and functionality of the latest release for a Mercer product. Deployment will be occurring later this week, but before that could happen some last minute testing needed to be done.

I had the opportunity to get a rundown of the internals of the product, and from there performed testing on a new file manager to check for any missed bugs or problems. Throughout all of this I learned how Mercer takes a project from conception through testing to final delivery in production. While no single step is an easy task, Mercer’s method produces undeniable results and amazing software.
