Simple XML Parser in C#

In this article we will look at how to read a website's sitemap.xml file with C# and parse its contents.

An XML sitemap is a specially structured XML file that lists the pages of a website, with optional metadata about each one, so that search engine crawlers can find and index them. A basic sitemap looks like this:

<urlset xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.sitemaps.org/schemas/sitemap/0.9" xsi:schemaLocation="http://www.sitemaps.org/schemas/sitemap/0.9 http://www.sitemaps.org/schemas/sitemap/0.9/sitemap.xsd">

<url>
<loc>https://lonewolfonline.net/</loc>
<priority>1.0</priority>
<lastmod>2010-09-14</lastmod>
<changefreq>daily</changefreq>
</url>

<url>
<loc>https://lonewolfonline.net/simple-xml-parser/</loc>
<priority>0.5</priority>
<lastmod>2009-09-14</lastmod>
<changefreq>monthly</changefreq>
</url>

</urlset>

Each <url> element is nested inside the root <urlset> element and represents one page on the site. Inside each <url> are four child elements. Only <loc> is required by the sitemap protocol; the other three are optional.

The <loc> element contains the full URL of the page.

The <priority> element gives the webmaster's priority for this page relative to other pages on the same site, from 0.0 to 1.0 (the default is 0.5).

The <lastmod> element holds the date on which the page was last modified, in W3C Datetime format (for example, 2010-09-14).

The <changefreq> element indicates how often the page is likely to change, which hints to search engines how often to crawl it again. Valid values are always, hourly, daily, weekly, monthly, yearly and never.
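The element descriptions above map naturally onto a small typed model. The sketch below is an alternative to the XmlDocument approach used in this article: it uses LINQ to XML (System.Xml.Linq) and converts <priority> and <lastmod> into numeric and date types. Because only <loc> is required, the other fields are nullable. SitemapEntry and SitemapParser are hypothetical names chosen for this example, and the record type assumes C# 9 or later.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Globalization;
using System.Linq;
using System.Xml.Linq;

// Hypothetical type for one <url> entry; the optional fields are nullable.
public record SitemapEntry(string Loc, double? Priority, DateTimeOffset? LastMod, string? ChangeFreq);

public static class SitemapParser
{
    // Every sitemap element lives in this namespace, so names must be qualified with it.
    private static readonly XNamespace Sm = "http://www.sitemaps.org/schemas/sitemap/0.9";

    public static List<SitemapEntry> Parse(string xml)
    {
        XDocument doc = XDocument.Parse(xml);

        return doc.Root!
            .Elements(Sm + "url")
            .Where(url => url.Element(Sm + "loc") != null) // <loc> is required; skip entries without it
            .Select(url => new SitemapEntry(
                Loc: url.Element(Sm + "loc")!.Value.Trim(),
                Priority: ParsePriority((string?)url.Element(Sm + "priority")),
                LastMod: ParseLastMod((string?)url.Element(Sm + "lastmod")),
                ChangeFreq: (string?)url.Element(Sm + "changefreq")))
            .ToList();
    }

    // Priority is a decimal from 0.0 to 1.0; parse it independently of the current culture.
    private static double? ParsePriority(string? text) =>
        double.TryParse(text, NumberStyles.Float, CultureInfo.InvariantCulture, out double value)
            ? value
            : (double?)null;

    // lastmod uses W3C Datetime format; a bare date such as 2010-09-14 is treated as UTC midnight.
    private static DateTimeOffset? ParseLastMod(string? text) =>
        DateTimeOffset.TryParse(text, CultureInfo.InvariantCulture, DateTimeStyles.AssumeUniversal, out DateTimeOffset value)
            ? value
            : (DateTimeOffset?)null;
}
```

The Sm + "url" qualification matters: because the sitemap declares a default namespace, calling doc.Root.Elements("url") without it would return no elements at all.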

Writing a Simple XML Parser in C#

For this example I am creating a small console application and writing the results to the screen. I am also reading the sitemap from a local file, but you could just as easily download it from a website instead.
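If you would rather fetch the sitemap from a live site, here is one possible sketch, assuming a recent .NET with C# 8 or later. It downloads the XML with HttpClient when given an http or https URL and otherwise falls back to loading a local file, as the console example does. SitemapLoader and LoadSitemapAsync are hypothetical names chosen for this example.

```csharp
#nullable enable
using System;
using System.Net.Http;
using System.Threading.Tasks;
using System.Xml;

public static class SitemapLoader
{
    // A single shared HttpClient, reused for every request.
    private static readonly HttpClient Http = new HttpClient();

    // Loads a sitemap from an http(s) URL, or otherwise from a local file path.
    public static async Task<XmlDocument> LoadSitemapAsync(string source)
    {
        var doc = new XmlDocument();

        bool isWebUrl = Uri.TryCreate(source, UriKind.Absolute, out Uri? uri)
            && (uri.Scheme == Uri.UriSchemeHttp || uri.Scheme == Uri.UriSchemeHttps);

        if (isWebUrl)
        {
            // Download the sitemap as a string, then parse it.
            string xml = await Http.GetStringAsync(uri);
            doc.LoadXml(xml);
        }
        else
        {
            // Anything else is treated as a path on disk.
            doc.Load(source);
        }

        return doc;
    }
}
```

In a console app you could then write XmlDocument urldoc = await SitemapLoader.LoadSitemapAsync("https://example.com/sitemap.xml"); inside an async Task Main (C# 7.1 or later), where the URL is only a placeholder. For simple cases XmlDocument.Load also accepts an HTTP URL directly; going through HttpClient gives you control over timeouts and request headers.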

You can also download a sample project from GitHub.

using System;
using System.Xml;

namespace SitemapXMLParser
{
    class Program
    {
        static void Main(string[] args)
        {
            // Load the sitemap from a local file. XmlDocument.Load also accepts an HTTP URL.
            XmlDocument urldoc = new XmlDocument();
            urldoc.Load("Sitemap.xml");

            // Every <url> element in the document.
            XmlNodeList xnList = urldoc.GetElementsByTagName("url");

            foreach (XmlNode node in xnList)
            {
                // Only <loc> is required by the sitemap protocol, so use ?. to avoid
                // a NullReferenceException when an optional element is missing.
                Console.WriteLine("url " + node["loc"]?.InnerText);
                Console.WriteLine("priority " + node["priority"]?.InnerText);
                Console.WriteLine("last modified " + node["lastmod"]?.InnerText);
                Console.WriteLine("change frequency " + node["changefreq"]?.InnerText);
                Console.WriteLine(); // blank line between entries
            }
        }
    }
}
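GetElementsByTagName works in the program above because it matches on the element's name, and the sitemap elements carry no prefix. XPath is stricter: an unprefixed name in an XPath 1.0 query only matches elements in no namespace, so a query such as /urlset/url finds nothing in a sitemap that declares the default namespace. The sketch below is a namespace-aware alternative, not the article's own method. It binds the namespace to a prefix with XmlNamespaceManager and skips any <url> that lacks the required <loc>. SitemapXPath and ReadUrls are hypothetical names, and the code assumes C# 8 or later.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Xml;

public static class SitemapXPath
{
    private const string SitemapNs = "http://www.sitemaps.org/schemas/sitemap/0.9";

    // Returns (loc, changefreq) for every <url>; changefreq is null when the element is absent.
    public static List<(string Loc, string? ChangeFreq)> ReadUrls(XmlDocument doc)
    {
        // XPath 1.0 has no default namespace, so bind the sitemap namespace to a
        // prefix ("sm" is an arbitrary choice) and use it in every step of the query.
        var ns = new XmlNamespaceManager(doc.NameTable);
        ns.AddNamespace("sm", SitemapNs);

        var result = new List<(string Loc, string? ChangeFreq)>();
        XmlNodeList? urls = doc.SelectNodes("/sm:urlset/sm:url", ns);
        if (urls == null) return result;

        foreach (XmlNode url in urls)
        {
            string? loc = url.SelectSingleNode("sm:loc", ns)?.InnerText;
            if (loc == null) continue; // <loc> is required; skip malformed entries

            string? changeFreq = url.SelectSingleNode("sm:changefreq", ns)?.InnerText;
            result.Add((loc, changeFreq));
        }

        return result;
    }
}
```

A list of tuples keeps the sketch short; in a larger program you would more likely map each entry onto a small class or record.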
