
How to Create an XML Sitemap Manually

Build a valid sitemap.xml by hand without any plugin: the exact urlset syntax, which tags Google actually reads, how to validate, and how to submit it to Search Console.

SZ
Founder, Molixa

To create an XML sitemap manually, you write a plain sitemap.xml file that lists every important URL on your site inside a <urlset> element, save it to your site root, and submit it in Google Search Console. No plugin, no CMS, no build tool required. The file is just text, and once you understand the four tags that matter, you can hand-write a valid sitemap in a few minutes.

This guide is for people on hand-coded sites, static-site generators, or frameworks where you do not have a one-click sitemap button. You will get the exact syntax, the tags Google actually reads (and the ones it quietly ignores), how to validate before you submit, and when a single file is no longer enough.

What an XML Sitemap Actually Is

An XML sitemap is a machine-readable list of the URLs you want search engines to know about. It does not boost rankings on its own. What it does is help crawlers discover pages faster and understand your site structure, which matters most for new sites, large sites, and pages that are not well linked internally.

Think of it as a table of contents you hand to Googlebot. The crawler still decides what to index, but you have removed the excuse of "we never found that page." For small, well-linked sites Google will usually find everything anyway, but a sitemap removes guesswork and speeds up discovery of fresh content.

Key point: a sitemap is a discovery aid, not an indexing guarantee. Listing a URL does not force Google to index it. A page can sit in your sitemap and still be reported as "Crawled - currently not indexed" if Google judges it low value.

The Minimum Valid sitemap.xml

Here is the smallest sitemap that passes validation. Every manual sitemap is a variation on this skeleton.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
  <url>
    <loc>https://example.com/blog/first-post</loc>
  </url>
</urlset>

Three rules make this work, and breaking any one of them is the usual cause of a "sitemap could not be read" error:

  • The xmlns namespace on <urlset> is mandatory, and the XML declaration should be the first line of the file. Leave the namespace off and validators reject the file.
  • Every <loc> must be a full absolute URL including the protocol (https://), not /about or example.com/about.
  • The file must be UTF-8 encoded and the URLs must be properly escaped (more on the ampersand trap below).
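If you would rather script the skeleton than type it, here is a minimal sketch using Python's standard library. The function name build_sitemap is made up for this example. It enforces the absolute-URL rule and lets ElementTree handle the namespace, encoding, and escaping.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return UTF-8 sitemap bytes for a list of absolute URLs (illustrative helper)."""
    # Registering an empty prefix makes ElementTree write xmlns="..." on the root
    ET.register_namespace("", SITEMAP_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        # Rule 2: every <loc> must be absolute, protocol included
        if not url.startswith(("http://", "https://")):
            raise ValueError(f"not an absolute URL: {url!r}")
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        # ElementTree escapes &, < and > in text when serializing
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    # xml_declaration=True prepends the <?xml ...?> line (Python 3.8+)
    return ET.tostring(urlset, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    print(build_sitemap(["https://example.com/", "https://example.com/about"]).decode("utf-8"))
```

The output is written on one line rather than indented like the hand-written version. That is still valid XML; whitespace between tags is optional.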

The four sitemap tags, ranked by how much Google cares

This is the part most tutorials get wrong. The sitemap protocol defines four child tags inside each <url>, but Google treats them very differently in 2026.

Tag | What it means | Does Google use it?
--- | --- | ---
<loc> | The page URL | Yes. Required. This is the whole point.
<lastmod> | Last meaningful modification date | Yes, if it is accurate and honest.
<changefreq> | How often the page changes | No. Google has said it ignores this.
<priority> | Relative importance, 0.0 to 1.0 | No. Google ignores this too.

Google's own documentation confirms it reads <loc> and uses <lastmod> as a crawl signal, while <changefreq> and <priority> are effectively dead values. Other engines may still glance at them, so they are harmless to include, but do not spend a second tuning a priority of 0.8 versus 0.7. It changes nothing for Google.

The one tag worth your attention is <lastmod>. Set it to a real ISO 8601 date like 2026-06-25 or 2026-06-25T14:30:00+00:00. Crucially, only update it when the page content genuinely changes. If every URL claims it was modified today on every crawl, Google learns to distrust your dates and ignores the signal entirely.
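One way to keep <lastmod> honest is to tie it to a fingerprint of the page content and only move the date when the fingerprint changes. The sketch below is an illustration, not a standard technique from the protocol. The update_lastmod helper and the record layout are assumptions for this example, and hashing the full HTML would also react to template tweaks, so you may prefer to hash only the main content.

```python
import hashlib
from datetime import date


def update_lastmod(record, content, today):
    """Return an updated {'hash', 'lastmod'} record for one page.

    record  -- previous record for the URL, or {} if the page is new
    content -- the page content you consider meaningful (assumption: a str)
    today   -- a datetime.date used only when the content has changed
    """
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    if record.get("hash") == digest:
        # Content unchanged: keep the old date instead of bumping it
        return record
    # New or changed content: store the new hash and an ISO 8601 date
    return {"hash": digest, "lastmod": today.isoformat()}


if __name__ == "__main__":
    first = update_lastmod({}, "<h1>Pricing</h1>", date(2026, 6, 20))
    later = update_lastmod(first, "<h1>Pricing</h1>", date(2026, 6, 25))
    print(first["lastmod"], later["lastmod"])
```

Run on every deploy, this leaves the date alone for pages you did not touch, which is exactly the behavior that keeps the signal credible.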

How to Create an XML Sitemap Manually, Step by Step

Here is the full hand-coded workflow, from a blank file to a submitted, validated sitemap. Knowing how to create an XML sitemap manually means you never depend on a plugin that might break, lag behind, or list URLs you never wanted crawled.

Step 1: List every URL you want indexed

Open a spreadsheet or text file and write out the canonical URL for each page you want in search results. Include your homepage, key landing pages, blog posts, and product pages.

Deliberately leave out anything you do not want surfaced: thank-you pages, internal search results, tag archives, paginated duplicates, login screens, and any URL that carries a noindex directive or returns a non-200 status. A sitemap full of redirects, 404s, and noindexed pages sends mixed signals and is the most common reason Search Console reports sitemap warnings.
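If your URL list comes out of a crawl export, the same filtering can be scripted. This is a rough sketch under stated assumptions: the input is a list of (url, status_code, is_noindex) tuples you have already collected, and the excluded path prefixes are hypothetical placeholders you would replace with your own.

```python
from urllib.parse import urlsplit

# Hypothetical path prefixes to keep out of the sitemap; adjust for your site
EXCLUDED_PREFIXES = ("/thank-you", "/search", "/tag/", "/login")


def select_sitemap_urls(candidates):
    """Filter (url, status_code, is_noindex) tuples down to sitemap-worthy URLs."""
    selected = []
    seen = set()
    for url, status, noindex in candidates:
        parts = urlsplit(url)
        if status != 200 or noindex:
            continue  # redirects, errors, and noindexed pages stay out
        if parts.query:
            continue  # parameterized URLs (e.g. ?page=2) rarely belong here
        if parts.path.startswith(EXCLUDED_PREFIXES):
            continue
        if url in seen:
            continue  # drop exact duplicates
        seen.add(url)
        selected.append(url)
    return selected


if __name__ == "__main__":
    crawl = [
        ("https://example.com/", 200, False),
        ("https://example.com/thank-you", 200, False),
        ("https://example.com/old-page", 301, False),
        ("https://example.com/pricing", 200, False),
    ]
    print(select_sitemap_urls(crawl))
```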

Step 2: Wrap each URL in the urlset structure

Create a file named exactly sitemap.xml. Add the XML declaration and the opening <urlset> tag with the namespace, then wrap each URL from your list in a <url> / <loc> pair.

<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2026-06-25</lastmod>
  </url>
  <url>
    <loc>https://example.com/pricing</loc>
    <lastmod>2026-06-20</lastmod>
  </url>
</urlset>

Watch the special characters. Inside a <loc>, an ampersand must be written as &amp;, not a bare &. So a URL like https://example.com/search?q=a&page=2 becomes https://example.com/search?q=a&amp;page=2. A single unescaped ampersand will break the entire file. (As a rule, parameterized URLs rarely belong in a sitemap anyway.)
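If you assemble the file with string templates rather than an XML library, escape each URL before you drop it into <loc>. A minimal sketch with Python's xml.sax.saxutils.escape follows; the loc_element helper name is made up for this example.

```python
from xml.sax.saxutils import escape


def loc_element(url):
    """Wrap a raw (not yet escaped) URL in a <loc> element."""
    # escape() handles &, < and >; the extra mapping also covers quotes
    return "<loc>" + escape(url, {"'": "&apos;", '"': "&quot;"}) + "</loc>"


if __name__ == "__main__":
    print(loc_element("https://example.com/search?q=a&page=2"))
```

Pass in the raw URL only. Feeding an already escaped URL through the function again would turn &amp; into &amp;amp;.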

Step 3: Save it to your site root

Upload sitemap.xml to your domain root so it lives at https://example.com/sitemap.xml. Root placement is the convention crawlers expect, and it also defines the scope: a sitemap at the root can list URLs across the whole domain.

A sitemap can technically live in a subdirectory, but then it is only trusted for URLs in that same path. Keep it at the root unless you have a specific reason not to. Confirm it loads by opening the URL in a browser. You should see raw XML, not a 404 or your site's HTML.

Step 4: Point robots.txt at the sitemap

Add a one-line directive to your robots.txt file so any crawler that reads it discovers the sitemap automatically:

Sitemap: https://example.com/sitemap.xml

Use the full absolute URL, and place the line anywhere in the file (it is not tied to any user-agent block). If you do not have a robots file yet, our free robots.txt generator builds a valid one with the sitemap line already wired in.
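To confirm the directive is picked up the way a crawler would read it, you can parse the file with Python's built-in urllib.robotparser. This is a small sketch that works on the robots.txt text you pass in; the sitemaps_in_robots name is an assumption for this example, and site_maps() needs Python 3.8 or newer.

```python
from urllib.robotparser import RobotFileParser


def sitemaps_in_robots(robots_txt):
    """Return the Sitemap: URLs declared in a robots.txt string."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when no Sitemap: line is present
    return parser.site_maps() or []


if __name__ == "__main__":
    robots = """User-agent: *
Disallow: /admin/

Sitemap: https://example.com/sitemap.xml
"""
    print(sitemaps_in_robots(robots))
```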

Step 5: Validate before you submit

Do not submit a sitemap you have not validated. A single malformed tag can cause Google to reject the whole file silently. Run the raw XML through a validator that checks both well-formedness (proper tags, escaped characters) and protocol compliance (correct namespace, absolute URLs, size limits).

The fastest path is to paste your URL into the free XML sitemap generator and validator at Molixa. It checks your structure, flags unescaped characters and relative URLs, and will even build a clean sitemap for you if you would rather not maintain the file by hand. Validating here first saves you the round trip of submitting a broken file and waiting for Search Console to complain.
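For a quick local check before you reach for any tool, the two kinds of validation described above can be sketched in a few lines of Python. This is not a full schema validator and not how any particular service works. The validate_sitemap function is a hypothetical helper that covers well-formedness, the namespace, absolute URLs, and the two size caps.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000
MAX_BYTES = 50 * 1024 * 1024  # 50 MB, measured on the uncompressed file


def validate_sitemap(xml_bytes):
    """Return a list of problems; an empty list means these checks passed."""
    problems = []
    if len(xml_bytes) > MAX_BYTES:
        problems.append("file exceeds 50 MB uncompressed")
    try:
        # Well-formedness: a bare & or an unclosed tag fails here
        root = ET.fromstring(xml_bytes)
    except ET.ParseError as exc:
        return problems + [f"not well-formed XML: {exc}"]
    if root.tag != f"{{{SITEMAP_NS}}}urlset":
        return problems + ["root element is not <urlset> in the sitemap namespace"]
    entries = root.findall(f"{{{SITEMAP_NS}}}url")
    if len(entries) > MAX_URLS:
        problems.append(f"{len(entries)} URLs exceeds the 50,000 limit")
    for entry in entries:
        loc = entry.findtext(f"{{{SITEMAP_NS}}}loc", default="").strip()
        if not loc.startswith(("http://", "https://")):
            problems.append(f"<loc> is missing or not absolute: {loc!r}")
    return problems


if __name__ == "__main__":
    with open("sitemap.xml", "rb") as handle:  # assumes the file is in the working directory
        for problem in validate_sitemap(handle.read()) or ["no problems found"]:
            print(problem)
```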

Step 6: Submit it in Google Search Console

Open Search Console, choose your property, and go to the Sitemaps report in the left menu. Enter sitemap.xml (the path relative to your domain) and click Submit.

Google will fetch the file, report how many URLs it discovered, and flag any parse errors. Submission does not trigger instant indexing. Expect Google to recrawl over days, not minutes. You can submit the same sitemap to Bing Webmaster Tools the same way, and the robots.txt line covers crawlers that never see your webmaster dashboards.

The 50,000 URL Limit and Sitemap Index Files

A single sitemap file has two hard caps set by the protocol: a maximum of 50,000 URLs and a maximum uncompressed file size of 50 MB. Hit either limit and you must split your sitemap, then tie the pieces together with a sitemap index file.

This is the moment most plugin-only guides skip entirely. A sitemap index is a sitemap of sitemaps. Instead of listing pages, it lists your individual sitemap files.

<?xml version="1.0" encoding="UTF-8"?>
<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap>
    <loc>https://example.com/sitemap-posts.xml</loc>
    <lastmod>2026-06-25</lastmod>
  </sitemap>
  <sitemap>
    <loc>https://example.com/sitemap-products.xml</loc>
    <lastmod>2026-06-24</lastmod>
  </sitemap>
</sitemapindex>

Note the differences: the wrapper is <sitemapindex> not <urlset>, and each entry uses <sitemap> instead of <url>. You submit only the index file in Search Console, and Google fetches every child sitemap it references.

Even if you are nowhere near 50,000 URLs, splitting by content type (posts, products, pages) is a smart move. When Search Console reports a coverage issue, an index lets you see which segment of the site has the problem instead of staring at one giant file. You can also gzip any sitemap to sitemap.xml.gz to save bandwidth, but compression does not get you around the caps: both the 50 MB size limit and the 50,000 URL limit apply to the uncompressed file.
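Splitting a long URL list and writing the index are both mechanical, so they are easy to script. The sketch below assumes you already have the URL list in memory. The chunk_urls and build_sitemap_index names are illustrative, and <lastmod> is left out of the index entries for brevity.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS = 50_000


def chunk_urls(urls, size=MAX_URLS):
    """Split a list of URLs into consecutive chunks of at most `size` items."""
    return [urls[i:i + size] for i in range(0, len(urls), size)]


def build_sitemap_index(sitemap_urls):
    """Return UTF-8 bytes for a <sitemapindex> listing the given child sitemap URLs."""
    ET.register_namespace("", SITEMAP_NS)
    index = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for url in sitemap_urls:
        entry = ET.SubElement(index, f"{{{SITEMAP_NS}}}sitemap")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
    return ET.tostring(index, encoding="UTF-8", xml_declaration=True)


if __name__ == "__main__":
    urls = [f"https://example.com/item/{n}" for n in range(120_001)]
    chunks = chunk_urls(urls)
    # Hypothetical file naming scheme for the child sitemaps
    children = [f"https://example.com/sitemap-{i + 1}.xml" for i in range(len(chunks))]
    print([len(c) for c in chunks])
    print(build_sitemap_index(children).decode("utf-8"))
```

Chunking by count alone does not guarantee each file stays under 50 MB, so check the byte size of each child file as well if your URLs are long.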

Common Manual Sitemap Mistakes

These are the errors that turn a five-minute task into an afternoon of debugging. Most show up as a vague "couldn't fetch" or "has errors" message in Search Console.

  • Relative URLs. <loc>/about</loc> is invalid. Always use the full https:// URL.
  • Mixed protocols or hosts. Do not list http:// and https:// versions, or www and non-www. Pick your canonical version and list only that.
  • Listing non-canonical or noindex URLs. If a page canonicalizes elsewhere or carries a noindex, leaving it in the sitemap contradicts your own signals.
  • A bare ampersand. The single most common parse failure. Escape it as &amp;.
  • Stale or fake lastmod dates. Bumping every date to "today" on every deploy teaches Google to ignore the field.
  • Forgetting to update the file. A hand-coded sitemap does not update itself. New pages will not appear until you add them, which is the main downside of the manual approach versus a generator.
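The mixed protocol and host mistake is easy to catch with a script. As a rough sketch, the hypothetical distinct_origins helper below collects every scheme and host combination in your URL list; anything more than one result means the list is not consistently canonical.

```python
from urllib.parse import urlsplit


def distinct_origins(urls):
    """Return the sorted set of scheme://host combinations found in the URLs."""
    origins = set()
    for url in urls:
        parts = urlsplit(url)
        # Hostnames are case-insensitive, so normalize before comparing
        origins.add(f"{parts.scheme}://{parts.netloc.lower()}")
    return sorted(origins)


if __name__ == "__main__":
    urls = [
        "https://example.com/",
        "http://example.com/about",
        "https://www.example.com/pricing",
    ]
    found = distinct_origins(urls)
    if len(found) > 1:
        print("Mixed origins in sitemap:", found)
```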

Warning: never include URLs you have blocked in robots.txt. Telling Google "crawl this" in the sitemap while telling it "do not crawl this" in robots.txt is a contradiction that wastes crawl budget and triggers warnings.
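You can cross-check a URL list against your robots.txt rules with Python's urllib.robotparser. Treat this as an approximation: the standard-library parser uses simple prefix matching and does not implement Google's wildcard handling, and the blocked_sitemap_urls name is made up for this example.

```python
from urllib.robotparser import RobotFileParser


def blocked_sitemap_urls(robots_txt, urls, user_agent="Googlebot"):
    """Return the URLs that the given robots.txt text disallows for user_agent."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [url for url in urls if not parser.can_fetch(user_agent, url)]


if __name__ == "__main__":
    robots = """User-agent: *
Disallow: /private/
"""
    urls = [
        "https://example.com/",
        "https://example.com/private/report",
    ]
    print(blocked_sitemap_urls(robots, urls))
```

Any URL this returns should come out of the sitemap or be unblocked in robots.txt, depending on which signal you actually meant.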

Manual vs Generated: Which Should You Use

Hand-coding a sitemap is perfect for small, stable sites and for understanding exactly what the file does. The tradeoff is maintenance: every new page is a manual edit, and a typo can break the whole file silently.

For sites that publish regularly, a generator that crawls your live site and rebuilds the file automatically removes the human error and the upkeep. The middle ground many people land on is to generate the file with a tool, then read the output so you actually understand what shipped. Pairing a clean sitemap with solid on-page structure, like proper schema markup for rich results, is what compounds your technical SEO over time.

Conclusion

Now you know how to create an XML sitemap manually: list your indexable URLs, wrap each in a <url> and <loc> inside a namespaced <urlset>, keep <lastmod> honest, ignore <priority> and <changefreq>, validate the file, drop the sitemap line into robots.txt, and submit it in Search Console. Split into a sitemap index once you cross 50,000 URLs or 50 MB.

The file is genuinely simple once you strip away the plugin mystique. Build it once by hand to learn the shape, then decide whether to keep maintaining it manually or let the XML sitemap generator keep it current for you and validate it on every change.

Frequently Asked Questions

Do I need an XML sitemap if my site is small? Not strictly. Google can usually discover every page on a small, well-linked site through normal crawling. A sitemap still helps by speeding up discovery of new content and removing any doubt about which URLs you consider important, so it is low effort and worth having even on a handful of pages.

Does adding a URL to my sitemap guarantee it gets indexed? No. A sitemap is a discovery aid, not an indexing command. Google decides independently whether a page is worth indexing based on quality and uniqueness. A URL can appear in your sitemap and still show as "Crawled - currently not indexed" or "Discovered - currently not indexed" in Search Console.

Should I set priority and changefreq on my URLs? You can, but Google ignores both. It reads <loc> and uses <lastmod> when the dates are accurate, while <changefreq> and <priority> have no effect on Google's crawling or ranking. They are harmless to include for other engines, but do not waste time tuning them.

Where should the sitemap.xml file live? Put it at your domain root, so it is reachable at https://yourdomain.com/sitemap.xml. Root placement lets a single sitemap cover URLs across your whole domain and matches what crawlers expect. Add a Sitemap: line to robots.txt and submit the URL in Search Console so it is discovered through every channel.

How do I validate my sitemap before submitting it? Check both that the XML is well-formed (correct tags, escaped ampersands, UTF-8 encoding) and that it follows the protocol (the right namespace, absolute URLs, under the 50,000 URL and 50 MB limits). Paste your sitemap into a validator such as the Molixa sitemap tool, fix any flagged issues, then submit the clean file in Google Search Console.

What happens when I have more than 50,000 URLs? Split your URLs across multiple sitemap files, each under 50,000 URLs and 50 MB, then create a sitemap index file that lists those child sitemaps using <sitemapindex> and <sitemap> entries. Submit only the index file to Search Console, and Google will fetch every sitemap it references.
