Blog postings

It ain’t half hot here · by mark | 19 Jul 2022, 1:34 p.m.

I used to live in the Middle East; this is worse because of the poor AC.

Distressing temperature

 

Plain text sitemaps in Django · by mark | 16 Jul 2022, 9:04 a.m.

I saw yet another 404. Something was looking for sitemap.txt. This is just a plain text file with a list of URLs, one per line. Putting this in was straightforward if you already have a sitemap.xml set up.

What you do is send the sitemaps object (with all of the URLs encoded in it) through a template that renders each item on its own line. Then you make sure the response is marked as plain text; otherwise the sitemap view's default application/xml content type makes your browser report very peculiar XML parsing errors.

The first thing to do is set up the sitemap.txt URL. If you already have sitemap.xml working, you will already have a sitemaps object with all the lists in it, so the new URL looks like this:

    path('sitemap.txt', sitemap,
        {'sitemaps': sitemaps, 'template_name': 'sitemap.txt', 'content_type': 'text/plain'},
        name='django.contrib.sitemaps.views.sitemap.txt'),

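The template (saved as sitemap.txt in a templates directory Django can find, to match template_name above) is also very straightforward. Note the careful line breaks, which ensure a plain list: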

{% for url in urlset %}{{ url.location }}
{% endfor %}
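In plain Python, the loop above boils down to roughly the following. This is only a sketch: urlset here is a hand-built list of dicts standing in for the context Django's sitemap view hands to the template, and the example.com URLs are made up.

```python
def render_plain_sitemap(urlset):
    """Mimic the sitemap.txt template: each location followed by a newline."""
    # The template emits {{ url.location }} and then the line break that sits
    # before {% endfor %}, so every URL ends up on its own line.
    return "".join(f"{url['location']}\n" for url in urlset)


# Hypothetical entries standing in for the view's urlset context variable.
urlset = [
    {"location": "https://example.com/blog/plain-text-sitemaps/"},
    {"location": "https://example.com/about/"},
]
print(render_plain_sitemap(urlset), end="")
```

One difference worth knowing: Django autoescapes {{ url.location }}, so a URL containing & would come out as &amp; in the text file, whereas this sketch passes it through untouched. Wrapping the loop in {% autoescape off %} ... {% endautoescape %} should avoid that if your URLs ever carry query strings.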

And now you have a plain text list of URLs that some web crawlers seem to prefer. Look at mine!
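If you want to sanity-check the output before a crawler does, here is a rough checker based on the sitemaps.org rules for text sitemaps: one fully qualified URL per line, at most 50,000 URLs per file. It is a sketch: the function name and sample body are made up, and it only inspects text you hand it rather than fetching anything.

```python
from urllib.parse import urlsplit

MAX_URLS = 50_000  # sitemaps.org limit per text sitemap file


def check_sitemap_txt(body):
    """Return a list of problems found in a plain text sitemap body."""
    problems = []
    # Ignore blank lines; every other line should hold exactly one URL.
    urls = [line.strip() for line in body.splitlines() if line.strip()]
    if len(urls) > MAX_URLS:
        problems.append(f"too many URLs: {len(urls)} > {MAX_URLS}")
    for url in urls:
        parts = urlsplit(url)
        # Text sitemaps need full URLs, including the scheme and host.
        if parts.scheme not in ("http", "https") or not parts.netloc:
            problems.append(f"not a full URL: {url!r}")
    return problems


sample = "https://example.com/blog/\n/about/\n"
print(check_sitemap_txt(sample))
```

A relative path like /about/ gets flagged, which is exactly what you would see if the template were fed paths instead of the view's full locations.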

 

A new kind of sitemap: the index · by mark | 14 Jul 2022, 8:44 p.m.

I saw some 404s looking for a file called sitemap_index.xml in the root. Intrigued, I hit Google. It turns out to be basically a list of other sitemaps that web crawlers can use to build a list of URLs.

I prefer not to see 404s in my logs beyond the obvious bots looking for security holes. So how do we create this? It's actually incredibly easy in Django and boils down to copying and pasting what the documentation says. Of course, the documentation doesn't make it clear that this is all you need to do, which is odd, as the docs for creating a vanilla sitemap.xml that has everything in it are crystal clear!

So. Assuming you have done the first bit and have a sitemap.xml set up using the docs for something like posts and something like static pages, you will already have a sitemaps dictionary whose keys point to Sitemap classes:

    sitemaps = {
        'blog': ArticleSitemap,
        'pages': StaticSitemap,
    }

You will also already have a URL that looks like this:

    path('sitemap.xml', sitemap, {'sitemaps': sitemaps},
        name='django.contrib.sitemaps.views.sitemap'),

To get the index working, first make sure your urls.py imports both views:

    from django.contrib.sitemaps.views import sitemap, index

Now add the following two URLs:

    path('sitemap_index.xml', index, {'sitemaps': sitemaps},
         name='django.contrib.sitemaps.views.index'),
    path('sitemap-<section>.xml', sitemap, {'sitemaps': sitemaps},          
         name='django.contrib.sitemaps.views.sitemap'),
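The <section> part of the second pattern is what ties each per-section URL back to a key in the sitemaps dictionary. Here is a rough pure-Python model of that lookup, not Django's actual code: Django's default str path converter matches any non-empty run of characters without a /, and the sitemap view answers with a 404 when the captured section is not one of the dictionary's keys. The string values below are stand-ins for the real Sitemap classes.

```python
import re

# Stand-ins for the real sitemaps dict; in urls.py the values are Sitemap classes.
sitemaps = {"blog": "ArticleSitemap", "pages": "StaticSitemap"}

# Rough equivalent of 'sitemap-<section>.xml' with Django's default str
# converter, which matches one or more characters other than "/".
SECTION_PATTERN = re.compile(r"sitemap-(?P<section>[^/]+)\.xml")


def resolve_section(path):
    """Return the sitemap for a per-section path, or None where Django would 404."""
    match = SECTION_PATTERN.fullmatch(path)
    if match is None:
        return None  # the URL pattern itself does not match
    # The view looks the captured section up in the sitemaps dict.
    return sitemaps.get(match.group("section"))


print(resolve_section("sitemap-blog.xml"))
print(resolve_section("sitemap-nope.xml"))
```

So sitemap-blog.xml serves only the 'blog' entries, and a section name you never registered simply 404s.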

You are now done. If you look at sitemap_index.xml, you get links to one sitemap per section (sitemap-blog.xml, sitemap-pages.xml, and one for whatever else you added), which in turn list the blog posts and static pages. The original sitemap.xml continues to work as well.
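For a feel of what sitemap_index.xml contains, the sketch below builds a similar document with the standard library. It is an approximation rather than Django's template: https://example.com is a placeholder domain, and newer Django versions may also add lastmod entries that this leaves out. Each section key becomes a link to its own sitemap-<section>.xml file.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap_index(base_url, sections):
    """Build a sitemapindex document with one <sitemap><loc> per section."""
    # Serialize the sitemaps.org namespace as the default xmlns, not ns0.
    ET.register_namespace("", SITEMAP_NS)
    root = ET.Element(f"{{{SITEMAP_NS}}}sitemapindex")
    for section in sections:
        entry = ET.SubElement(root, f"{{{SITEMAP_NS}}}sitemap")
        loc = ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc")
        # Mirrors the 'sitemap-<section>.xml' URL pattern from urls.py.
        loc.text = f"{base_url}/sitemap-{section}.xml"
    return ET.tostring(root, encoding="unicode")


print(build_sitemap_index("https://example.com", ["blog", "pages"]))
```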

Very easy!

 

Syndication via Atom and RSS using Django · by mark | 13 Jul 2022, 9:28 p.m.

I noticed a bunch of web crawlers getting 404s looking for atom.xml. I'd never heard of it, but it turns out to be a web feed that news readers can check for new posts. So, joining the likes of the BBC, you can now subscribe to my rubbishy posts using a news reader.

I knew about RSS from around 20 years ago, when I used it to follow news feeds. There was some drama and the Atom specification came out. To most people they're much the same thing.

Django makes it dead easy to syndicate your app. Like, very easy. You can almost literally copy and paste the "Publishing Atom and RSS feeds in tandem" example from the Django syndication docs into a new file (I called it feeds.py). You'll need to make a few small changes: import your model instead of the example one, and adjust the titles and descriptions to suit.

There are two additions to the default setup I would recommend: provide appropriate publish and edit dates. Otherwise it seems to use the current date and time, which messes up feeds. If your fields are named the same as mine, add the following two methods to your feed class:

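    def item_pubdate(self, item):
        # Publish date reported for each feed entry
        return item.date

    def item_updateddate(self, item):
        # Last-edited date; the Atom feed uses it for each entry's <updated>
        return item.edited_date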
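To see why those dates matter, here is a stdlib-only sketch of an Atom entry's date elements. RFC 4287 requires every Atom entry to carry exactly one updated timestamp (published is optional), written as an RFC 3339 date-time. The atom_entry helper, the sample post and the fallback from a missing edit date to the publish date are all assumptions made for this illustration.

```python
from datetime import datetime, timezone
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"


def atom_entry(title, published, edited=None):
    """Build an Atom <entry> with published/updated timestamps (RFC 3339)."""
    entry = ET.Element(f"{{{ATOM_NS}}}entry")
    ET.SubElement(entry, f"{{{ATOM_NS}}}title").text = title
    ET.SubElement(entry, f"{{{ATOM_NS}}}published").text = published.isoformat()
    # Assumption for this sketch: a post never edited reports its publish
    # date as "updated", rather than the moment the feed happened to be built.
    updated = edited if edited is not None else published
    ET.SubElement(entry, f"{{{ATOM_NS}}}updated").text = updated.isoformat()
    return entry


# Hypothetical post; real values would come from item.date / item.edited_date.
post = atom_entry("Plain text sitemaps in Django",
                  datetime(2022, 7, 16, 9, 4, tzinfo=timezone.utc))
print(ET.tostring(post, encoding="unicode"))
```

Timezone-aware datetimes come out of isoformat() with an explicit offset such as +00:00, which is a valid RFC 3339 form.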

Then you need to hook the feed classes up to URLs, which is easy:

    from .feeds import AtomSiteNewsFeed, RssSiteNewsFeed  # feeds.py next to urls.py

    path('atom.xml', AtomSiteNewsFeed()),
    path('rss.xml', RssSiteNewsFeed()),
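Once both URLs are live, one quick way to confirm each serves the format its name promises is to look at the root element: RSS 2.0 documents have an <rss> root, while Atom documents have a <feed> root in the http://www.w3.org/2005/Atom namespace. A rough sketch; the feed_format name is hypothetical and the sample strings stand in for whatever your /rss.xml and /atom.xml actually return.

```python
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def feed_format(xml_text):
    """Guess whether a feed document is RSS or Atom from its root element."""
    root = ET.fromstring(xml_text)
    if root.tag == "rss":
        return f"RSS {root.get('version', '?')}"
    if root.tag == ATOM_FEED:
        return "Atom"
    return "unknown"


# Trimmed, hypothetical documents shaped like the two feeds' roots.
rss_doc = '<rss version="2.0"><channel><title>Blog</title></channel></rss>'
atom_doc = '<feed xmlns="http://www.w3.org/2005/Atom"><title>Blog</title></feed>'
print(feed_format(rss_doc))   # RSS 2.0
print(feed_format(atom_doc))  # Atom
```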

And now you can get feeds via Atom or RSS. Hooray!

 

I have achieved peak sunburn today · by mark | 10 Jul 2022, 8:07 p.m.

I emerged in a Celtic nation, which means I have no sun tolerance. It was pushing 30°C this weekend, and I am redder than a well-cooked lobster.

A sun that seemingly never sets