Blog postings

Adding recaptcha to contact form to thwart bot spam · by mark | 27 Jul 2022, 5:05 p.m.

I have been inundated with spam through my contact form recently. Mostly benign offers of web design and so on. But I still don't want them. So I elected to install a reCAPTCHA service so they have to convince Google they're human before sending me a mail I just instantly delete. 

This was quite straightforward in the end. 

Install a package

Some kind soul has already done the work of setting up a Django form field which talks to Google for you. So all you need to do is pipenv install django-recaptcha. I'd also update the lock and requirements files: pipenv lock && pipenv lock -r > requirements.txt. Make sure to enable the package in config.py by adding its app to INSTALLED_APPS. It is called captcha.
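
For reference, enabling the app looks roughly like this in the settings module. This is only a sketch: every entry other than captcha is a placeholder for whatever your project already lists, and newer releases of the package may use a different app label, so check its README.

```python
# Settings sketch: the entries other than "captcha" are placeholders
# standing in for the apps the project already has installed.
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "captcha",  # the app shipped by django-recaptcha
]
```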

Generate secrets

You can use the Google reCAPTCHA admin console to register your domain with Google and also generate a public / private key pair. I'd add 127.0.0.1 to the domain list so you can test it locally as well. The secrets are to be called RECAPTCHA_PRIVATE_KEY and RECAPTCHA_PUBLIC_KEY.
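
One way to keep the keys out of version control is to read them from the environment. The helper below is a minimal sketch, and read_recaptcha_keys is a made-up name, not part of the package. It assumes the environment variables are named the same as the settings.

```python
import os


def read_recaptcha_keys(env=None):
    """Return the two reCAPTCHA settings from an environment-like mapping.

    Raises KeyError naming the missing variable, so a misconfigured
    deployment fails at start-up rather than on the first form post.
    """
    env = os.environ if env is None else env
    return {
        "RECAPTCHA_PUBLIC_KEY": env["RECAPTCHA_PUBLIC_KEY"],
        "RECAPTCHA_PRIVATE_KEY": env["RECAPTCHA_PRIVATE_KEY"],
    }
```

In the settings module you would then assign the two returned values to module-level names of the same name.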

Add to form

You need to add the Recaptcha box to your form. This is very easy. First, in forms.py, load the objects:

from captcha.fields import ReCaptchaField
from captcha.widgets import ReCaptchaV2Checkbox

And then add this as a field. I have a ModelForm that drives the Contact app, so this was adding one line after I declared it:

class ContactForm(ModelForm):
    captcha = ReCaptchaField(widget=ReCaptchaV2Checkbox)

I also use Crispy Forms, so I just added captcha as a Field to its Layout object. 

Add invalid handling

Now the package does all the Google talking for you, and form validation will fail if Google thinks you are a robot. This is when you have to do the image clicking stuff. I wanted to communicate this to the user in case they are acting like a robot. So I added an invalid form handler to the view that my form posts to:

    def form_invalid(self, form):
        messages.add_message(self.request, messages.ERROR, form.errors)
        return super().form_invalid(form)
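
If you would rather show one short message per problem than the whole errors object, something like the following could work. It is a sketch with a made-up helper name (flatten_errors), and it assumes form.errors behaves like a mapping from field name to a list of message strings, with non-field errors filed under "__all__".

```python
def flatten_errors(errors):
    """Turn {"field": ["msg", ...]} into ["field: msg", ...].

    Assumes `errors` behaves like Django's form.errors: a mapping from
    field name to a list of message strings. Non-field errors live under
    the key "__all__", and that label is left off the output.
    """
    lines = []
    for field, field_messages in errors.items():
        for message in field_messages:
            prefix = "" if field == "__all__" else f"{field}: "
            lines.append(f"{prefix}{message}")
    return lines
```

Inside form_invalid you could then loop over flatten_errors(form.errors) and call messages.error(self.request, line) for each line.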

That was it. I have already seen in the logs someone try to post me a comment and it not show up (the form was invalid, so the instance wasn't saved). I'm sure some guys will still spam me stuff out of spite, but you cannot have everything.

 

It ain’t half hot here · by mark | 19 Jul 2022, 1:34 p.m.

I used to live in the Middle East; this is worse due to poor AC

Distressing temperature

 

Plain text sitemaps in Django · by mark | 16 Jul 2022, 9:04 a.m.

I saw yet another 404. Something was looking for sitemap.txt. This is just a plain text file with a list of URLs, one per line. Putting this in was straightforward if you already have a sitemap.xml set up.

What you do is send the sitemaps object (with all of the URLs encoded in it) through a template that renders each of the items one per line. And then you make sure it is marked as returning plain text, otherwise your browser will issue very peculiar HTML rendering errors.

The first thing we need to do is set up the sitemap.txt URL. You will already have a sitemaps object with the lists in it from setting up sitemap.xml, so the new URL looks like:

    path('sitemap.txt', sitemap,
        {'sitemaps': sitemaps, 'template_name': 'sitemap.txt', 'content_type': 'text/plain'},
        name='django.contrib.sitemaps.views.sitemap.txt'),

The template is also very straightforward. Note the careful line breaks to ensure a plain list:

{% for url in urlset %}{{ url.location }}
{% endfor %}

And now you have a plain text list of URLs that some web crawlers seem to prefer. Look at mine!
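
A quick way to sanity-check the result is to run the response body through a small checker. This is a sketch using only the standard library. bad_sitemap_lines is a made-up name, and it assumes you have already fetched the body of sitemap.txt as a string.

```python
from urllib.parse import urlsplit


def bad_sitemap_lines(body):
    """Return (line_number, text) pairs that are not absolute http(s) URLs."""
    problems = []
    for number, line in enumerate(body.splitlines(), start=1):
        parts = urlsplit(line)
        is_url = parts.scheme in ("http", "https") and bool(parts.netloc)
        # Blank or space-padded lines count as problems too, since the
        # format is strictly one URL per line.
        if not is_url or line != line.strip():
            problems.append((number, line))
    return problems
```

An empty list back means every line is a clean absolute URL. A stray blank line usually points at a line break in the wrong place in the template.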

 

A new kind of sitemap: the index · by mark | 14 Jul 2022, 8:44 p.m.

I saw some 404s looking for a file called sitemap_index.xml in the root. Intrigued, I hit Google. This turns out to be basically a list of other sitemaps that the web crawlers can use to build a list of URLs.

I prefer not to see 404s in my logs beyond obvious bots looking for security holes. So how do we create this? It's actually incredibly easy in Django and boils down to copying and pasting what the documentation says. Of course the documentation doesn't make it clear that is what you do, which is odd as the docs around creating a vanilla sitemap.xml that has everything in it are crystal clear!

So. Assuming you have done the first bit and have a sitemap.xml set up using the docs for something like posts and something like static pages, you will already have a sitemap dictionary where the keys point to sitemap objects. 

sitemaps = {
    'blog': ArticleSitemap,
    'pages': StaticSitemap
}

You will also already have a URL that looks like:

    path('sitemap.xml', sitemap, {'sitemaps':sitemaps},
        name='django.contrib.sitemaps.views.sitemap'),

To get the index working do the following. In your urls module make sure you import:

from django.contrib.sitemaps.views import sitemap, index

Now add the following two urls:

    path('sitemap_index.xml', index, {'sitemaps': sitemaps},
         name='django.contrib.sitemaps.views.index'),
    path('sitemap-<section>.xml', sitemap, {'sitemaps': sitemaps},          
         name='django.contrib.sitemaps.views.sitemap'),

You are now done. If you look at sitemap_index.xml you get links to the blog posts and static pages (and whatever else you added). The original sitemap.xml continues to work as well.
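
To see what a crawler gets out of the index, you can pull the child sitemap URLs out of the XML with the standard library. This is a sketch: child_sitemaps is a made-up name, and the sample document uses example.com with the blog and pages keys from the dictionary above, not real output from my site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def child_sitemaps(index_xml):
    """Return the <loc> URL of every sitemap listed in a sitemap index."""
    root = ET.fromstring(index_xml)
    return [loc.text.strip() for loc in root.findall("sm:sitemap/sm:loc", SITEMAP_NS)]


# Hypothetical index, shaped like the sitemaps.org format.
SAMPLE_INDEX = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap-blog.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap-pages.xml</loc></sitemap>
</sitemapindex>"""

for url in child_sitemaps(SAMPLE_INDEX):
    print(url)
```

Each URL it prints should match the sitemap-<section>.xml pattern, one per key in the sitemaps dictionary.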

Very easy!

 

Syndication via Atom and RSS using Django · by mark | 13 Jul 2022, 9:28 p.m.

I noticed a bunch of web crawlers getting 404s looking for atom.xml. I'd never heard of this but it is some kind of automatic feed that can be used to push updates to people. So, joining the likes of the BBC, you can now subscribe to my rubbishy posts using a news reader. 

I knew about RSS from around 20 years ago when I used to use it to have news feeds. There was some drama and the Atom specification came out. They're kind of the same thing to most people. 

Django makes it dead easy to syndicate your app. Like very easy. You can almost literally copy and paste this snippet from the Django docs into a new file (I called it feeds.py). You'll need to make a few small changes. Import your model instead of the example one, and make some appropriate text changes.

There are two changes to the default setup I would recommend. Provide appropriate publish and edit dates. Otherwise it seems to use the current date and time, which messes up feeds. If your fields are named the same as mine you need to add the following two methods.

    def item_pubdate(self, item):
        return item.date

    def item_updateddate(self, item):
        return item.edited_date

You then need to define URLs for your new objects. Which is easy:

    path('atom.xml', AtomSiteNewsFeed()),
    path('rss.xml', RssSiteNewsFeed()),

And now you can get feeds via Atom or RSS. Hooray!
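
To confirm the dates are coming through, you can read the published and updated elements back out of the Atom feed. This is a rough sketch using only the standard library: entry_dates is a made-up name, and the sample feed is hand-written for illustration, not output from my site.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Atom specification (RFC 4287).
ATOM_NS = {"atom": "http://www.w3.org/2005/Atom"}


def entry_dates(atom_xml):
    """Return (title, published, updated) text for each entry in an Atom feed."""
    root = ET.fromstring(atom_xml)
    rows = []
    for entry in root.findall("atom:entry", ATOM_NS):
        rows.append((
            entry.findtext("atom:title", default="", namespaces=ATOM_NS),
            entry.findtext("atom:published", default="", namespaces=ATOM_NS),
            entry.findtext("atom:updated", default="", namespaces=ATOM_NS),
        ))
    return rows


# Hypothetical feed with a single entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <title>Example feed</title>
  <entry>
    <title>First post</title>
    <published>2022-07-13T21:28:00+00:00</published>
    <updated>2022-07-14T08:00:00+00:00</updated>
  </entry>
</feed>"""

for row in entry_dates(SAMPLE_FEED):
    print(row)
```

If every entry shows the same timestamp, or one close to the time of the request, the two date methods are probably not being picked up.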