Automating Content Import in Wagtail

December 6, 2025 · 10 min read

A guide for importing markdown documents into Wagtail automatically from code

Automating Content Import in Wagtail: From Markdown to CMS

When building a blog or content site with Wagtail, there's a fundamental question: should you create content through the CMS interface, or can you automate it? The answer is both—and having a custom Django management command bridges these worlds beautifully.

The Problem: Manual Content Creation Doesn't Scale

Imagine you're migrating from an existing blog, importing documentation, or working with content that lives in version control. Creating each post manually through the Wagtail admin becomes tedious:

  1. Navigate to the admin
  2. Click "Add child page"
  3. Copy-paste content
  4. Format everything
  5. Fill in metadata
  6. Save and publish
  7. Repeat for dozens or hundreds of posts

There has to be a better way.

The Solution: Django Management Commands

Django's management command system lets you create custom CLI tools that interact with your application. Think of commands like python manage.py migrate or python manage.py createsuperuser—you can build your own.

For content import, we can create a command that:

  • Reads markdown files from a directory
  • Parses frontmatter metadata
  • Creates or updates Wagtail pages programmatically
  • Handles publishing automatically

Understanding the Import Command

Let me walk you through how this works, using a real import command as an example.

The Basic Structure

from django.core.management.base import BaseCommand

class Command(BaseCommand):
    help = "Import Markdown files into BlogPage entries."
    
    def add_arguments(self, parser):
        # Define command-line arguments
        parser.add_argument("path", type=str)
        
    def handle(self, *args, **options):
        # Your import logic goes here
        pass

Every Django management command:

  • Extends BaseCommand
  • Defines a help string (shown in --help)
  • Uses add_arguments() to define CLI options
  • Implements handle() with the main logic

Markdown + Frontmatter: Content as Code

The key insight is storing blog posts as markdown files with YAML frontmatter:

---
title: "My Blog Post"
intro: "A brief description"
date: 2025-01-15
reading_time: 5
slug: my-blog-post
---

# Main Content

Your markdown content goes here...

This format is:

  • Version control friendly - Git diffs work perfectly
  • Editor-agnostic - Write in VS Code, Obsidian, or any text editor
  • Portable - Easy to migrate between systems
  • Readable - No database required to read your content

Parsing with Python Frontmatter

The python-frontmatter library makes parsing trivial:

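from pathlib import Path

import frontmatter

md_file = Path("posts/my-blog-post.md")  # example path; point this at your own file
post = frontmatter.loads(md_file.read_text(encoding="utf-8"))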

# Access metadata
title = post.get("title")
date = post.get("date")

# Access content
content = post.content
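
If the parsing step feels like a black box, here is a simplified, standard-library-only sketch of what splitting a frontmatter document involves. It is not how python-frontmatter is implemented (the library hands the metadata block to a real YAML parser); the split_frontmatter name and the flat "key: value" handling are assumptions made purely for illustration.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split '---'-delimited frontmatter from the body (illustrative only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all

    # Find the closing delimiter, starting after the opening one
    end = next(
        (i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"),
        None,
    )
    if end is None:
        return {}, text  # unterminated block: treat everything as content

    metadata = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:  # ignore lines that are not "key: value"
            metadata[key.strip()] = value.strip().strip('"')

    body = "\n".join(lines[end + 1:]).strip()
    return metadata, body
```

Every value comes back as a string here, which is one of several reasons to use the real library in the actual command.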

Creating Pages Programmatically

Here's where it gets interesting. Wagtail pages are just Django models, which means you can create them with Python code:

from blog.models import BlogPage, BlogIndexPage

# Find the parent page (blog index)
blog_index = BlogIndexPage.objects.get(slug="blog")

# Create a new blog post
page = BlogPage(
    title="My Post",
    slug="my-post",
    intro="Brief description",
    date=date_value,
    body=[("markdown", content)],
)

# Add as child of blog index
blog_index.add_child(instance=page)

# Save and publish
revision = page.save_revision()
revision.publish()

This is exactly what happens when you click "Publish" in the admin—we're just doing it with code.

The Complete Import Workflow

Let's trace through what happens when you run the import command:

Step 1: Command Invocation

python manage.py import_markdown posts/ --format=markdown --publish

What this does:

  • posts/ - Directory containing markdown files
  • --format=markdown - Keep content as markdown (vs converting to HTML)
  • --publish - Publish immediately (vs save as draft)
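
Django passes add_arguments() a parser built on Python's argparse, so the flags documented in this article could be declared roughly as follows. This is a sketch using plain argparse so it runs outside a Django project; the add_import_arguments function name is my own, and in a real command the body would live inside add_arguments(self, parser).

```python
import argparse


def add_import_arguments(parser: argparse.ArgumentParser) -> None:
    """Declare the options described in this article."""
    parser.add_argument("path", type=str, help="Directory containing .md files")
    parser.add_argument("--slug", default="blog", help="Slug of the BlogIndexPage")
    parser.add_argument("--publish", action="store_true", help="Publish pages immediately")
    parser.add_argument(
        "--format",
        choices=["markdown", "html"],
        default="markdown",
        help="Import format",
    )
    parser.add_argument(
        "--overwrite",
        action="store_true",
        help="Overwrite pages with unpublished changes",
    )


# Stand-alone demonstration; inside a command, Django builds the parser for you
parser = argparse.ArgumentParser(prog="import_markdown")
add_import_arguments(parser)
options = vars(parser.parse_args(["posts/", "--format=markdown", "--publish"]))
```

Inside handle(), the same values arrive through the options dictionary, for example options["publish"] and options["format"].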

Step 2: Find or Create Blog Index

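from wagtail.models import Page  # wagtail.core.models on Wagtail < 3.0

from blog.models import BlogIndexPage


def get_or_create_blog_index_by_slug(slug: str) -> BlogIndexPage:
    existing_index = BlogIndexPage.objects.filter(slug=slug).first()
    if existing_index:
        return existing_index

    # Create a new index if it doesn't exist, as a child of the home page
    home = Page.objects.get(slug="home")
    # Assumption: derive a readable title from the slug ("tech-posts" -> "Tech Posts")
    title = slug.replace("-", " ").title()
    index = BlogIndexPage(title=title, slug=slug)
    home.add_child(instance=index)
    index.save_revision().publish()

    return index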

This ensures we have a parent page to add blog posts to. If you import to a blog index that doesn't exist, it creates one automatically.

Step 3: Process Each Markdown File

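from pathlib import Path

from django.utils.text import slugify

md_dir = Path(options["path"])  # the "path" argument from add_arguments()

for md_file in md_dir.glob("*.md"):
    # Parse the file
    post = frontmatter.loads(md_file.read_text(encoding="utf-8"))

    # Extract metadata
    title = post.get("title", md_file.stem)
    intro = post.get("intro", "")
    date_value = post.get("date", None)
    slug_value = post.get("slug", slugify(title))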

The command walks through every .md file in the directory, parses frontmatter, and extracts metadata with sensible defaults:

  • No title? Use the filename
  • No slug? Generate from title
  • No date? Use None (can be set later)
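
The fallback rules above can be pulled into one small function, which makes them easy to unit test. The sketch below is framework-free: simple_slugify is a rough ASCII-only stand-in for django.utils.text.slugify, and the resolve_metadata name is hypothetical.

```python
import re
import unicodedata
from pathlib import Path


def simple_slugify(value: str) -> str:
    """Rough stand-in for django.utils.text.slugify (ASCII-only)."""
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def resolve_metadata(metadata: dict, md_file: Path) -> dict:
    """Apply the defaults: filename for the title, slugified title for the slug."""
    title = metadata.get("title") or md_file.stem
    return {
        "title": title,
        "intro": metadata.get("intro", ""),
        "date": metadata.get("date"),  # stays None when absent
        "slug": metadata.get("slug") or simple_slugify(title),
    }
```

Using "or" instead of a .get() default also covers frontmatter that contains an empty value such as title: "", which a plain default would let through.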

Step 4: Create or Update

Here's the clever part—the command handles both new posts and updates:

# Slugs are only unique among siblings, so search within this blog index
existing_page = BlogPage.objects.child_of(blog_index).filter(slug=slug_value).first()

if existing_page:
    # Update existing page
    page = existing_page
    page.title = title
    page.body = body_content
    # ... update other fields

    revision = page.save_revision()
    # Output: "Updated: My Post"
else:
    # Create new page; live=False keeps it a draft until it is published
    page = BlogPage(
        title=title,
        slug=slug_value,
        body=body_content,
        live=False,
    )

    blog_index.add_child(instance=page)
    revision = page.save_revision()
    # Output: "Created: My Post"

This means you can:

  1. Run the import to create posts
  2. Edit markdown files
  3. Run import again to update existing posts
  4. No duplicates, no manual cleanup

Step 5: Publishing

if publish:
    revision.publish()

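Calling save_revision() stores a new revision without touching the live version of the page. Publishing is a separate step, which the --publish flag triggers automatically.

Without --publish, changes are saved as draft revisions, which is useful for reviewing before making them public. One caveat: Wagtail's Page.live field defaults to True, so a brand-new page has to be created with live=False to start out as a draft.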

Advanced Features

Format Selection: Markdown vs HTML

if format_type == "markdown":
    body_content = [("markdown", post.content)]
else:
    body_html = markdown.markdown(post.content, extensions=[...])
    body_content = [("html", body_html)]

Why this matters:

Markdown format (recommended):

  • Content stays editable in Wagtail admin
  • Can switch between markdown/rich text later
  • Original formatting preserved
  • Better for collaboration

HTML format:

  • Pre-rendered, slightly faster
  • No markdown dependency in production
  • Loses editability
  • Good for static content that won't change

Skip Protection

if existing_page and not overwrite and existing_page.has_unpublished_changes:
    # Skip this page
    skipped += 1
    continue

This prevents accidentally overwriting edits made through the admin that haven't been published yet. Without --overwrite, the command is safe to run repeatedly.
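
Taken together, the create-or-update logic and the skip check boil down to a three-way decision per file. Here is that decision as a pure function, so it can be tested without a database. The decide_action name and its string return values are assumptions for illustration, not part of the command shown above.

```python
def decide_action(exists: bool, has_unpublished_changes: bool, overwrite: bool) -> str:
    """Return 'create', 'update' or 'skip' for one markdown file."""
    if not exists:
        return "create"
    if has_unpublished_changes and not overwrite:
        return "skip"  # protect admin edits that were never published
    return "update"
```

In the command, exists would be existing_page is not None and has_unpublished_changes would be read from the existing page.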

Date Parsing

if date_value and isinstance(date_value, str):
    try:
        date_value = datetime.strptime(date_value, "%Y-%m-%d").date()
    except ValueError:
        date_value = None

Handles dates in frontmatter as strings and converts them to Python date objects. If the format is wrong, it gracefully falls back to None rather than crashing.
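
One detail worth knowing: YAML parsers typically load an unquoted date such as 2025-01-15 as a date object already, while a quoted "2025-01-15" arrives as a string. A slightly more defensive helper can accept both. This is a sketch, and the coerce_date name is hypothetical.

```python
from datetime import date, datetime
from typing import Optional


def coerce_date(value: object) -> Optional[date]:
    """Normalize a frontmatter date value to a date, or None if unusable."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        try:
            return datetime.strptime(value.strip(), "%Y-%m-%d").date()
        except ValueError:
            return None  # wrong format: fall back rather than crash
    return None
```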

Real-World Use Cases

Use Case 1: Blog Migration

You're moving from Jekyll, Hugo, or another static site generator:

# Copy your old posts to a directory
cp ~/old-blog/_posts/*.md ./import/

# Import them all at once
python manage.py import_markdown import/ --publish

Result: Entire blog imported in seconds, not hours.

Use Case 2: Version-Controlled Content

Your content lives in Git alongside your code:

my-blog/
├── content/
│   ├── 2025-01-01-first-post.md
│   ├── 2025-01-15-second-post.md
│   └── 2025-02-01-third-post.md
└── ...

Workflow:

  1. Write posts in your favorite editor
  2. Commit to Git
  3. Push to repository
  4. Run import command in production
  5. Content is live

This enables:

  • Code review for blog posts (via pull requests)
  • Change history (Git blame shows who wrote what)
  • Rollbacks (revert content like you revert code)
  • Branching (draft posts in feature branches)

Use Case 3: Bulk Content Updates

You need to update metadata across many posts:

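# Edit frontmatter in all markdown files
# (GNU sed syntax; on macOS/BSD sed, use: sed -i '' 's/reading_time: 5/reading_time: 8/' content/*.md)
sed -i 's/reading_time: 5/reading_time: 8/' content/*.md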

# Re-import to update all posts
python manage.py import_markdown content/ --overwrite --publish

Try doing that through the admin interface!

Use Case 4: Content Pipeline

You generate content programmatically:

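# generate_posts.py
from datetime import date
from pathlib import Path

import frontmatter

out_dir = Path("auto-posts")
out_dir.mkdir(exist_ok=True)

# topics and generate_content() come from your own pipeline
for topic in topics:
    post = frontmatter.Post(
        generate_content(topic),
        title=topic.title,
        intro=topic.description,
        date=date.today(),
    )
    # frontmatter.dumps() serializes the metadata as YAML, so a title
    # containing ":" or quotes cannot break the frontmatter block
    (out_dir / f"{topic.slug}.md").write_text(frontmatter.dumps(post), encoding="utf-8")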

Then import:

python generate_posts.py
python manage.py import_markdown auto-posts/ --publish

Perfect for documentation, reports, or AI-generated content.

Best Practices

1. Keep Markdown Files Organized

content/
├── 2025/
│   ├── 01/
│   │   ├── first-post.md
│   │   └── second-post.md
│   └── 02/
│       └── third-post.md
└── drafts/
    └── work-in-progress.md

Organize by date, category, or status. Import specific directories as needed.
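
Note that a plain glob("*.md"), as used in Step 3, only matches files at the top level of the directory you pass in. For a nested layout like the one above you would either import one directory at a time or walk the tree. Here is a sketch of the second option; the collect_markdown_files name and the default of skipping any directory called "drafts" are my assumptions.

```python
from pathlib import Path


def collect_markdown_files(
    root: Path, skip_dirs: frozenset = frozenset({"drafts"})
) -> list[Path]:
    """Return every .md file under root, ignoring directories named in skip_dirs."""
    files = []
    for path in sorted(root.rglob("*.md")):
        relative_dirs = path.relative_to(root).parts[:-1]  # directories only
        if not skip_dirs.intersection(relative_dirs):
            files.append(path)
    return files
```

Sorting the paths keeps the import order, and therefore the command output, stable between runs.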

2. Use Consistent Frontmatter

Create a template for new posts:

---
title: ""
intro: ""
date: YYYY-MM-DD
reading_time: 5
slug: ""
---

# 

This ensures all required fields are present.

3. Test Before Publishing

# First run without --publish to create drafts
python manage.py import_markdown content/

# Review in admin
# Then publish
python manage.py import_markdown content/ --publish

4. Use Meaningful Slugs

---
slug: understanding-wagtail-import-commands
---

Explicit slugs prevent collisions and give you clean URLs.
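
If two files end up with the same slug, the second one would update the page created by the first. A quick pre-flight check can catch that before anything touches the database. This is a sketch; find_slug_collisions is a hypothetical helper, and it assumes you have already collected a filename-to-slug mapping.

```python
from collections import defaultdict


def find_slug_collisions(slug_by_file: dict[str, str]) -> dict[str, list[str]]:
    """Map each slug used by more than one file to the files that use it."""
    files_by_slug = defaultdict(list)
    for filename, slug in slug_by_file.items():
        files_by_slug[slug].append(filename)
    return {
        slug: sorted(files)
        for slug, files in files_by_slug.items()
        if len(files) > 1
    }
```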

5. Handle Images Carefully

If your markdown references images:

![Alt text](images/photo.jpg)

Make sure to:

  • Upload images to Wagtail's media library
  • Update paths or use Wagtail's image handling
  • Or use absolute URLs to external CDN

Extending the Command

The import command is just Python code—you can extend it:

Add Author Support

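from django.contrib.auth import get_user_model

author_name = post.get("author")
if author_name:
    # first() returns None for unknown usernames instead of raising DoesNotExist
    author = get_user_model().objects.filter(username=author_name).first()
    if author:
        page.owner = author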

Import Categories/Tags

tags = post.get("tags", [])
for tag_name in tags:
    tag, _ = Tag.objects.get_or_create(name=tag_name)
    page.tags.add(tag)

Process Images

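from django.core.files.images import ImageFile
from wagtail.images import get_image_model

Image = get_image_model()  # respects a custom WAGTAILIMAGES_IMAGE_MODEL

# extract_images() is a helper you would write yourself; it should yield Path objects
for image_path in extract_images(post.content):
    with image_path.open("rb") as f:
        # The file field needs a Django File object, not a bare path
        image = Image(title=image_path.stem, file=ImageFile(f, name=image_path.name))
        image.save()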
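
The extract_images helper used in the snippet above is not shown in this article. One possible sketch, assuming standard ![alt](path) markdown syntax and paths relative to a base directory, is below. The regex and the base_dir parameter are my assumptions; reference-style images and raw img tags are not handled.

```python
import re
from pathlib import Path

# Matches ![alt text](target) and captures the target up to whitespace or ")"
IMAGE_PATTERN = re.compile(r"!\[[^\]]*\]\(\s*([^)\s]+)")


def extract_images(markdown_text: str, base_dir: Path = Path(".")) -> list[Path]:
    """Return local image paths referenced in markdown, skipping remote URLs."""
    paths = []
    for target in IMAGE_PATTERN.findall(markdown_text):
        if target.startswith(("http://", "https://", "//", "data:")):
            continue  # remote or inline images need no upload
        paths.append(base_dir / target)
    return paths
```

Passing the markdown file's parent directory as base_dir lets relative references like images/photo.jpg resolve correctly.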

Generate SEO Metadata

# search_description and seo_title are built-in fields on every Wagtail Page
page.search_description = post.get("description", intro[:155])
page.seo_title = post.get("seo_title", title)
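
Slicing with intro[:155] can cut the description off in the middle of a word. If that bothers you, the standard library's textwrap.shorten truncates at a word boundary instead. This is a small sketch; the make_search_description name and the "..." placeholder are my choices.

```python
import textwrap


def make_search_description(intro: str, limit: int = 155) -> str:
    """Shorten intro to at most `limit` characters without cutting a word."""
    # shorten() also collapses runs of whitespace into single spaces
    return textwrap.shorten(intro, width=limit, placeholder="...")
```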

The Command Line Interface

Here's the full interface:

python manage.py import_markdown <path> [options]

Arguments:
  path                  Directory containing .md files

Options:
  --slug SLUG          Slug of BlogIndexPage (default: "blog")
  --publish            Publish pages immediately
  --format {markdown,html}  Import format (default: markdown)
  --overwrite          Overwrite pages with unpublished changes
  --help               Show help message

Examples:

# Basic import (drafts)
python manage.py import_markdown content/

# Import and publish
python manage.py import_markdown content/ --publish

# Import to different blog section
python manage.py import_markdown tech-posts/ --slug=technology --publish

# Import as HTML
python manage.py import_markdown legacy/ --format=html --publish

# Force overwrite everything
python manage.py import_markdown content/ --overwrite --publish

The Output

When you run the command, you get clear feedback:

Created: Understanding Django Meta Classes → /blog/django-meta-classes/
Created: Automating Content Import → /blog/wagtail-import-command/
Updated: My First Post → /blog/my-first-post/
Skipped (has unpublished changes): Work In Progress

--------------------------------------------------
✓ Import finished
  Created:   2
  Updated:   1
  Skipped:   1
  Published: 3
  Format:    markdown
--------------------------------------------------
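
A summary like this is easy to produce if the command tallies outcomes as it goes. Here is one way it might be done with collections.Counter. This is a sketch: the format_summary function and the counter keys are assumptions, and in a real command you would write the result with self.stdout.write() rather than print().

```python
from collections import Counter


def format_summary(stats: Counter, format_type: str) -> str:
    """Render the end-of-run report from per-outcome counters."""
    rule = "-" * 50
    lines = [rule, "✓ Import finished"]
    for label in ("created", "updated", "skipped", "published"):
        name = label.capitalize() + ":"
        lines.append(f"  {name:<10} {stats[label]}")  # missing keys count as 0
    lines.append(f"  {'Format:':<10} {format_type}")
    lines.append(rule)
    return "\n".join(lines)
```

Inside the loop, each branch would then just do something like stats["created"] += 1.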

Why This Matters

This import command transforms your workflow:

Without it:

  • Manual content creation (slow)
  • No version control for content
  • Hard to bulk-update
  • Content and code separated

With it:

  • Automated content import (fast)
  • Content in Git (version controlled)
  • Bulk operations trivial
  • Content alongside code

Integration with CI/CD

Take it further with automation:

# .github/workflows/deploy.yml
name: Deploy

on:
  push:
    branches: [main]

jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Deploy code
        run: |
          # Deploy your Django app
          
      - name: Import content
        run: |
          python manage.py import_markdown content/ --publish

Now pushing to Git automatically updates your blog content.

Conclusion

Django management commands bridge the gap between manual CMS work and automated content management. The import command we've explored:

  • Saves time: Bulk import vs manual creation
  • Enables automation: Content from code
  • Supports workflows: Git, CI/CD, review processes
  • Maintains quality: Validation, error handling, skip protection
  • Stays flexible: Easy to extend and customize

This isn't just about avoiding the admin interface—it's about treating content as code, with all the benefits that brings: version control, code review, automated deployment, and programmatic generation.

Whether you're migrating an existing blog, building a documentation site, or creating a content pipeline, custom management commands are an essential tool in your Django/Wagtail toolkit.

Next Steps

Want to implement this in your own project?

  1. Create the management command: Start with the basic structure
  2. Test with a few files: Import a handful of markdown posts
  3. Refine the frontmatter: Add fields specific to your needs
  4. Build your workflow: Decide how content moves from creation to publication
  5. Automate: Add to your deployment process

The code for this import command is available in my GitHub repository. Feel free to adapt it for your own needs!


Have you built custom management commands for content import? What workflows do they enable for you? Share your experiences in the comments below!