Automating Content Import in Wagtail
December 6, 2025 · 10 min read
A guide for importing markdown documents into Wagtail automatically from code
Automating Content Import in Wagtail: From Markdown to CMS
When building a blog or content site with Wagtail, there's a fundamental question: should you create content through the CMS interface, or can you automate it? The answer is both—and having a custom Django management command bridges these worlds beautifully.
The Problem: Manual Content Creation Doesn't Scale
Imagine you're migrating from an existing blog, importing documentation, or working with content that lives in version control. Creating each post manually through the Wagtail admin becomes tedious:
- Navigate to the admin
- Click "Add child page"
- Copy-paste content
- Format everything
- Fill in metadata
- Save and publish
- Repeat for dozens or hundreds of posts
There has to be a better way.
The Solution: Django Management Commands
Django's management command system lets you create custom CLI tools that interact with your application. Think of commands like python manage.py migrate or python manage.py createsuperuser—you can build your own.
For content import, we can create a command that:
- Reads markdown files from a directory
- Parses frontmatter metadata
- Creates or updates Wagtail pages programmatically
- Handles publishing automatically
Understanding the Import Command
Let me walk you through how this works, using a real import command as an example.
The Basic Structure
from django.core.management.base import BaseCommand


class Command(BaseCommand):
    help = "Import Markdown files into BlogPage entries."

    def add_arguments(self, parser):
        # Define command-line arguments
        parser.add_argument("path", type=str)

    def handle(self, *args, **options):
        # Your import logic goes here
        pass
Every Django management command:
- Extends BaseCommand
- Defines a help string (shown in --help)
- Uses add_arguments() to define CLI options
- Implements handle() with the main logic
Markdown + Frontmatter: Content as Code
The key insight is storing blog posts as markdown files with YAML frontmatter:
---
title: "My Blog Post"
intro: "A brief description"
date: 2025-01-15
reading_time: 5
slug: my-blog-post
---
# Main Content
Your markdown content goes here...
This format is:
- Version control friendly - Git diffs work perfectly
- Editor-agnostic - Write in VS Code, Obsidian, or any text editor
- Portable - Easy to migrate between systems
- Readable - No database required to read your content
Parsing with Python Frontmatter
The python-frontmatter library makes parsing trivial:
import frontmatter
post = frontmatter.loads(file.read_text())
# Access metadata
title = post.get("title")
date = post.get("date")
# Access content
content = post.content
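To make the file layout concrete, here is a stdlib-only sketch of how a frontmatter block can be separated from the body. It is not how python-frontmatter is implemented, and it does not parse real YAML: the hypothetical split_frontmatter helper only understands flat `key: value` lines and leaves every value as a string.

```python
def split_frontmatter(text: str) -> tuple[dict[str, str], str]:
    """Split a '---'-delimited header from the body (flat 'key: value' lines only)."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text  # no frontmatter block at all
    try:
        end = lines.index("---", 1)  # position of the closing delimiter
    except ValueError:
        return {}, text  # unterminated header: treat everything as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


sample = """---
title: "My Blog Post"
date: 2025-01-15
slug: my-blog-post
---
# Main Content
Your markdown content goes here...
"""
meta, body = split_frontmatter(sample)
```

In a real import I would still reach for python-frontmatter, since a proper YAML parser handles nested values, lists, and typed scalars that this sketch ignores.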
Creating Pages Programmatically
Here's where it gets interesting. Wagtail pages are just Django models, which means you can create them with Python code:
from blog.models import BlogPage, BlogIndexPage
# Find the parent page (blog index)
blog_index = BlogIndexPage.objects.get(slug="blog")
# Create a new blog post
page = BlogPage(
    title="My Post",
    slug="my-post",
    intro="Brief description",
    date=date_value,
    body=[("markdown", content)],
)
# Add as child of blog index
blog_index.add_child(instance=page)
# Save and publish
revision = page.save_revision()
revision.publish()
This is essentially what happens when you click "Publish" in the admin; we're just doing it in code.
The Complete Import Workflow
Let's trace through what happens when you run the import command:
Step 1: Command Invocation
python manage.py import_markdown posts/ --format=markdown --publish
What this does:
- posts/ - Directory containing markdown files
- --format=markdown - Keep content as markdown (vs converting to HTML)
- --publish - Publish immediately (vs save as draft)
Step 2: Find or Create Blog Index
from wagtail.models import Page  # Wagtail 3.0+


def get_or_create_blog_index_by_slug(slug: str) -> BlogIndexPage:
    existing_index = BlogIndexPage.objects.filter(slug=slug).first()
    if existing_index:
        return existing_index
    # Create new index if it doesn't exist
    home = Page.objects.get(slug="home")
    # `title` was undefined in the original snippet; deriving it from the slug is an assumption
    title = slug.replace("-", " ").title()
    index = BlogIndexPage(title=title, slug=slug)
    home.add_child(instance=index)
    index.save_revision().publish()
    return index
This ensures we have a parent page to add blog posts to. If you import to a blog index that doesn't exist, it creates one automatically.
Step 3: Process Each Markdown File
for md_file in md_dir.glob("*.md"):
    # Parse the file
    post = frontmatter.loads(md_file.read_text())
    # Extract metadata
    title = post.get("title", md_file.stem)
    intro = post.get("intro", "")
    date_value = post.get("date", None)
    slug_value = post.get("slug", slugify(title))
The command walks through every .md file in the directory, parses frontmatter, and extracts metadata with sensible defaults:
- No title? Use the filename
- No slug? Generate from title
- No date? Use None (can be set later)
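The fallbacks above can be sketched as a small pure function. This is a sketch under assumptions, not the command's actual code: extract_metadata and simple_slugify are hypothetical names, and simple_slugify is only a rough ASCII stand-in for django.utils.text.slugify so the example runs without Django.

```python
import re
import unicodedata


def simple_slugify(value: str) -> str:
    """Hypothetical stand-in for django.utils.text.slugify (ASCII only)."""
    value = unicodedata.normalize("NFKD", value).encode("ascii", "ignore").decode("ascii")
    value = re.sub(r"[^\w\s-]", "", value.lower())
    return re.sub(r"[-\s]+", "-", value).strip("-_")


def extract_metadata(meta: dict, file_stem: str) -> dict:
    """Apply the title/slug/intro/date fallbacks to a parsed frontmatter mapping."""
    title = meta.get("title") or file_stem
    return {
        "title": title,
        "intro": meta.get("intro") or "",
        "date": meta.get("date"),  # stays None when missing
        "slug": meta.get("slug") or simple_slugify(title),
    }


print(extract_metadata({"title": "Hello, World!", "slug": ""}, "2025-01-01-first-post"))
```

One design choice worth noting: using `or` instead of a `.get()` default also covers keys that are present but empty, such as a template that ships with `slug: ""`.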
Step 4: Create or Update
Here's the clever part—the command handles both new posts and updates:
existing_page = BlogPage.objects.filter(slug=slug_value).first()
if existing_page:
    # Update existing page
    page = existing_page
    page.title = title
    page.body = body_content
    # ... update other fields
    revision = page.save_revision()  # keep the revision so it can be published later
    # Output: "Updated: My Post"
else:
    # Create new page
    page = BlogPage(
        title=title,
        slug=slug_value,
        body=body_content,
        live=False,  # Page.live defaults to True; start as a draft until published
    )
    blog_index.add_child(instance=page)
    revision = page.save_revision()
    # Output: "Created: My Post"
This means you can:
- Run the import to create posts
- Edit markdown files
- Run import again to update existing posts
- No duplicates, no manual cleanup
Step 5: Publishing
if publish:
    revision.publish()
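In Wagtail, each call to save_revision() records a new revision without changing what visitors see. Publishing is a separate step, which the --publish flag triggers automatically.
Without --publish, changes are stored as draft revisions, which is useful for reviewing before making them public. For newly created pages this only holds if the page is created with live=False, because Page.live defaults to True.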
Advanced Features
Format Selection: Markdown vs HTML
if format_type == "markdown":
    body_content = [("markdown", post.content)]
else:
    body_html = markdown.markdown(post.content, extensions=[...])
    body_content = [("html", body_html)]
Why this matters:
Markdown format (recommended):
- Content stays editable in Wagtail admin
- Can switch between markdown/rich text later
- Original formatting preserved
- Better for collaboration
HTML format:
- Pre-rendered, slightly faster
- No markdown dependency in production
- Loses editability
- Good for static content that won't change
Skip Protection
if existing_page and not overwrite and existing_page.has_unpublished_changes:
    # Skip this page
    skipped += 1
    continue
This prevents accidentally overwriting edits made through the admin that haven't been published yet. Without --overwrite, the command is safe to run repeatedly.
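Putting the create/update logic and the skip check together, the per-file decision boils down to three outcomes. The sketch below isolates that decision as a pure function; decide_action is a hypothetical name, and the real command works on Wagtail page objects rather than booleans.

```python
def decide_action(exists: bool, has_unpublished_changes: bool, overwrite: bool) -> str:
    """Return 'create', 'update', or 'skip' for one markdown file."""
    if not exists:
        return "create"
    if has_unpublished_changes and not overwrite:
        return "skip"  # protect unpublished edits made in the admin
    return "update"


for case in [(False, False, False), (True, False, False), (True, True, False), (True, True, True)]:
    print(case, "->", decide_action(*case))
```

Keep in mind that a page imported without --publish also has unpublished changes, so a later run will skip it unless --overwrite is passed.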
Date Parsing
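from datetime import datetime

if date_value and isinstance(date_value, str):
    try:
        date_value = datetime.strptime(date_value, "%Y-%m-%d").date()
    except ValueError:
        date_value = None
This converts dates that arrive as strings (for example, a quoted "2025-01-15") into Python date objects. An unquoted YAML date such as 2025-01-15 is typically already parsed into a date object by the YAML loader, so it skips this branch. If a string doesn't match the expected format, the value falls back to None rather than crashing the import.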
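If you want one place that handles every shape a frontmatter date might take, a helper along these lines could work. It is a sketch, and coerce_date is a hypothetical name rather than part of the command described here.

```python
from datetime import date, datetime
from typing import Optional


def coerce_date(value: object) -> Optional[date]:
    """Normalize a frontmatter date value to datetime.date, or None if unusable."""
    if isinstance(value, datetime):  # check datetime first: it subclasses date
        return value.date()
    if isinstance(value, date):
        return value
    if isinstance(value, str):
        try:
            return datetime.strptime(value.strip(), "%Y-%m-%d").date()
        except ValueError:
            return None
    return None


print(coerce_date("2025-01-15"), coerce_date("15/01/2025"), coerce_date(None))
```

Returning None for anything unrecognized keeps the same forgiving behavior: a bad date never aborts the whole import.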
Real-World Use Cases
Use Case 1: Blog Migration
You're moving from Jekyll, Hugo, or another static site generator:
# Copy your old posts to a directory
cp ~/old-blog/_posts/*.md ./import/
# Import them all at once
python manage.py import_markdown import/ --publish
Result: Entire blog imported in seconds, not hours.
Use Case 2: Version-Controlled Content
Your content lives in Git alongside your code:
my-blog/
├── content/
│ ├── 2025-01-01-first-post.md
│ ├── 2025-01-15-second-post.md
│ └── 2025-02-01-third-post.md
└── ...
Workflow:
- Write posts in your favorite editor
- Commit to Git
- Push to repository
- Run import command in production
- Content is live
This enables:
- Code review for blog posts (via pull requests)
- Change history (Git blame shows who wrote what)
- Rollbacks (revert content like you revert code)
- Branching (draft posts in feature branches)
Use Case 3: Bulk Content Updates
You need to update metadata across many posts:
# Edit frontmatter in all markdown files
sed -i 's/reading_time: 5/reading_time: 8/' content/*.md
# Re-import to update all posts
python manage.py import_markdown content/ --overwrite --publish
Try doing that through the admin interface!
Use Case 4: Content Pipeline
You generate content programmatically:
# generate_posts.py
for topic in topics:
    content = generate_content(topic)
    with open(f"auto-posts/{topic.slug}.md", "w") as f:
        f.write(f"""---
title: {topic.title}
intro: {topic.description}
date: {today}
---
{content}
""")
Then import:
python generate_posts.py
python manage.py import_markdown auto-posts/ --publish
Perfect for documentation, reports, or AI-generated content.
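One caveat when generating frontmatter with plain string formatting: a title containing a colon or a quote can produce invalid YAML. A hedged workaround is to quote string values with json.dumps, since a JSON string literal generally also reads as a YAML double-quoted scalar. The render_post helper below is a hypothetical sketch of that idea.

```python
import json
from datetime import date


def render_post(title: str, intro: str, day: date, body: str) -> str:
    """Build a markdown document whose frontmatter string values are quoted."""
    header = [
        "---",
        f"title: {json.dumps(title)}",  # quoted, with inner quotes escaped
        f"intro: {json.dumps(intro)}",
        f"date: {day.isoformat()}",  # left unquoted so YAML can read it as a date
        "---",
    ]
    return "\n".join(header) + "\n" + body.rstrip() + "\n"


print(render_post('Wagtail: "Tips"', "Short intro", date(2025, 1, 15), "Body text"))
```

The result can then be written to disk with pathlib's Path.write_text and picked up by the import command as usual.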
Best Practices
1. Keep Markdown Files Organized
content/
├── 2025/
│ ├── 01/
│ │ ├── first-post.md
│ │ └── second-post.md
│ └── 02/
│ └── third-post.md
└── drafts/
└── work-in-progress.md
Organize by date, category, or status. Import specific directories as needed.
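Note that a pattern like glob("*.md") only matches files at the top level of the directory it is given, so a nested layout means either importing each leaf directory separately or walking the tree. Here is a minimal sketch of the second option; iter_markdown_files and its skip_dirs parameter are hypothetical, not options of the command described here.

```python
from pathlib import Path
from typing import Iterator


def iter_markdown_files(root: Path, skip_dirs: frozenset = frozenset({"drafts"})) -> Iterator[Path]:
    """Yield .md files under root recursively, ignoring anything inside skip_dirs."""
    for path in sorted(root.rglob("*.md")):
        relative_dirs = path.relative_to(root).parts[:-1]  # directory components only
        if not skip_dirs.intersection(relative_dirs):
            yield path


if __name__ == "__main__":
    for md_file in iter_markdown_files(Path("content")):
        print(md_file)
```

Sorting the paths keeps the import order stable between runs, which makes the output easier to compare.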
2. Use Consistent Frontmatter
Create a template for new posts:
---
title: ""
intro: ""
date: YYYY-MM-DD
reading_time: 5
slug: ""
---
#
This ensures all required fields are present.
3. Test Before Publishing
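# First run without --publish to create drafts
python manage.py import_markdown content/
# Review in admin
# Then publish. Drafts count as unpublished changes, so given the skip check
# described under Skip Protection, --overwrite is needed to re-import them
python manage.py import_markdown content/ --overwrite --publish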
4. Use Meaningful Slugs
---
slug: understanding-wagtail-import-commands
---
Explicit slugs prevent collisions and give you clean URLs.
5. Handle Images Carefully
If your markdown references images:

Make sure to:
- Upload images to Wagtail's media library
- Update paths or use Wagtail's image handling
- Or use absolute URLs to external CDN
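Before deciding how to handle images, it helps to know which ones a post actually references. The following is a rough sketch using a regular expression; find_image_targets is a hypothetical helper, and the pattern only covers the common inline image form, not reference-style images or raw HTML tags.

```python
import re

# Matches inline markdown images, with an optional "title" after the target.
IMAGE_PATTERN = re.compile(r'!\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)')


def find_image_targets(markdown_text: str) -> tuple[list[str], list[str]]:
    """Return (local_paths, remote_urls) referenced by inline markdown images."""
    local, remote = [], []
    for target in IMAGE_PATTERN.findall(markdown_text):
        if target.startswith(("http://", "https://", "//")):
            remote.append(target)
        else:
            local.append(target)
    return local, remote


text = 'Intro ![Diagram](./images/diagram.png) and ![Logo](https://cdn.example.com/logo.png "Logo")'
local_paths, remote_urls = find_image_targets(text)
```

Local paths are the ones that need uploading or rewriting; remote URLs can usually be left alone.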
Extending the Command
The import command is just Python code—you can extend it:
Add Author Support
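from django.contrib.auth import get_user_model

author_name = post.get("author")
if author_name:
    # filter().first() returns None instead of raising if the user doesn't exist
    author = get_user_model().objects.filter(username=author_name).first()
    if author:
        page.owner = author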
Import Categories/Tags
tags = post.get("tags", [])
for tag_name in tags:
    tag, _ = Tag.objects.get_or_create(name=tag_name)
    page.tags.add(tag)
Process Images
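from django.core.files.images import ImageFile
from wagtail.images.models import Image

for image_path in extract_images(post.content):
    # Wagtail needs a Django file object here, not a bare path
    with image_path.open("rb") as f:
        image = Image(title=image_path.stem, file=ImageFile(f, name=image_path.name))
        image.save()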
Generate SEO Metadata
page.search_description = post.get("description", intro[:155])
page.seo_title = post.get("seo_title", title)
The Command Line Interface
Here's the full interface:
python manage.py import_markdown <path> [options]
Arguments:
path Directory containing .md files
Options:
--slug SLUG Slug of BlogIndexPage (default: "blog")
--publish Publish pages immediately
--format {markdown,html} Import format (default: markdown)
--overwrite Overwrite pages with unpublished changes
--help Show help message
Examples:
# Basic import (drafts)
python manage.py import_markdown content/
# Import and publish
python manage.py import_markdown content/ --publish
# Import to different blog section
python manage.py import_markdown tech-posts/ --slug=technology --publish
# Import as HTML
python manage.py import_markdown legacy/ --format=html --publish
# Force overwrite everything
python manage.py import_markdown content/ --overwrite --publish
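The parser that Django passes to add_arguments() is an argparse parser, so the options listed above could be declared roughly like this. This is a sketch under that assumption: add_import_arguments is a hypothetical function, and the help texts are copied from the option list rather than from the actual command.

```python
import argparse


def add_import_arguments(parser: argparse.ArgumentParser) -> None:
    """Declare the CLI options; the same calls could live in add_arguments()."""
    parser.add_argument("path", type=str, help="Directory containing .md files")
    parser.add_argument("--slug", default="blog", help="Slug of BlogIndexPage")
    parser.add_argument("--publish", action="store_true", help="Publish pages immediately")
    parser.add_argument("--format", choices=["markdown", "html"], default="markdown",
                        help="Import format")
    parser.add_argument("--overwrite", action="store_true",
                        help="Overwrite pages with unpublished changes")


# Standalone demonstration with a plain argparse parser.
parser = argparse.ArgumentParser(prog="import_markdown")
add_import_arguments(parser)
options = vars(parser.parse_args(["tech-posts/", "--slug=technology", "--publish"]))
print(options)
```

Inside handle(), the same values arrive in the options dictionary under the keys path, slug, publish, format, and overwrite.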
The Output
When you run the command, you get clear feedback:
Created: Understanding Django Meta Classes → /blog/django-meta-classes/
Created: Automating Content Import → /blog/wagtail-import-command/
Updated: My First Post → /blog/my-first-post/
Skipped (has unpublished changes): Work In Progress
--------------------------------------------------
✓ Import finished
Created: 2
Updated: 1
Skipped: 1
Published: 3
Format: markdown
--------------------------------------------------
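A summary like this is easy to produce by counting outcomes as the loop runs. Here is one possible sketch using collections.Counter; format_summary is a hypothetical helper, and the layout only approximates the output shown above.

```python
from collections import Counter


def format_summary(stats: Counter, format_type: str) -> str:
    """Render the end-of-run summary from per-outcome counts."""
    rule = "-" * 50
    lines = [rule, "✓ Import finished"]
    for label in ("created", "updated", "skipped", "published"):
        lines.append(f"{label.capitalize()}: {stats[label]}")  # missing keys count as 0
    lines.append(f"Format: {format_type}")
    lines.append(rule)
    return "\n".join(lines)


stats = Counter(created=2, updated=1, skipped=1, published=3)
print(format_summary(stats, "markdown"))
```

In a management command, the string would normally go through self.stdout.write() rather than print().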
Why This Matters
This import command transforms your workflow:
Without it:
- Manual content creation (slow)
- No version control for content
- Hard to bulk-update
- Content and code separated
With it:
- Automated content import (fast)
- Content in Git (version controlled)
- Bulk operations trivial
- Content alongside code
Integration with CI/CD
Take it further with automation:
# .github/workflows/deploy.yml
name: Deploy
on:
  push:
    branches: [main]
jobs:
  deploy:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Deploy code
        run: |
          # Deploy your Django app
      - name: Import content
        run: |
          # Must run where manage.py can reach the production database
          python manage.py import_markdown content/ --publish
Now pushing to Git automatically updates your blog content.
Conclusion
Django management commands bridge the gap between manual CMS work and automated content management. The import command we've explored:
- Saves time: Bulk import vs manual creation
- Enables automation: Content from code
- Supports workflows: Git, CI/CD, review processes
- Maintains quality: Validation, error handling, skip protection
- Stays flexible: Easy to extend and customize
This isn't just about avoiding the admin interface—it's about treating content as code, with all the benefits that brings: version control, code review, automated deployment, and programmatic generation.
Whether you're migrating an existing blog, building a documentation site, or creating a content pipeline, custom management commands are an essential tool in your Django/Wagtail toolkit.
Next Steps
Want to implement this in your own project?
- Create the management command: Start with the basic structure
- Test with a few files: Import a handful of markdown posts
- Refine the frontmatter: Add fields specific to your needs
- Build your workflow: Decide how content moves from creation to publication
- Automate: Add to your deployment process
The code for this import command is available in my GitHub repository. Feel free to adapt it for your own needs!
Have you built custom management commands for content import? What workflows do they enable for you? Share your experiences in the comments below!