Best Tools and Workflows for XMLTV Grabbers and EPGs

Creating, importing, and maintaining high-quality Electronic Program Guides (EPGs) using XMLTV requires the right combination of tools and well-defined workflows. This article covers essential tools, best practices, and example workflows for building reliable XMLTV pipelines, from grabbing raw listings to transforming, validating, enriching, and deploying EPG data for set-top boxes, media centers (Kodi, Emby, Plex), or IPTV services.


What is XMLTV (brief)

XMLTV is an XML-based file format and a collection of utilities for storing TV listings and program metadata. A typical XMLTV file contains channel entries and program elements (title, start/end times, descriptions, categories, ratings, images, credits, etc.). XMLTV files are widely used as the EPG source for DVRs, media centers, and IPTV clients.
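
To make the structure concrete, here is a minimal sketch that parses a tiny XMLTV document with Python's standard library. The channel id, titles, and the helper name `summarize` are made up for illustration; real files carry many more elements and attributes.

```python
import xml.etree.ElementTree as ET

# A deliberately tiny XMLTV document; ids and titles are illustrative only.
SAMPLE_XMLTV = """<?xml version="1.0" encoding="UTF-8"?>
<tv generator-info-name="example-generator">
  <channel id="channel-1">
    <display-name>Example Channel</display-name>
  </channel>
  <programme start="20240101180000 +0000" stop="20240101190000 +0000" channel="channel-1">
    <title lang="en">Sample Show</title>
    <desc lang="en">Short description</desc>
    <category lang="en">Documentary</category>
  </programme>
</tv>
"""


def summarize(xmltv_text):
    """Return (display names keyed by channel id, list of programme dicts)."""
    root = ET.fromstring(xmltv_text.encode("utf-8"))
    channels = {
        ch.get("id"): ch.findtext("display-name")
        for ch in root.findall("channel")
    }
    programmes = [
        {
            "channel": prog.get("channel"),
            "start": prog.get("start"),
            "stop": prog.get("stop"),
            "title": prog.findtext("title"),
        }
        for prog in root.findall("programme")
    ]
    return channels, programmes


channels, programmes = summarize(SAMPLE_XMLTV)
```

Each `programme` refers to its channel through the `channel` attribute, which must match a `channel` element's `id`.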


Core components of an XMLTV workflow

A robust XMLTV workflow typically includes the following components:

  • Grabbers: fetch raw listings from providers or scrape websites.
  • Parsers/Converters: normalize data into XMLTV format.
  • Transformers/Enrichers: add images, categories, ratings, and unique IDs.
  • Timezone and date handlers: ensure correct timestamps and DST handling.
  • Validators: ensure the produced XML conforms to the XMLTV DTD and to the requirements of the target consumer.
  • Delivery: compressing, splitting, and distributing the final XMLTV file to clients or servers.

Below is a curated list of commonly used tools and libraries organized by task. Each entry includes short notes on strengths and typical use cases.

  • XMLTV utilities (original project)

    • Strengths: reference grabbers, parsers, basic tools for conversion and validation.
    • Use: starting point; many distributions include ready-made grabbers.
  • Web grabbers (custom/scrapers)

    • Tools: Python (requests, BeautifulSoup, lxml), Node.js (axios, cheerio), Scrapy.
    • Strengths: flexibility, can target providers without public APIs.
    • Use: build custom scrapers for websites, handle pagination, and login flows.
  • API-based fetchers

    • Tools: Python/JavaScript HTTP clients, Postman for testing.
    • Strengths: reliable structured data, JSON-to-XML pipelines are straightforward.
    • Use: connect to broadcaster or aggregator APIs (when available).
  • xmltv2json / tv_grab utilities

    • Strengths: converters and helper scripts for format conversion and compatibility.
    • Use: convert between JSON and XMLTV, or between variations of XMLTV.
  • Timeshift/timezone libraries

    • Tools: pytz/dateutil (Python), luxon/moment-timezone (JS), zoneinfo (Python 3.9+).
    • Strengths: correct DST handling, timezone conversions.
    • Use: normalize start/end times into UTC or target timezone.
  • Validation tools

    • Tools: xmllint, XML schema validators, XMLTV’s own validator scripts.
    • Strengths: catch structural issues, missing required fields, invalid timestamps.
    • Use: include as CI checks before publishing EPG files.
  • Data enrichment

    • Tools/APIs: TheTVDB, TMDB, IMDb scraping, Gracenote (commercial), TVmaze.
    • Strengths: add posters, thumbnails, episode metadata, series IDs.
    • Use: enhance user experience in media centers or clients.
  • Database/Storage

    • Tools: SQLite, PostgreSQL, Redis (for caching).
    • Strengths: persist intermediate data, dedupe, and join multiple sources.
    • Use: store channel mappings, program GUIDs, and grabbing logs.
  • Automation & Orchestration

    • Tools: cron, systemd timers, Airflow, GitHub Actions, Docker Compose, Kubernetes.
    • Strengths: reliable scheduling, scaling, monitoring.
    • Use: schedule grabbers, run validations, rotate files.
  • Packaging & Delivery

    • Tools: gzip, brotli, S3/Cloud storage, rsync, HTTP servers, BitTorrent (edge cases).
    • Strengths: compression reduces bandwidth, HTTP/S distribution is standard.
    • Use: publish compressed XMLTV.gz files for clients to download.
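
As an illustration of the timezone item in the list above, the following sketch converts a provider's local wall-clock time into a UTC timestamp in the XMLTV style. The function names are hypothetical, and the code assumes the provider's zone is known and that time zone data is available to `zoneinfo` (on some systems this needs the `tzdata` package).

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

# XMLTV-style timestamp: YYYYMMDDhhmmss followed by a UTC offset.
XMLTV_FORMAT = "%Y%m%d%H%M%S %z"


def to_xmltv_utc(naive_local, tz):
    """Attach tz to a naive local time and format it as a UTC XMLTV timestamp."""
    aware = naive_local.replace(tzinfo=tz)
    return aware.astimezone(timezone.utc).strftime(XMLTV_FORMAT)


def parse_xmltv_time(text):
    """Parse a timestamp that carries an explicit offset into an aware datetime."""
    return datetime.strptime(text, XMLTV_FORMAT)


if __name__ == "__main__":
    berlin = ZoneInfo("Europe/Berlin")
    # Same wall-clock time, different UTC result on each side of the DST switch.
    print(to_xmltv_utc(datetime(2024, 1, 15, 20, 0), berlin))  # 20240115190000 +0000
    print(to_xmltv_utc(datetime(2024, 7, 15, 20, 0), berlin))  # 20240715180000 +0000
```

Wall-clock times that occur twice when clocks go back are resolved to the first occurrence by default; a grabber that needs the second one would have to set `fold=1` explicitly.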

Example workflows

1) Small-scale local EPG (home server, Kodi/TVHeadend)

  1. Schedule a simple Python grabber via cron to fetch provider pages or API every 6–12 hours.
  2. Parse and normalize into XMLTV using lxml or xml.etree, ensuring times are converted to the local timezone with zoneinfo.
  3. Run xmllint to validate structure.
  4. gzip the XMLTV file and place it into your media server’s expected folder or configure TVHeadend to pull it.

Tools: Python (requests, lxml, zoneinfo), cron, xmllint, gzip.
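
Steps 3 and 4 could be sketched roughly as follows. This is only an approximation of the workflow: it uses Python's own parser as a stand-in for xmllint, so it checks well-formedness and the root element but not DTD validity. The function name `publish_xmltv` and the output path are hypothetical.

```python
import gzip
import xml.etree.ElementTree as ET
from pathlib import Path


def publish_xmltv(xml_bytes, out_path):
    """Check that xml_bytes is well-formed XMLTV, then write it gzip-compressed."""
    root = ET.fromstring(xml_bytes)  # raises ET.ParseError on malformed XML
    if root.tag != "tv":
        raise ValueError(f"unexpected root element: {root.tag!r}")

    out_path = Path(out_path)
    tmp_path = out_path.with_name(out_path.name + ".tmp")

    # Write to a temporary file first so readers never see a half-written guide.
    with gzip.open(tmp_path, "wb") as fh:
        fh.write(xml_bytes)
    tmp_path.replace(out_path)  # atomic when both paths are on the same filesystem
    return out_path
```

Writing to a temporary name and renaming afterwards matters when a media server polls the folder while the grabber is still running.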

2) Multi-source enrichment pipeline (community EPG project)

  1. Use multiple grabbers: API-based for major networks, scrapers for niche channels.
  2. Ingest raw outputs into PostgreSQL; dedupe by title/start-time/channel.
  3. Enrich each program by querying TMDB/TVmaze for images and episode metadata; store external IDs.
  4. Normalize categories and ratings to a canonical taxonomy.
  5. Produce per-region XMLTV files, validate them against the XMLTV DTD, and run automated QA checks (e.g., missing descriptions, zero-length programs).
  6. Compress and publish to S3 with versioned keys; invalidate CDN caches.

Tools: Python, Scrapy, PostgreSQL, Redis, TMDB/TVmaze APIs, GitLab CI or Airflow, AWS S3/CloudFront.
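
The deduplication in step 2 can be pushed into the database with a uniqueness constraint. The sketch below uses SQLite from the standard library as a stand-in for PostgreSQL so that it runs without a server; the table layout, column names, and helper functions are assumptions made for illustration. In PostgreSQL the equivalent of `INSERT OR IGNORE` is `INSERT ... ON CONFLICT DO NOTHING`.

```python
import sqlite3


def open_store(path=":memory:"):
    """Open a store whose unique key is (channel, start time, normalized title)."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS programme (
               channel   TEXT NOT NULL,
               start_utc TEXT NOT NULL,
               title_key TEXT NOT NULL,
               title     TEXT NOT NULL,
               source    TEXT NOT NULL,
               UNIQUE (channel, start_utc, title_key)
           )"""
    )
    return conn


def normalize_title(title):
    """Case-fold and collapse whitespace so trivial variants share one key."""
    return " ".join(title.casefold().split())


def ingest(conn, rows, source):
    """Insert rows, skipping those already stored; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO programme "
        "(channel, start_utc, title_key, title, source) VALUES (?, ?, ?, ?, ?)",
        [
            (r["channel"], r["start"], normalize_title(r["title"]), r["title"], source)
            for r in rows
        ],
    )
    conn.commit()
    return conn.total_changes - before
```

Because the first source to insert a row wins, ingesting the preferred source first gives it priority over scrapers.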

3) Enterprise/Commercial EPG (scale, SLAs)

  1. Architect a microservice-based system: independent grabber services, enrichment services, normalization service.
  2. Grabbers write raw feeds to a message queue (Kafka).
  3. Stream processors normalize timestamps, dedupe and enrich in near-real-time.
  4. Store canonical EPG in a distributed DB; expose API endpoints for clients to request custom EPG slices.
  5. Add strong validation, monitoring, and rollback mechanisms. Use Kafka Connect and Debezium for auditing and replication.

Tools: Kafka, Kubernetes, Go/Python services, PostgreSQL, Elasticsearch for search, Prometheus/Grafana for observability.


Best practices and pitfalls

  • Timezones and DST: Always store times in UTC internally and convert to the client timezone only at the delivery stage. DST transitions are a common cause of off-by-one-hour schedule errors.
  • Unique IDs: assign stable GUIDs for programs (e.g., hash of title+start+channel) so clients can track recordings and avoid duplicates.
  • Deduplication: when merging sources, prefer one canonical source per channel; use fuzzy matching (Levenshtein, token set) to dedupe program titles.
  • Throttling & respectful scraping: obey robots.txt, throttle requests, and prefer official APIs to avoid IP bans.
  • Validation in CI: run XML validation and sanity checks (no overlaps, end > start, descriptions present) on every generated file.
  • Backups & versioning: keep previous versions for troubleshooting and allow consumers to roll back.
  • Legal/commercial considerations: verify licensing for third-party metadata (images, descriptions). Some sources forbid redistribution.
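
The unique-ID and deduplication items above might look like this in practice. This is a sketch under stated assumptions: the ID is a truncated SHA-1 of channel, UTC start, and normalized title, and the fuzzy comparison uses `difflib.SequenceMatcher` from the standard library rather than a true Levenshtein distance. The function names and the 0.85 threshold are illustrative, not recommendations.

```python
import hashlib
from difflib import SequenceMatcher


def _normalize(text):
    """Case-fold and collapse whitespace before hashing or comparing."""
    return " ".join(text.casefold().split())


def programme_id(channel, start_utc, title):
    """Derive a stable ID from channel, UTC start time, and normalized title."""
    # The unit separator keeps field boundaries unambiguous inside the key.
    key = "\x1f".join([channel, start_utc, _normalize(title)])
    return hashlib.sha1(key.encode("utf-8")).hexdigest()[:16]


def titles_match(a, b, threshold=0.85):
    """Treat two titles as the same programme if their similarity is high enough."""
    ratio = SequenceMatcher(None, _normalize(a), _normalize(b)).ratio()
    return ratio >= threshold
```

An ID built this way changes if a source later corrects the title or start time, so projects that need IDs to survive such edits usually store the ID once and look it up afterwards instead of recomputing it.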

Validation checklist (quick)

  • Channel list present and matches expected channels.
  • Program start/end times use the XMLTV timestamp format (YYYYMMDDhhmmss followed by a UTC offset such as +0000) and carry the correct offset.
  • No overlapping programs for the same channel.
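  • Required fields present: title, start, channel. The XMLTV DTD treats stop as optional, but many clients expect it, so include it whenever it is known.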
  • Images and ratings (if referenced) link to accessible URLs or embedded data.
  • File size and compression ratio within expected bounds.
  • XML is well-formed and valid against the XMLTV DTD.
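
Several of the checks above are easy to automate. The following sketch covers unknown channels, missing titles, malformed or reversed times, and overlaps within a channel. It assumes every timestamp has full precision and an explicit UTC offset, which is stricter than XMLTV itself requires; the function name `check_xmltv` is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict
from datetime import datetime

XMLTV_FORMAT = "%Y%m%d%H%M%S %z"  # assumes full precision plus explicit offset


def check_xmltv(xml_bytes):
    """Return a list of problems found; an empty list means every check passed."""
    root = ET.fromstring(xml_bytes)
    problems = []
    known_channels = {ch.get("id") for ch in root.findall("channel")}
    by_channel = defaultdict(list)

    for prog in root.findall("programme"):
        channel, start, stop = prog.get("channel"), prog.get("start"), prog.get("stop")
        label = f"{channel} @ {start}"
        if channel not in known_channels:
            problems.append(f"{label}: unknown channel")
        if not (prog.findtext("title") or "").strip():
            problems.append(f"{label}: missing title")
        try:
            begin = datetime.strptime(start or "", XMLTV_FORMAT)
            end = datetime.strptime(stop or "", XMLTV_FORMAT)
        except ValueError:
            problems.append(f"{label}: missing or malformed start/stop")
            continue
        if end <= begin:
            problems.append(f"{label}: stop is not after start")
            continue
        by_channel[channel].append((begin, end, label))

    # Within each channel, a programme must not begin before the previous one ends.
    for slots in by_channel.values():
        slots.sort(key=lambda slot: slot[0])
        for (_, prev_end, prev_label), (next_begin, _, next_label) in zip(slots, slots[1:]):
            if next_begin < prev_end:
                problems.append(f"{next_label}: overlaps {prev_label}")
    return problems
```

A CI job could fail the build whenever the returned list is non-empty and print the entries as the error report.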

Sample XMLTV generation snippet (Python)

from datetime import datetime, timedelta, timezone
import xml.etree.ElementTree as ET

XMLTV_TIME = "%Y%m%d%H%M%S +0000"  # both timestamps below are in UTC

# Take a single timestamp so that stop is exactly one hour after start,
# including when the hour rolls over past midnight.
start = datetime.now(timezone.utc)
stop = start + timedelta(hours=1)

tv = ET.Element("tv")
channel = ET.SubElement(tv, "channel", id="channel-1")
ET.SubElement(channel, "display-name").text = "Example Channel"

# Each programme points at its channel through the "channel" attribute.
prog = ET.SubElement(tv, "programme", {
    "start": start.strftime(XMLTV_TIME),
    "stop": stop.strftime(XMLTV_TIME),
    "channel": "channel-1",
})
ET.SubElement(prog, "title").text = "Sample Show"
ET.SubElement(prog, "desc").text = "Short description"

print(ET.tostring(tv, encoding="utf-8").decode())

When to build vs. use existing services

  • Build if you need full control, custom enrichment, or an authoritative copy of the data that works offline.
  • Use existing EPG providers when you want fast setup and compliance with licensing (but check costs and redistribution rights).

Closing notes

A reliable XMLTV pipeline balances dependable grabbers, strict timezone handling, robust enrichment, and automated validation. Start small, prioritize correct timestamps and stable identifiers, and iterate toward more complex enrichment and distribution as needs grow.
