How to Give Your Telegram Bot Conversation Memory with python-telegram-bot and OpenAI

A Telegram bot that forgets every message the moment it replies is barely more useful than a search bar. If you're forwarding user messages to OpenAI's Chat Completions API, you need to maintain a per-user message history array — otherwise the model has zero context, and multi-turn conversations are impossible. This is a common gap in beginner implementations, and fixing it cleanly requires understanding both where to store state and how to structure the messages payload.

🧠 Why the Bot Forgets

OpenAI's Chat Completions API is stateless. Every request you send must include the full conversation history in the messages array. If you only send the latest user message, the model treats it as the first message in a brand-new conversation. Your bot isn't broken — it's just not accumulating history before each API call.
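To make this concrete, here is the shape of the messages payload a multi-turn request needs — the names and contents are purely illustrative:

```python
# Illustrative: the FULL messages list the bot must send on every request.
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "My name is Dana."},
    {"role": "assistant", "content": "Nice to meet you, Dana!"},
    {"role": "user", "content": "What's my name?"},
]
# Sending only the last entry would leave the model with no way to answer;
# with the earlier turns included, it can.
```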

The fix has two parts:

  • Maintain an in-memory (or persistent) list of message dicts per user
  • Append each new user message before calling the API, then append the assistant's reply after

📐 Architecture Overview

sequenceDiagram
    participant U as Telegram User
    participant B as Bot Process
    participant M as History Dict
    participant O as OpenAI API
    U->>B: Sends message
    B->>M: get_history(chat_id)
    M-->>B: [{system},{user1},{assistant1},...]
    B->>M: append {role:user, content:msg}
    B->>O: POST /chat/completions (full history)
    O-->>B: assistant reply
    B->>M: append {role:assistant, content:reply}
    B-->>U: reply_text(reply)

Implementation: Per-User Conversation History

The cleanest approach for a single-process bot is a plain Python dictionary keyed by Telegram chat_id. Each value is the running messages list for that chat — in a private chat, chat_id maps one-to-one to a user; in a group, all members share one history. This lives in memory for the lifetime of the process — good enough for development and low-traffic bots.

Complete Working Example

import os
from openai import OpenAI
from telegram import Update
from telegram.ext import ApplicationBuilder, MessageHandler, filters, ContextTypes

# In-memory store: { chat_id: [ {role, content}, ... ] }
conversation_history: dict[int, list[dict]] = {}

SYSTEM_PROMPT = "You are a helpful assistant."
OPENAI_CLIENT = OpenAI(api_key=os.environ["OPENAI_API_KEY"])


def get_history(chat_id: int) -> list[dict]:
    """Return existing history or initialise with system prompt."""
    if chat_id not in conversation_history:
        conversation_history[chat_id] = [
            {"role": "system", "content": SYSTEM_PROMPT}
        ]
    return conversation_history[chat_id]


async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    chat_id = update.effective_chat.id
    user_text = update.message.text

    # 1. Retrieve this user's history
    history = get_history(chat_id)

    # 2. Append the new user turn
    history.append({"role": "user", "content": user_text})

    # 3. Call OpenAI with the full history.
    #    Note: this call blocks the event loop; on a busy bot, switch to
    #    AsyncOpenAI (and await the call) or wrap it in asyncio.to_thread
    #    so other updates aren't stalled.
    response = OPENAI_CLIENT.chat.completions.create(
        model="gpt-4o-mini",
        messages=history,
    )
    assistant_reply = response.choices[0].message.content

    # 4. Append the assistant turn so future calls include it
    history.append({"role": "assistant", "content": assistant_reply})

    # 5. Reply on Telegram
    await update.message.reply_text(assistant_reply)


if __name__ == "__main__":
    token = os.environ["TELEGRAM_BOT_TOKEN"]
    app = ApplicationBuilder().token(token).build()
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    app.run_polling()

Install dependencies with: pip install python-telegram-bot openai (use python-telegram-bot[job-queue] if you need scheduled tasks).

🔒 Capping History Length

Unbounded history will eventually exceed the model's context window and inflate your token costs. A simple sliding-window approach keeps the system prompt and the most recent N turns:

MAX_TURNS = 20  # each turn = 1 user + 1 assistant message = 2 entries

def trim_history(history: list[dict]) -> list[dict]:
    """Keep system prompt + last MAX_TURNS*2 messages."""
    system = [m for m in history if m["role"] == "system"]
    conversation = [m for m in history if m["role"] != "system"]
    trimmed = conversation[-(MAX_TURNS * 2):]
    return system + trimmed

Call history = trim_history(history) just before the API call and write the result back to conversation_history[chat_id] — rebinding the local history variable alone won't update the shared dict. This keeps memory bounded without losing the system prompt.
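As a quick sanity check, here is the trimming behaviour on toy data (the function is repeated so the snippet runs standalone, with a deliberately small window):

```python
MAX_TURNS = 2  # small window for demonstration

def trim_history(history: list[dict]) -> list[dict]:
    """Keep system prompt + last MAX_TURNS*2 messages."""
    system = [m for m in history if m["role"] == "system"]
    conversation = [m for m in history if m["role"] != "system"]
    return system + conversation[-(MAX_TURNS * 2):]

history = [{"role": "system", "content": "You are a helpful assistant."}]
for i in range(5):  # five full turns = 10 non-system messages
    history.append({"role": "user", "content": f"question {i}"})
    history.append({"role": "assistant", "content": f"answer {i}"})

trimmed = trim_history(history)
# The system prompt survives, followed by only the last 2 turns (4 messages).
```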

Resetting a Conversation

Add a /reset command handler so users can start fresh without restarting the bot:

from telegram.ext import CommandHandler

async def reset(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    chat_id = update.effective_chat.id
    conversation_history.pop(chat_id, None)
    await update.message.reply_text("Conversation reset. Starting fresh!")

# Register it:
# app.add_handler(CommandHandler("reset", reset))

Persisting History Across Restarts

In-memory state is wiped every time the process restarts. For production, serialize history to a database. A minimal approach with SQLite and Python's built-in json module works well:

  • Store each user's history as a JSON blob keyed by chat_id
  • Load on first access, save after every assistant reply
  • For higher traffic, Redis with redis-py is a natural fit — store the list as a JSON string under a key like chat:{chat_id}:history

The in-memory dict pattern shown above is intentionally simple to swap out — just replace get_history and the append lines with your storage layer.
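As a sketch of the SQLite approach — the table name, file path, and function names here are my own choices, not part of the example above:

```python
import json
import sqlite3

DB_PATH = "conversations.db"  # assumption: a local file next to the bot

def _connect() -> sqlite3.Connection:
    conn = sqlite3.connect(DB_PATH)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS history ("
        "  chat_id INTEGER PRIMARY KEY,"
        "  messages TEXT NOT NULL"
        ")"
    )
    return conn

def load_history(chat_id: int,
                 system_prompt: str = "You are a helpful assistant.") -> list[dict]:
    """Fetch a chat's history, or start a new one with the system prompt."""
    with _connect() as conn:
        row = conn.execute(
            "SELECT messages FROM history WHERE chat_id = ?", (chat_id,)
        ).fetchone()
    if row is not None:
        return json.loads(row[0])
    return [{"role": "system", "content": system_prompt}]

def save_history(chat_id: int, messages: list[dict]) -> None:
    """Upsert the full history blob after every assistant reply."""
    with _connect() as conn:
        conn.execute(
            "INSERT OR REPLACE INTO history (chat_id, messages) VALUES (?, ?)",
            (chat_id, json.dumps(messages)),
        )
```

To wire this in, replace get_history(chat_id) in handle_message with load_history(chat_id) and call save_history(chat_id, history) after appending the assistant reply.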

Key Takeaways

  • OpenAI's API is stateless — you must send the full conversation history on every request; the bot is responsible for accumulating it.
  • Key your history store by chat_id so each user gets an independent conversation thread.
  • Always append both the user message and the assistant reply to history, in that order, before and after the API call respectively.
  • Cap history length with a sliding window to control token usage and avoid context-window errors.
  • For production, persist history to SQLite or Redis so conversations survive process restarts.

With this pattern in place, your bot can handle multi-turn reasoning, remember user preferences within a session, and feel like a real conversational agent rather than a stateless lookup tool. The next step is adding a persistence layer — start with SQLite if you're solo, Redis if you're expecting concurrent users or horizontal scaling.
