메모리 기초

Agent가 맥락을 유지하고 과거 경험으로부터 학습하려면 메모리(Memory) 가 필요합니다. 인간의 기억 체계와 유사하게, Agent의 메모리도 여러 유형으로 나뉩니다.

왜 Agent Memory가 핵심인가

메모리 없는 Agent의 한계

메모리가 없는 Agent는 매 대화가 “첫 만남”처럼 시작 됩니다. 사용자가 누구인지, 이전에 어떤 대화를 나눴는지, 어떤 선호를 가지고 있는지 전혀 기억하지 못합니다.

주의 비유: 메모리 없는 Agent = 매일 기억을 잃는 직원. 어제 했던 일을 오늘 또 처음부터 설명해야 합니다. 고객이 “지난주에 말씀드린 환불 건이요”라고 하면, “어떤 환불 건이요?”라고 되물어야 합니다. 이런 Agent에게 복잡한 업무를 맡길 수 있을까요?

구체적으로 메모리가 없을 때 발생하는 문제는 다음과 같습니다:

문제	설명	영향
반복 질문	매 세션마다 동일한 정보를 다시 요청	사용자 피로도 급증
맥락 단절	이전 대화와 연결된 후속 질문에 대응 불가	복잡한 업무 처리 불가
개인화 불가	사용자 선호/습관을 반영한 응답 생성 불가	일률적이고 기계적인 경험
학습 불가	과거 실수를 반복하고 성공 패턴을 축적하지 못함	Agent 품질이 정체
멀티턴 실패	긴 대화에서 초반 맥락을 잊어버림	복잡한 작업 완료율 저하

메모리가 Agent 성능에 미치는 영향

업계 벤치마크와 실제 운영 데이터에 따르면, 적절한 메모리 시스템을 도입한 Agent는 다음과 같은 성능 개선을 보입니다:

작업 완료율 +30%: 이전 맥락을 기억하므로 멀티스텝 작업에서 중간에 실패하는 빈도가 대폭 감소
사용자 만족도 +40%: 반복 질문이 줄어들고 개인화된 응답을 제공하여 체감 품질 향상
평균 대화 턴 수 -25%: 이미 알고 있는 정보를 다시 물어볼 필요가 없어 업무 처리 속도 향상
재방문율 +50%: “기억해주는 Agent”에 대한 사용자 신뢰도 상승

참고 핵심: Memory는 Agent를 단순 도구에서 신뢰할 수 있는 동료 로 격상시키는 핵심 요소입니다. 단순 Q&A 챗봇이라면 메모리 없이도 가능하지만, 업무 자동화 Agent 를 목표로 한다면 메모리 설계가 필수입니다.

메모리 유형

Short-term Memory (단기 기억)

현재 대화의 히스토리(메시지 목록)를 의미합니다. LLM의 Context Window 크기에 의해 제한되며, 모든 Agent가 기본적으로 가지는 가장 기본적인 메모리입니다.

Context Window 관리 전략

LLM의 Context Window는 무한하지 않습니다. 대화가 길어질수록 토큰 예산이 소진되며, 이를 효과적으로 관리하는 전략이 필요합니다.

전략	설명	장점	단점
Sliding Window	최근 N개 메시지만 유지	구현 간단	오래된 중요 맥락 손실
Summarization	오래된 대화를 요약하여 압축	핵심 정보 보존	요약 품질에 의존, 추가 LLM 호출 비용
Token Budget	토큰 수 기준으로 관리	정밀한 비용 제어	메시지 중간에 잘릴 수 있음
Selective Retention	중요도 점수 기반 선별 유지	최적의 정보 밀도	구현 복잡도 높음

LangGraph에서 대화 히스토리 관리

from langgraph.checkpoint.memory import MemorySaver
from langgraph.prebuilt import create_react_agent
from langchain_databricks import ChatDatabricks

# 모델 설정
model = ChatDatabricks(endpoint="databricks-claude-sonnet-4")

# 체크포인트로 대화 상태 자동 저장
checkpointer = MemorySaver()
agent = create_react_agent(model, tools=[], checkpointer=checkpointer)

# thread_id로 대화 세션 구분 — 같은 thread_id면 이전 대화를 기억
config = {"configurable": {"thread_id": "user-123-session-1"}}

# 첫 번째 대화
result1 = agent.invoke(
    {"messages": [("user", "저는 데이터 엔지니어 김철수입니다.")]},
    config
)

# 두 번째 대화 — 같은 thread_id이므로 이전 대화를 기억
result2 = agent.invoke(
    {"messages": [("user", "제 이름이 뭐였죠?")]},
    config
)
# → "김철수님이시죠" (이전 대화를 기억)

Context Window가 가득 찼을 때 전략

대화가 매우 길어져 Context Window 한계에 도달하면 세 가지 접근법을 선택할 수 있습니다: 1. 요약 후 교체 (추천)

from langchain_core.messages import SystemMessage

def summarize_and_compress(messages, model, max_tokens=50000):
    """오래된 메시지를 요약하여 Context Window 공간 확보"""
    if estimate_tokens(messages) < max_tokens:
        return messages

    # 오래된 메시지들을 요약
    old_messages = messages[:-10]  # 최근 10개는 유지
    recent_messages = messages[-10:]

    summary = model.invoke([
        SystemMessage(content="아래 대화를 핵심 정보만 간결하게 요약하세요."),
        *old_messages
    ])

    # 요약본 + 최근 메시지로 교체
    return [
        SystemMessage(content=f"[이전 대화 요약]\n{summary.content}"),
        *recent_messages
    ]

2. Sliding Window (단순하지만 효과적)

def sliding_window(messages, window_size=20):
    """최근 N개 메시지만 유지"""
    system_msgs = [m for m in messages if isinstance(m, SystemMessage)]
    conversation = [m for m in messages if not isinstance(m, SystemMessage)]
    return system_msgs + conversation[-window_size:]

3. 중요도 기반 선별 유지 (고급)

def selective_retention(messages, model, max_tokens=50000):
    """각 메시지의 중요도를 평가하여 선별 유지"""
    scored = []
    for msg in messages:
        # 숫자, 이름, 날짜 등 핵심 정보 포함 여부로 점수 부여
        score = calculate_importance(msg)
        scored.append((score, msg))

    scored.sort(key=lambda x: x[0], reverse=True)

    retained = []
    total_tokens = 0
    for score, msg in scored:
        tokens = estimate_tokens([msg])
        if total_tokens + tokens <= max_tokens:
            retained.append(msg)
            total_tokens += tokens

    # 시간순 재정렬
    return sort_by_timestamp(retained)

Long-term Memory (장기 기억)

세션을 넘어 영속적으로 저장되는 정보입니다. 일반적으로 Vector DB나 외부 데이터베이스로 구현하며, Agent가 관련성이 있을 때 과거 상호작용을 검색하여 활용합니다.

구현 아키텍처 3가지

아키텍처	저장 방식	검색 방식	장점	단점
Vector DB 기반	대화/문서를 임베딩하여 벡터 저장	의미적 유사도 검색	자연어로 검색 가능, 유연함	정확한 필터링 어려움
구조화 DB 기반	(user_id, key, value) 테이블 저장	SQL 쿼리로 정확 조회	정확하고 빠른 조회	의미적 검색 불가
하이브리드	Vector DB + 구조화 DB 결합	유사도 검색 + SQL 필터링	두 장점 결합, 가장 실용적	구현 복잡도 높음

Databricks Vector Search로 Long-term Memory 구현

from databricks.vector_search.client import VectorSearchClient
from datetime import datetime

vsc = VectorSearchClient()
index = vsc.get_index(
    endpoint_name="memory_vs_endpoint",
    index_name="catalog.schema.memory_index"
)

def store_memory(user_id: str, content: str, metadata: dict = None):
    """대화 내용을 임베딩하여 Vector Search에 저장"""
    timestamp = datetime.now().isoformat()
    record = {
        "id": f"{user_id}_{timestamp}",
        "text": content,
        "user_id": user_id,
        "timestamp": timestamp,
        "memory_type": "conversation",
    }
    if metadata:
        record.update(metadata)
    index.upsert([record])

def recall_memory(user_id: str, query: str, top_k: int = 5):
    """관련 기억을 의미 검색으로 가져오기"""
    results = index.similarity_search(
        query_text=query,
        columns=["text", "timestamp", "memory_type"],
        filters={"user_id": user_id},
        num_results=top_k
    )
    return results.get("result", {}).get("data_array", [])

# 사용 예시
store_memory("user-456", "고객이 환불 정책에 대해 문의함. 30일 이내 전액 환불 안내 완료.")
memories = recall_memory("user-456", "환불 관련 이전 문의")

Lakebase(PostgreSQL)로 구조화 메모리 구현

import psycopg2
from datetime import datetime

# Lakebase 연결 (Databricks 서버리스 PostgreSQL)
conn = psycopg2.connect(
    host="<lakebase-endpoint>.cloud.databricks.com",
    port=5432,
    dbname="agent_memory_db",
    user="token",
    password="<databricks-token>"
)
cursor = conn.cursor()

def store_user_preference(user_id: str, key: str, value: str):
    """사용자 프로필 메모리 저장 (UPSERT)"""
    cursor.execute("""
        INSERT INTO agent_memory (user_id, key, value, updated_at)
        VALUES (%s, %s, %s, NOW())
        ON CONFLICT (user_id, key)
        DO UPDATE SET value = %s, updated_at = NOW()
    """, (user_id, key, value, value))
    conn.commit()

def get_user_preferences(user_id: str) -> list:
    """사용자의 모든 저장된 선호/프로필 조회"""
    cursor.execute(
        "SELECT key, value FROM agent_memory WHERE user_id = %s ORDER BY updated_at DESC",
        (user_id,)
    )
    return cursor.fetchall()

# 사용 예시
store_user_preference("user-456", "preferred_language", "한국어")
store_user_preference("user-456", "department", "데이터 엔지니어링팀")
store_user_preference("user-456", "timezone", "Asia/Seoul")

prefs = get_user_preferences("user-456")
# → [("timezone", "Asia/Seoul"), ("department", "데이터 엔지니어링팀"), ...]

하이브리드 메모리 아키텍처 (권장)

실무에서 가장 효과적인 방식은 Vector Search + Lakebase를 함께 사용 하는 것입니다:

┌─────────────────────────────────────────────────────┐
│                 Agent System Prompt                  │
│                                                     │
│  [Profile Memory]    ← Lakebase (구조화 데이터)       │
│  이름: 김철수, 부서: DE팀, 언어: 한국어                │
│                                                     │
│  [Recent Context]    ← Short-term (현재 대화)         │
│  최근 3개 메시지                                      │
│                                                     │
│  [Relevant History]  ← Vector Search (의미 검색)      │
│  과거 유사한 대화 3건                                  │
└─────────────────────────────────────────────────────┘

Episodic Memory (에피소드 기억)

과거 문제 해결 과정의 기록입니다. “지난번에 비슷한 오류가 발생했을 때, 이렇게 해결했다…”와 같은 경험 기반 학습을 가능하게 합니다.

구현 패턴: MLflow Tracing 기반 경험 학습

Agent의 모든 실행은 MLflow Tracing을 통해 기록됩니다. 성공한 실행 경로와 실패한 경로를 분류하고, 유사한 상황에서 성공 패턴을 우선 추천하는 방식으로 Episodic Memory를 구현할 수 있습니다.

import mlflow
from mlflow.client import MlflowClient

client = MlflowClient()

def search_successful_episodes(query: str, top_k: int = 3) -> list:
    """과거 성공한 에피소드를 검색하여 참고"""
    # MLflow에서 성공한 실행 이력 조회
    runs = client.search_runs(
        experiment_ids=["<experiment_id>"],
        filter_string='attributes.status = "FINISHED" AND tags.outcome = "success"',
        order_by=["metrics.user_satisfaction DESC"],
        max_results=top_k
    )

    episodes = []
    for run in runs:
        episodes.append({
            "task": run.data.tags.get("task_description", ""),
            "approach": run.data.tags.get("approach", ""),
            "tools_used": run.data.tags.get("tools_used", ""),
            "outcome": run.data.tags.get("outcome", ""),
            "duration_sec": run.data.metrics.get("duration_sec", 0),
        })
    return episodes

def record_episode(task: str, approach: str, tools: list, outcome: str):
    """현재 에피소드를 기록하여 미래 참조용으로 저장"""
    with mlflow.start_run(tags={
        "task_description": task,
        "approach": approach,
        "tools_used": ",".join(tools),
        "outcome": outcome,
    }) as run:
        mlflow.log_metric("user_satisfaction", 1.0 if outcome == "success" else 0.0)

# 사용 예시: 유사 에피소드 검색 후 프롬프트에 주입
past_episodes = search_successful_episodes("ETL 파이프라인 오류 해결")
episode_context = "\n".join([
    f"- 과거 사례: {ep['task']} → {ep['approach']} (결과: {ep['outcome']})"
    for ep in past_episodes
])
# → System Prompt에 [Past Experiences] 섹션으로 삽입

경험 기반 학습 아키텍처

[사용자 요청] → [유사 과거 에피소드 검색]
                        ↓
               성공 에피소드 발견?
              ┌── Yes ──┐── No ──┐
              ↓                   ↓
      성공 경로 우선 시도     기본 추론으로 진행
              ↓                   ↓
         결과 기록 ←──────── 결과 기록
              ↓
      [새 에피소드로 저장]

Working Memory (작업 기억)

현재 진행 중인 추론을 위한 임시 메모장입니다. 계획(Plan), 중간 결과, 현재 상태 등을 저장하며, Agent State로 구현되는 경우가 많습니다.

LangGraph State를 활용한 Working Memory

from typing import TypedDict, Annotated
from langgraph.graph import StateGraph, END
import operator

class AgentState(TypedDict):
    """Agent의 Working Memory를 State로 관리"""
    messages: Annotated[list, operator.add]  # 대화 히스토리
    plan: list[str]                           # 현재 실행 계획
    current_step: int                         # 진행 중인 단계
    intermediate_results: dict                # 중간 결과 저장
    scratchpad: str                           # Agent의 메모장

def planning_node(state: AgentState) -> dict:
    """작업 계획을 수립하고 Working Memory에 저장"""
    plan = generate_plan(state["messages"][-1])
    return {
        "plan": plan,
        "current_step": 0,
        "scratchpad": f"총 {len(plan)}단계 계획 수립 완료"
    }

def execution_node(state: AgentState) -> dict:
    """현재 단계를 실행하고 중간 결과를 Working Memory에 기록"""
    step = state["plan"][state["current_step"]]
    result = execute_step(step, state["intermediate_results"])

    new_results = {** state["intermediate_results"], f"step_{state['current_step']}": result}
    return {
        "current_step": state["current_step"] + 1,
        "intermediate_results": new_results,
        "scratchpad": state["scratchpad"] + f"\n단계 {state['current_step']} 완료: {result}"
    }

# 그래프 구성
graph = StateGraph(AgentState)
graph.add_node("plan", planning_node)
graph.add_node("execute", execution_node)

Agent Scratchpad 패턴

Scratchpad 는 Agent가 추론 과정에서 중간 메모를 남기는 공간입니다. 복잡한 멀티스텝 작업에서 특히 유용합니다.

def build_scratchpad_prompt(state: AgentState) -> str:
    """Working Memory의 scratchpad를 프롬프트에 반영"""
    scratchpad = state.get("scratchpad", "")
    intermediate = state.get("intermediate_results", {})

    prompt_section = "[Working Memory - 현재 진행 상황]\n"
    if scratchpad:
        prompt_section += f"메모: {scratchpad}\n"
    if intermediate:
        prompt_section += "중간 결과:\n"
        for key, val in intermediate.items():
            prompt_section += f"  - {key}: {val}\n"

    return prompt_section

GenAI 핵심 개념

RAG (검색 증강 생성)

ML 핵심 개념

MCP (Model Context Protocol)

왜 Agent Memory가 핵심인가

메모리 없는 Agent의 한계

메모리가 Agent 성능에 미치는 영향

메모리 유형

Short-term Memory (단기 기억)

Context Window 관리 전략

LangGraph에서 대화 히스토리 관리

Context Window가 가득 찼을 때 전략

Long-term Memory (장기 기억)

구현 아키텍처 3가지

Databricks Vector Search로 Long-term Memory 구현

Lakebase(PostgreSQL)로 구조화 메모리 구현

하이브리드 메모리 아키텍처 (권장)

Episodic Memory (에피소드 기억)

구현 패턴: MLflow Tracing 기반 경험 학습

경험 기반 학습 아키텍처

Working Memory (작업 기억)

LangGraph State를 활용한 Working Memory

Agent Scratchpad 패턴

GenAI 핵심 개념

RAG (검색 증강 생성)

ML 핵심 개념

MCP (Model Context Protocol)

​왜 Agent Memory가 핵심인가

​메모리 없는 Agent의 한계

​메모리가 Agent 성능에 미치는 영향

​메모리 유형

​Short-term Memory (단기 기억)

​Context Window 관리 전략

​LangGraph에서 대화 히스토리 관리

​Context Window가 가득 찼을 때 전략

​Long-term Memory (장기 기억)

​구현 아키텍처 3가지

​Databricks Vector Search로 Long-term Memory 구현

​Lakebase(PostgreSQL)로 구조화 메모리 구현

​하이브리드 메모리 아키텍처 (권장)

​Episodic Memory (에피소드 기억)

​구현 패턴: MLflow Tracing 기반 경험 학습

​경험 기반 학습 아키텍처

​Working Memory (작업 기억)

​LangGraph State를 활용한 Working Memory

​Agent Scratchpad 패턴

왜 Agent Memory가 핵심인가

메모리 없는 Agent의 한계

메모리가 Agent 성능에 미치는 영향

메모리 유형

Short-term Memory (단기 기억)

Context Window 관리 전략

LangGraph에서 대화 히스토리 관리

Context Window가 가득 찼을 때 전략

Long-term Memory (장기 기억)

구현 아키텍처 3가지

Databricks Vector Search로 Long-term Memory 구현

Lakebase(PostgreSQL)로 구조화 메모리 구현

하이브리드 메모리 아키텍처 (권장)

Episodic Memory (에피소드 기억)

구현 패턴: MLflow Tracing 기반 경험 학습

경험 기반 학습 아키텍처

Working Memory (작업 기억)

LangGraph State를 활용한 Working Memory

Agent Scratchpad 패턴