
mcp-iceberg-service

MCP server for Apache Iceberg, enabling data lake discovery, metadata search, and SQL querying from Claude Desktop using LLM prompts.

Introduction

MCP Iceberg Catalog

An MCP (Model Context Protocol) server implementation for interacting with Apache Iceberg. This server provides a SQL interface for querying and managing Iceberg tables through Claude Desktop.

Claude Desktop as your Iceberg Data Lake Catalog


How to Install in Claude Desktop
  1. Prerequisites
    • Python 3.10 or higher
    • UV package installer (recommended) or pip
    • Access to an Iceberg REST catalog and S3-compatible storage
  2. Configuration: add the following to claude_desktop_config.json (a standalone check of these settings follows the JSON block):
{
  "mcpServers": {
    "iceberg": {
      "command": "uv",
      "args": [
        "--directory",
        "PATH_TO_/mcp-iceberg-service",
        "run",
        "mcp-server-iceberg"
      ],
      "env": {
        "ICEBERG_CATALOG_URI" : "http://localhost:8181",
        "ICEBERG_WAREHOUSE" : "YOUR ICEBERG WAREHOUSE NAME",
        "S3_ENDPOINT" : "OPTIONAL IF USING S3",
        "AWS_ACCESS_KEY_ID" : "YOUR S3 ACCESS KEY",
        "AWS_SECRET_ACCESS_KEY" : "YOUR S3 SECRET KEY"
      }
    }
  }
}
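
Before adding the entry to Claude Desktop, it can help to confirm that the catalog settings work on their own. The snippet below is a minimal sketch, not part of this repository: it calls pyiceberg directly with the same values as the env block above, and the localhost S3 endpoint is only an illustrative example for S3-compatible storage.

# verify_catalog.py - standalone check of the catalog settings (illustrative sketch)
from pyiceberg.catalog import load_catalog

catalog = load_catalog(
    "iceberg",
    **{
        "uri": "http://localhost:8181",               # ICEBERG_CATALOG_URI
        "warehouse": "YOUR ICEBERG WAREHOUSE NAME",   # ICEBERG_WAREHOUSE
        "s3.endpoint": "http://localhost:9000",       # S3_ENDPOINT (only for S3-compatible storage, e.g. MinIO)
        "s3.access-key-id": "YOUR S3 ACCESS KEY",     # AWS_ACCESS_KEY_ID
        "s3.secret-access-key": "YOUR S3 SECRET KEY", # AWS_SECRET_ACCESS_KEY
    },
)

# If the connection works, this prints the namespaces visible in the warehouse.
print(catalog.list_namespaces())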
Design
Architecture

The MCP server is built on three main components; a sketch of how they fit together follows the list:

  1. MCP Protocol Handler
    • Implements the Model Context Protocol for communication with Claude
    • Handles request/response cycles through stdio
    • Manages server lifecycle and initialization
  2. Query Processor
    • Parses SQL queries using sqlparse
    • Supports operations:
      • LIST TABLES
      • DESCRIBE TABLE
      • SELECT
      • INSERT
  3. Iceberg Integration
    • Uses pyiceberg for table operations
    • Integrates with PyArrow for efficient data handling
    • Manages catalog connections and table operations
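
To make the flow concrete, here is an illustrative outline of how the three components typically fit together. It is a sketch assuming the official mcp Python SDK and sqlparse; the tool name execute_sql and the placeholder results are hypothetical, not the exact code of this repository.

# Illustrative outline of the protocol handler and query dispatch (a sketch, not the repository's code).
import asyncio
import sqlparse
from mcp.server import Server
from mcp.server.stdio import stdio_server
from mcp.types import Tool, TextContent

server = Server("iceberg")  # MCP Protocol Handler: lifecycle and stdio transport

@server.list_tools()
async def list_tools() -> list[Tool]:
    # Advertise a single SQL tool to Claude Desktop (name is a placeholder).
    return [
        Tool(
            name="execute_sql",
            description="Run a SQL statement against the Iceberg catalog",
            inputSchema={
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        )
    ]

@server.call_tool()
async def call_tool(name: str, arguments: dict) -> list[TextContent]:
    # Query Processor: sqlparse decides which operation was requested.
    statement = sqlparse.parse(arguments["query"])[0]
    kind = statement.get_type()  # e.g. "SELECT", "INSERT", "UNKNOWN"
    if kind == "SELECT":
        result = "would scan the table via pyiceberg"        # Iceberg Integration goes here
    elif kind == "INSERT":
        result = "would append a PyArrow table via pyiceberg"
    else:
        result = f"unsupported statement type: {kind}"
    return [TextContent(type="text", text=result)]

async def main():
    # Request/response cycles run over stdio, as Claude Desktop expects.
    async with stdio_server() as (read_stream, write_stream):
        await server.run(read_stream, write_stream, server.create_initialization_options())

if __name__ == "__main__":
    asyncio.run(main())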
PyIceberg Integration

The server uses PyIceberg in several ways (a usage sketch follows the list):

  1. Catalog Management
    • Connects to REST catalogs
    • Manages table metadata
    • Handles namespace operations
  2. Data Operations
    • Converts between PyIceberg and PyArrow types
    • Handles data insertion through PyArrow tables
    • Manages table schemas and field types
  3. Query Execution
    • Translates SQL to PyIceberg operations
    • Handles data scanning and filtering
    • Manages result set conversion
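
As a rough illustration of how those operations map onto the PyIceberg API, a SELECT-style read and an INSERT-style append might look like the sketch below. The namespace demo, the table demo.events, and its columns are hypothetical, and the catalog properties are assumed to be the same as in the install section.

# Illustrative PyIceberg usage for reads and writes (hypothetical table "demo.events").
import pyarrow as pa
from pyiceberg.catalog import load_catalog

catalog = load_catalog("iceberg", uri="http://localhost:8181")  # plus warehouse/S3 properties as configured above

# Catalog management: list tables in a namespace and load one.
print(catalog.list_tables("demo"))
table = catalog.load_table("demo.events")

# Query execution: a SELECT translates into a table scan materialized as a PyArrow table.
arrow_result = table.scan().to_arrow()
print(arrow_result)

# Data operations: an INSERT becomes an append of a PyArrow table whose schema
# matches the Iceberg table's schema.
rows = pa.Table.from_pylist([{"id": 1, "name": "example"}])
table.append(rows)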
Further Implementation Needed
  1. Query Operations
    • Implement UPDATE operations
    • Add DELETE support
    • Support for CREATE TABLE with schema definition
    • Add ALTER TABLE operations
    • Implement table partitioning support
  2. Data Types
    • Support for complex types (arrays, maps, structs)
    • Add timestamp with timezone handling
    • Support for decimal types
    • Add nested field support
  3. Performance Improvements
    • Implement batch inserts
    • Add query optimization
    • Support for parallel scans
    • Add caching layer for frequently accessed data
  4. Security Features
    • Add authentication mechanisms
    • Implement role-based access control
    • Add row-level security
    • Support for encrypted connections
  5. Monitoring and Management
    • Add metrics collection
    • Implement query logging
    • Add performance monitoring
    • Support for table maintenance operations
  6. Error Handling
    • Improve error messages
    • Add retry mechanisms for transient failures
    • Implement transaction support
    • Add data validation
