Runtime Launch

Local AI at Maximum Velocity

Run massive models on your laptop. Zero cloud overhead, absolute privacy, unmatched throughput.

Install for macOS / Windows / Linux (Coming Soon)

View Quickstart

Current version: 0.24.0 | Free Software

Capabilities

The Core Trinity

Bare-Metal Velocity

Bypasses standard inference bottlenecks with advanced memory routing and maximized L3 cache utilization.

Air-Gapped Privacy

Your data never leaves your device. Secure, local, and enterprise-ready by default.

Hardware Agnostic

Deeply optimized for NVIDIA, AMD, Apple Silicon, and standard x86 CPUs.

Proven Velocity

Benchmark Metrics

benchmark_matrix.sh

Legacy Inference: 16.79 tok/s

GoingMerry Engine: 0 tok/s

Architecture

Under the Hood.

GoingMerry's speed is not an accident; it's the result of a deliberate, performance-first architecture designed to extract maximum throughput from your hardware.

LAYER 01

Go Orchestrator

Async Scheduler & Memory Broker

LAYER 02

C/C++ Tensor Engine

Hardware Compiler & L3 Cache Control

LAYER 03

CUDA / Apple Metal / AVX

Bare-Metal Local Operations
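To make the Layer 03 boundary concrete, here is a minimal sketch of how a Go orchestrator might pick an execution backend at startup. It assumes selection depends only on the OS, the CPU architecture, and whether a CUDA device was found. The `Backend` values and `selectBackend` are hypothetical names, not GoingMerry's API. The `hasCUDA` flag is assumed to come from a separate device probe, and AMD GPU detection is left out.

```go
package main

import (
	"fmt"
	"runtime"
)

// Backend names a hypothetical Layer 03 execution target.
type Backend string

const (
	BackendCUDA   Backend = "cuda"
	BackendMetal  Backend = "metal"
	BackendAVX    Backend = "avx"    // x86-64 SIMD path
	BackendNEON   Backend = "neon"   // generic ARM64 SIMD path
	BackendScalar Backend = "scalar" // portable fallback
)

// selectBackend picks an execution target from a platform description.
// hasCUDA is assumed to come from a separate GPU probe (not shown).
// Note: GOARCH "amd64" does not guarantee AVX support; a real runtime
// would also probe CPU features before choosing the AVX path.
func selectBackend(goos, goarch string, hasCUDA bool) Backend {
	switch {
	case hasCUDA:
		return BackendCUDA
	case goos == "darwin" && goarch == "arm64":
		return BackendMetal // Apple Silicon
	case goarch == "amd64":
		return BackendAVX
	case goarch == "arm64":
		return BackendNEON
	default:
		return BackendScalar
	}
}

func main() {
	b := selectBackend(runtime.GOOS, runtime.GOARCH, false)
	fmt.Println("selected backend:", b)
}
```

Keeping the decision in a pure function means it can be tested without the hardware present. The live values from `runtime.GOOS` and `runtime.GOARCH` are only passed in at startup.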

01 / COMPILER OPTIMIZATIONS

Hardware-Aware Toolchain

Custom C/C++ compiler toolchain implementing branchless hot-path optimization, AVX-512/NEON vectorization on CPUs, and tensor core instructions on GPUs.
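Branchless hot-path optimization swaps a data-dependent `if` for arithmetic, so the CPU has nothing to mispredict. The sketch below shows the idea on a ReLU applied to int32 accumulators, such as those produced by a quantized matmul. It is written in Go for illustration only. GoingMerry's toolchain is C/C++ and its kernels are not shown here, and the function names are hypothetical. Modern compilers sometimes lower simple branches to conditional moves on their own, so benchmark before assuming a speedup.

```go
package main

import "fmt"

// reluBranchy is the straightforward version: one conditional branch per value.
func reluBranchy(x int32) int32 {
	if x < 0 {
		return 0
	}
	return x
}

// reluBranchless computes max(x, 0) without a conditional branch.
// x >> 31 is an arithmetic shift: all ones (-1) for negative x, zero otherwise.
// x &^ mask (AND NOT) clears every bit for negatives and keeps x unchanged otherwise.
func reluBranchless(x int32) int32 {
	return x &^ (x >> 31)
}

// reluInPlace applies the branchless ReLU to every element of v.
// The loop body contains no data-dependent branch.
func reluInPlace(v []int32) {
	for i, x := range v {
		v[i] = x &^ (x >> 31)
	}
}

func main() {
	acc := []int32{-7, 0, 3, -1, 42}
	reluInPlace(acc)
	fmt.Println(acc) // [0 0 3 0 42]
}
```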

02 / CACHE ALIGNMENT

Mastering Memory Latency

Cache-aware runtime that organizes model weights to match CPU prefetching patterns, keeping hot data streaming through the L3 cache at high bandwidth.
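The page does not define how the runtime lays out weights. Two conventional techniques in this area are starting each weight buffer on a cache-line boundary and storing matrices row-major so inference walks memory sequentially, which is the pattern hardware prefetchers handle best. The sketch below shows both under stated assumptions: a 64-byte cache line (common on x86-64; some Apple Silicon cores use 128-byte lines) and Go's current non-moving garbage collector. `alignedFloat32s`, `isCacheAligned`, and `matVec` are hypothetical helpers.

```go
package main

import (
	"fmt"
	"unsafe"
)

// cacheLine is an assumed cache-line size in bytes; adjust per target CPU.
const cacheLine = 64

// alignedFloat32s returns n float32 values whose first element starts on a
// cacheLine boundary. It over-allocates by one cache line and slices forward
// to the first aligned element. This relies on Go's garbage collector not
// moving heap objects.
func alignedFloat32s(n int) []float32 {
	const elem = int(unsafe.Sizeof(float32(0)))
	buf := make([]float32, n+cacheLine/elem)
	addr := uintptr(unsafe.Pointer(&buf[0]))
	off := 0
	if rem := int(addr % cacheLine); rem != 0 {
		off = (cacheLine - rem) / elem
	}
	return buf[off : off+n]
}

// isCacheAligned reports whether s starts on a cache-line boundary.
func isCacheAligned(s []float32) bool {
	return len(s) > 0 && uintptr(unsafe.Pointer(&s[0]))%cacheLine == 0
}

// matVec computes y = W x for a rows x cols matrix stored row-major in w.
// Each row is read left to right, so memory is touched sequentially.
func matVec(w []float32, rows, cols int, x, y []float32) {
	for r := 0; r < rows; r++ {
		row := w[r*cols : (r+1)*cols]
		var sum float32
		for c, v := range row {
			sum += v * x[c]
		}
		y[r] = sum
	}
}

func main() {
	w := alignedFloat32s(2 * 3)
	copy(w, []float32{1, 2, 3, 4, 5, 6})
	y := make([]float32, 2)
	matVec(w, 2, 3, []float32{1, 1, 1}, y)
	fmt.Println(isCacheAligned(w), y) // true [6 15]
}
```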

03 / REAL-TIME METRICS

Predictive Performance

An empirical forecasting model that predicts local tokens-per-second output based on target memory bandwidth and model parameter sizes.
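GoingMerry's own forecasting model is not published. A widely used first-order estimate has the same inputs, though. In single-stream decoding, each new token requires streaming roughly every weight from memory once, so throughput is bounded by tok/s ≈ η · B / (P · b). Here B is memory bandwidth, P is the parameter count, b is bytes per parameter, and η is an efficiency factor fitted empirically (an assumption here). For example, a 7B model at 4-bit (0.5 bytes per weight) needs about 3.5 GB per token, so 100 GB/s gives at most 100 / 3.5 ≈ 28.6 tok/s. The function name below is hypothetical.

```go
package main

import "fmt"

// EstimateDecodeTokensPerSecond returns a bandwidth-bound estimate of
// single-stream decode throughput.
//
//	bandwidthGBs   sustained memory bandwidth in GB/s (1 GB = 1e9 bytes)
//	paramsBillions parameters read per token, in billions
//	               (for mixture-of-experts models, only the active ones)
//	bytesPerParam  storage per weight, e.g. 2.0 for FP16, 0.5 for 4-bit
//	efficiency     assumed fraction of peak bandwidth achieved, in (0, 1]
//
// It assumes every weight is streamed once per token and ignores KV-cache
// traffic and compute time, so it is an upper-bound style estimate.
func EstimateDecodeTokensPerSecond(bandwidthGBs, paramsBillions, bytesPerParam, efficiency float64) float64 {
	bytesPerToken := paramsBillions * 1e9 * bytesPerParam
	if bytesPerToken <= 0 {
		return 0
	}
	return efficiency * bandwidthGBs * 1e9 / bytesPerToken
}

func main() {
	// Hypothetical machine: 100 GB/s, running a 7B model quantized to 4 bits.
	ideal := EstimateDecodeTokensPerSecond(100, 7, 0.5, 1.0)
	realistic := EstimateDecodeTokensPerSecond(100, 7, 0.5, 0.7)
	fmt.Printf("ideal %.1f tok/s, at 70%% efficiency %.1f tok/s\n", ideal, realistic)
}
```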

04 / CORE STACK

Go + C++ Hybrid Engine

A polished, high-level Go orchestration layer for API and network logic driving a low-level C++ tensor inference engine compiled for speed.
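The split between a Go orchestrator and a native engine can be sketched as a small async scheduler. API handlers submit prompts to a bounded queue, and a fixed pool of goroutines calls the engine. This is an assumption-laden illustration, not GoingMerry's code. `Engine`, `Scheduler`, and `echoEngine` are hypothetical, and in a real build `Engine` would likely wrap a cgo binding to the C++ library. Only the scheduling half of the "Async Scheduler & Memory Broker" layer is shown.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// Engine abstracts the low-level inference engine (hypothetical interface).
type Engine interface {
	Generate(prompt string) (string, error)
}

// echoEngine is a stand-in so the sketch runs without a native library.
type echoEngine struct{}

func (echoEngine) Generate(prompt string) (string, error) {
	return strings.ToUpper(prompt), nil
}

type result struct {
	text string
	err  error
}

type request struct {
	prompt string
	reply  chan result
}

// Scheduler queues requests and runs them on a fixed pool of workers,
// so callers never block on the engine directly.
type Scheduler struct {
	queue chan request
	wg    sync.WaitGroup
}

// NewScheduler starts `workers` goroutines reading from a queue of size queueSize.
func NewScheduler(e Engine, workers, queueSize int) *Scheduler {
	s := &Scheduler{queue: make(chan request, queueSize)}
	for i := 0; i < workers; i++ {
		s.wg.Add(1)
		go func() {
			defer s.wg.Done()
			for req := range s.queue {
				text, err := e.Generate(req.prompt)
				req.reply <- result{text: text, err: err}
			}
		}()
	}
	return s
}

// Submit enqueues a prompt and returns a channel that receives exactly one
// result. The reply channel is buffered so a worker never blocks on it.
func (s *Scheduler) Submit(prompt string) <-chan result {
	reply := make(chan result, 1)
	s.queue <- request{prompt: prompt, reply: reply}
	return reply
}

// Close stops accepting work and waits for in-flight requests to finish.
func (s *Scheduler) Close() {
	close(s.queue)
	s.wg.Wait()
}

func main() {
	s := NewScheduler(echoEngine{}, 2, 8)
	a := s.Submit("hello")
	b := s.Submit("world")
	fmt.Println((<-a).text, (<-b).text) // HELLO WORLD
	s.Close()
}
```

The bounded queue gives natural backpressure. When the engine falls behind, `Submit` blocks instead of letting memory grow without limit.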

Local Ecosystem & Models

Llama 4

Gemma 4

DeepSeek-V3

Phi 4

Qwen 3.0

Mistral Large 3

Hermes Agent

Command R7

LLaVA 2

Claude Code

OpenClaw

Continue.dev

Open WebUI

Enchanted

Lobe Chat

Page Assist

Twinny

Obsidian Copilot

LangChain

LlamaIndex

CrewAI

LiteLLM

ChromaDB

Autogen

LangGraph

Dify

Flowise

Pricing Plans

Flexible Licensing for Every Scale

Choose the deployment model that fits your engineering needs, from individual developers to full enterprise clusters.

🐏 Community Edition

Unrestricted local inference runtime. Run models offline on your own hardware at full performance, with no usage limits.

Free

Free local execution runtime
Bare-metal CPU/GPU hardware acceleration
Air-gapped security and data privacy
Supports Llama 4, Gemma 4, DeepSeek, and more

Download Free

Enterprise Control Plane

🛡️ Enterprise Edition

Deploy, secure, and monitor local models at scale. Built for CTOs managing AI across thousands of employee machines or entire server farms.

Custom Pricing

Advanced SSO (Okta, Entra ID) & Granular RBAC
Fleet Management: Central Dashboard & Kubernetes Operator
Private Registry: Air-gapped self-hosted hub & weight encryption
GoingMerry Swarm: Multi-node distributed sharding
Advanced Observability & Department Chargeback Analytics
24/7/365 Dedicated SLAs & Commercial Indemnification

Contact Sales

Up and running soon.

We are calibrating local compiler targets and model sharding frameworks. Join the waitlist to be notified the moment GoingMerry becomes available for local installation.

Join Waitlist