Load, tune, run, and publish your own GGUF models in Chrome, Safari, or Edge.
5 min read
TLDR: MDST brings GGUF, the most popular format for LLMs, to WebGPU, so anyone can create, edit, and review files and collaborate from their browser without depending on cloud LLM providers or complicated setups.
In 2026, more people want local models they can actually run and trust, and the hardware and software are finally catching up. Better consumer GPUs, new models, and better quantizations are making “local” feel normal and accessible like never before.
So we built a WASM/JS engine that runs GGUF models on WebGPU. GGUF is one of the most popular LLM formats and supports various quantizations. Because each model ships as a single-file container, it is easy to download, cache, and tune, and a good fit for consumer-grade devices.
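On the browser side, this rests on two standard primitives: WebGPU feature detection and the Cache API for keeping the single-file model local. Here is a minimal sketch using only those standard APIs (types via @webgpu/types); MDST's own loading API is not shown, and the model URL is whatever you point it at:

```ts
// Minimal sketch: detect WebGPU, then fetch and cache a GGUF file.
// Only standard browser APIs are used; the engine's API is not public.

async function initWebGPU(): Promise<GPUDevice> {
  if (!("gpu" in navigator)) {
    throw new Error("WebGPU is not available in this browser");
  }
  const adapter = await navigator.gpu.requestAdapter();
  if (!adapter) throw new Error("No suitable GPU adapter found");
  return adapter.requestDevice();
}

// Download the single-file model once, then serve it from the local cache.
async function fetchGGUF(url: string): Promise<ArrayBuffer> {
  const cache = await caches.open("gguf-models");
  let res = await cache.match(url);
  if (!res) {
    res = await fetch(url);
    if (!res.ok) throw new Error(`Download failed: ${res.status}`);
    await cache.put(url, res.clone());
  }
  return res.arrayBuffer();
}
```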
We believe this will open a new, bigger market for GGUF: fast, local inference for anyone who just wants it to work, right in the browser.
Local inference in the browser is only going to get faster and more accessible.
What MDST is
MDST is a free, secure, collaborative IDE with cloud and local agentic inference integrated directly into your workspaces.
Instead of copying context between tools and teammates, MDST can sync, merge, and store everything inside one or many projects, with shared access to your files, history, and full context for your team, while keeping everything E2E encrypted and GDPR-compliant.
With MDST, you can:
Download and run LLMs in your browser in one click. No complicated setup: anyone can do it from any device that supports WebGPU.
Sync projects in real time with GitHub or your local filesystem, so you never lose your work or any changes you made.
Stay stable under load, without getting locked into a single provider's API mood swings or quality downgrades.
Keep files and conversations private, with end-to-end encryption as a first-class default and a Signal-style privacy mode included (see the sketch after this list).
Benchmark models where they run, with local runs feeding a public WebGPU leaderboard.
Earn research points by running and sharing your models.
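On the encryption point above, here is an illustrative sketch using only the standard Web Crypto API. MDST's actual E2E protocol is its own and is not public; the point is simply that client-side encryption needs nothing beyond what browsers already ship:

```ts
// Illustrative only: client-side encryption with the standard Web Crypto
// API (AES-GCM). This is not MDST's protocol, just the browser primitive.

async function encryptLocally(plaintext: string, key: CryptoKey) {
  const iv = crypto.getRandomValues(new Uint8Array(12)); // fresh nonce per message
  const ciphertext = await crypto.subtle.encrypt(
    { name: "AES-GCM", iv },
    key,
    new TextEncoder().encode(plaintext),
  );
  // Only ciphertext and the non-secret IV ever need to leave the device.
  return { iv, ciphertext };
}
```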
Research and learning for everyone
From now on, all you need for local inference, LLM learning, and research is a modern browser (Chrome, Safari, and Edge are supported; Firefox is coming soon), a laptop that is five years old or newer (an M1 MacBook Air works well with small models), and a GGUF model.
Open the Research module and run the local benchmark suite across tasks and difficulty levels to test and promote your model or sampling parameters on the public WebGPU leaderboard.
Every run happens on your own machine, in your browser, and produces results that remain comparable over time.
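To make that concrete, here is a sketch of what an on-device benchmark loop looks like. generate() is a hypothetical stand-in for the engine's inference call, not MDST's published API; the real suite, tasks, and metrics are MDST's own:

```ts
// Sketch of an on-device benchmark loop. Because timing happens on the
// same machine as inference, tokens/sec is directly comparable across runs.

interface BenchTask {
  id: string;
  prompt: string;
  difficulty: number;
}

async function runBenchmarks(
  generate: (prompt: string) => Promise<{ tokens: number }>, // hypothetical engine call
  tasks: BenchTask[],
) {
  const results: { id: string; difficulty: number; tokensPerSec: number }[] = [];
  for (const task of tasks) {
    const start = performance.now();
    const { tokens } = await generate(task.prompt);
    const seconds = (performance.now() - start) / 1000;
    results.push({ id: task.id, difficulty: task.difficulty, tokensPerSec: tokens / seconds });
  }
  return results;
}
```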
The leaderboard ranks models by a weighted score across benchmark tasks and difficulty levels, so higher-difficulty tasks matter more than easy wins:
[WebGPU Leaderboard: models ranked by weighted score across benchmark tasks and difficulty levels, including code generation, sentiment analysis, and more. Columns: Model, MDST score, Size, Avg. time, Runs. No results yet.]
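As a hypothetical illustration of that weighting (MDST's exact weights are not published), a difficulty-weighted average can be computed like this:

```ts
// Hypothetical illustration: each task score is weighted by its difficulty
// level, so hard tasks move the total more than easy ones.

function weightedScore(runs: { score: number; difficulty: number }[]): number {
  const weighted = runs.reduce((sum, r) => sum + r.score * r.difficulty, 0);
  const totalWeight = runs.reduce((sum, r) => sum + r.difficulty, 0);
  return totalWeight === 0 ? 0 : weighted / totalWeight;
}
```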
Disclaimer: work in progress. Scores, model sets, and engine tuning are subject to change. Some models perform consistently across tasks and difficulty levels, while others vary more.
Why now and what's next
MDST already supports the following cloud and local LLM families in different quantizations, and we're working on adding more:
Type         Family       Models
Cloud        Claude       Sonnet 4.5, Opus 4.6
Cloud        OpenAI GPT   5.2, 5.1 Mini, Codex 5.2
Cloud        Gemini       3 Pro Preview
Cloud        Kimi         K2
Cloud        DeepSeek     V3.2
Local GGUF   Qwen         3 Thinking
Local GGUF   Ministral    3 Instruct
Local GGUF   LFM          2.5
Local GGUF   Gemma        3 IT
This combination gives users, and us, a lot of flexibility to choose the best model for the task at hand and to learn from experiments with different prompts and quantizations.
WebGPU is finally fast enough on mainstream hardware, and GGUF has become the simplest way to ship a quantized model as a single artifact. Put together, that makes real local inference in the browser practical and accessible for everyone, even on modest hardware.
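The "single artifact" idea is visible right in the file layout. Per the public GGUF spec (version 2 and later), the file opens with a fixed header that a loader can check before committing to a full download; here is a small sketch of reading it:

```ts
// Sketch: reading the fixed GGUF header, per the public GGUF spec (v2+):
// 4-byte magic "GGUF", uint32 version, uint64 tensor count, uint64
// metadata key-value count, all little-endian.

function readGGUFHeader(buf: ArrayBuffer) {
  const view = new DataView(buf);
  const magic = new TextDecoder().decode(new Uint8Array(buf, 0, 4));
  if (magic !== "GGUF") throw new Error("Not a GGUF file");
  return {
    version: view.getUint32(4, true),
    tensorCount: view.getBigUint64(8, true),
    metadataKvCount: view.getBigUint64(16, true),
  };
}
```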
We welcome everyone to contribute to the project, and to help steer the roadmap.
Sign in for free or start as an early supporter, run your favorite GGUF model and contribute to the benchmark suite, or use the best cloud models for any of your tasks.
Your feedback and support will show us what to optimize next. Please share this post with the community; we appreciate your support!
Follow our progress on the MDST Telegram and X channels.