Moogla is a self‑hosted LLM runtime and orchestration framework for power users. It combines local‑first execution with the flexibility required for production‑grade workflows. The repository starts with a minimal project layout that lets new features be added incrementally while keeping the codebase modular, observable and developer friendly.
```
.
├── src/             # Python package source
│   └── moogla/      # Moogla core modules
├── tests/           # Unit and integration tests
├── pyproject.toml   # Build system and dependency configuration
└── README.md
```

Run the setup script to create a virtual environment and install dependencies:

```bash
./scripts/setup.sh
```
When you want the development tools, install the optional extras:

```bash
pip install -e .[dev]
```
After installation, run the CLI to see available commands:

```bash
moogla --help
```
Pull a model and start the server with a custom plugin:

```bash
moogla pull codellama:13b
moogla serve --model path/to/codellama-13b.gguf --plugin tests.dummy_plugin
```
Models are stored under `~/.cache/moogla/models` by default. Set `MOOGLA_MODEL_DIR` before running the pull command to use a different directory.
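A minimal sketch of how such a fallback can be resolved in Python (the `resolve_model_dir` helper is hypothetical, not part of Moogla's API):

```python
import os
from pathlib import Path

def resolve_model_dir() -> Path:
    # Hypothetical helper: MOOGLA_MODEL_DIR wins when set,
    # otherwise fall back to ~/.cache/moogla/models.
    override = os.environ.get("MOOGLA_MODEL_DIR")
    if override:
        return Path(override).expanduser()
    return Path.home() / ".cache" / "moogla" / "models"
```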
List downloaded files with:

```bash
moogla models
```

Remove a cached model with:

```bash
moogla remove model.bin
```

Use `-y` to skip the confirmation prompt:

```bash
moogla remove model.bin -y
```
To use a Hugging Face model ID instead of a file path:

```bash
moogla serve --model mistralai/Mistral-7B-Instruct-v0.2
```
You can then query the chat completion endpoint (the JSON body requires a `Content-Type: application/json` header):

```bash
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}]}'
```
The request body may also include `max_tokens`, `temperature` and `top_p` fields:

```bash
curl -X POST http://localhost:11434/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"messages": [{"role": "user", "content": "hello"}], "max_tokens": 32, "temperature": 0.7}'
```
Plugins are regular Python modules that expose optional `preprocess` and `postprocess` hooks. Asynchronous variants named `preprocess_async` and `postprocess_async` are also supported. Hooks receive the current text and return the modified value. A plugin can specify an integer `order` attribute to control execution order when multiple plugins are loaded. Lower numbers run first.
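Given that contract, a minimal plugin module might look like the following sketch (the hook names and the `order` attribute come from the description above; what the hooks actually do here is invented for illustration):

```python
# dummy_plugin.py -- a minimal Moogla plugin sketch.
# Each hook receives the current text and returns the modified value.

order = 10  # lower numbers run first when several plugins are loaded

def preprocess(text: str) -> str:
    # Trim surrounding whitespace before the prompt reaches the model.
    return text.strip()

def postprocess(text: str) -> str:
    # Tag the model output (illustrative only).
    return f"[dummy] {text}"

async def postprocess_async(text: str) -> str:
    # Async variant; *_async hooks are also supported.
    return postprocess(text)
```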
Plugin information is stored in `~/.cache/moogla/plugins.yaml` by default. Use `--config` or the `MOOGLA_PLUGIN_FILE` environment variable to point to a different location:

```bash
MOOGLA_PLUGIN_FILE=/opt/plugins.json moogla plugin list
```
Running servers can refresh plugins on demand with:

```bash
moogla reload-plugins
```

This calls the `/reload-plugins` endpoint and reloads modules using the current configuration.
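Module reloading of this kind can be approximated with the standard library's `importlib.reload`, which re-executes a module object in place. A standalone sketch, unrelated to Moogla's internals:

```python
import importlib
import sys
import tempfile
from pathlib import Path

sys.dont_write_bytecode = True  # always recompile from source on reload

# Create a throwaway module on disk and import it.
plugin_dir = Path(tempfile.mkdtemp())
(plugin_dir / "demo_plugin.py").write_text("VERSION = 1\n")
sys.path.insert(0, str(plugin_dir))

demo_plugin = importlib.import_module("demo_plugin")
print(demo_plugin.VERSION)  # 1

# Edit the file, then reload: the existing module object is re-executed.
(plugin_dir / "demo_plugin.py").write_text("VERSION = 2\n")
importlib.invalidate_caches()
importlib.reload(demo_plugin)
print(demo_plugin.VERSION)  # 2
```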
Local models loaded through `llama-cpp-python` do not expose an asynchronous interface. When `LLMExecutor.acomplete` is called with such a model, inference runs in a background thread via `asyncio.to_thread`. Heavy local workloads can therefore limit overall throughput compared to fully async providers.
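The offloading pattern can be sketched like this, with a stand-in blocking function in place of a real `llama-cpp-python` call:

```python
import asyncio
import time

def blocking_complete(prompt: str) -> str:
    # Stand-in for a synchronous llama-cpp-python inference call.
    time.sleep(0.1)
    return f"echo: {prompt}"

async def acomplete(prompt: str) -> str:
    # Run the blocking call in a worker thread so the event loop stays free.
    return await asyncio.to_thread(blocking_complete, prompt)

result = asyncio.run(acomplete("hello"))
print(result)  # echo: hello
```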
The project uses Typer for the CLI and FastAPI for the server. A helper script is provided for creating a virtual environment and installing the optional development tools:

```bash
./scripts/setup.sh -d -t
```
During development you can run the server with auto reload using `uvicorn` (the `--factory` flag tells uvicorn that `create_app` is an application factory):

```bash
uvicorn moogla.server:create_app --factory --reload
```
Before running the test suite make sure the development extras are installed:

```bash
pip install -e .[dev]
pytest
```
To try the browser UI run the server and open the bundled web app:

```bash
moogla serve
```

Then navigate to `http://localhost:11434/app`. Double-click a chat bubble to copy its contents and use the dark-mode toggle in the header to switch themes.
Set `MOOGLA_API_KEY` to enable simple header-based authentication. Requests must include an `X-API-Key` header matching the configured value. Optionally set `MOOGLA_RATE_LIMIT` to limit the number of requests per minute from a single IP. When rate limiting is enabled, `MOOGLA_REDIS_URL` controls the Redis connection used for tracking request counts (default `redis://localhost:6379`). These values can also be passed to `create_app` or `moogla serve`.
Set `MOOGLA_CORS_ORIGINS` to send CORS headers for a comma-separated list of allowed origins. Set `MOOGLA_LOG_LEVEL` to control application logging (default `INFO`). Set `MOOGLA_HOST` and `MOOGLA_PORT` to change the bind address (defaults `127.0.0.1:11434`). These values can also be provided via the `--cors-origins`, `--log-level`, `--host`, `--port` and `--token-exp-minutes` options when running `moogla serve`.
The API also exposes `/register` and `/login` endpoints for JWT-based authentication. POST a username and password to `/register` to persist a user record, then call `/login` with the same credentials to obtain a token. Include this token in an `Authorization: Bearer <token>` header when calling the LLM routes. Tokens are signed with a random secret and stored in an in-memory SQLite database by default. Set `MOOGLA_JWT_SECRET` and `MOOGLA_DB_URL` to keep credentials valid across restarts. When the secret is not configured a new value is generated on every start, so previously issued tokens will no longer work after a restart. `MOOGLA_TOKEN_EXP_MINUTES` controls how long issued tokens remain valid (default `30`). Authentication support relies on the `SQLModel`, `passlib` and `python-jose` packages.
Build the image and start the server:

```bash
docker build -t moogla .
docker run -p 11434:11434 moogla
```
The Dockerfile now uses multi-stage builds so the final image only contains the installed package and its runtime dependencies.
You can also use `docker-compose` to start the service with a few sensible defaults. The compose file mounts a local `./models` directory into `/models` inside the container and exposes port `11434`. Environment variables can be used to configure remote providers and select the model to load. `MOOGLA_MODEL` defaults to `codellama:13b` in the compose file and can be changed to any local file or Hugging Face ID:

```bash
OPENAI_API_KEY=sk-... MOOGLA_MODEL=codellama:13b docker-compose up
```

The variables may also be placed in a `.env` file so they are picked up automatically when running `docker-compose`.
Authentication data is stored in an in-memory SQLite database by default, so all user records disappear when the server stops. Point `MOOGLA_DB_URL` at a real database to keep these records across restarts. Any SQLAlchemy-compatible URL can be used:

```bash
MOOGLA_DB_URL=sqlite:///data/moogla.db moogla serve
# or
MOOGLA_DB_URL=postgresql://user:pass@localhost/moogla moogla serve
```
Plugin information lives in `~/.cache/moogla/plugins.yaml`. Set `MOOGLA_PLUGIN_FILE` (or use the `--config` flag) to choose a different location. When running in Docker this file should be placed on a mounted volume so plugin settings survive container restarts:

```bash
MOOGLA_PLUGIN_FILE=/data/plugins.yaml docker-compose up
```
Remember to back up your database and plugin configuration file. If the schema changes in future versions you may need to migrate existing data before upgrading.
Full documentation is available on GitHub Pages and can be built locally with:

```bash
mkdocs serve
```
Create a standalone binary using `pyinstaller`:

```bash
pip install pyinstaller
./scripts/build_package.sh dist/moogla.exe
```

Place the resulting file under `dist/` so `/download` can serve it.
The repository follows the `src` layout for all packages and modules, with tests under `tests/`. Install the git hooks with `pre-commit install`.