AI capabilities / Custom model training

Custom AI model training for niches general models miss.

Fine tuned LLMs for vertical use cases. Domain specific embedding models. Private model deployment on Apple Silicon or NVIDIA. We operate MEGAMIND, our own federated neural network, so we know where this work goes wrong before we ship it for a client.

01 What it does

When custom training beats off the shelf.

Most small businesses do not need custom model training. They need RAG over their content with a frontier LLM. But for narrow vertical work where general models miss the vocabulary or the standards, fine tuning or domain specific embeddings can earn their keep.

There are three reasons to do custom model work. First, the domain has specialized vocabulary that general models do not handle well (specific medical sub specialties, niche compliance frameworks, technical jargon). Second, the workload is high enough that the per token cost of frontier models becomes painful (millions of queries per month). Third, the data cannot leave your environment for regulatory reasons.
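The second reason is a breakeven question. A back-of-envelope sketch, with hypothetical prices (not quotes from any provider), shows how query volume and per-token pricing interact with a fixed self-hosting cost:

```python
# Back-of-envelope breakeven for reason two: API per-token cost vs. self-hosting.
# All prices below are hypothetical placeholders, not real provider rates.

def monthly_api_cost(queries_per_month, tokens_per_query, price_per_million_tokens):
    """API spend for a month, in dollars."""
    total_tokens = queries_per_month * tokens_per_query
    return total_tokens / 1_000_000 * price_per_million_tokens

def breakeven_queries(tokens_per_query, price_per_million_tokens, monthly_hosting_cost):
    """Queries per month at which self-hosting matches API spend."""
    cost_per_query = tokens_per_query / 1_000_000 * price_per_million_tokens
    return monthly_hosting_cost / cost_per_query

# Hypothetical numbers: 2,000 tokens per query, $5 per million tokens,
# $1,500/month for an inference server.
print(monthly_api_cost(3_000_000, 2_000, 5.0))  # → 30000.0 (API cost at 3M queries/month)
print(breakeven_queries(2_000, 5.0, 1_500))     # queries/month where self-hosting wins
```

Below the breakeven volume, the API is cheaper and simpler; above it, the fixed hosting cost amortizes and custom deployment starts to pay for itself.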

For most small businesses, RAG plus prompt engineering on Claude or GPT solves the problem. We do custom training when there is a specific reason to do it, not as a default.

02 How it works

The work that makes custom training succeed.

Custom model work is a data project before it is a model project. The data needs to be high quality, large enough to generalize, and representative of the inference workload. We start every custom training engagement with a data audit: what do you have, what is the quality, what is missing.
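A minimal sketch of the kinds of checks a data audit runs first; the record shape (text, label) and the sample data are hypothetical examples, not the production audit tooling:

```python
from collections import Counter

def audit(records):
    """Minimal data-audit checks: volume, exact duplicates, empty fields,
    and label balance. Record shape (text, label) is a hypothetical example."""
    texts = [text for text, _ in records]
    labels = [label for _, label in records]
    return {
        "count": len(records),
        "duplicates": len(texts) - len(set(texts)),
        "empty": sum(1 for text in texts if not text.strip()),
        "label_balance": dict(Counter(labels)),
    }

sample = [("refill denied per policy 4.2", "claims"),
          ("refill denied per policy 4.2", "claims"),  # exact duplicate
          ("", "billing")]                             # empty text field
print(audit(sample))
```

A real audit adds near-duplicate detection, length distributions, and a comparison of the training data against the expected inference workload, but even these four numbers kill a surprising number of bad datasets early.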

For fine tuning we use parameter efficient methods (LoRA, QLoRA) on open weight models (Llama, Mistral, Qwen) that you can deploy yourself. For domain specific embeddings we train on triplet loss over your domain data. For private deployment we run on Apple Silicon (where MEGAMIND lives) or on NVIDIA inference servers in your environment.
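The two training objectives named above can be sketched conceptually in plain NumPy; this is the idea, not the production training code. LoRA (per the LoRA paper) trains two small matrices instead of the full weight, and triplet loss pulls an embedding toward a same-domain positive and away from a negative:

```python
import numpy as np

def lora_delta(rank, d_out, d_in, alpha, rng):
    """LoRA trains small factors A, B instead of the full weight W; the
    effective weight is W + (alpha / rank) * B @ A."""
    A = rng.standard_normal((rank, d_in))
    B = np.zeros((d_out, rank))  # B starts at zero, so the delta starts at zero
    return (alpha / rank) * B @ A

def triplet_loss(anchor, positive, negative, margin=0.5):
    """Zero when the positive is already margin-closer than the negative."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

rng = np.random.default_rng(0)
delta = lora_delta(rank=8, d_out=64, d_in=64, alpha=16, rng=rng)
print(delta.shape)          # (64, 64) -- full-size update from rank-8 factors
print(np.abs(delta).max())  # 0.0 at initialization, since B is zero

a = np.array([1.0, 0.0]); p = np.array([0.9, 0.1]); n = np.array([0.0, 1.0])
print(triplet_loss(a, p, n))
```

The point of the low-rank factorization is that a 64×64 update here costs only 2×8×64 trained parameters instead of 64×64, which is what makes fine tuning tractable on modest hardware.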

Sources: Joseph Anady on HuggingFace, LoRA paper, QLoRA paper, MEGAMIND federated network.

03 Stack

What we build with.

Custom training stack.

Base model

starting point
Llama, Mistral, Qwen

Fine tuning

method
LoRA, QLoRA, full fine tune

Embeddings

vector model
BGE, E5, custom triplet

Inference

deployment
Apple Silicon, NVIDIA H100, vLLM, llama.cpp

05 Pricing and timeline

What custom training costs.

Custom training engagements start at the $2,997 Enterprise tier and scale with data volume, training compute, and deployment infrastructure. Most engagements run $5,000 to $25,000.

The data audit and feasibility study is $2,997. The recommendation may be that custom training is not the right answer; in that case, the audit deliverable is the recommended alternative architecture.

06 FAQ

Custom model training FAQ.

Should I fine tune?

Probably not. Most small businesses get more value from RAG plus a frontier model. We start with a feasibility study to confirm fine tuning is the right answer.

Will the model run on my hardware?

Models can be sized to run on Apple Silicon (Mac mini, Mac Studio) or NVIDIA. We pick the model size to fit the inference hardware and the latency budget.
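Sizing starts with a rough weight-memory estimate. A sketch of that arithmetic (weights only; KV cache and runtime overhead add more on top, and the exact quantization format changes the numbers):

```python
def weight_memory_gb(params_billion, bits_per_weight):
    """Approximate memory for model weights alone: parameters x bits / 8.
    Ignores KV cache and runtime overhead, which add more on top."""
    bytes_total = params_billion * 1e9 * bits_per_weight / 8
    return bytes_total / 1e9

# Rough weight footprints for common sizes and quantization levels.
for params in (7, 13, 70):
    for bits in (16, 8, 4):
        print(f"{params}B @ {bits}-bit: ~{weight_memory_gb(params, bits):.1f} GB")
```

A 7B model at 4-bit needs roughly 3.5 GB for weights, which fits comfortably on a Mac mini; a 70B model at 4-bit needs roughly 35 GB, which points at a Mac Studio with enough unified memory or an NVIDIA inference server.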

How long does training take?

LoRA fine tuning on a 7B model runs hours to a day. Full fine tuning on a 70B model runs days. Domain embedding training runs hours.

Who owns the model weights?

You do. We deliver the trained weights plus the training data plus the training code. You can retrain, redeploy, or sell the model.

What about model evaluation?

Every engagement ships with an evaluation set and metrics that the model is graded against. We do not ship a model that fails the evaluation.
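A minimal sketch of what that grading step looks like; the function names, the toy model, and the 90% threshold are hypothetical illustrations, not the actual evaluation harness:

```python
def evaluate(model_fn, eval_set, threshold=0.9):
    """Grade a model against a gold evaluation set; the model ships only
    if accuracy meets the threshold. Threshold value is hypothetical."""
    correct = sum(1 for question, gold in eval_set if model_fn(question) == gold)
    accuracy = correct / len(eval_set)
    return {"accuracy": accuracy, "passed": accuracy >= threshold}

# Toy stand-in model and eval set (hypothetical).
gold = {"Q1": "A", "Q2": "B", "Q3": "C"}
model = lambda q: gold.get(q, "?") if q != "Q3" else "D"  # misses one question
print(evaluate(model, list(gold.items()), threshold=0.9))
```

The real evaluation set is built from held-out domain data during the data audit, so the model is graded on the same distribution it will serve at inference time.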

Ready to scope a custom model training project?