Running AI Locally: LM Studio, Ollama, and Microsoft Foundry
You don't need to send your data to the cloud to use AI. Local LLM tools let you run powerful models on your own hardware. Here's what they can do and when they make sense.
Intro
Most people interact with AI through the cloud — ChatGPT, Claude, Gemini. You type a question, it goes to a remote server, and the answer comes back. Simple, convenient, powerful.
But what if you want to run AI on your own computer? No data leaving your machine. No per-query costs. No internet connection required. No concerns about your private data being used for training.
Local AI is a rapidly growing space. Thanks to open-source models and tools that make them easy to run, it’s now possible to run capable language models on a decent laptop or desktop. You won’t get GPT-4 levels of performance, but for many tasks, local models are surprisingly good.
This article covers the three main tools for running local AI — LM Studio, Ollama, and Microsoft Foundry — what each is good for, and how to decide which one fits your needs.
Why Run AI Locally?
There are several reasons you might want to run AI on your own hardware instead of using cloud services:
Privacy. Your data stays on your machine. If you’re working with sensitive business information, customer data, or proprietary documents, sending that data to a third-party AI service may not be acceptable. Local AI eliminates that concern entirely.
Cost predictability. Cloud AI charges per token — every query has a cost. If you’re processing large volumes of data, those costs add up fast. Local AI has a fixed cost (the hardware) and then runs free. For heavy usage, local is dramatically cheaper.
Offline capability. Local AI works without an internet connection. This matters for field operations, secure facilities, or any situation where connectivity is unreliable.
Latency. Local AI responds faster because there’s no network round trip. For interactive applications, this can make a noticeable difference.
Experimentation. Local tools make it easy to try different models, compare performance, and experiment without worrying about API costs.
LM Studio
LM Studio is a desktop application that makes running local language models as easy as installing any other program. You download it, pick a model from the built-in catalog, and start chatting.
What it’s good for: Individual users who want to experiment with local AI. Researchers who want to compare models. Anyone who wants a simple, graphical interface for running local models.
How it works: LM Studio provides a clean chat interface similar to ChatGPT. You select a model, it downloads automatically, and you can start talking to it immediately. It also provides a local API server that other applications can connect to, making it useful as a local AI backend for development.
Hardware requirements: Works best with a GPU, but can run on CPU-only machines for smaller models. 8-16GB of RAM minimum. 32GB+ recommended for larger models.
Cost: Free and open source.
Best for: Getting started with local AI quickly. The easiest tool to set up and start using.
Ollama
Ollama is a command-line tool and server for running local language models. It’s designed for developers and power users who want to integrate local AI into their workflows and applications.
What it’s good for: Developers building applications that use local AI. Teams that want to run models as a service on a shared machine. Anyone comfortable with the command line.
How it works: Install Ollama, pull a model with a single command (ollama pull llama3), and you have a running AI server. It provides a REST API that any application can call. You can also use it from the command line for quick queries.
Hardware requirements: Similar to LM Studio — GPU recommended, but CPU works for smaller models. More efficient than LM Studio for running models as a service.
Cost: Free and open source.
Best for: Developers, automation, and running models as a background service. More flexible than LM Studio for integration into other tools.
Microsoft Foundry
Microsoft Foundry (part of Azure AI Foundry) is a platform for building, testing, and deploying AI applications. It includes tools for running models locally as well as in the cloud.
What it’s good for: Organizations already using Microsoft’s ecosystem. Teams that need to move from local experimentation to cloud deployment seamlessly. Enterprise-grade AI development.
How it works: Foundry provides a unified environment for AI development. You can prototype locally using the same tools and APIs that deploy to Azure. This makes it easy to start locally and scale to the cloud when needed.
Hardware requirements: Enterprise-focused. Works on local development machines but really shines when connected to Azure compute resources.
Cost: The local tools are free. Cloud compute is pay-as-you-go through Azure.
Best for: Enterprise teams building production AI applications. Organizations that want a smooth path from local development to cloud deployment.
Which One Should You Choose?
Start with LM Studio if you want to try local AI for the first time. It’s the easiest to set up and requires no technical knowledge. Download, pick a model, and start chatting.
Use Ollama if you’re a developer or want to integrate local AI into your tools and workflows. It’s more flexible and efficient for running models as a service.
Consider Microsoft Foundry if you’re already using Azure and building production AI applications. It provides the smoothest path from development to deployment.
Run multiple tools — they complement each other. Use LM Studio for experimentation, Ollama as your local server, and Foundry for production deployment.
What You Give Up
Local AI is not a replacement for cloud AI in all cases. You’re trading some capability for privacy and cost control:
- Smaller models. Local models are typically 7-70 billion parameters. Cloud models like GPT-4 are estimated to be much larger. Local models are less capable at complex reasoning.
- Slower for large batches. Cloud GPUs are more powerful than consumer hardware. For processing large volumes, cloud is faster.
- Setup and maintenance. You need to download models, manage storage, and keep things running. Cloud AI requires none of this.
- Model selection. Cloud services give you access to the latest, most capable models. Local models lag behind by months.
How To Get Started
-
Check your hardware. Most modern computers with 16GB+ RAM can run smaller models. For larger models, a GPU with at least 8GB VRAM is recommended.
-
Install a tool. Start with LM Studio for the simplest experience, or Ollama if you’re comfortable with the command line.
-
Pick a model. For general use, try Llama 3, Mistral, or Phi-3. These are well-supported and run on consumer hardware.
-
Start experimenting. Ask questions, process documents, test use cases relevant to your business. See what local AI can do for you.
-
Evaluate the tradeoffs. Compare local results with cloud AI for your specific use cases. Some tasks work great locally. Others may still need cloud capabilities.
Conclusion
Local AI is a rapidly improving space. The tools — LM Studio, Ollama, Microsoft Foundry — make it accessible to anyone with a decent computer. For businesses with privacy requirements, high-volume processing needs, or a desire to control costs, local AI is a compelling option.
It won’t replace cloud AI entirely. Cloud models are still more capable. But as open-source models improve and hardware gets more powerful, the gap is narrowing. Running AI locally is not just possible — for many use cases, it’s the smart choice.
Ready to put AI to work?
We help businesses identify the right AI opportunities and build custom solutions that actually deliver value.
Talk to our AI teamAbout Microbian Systems
We are a full-service software consultancy helping startups and small to medium enterprises succeed by delivering modern, scalable solutions across web, desktop, and mobile. Our team excels in designing complex systems but we also know when simplicity wins. We build secure, performant applications tailored to each client's growth stage.