I try to stay at the cutting edge of everything AI, especially when it comes to LLM-enabled development. I've tried GitHub Copilot, Supermaven, and many other AI code completion tools. However, earlier this week I tried locally hosted LLMs, and I am not going back.
Setup
These instructions assume that you are a macOS user.
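First, you'll need Ollama itself. One way to install it on macOS is via Homebrew (this assumes you already have Homebrew; you can also download the app directly from ollama.com):

```shell
# Install Ollama (assumes Homebrew is already installed)
brew install ollama

# Start the Ollama server so models can be pulled and queried
ollama serve
```

If you use the downloaded app instead, it runs the server for you in the background.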
What about LM Studio? I've seen a few posts debating one over the other. LM Studio has an intuitive UI; Ollama does not. However, my research led me to believe that Ollama is faster than LM Studio.
Install the model that you want to use.
ollama pull starcoder2:3b
I've evaluated a few and landed on starcoder2:3b. It provides a good balance of usefulness and inference speed.
For context, the following table shows the speed of each model.

| Model | Tokens/second |
| --- | --- |
| starcoder2:3b | 99 |
| llama3.1:8b | 54 |
| codestral:22b | 21 |
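If you want to measure these numbers on your own hardware, Ollama can report generation speed directly: run a one-off prompt with the `--verbose` flag and look at the "eval rate" line, which is the tokens/second figure (the model name and prompt below are just examples):

```shell
# Run a single prompt and print timing stats;
# the "eval rate" line reports tokens/second
ollama run starcoder2:3b --verbose "Write a hello world program in Python"
```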
Finally, install continue.dev, a VSCode extension that enables tab completion (and chat) using local LLMs.
Then update continue.dev settings to use the desired model.
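As one example, continue.dev reads its configuration from `~/.continue/config.json`; a minimal sketch that points tab completion at the local model might look like this (field names follow continue.dev's config schema, but double-check against the version you install):

```json
{
  "tabAutocompleteModel": {
    "title": "StarCoder2 3B",
    "provider": "ollama",
    "model": "starcoder2:3b"
  }
}
```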
Ensure that you've disabled GitHub Copilot and other overlapping VSCode extensions.
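If you'd rather not uninstall Copilot entirely, you can turn it off in VS Code's `settings.json` via the `github.copilot.enable` setting (a sketch; the exact shape may vary by Copilot version):

```json
{
  "github.copilot.enable": {
    "*": false
  }
}
```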
How to use it?
We all know how auto-completion works. However, there are a few shortcuts that are worth memorizing:
Accept a full suggestion by pressing Tab
Reject a full suggestion with Esc
For more granular control, use cmd + → to accept parts of the suggestion word-by-word.
Pros and Cons
Pros
Offline Availability: Work anywhere without relying on an internet connection.
Privacy: Your code and prompts never leave your machine, ensuring maximum data privacy.
Customization: Ability to fine-tune models to your specific needs or codebase.
No Subscription Costs: Once set up, there are no ongoing fees unlike many cloud-based services.
Consistent Performance: No latency issues due to poor internet connection or server load.
Open Source: Many local LLMs are open-source, allowing for community improvements and transparency.
Cons
Initial Setup Time: Requires some time and technical knowledge to set up properly.
Hardware Requirements: Local LLMs can be resource-intensive, requiring a reasonably powerful machine.
Limited Model Size: Typically, local models are smaller than their cloud-based counterparts, which might affect performance for some tasks.
Manual Updates: You need to manually update models and tools to get the latest improvements.
Closing Thoughts
I was hesitant to adopt local LLMs because services like GitHub Copilot "just work." However, as I've been traveling the world, I found myself regretting having to depend on an Internet connection for my auto-completions. In that sense, switching to a local model has been a huge win for me. If Internet connectivity were not an issue, I think services like Supermaven would still be very appealing and worth the cost.
If you are not familiar with Supermaven and you are okay with depending on an Internet connection, it's worth checking out. Compared to GitHub Copilot, I found Supermaven's auto-completion to be much more reliable and much faster.
However, if you are like me and want your code completion to work with or without an Internet connection, then this setup is definitely worth a try.