In a previous post, I showed how to set up Ollama and start downloading and using free-to-use large language models (LLMs). I was curious which model is fastest for some simple prompts. The biggest thing to note is that I am running on a CPU-only system, so the responses are going to be much slower than on a GPU system.
There are various criteria you could measure, such as:
- Response time
- Accuracy and Comprehension
- Creative ability
- Bias and Ethics
- Resource Utilization
I’m taking a straightforward (and elementary) approach to this, focusing on two key factors: response time and which answer I personally find the best.
- Response time is whatever Ollama lists as “approximate_total” (see the sketch below for one way to capture comparable timings yourself).
- The best response is entirely my subjective opinion, so take that with a grain of salt.
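If you want to gather comparable timings on your own machine, here is a minimal Python sketch that calls Ollama’s local REST API. It assumes the default endpoint at http://localhost:11434 and that the models have already been pulled; the API’s total_duration field is reported in nanoseconds and is only roughly comparable to the figures Ollama prints in the CLI, so treat the numbers as approximate.

```python
import time
import requests

# Assumes a local Ollama instance on the default port and that each model
# has already been pulled with `ollama pull <model>`.
MODELS = ["olmo2:13b", "olmo2:7b", "deepseek-r1:7b", "gemma:7b", "llama3.1:8b"]
PROMPT = "Tell me about Zion National Park"

for model in MODELS:
    start = time.perf_counter()
    resp = requests.post(
        "http://localhost:11434/api/generate",
        json={"model": model, "prompt": PROMPT, "stream": False},
        timeout=3600,  # CPU-only runs can take several minutes per model
    )
    resp.raise_for_status()
    data = resp.json()
    wall_clock = time.perf_counter() - start
    api_total = data.get("total_duration", 0) / 1e9  # reported in nanoseconds
    print(f"{model}: wall clock {wall_clock / 60:.1f} min, API total {api_total / 60:.1f} min")
```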
Note: At the last minute, I decided to add llama3.1:8b to this test and to omit llama3.2:latest 3.2B from the scoring at the end, since it has the smallest parameter count and will inherently produce the fastest results. However, I did continue to include it in my results below.
Note: I didn’t include all of the responses from the models, first because I couldn’t find a clean way to add them without cluttering up the page, and second because I didn’t want to get flagged by Google for posting a bunch of AI-generated content.
The first question I asked was “Tell me about Zion National Park”.
| Model | Total Time | Comments on response |
|---|---|---|
| olmo2:13b | 2 min 42 seconds | My favorite response |
| olmo2:7b | 2 min 34 seconds | |
| deepseek-r1:7b | 4 min 1 second | |
| gemma:7b | 3 min 59 seconds | |
| llama3.2:latest 3.2B | 1 min 34 seconds * see note | |
| llama3.1:8b | 4 min 17 seconds | Second favorite response |
olmo2:7b comes in for the win as the fastest. However, I thought olmo2:13b had the best-written response.
My second question was “Create me a half marathon training plan“
| Model | Total Time | Comments on response |
|---|---|---|
| olmo2:13b | 12 min 13 seconds | Favorite response (12-week training plan) |
| olmo2:7b | 4 min 40 seconds | 12-week training plan |
| deepseek-r1:7b | 5 min 45 seconds | Best-formatted response, but only created an 8-week training plan. In my opinion it would have been tied for first if it were 12 weeks. |
| gemma:7b | 3 min 11 seconds | 8-week training plan – terrible response |
| llama3.2:latest 3.2B | 2 min 4 seconds * see note | 12-week training plan |
| llama3.1:8b | 5 min 17 seconds | 16-week training plan – very similar to the olmo2:13b response |
Having run a couple of marathons and half marathons, I was curious what the responses would look like. Three of the models provided 12-week plans, two provided 8-week plans, and llama3.1:8b provided a 16-week plan. I think 8-week plans are too short unless you are already running a ton of miles, so I discounted those responses. The winner goes to olmo2:13b.
My third question was “Tell me the ethical concerns related to AI“
| Model | Total Time | Comments on response |
|---|---|---|
| olmo2:13b | 9 min 15 seconds | Three-way tie |
| olmo2:7b | 3 min 3 seconds | Three-way tie |
| deepseek-r1:7b | 8 min 10 seconds | Three-way tie |
| gemma:7b | 6 min 26 seconds | |
| llama3.2:latest 3.2B | 1 min 27 seconds * see note | |
| llama3.1:8b | 3 min 5 seconds | |
The responses were all pretty similar; some just had more detail. Gemma and llama had the least amount of detail. I would consider it a three-way tie between olmo2:13b, olmo2:7b, and deepseek-r1:7b.
The fourth question was “Outline the steps to prove that the sum of the interior angles of a triangle is 180 degrees.”
| Model | Total Time | Comments on response |
|---|---|---|
| olmo2:13b | 8 min 17 seconds | Nearly the same as olmo2:7b |
| olmo2:7b | 3 min 21 seconds | Nearly the same as olmo2:13b |
| deepseek-r1:7b | 4 min 33 seconds | |
| gemma:7b | 3 min 3 seconds | Very simple |
| llama3.2:latest 3.2B | 2 min 20 seconds * see note | |
| llama3.1:8b | 3 min 41 seconds | |
All of the answers had the same outline, but I thought olmo2:7b and olmo2:13b had the best responses, with llama3.2:latest a close third. The winner goes to olmo2:7b because it was faster than the olmo2:13b model.
My fifth and last question was “If you could change anything about your algorithm, what would it be?“
| Model | Total Time | Comments on response |
|---|---|---|
| olmo2:13b | 4 min 37 seconds | |
| olmo2:7b | 2 min 41 seconds | |
| deepseek-r1:7b | 3 min 32 seconds | |
| gemma:7b | 3 min 39 seconds | |
| llama3.2:latest 3.2B | 1 min 58 seconds * see note | |
| llama3.1:8b | 2 min 55 seconds | Best response |
Almost all of the answers were very similar; however, I thought the way llama3.1:8b phrased its response was the best.
The last request for the models was a rewrite based on some input. I took the popular typing sentence “The quick brown fox jumps over the lazy dog”; a NASA space shuttle used a variant of that as a message: “The lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun”.
The request to each model was: “Rewrite this for better readability – the lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun”
| Model | Total Time | Response |
|---|---|---|
| olmo2:13b | 3 min 59 seconds | The sleepy yellow dog was surprised when the slow red fox quietly pounced on him while he napped in the sunlight. |
| olmo2:7b | 2 min 18 seconds | As the sun bathed the area in warmth, the lazy yellow dog fell asleep. Unbeknownst to him, a slow red fox stealthily approached and caught the dozing dog in its paws. |
| deepseek-r1:7b | 5 min 44 seconds | While the lazy yellow dog dozed peacefully under the sun’s warmth, a slow red fox quietly snatched him in its grasp. |
| gemma:7b | 4 min 15 seconds | The sluggish yellow dog, basking in the sunlight’s embrace, was unexpectedly captured by the cunning slow red fox. Its slumbering state left it vulnerable to the fox’s swift action. |
| llama3.2:latest 3.2B | 1 min 5 seconds * see note | The lazy yellow dog, caught off guard while resting in the warm sun, was swiftly snatched by a cunning slow red fox. |
| llama3.1:8b | 2 min 3 seconds | The lazy yellow dog, who had been basking in the warm sunlight, was suddenly and unexpectedly caught off guard by the cunning slow red fox. As the dog lay sleeping, his defenses down, the fox crept up quietly behind him. The dog’s usual vigilance and quick thinking were nowhere to be found as he succumbed to the seductive warmth of the sun. The fox, taking advantage of the dog’s momentary lapse in alertness, seized the opportunity to pounce on its unsuspecting prey. The poor yellow dog was caught completely off guard, his laziness proving to be a fatal mistake. In a flash, the fox had the dog trapped, and with a cunning grin, it settled down beside him, ready to enjoy the spoils of its victory. The lazy yellow dog, on the other hand, was left to face the consequences of his own carelessness. |
I thought most of the answers were very similar, so this one comes down to preference more than some of the other questions. I didn’t like how llama3.1:8b turned it into a story.
Here I will summarize which models were the fastest and which I thought gave the best responses.
| Request | Fastest | Favorite |
|---|---|---|
| Tell me about Zion National Park | olmo2:7b | olmo2:13b |
| Create me a half marathon training plan | gemma:7b | olmo2:13b |
| Tell me the ethical concerns related to AI | olmo2:7b | olmo2:13b / olmo2:7b / deepseek-r1:7b (tie) |
| Outline the steps to prove that the sum of the interior angles of a triangle is 180 degrees. | gemma:7b | olmo2:7b |
| If you could change anything about your algorithm, what would it be? | olmo2:7b | llama3.1:8b |
| Rewrite this for better readability – the lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun | llama3.1:8b | olmo2:13b |
While I did note above that I was going to exclude llama3.2:latest 3.2B, it did have some decent answers and was always the fastest, so if you are looking for something quick, this might be a good model to start with.
Overall, both olmo models and the llama3.1 model showed up more often than the others. While gemma:7b was the fastest at creating the half marathon training plan and at proving the triangle’s interior angles sum to 180 degrees, I didn’t like its responses as much as the others. Deepseek seemed to always be middle of the pack. I am unsure if there is a clear winner, but I do feel there are some clear losers, and I would include deepseek-r1:7b and gemma:7b in that category.
I know this is a very elementary and subjective way to decide which LLM is the best, but at the end of the day, the results you get back are what matter most. I enjoyed testing these out and learning a little more about each of these models.
What are your thoughts? What model has been your favorite?