6 AI Models, 6 Tough Questions, 1 clear winner?

In a previous post, I showed how to set up Ollama and start downloading and using free-to-use large language models (LLMs). I was curious which model is fastest for some simple prompts. The biggest thing to note: I am running on a CPU-only system, so the responses are going to be much slower than on a GPU system.

There are various criteria you could measure, such as:

  • Response time
  • Accuracy and Comprehension
  • Creative ability
  • Bias and Ethics
  • Resource Utilization

I’m taking a straightforward (and elementary) approach to this, focusing on two key factors: response time and which answer I personally find the best.

  • Response time equals what Ollama lists as “approximate_total”.
  • The best response is entirely my subjective opinion, so take it with a grain of salt.
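For anyone who wants to script this kind of timing comparison instead of reading it off the terminal: Ollama's REST API (`/api/generate`) reports a `total_duration` field in nanoseconds. Here is a minimal helper, as a sketch, to convert that value into the "X min Y sec" format used in the tables below (the example value is just an illustration):

```python
def format_duration(ns: int) -> str:
    """Convert a nanosecond duration (as reported in Ollama's
    /api/generate response) to an 'X min Y sec' string."""
    total_seconds = ns // 1_000_000_000
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes} min {seconds} sec"

# e.g. 162 billion nanoseconds -> "2 min 42 sec"
print(format_duration(162_000_000_000))
```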

Note: At the last minute, I decided to add llama3.1:8b to this test and to omit llama3.2:latest 3.2B from the scoring at the end, since it has the smallest parameter count and will inherently have the fastest results. However, I did continue to include it in the results below.

Note: I didn’t include every full response from the models. First, I couldn’t find a clean way to add them without cluttering up the page, and second, I worried about getting flagged by Google for adding a bunch of AI-generated content.

My first question was “Tell me about Zion National Park”.

| Model | Total Time | Comments on response |
| --- | --- | --- |
| olmo2:13b | 2 min 42 sec | My favorite response |
| olmo2:7b | 2 min 34 sec | |
| deepseek-r1:7b | 4 min 1 sec | |
| gemma:7b | 3 min 59 sec | |
| llama3.2:latest 3.2B | 1 min 34 sec * see note | |
| llama3.1:8b | 4 min 17 sec | Second favorite response |

olmo2:7b comes in for the win as the fastest here; however, I thought olmo2:13b had the best-written response.

My second question was “Create me a half marathon training plan”.

| Model | Total Time | Comments on response |
| --- | --- | --- |
| olmo2:13b | 12 min 13 sec | Favorite response (12-week training plan) |
| olmo2:7b | 4 min 40 sec | 12-week training plan |
| deepseek-r1:7b | 5 min 45 sec | Best-formatted response, but only created an 8-week training plan. Would have been tied for first if it were 12 weeks, in my opinion. |
| gemma:7b | 3 min 11 sec | 8-week training plan – terrible response |
| llama3.2:latest 3.2B | 2 min 4 sec * see note | 12-week training plan |
| llama3.1:8b | 5 min 17 sec | 16-week training plan – very similar to the olmo2:13b response. |

Having run a couple of marathons and half marathons, I was curious what the models would provide here. Three of the models gave 12-week plans, while two gave 8-week plans. I personally think 8-week plans are too short unless you are already running a ton of miles, so I discounted those responses. The winner goes to olmo2:13b.

My third question was “Tell me the ethical concerns related to AI”.

| Model | Total Time | Comments on response |
| --- | --- | --- |
| olmo2:13b | 9 min 15 sec | Three-way tie |
| olmo2:7b | 3 min 3 sec | Three-way tie |
| deepseek-r1:7b | 8 min 10 sec | Three-way tie |
| gemma:7b | 6 min 26 sec | |
| llama3.2:latest 3.2B | 1 min 27 sec * see note | |
| llama3.1:8b | 3 min 5 sec | |

The responses were all pretty similar; some just had more detail. Gemma and llama had the least detail. I would call it a three-way tie among olmo2:13b, olmo2:7b, and deepseek-r1:7b.

My fourth question was “Outline the steps to prove that the sum of the interior angles of a triangle is 180 degrees”.

| Model | Total Time | Comments on response |
| --- | --- | --- |
| olmo2:13b | 8 min 17 sec | Nearly the same as olmo2:7b |
| olmo2:7b | 3 min 21 sec | Nearly the same as olmo2:13b |
| deepseek-r1:7b | 4 min 33 sec | |
| gemma:7b | 3 min 3 sec | Very simple |
| llama3.2:latest 3.2B | 2 min 20 sec * see note | |
| llama3.1:8b | 3 min 41 sec | |

All of the answers had the same outline, but I thought olmo2:7b and olmo2:13b had the best responses, with llama3.2:latest a close third. The winner goes to olmo2:7b because it was faster than olmo2:13b.
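For reference, the outline the models converged on is (as far as I could tell) a version of the classic parallel-line argument, which runs roughly like this:

```latex
% Sketch of the standard parallel-line proof for triangle ABC.
% Draw line DE through vertex A, parallel to side BC.
% Alternate interior angles give:
\angle DAB = \angle ABC, \qquad \angle EAC = \angle ACB
% D, A, E are collinear, so the three angles at A form a straight angle:
\angle DAB + \angle BAC + \angle EAC = 180^\circ
% Substituting the alternate-interior-angle equalities:
\angle ABC + \angle BAC + \angle ACB = 180^\circ
```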

My fifth and last question was “If you could change anything about your algorithm, what would it be?”

| Model | Total Time | Comments on response |
| --- | --- | --- |
| olmo2:13b | 4 min 37 sec | |
| olmo2:7b | 2 min 41 sec | |
| deepseek-r1:7b | 3 min 32 sec | |
| gemma:7b | 3 min 39 sec | |
| llama3.2:latest 3.2B | 1 min 58 sec * see note | |
| llama3.1:8b | 2 min 55 sec | Best response |

Almost all of the answers were very similar; however, I thought the way llama3.1:8b framed its response made it the best.

The last request for the models was a rewrite based on some input. I took the popular typing sentence “The quick brown fox jumps over the lazy dog”; NASA’s space shuttle program used a variant of it as a test message: “The lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun”.

The request to the models was: “Rewrite this for better readability – the lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun”.

| Model | Total Time | Comments on response |
| --- | --- | --- |
| olmo2:13b | 3 min 59 sec | Response: The sleepy yellow dog was surprised when the slow red fox quietly pounced on him while he napped in the sunlight. |
| olmo2:7b | 2 min 18 sec | Response: As the sun bathed the area in warmth, the lazy yellow dog fell asleep. Unbeknownst to him, a slow red fox stealthily approached and caught the dozing dog in its paws. |
| deepseek-r1:7b | 5 min 44 sec | Response: While the lazy yellow dog dozed peacefully under the sun’s warmth, a slow red fox quietly snatched him in its grasp. |
| gemma:7b | 4 min 15 sec | Response: The sluggish yellow dog, basking in the sunlight’s embrace, was unexpectedly captured by the cunning slow red fox. Its slumbering state left it vulnerable to the fox’s swift action. |
| llama3.2:latest 3.2B | 1 min 5 sec * see note | Response: The lazy yellow dog, caught off guard while resting in the warm sun, was swiftly snatched by a cunning slow red fox. |
| llama3.1:8b | 2 min 3 sec | Response: The lazy yellow dog, who had been basking in the warm sunlight, was suddenly and unexpectedly caught off guard by the cunning slow red fox. |

llama3.1:8b’s response continued on into a short story:

As the dog lay sleeping, his defenses down, the fox crept up quietly behind him. The dog’s usual vigilance and quick thinking were nowhere to be found as he succumbed to the seductive warmth of the sun.

The fox, taking advantage of the dog’s momentary lapse in alertness, seized the opportunity to pounce on its unsuspecting prey. The poor yellow dog was caught completely off guard, his laziness proving to be a fatal mistake.

In a flash, the fox had the dog trapped, and with a cunning grin, it settled down beside him, ready to enjoy the spoils of its victory. The lazy yellow dog, on the other hand, was left to face the consequences of his own carelessness.

I thought most of the answers were very similar, so this one comes down to preference more than some of the other questions did. I didn’t like how llama3.1:8b turned it into a story.

Here I will summarize which models were the fastest and which I thought gave the best responses.

| Request | Fastest | Favorite |
| --- | --- | --- |
| Tell me about Zion National Park | olmo2:7b | olmo2:13b |
| Create me a half marathon training plan | gemma:7b | olmo2:13b |
| Tell me the ethical concerns related to AI | olmo2:7b | olmo2:13b, olmo2:7b, deepseek-r1:7b (three-way tie) |
| Outline the steps to prove that the sum of the interior angles of a triangle is 180 degrees. | gemma:7b | olmo2:7b |
| If you could change anything about your algorithm, what would it be? | olmo2:7b | llama3.1:8b |
| Rewrite this for better readability – the lazy yellow dog was caught by the slow red fox as he lay sleeping in the sun | llama3.1:8b | olmo2:13b |
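To sanity-check my own tallying, here is a quick sketch that counts the wins from the summary table (the question-3 tie is counted as one favorite vote for each tied model):

```python
from collections import Counter

# Results copied from the summary table above.
fastest = ["olmo2:7b", "gemma:7b", "olmo2:7b",
           "gemma:7b", "olmo2:7b", "llama3.1:8b"]
favorite = ["olmo2:13b", "olmo2:13b",
            "olmo2:13b", "olmo2:7b", "deepseek-r1:7b",  # three-way tie
            "olmo2:7b", "llama3.1:8b", "olmo2:13b"]

for label, votes in [("Fastest", fastest), ("Favorite", favorite)]:
    print(label, dict(Counter(votes)))
```

olmo2:7b takes the most fastest wins and olmo2:13b the most favorite votes, which matches my overall impression.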

While I did note above that I was going to exclude llama3.2:latest 3.2B from the scoring, it did have some decent answers at times and was always the fastest, so if you are looking for something quick, it might be a good model to start with.

Overall, both olmo models and llama3.1 showed up more often than the others. While gemma:7b was the fastest at creating the half marathon training plan and at proving that the triangle’s interior angles sum to 180 degrees, I didn’t like its responses as much as the others. Deepseek always seemed to be middle of the pack. I am unsure whether there is a clear winner, but I do feel there are some clear losers, and I would include deepseek-r1:7b and gemma:7b in that category.

I know this is a very elementary and subjective way to decide which LLM is best, but at the end of the day, the results you get back are what matter most. I enjoyed testing these models and learning a little more about each of them.

What are your thoughts? What model has been your favorite?
