Can You Trust AI to Fix Your Car?

2025-05-24

Mechanics have long had to deal with customers turning up with their own ideas of what is wrong with their car, with the diagnoses often varying significantly in quality and accuracy. With Google now displaying AI-generated summaries at the top of the search results pages for certain questions, drivers can quickly search for the remedy to the issues they are having with their vehicle. Many owners have turned to ChatGPT to diagnose issues with their cars, getting an instant response, often in some depth.

But are these AI-generated summaries accurate?

The view from the mechanics

To find out how accurate these AI-generated summaries are, we interviewed a specialist. George Palmer, from George Edward Specialist Cars, knows Porsche 911s inside out, especially the 996 generation manufactured between 1997 and 2006. We pitted George against ChatGPT and Google’s Gemini to answer the following query:

“my porsche 996 is making a ticking noise”

The ChatGPT response

ChatGPT was quick to disclose that the issue could range from “minor to serious”, before providing a breakdown of the most common culprits that could be causing the issues. These included:

Lifter tick
Exhaust leak
Fuel injector noise
Loose spark plugs
Camshaft deviation
Worn IMS bearings

All of the suggestions included a symptom, cause and potential fix, as well as some basic model-specific customisation about known issues of the 996.

This was followed by suggestions of what to do next, including:

Check the oil level and quality
Localise the sound
Get a professional diagnosis
Compare a warm engine to a cold engine

Gemini response

Gemini followed a similar approach to ChatGPT, indicating that there are numerous potential causes of the issue, suggesting:

Lifter ticking
Exhaust leaks
Other engine-related problems, including spark plugs, fuel injectors and engine damage

Google’s AI-generated responses included the common sources of the ticking noise, as well as a short paragraph about troubleshooting the issues.

Differences between the two AI responses

The obvious difference is that ChatGPT suggested six potential issues, whereas Gemini offered only three possible causes, the last of which was a catch-all for other engine-related problems. Both mentioned lifter tick, which indicates worn or damaged hydraulic valve lifters, as well as potential exhaust leaks.

What does the expert say?

George, one of the UK’s most prominent Porsche and car restoration experts from George Edward, wasn’t overly impressed.

What became evident was the lack of car-specific detail in the answers, most notably in Gemini’s response.

“The AI suggestions are along the right lines, but have no model-specific knowledge or relevance, which is crucial when maintaining specialist cars such as an old Porsche”.

This one example highlights an issue that affects a certain type of generative AI model: those trained on openly available data sources such as forums and social media.

Generative AI models can only return results based on the information that they have been trained on. Therefore, models that have been trained on general information can only generate general responses.

How to get more accurate AI results

“The key to accurate generative AI suggestions is in the training data,” explains Vytas Mulevicius from NLP specialists NetGeist.

Datasets provide the crucial foundation for NLP and generative AI models. Domain-specific datasets, such as clinical notes for medical NLP tools or financial reports for financial NLP models, serve a dual purpose. Not only do they provide relevant information to train the model on, but they also provide a crucial benchmark for evaluating the accuracy and reliability of the generated results.
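To make the benchmarking idea concrete, here is a minimal Python sketch of how a domain-specific dataset could be used to score a model. Everything in it is an assumption for illustration: the symptom and diagnosis pairs are invented rather than real diagnostic advice, and `generic_model` is a hypothetical stand-in for whatever model is being evaluated.

```python
from typing import Callable, List, Tuple

# Hypothetical benchmark: symptom descriptions paired with the diagnosis
# an expert would give. These pairs are invented for illustration only.
BENCHMARK: List[Tuple[str, str]] = [
    ("ticking noise on cold start that fades when warm", "lifter tick"),
    ("ticking noise that gets louder under acceleration", "exhaust leak"),
    ("rough idle and misfire after a service", "loose spark plug"),
]


def accuracy(model: Callable[[str], str], benchmark: List[Tuple[str, str]]) -> float:
    """Return the fraction of benchmark cases where the model matches the expert."""
    correct = sum(
        model(symptom).strip().lower() == expected
        for symptom, expected in benchmark
    )
    return correct / len(benchmark)


def generic_model(symptom: str) -> str:
    """Stand-in for a general-purpose model that always gives the same answer."""
    return "lifter tick"


score = accuracy(generic_model, BENCHMARK)
print(f"Benchmark accuracy: {score:.0%}")
```

Exact string matching is the simplest possible scoring rule. A real evaluation would probably need something more forgiving, such as expert review of each answer.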

Building a specialist large language model

Using our example of the Porsche 911 996 above, an accurate AI diagnosis tool would require a specialised large language model trained specifically for this purpose. Creating this Porsche-specific model involves three steps:

The pre-training phase
a) Collect a diverse dataset relevant to your model’s purpose

b) Clean the data to remove any formatting issues, irrelevant information or noise

c) Tokenize the data, converting the text into a sequence of numeric token IDs that a computer can process

d) Train the model to be able to predict a word within a sequence using text from the cleaned dataset
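As a rough illustration of steps b) to d), the Python sketch below cleans two invented workshop notes, maps each word to an integer ID, and builds the (context, next word) pairs a model would be trained on. It is a simplification under stated assumptions: the notes are hypothetical, and production language models typically use subword tokenizers rather than the whole-word vocabulary shown here.

```python
import re
from collections import Counter

# Hypothetical workshop notes standing in for a real, much larger dataset.
RAW_NOTES = [
    "<p>Ticking noise on cold start,   fades when warm.</p>",
    "Ticking noise under load; exhaust manifold bolt loose!",
]


def clean(text: str) -> str:
    """Step b: strip HTML tags and punctuation, lowercase, collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)
    text = re.sub(r"[^a-z0-9\s]", " ", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def build_vocab(texts):
    """Step c: give every distinct word a fixed integer ID (0 is reserved for unknown words)."""
    counts = Counter(word for t in texts for word in t.split())
    # Sort by frequency, then alphabetically, so the IDs are deterministic.
    ordered = sorted(counts, key=lambda w: (-counts[w], w))
    return {word: i for i, word in enumerate(ordered, start=1)}


def encode(text, vocab):
    """Step c: turn cleaned text into a list of token IDs."""
    return [vocab.get(word, 0) for word in text.split()]


def next_word_pairs(ids, context=3):
    """Step d: build (context, target) examples for next-word prediction."""
    return [(ids[i - context:i], ids[i]) for i in range(context, len(ids))]


cleaned = [clean(note) for note in RAW_NOTES]
vocab = build_vocab(cleaned)
encoded = [encode(text, vocab) for text in cleaned]
pairs = [pair for ids in encoded for pair in next_word_pairs(ids)]

print(cleaned[0])
print(encoded[0])
print(pairs[0])
```

The same word always receives the same ID, which is what lets the model learn patterns across the dataset. The training loop itself is omitted, since it depends on the model architecture and framework chosen.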

Supervised instruction tuning
This stage involves a human labeler providing feedback to the model, comparing its predictions with the targeted response. This helps the model begin to understand instructions, and to recall and retrieve relevant knowledge in response to them.
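A minimal Python sketch of what the labelled data for this stage might look like is shown below. The record structure, the prompt template and the example content are all assumptions chosen for illustration. Different training frameworks expect different formats, and the example answer is not real diagnostic advice.

```python
from dataclasses import dataclass


@dataclass
class InstructionExample:
    """One human-labelled training record: an instruction and the targeted response."""
    instruction: str
    response: str


def to_training_text(example: InstructionExample) -> str:
    """Join the instruction and targeted response into one training string.

    The section markers are a hypothetical template, not a required format.
    """
    return (
        f"### Instruction:\n{example.instruction}\n\n"
        f"### Response:\n{example.response}"
    )


# Hypothetical example written by a human labeler.
example = InstructionExample(
    instruction="My Porsche 996 is making a ticking noise. What should I check first?",
    response="Check the oil level and quality, then compare the noise on a cold and a warm engine.",
)

print(to_training_text(example))
```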

Reinforcement learning from human feedback
Known as RLHF, this helps to train models to output results that align with human interests and preferences, ensuring the results are helpful, honest and harmless.

For more detailed information on the large language model training process, check out this guide.

Closing remarks
So, can you trust AI to fix your car? As you would expect, the answer isn’t black and white. If you are looking for a general sense of what might be wrong with your car, off-the-shelf tools such as ChatGPT or Gemini offer a solid starting point.

If you require an AI tool that will be able to pinpoint specific issues, you will need a custom NLP model that has been trained on specialised information. Luckily, this isn’t as daunting as it sounds. LLM specialists such as NetGeist.ai provide custom NLP solutions that solve your unique textual challenges. If you are looking to be able to diagnose issues with specialist cars from just a few written prompts, contact NetGeist.ai.
