Michael Lones has a direct message for developers reaching for AI shortcuts: just because you can doesn't mean you should.
Lones, a professor at Heriot-Watt University's School of Mathematical and Computer Sciences, published a paper in the journal Patterns arguing that incorporating generative AI into machine learning systems carries serious risks that are not being adequately weighed against the cost and efficiency gains developers hope to achieve. Those risks include cyber-attacks, data breaches, and bias against underrepresented groups.
The research arrives as a growing number of organizations across sectors begin using large language models (LLMs) to design, build, and run machine learning pipelines. Machine learning itself is not new. Spam filters, product recommendation engines, and social media feeds have relied on it for decades. It also operates in higher-stakes settings, including assigning patients to drug trials and processing insurance claims. The more recent push is to layer generative AI on top of those existing systems, and that is where Lones sees danger.
His paper identifies four specific ways generative AI is currently being applied within machine learning workflows: as a component inside a pipeline, to design and write code for pipelines, to generate synthetic training data, and to interpret or analyze outputs. Each carries its own set of risks. When LLMs are used for more than one of these functions within the same system, those risks do not simply add together. They interact.
"If you have Gen AI working in a number of different ways within your machine learning workflows or system, then they can interact in unpredictable and hard to understand ways," Lones said.
The problem is compounded when LLMs operate in what researchers call an agentic mode, meaning the model can autonomously use external tools to solve problems without a human approving each step. An agentic system working across multiple parts of a machine learning pipeline can make bad decisions faster, and at greater scale, than a human operator can catch them.
One of the most fundamental concerns Lones raises is simple but consequential: LLMs make mistakes. They can fabricate information, reach flawed conclusions, and produce outputs that appear confident but are wrong. In a low-stakes consumer application, a hallucinated response is an annoyance. In a system determining who receives a medical treatment or how an insurance claim is resolved, the same error can directly affect someone's health or finances.
"Machine learning developers need to be aware of the risks of using Gen AI in machine learning and find a sensible balance between improvements in capability and the risks that might come with that," Lones said.
His advice to developers, particularly those working in sectors where outcomes directly affect people's health, finances, or legal standing, is to limit complexity. Using generative AI in one clearly defined role within a system is manageable. Stacking it across multiple functions, especially without robust human oversight at each stage, is where things begin to break down in ways that are difficult to trace, understand, or fix after the fact.
The paper does not argue against generative AI outright. Lones acknowledges the legitimate efficiency gains it can offer. The concern is the pace at which it is being integrated, often driven by cost-cutting pressures, without corresponding investment in understanding how the risks scale.
