AI Adoption | January 04, 2024

Using Large Language Models in Highly Regulated Industries to Drive Safety Improvements and Efficiency

When it comes to driving safety and efficiency improvements within large, complex industrial environments, the age-old adage that ‘Information is Power’ could not be more relevant.

Many sites have data silos containing hundreds of thousands of separate records of events and safety inspections, built up in some cases over many decades. Much of the significant information sits in free text, which is difficult for software to process automatically using traditional techniques. Large language models have transformed the cataloging of, and access to, this invaluable source of information. Models such as OpenAI’s GPT-4 are trained on massive amounts of text, largely through self-supervised learning, and can respond rapidly to input prompts. However, using this technology in highly regulated and sensitive industries such as the civil nuclear or defence sectors comes with specific challenges, such as security concerns and innate biases, which can often appear insurmountable.

Most free text from industrial sites contains a significant amount of unlabelled or uncategorized data, together with technical acronyms and plant-specific terms which often vary depending on the individual operator who completes the report. These factors all further complicate the training, validation, and deployment of large language models. Artificial intelligence can be a very blunt instrument unless it is trained on data sets that give the system a sufficient understanding of the context in which it is working. In addition, data security concerns often stop these projects before they even begin. In highly regulated sectors, safety and security are the overriding priorities, and exposing data sources to environments which are not under the direct control of the organization would lead either to prolonged internal discussions about due diligence and risk assessments or to the project being sidelined, never to see the light of day again.

While commercially hosted large language models such as GPT-4 currently offer the most advanced performance, they rely on transferring data to servers operated by the model providers, potentially exposing sensitive information. Large language models are an extremely rapidly developing field, and a range of open-source models increasingly offer performance comparable to the market leaders, with the added potential to address security concerns, because they can be run on infrastructure the organization controls, and to enable more focused fine-tuning for specific goals. Whilst open-source models may lack some of the more sophisticated functionality of proprietary models such as GPT-4, this functionality is not essential for this specific use case, where interactions will not take the form of an ongoing conversation.

By combining the functional experience of the existing plant operating team with the right software development expertise, open-source large language models can deliver purpose-built applications which automatically analyze free text data. Increasingly, open-source models are becoming multi-modal, meaning they can assess images and other inputs as well as free text. Such capabilities will allow for the automation of a wide range of data processing tasks which normally require human input.
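
As a rough illustration of what such a purpose-built application might look like, the Python sketch below sorts free text records into categories by sending a single prompt per record to a model and mapping the reply onto a fixed label set. Everything named here is an assumption made for illustration: the category list, the prompt wording, and the `generate` callable, which stands in for whichever locally hosted open-source model a site chooses to run. A simple keyword stub takes its place so the sketch runs without any model or network access.

```python
from typing import Callable, List, Tuple

# Hypothetical label set; a real site would define its own categories
# with input from the plant operating team.
CATEGORIES = ["maintenance", "safety_event", "inspection", "other"]


def build_prompt(record: str) -> str:
    """Compose a single, self-contained classification prompt for one record."""
    options = ", ".join(CATEGORIES)
    return (
        "Classify the following site record into exactly one category.\n"
        f"Categories: {options}\n"
        f"Record: {record}\n"
        "Answer with the category name only."
    )


def parse_label(reply: str) -> str:
    """Map the model's raw reply onto a known category, defaulting to 'other'."""
    cleaned = reply.strip().lower()
    for category in CATEGORIES:
        if category in cleaned:
            return category
    return "other"


def classify_records(
    records: List[str], generate: Callable[[str], str]
) -> List[Tuple[str, str]]:
    """Classify each record independently; no conversation state is kept."""
    return [(record, parse_label(generate(build_prompt(record)))) for record in records]


def stub_generate(prompt: str) -> str:
    """Keyword-based stand-in for a locally hosted model (assumption, not a real API)."""
    # Only inspect the record line, since the prompt itself lists every category.
    record_line = next(
        line for line in prompt.splitlines() if line.startswith("Record: ")
    ).lower()
    if "leak" in record_line or "injury" in record_line:
        return "safety_event"
    if "replaced" in record_line or "repaired" in record_line:
        return "maintenance"
    return "other"


if __name__ == "__main__":
    sample_records = [
        "Pump P-101 seal replaced after vibration alarm",
        "Minor oil leak found near valve",
    ]
    for record, label in classify_records(sample_records, stub_generate):
        print(f"{label}: {record}")
```

Because each record is handled in a single prompt with no conversation history, the model behind `generate` can be swapped for one running entirely on infrastructure the organization controls without changing the rest of the code.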

Within industrial environments, maintenance records, safety event records and other aspects of site operations are routinely recorded in systems which rely on free text entry. These records contain valuable information for monitoring how a site is performing and for improving safety, security, and efficiency. In addition, legacy sites may have historic records containing valuable information which is currently only accessible through human review. By enabling accurate automated review of these data sources in a secure environment which protects sensitive information, open-source large language models and multi-modal artificial intelligence models will deliver significant benefits to industry in the coming years.

Training and deploying models to review industrial site data is valuable, but to deliver real day-to-day benefit it is crucial to carefully consider how such tools are integrated into existing business processes. Ada Mode has been working with industrial pioneers to bring these unique solutions to life, incorporating a blend of industry expertise and artificial intelligence to ensure that technology is deployed in the right way.

Only by combining a deep understanding of highly regulated industrial operating environments with strong technical capabilities can the question “what is a useful output from the model?” be answered, and true value be derived. Once this has been achieved, large language models can be used in any complex or sensitive industrial operating environment to drive continuous safety and efficiency improvements.

If you would like to learn more about how artificial intelligence could help your organization, then please get in touch.
