At the Indere Food Research Institute, we approach the digitalization of HACCP and self-monitoring systems not merely as administrative modernization but as an information-processing challenge. The legal and professional essence of HACCP remains unchanged: businesses must operate HACCP-based procedures, monitor critical control points, implement corrective actions, regularly verify that the system works effectively, and substantiate all of this with appropriate documentation. The Codex and EU hygiene regulations therefore require not just record-keeping but continuous, demonstrable control. Consequently, digitalization adds value only if it both documents this control faster and interprets it better.
The main weakness of traditional paper-based or simply "form-digitalized" systems is that they convert data into records but fail to build a semantic layer on top of them. A temperature deviation, a cleaning deficiency, or a note in a shift log may be recorded, but the system does not understand what risk pattern these elements form together. WHO materials on the digital food safety ecosystem specifically emphasize that data sources in the food chain are extremely diverse, and that the real value comes from connecting, mining, and interpreting the different data streams. This is precisely our focus: for us, the self-monitoring system is not a data repository but an interpretive layer.
From an IT perspective, this means we are building an LLM-based and embedding-based semantic model on top of structured HACCP data. Large language models are particularly useful for processing semi-structured and free-text fields such as deviation descriptions, shift manager notes, audit notes, supplier documents, and action logs: they can normalize and summarize the raw text, assign it to risk categories, and link it semantically with other events. This does not operate as a "magic box" but within a controlled architecture that incorporates domain dictionaries, tag representations, a rule engine, and human validation. EFSA's AI roadmaps and pilot projects point in the same direction: they position AI as support for evidence management, terminology management, text summarization, clustering, and data integration, operated in a human-centric way and in collaboration with experts.
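As a minimal sketch of the controlled architecture described above, the following Python example tags a free-text note using a domain dictionary and keeps the result unvalidated until a human confirms it. The dictionary entries, the tag names, and the `TaggedNote` and `tag_note` names are illustrative assumptions, not part of any actual Indere system; in practice an LLM could propose the tags, while the plain dictionary lookup here only stands in for that step.

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical domain dictionary: surface words mapped to normalized risk tags.
DOMAIN_DICTIONARY = {
    "temp": "temperature_control",
    "temperature": "temperature_control",
    "cooler": "temperature_control",
    "cleaning": "sanitation",
    "residue": "sanitation",
    "supplier": "supplier_control",
}


@dataclass
class TaggedNote:
    raw_text: str
    tags: List[str] = field(default_factory=list)
    # Stays False until a human expert confirms the proposed tags.
    validated_by_expert: bool = False


def tag_note(raw_text: str) -> TaggedNote:
    """Propose risk tags for a free-text note by dictionary lookup."""
    words = raw_text.lower().replace(",", " ").replace(".", " ").split()
    tags = sorted({DOMAIN_DICTIONARY[w] for w in words if w in DOMAIN_DICTIONARY})
    return TaggedNote(raw_text=raw_text, tags=tags)


note = tag_note("Cooler temp above limit, residue found after cleaning.")
print(note.tags)                 # proposed tags, still awaiting validation
print(note.validated_by_expert)
```

The point of the design is that the proposed tags and the validation flag travel together, so no downstream step can mistake a machine suggestion for an expert-confirmed classification.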
Our development logic is therefore hybrid. One layer is deterministic: this is where classic HACCP rules, limit values, mandatory control points, and compliance logic run. The other layer is probabilistic and semantic: this is where the LLM and machine learning components operate, searching for patterns, prioritizing deviations, grouping events, and generating concise, decision-supporting summaries. The system thus not only indicates that a deviation occurred but also identifies how the event relates to previous deviations, which CCP or prerequisite program it connects to, and what intervention sequence is warranted. It is this second layer that provides the true digital added value.
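The two layers can be illustrated with a short Python sketch. The deterministic part checks a reading against a critical limit; the second part is only a hand-written placeholder for the probabilistic layer, not a trained model. The CCP names, limit values, and the scoring formula are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Reading:
    ccp: str      # critical control point identifier (hypothetical names below)
    value: float  # measured value, e.g. degrees Celsius
    note: str     # free-text remark from the operator


# Deterministic layer: assumed maximum allowed value per CCP.
CRITICAL_LIMITS = {"cold_storage": 5.0, "cooling_step": 10.0}


def violates_limit(reading: Reading) -> bool:
    """Rule check: True if the reading exceeds its critical limit."""
    return reading.value > CRITICAL_LIMITS[reading.ccp]


def priority_score(reading: Reading, related_recent_events: int) -> float:
    """Placeholder for the probabilistic layer: limit excess plus a recurrence bonus."""
    excess = max(0.0, reading.value - CRITICAL_LIMITS[reading.ccp])
    return excess + 0.5 * related_recent_events


def assess(reading: Reading, related_recent_events: int) -> dict:
    """Combine the rule result and the priority into one decision-support record."""
    return {
        "ccp": reading.ccp,
        "limit_violated": violates_limit(reading),
        "priority": priority_score(reading, related_recent_events),
    }


result = assess(Reading("cold_storage", 7.0, "door left open"), related_recent_events=2)
print(result)
```

Keeping the rule check separate from the score means the compliance verdict never depends on the statistical component; the score only influences the order in which deviations reach the expert.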
In this model, the function of self-monitoring also changes. The goal is no longer merely to retrieve what happened yesterday, but for the system to detect deteriorating trends early. For example, if minor deviations repeatedly appear around the same operational point on a production line, the LLM-based semantic layer can treat these as a unified problem, even if their descriptions differ linguistically. This transforms classic compliance into operational decision support: faster escalation, more targeted corrective actions, and better resource allocation.
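The grouping of differently worded deviations can be sketched as follows. The example assumes each event already carries an embedding vector; the three-dimensional vectors below are hand-written placeholders rather than the output of a real embedding model, and the greedy grouping rule and the 0.8 threshold are illustrative choices only.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def group_events(events, threshold=0.8):
    """Greedy grouping: an event joins the first group whose first member is similar enough."""
    groups = []
    for event in events:
        for group in groups:
            if cosine(event["vector"], group[0]["vector"]) >= threshold:
                group.append(event)
                break
        else:
            groups.append([event])  # no similar group found, start a new one
    return groups


# Placeholder vectors; a real system would obtain them from an embedding model.
events = [
    {"id": "E1", "text": "seal on filler 3 leaking", "vector": [0.9, 0.1, 0.0]},
    {"id": "E2", "text": "drip under filling head no. 3", "vector": [0.8, 0.2, 0.1]},
    {"id": "E3", "text": "label printer misaligned", "vector": [0.0, 0.1, 0.9]},
]

groups = group_events(events)
# Groups with two or more members are candidates for early escalation.
recurring = [[e["id"] for e in g] for g in groups if len(g) >= 2]
print(recurring)
```

Here the first two notes share no key wording but end up in the same group because their vectors are close, which is the behavior the semantic layer is meant to provide.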
It is important, however, that we do not treat this technology as an autonomous decision-maker. The latest EFSA practices also show that LLMs are useful for literature screening and for the primary processing of complex information, but their output must be followed by expert validation and human judgment. At Indere, our goal is therefore not to replace the HACCP expert but to build an intelligent decision support system that provides the expert with higher-quality, more context-rich, and prioritized information. For us, this is the true meaning of digital HACCP.
András Tóth PhD


