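EU GMP Annex 22 is the world's first regulation dedicated specifically to artificial intelligence (AI) in the pharmaceutical industry, marking the shift of AI oversight from "free exploration" to "compliant implementation". This article walks through the eight key requirements the annex sets for implementing AI.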
1. Define a specific intended use
Original clause text
3.1. Intended use.
The intended use of a model and the specific tasks it is designed to assist or automate should be described in detail based on an in-depth knowledge of the process the model is integrated in. This should include a comprehensive characterisation of the data the model is intended to use as input and all common and rare variations; i.e. the input sample space. Any limitations and possible erroneous and biased inputs should be identified. A process subject matter expert (SME) should be responsible for the adequacy of the description, and it should be documented and approved before the start of acceptance testing.
3.2. Subgroups.
Where applicable, the input sample space should be divided into subgroups based on relevant characteristics. Subgroups may be defined by characteristics like the decision output (e.g. ‘accept’ or ‘reject’), process specific baseline characteristics (e.g. geographical site or equipment), specific characteristics in material or product, and characteristics specific to the task being automated (e.g. types and severity of defects).
3.3. Human-in-the-loop.
Where a model is used to give an input to a decision made by a human operator (human-in-the-loop), and where the effort to test such model has been diminished, the description of the intended use should include the responsibility of the operator. In this case, the training and consistent performance of the operator should be monitored like any other manual process.
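Intended use
The intended use of a model, and the specific tasks it is designed to assist or automate, should be described in detail on the basis of in-depth knowledge of the process the model is integrated in. This includes a comprehensive characterisation of the input data the model is intended to use, with all common and rare variations, i.e. a definition of the input sample space. Any limitations and any possible erroneous or biased inputs should be identified. A process subject matter expert (SME) is responsible for the adequacy of the description, which should be documented and approved before acceptance testing starts.
Input data: type (image, text, sensor data), format, and range (e.g. "temperature data in the range 15-30 °C").
Data variation: common variations (e.g. normal production fluctuation) and rare variations (e.g. abnormal data caused by a sudden equipment failure).
Limitations: state explicitly what the model cannot handle (e.g. "when there is oil on the tablet surface, the model may misclassify").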
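The three items above (input data, variation, limitations) can be captured in a structured record so that the sample space is checkable rather than only described in prose. The following is a minimal sketch, not a prescribed format: the class names `InputRange` and `IntendedUse`, the field names, and the `out_of_space` helper are all hypothetical, and only the 15-30 °C range and the oil-on-tablet limitation come from the examples above.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InputRange:
    """Allowed range for one numeric input (hypothetical structure)."""
    name: str
    low: float
    high: float
    unit: str

    def contains(self, value: float) -> bool:
        return self.low <= value <= self.high


@dataclass
class IntendedUse:
    """Illustrative record of an intended-use description."""
    task: str
    inputs: list[InputRange]
    known_limitations: list[str] = field(default_factory=list)
    approved_by_sme: bool = False

    def out_of_space(self, sample: dict[str, float]) -> list[str]:
        """Return names of inputs that are missing or outside the sample space."""
        problems = []
        for rng in self.inputs:
            value = sample.get(rng.name)
            if value is None or not rng.contains(value):
                problems.append(rng.name)
        return problems


# Example values taken from the text: temperature range 15-30 degC.
spec = IntendedUse(
    task="Classify tablets as accept/reject",
    inputs=[InputRange("temperature", 15.0, 30.0, "degC")],
    known_limitations=["Oil on the tablet surface may cause misclassification"],
    approved_by_sme=True,
)

print(spec.out_of_space({"temperature": 22.5}))  # []
print(spec.out_of_space({"temperature": 34.0}))  # ['temperature']
```

A record like this would still need to be documented and approved by the SME before acceptance testing; the code only makes the stated ranges machine-checkable.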
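Subgroups
Where applicable, the input sample space should be divided into subgroups based on relevant characteristics. Subgroups may be defined by the decision output (e.g. "accept" or "reject"), process-specific baseline characteristics (e.g. site or equipment), specific characteristics of the material or product, and characteristics specific to the task being automated (e.g. type and severity of defects).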
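One way to make subgroup definitions concrete is to count how many records fall into each combination of characteristics and flag combinations that are thinly populated. The sketch below assumes records are plain dictionaries; the field names (`site`, `label`, `defect`), the sample records, and the `min_count` cut-off are invented for illustration.

```python
from collections import Counter


def subgroup_counts(records, keys):
    """Count records per subgroup; a subgroup is the tuple of values for `keys`."""
    return Counter(tuple(record[k] for k in keys) for record in records)


def underrepresented(counts, min_count):
    """Return subgroups with fewer than `min_count` records, sorted for stable output."""
    return sorted(group for group, n in counts.items() if n < min_count)


# Hypothetical records: site, decision output, and defect type.
records = [
    {"site": "A", "label": "accept", "defect": "none"},
    {"site": "A", "label": "reject", "defect": "crack"},
    {"site": "B", "label": "accept", "defect": "none"},
    {"site": "B", "label": "reject", "defect": "chip"},
    {"site": "B", "label": "reject", "defect": "chip"},
]

counts = subgroup_counts(records, ["site", "label"])
print(counts[("B", "reject")])         # 2
print(underrepresented(counts, 2))     # [('A', 'accept'), ('A', 'reject'), ('B', 'accept')]
```

Note that a subgroup with zero records does not appear in the counter at all, so a real check would compare against the full list of subgroups named in the intended-use description.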
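Human-in-the-loop
Where a model provides an input to a decision made by a human operator, and the effort to test the model has been reduced on that basis, the description of the intended use should include the responsibility of the operator. In this case, the operator's training and consistent performance should be monitored like any other manual process.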
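Static versus dynamic models
Reason: a dynamic model keeps learning after deployment, so it cannot guarantee a stable production process.
Risk scenario: if an AI model continues to learn from new data during production (e.g. by automatically adjusting a sterilisation temperature threshold), abnormal data can contaminate it and cause its parameters to drift.
Regulatory logic: pharmaceutical manufacturing must ensure that every step is repeatable, and only a static model can meet the GMP requirement for process stability.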
2. Acceptance criteria
Original clause text
4.1. Test metrics.
Suitable, case dependent test metrics, should be defined to measure the performance of the model according to the intended use. As an example, suitable test metrics for a model used to classify products (e.g. ‘accept’ or ‘reject’) may include, but may not be limited to, a confusion matrix, sensitivity, specificity, accuracy, precision and/or F1 score.
4.2. Acceptance criteria.
Acceptance criteria for the defined test metrics should be established by which the performance of the model should be considered acceptable for the intended use. The acceptance criteria may differ for specific subgroups within the intended use. A process subject matter expert (SME) should be responsible for the definition of the acceptance criteria, which should be documented and approved before the start of acceptance testing.
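Suitable, case-dependent test metrics should be defined according to the intended use in order to measure the performance of the model.
Acceptance criteria should be established for the defined test metrics, and these criteria determine whether the model's performance is acceptable for the intended use.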
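For an accept/reject classifier, the metrics named in clause 4.1 all derive from the four cells of the confusion matrix. The sketch below computes them in plain Python and compares them against acceptance criteria. The labels, the choice of "reject" as the positive class, and the threshold values are assumptions for illustration; real criteria are set by the process SME.

```python
def confusion_counts(y_true, y_pred, positive="reject"):
    """Return (tp, fp, tn, fn), treating `positive` as the class of interest."""
    tp = fp = tn = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def _ratio(numerator, denominator):
    # Undefined ratios become NaN instead of raising ZeroDivisionError.
    return numerator / denominator if denominator else float("nan")


def compute_metrics(tp, fp, tn, fn):
    return {
        "sensitivity": _ratio(tp, tp + fn),   # share of true rejects that were caught
        "specificity": _ratio(tn, tn + fp),   # share of true accepts that were passed
        "accuracy": _ratio(tp + tn, tp + fp + tn + fn),
        "precision": _ratio(tp, tp + fp),
        "f1": _ratio(2 * tp, 2 * tp + fp + fn),
    }


def failed_criteria(metrics, criteria):
    """Return {metric: (value, minimum)} for every criterion that is not met."""
    return {
        name: (metrics[name], minimum)
        for name, minimum in criteria.items()
        if not metrics[name] >= minimum   # NaN also counts as a failure
    }


y_true = ["reject"] * 4 + ["accept"] * 6
y_pred = ["reject", "reject", "reject", "accept"] + ["accept"] * 5 + ["reject"]

metrics = compute_metrics(*confusion_counts(y_true, y_pred))
# Hypothetical acceptance criteria, defined before testing starts.
failures = failed_criteria(metrics, {"sensitivity": 0.70, "specificity": 0.90})
print(metrics["sensitivity"])   # 0.75
print(sorted(failures))         # ['specificity']
```

Because acceptance criteria may differ per subgroup, the same functions can be run once per subgroup with a different criteria dictionary each time.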
3. Data requirements
Original clause text
5.1. Selection.
Test data should be representative of and expand the full sample space of the intended use. It should be stratified, include all subgroups, and reflect the limitations, complexity and all common and rare variations within the intended use of the model. The criteria and rationale for selection of test data should be documented.
5.2. Sufficient in size.
The test dataset, and any of its subgroups, should be sufficient in size to calculate the test metrics with adequate statistical confidence.
5.3. Labelling.
The labelling of test data should be verified following a process that ensures a very high degree of correctness. This may include independent verification by multiple experts, validated equipment or laboratory tests.
5.4. Pre-processing.
Any pre-processing of the test data, e.g. transformation, normalisation, or standardisation, should be pre-specified and a rationale should be provided, that it represents intended use conditions.
5.5. Exclusion.
Any cleaning or exclusion of test data should be documented and fully justified.
5.6. Data generation.
Generation of test data or labels, e.g. by means of generative AI, is not recommended and any use hereof should be fully justified.
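Representativeness: test data must be representative of, and cover, the full sample space of the intended use, so that they reflect the situations the model may encounter in actual operation. In package dimension inspection, for example, the test data need to cover different batches, packaging materials from different suppliers, and the range of dimensional deviations that may occur.
Stratification: the test data must include all subgroups and reflect all the complexity and variation within the intended use. The input sample space is divided into strata by characteristic, and each stratum is sampled independently. In drug stability testing, for example, strata can be defined by temperature (high, ambient, low) and humidity (high, medium, low), with test data drawn from each stratum so that model performance is verified under every environmental condition.
Sufficient size: the dataset must be large enough to calculate the test metrics with adequate statistical confidence.
High-quality labelling: labellers need to be trained so that labelling is consistent. For complex labelling tasks, quality can be improved by having several people cross-label the data and by adding a review step. In drug image labelling, for example, the symbols and rules for each defect type are defined explicitly, and a senior expert reviews the labels once they are complete.
Data cleaning: record the data points deleted or corrected during cleaning, together with the basis for doing so. For example, if sensor data from production equipment fluctuate abnormally during a certain period and the investigation traces this to a sensor fault, the time range of the deleted data and the reason for deleting them must be recorded.
Data generation: using generative AI to produce test data or labels is not recommended, and any such use must be fully justified.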
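"Adequate statistical confidence" can be made tangible with a confidence interval around a measured metric. The sketch below uses the Wilson score interval for a proportion, which is one common choice and is not named in the annex; the sample sizes are invented to show how the same observed sensitivity supports very different claims depending on the number of samples.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion; z=1.96 gives roughly 95% confidence."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half_width = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


# Same observed sensitivity (96%), very different evidence.
low_small, high_small = wilson_interval(48, 50)
low_large, high_large = wilson_interval(960, 1000)

print(f"n=50:   [{low_small:.3f}, {high_small:.3f}]")
print(f"n=1000: [{low_large:.3f}, {high_large:.3f}]")
```

If an acceptance criterion is phrased against the lower bound of the interval rather than the point estimate, the required size of the test set, and of each subgroup, follows directly from the criterion.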
4. Test data independence
Original clause text
6.1. Independence.
Effective measures consisting of technical and/or procedural controls should be implemented to ensure the independency of test data, i.e. that data which will be used to test a model, is not used during development, training or validation of the model. This may be by capturing test data only after completion of training and validation, or by splitting test data from a complete pool of data before training has started.
6.2. Data split.
If test data is split from a complete pool of data before training of the model, it is essential that employees involved in the development and training of the model have never had access to the test data. The test data should be protected by access control and audit trail functionality logging accesses and changes to these. There should be no copies of test data outside this repository.
6.3. Identification.
It should be recorded which data has been used for testing, when and how many times.
6.4. Physical objects.
When test data originates from physical objects, it should be ensured, that the objects used for the final test of the model have not previously been used to train or validate the model, unless features are independent.
6.5. Staff independency.
Effective procedural and/or technical controls should be implemented to prevent staff members who have had access to test data from being involved in training and validation of the same model. In organisations where it is impossible to maintain this independency, a staff member who might have had access to test data for a model, should only have access to training and validation of the same model when working together (in pair) with a colleague who has not had this access (4-eyes principle).
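Effective technical and/or procedural controls must be implemented to ensure that the test data are independent.
Test data must not be used for developing, training or validating the model.
If test data are split from a complete pool of data, it must be ensured that the developers have never had access to the test data.
Test data should be protected by access control and an audit trail.
Where staff independence cannot be maintained, the 4-eyes principle (working in pairs) must be applied.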
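A technical control for independence can be as simple as fingerprinting every record and checking that no test fingerprint appears in the training set, combined with a log of each use of the test set (clause 6.3). The sketch below is illustrative only: the record fields, the `UsageLog` class and the dataset identifier are hypothetical, and exact-match hashing does not detect near-duplicates or repeated images of the same physical object.

```python
import hashlib
import json
from datetime import datetime, timezone


def fingerprint(record: dict) -> str:
    """SHA-256 over a canonical JSON form, so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def leaked_records(train, test):
    """Return test records that also occur, byte for byte, in the training data."""
    train_hashes = {fingerprint(r) for r in train}
    return [r for r in test if fingerprint(r) in train_hashes]


class UsageLog:
    """Append-only record of which test set was used, when, and by whom."""

    def __init__(self):
        self._entries = []

    def record(self, dataset_id: str, user: str) -> None:
        self._entries.append((datetime.now(timezone.utc), dataset_id, user))

    def times_used(self, dataset_id: str) -> int:
        return sum(1 for _, ds, _ in self._entries if ds == dataset_id)


train = [{"batch": "B001", "width_mm": 10.1}, {"batch": "B002", "width_mm": 10.3}]
test = [{"width_mm": 10.3, "batch": "B002"}, {"batch": "B003", "width_mm": 9.9}]

print(leaked_records(train, test))   # [{'width_mm': 10.3, 'batch': 'B002'}]

log = UsageLog()
log.record("test-set-v1", "qa_analyst")
print(log.times_used("test-set-v1"))  # 1
```

In practice the usage log would live in the access-controlled repository with a proper audit trail; an in-memory list only shows the shape of the record.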
5. Test execution
Original clause text
7.1. Fit for intended use.
The test should ensure that a model is fit for intended use and is ‘generalising well’, i.e. that the model has a satisfactory performance with new data from the intended use. This includes detecting possible over- or underfitting of the model to the training data.
7.2. Test plan.
Before the test is initiated, a test plan should be prepared and approved. It should contain a summary of the intended use, the pre-defined metrics and acceptance criteria, a reference to the test data, a test script including a description of all steps necessary to conduct the test, and a description of how to calculate the test metrics. A process subject matter expert (SME) should be involved in developing the plan.
7.3. Deviation.
Any deviation from the test plan, failure to meet acceptance criteria, or omission to use all test data should be documented, investigated, and fully justified.
7.4. Test documentation.
All test documentation should be retained along with the description of the intended use, the characterisation of test data, the actual test data, and where relevant, physical test objects. In addition, documentation for access control to test data and related audit trail records, should be retained similarly to other GMP documentation.
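Model testing must demonstrate that the model generalises well, with a focus on detecting overfitting and underfitting. Before testing starts, a test plan must be prepared with the involvement of a process SME and approved; it sets out the intended use, the metrics and acceptance criteria, the reference to the test data, the test script, and how the metrics are calculated. Any deviation from the plan, failure to meet acceptance criteria, or omission of test data must be documented, investigated and justified. Test documentation, together with access control documentation and audit trail records for the test data, must be retained in the same way as other GMP documentation.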
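Two of the points above lend themselves to simple automated checks: whether a test plan contains every element listed in clause 7.2, and whether the gap between training and test performance hints at over- or underfitting. The sketch below is a rough heuristic under stated assumptions: the dictionary keys, the `min_score` of 0.90 and the `max_gap` of 0.05 are hypothetical values, not figures from the annex.

```python
# Elements a test plan should contain according to clause 7.2 (key names are hypothetical).
REQUIRED_PLAN_ITEMS = (
    "intended_use_summary",
    "metrics",
    "acceptance_criteria",
    "test_data_reference",
    "test_script",
    "metric_calculation",
)


def missing_plan_items(plan: dict) -> list[str]:
    """Return required items that are absent or empty in the plan."""
    return [item for item in REQUIRED_PLAN_ITEMS if not plan.get(item)]


def fit_diagnosis(train_score, test_score, min_score=0.90, max_gap=0.05):
    """Flag a possible fit problem from one metric measured on training and test data."""
    if train_score < min_score:
        # The model does not even fit the data it was trained on.
        return "possible underfitting"
    if train_score - test_score > max_gap:
        # Good on training data, clearly worse on unseen data.
        return "possible overfitting"
    return "no fit problem flagged"


draft_plan = {
    "intended_use_summary": "Accept/reject classification of tablets",
    "metrics": ["sensitivity", "specificity"],
    "acceptance_criteria": {"sensitivity": 0.99},
    "test_data_reference": "test-set-v1",
    "metric_calculation": "See SOP section on metric calculation",
}

print(missing_plan_items(draft_plan))   # ['test_script']
print(fit_diagnosis(0.99, 0.85))        # possible overfitting
print(fit_diagnosis(0.80, 0.79))        # possible underfitting
print(fit_diagnosis(0.96, 0.95))        # no fit problem flagged
```

A flag from `fit_diagnosis` is a prompt for investigation, not a verdict; the acceptance decision still rests on the pre-defined metrics and criteria in the approved plan.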
6. Model explainability
Original clause text
8.1. Feature attribution.
During testing of models used in critical GMP applications, systems should capture and record the features in the test data that have contributed to a particular classification or decision (e.g. rejection). Where applicable, techniques like feature attribution (e.g. SHAP values or LIME) or visual tools like heat maps should be used to highlight key factors contributing to the outcome.
8.2. Feature justification.
In order to ensure that a model is making decisions based on relevant and appropriate features and based on risk, a review of these features should be part of the process for approval of test results.
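During testing of models used in critical GMP applications, the system should capture and record the features in the test data that led to a particular classification or decision. Where applicable, feature attribution techniques or visual tools should be used to highlight the key factors behind the outcome. Based on risk, a review of these features should be part of the process for approving test results.
Feature capture: taking an AI drug quality inspection model as an example, when the model judges a batch to be non-conforming, the system must automatically record the key data features that led to that decision (e.g. an abnormal colour, shape or texture in a particular region of the image).
Technical tools: SHAP value analysis quantifies the contribution of each feature to the model's decision; LIME (Local Interpretable Model-agnostic Explanations) fits a local approximating model to explain the basis for the decision at a specific data point. For example, applying SHAP analysis to a drug efficacy prediction model shows clearly how much factors such as patient age, sex and dose contribute to the prediction.
SME review process: establish an SME review template that defines the review content and criteria. During the review, the SME compares the key features against the manufacturing process and quality standards to judge whether the features are reasonable.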
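SHAP is built on Shapley values, which split the difference between a prediction and a baseline prediction fairly across the features. To show the idea without depending on any library, the sketch below computes exact Shapley values by enumerating every feature subset and replacing "absent" features with a baseline value. This is not the `shap` package and it scales exponentially with the number of features; the toy scoring function, its weights, and the feature names are invented for illustration.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, x, baseline):
    """Exact Shapley values; features outside a coalition take their baseline value."""
    n = len(x)

    def value(coalition):
        mixed = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return predict(mixed)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Shapley weight for coalitions of this size: |S|! (n-|S|-1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


FEATURES = ["discolouration", "edge_chipping", "surface_texture"]


def defect_score(features):
    """Hypothetical linear defect score standing in for a real model."""
    weights = [0.5, 0.3, 0.2]
    return sum(w * f for w, f in zip(weights, features))


sample = [0.8, 0.1, 0.4]      # rejected tablet (made-up feature values)
reference = [0.1, 0.1, 0.2]   # typical accepted tablet used as the baseline

contributions = shapley_values(defect_score, sample, reference)
for name, phi in zip(FEATURES, contributions):
    print(f"{name}: {phi:+.3f}")
```

The contributions sum to the difference between the score for the sample and the score for the baseline, which gives the SME a concrete list of features to review against process knowledge; here discolouration dominates, and edge chipping contributes nothing because it equals the baseline.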
7. Confidence mechanism
Original clause text
9.1. Confidence score.
When testing a model used to predict or classify data, the system should, where applicable, log the confidence score of the model for each prediction or classification outcome.
9.2. Threshold.
Models used to predict or classify data should have an appropriate threshold setting to ensure predictions or classifications are made only when suitable. If the confidence score is very low, it should be considered whether the model should flag the outcome as ‘undecided’, rather than making potentially unreliable predictions or classifications.
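The system should log the confidence score of every prediction, and an appropriate threshold should be set. As an illustration (the annex itself gives no numerical values): confidence below 70% → the outcome is automatically put on hold; confidence of 70-90% → mandatory human review. If the confidence score is very low, the outcome should be flagged as "undecided" to avoid misclassification.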
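The threshold logic can be sketched as a small routing function that also produces the per-prediction log entry required by clause 9.1. The 70% and 90% cut-offs are the illustrative values from the text, not regulatory figures; how to treat a score of exactly 90% is an assumption made here (it goes to human review), and the `Decision` record and route names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Decision:
    """One log entry per prediction (hypothetical record layout)."""
    sample_id: str
    predicted_label: str
    confidence: float
    route: str


def route_for(confidence, hold_below=0.70, review_up_to=0.90):
    """Map a confidence score in [0, 1] to a handling route."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError("confidence must be between 0 and 1")
    if confidence < hold_below:
        return "undecided"        # put on hold, no automatic classification
    if confidence <= review_up_to:
        return "human_review"     # operator must confirm the outcome
    return "automatic"


def decide(sample_id, predicted_label, confidence):
    return Decision(sample_id, predicted_label, confidence, route_for(confidence))


log = [
    decide("T-001", "reject", 0.55),
    decide("T-002", "reject", 0.82),
    decide("T-003", "accept", 0.97),
]
for entry in log:
    print(entry.sample_id, entry.confidence, entry.route)
```

Keeping the thresholds as named parameters makes it straightforward to place them under configuration control and to justify their values in the test plan.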
8. Operational control
Original clause text
10.1. Change control.
A tested model, the system it is implemented in, and the whole process it is automating or assisting should be put under change control before it is deployed in operation. Any change to the model itself, the system, or the process in which it is used, including any change to physical objects the model is using as input, should be documented and evaluated to determine if the model needs to be retested. Any decision not to conduct such retest should be fully justified.
10.2. Configuration control.
A tested model should be put under configuration control before being deployed in operation, and effective measures should be used to detect any unauthorised change.
10.3. System performance monitoring.
The performance of a model as defined by its metrics should be regularly monitored to detect any changes in the computerised system (e.g. deterioration or change of a lighting condition).
10.4. Input sample space monitoring.
It should be regularly monitored whether the input data are still within the model sample space and intended use. Metrics should be defined for monitoring any drift in the input data.
10.5. Human review.
When a model is used to give an input to a decision made by a human operator (human-in-the-loop), and where the effort to test such model has been diminished, records should be kept from this process. Depending on the criticality of the process and the level of testing of the model, this may imply a consistent review and/or test of every output from the model, according to a procedure.
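Before deployment, the model and the system it runs in must be placed under change control and configuration control. An adjustment to model parameters is treated as a major change and must be handled in the same way as a process change. Model performance must be monitored regularly to assess whether changes in the environment affect the output. Input data should be analysed regularly for drift, and an alert should be raised when inputs fall outside the model's sample space. Human review applies to human-in-the-loop processes, and where necessary every output of the model is reviewed individually.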
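Clause 10.4 asks for metrics that detect drift in the input data but does not name one. Two simple candidates are the fraction of inputs outside the documented range and the Population Stability Index (PSI), which compares how a reference sample and a current sample distribute across fixed bins. The sketch below assumes a single numeric input; the temperature readings and bin edges are invented, and any alert threshold would have to be chosen and justified by the SME.

```python
import math


def fraction_out_of_range(values, low, high):
    """Share of inputs that fall outside the documented sample space."""
    outside = sum(1 for v in values if not low <= v <= high)
    return outside / len(values)


def psi(reference, current, edges, eps=1e-6):
    """Population Stability Index over the bins defined by ascending `edges`."""

    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            counts[sum(v >= e for e in edges)] += 1   # index of the bin containing v
        # eps avoids log(0) when a bin is empty.
        return [max(c / len(values), eps) for c in counts]

    ref, cur = shares(reference), shares(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref, cur))


# Hypothetical temperature readings (degC): test-time reference vs. recent production.
reference = [18, 19, 20, 21, 22, 23, 24, 25, 26, 27]
recent = [24, 25, 26, 27, 28, 29, 30, 31, 32, 33]

print(psi(reference, reference, edges=[20, 25]))   # 0.0
print(psi(reference, recent, edges=[20, 25]) > 0.25)
print(fraction_out_of_range(recent, 15, 30))       # 0.3
```

PSI is sensitive to the choice of bins and to empty bins, so the bin edges belong under the same configuration control as the model; for image inputs, the monitored quantity would be a derived statistic such as mean brightness rather than the raw pixels.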
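The regulation sets out the compliance requirements for AI in pharmaceutical manufacturing. It helps companies advance their digital transformation on a sounder footing and reduces the risk that comes from regulatory uncertainty, and it also promotes international regulatory alignment, bringing profound change to the pharmaceutical industry.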
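About Morimatsu LifeSciences
Morimatsu LifeSciences is one of the key business segments of Morimatsu International Holdings Company Limited (Morimatsu International, stock code: 2155.HK). It consists mainly of 上海森松制药设备工程有限公司, 森松(苏州)生命科技有限公司, 上海森松生物科技有限公司, 上海森众生物技术有限公司, 上海森纮科技有限公司, 瑞士比欧生物工程公司, 瑞典森松法玛度 and their subsidiaries.
We focus on providing core equipment, process systems, and digital and intelligent whole-plant solutions, together with the related services, for pharmaceuticals, biopharmaceuticals, medical aesthetics, fast-moving consumer goods (including cosmetics, food and health supplements), data centres and other fields. The company has an elite team of senior experts in process development, engineering design, high-end manufacturing, validation consulting, production execution, and digital operation and maintenance. Team members have extensive experience in the pharmaceutical, biopharmaceutical, medical aesthetics, FMCG and data centre sectors and are familiar with the characteristics and process flows of different products, so they can provide customised process solutions from the conceptual design stage onwards to meet each customer's individual needs.
Morimatsu LifeSciences has built a global strategic footprint. It has established advanced R&D and design centres and manufacturing bases in different countries and regions, and has formed a comprehensive service network covering Europe, the Americas, Asia-Pacific and emerging markets. The company's international team has delivered customised solutions to more than 40 countries and regions and has accumulated extensive experience in executing projects worldwide.
As a multinational company with core process technology, modular construction and digital-plant technology, Morimatsu LifeSciences is committed to meeting the demand for production equipment in the pharmaceutical, biopharmaceutical, medical aesthetics, FMCG, data centre and related industries worldwide. Through continuous innovation and optimisation, the company helps domestic enterprises accelerate the localisation of core and high-end equipment and drives the development of the industry. At the same time, the company is actively expanding into overseas markets and deepening its "globalisation" business strategy, contributing to the global life sciences and related industries.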
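Forward-looking statements
The information in this press release may contain certain forward-looking statements. Such statements are by nature subject to considerable risk and uncertainty. Words such as "anticipate", "believe", "forecast", "expect", "intend" and similar expressions, where they relate to the Company, are intended to identify forward-looking statements. The Company undertakes no obligation to continually update these forward-looking statements.
These forward-looking statements are based on the current views, assumptions, expectations, estimates, forecasts and understanding of the Company's management regarding future matters at the time the statements are made. They are not guarantees of future developments and are subject to risks, uncertainties and other factors, some of which are beyond the Company's control and difficult to predict. Actual results may therefore differ materially from the information contained in the forward-looking statements as a result of future changes and developments in our business, the competitive environment, and political, economic, legal and social conditions.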