OCI Runtime Specification V1.2.1
Open Container Initiative Runtime Specification V1.2.1
The Open Container Initiative develops specifications for standards on Operating System process and application containers.
开放容器倡议(OCI)制定操作系统进程和应用容器相关标准的规范。
1 Abstract(摘要)
The Open Container Initiative Runtime Specification aims to specify the configuration, execution environment, and lifecycle of a container.
开放容器倡议运行时规范旨在规范容器的配置、执行环境和生命周期。
A container’s configuration is specified as the config.json for the supported platforms and details the fields that enable the creation of a container. The execution environment is specified to ensure that applications running inside a container have a consistent environment between runtimes along with common actions defined for the container’s lifecycle.
容器的配置在受支持的平台上通过 config.json 文件指定,该文件详细说明了创建容器所需的各个字段。执行环境的规范保证了容器内运行的应用程序在不同运行时之间具有一致的环境,同时还定义了容器生命周期中的通用操作。
2 Platforms(平台)
Platforms defined by this specification are:
规范中定义了如下的平台:
linux: runtime.md, config.md, features.md, config-linux.md, runtime-linux.md, and features-linux.md.
solaris: runtime.md, config.md, features.md, and config-solaris.md.
windows: runtime.md, config.md, features.md, and config-windows.md.
vm: runtime.md, config.md, features.md, and config-vm.md.
zos: runtime.md, config.md, features.md, and config-zos.md.
3 Notational Conventions (符号约定)
The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “NOT RECOMMENDED”, “MAY”, and “OPTIONAL” are to be interpreted as described in RFC 2119.
关键词 “MUST” (必须), “MUST NOT” (禁止), “REQUIRED” (必要的), “SHALL” , “SHALL NOT”, “SHOULD”(应该), “SHOULD NOT”(不应该), “RECOMMENDED” (建议), “NOT RECOMMENDED” (不建议), “MAY” (可能), “OPTIONAL” (可选的) 将按照 RFC 2119 中的描述进行解释。(参见:RFC2119:表示要求的动词)
The key words “unspecified”, “undefined”, and “implementation-defined” are to be interpreted as described in the rationale for the C99 standard.
关键词 “未指明”、“未定义” 和 “实现定义” 应按照 C99 标准的基本原理中所描述的方式进行解释。
An implementation is not compliant for a given CPU architecture if it fails to satisfy one or more of the MUST, REQUIRED, or SHALL requirements for the platforms it implements. An implementation is compliant for a given CPU architecture if it satisfies all the MUST, REQUIRED, and SHALL requirements for the platforms it implements.
若某个实现方案未能满足其所实现协议中一项或多项 MUST、MUST NOT、REQUIRED、SHALL 或 SHALL NOT 要求,则该实现方案不具备合规性。若某个实现方案满足了其实现协议中所有 MUST、MUST NOT、REQUIRED、SHALL 和 SHALL NOT 要求,则该实现方案具备合规性。
4 The 5 principles of Standard Containers (标准容器的5大准则)
Define a unit of software delivery called a Standard Container. The goal of a Standard Container is to encapsulate a software component and all its dependencies in a format that is self-describing and portable, so that any compliant runtime can run it without extra dependencies, regardless of the underlying machine and the contents of the container.
定义一种名为标准容器(Standard Container)的软件交付单元。标准容器的目标是以自描述且可移植的格式封装软件组件及其所有依赖项,使任何兼容的运行时无需额外依赖即可运行它,而不受底层机器和容器内容的影响。
The specification for Standard Containers defines:
规范对于标准容器的定义:
configuration file formats
配置文件格式化
a set of standard operations
标准的操作集合
an execution environment.
可执行的环境
A useful analogy is the physical shipping container used by the transportation industry. Shipping containers are a fundamental unit of delivery: they can be lifted, stacked, locked, loaded, unloaded and labelled. Standardizing the container itself, irrespective of its contents, allowed a consistent, more streamlined and efficient set of processes to be defined. For software, Standard Containers offer similar functionality by being the fundamental, standardized unit of delivery for a software package.
一个贴切的类比是运输行业使用的实体货运集装箱。货运集装箱是基本的交付单元:它们可以被吊装、堆叠、锁定、装载、卸载和贴标签。无论集装箱里装的是什么,对集装箱本身进行标准化,就可以定义一套一致、精简且高效的流程。对于软件而言,标准容器作为软件包基本的、标准化的交付单元,提供了类似的功能。
4.1 Standard operations(标准操作)
Standard Containers define a set of STANDARD OPERATIONS. They can be created, started, and stopped using standard container tools; copied and snapshotted using standard filesystem tools; and downloaded and uploaded using standard network tools.
标准的容器定义了一套标准操作。它们可以使用标准的容器工具进行创建,启动或者是停止;可以使用标准文件系统工具进行复制和快照;还可以使用标准的网络工具进行上传和下载。
4.2 Content-agnostic(内容无关性)
Standard Containers are CONTENT-AGNOSTIC: all standard operations have the same effect regardless of the contents. They are started in the same way whether they contain a postgres database, a php application with its dependencies and application server, or Java build artifacts.
标准容器具有内容无关性;所有的标准操作无视内容都有相同的结果。无论容器中是Postgres 数据库、带有依赖项和应用服务器的 PHP 应用程序,还是 Java 构建制品,它们的启动方式都是相同的。
4.3 Infrastructure-agnostic(基础设施无关性)
Standard Containers are INFRASTRUCTURE-AGNOSTIC: they can be run in any OCI supported infrastructure. For example, a standard container can be bundled on a laptop, uploaded to cloud storage, downloaded, run and snapshotted by a build server at a fiber hotel in Virginia, uploaded to 10 staging servers in a home-made private cloud cluster, then sent to 30 production instances across 3 public cloud regions.
标准容器具有基础设施无关性,他们可以在任何支持开放容器倡议(OCI)的基础设施中运行。举个例子,一个标准的容器可以在你的个人笔记本上打包,上传到云端存储,随后由弗吉尼亚州光纤数据中心的构建服务器下载、运行并创建快照,再上传至自建私有云集群中的 10 台 staging 服务器,最后分发到 3 个公共云区域的 30 个生产实例中。
4.4 Designed for automation(为自动化设计)
Standard Containers are DESIGNED FOR AUTOMATION: because they offer the same standard operations regardless of content and infrastructure, Standard Containers are extremely well-suited for automation. In fact, you could say automation is their secret weapon.
标准容器为自动化而设计:由于无论内容和基础设施如何,标准容器都提供相同的标准操作,因此它们非常适合自动化。事实上,可以说自动化就是它们的秘密武器。
Many things that once required time-consuming and error-prone human effort can now be programmed. Before Standard Containers, by the time a software component ran in production, it had been individually built, configured, bundled, documented, patched, vendored, templated, tweaked and instrumented by 10 different people on 10 different computers. Builds failed, libraries conflicted, mirrors crashed, post-it notes were lost, logs were misplaced, cluster updates were half-broken. The process was slow, inefficient and cost a fortune - and was entirely different depending on the language and infrastructure provider.
许多曾经那些耗时的或者是人工容易出错的工作现在可以通过编程实现。在标准容器出现之前,当一个软件被组装成最终形态在生产环境中运行时,它已经经过了 10 个人在 10 台不同的计算机上分别进行构建、配置、打包、文档编写、补丁更新、依赖管理、模板化、微调及监控配置等一系列操作。构建失败、库冲突、镜像崩溃、便签丢失、日志放错位置、集群更新半途而废等问题层出不穷。整个过程缓慢、低效且成本高昂,而且会因编程语言和基础设施提供商的不同而截然不同。
4.5 Industrial-grade delivery(工业级交付)
Standard Containers make INDUSTRIAL-GRADE DELIVERY of software a reality. Leveraging all of the properties listed above, Standard Containers are enabling large and small enterprises to streamline and automate their software delivery pipelines. Whether it is in-house devOps flows, or external customer-based software delivery mechanisms, Standard Containers are changing the way the community thinks about software packaging and delivery.
标准容器让软件的工业级交付成为现实。凭借上述所有特性,标准容器正在帮助大大小小的企业精简并自动化其软件交付流水线。无论是内部的 DevOps 流程,还是面向外部客户的软件交付机制,标准容器都在改变社区对软件打包和交付的思考方式。
5 Filesystem Bundle(文件系统捆绑包)
5.1 Container Format (容器格式化)
This section defines a format for encoding a container as a filesystem bundle - a set of files organized in a certain way, and containing all the necessary data and metadata for any compliant runtime to perform all standard operations against it. See also MacOS application bundles for a similar use of the term bundle.
当前章节定义了将容器编码为文件系统捆绑包(filesystem bundle)的格式——这是一组按特定方式组织的文件集合,包含了所有的必要的数据和元数据信息,供任何兼容的运行时环境对容器执行所有标准操作。关于 “bundle” 一词的类似用法,可参考 macOS 应用程序捆绑包。
The definition of a bundle is only concerned with how a container, and its configuration data, are stored on a local filesystem so that it can be consumed by a compliant runtime.
捆绑包的定义仅仅关注容器及其配置数据在本地文件系统的存储方式,以便兼容的容器运行时能够使用这些数据。
A Standard Container bundle contains all the information needed to load and run a container. This includes the following artifacts:
一个标准的容器捆绑包中包含一个可以被加载并且成功运行容器的所有信息。下面是包含的内容:
config.json: contains configuration data. This REQUIRED file MUST reside in the root of the bundle directory and MUST be named config.json. See config.json for more details.
config.json:包含配置数据。该文件是必需的,必须位于捆绑包的根目录下,且必须命名为 config.json。更多详情请参见 config.json。
container’s root filesystem: the directory referenced by root.path, if that property is set in config.json.
容器的根文件系统:root.path 所引用的目录(如果 config.json 中设置了该属性)。
When supplied, while these artifacts MUST all be present in a single directory on the local filesystem, that directory itself is not part of the bundle. In other words, a tar archive of a bundle will have these artifacts at the root of the archive, not nested within a top-level directory.
当上述内容被提供时,尽管它们都必须存在于本地文件系统的单个目录中,但该目录本身并非捆绑包的一部分。换句话说,在捆绑包的 tar 归档文件中,这些内容位于归档的根目录下,而不是嵌套在某个顶层目录中。
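The layout rules above can be illustrated with a small check. This is a minimal sketch, not part of the specification: the function name `check_bundle` is hypothetical, and resolving a relative `root.path` against the bundle directory follows the POSIX rule given later in the Root section.

```python
import json
from pathlib import Path


def check_bundle(bundle_dir):
    """Return (config, rootfs) for a bundle; rootfs is None when root is not set."""
    bundle = Path(bundle_dir)
    config_path = bundle / "config.json"  # REQUIRED name, in the bundle root
    if not config_path.is_file():
        raise FileNotFoundError(f"missing {config_path}")
    config = json.loads(config_path.read_text(encoding="utf-8"))

    root = config.get("root")
    if root is None:
        return config, None  # no root filesystem artifact to look for
    rootfs = Path(root["path"])
    if not rootfs.is_absolute():
        rootfs = bundle / rootfs  # assumption: POSIX rule, relative to the bundle
    if not rootfs.is_dir():
        raise NotADirectoryError(f"root.path does not name a directory: {rootfs}")
    return config, rootfs
```

A caller could run such a check before handing the bundle path to a runtime's create operation.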
6 Runtime and Lifecycle(运行时和生命周期)
6.1 Scope of a Container(容器的范围)
The entity using a runtime to create a container MUST be able to use the operations defined in this specification against that same container. Whether other entities using the same, or other, instance of the runtime can see that container is out of scope of this specification.
使用运行时创建容器的实体,必须能够对该容器使用本规范所定义的操作。至于使用同一运行时实例或其他运行时实例的其他实体是否能看到该容器,则不在本规范的范围内。
6.2 State (状态)
The state of a container includes the following properties:
下面是容器的状态的属性定义:
ociVersion (string, REQUIRED) is the version of the Open Container Initiative Runtime Specification with which the state complies.
ociVersion(字符串,必填)是该状态所遵循的开放容器倡议运行时规范的版本。
id (string, REQUIRED) is the container’s ID. This MUST be unique across all containers on this host. There is no requirement that it be unique across hosts.
id(字符串,必填)是容器的标识符。该标识符在本宿主机上的所有容器中必须唯一,在不同宿主机之间无需保持唯一。
status (string, REQUIRED) is the runtime state of the container. The value MAY be one of:
status(字符串,必填)是容器的运行时状态,其值可以是以下之一:
creating: the container is being created (step 2 in the lifecycle)
creating:容器正在创建中(6.4 生命周期的第 2 步)
created: the runtime has finished the create operation (after step 2 in the lifecycle), and the container process has neither exited nor executed the user-specified program
created:运行时已完成创建操作(6.4 生命周期的第 2 步之后),且容器进程既未退出也未执行用户指定的程序
running: the container process has executed the user-specified program but has not exited (after step 8 in the lifecycle)
running:容器进程已执行用户指定的程序但尚未退出(6.4 生命周期的第 8 步之后)
stopped: the container process has exited (step 10 in the lifecycle)
stopped:容器进程已经退出(6.4 生命周期的第 10 步)
Additional values MAY be defined by the runtime, however, they MUST be used to represent new runtime states not defined above.
运行时可以定义额外的值,不过,这些值必须用于表示上述未定义的新运行时状态。
pid (int, REQUIRED when status is created or running on Linux, OPTIONAL on other platforms) is the ID of the container process. For hooks executed in the runtime namespace, it is the pid as seen by the runtime. For hooks executed in the container namespace, it is the pid as seen by the container.
pid(整型,在 Linux 系统上状态为 created 或 running 时为必填项,在其他平台上为可选项)是容器进程的 ID。对于在运行时命名空间中执行的钩子,它是运行时所看到的进程 ID;对于在容器命名空间中执行的钩子,它是容器所看到的进程 ID。
bundle (string, REQUIRED) is the absolute path to the container’s bundle directory. This is provided so that consumers can find the container’s configuration and root filesystem on the host.
bundle(字符串,必填)是容器捆绑包目录的绝对路径。提供该路径是为了让使用者能够在主机上找到容器的配置文件和根文件系统。
annotations (map, OPTIONAL) contains the list of annotations associated with the container. If no annotations were provided then this property MAY either be absent or an empty map.
annotations(映射,可选)包含与容器关联的注解列表。如果未提供任何注解,此属性可以不存在,也可以是一个空映射。
The state MAY include additional properties.
状态中可以包含额外的属性
When serialized in JSON, the format MUST adhere to the JSON Schema schema/state-schema.json.
当以 JSON 格式序列化时,其格式必须遵循 JSON 模式文件 schema/state-schema.json 的要求。
See Query State for information on retrieving the state of a container.
有关获取容器状态的信息,请参见 “查询状态(Query State)” 部分。
6.3 Example (示例)
1 | { |
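A state object carrying the properties listed in 6.2 can be sketched as follows. This is an illustration only: the helper `make_state` and every value passed to it (container ID, bundle path, pid, annotation key) are hypothetical, and the pid check assumes a Linux host.

```python
import json

PID_REQUIRED_STATUSES = ("created", "running")


def make_state(oci_version, container_id, status, bundle, pid=None, annotations=None):
    """Build a state dictionary with the REQUIRED and OPTIONAL properties of 6.2."""
    state = {
        "ociVersion": oci_version,
        "id": container_id,
        "status": status,
        "bundle": bundle,
    }
    if pid is not None:
        state["pid"] = pid
    elif status in PID_REQUIRED_STATUSES:
        # Assumption: Linux, where pid is REQUIRED for these two statuses.
        raise ValueError("pid is REQUIRED when status is created or running")
    if annotations:
        state["annotations"] = dict(annotations)  # MAY be absent when empty
    return state


if __name__ == "__main__":
    example = make_state("1.2.1", "example-container", "running",
                         "/containers/example", pid=4422,
                         annotations={"example.key": "example value"})
    print(json.dumps(example, indent=4))
```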
6.4 Lifecycle(生命周期)
The lifecycle describes the timeline of events that happen from when a container is created to when it ceases to exist.
生命周期描述了从容器创建到终止过程中发生的事件时间线。
1. OCI compliant runtime’s create command is invoked with a reference to the location of the bundle and a unique identifier.
1. 调用符合 OCI 规范的运行时的创建(create)命令时,需要传入捆绑包的位置引用和一个唯一标识符。
2. The container’s runtime environment MUST be created according to the configuration in config.json. If the runtime is unable to create the environment specified in the config.json, it MUST generate an error. While the resources requested in the config.json MUST be created, the user-specified program (from process) MUST NOT be run at this time. Any updates to config.json after this step MUST NOT affect the container.
2. 容器的运行时环境必须根据 config.json 中的配置创建。如果运行时无法创建 config.json 中指定的环境,则必须生成错误。尽管 config.json 中请求的资源必须被创建,但此时不得运行用户指定的程序(来自 process 配置)。此步骤之后对 config.json 的任何更新均不得影响该容器。
3. The prestart hooks MUST be invoked by the runtime. If any prestart hook fails, the runtime MUST generate an error, stop the container, and continue the lifecycle at step 12.
3. 运行时必须调用 prestart 钩子。如果任何 prestart 钩子执行失败,运行时必须生成错误、停止容器,并从生命周期的第 12 步继续执行。
4. The createRuntime hooks MUST be invoked by the runtime. If any createRuntime hook fails, the runtime MUST generate an error, stop the container, and continue the lifecycle at step 12.
4. 运行时必须调用 createRuntime 钩子。如果任何 createRuntime 钩子执行失败,运行时必须生成错误、停止容器,并从生命周期的第 12 步继续执行。
5. The createContainer hooks MUST be invoked by the runtime. If any createContainer hook fails, the runtime MUST generate an error, stop the container, and continue the lifecycle at step 12.
5. 运行时必须调用 createContainer 钩子。如果任何 createContainer 钩子执行失败,运行时必须生成错误、停止容器,并从生命周期的第 12 步继续执行。
6. Runtime’s start command is invoked with the unique identifier of the container.
6. 调用运行时的启动(start)命令时,需传入容器的唯一标识符。
7. The startContainer hooks MUST be invoked by the runtime. If any startContainer hook fails, the runtime MUST generate an error, stop the container, and continue the lifecycle at step 12.
7. 运行时必须调用 startContainer 钩子。如果任何 startContainer 钩子执行失败,运行时必须生成错误、停止容器,并从生命周期的第 12 步继续执行。
8. The runtime MUST run the user-specified program, as specified by process.
8. 运行时必须按照 process 配置中指定的内容,运行用户指定的程序。
9. The poststart hooks MUST be invoked by the runtime. If any poststart hook fails, the runtime MUST log a warning, but the remaining hooks and lifecycle continue as if the hook had succeeded.
9. 运行时必须调用 poststart 钩子。如果任何 poststart 钩子执行失败,运行时必须记录一条警告,但其余钩子和生命周期仍会继续执行,如同该钩子执行成功一样。
10. The container process exits. This MAY happen due to erroring out, exiting, crashing or the runtime’s kill operation being invoked.
10. 容器进程退出。这可能是由于出错、正常退出、崩溃或运行时的终止(kill)操作被调用所致。
11. Runtime’s delete command is invoked with the unique identifier of the container.
11. 调用运行时的删除(delete)命令时,需传入容器的唯一标识符。
12. The container MUST be destroyed by undoing the steps performed during create phase (step 2).
12. 容器必须通过撤销在创建阶段(第 2 步)执行的操作来销毁。
13. The poststop hooks MUST be invoked by the runtime. If any poststop hook fails, the runtime MUST log a warning, but the remaining hooks and lifecycle continue as if the hook had succeeded.
13. 运行时必须调用 poststop 钩子。如果任何 poststop 钩子执行失败,运行时必须记录一条警告,但其余钩子和生命周期仍会继续执行,如同该钩子执行成功一样。
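The ordering of the steps, and the difference between hooks whose failure is an error and hooks whose failure is only a warning, can be traced with a small simulation. This is a sketch under stated assumptions: `simulate_lifecycle` and its event strings are hypothetical, no real hooks or processes are run, and a failing hook is modelled by name only.

```python
def simulate_lifecycle(failing_hook=None):
    """Return the ordered events of one container lifecycle.

    failing_hook names a hook that should fail; None means every hook succeeds.
    """
    events = []

    def run_hook(name):
        ok = name != failing_hook
        events.append(f"hook:{name}:{'ok' if ok else 'failed'}")
        return ok

    def destroy():
        events.append("destroy")                 # step 12
        if not run_hook("poststop"):             # step 13: failure is a warning
            events.append("warning:poststop")

    def abort(name):
        events.append(f"error:{name}")           # generate an error
        events.append("stop")                    # stop the container
        destroy()                                # continue at step 12
        return events

    events.append("create")                      # steps 1-2
    for name in ("prestart", "createRuntime", "createContainer"):  # steps 3-5
        if not run_hook(name):
            return abort(name)
    events.append("start")                       # step 6
    if not run_hook("startContainer"):           # step 7
        return abort("startContainer")
    events.append("run-process")                 # step 8
    if not run_hook("poststart"):                # step 9: failure is a warning
        events.append("warning:poststart")
    events.append("process-exit")                # step 10
    events.append("delete")                      # step 11
    destroy()
    return events
```

Calling `simulate_lifecycle("createRuntime")` shows the jump to step 12, while `simulate_lifecycle("poststart")` shows the lifecycle continuing after a warning.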
6.5 Errors(错误)
In cases where the specified operation generates an error, this specification does not mandate how, or even if, that error is returned or exposed to the user of an implementation. Unless otherwise stated, generating an error MUST leave the state of the environment as if the operation were never attempted - modulo any possible trivial ancillary changes such as logging.
在指定操作产生错误的情况下,本规范并未强制规定该错误应以何种方式返回或向实现的使用者暴露,甚至未强制要求必须返回或暴露该错误。除非另有说明,否则产生错误时必须保持环境状态不变,就像从未尝试过该操作一样 —— 但不包括任何可能的无关辅助性变更,例如日志记录。
6.6 Warnings(警告)
In cases where the specified operation logs a warning, this specification does not mandate how, or even if, that warning is returned or exposed to the user of an implementation. Unless otherwise stated, logging a warning does not change the flow of the operation; it MUST continue as if the warning had not been logged.
在指定操作记录警告的情况下,本规范并未强制规定该警告应以何种方式返回或向实现的使用者暴露,甚至未强制要求必须返回或暴露该警告。除非另有说明,否则记录警告不会改变操作的流程;操作必须继续执行,如同未记录该警告一样。
6.7 Operations(操作)
Unless otherwise stated, runtimes MUST support the following operations.
除非另有说明,否则运行时必须支持以下操作。
Note: these operations are not specifying any command-line APIs, and the parameters are inputs for general operations.
这些操作并未规定任何命令行 API,且其中的参数均为通用操作的输入项。
6.7.1 Query State(查询状态)
state <container-id>
This operation MUST generate an error if it is not provided the ID of a container. Attempting to query a container that does not exist MUST generate an error. This operation MUST return the state of a container as specified in the State section.
若未提供容器的 ID,此操作必须生成错误。尝试查询不存在的容器必须生成错误。此操作必须按照 “状态(State)” 部分的规定返回容器的状态。
6.7.2 Create(创建)
create <container-id> <path-to-bundle>
This operation MUST generate an error if it is not provided a path to the bundle and the container ID to associate with the container. If the ID provided is not unique across all containers within the scope of the runtime, or is not valid in any other way, the implementation MUST generate an error and a new container MUST NOT be created. This operation MUST create a new container.
若未提供捆绑包的路径以及要与容器关联的容器 ID,此操作必须生成错误。如果所提供的 ID 在运行时范围内的所有容器中不唯一,或者在其他任何方面无效,实现必须生成错误,且不得创建新容器。此操作必须创建一个新容器。
All of the properties configured in config.json except for process MUST be applied. process.args MUST NOT be applied until triggered by the start operation. The remaining process properties MAY be applied by this operation. If the runtime cannot apply a property as specified in the configuration, it MUST generate an error and a new container MUST NOT be created.
config.json 中配置的所有属性(process 除外)都必须被应用。process.args 则不得被应用,直至由启动(start)操作触发。其余的 process 属性可以通过此操作应用。如果运行时无法按照配置中的规定应用某个属性,它必须生成错误,且不得创建新容器。
The runtime MAY validate config.json against this spec, either generically or with respect to the local system capabilities, before creating the container (step 2). Runtime callers who are interested in pre-create validation can run bundle-validation tools before invoking the create operation.
运行时可以在创建容器之前(第 2 步),根据本规范对 config.json 进行验证 —— 既可进行通用验证,也可结合本地系统能力进行验证。希望在创建前进行验证的运行时调用方,可以在调用创建(create)操作之前运行捆绑包验证工具。
Any changes made to the config.json file after this operation will not have an effect on the container.
此操作之后对 config.json 文件所做的任何修改,都不会对容器产生影响。
6.7.3 Start(启动)
start <container-id>
This operation MUST generate an error if it is not provided the container ID. Attempting to start a container that is not created MUST have no effect on the container and MUST generate an error. This operation MUST run the user-specified program as specified by process. This operation MUST generate an error if process was not set.
若未提供容器 ID,此操作必须生成错误。尝试启动未创建的容器时,该操作不得对容器产生任何影响,且必须生成错误。此操作必须按照 process 配置中指定的内容运行用户指定的程序。如果未设置 process,此操作必须生成错误。
6.7.4 Kill (终止)
kill <container-id> <signal>
This operation MUST generate an error if it is not provided the container ID. Attempting to send a signal to a container that is neither created nor running MUST have no effect on the container and MUST generate an error. This operation MUST send the specified signal to the container process.
若未提供容器 ID,此操作必须生成错误。尝试向既未创建也未运行的容器发送信号时,该操作不得对容器产生任何影响,且必须生成错误。此操作必须向容器进程发送指定的信号。
6.7.5 Delete(删除)
delete <container-id>
This operation MUST generate an error if it is not provided the container ID. Attempting to delete a container that is not stopped MUST have no effect on the container and MUST generate an error. Deleting a container MUST delete the resources that were created during the create step. Note that resources associated with the container, but not created by this container, MUST NOT be deleted. Once a container is deleted its ID MAY be used by a subsequent container.
若未提供容器 ID,此操作必须生成错误。尝试删除未停止的容器时,该操作不得对容器产生任何影响,且必须生成错误。删除容器时,必须删除在创建步骤中生成的资源。请注意,与容器相关联但并非由该容器创建的资源不得被删除。容器一旦被删除,其 ID 可以被后续的容器使用。
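The status preconditions of the five operations can be summarised with a toy in-memory model. This is a sketch, not a runtime: the class `ToyRuntime` and the exception `OperationError` are hypothetical, nothing is actually created or signalled, and `kill` assumes that the signal terminates the container process (a real signal might not).

```python
class OperationError(Exception):
    """Raised wherever the specification says an operation MUST generate an error."""


class ToyRuntime:
    def __init__(self):
        self.containers = {}  # container ID -> status

    def _status(self, container_id):
        if not container_id:
            raise OperationError("container ID is required")
        if container_id not in self.containers:
            raise OperationError(f"no such container: {container_id}")
        return self.containers[container_id]

    def state(self, container_id):
        return {"id": container_id, "status": self._status(container_id)}

    def create(self, container_id, bundle_path):
        if not container_id or not bundle_path:
            raise OperationError("container ID and bundle path are required")
        if container_id in self.containers:
            raise OperationError(f"ID is not unique: {container_id}")
        self.containers[container_id] = "created"

    def start(self, container_id):
        if self._status(container_id) != "created":
            raise OperationError("container is not created")
        self.containers[container_id] = "running"

    def kill(self, container_id, signal):
        if self._status(container_id) not in ("created", "running"):
            raise OperationError("container is neither created nor running")
        # Assumption: the signal ends the container process.
        self.containers[container_id] = "stopped"

    def delete(self, container_id):
        if self._status(container_id) != "stopped":
            raise OperationError("container is not stopped")
        del self.containers[container_id]  # the ID MAY now be reused
```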
6.8 Hooks(钩子)
Many of the operations specified in this specification have “hooks” that allow for additional actions to be taken before or after each operation. See runtime configuration for hooks for more information.
本规范中规定的许多操作都包含 “钩子(hooks)”,这些钩子允许在每个操作之前或之后执行额外的操作。有关钩子的更多信息,请参见运行时配置中的钩子部分。
7 Linux Runtime (Linux运行时)
7.1 File descriptors (文件描述符)
By default, only the stdin, stdout and stderr file descriptors are kept open for the application by the runtime. The runtime MAY pass additional file descriptors to the application to support features such as socket activation. Some of the file descriptors MAY be redirected to /dev/null even though they are open.
默认情况下,运行时只为应用程序保留 stdin、stdout 和 stderr 这三个文件描述符处于打开状态。运行时可以向应用程序传递额外的文件描述符,以支持诸如套接字激活(socket activation)等功能。某些文件描述符即便处于打开状态,也可以被重定向至 /dev/null。
7.2 Dev symbolic links(设备符号链接)
While creating the container (step 2 in the lifecycle), runtimes MUST create the following symlinks if the source file exists after processing mounts:
在创建容器期间(6.4 lifecycle中的第 2 步),如果源文件在处理挂载后存在,运行时必须创建以下符号链接:
Source | Destination |
---|---|
/proc/self/fd | /dev/fd |
/proc/self/fd/0 | /dev/stdin |
/proc/self/fd/1 | /dev/stdout |
/proc/self/fd/2 | /dev/stderr |
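The table above can be read as a small loop over source and destination pairs. The sketch below is a simplification under stated assumptions: `create_dev_symlinks` is a hypothetical helper, it inspects paths under a root filesystem directory from the outside (a real runtime would do this inside the container's mount namespace, after processing mounts), and it leaves an already existing destination untouched.

```python
import os

DEV_SYMLINKS = (
    ("/proc/self/fd", "/dev/fd"),
    ("/proc/self/fd/0", "/dev/stdin"),
    ("/proc/self/fd/1", "/dev/stdout"),
    ("/proc/self/fd/2", "/dev/stderr"),
)


def create_dev_symlinks(rootfs):
    """Create the symlinks whose source exists; return the destinations created."""
    os.makedirs(os.path.join(rootfs, "dev"), exist_ok=True)  # assumption: /dev may be missing
    created = []
    for source, destination in DEV_SYMLINKS:
        source_in_rootfs = os.path.join(rootfs, source.lstrip("/"))
        link_path = os.path.join(rootfs, destination.lstrip("/"))
        if not os.path.lexists(source_in_rootfs):
            continue  # only required when the source file exists
        if os.path.lexists(link_path):
            continue  # assumption: do not overwrite an existing entry
        os.symlink(source, link_path)  # link target is the in-container path
        created.append(destination)
    return created
```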
8 Configuration (配置)
This configuration file contains metadata necessary to implement standard operations against the container. This includes the process to run, environment variables to inject, sandboxing features to use, etc.
此配置文件包含对容器执行标准操作所需的元数据。这包括要运行的进程、要注入的环境变量、要使用的沙箱功能等。
The canonical schema is defined in this document, but there is a JSON Schema in schema/config-schema.json and Go bindings in specs-go/config.go. Platform-specific configuration schema are defined in the platform-specific documents linked below. For properties that are only defined for some platforms, the Go property has a platform tag listing those platforms (e.g. platform:"linux,solaris").
规范的模式定义于本文档中,但在 schema/config-schema.json 中提供了 JSON 模式,在 specs-go/config.go 中提供了 Go 语言绑定。特定平台的配置模式定义于下方链接的特定平台文档中。对于仅为部分平台定义的属性,Go 语言中的对应属性会带有 platform 标签,列出适用的平台(例如 platform:"linux,solaris")。
Below is a detailed description of each field defined in the configuration format and valid values are specified. Platform-specific fields are identified as such. For all platform-specific configuration values, the scope defined below in the Platform-specific configuration section applies.
以下是配置格式中每个字段的详细说明,并规定了其有效值。特定平台字段会被明确标识。对于所有特定平台的配置值,均适用下文 “特定平台配置” 部分中定义的范围。
8.1 Specification version(规范版本)
ociVersion (string, REQUIRED) MUST be in SemVer v2.0.0 format and specifies the version of the Open Container Initiative Runtime Specification with which the bundle complies. The Open Container Initiative Runtime Specification follows semantic versioning and retains forward and backward compatibility within major versions. For example, if a configuration is compliant with version 1.1 of this specification, it is compatible with all runtimes that support any 1.1 or later release of this specification, but is not compatible with a runtime that supports 1.0 and not 1.1.
ociVersion(字符串类型,必填)必须采用 SemVer v2.0.0 格式,用于指定捆绑包所遵循的开放容器倡议(Open Container Initiative)运行时规范版本。开放容器倡议运行时规范遵循语义化版本控制,在主版本号范围内保持向前和向后兼容性。例如,若某配置符合本规范的 1.1 版本,则它与所有支持本规范 1.1 版本及后续版本的运行时兼容,但与仅支持 1.0 版本而不支持 1.1 版本的运行时不兼容。
Example(示例)
"ociVersion": "0.1.0"
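The compatibility rule for ociVersion can be written as a comparison of major and minor numbers. This is a minimal sketch, assuming that compatibility requires the same major version and a runtime minor version at least as high as the configuration's; the function names are hypothetical, and pre-release and build metadata are simply ignored.

```python
def parse_version(version):
    """Split a SemVer string into (major, minor, patch), dropping any suffix."""
    core = version.split("+", 1)[0].split("-", 1)[0]
    major, minor, patch = (int(part) for part in core.split("."))
    return major, minor, patch


def is_config_compatible(config_version, runtime_version):
    """True if a runtime supporting runtime_version can accept the configuration."""
    config = parse_version(config_version)
    runtime = parse_version(runtime_version)
    same_major = config[0] == runtime[0]
    runtime_new_enough = config[1] <= runtime[1]
    return same_major and runtime_new_enough
```

With these definitions, a 1.1.0 configuration is accepted by a 1.2.1 runtime and rejected by a runtime that only supports 1.0.2.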
8.2 Root(根目录)
root (object, OPTIONAL) specifies the container’s root filesystem. On Windows, for Windows Server Containers, this field is REQUIRED. For Hyper-V Containers, this field MUST NOT be set.
root(对象类型,可选)指定容器的根文件系统。在 Windows 系统上,对于 Windows Server 容器,此字段为必填项;对于 Hyper-V 容器,此字段不得设置。
On all other platforms, this field is REQUIRED.
在其他平台中,当前字段是必填字段
path (string, REQUIRED) Specifies the path to the root filesystem for the container.
path(字符串类型,必填):指定容器根文件系统的路径。
On Windows, path MUST be a volume GUID path.
在 Windows 系统上,path 必须是卷 GUID 路径。
On POSIX platforms, path is either an absolute path or a relative path to the bundle. For example, with a bundle at /to/bundle and a root filesystem at /to/bundle/rootfs, the path value can be either /to/bundle/rootfs or rootfs. The value SHOULD be the conventional rootfs.
在 POSIX 平台上,path 可以是绝对路径,也可以是相对于捆绑包的相对路径。例如,若捆绑包位于 /to/bundle,根文件系统位于 /to/bundle/rootfs,则 path 的值可以是 /to/bundle/rootfs 或 rootfs。该值建议使用常规的 rootfs。
A directory MUST exist at the path declared by the field.
在该字段声明的路径下,必须存在一个目录。
readonly (bool, OPTIONAL) If true then the root filesystem MUST be read-only inside the container, defaults to false.
readonly(布尔类型,可选):如果为 true,则容器内的根文件系统必须为只读模式,默认值为 false。
On Windows, this field MUST be omitted or false.
在 Windows 系统上,此字段必须省略或设为 false。
8.2.1 Example (POSIX platforms)(POSIX platforms示例)
1 | "root": { |
8.2.2 Example (Windows)(Windows 示例)
1 | "root": { |
8.3 Mounts(挂载)
mounts (array of objects, OPTIONAL) specifies additional mounts beyond root. The runtime MUST mount entries in the listed order. For Linux, the parameters are as documented in mount(2) system call man page. For Solaris, the mount entry corresponds to the ‘fs’ resource in the zonecfg(1M) man page.
mounts(对象数组类型,可选)指定根目录之外的额外挂载项。运行时必须按照列表中的顺序挂载这些条目。在 Linux 系统上,相关参数可参考 mount (2) 系统调用的手册页;在 Solaris 系统上,挂载条目对应 zonecfg (1M) 手册页中的 “fs” 资源。
destination (string, REQUIRED) Destination of mount point: path inside container.
destination(字符串类型,必填):挂载点的目标路径,即容器内部的路径。
Linux: This value SHOULD be an absolute path. For compatibility with old tools and configurations, it MAY be a relative path, in which case it MUST be interpreted as relative to “/”. Relative paths are deprecated.
Linux:此值建议为绝对路径。为兼容旧工具和配置,它可以是相对路径,在此情况下,该路径必须被解释为相对于 “/” 的路径。相对路径已被弃用。
Windows: This value MUST be an absolute path. One mount destination MUST NOT be nested within another mount (e.g., c:\foo and c:\foo\bar).
Windows:此值必须为绝对路径。一个挂载目标不得嵌套在另一个挂载目标内部(例如,不得同时存在 c:\foo 和 c:\foo\bar 这样的挂载)。
Solaris: This value MUST be an absolute path. Corresponds to “dir” of the fs resource in zonecfg(1M).
Solaris:此值必须为绝对路径。对应于 zonecfg (1M) 中 “fs” 资源的 “dir” 属性。
For all other platforms: This value MUST be an absolute path.
对于所有其他平台:此值必须为绝对路径。
source (string, OPTIONAL) A device name, but can also be a file or directory name for bind mounts or a dummy. Path values for bind mounts are either absolute or relative to the bundle. A mount is a bind mount if it has either bind or rbind in the options.
source(字符串类型,可选):可以是设备名称,也可作为绑定挂载(bind mounts)的文件或目录名称,或者是一个占位符。绑定挂载的路径值可以是绝对路径,也可以是相对于捆绑包(bundle)的相对路径。如果挂载选项中包含 bind 或 rbind,则该挂载为绑定挂载。
Windows: a local directory on the filesystem of the container host. UNC paths and mapped drives are not supported.
Windows:指容器主机文件系统上的本地目录。不支持 UNC 路径和映射驱动器。
Solaris: corresponds to “special” of the fs resource in zonecfg(1M).
Solaris:对应于 zonecfg (1M) 中 “fs” 资源的 “special” 属性。
options (array of strings, OPTIONAL) Mount options of the filesystem to be used.
options(字符串数组类型,可选):要使用的文件系统的挂载选项。
Linux: See Linux mount options below.
Linux:请参见下方的 Linux 挂载选项。
Solaris: corresponds to “options” of the fs resource in zonecfg(1M).
Solaris:对应于 zonecfg (1M) 中 “fs” 资源的 “options” 属性。
Windows: runtimes MUST support ro, mounting the filesystem read-only when ro is given.
Windows:运行时必须支持 ro 选项,当指定 ro 时,文件系统将以只读方式挂载。
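The Linux rules for destination and source can be sketched as three small helpers. All function names here are hypothetical, a mount entry is modelled as a plain dictionary with the field names used above, and only the Linux behaviour is covered.

```python
import posixpath


def is_bind_mount(mount):
    """A mount is a bind mount if its options contain bind or rbind."""
    options = mount.get("options", [])
    return "bind" in options or "rbind" in options


def linux_destination(mount):
    """Interpret a (deprecated) relative destination as relative to "/"."""
    return posixpath.join("/", mount["destination"])


def bind_source(mount, bundle):
    """Resolve a bind-mount source that may be relative to the bundle."""
    source = mount["source"]
    if posixpath.isabs(source):
        return source
    return posixpath.join(bundle, source)
```

For an entry such as `{"destination": "data", "source": "volumes/data", "options": ["rbind", "ro"]}` in a bundle at `/to/bundle`, these helpers yield the destination `/data` and the source `/to/bundle/volumes/data`.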
8.3.1 Linux mount options(Linux挂载选项)
Runtimes MUST/SHOULD/MAY implement the following option strings for Linux:
运行时必须/应该/可以为 Linux 实现以下选项字符串:
Option name | Requirement | Description |
---|---|---|
async | MUST | [^1] |
atime | MUST | [^1] |
bind | MUST | Bind mount [^2] |
defaults | MUST | [^1] |
dev | MUST | [^1] |
diratime | MUST | [^1] |
dirsync | MUST | [^1] |
exec | MUST | [^1] |
iversion | MUST | [^1] |
lazytime | MUST | [^1] |
loud | MUST | [^1] |
mand | MAY | [^1] (Deprecated in kernel 5.15, util-linux 2.38) |
noatime | MUST | [^1] |
nodev | MUST | [^1] |
nodiratime | MUST | [^1] |
noexec | MUST | [^1] |
noiversion | MUST | [^1] |
nolazytime | MUST | [^1] |
nomand | MAY | [^1] |
norelatime | MUST | [^1] |
nostrictatime | MUST | [^1] |
nosuid | MUST | [^1] |
nosymfollow | SHOULD | [^1] (Introduced in kernel 5.10, util-linux 2.38) |
private | MUST | Bind mount propagation [^2] |
ratime | SHOULD | Recursive atime [^3] |
rbind | MUST | Bind mount [^2] |
rdev | SHOULD | Recursive dev [^3] |
rdiratime | SHOULD | Recursive diratime [^3] |
relatime | MUST | [^1] |
remount | MUST | [^1] |
rexec | SHOULD | Recursive exec [^3] |
rnoatime | SHOULD | Recursive noatime [^3] |
rnodiratime | SHOULD | Recursive nodiratime [^3] |
rnoexec | SHOULD | Recursive noexec [^3] |
rnorelatime | SHOULD | Recursive norelatime [^3] |
rnostrictatime | SHOULD | Recursive nostrictatime [^3] |
rnosuid | SHOULD | Recursive nosuid [^3] |
rnosymfollow | SHOULD | Recursive nosymfollow [^3] |
ro | MUST | [^1] |
rprivate | MUST | Bind mount propagation [^2] |
rrelatime | SHOULD | Recursive relatime [^3] |
rro | SHOULD | Recursive ro [^3] |
rrw | SHOULD | Recursive rw [^3] |
rshared | MUST | Bind mount propagation [^2] |
rslave | MUST | Bind mount propagation [^2] |
rstrictatime | SHOULD | Recursive strictatime [^3] |
rsuid | SHOULD | Recursive suid [^3] |
rsymfollow | SHOULD | Recursive symfollow [^3] |
runbindable | MUST | Bind mount propagation [^2] |
rw | MUST | [^1] |
shared | MUST | Bind mount propagation [^2] |
silent | MUST | [^1] |
slave | MUST | Bind mount propagation [^2] |
strictatime | MUST | [^1] |
suid | MUST | [^1] |
symfollow | SHOULD | Opposite of nosymfollow |
sync | MUST | [^1] |
tmpcopyup | MAY | copy up the contents to a tmpfs |
unbindable | MUST | Bind mount propagation [^2] |
idmap | SHOULD | Indicates that the mount MUST have an idmapping applied. This option SHOULD NOT be passed to the underlying mount(2) call. If uidMappings or gidMappings are specified for the mount, the runtime MUST use those values for the mount’s mapping. If they are not specified, the runtime MAY use the container’s user namespace mapping, otherwise an error MUST be returned. If there are no uidMappings and gidMappings specified and the container isn’t using user namespaces, an error MUST be returned. This SHOULD be implemented using mount_setattr(MOUNT_ATTR_IDMAP) , available since Linux 5.12.(表示该挂载必须应用 ID 映射。此选项不应传递给底层的 mount (2) 系统调用。如果为该挂载指定了 uidMappings 或 gidMappings,运行时必须使用这些值进行挂载映射。若未指定这些值,运行时可以使用容器的用户命名空间映射,否则必须返回错误。如果未指定 uidMappings 和 gidMappings,且容器未使用用户命名空间,则必须返回错误。此功能应使用 mount_setattr (MOUNT_ATTR_IDMAP) 实现。该函数自 Linux 5.12 版本起可用。) |
ridmap | SHOULD | Indicates that the mount MUST have an idmapping applied, and the mapping is applied recursively [^3]. This option SHOULD NOT be passed to the underlying mount(2) call. If uidMappings or gidMappings are specified for the mount, the runtime MUST use those values for the mount’s mapping. If they are not specified, the runtime MAY use the container’s user namespace mapping, otherwise an error MUST be returned. If there are no uidMappings and gidMappings specified and the container isn’t using user namespaces, an error MUST be returned. This SHOULD be implemented using mount_setattr(MOUNT_ATTR_IDMAP) , available since Linux 5.12.(表示该挂载必须应用 ID 映射,且映射需递归生效 [ˆ3]。此选项不应传递给底层的 mount (2) 系统调用。如果为该挂载指定了 uidMappings 或 gidMappings,运行时必须使用这些值进行挂载映射。若未指定这些值,运行时可以使用容器的用户命名空间映射,否则必须返回错误。如果未指定 uidMappings 和 gidMappings,且容器未使用用户命名空间,则必须返回错误。此功能应使用 mount_setattr (MOUNT_ATTR_IDMAP) 实现,该函数自 Linux 5.12 版本起可用。) |
[^1]: Corresponds to mount(8) (filesystem-independent).
[^2]: Corresponds to bind mounts and shared subtrees.
[^3]: These AT_RECURSIVE options need kernel 5.12 or later. See mount_setattr(2).
[^1]:对应于 mount (8)(与文件系统无关)。
[^2]:对应于绑定挂载(bind mounts)和共享子树(shared subtrees)。
[^3]:这些 AT_RECURSIVE 选项需要内核 5.12 或更高版本。请参见 mount_setattr (2)。
The “MUST” options correspond to mount(8).
“MUST” 选项对应于 mount (8)。
Runtimes MAY also implement custom option strings that are not listed in the table above. If a custom option string is already recognized by mount(8), the runtime SHOULD follow the behavior of mount(8).
运行时也可以实现上表中未列出的自定义选项字符串。如果某个自定义选项字符串已被 mount (8) 识别,运行时应当遵循 mount (8) 的行为。
Runtimes SHOULD treat unknown options as filesystem-specific ones and pass those as a comma-separated string to the fifth (const void *data) argument of mount(2).
运行时应当将未知选项视为文件系统特定选项,并将这些选项以逗号分隔的字符串形式传递给 mount(2) 的第五个参数(const void *data)。
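The handling of unknown options can be sketched as a partition of the options array. This is an illustration only: `split_mount_options` is a hypothetical name, `KNOWN_OPTIONS` lists just a subset of the table above, and the example option strings `mode=755` and `size=65536k` are assumed tmpfs-style options, not values taken from this document.

```python
# Subset of the option table above; a real runtime would list every entry.
KNOWN_OPTIONS = frozenset({
    "bind", "rbind", "ro", "rw", "nosuid", "suid", "nodev", "dev",
    "noexec", "exec", "relatime", "noatime", "strictatime",
    "private", "rprivate", "shared", "rshared", "slave", "rslave",
    "idmap", "ridmap",
})


def split_mount_options(options):
    """Separate recognised options from the data string for mount(2)."""
    recognised = [opt for opt in options if opt in KNOWN_OPTIONS]
    # Unknown options are treated as filesystem-specific and joined with commas.
    data = ",".join(opt for opt in options if opt not in KNOWN_OPTIONS)
    return recognised, data
```

The order of the unknown options is preserved, so `["nosuid", "mode=755", "size=65536k", "ro"]` splits into the recognised options `["nosuid", "ro"]` and the data string `mode=755,size=65536k`.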
8.3.2 Example (Windows 示例)
1 | "mounts": [ |
8.3.3 POSIX-platform Mounts (POSIX-platform 挂载)
For POSIX platforms the mounts structure has the following fields:
对于 POSIX 平台,挂载结构包含以下字段:
type (string, OPTIONAL) The type of the filesystem to be mounted.
type(字符串类型,可选):要挂载的文件系统类型。
Linux: filesystem types supported by the kernel as listed in /proc/filesystems (e.g., “minix”, “ext2”, “ext3”, “jfs”, “xfs”, “reiserfs”, “msdos”, “proc”, “nfs”, “iso9660”). For bind mounts (when options include either bind or rbind), the type is a dummy, often “none” (not listed in /proc/filesystems).
Linux:内核支持的文件系统类型列于 /proc/filesystems 中(例如 “minix”、“ext2”、“ext3”、“jfs”、“xfs”、“reiserfs”、“msdos”、“proc”、“nfs”、“iso9660”)。对于绑定挂载(当选项中包含 bind 或 rbind 时),其类型为占位符,通常为 “none”(未列于 /proc/filesystems 中)。
Solaris: corresponds to “type” of the fs resource in zonecfg(1M).
Solaris:对应于 zonecfg (1M) 中 “fs” 资源的 “type” 属性。
uidMappings (array of type LinuxIDMapping, OPTIONAL) The mapping to convert UIDs from the source file system to the destination mount point. This SHOULD be implemented using mount_setattr(MOUNT_ATTR_IDMAP), available since Linux 5.12. If specified, the options field of the mounts structure SHOULD contain either idmap or ridmap to specify whether the mapping should be applied recursively for rbind mounts, as well as to ensure that older runtimes will not silently ignore this field. The format is the same as user namespace mappings. If specified, it MUST be specified along with gidMappings.
uidMappings(LinuxIDMapping 数组类型,可选):用于将 UID 从源文件系统转换到目标挂载点的映射。此功能应使用 mount_setattr (MOUNT_ATTR_IDMAP) 实现,该函数自 Linux 5.12 起可用。如果指定了该字段,挂载结构的 options 字段应包含 idmap 或 ridmap,以指定映射是否应对 rbind 挂载递归生效,同时确保旧版本运行时不会静默忽略此字段。其格式与用户命名空间映射相同。如果指定了该字段,则必须同时指定 gidMappings。
gidMappings (array of type LinuxIDMapping, OPTIONAL) The mapping to convert GIDs from the source file system to the destination mount point. This SHOULD be implemented using mount_setattr(MOUNT_ATTR_IDMAP), available since Linux 5.12. If specified, the options field of the mounts structure SHOULD contain either idmap or ridmap to specify whether the mapping should be applied recursively for rbind mounts, as well as to ensure that older runtimes will not silently ignore this field. For more details see uidMappings. If specified, it MUST be specified along with uidMappings.
gidMappings(LinuxIDMapping 数组类型,可选):用于将 GID 从源文件系统转换到目标挂载点的映射。此功能应使用 mount_setattr (MOUNT_ATTR_IDMAP) 实现,该函数自 Linux 5.12 起可用。如果指定了该字段,挂载结构的 options 字段应包含 idmap 或 ridmap,以指定映射是否应对 rbind 挂载递归生效,同时确保旧版本运行时不会静默忽略此字段。有关更多详细信息,请参见 uidMappings。如果指定了该字段,则必须同时指定 uidMappings。
8.3.3.1 Example (Linux 示例)
1 | "mounts": [ |
8.3.3.2 Example (Solaris 示例)
1 | "mounts": [ |
8.4 Process(进程)
process (object, OPTIONAL) specifies the container process. This property is REQUIRED when start is called.
process(对象类型,可选)指定容器进程。调用 start 时,此属性为必需项。
terminal (bool, OPTIONAL) specifies whether a terminal is attached to the process, defaults to false. As an example, if set to true on Linux a pseudoterminal pair is allocated for the process and the pseudoterminal pty is duplicated on the process’s standard streams.
terminal(布尔类型,可选)指定是否为进程附加终端,默认值为 false。例如,在 Linux 系统上如果将其设为 true,会为该进程分配一对伪终端(pseudoterminal pair),并且伪终端的 pty 会复制到进程的标准流中。
consoleSize (object, OPTIONAL) specifies the console size in characters of the terminal. Runtimes MUST ignore consoleSize if terminal is false or unset.
consoleSize(对象类型,可选)指定终端的控制台大小(以字符为单位)。如果 terminal 为 false 或未设置,运行时必须忽略 consoleSize。
height (uint, REQUIRED)
高度(无符号整型,必填)
width (uint, REQUIRED)
宽度(无符号整型,必填)
cwd (string, REQUIRED) is the working directory that will be set for the executable. This value MUST be an absolute path.
cwd(字符串类型,必需)是将为可执行文件设置的工作目录。此值必须为绝对路径。
env (array of strings, OPTIONAL) with the same semantics as IEEE Std 1003.1-2008’s environ.
env(字符串数组类型,可选)具有与 IEEE Std 1003.1-2008 标准中 environ 相同的语义。
args (array of strings, OPTIONAL) with similar semantics to IEEE Std 1003.1-2008 execvp’s argv. This specification extends the IEEE standard in that at least one entry is REQUIRED (non-Windows), and that entry is used with the same semantics as execvp’s file. This field is OPTIONAL on Windows, and commandLine is REQUIRED if this field is omitted.
args(字符串数组类型,可选)具有与 IEEE Std 1003.1-2008 标准中 execvp 的 argv 类似的语义。本规范对该 IEEE 标准进行了扩展:在非 Windows 系统上,该数组必须包含至少一个条目,且该条目与 execvp 中 file 参数的语义相同。在 Windows 系统上,此字段为可选;若省略此字段,则必须指定 commandLine。
commandLine (string, OPTIONAL) specifies the full command line to be executed on Windows. This is the preferred means of supplying the command line on Windows. If omitted, the runtime will fall back to escaping and concatenating fields from args before making the system call into Windows.
commandLine(字符串类型,可选)指定在 Windows 系统上要执行的完整命令行。这是在 Windows 上提供命令行的首选方式。若省略此字段,运行时会先对 args 中的字段进行转义并拼接,再调用 Windows 系统接口。
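The interplay between cwd, args and commandLine is easy to get wrong, so a small validation sketch may help. `check_process` and its `platform` argument are hypothetical names chosen for this illustration, and the absolute-path check only models POSIX paths; Windows path rules are deliberately left out.

```python
import posixpath


def check_process(process, platform="linux"):
    """Return a list of problems with cwd, args and commandLine."""
    problems = []
    cwd = process.get("cwd")
    if not isinstance(cwd, str):
        problems.append("cwd is REQUIRED")
    elif platform != "windows" and not posixpath.isabs(cwd):
        # Windows path rules are not modelled in this sketch.
        problems.append("cwd MUST be an absolute path")
    args = process.get("args") or []
    if platform == "windows":
        # On Windows args is OPTIONAL, but then commandLine is REQUIRED.
        if not args and not process.get("commandLine"):
            problems.append("commandLine is REQUIRED when args is omitted")
    elif len(args) < 1:
        problems.append("args needs at least one entry")
    return problems
```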
8.4.1 POSIX process(POSIX 进程)
For systems that support POSIX rlimits (for example Linux and Solaris), the process object supports the following process-specific properties:
对于支持 POSIX 资源限制(rlimits)的系统(例如 Linux 和 Solaris),进程对象支持以下特定于进程的属性:
rlimits (array of objects, OPTIONAL) allows setting resource limits for the process. Each entry has the following structure:
rlimits(对象数组类型,可选)允许为进程设置资源限制。每个条目具有以下结构:
type (string, REQUIRED) the platform resource being limited.
type(字符串类型,必需):被限制的平台资源。
Linux: valid values are defined in the getrlimit(2) man page, such as RLIMIT_MSGQUEUE.
Linux:有效值定义于 getrlimit (2) 手册页中,例如 RLIMIT_MSGQUEUE。
Solaris: valid values are defined in the getrlimit(3) man page, such as RLIMIT_CORE.
Solaris:有效值定义于 getrlimit (3) 手册页中,例如 RLIMIT_CORE。
The runtime MUST generate an error for any values which cannot be mapped to a relevant kernel interface. For each entry in rlimits, a getrlimit(3) on type MUST succeed. For the following properties, rlim refers to the status returned by the getrlimit(3) call.
对于任何无法映射到相关内核接口的值,运行时必须生成错误。对于 rlimits 中的每个条目,针对其 type 调用 getrlimit (3) 必须成功。对于以下属性,rlim 指的是 getrlimit (3) 调用返回的状态。
soft (uint64, REQUIRED) the value of the limit enforced for the corresponding resource. rlim.rlim_cur MUST match the configured value.
soft(64 位无符号整数类型,必需):对应资源的强制限制值。rlim.rlim_cur 必须与配置的值一致。
hard (uint64, REQUIRED) the ceiling for the soft limit that could be set by an unprivileged process. rlim.rlim_max MUST match the configured value. Only a privileged process (e.g. one with the CAP_SYS_RESOURCE capability) can raise a hard limit.
hard(64 位无符号整数类型,必需):非特权进程可设置的软限制上限。rlim.rlim_max 必须与配置的值一致。只有特权进程(例如具备 CAP_SYS_RESOURCE 权限的进程)才能提高硬限制。
If rlimits contains duplicated entries with the same type, the runtime MUST generate an error.
如果 rlimits 中包含类型相同的重复条目,运行时必须生成错误。
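The structural rules for rlimits (three REQUIRED fields per entry, no duplicated type) can be sketched as a pre-flight check. `check_rlimits` is a hypothetical helper written for this illustration; it does not call getrlimit and therefore does not verify that a type maps to a kernel interface.

```python
def check_rlimits(rlimits):
    """Return a list of structural problems in a `rlimits` array."""
    problems = []
    seen = set()
    for index, entry in enumerate(rlimits):
        # type, soft and hard are all REQUIRED.
        missing = [name for name in ("type", "soft", "hard") if name not in entry]
        if missing:
            problems.append(f"rlimits[{index}] is missing {', '.join(missing)}")
            continue
        # Duplicated entries with the same type are an error.
        if entry["type"] in seen:
            problems.append(f"duplicate rlimit type {entry['type']}")
        seen.add(entry["type"])
    return problems
```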
8.4.2 Linux Process(Linux进程)
For Linux-based systems, the process object supports the following process-specific properties.
对于基于 Linux 的系统,进程对象支持以下特定于进程的属性。
apparmorProfile (string, OPTIONAL) specifies the name of the AppArmor profile for the process. For more information about AppArmor, see AppArmor documentation.
apparmorProfile(字符串类型,可选)指定进程的 AppArmor 配置文件名称。有关 AppArmor 的更多信息,请参阅 AppArmor 文档。
capabilities (object, OPTIONAL) is an object containing arrays that specify the sets of capabilities for the process. Valid values are defined in the capabilities(7) man page, such as CAP_CHOWN. Any value which cannot be mapped to a relevant kernel interface, or cannot be granted otherwise, MUST be logged as a warning by the runtime. Runtimes SHOULD NOT fail if the container configuration requests capabilities that cannot be granted, for example, if the runtime operates in a restricted environment with a limited set of capabilities. capabilities contains the following properties:
capabilities(对象类型,可选)是一个包含数组的对象,用于指定进程的能力集。有效值定义于 capabilities (7) 手册页中,例如 CAP_CHOWN。任何无法映射到相关内核接口、或无法以其他方式授予的值,运行时必须将其记录为警告。如果容器配置请求了无法授予的能力(例如,运行时在能力集受限的环境中运行),运行时不应因此失败。capabilities 包含以下属性:
effective (array of strings, OPTIONAL) the effective field is an array of effective capabilities that are kept for the process.
effective(字符串数组类型,可选):effective 字段是为进程保留的有效能力数组。
bounding (array of strings, OPTIONAL) the bounding field is an array of bounding capabilities that are kept for the process.
bounding(字符串数组类型,可选):bounding 字段是为进程保留的边界能力数组。
inheritable (array of strings, OPTIONAL) the inheritable field is an array of inheritable capabilities that are kept for the process.
inheritable(字符串数组类型,可选):inheritable 字段是为进程保留的可继承能力数组。
permitted (array of strings, OPTIONAL) the permitted field is an array of permitted capabilities that are kept for the process.
permitted(字符串数组类型,可选):permitted 字段是为进程保留的允许能力数组。
ambient (array of strings, OPTIONAL) the ambient field is an array of ambient capabilities that are kept for the process.
ambient(字符串数组类型,可选):ambient 字段是为进程保留的环境能力数组。
noNewPrivileges (bool, OPTIONAL) setting noNewPrivileges to true prevents the process from gaining additional privileges. As an example, the no_new_privs article in the kernel documentation has information on how this is achieved using a prctl system call on Linux.
noNewPrivileges(布尔类型,可选):将 noNewPrivileges 设置为 true 可防止进程获取额外权限。例如,内核文档中的 no_new_privs 文章详细说明了在 Linux 系统上如何通过 prctl 系统调用来实现这一功能。
oomScoreAdj (int, OPTIONAL) adjusts the oom-killer score in [pid]/oom_score_adj for the process’s [pid] in a proc pseudo-filesystem. If oomScoreAdj is set, the runtime MUST set oom_score_adj to the given value. If oomScoreAdj is not set, the runtime MUST NOT change the value of oom_score_adj.
oomScoreAdj(整数类型,可选)用于调整 proc 伪文件系统中进程 [pid] 对应的 [pid]/oom_score_adj 文件中的 OOM 杀手(oom-killer)分数。若设置了 oomScoreAdj,运行时必须将 oom_score_adj 设为指定值;若未设置 oomScoreAdj,运行时不得更改 oom_score_adj 的值。
This is a per-process setting, whereas disableOOMKiller is scoped for a memory cgroup. For more information on how these two settings work together, see the memory cgroup documentation section 10. OOM Control.
这是一项进程级别的设置,而 disableOOMKiller 的作用范围则是内存控制组(cgroup)。有关这两项设置如何协同工作的更多信息,请参阅内存控制组文档的第 10 节 “OOM 控制”。
scheduler (object, OPTIONAL) is an object describing the scheduler properties for the process. The scheduler contains the following properties:
scheduler(对象类型,可选)是一个描述进程调度器属性的对象。该调度器包含以下属性:
policy (string, REQUIRED) represents the scheduling policy. A valid list of values is:
policy(字符串类型,必需)表示调度策略。有效的值列表如下:
SCHED_OTHER
SCHED_FIFO
SCHED_RR
SCHED_BATCH
SCHED_ISO
SCHED_IDLE
SCHED_DEADLINE
nice (int32, OPTIONAL) is the nice value for the process, affecting its priority. A lower nice value corresponds to a higher priority. If not set, the runtime must use the value 0.
nice(32 位整数类型,可选)是进程的优先级调整值,会影响进程的优先级。nice 值越低,对应的优先级越高。若未设置,运行时必须使用值 0。
priority (int32, OPTIONAL) represents the static priority of the process, used by real-time policies like SCHED_FIFO and SCHED_RR. If not set, the runtime must use the value 0.
priority(32 位整数类型,可选)表示进程的静态优先级,供 SCHED_FIFO 和 SCHED_RR 等实时调度策略使用。若未设置,运行时必须使用值 0。
flags (array of strings, OPTIONAL) is an array of strings representing scheduling flags. A valid list of values is:
flags(字符串数组类型,可选)是表示调度标志的字符串数组。有效的值列表如下:
SCHED_FLAG_RESET_ON_FORK
SCHED_FLAG_RECLAIM
SCHED_FLAG_DL_OVERRUN
SCHED_FLAG_KEEP_POLICY
SCHED_FLAG_KEEP_PARAMS
SCHED_FLAG_UTIL_CLAMP_MIN
SCHED_FLAG_UTIL_CLAMP_MAX
runtime (uint64, OPTIONAL) represents the amount of time in nanoseconds during which the process is allowed to run in a given period, used by the deadline scheduler. If not set, the runtime must use the value 0.
runtime(64 位无符号整数类型,可选)表示在给定周期内进程被允许运行的时间,单位为纳秒,供截止时间调度器(deadline scheduler)使用。若未设置,运行时必须使用值 0。
deadline (uint64, OPTIONAL) represents the absolute deadline for the process to complete its execution, used by the deadline scheduler. If not set, the runtime must use the value 0.
deadline(64 位无符号整数类型,可选)表示进程完成执行的绝对截止时间,供截止时间调度器(deadline scheduler)使用。若未设置,运行时必须使用值 0。
period (uint64, OPTIONAL) represents the length of the period in nanoseconds used for determining the process runtime, used by the deadline scheduler. If not set, the runtime must use the value 0.
period(64 位无符号整数类型,可选)表示用于确定进程运行时间的周期长度,单位为纳秒,供截止时间调度器(deadline scheduler)使用。若未设置,运行时必须使用值 0。
selinuxLabel (string, OPTIONAL) specifies the SELinux label for the process. For more information about SELinux, see SELinux documentation.
selinuxLabel(字符串类型,可选)指定进程的 SELinux 标签。有关 SELinux 的更多信息,请参阅 SELinux 文档。
ioPriority (object, OPTIONAL) configures the I/O priority settings for the container’s processes within the process group. The I/O priority settings will be automatically applied to the entire process group, affecting all processes within the container. The following properties are available:
ioPriority(对象类型,可选)用于配置容器进程组内进程的 I/O 优先级设置。I/O 优先级设置会自动应用于整个进程组,影响容器内的所有进程。可用属性如下:
class (string, REQUIRED) specifies the I/O scheduling class. Possible values are IOPRIO_CLASS_RT, IOPRIO_CLASS_BE, and IOPRIO_CLASS_IDLE.
class(字符串类型,必需)指定 I/O 调度类别。可能的值包括 IOPRIO_CLASS_RT、IOPRIO_CLASS_BE 和 IOPRIO_CLASS_IDLE。
priority (int, REQUIRED) specifies the priority level within the class. The value should be an integer ranging from 0 (highest) to 7 (lowest).
priority(整数类型,必需)指定该类别内的优先级级别。其值应为整数,范围从 0(最高)到 7(最低)。
execCPUAffinity (object, OPTIONAL) specifies CPU affinity used to execute the process. This setting is not applicable to the container’s init process. The following properties are available:
execCPUAffinity(对象类型,可选)指定用于执行进程的 CPU 亲和性设置。此设置不适用于容器的初始化进程(init process)。可用属性如下:
initial (string, OPTIONAL) is the list of CPUs on which the runtime parent process runs initially, before the transition to the container’s cgroup. This is a comma-separated list, with dashes to represent ranges. For example, 0-3,7 represents CPUs 0, 1, 2, 3, and 7.
initial(字符串类型,可选)是运行时父进程在过渡到容器控制组(cgroup)之前最初运行的 CPU 列表。这是一个用逗号分隔的列表,其中使用短横线表示范围。例如,0-3,7 表示 CPU 0、1、2、3 和 7。
final (string, OPTIONAL) is the list of CPUs the process will run on after the transition to the container’s cgroup. The format is the same as for initial. If omitted or empty, the runtime SHOULD NOT change the process’s CPU affinity after the process is moved to the container’s cgroup, and the final affinity is determined by the Linux kernel.
final(字符串类型,可选)是进程过渡到容器控制组(cgroup)后将运行的 CPU 列表。其格式与 initial 相同。若省略此参数或其值为空,运行时不应在进程移至容器控制组后更改其 CPU 亲和性,最终的亲和性将由 Linux 内核决定。
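The CPU list format used by initial and final (comma-separated entries, dashes for ranges) can be expanded with a few lines of code. The sketch below is illustrative only; `parse_cpu_list` is a hypothetical helper, and rejecting descending ranges such as `3-1` is an assumption of this sketch rather than something the text above states.

```python
def parse_cpu_list(spec):
    """Expand a CPU list such as "0-3,7" into a sorted list of CPU numbers."""
    cpus = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            # An empty string (or a stray comma) contributes no CPUs.
            continue
        if "-" in part:
            start, end = part.split("-", 1)
            low, high = int(start), int(end)
            if low > high:
                raise ValueError(f"invalid CPU range: {part}")
            cpus.update(range(low, high + 1))
        else:
            cpus.add(int(part))
    return sorted(cpus)


print(parse_cpu_list("0-3,7"))  # prints [0, 1, 2, 3, 7]
```

An empty result corresponds to the "omitted or empty" case for final, where the affinity is left for the kernel to decide.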
8.4.3 z/OS Process(z/OS 进程)
For z/OS-based systems, the process object supports the following process-specific properties.
对于基于 z/OS 的系统,进程对象支持以下特定于进程的属性。
noNewPrivileges (bool, OPTIONAL) setting noNewPrivileges to true prevents the process from gaining additional privileges.
noNewPrivileges(布尔类型,可选):将 noNewPrivileges 设置为 true 可防止进程获取额外权限。
8.5 User (用户)
The user for the process is a platform-specific structure that allows specific control over which user the process runs as.
进程的用户设置是一个平台特定的结构,它允许对进程运行时所使用的用户进行具体控制。
8.5.1 POSIX-platform User (POSIX平台用户)
For POSIX platforms the user structure has the following fields:
对于 POSIX 平台,user 结构包含以下字段:
uid (int, REQUIRED) specifies the user ID in the container namespace.
uid(整数类型,必需)指定容器命名空间中的用户 ID。
gid (int, REQUIRED) specifies the group ID in the container namespace.
gid(整数类型,必需)指定容器命名空间中的组 ID。
umask (int, OPTIONAL) specifies the umask of the user. If unspecified, the umask should not be changed from the calling process’s umask.
umask(整数类型,可选)指定用户的 umask。若未指定,则不应更改从调用进程继承的 umask。
additionalGids (array of ints, OPTIONAL) specifies additional group IDs in the container namespace to be added to the process.
additionalGids(整数数组类型,可选)指定要添加到进程的容器命名空间中的额外组 ID。
Note: symbolic names for uid and gid, such as uname and gname respectively, are left to upper levels to derive (i.e. /etc/passwd parsing, NSS, etc).
注意:uid 和 gid 的符号名称(分别对应 uname 和 gname 等)由上层系统推导得出(例如通过解析 /etc/passwd 文件、名称服务开关(NSS)等方式)。
8.5.2 Example (Linux 示例)
1 | "process": { |
8.5.3 Example (Solaris 示例)
1 | "process": { |
8.5.4 Windows User (Windows用户)
For Windows based systems the user structure has the following fields:
对于基于 Windows 的系统,用户结构包含以下字段:
username (string, OPTIONAL) specifies the user name for the process.
username(字符串类型,可选)指定进程的用户名。
8.5.5 Example (Windows 示例)
1 | "process": { |
8.6 Hostname (主机名)
hostname (string, OPTIONAL) specifies the container’s hostname as seen by processes running inside the container. On Linux, for example, this will change the hostname in the container UTS namespace. Depending on your namespace configuration, the container UTS namespace may be the runtime UTS namespace.
hostname(字符串类型,可选)指定容器内运行的进程所看到的容器主机名。例如,在 Linux 系统上,这会修改容器 UTS 命名空间中的主机名。根据命名空间的配置情况,容器的 UTS 命名空间可能与运行时的 UTS 命名空间一致。
8.6.1 Example(示例)
1 | "hostname": "mrsdalloway" |
8.7 Domainname(域名)
domainname (string, OPTIONAL) specifies the container’s domainname as seen by processes running inside the container. On Linux, for example, this will change the domainname in the container UTS namespace. Depending on your namespace configuration, the container UTS namespace may be the runtime UTS namespace.
domainname(字符串类型,可选)指定容器内运行的进程所看到的容器域名。例如,在 Linux 系统上,这会修改容器 UTS 命名空间中的域名。根据命名空间的配置情况,容器的 UTS 命名空间可能与运行时的 UTS 命名空间一致。
8.7.1 Example(示例)
1 | "domainname": "foobarbaz.test" |
8.8 Platform-specific configuration (平台特定配置)
linux (object, OPTIONAL) Linux-specific configuration. This MAY be set if the target platform of this spec is linux.
linux(对象类型,可选):特定于 Linux 系统的配置。如果本规范的目标平台是 Linux,此配置可以被设置。
windows (object, OPTIONAL) Windows-specific configuration. This MUST be set if the target platform of this spec is windows.
windows(对象类型,可选):特定于 Windows 系统的配置。如果本规范的目标平台是 Windows,此配置必须被设置。
solaris (object, OPTIONAL) Solaris-specific configuration. This MAY be set if the target platform of this spec is solaris.
solaris(对象类型,可选):特定于 Solaris 系统的配置。如果本规范的目标平台是 Solaris,此配置可以被设置。
vm (object, OPTIONAL) Virtual-machine-specific configuration. This MAY be set if the target platform and architecture of this spec support hardware virtualization.
vm(对象类型,可选):特定于虚拟机的配置。如果本规范的目标平台和架构支持硬件虚拟化,此配置可以被设置。
zos (object, OPTIONAL) z/OS-specific configuration. This MAY be set if the target platform of this spec is zos.
zos(对象类型,可选):特定于 z/OS 系统的配置。如果本规范的目标平台是 z/OS,此配置可以被设置。
8.8.1 Example (Linux 示例)
1 | { |
8.9 POSIX-platform Hooks(POSIX-platform 钩子)
For POSIX platforms, the configuration structure supports hooks for configuring custom actions related to the lifecycle of the container.
对于 POSIX 平台,该配置结构支持钩子机制,用于配置与容器生命周期相关的自定义操作。
hooks (object, OPTIONAL) MAY contain any of the following properties:
hooks(对象类型,可选)可以包含以下任意属性:
prestart (array of objects, OPTIONAL, DEPRECATED) is an array of prestart hooks.
prestart(对象数组类型,可选,已废弃)是一组启动前钩子。
Entries in the array contain the following properties:
数组中的条目包含以下属性:
path (string, REQUIRED) with similar semantics to IEEE Std 1003.1-2008 execv’s path. This specification extends the IEEE standard in that path MUST be absolute.
path(字符串类型,必需)的语义与 IEEE Std 1003.1-2008 中的 execv 路径类似。本规范对该 IEEE 标准进行了扩展,要求 path 必须为绝对路径。
args (array of strings, OPTIONAL) with the same semantics as IEEE Std 1003.1-2008 execv’s argv.
args(字符串数组类型,可选)的语义与 IEEE Std 1003.1-2008 中 execv 的 argv 相同。
env (array of strings, OPTIONAL) with the same semantics as IEEE Std 1003.1-2008’s environ.
env(字符串数组类型,可选)的语义与 IEEE Std 1003.1-2008 中的 environ 相同。
timeout (int, OPTIONAL) is the number of seconds before aborting the hook. If set, timeout MUST be greater than zero.
timeout(整数类型,可选)是终止钩子前的秒数。若设置此参数,timeout 的值必须大于零。
The value of path MUST resolve in the runtime namespace.
path 的值必须在运行时命名空间中解析。
The prestart hooks MUST be executed in the runtime namespace.
prestart 钩子必须在运行时命名空间中执行。
createRuntime (array of objects, OPTIONAL) is an array of createRuntime hooks.
createRuntime(对象数组类型,可选)是一组创建运行时钩子。
Entries in the array contain the following properties (the entries are identical to the entries in the deprecated prestart hooks):
数组中的条目包含以下属性(这些条目与已废弃的启动前钩子中的条目完全相同):
path (string, REQUIRED) with similar semantics to IEEE Std 1003.1-2008 execv’s path. This specification extends the IEEE standard in that path MUST be absolute.
path(字符串类型,必需)的语义与 IEEE Std 1003.1-2008 中 execv 的路径类似。本规范对该 IEEE 标准进行了扩展,要求 path 必须为绝对路径。
args (array of strings, OPTIONAL) with the same semantics as IEEE Std 1003.1-2008 execv’s argv.
args(字符串数组类型,可选)的语义与 IEEE Std 1003.1-2008 中 execv 的 argv 相同。
env (array of strings, OPTIONAL) with the same semantics as IEEE Std 1003.1-2008’s environ.
env(字符串数组类型,可选)的语义与 IEEE Std 1003.1-2008 中的 environ 相同。
timeout (int, OPTIONAL) is the number of seconds before aborting the hook. If set, timeout MUST be greater than zero.
timeout(整数类型,可选)是终止钩子前的秒数。若设置此参数,timeout 的值必须大于零。
The value of path MUST resolve in the runtime namespace.
path 的值必须在运行时命名空间中解析。
The createRuntime hooks MUST be executed in the runtime namespace.
创建运行时钩子必须在运行时命名空间中执行。
createContainer (array of objects, OPTIONAL) is an array of createContainer hooks.
createContainer(对象数组类型,可选)是一组创建容器钩子。
Entries in the array have the same schema as createRuntime entries.
数组中的条目与 createRuntime 条目的结构完全相同。
The value of path MUST resolve in the runtime namespace.
path 的值必须在运行时命名空间中解析。
The createContainer hooks MUST be executed in the container namespace.
创建容器钩子必须在容器命名空间中执行。
startContainer (array of objects, OPTIONAL) is an array of startContainer hooks.
startContainer(对象数组类型,可选)是一组启动容器钩子。
Entries in the array have the same schema as createRuntime entries.
数组中的条目与 createRuntime 条目的结构完全相同。
The value of path MUST resolve in the container namespace.
path 的值必须在容器命名空间中解析。
The startContainer hooks MUST be executed in the container namespace.
启动容器钩子必须在容器命名空间中执行。
poststart (array of objects, OPTIONAL) is an array of poststart hooks.
poststart(对象数组类型,可选)是一组启动后钩子。
Entries in the array have the same schema as createRuntime entries.
数组中的条目与 createRuntime 条目的结构完全相同。
The value of path MUST resolve in the runtime namespace.
path 的值必须在运行时命名空间中解析。
The poststart hooks MUST be executed in the runtime namespace.
启动后钩子必须在运行时命名空间中执行。
poststop (array of objects, OPTIONAL) is an array of poststop hooks.
poststop(对象数组类型,可选)是一组停止后钩子。
Entries in the array have the same schema as createRuntime entries.
数组中的条目与 createRuntime 条目的结构完全相同。
The value of path MUST resolve in the runtime namespace.
path 的值必须在运行时命名空间中解析。
The poststop hooks MUST be executed in the runtime namespace.
停止后钩子必须在运行时命名空间中执行。
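Every hook array shares the same entry schema, so one validation routine can cover all of them. The following is a minimal sketch with a hypothetical helper name, `check_hook`; it only checks the two constraints stated above (absolute path, positive timeout) and does not try to resolve the path in any namespace.

```python
import posixpath


def check_hook(entry):
    """Return a list of problems found in a single hook entry."""
    problems = []
    path = entry.get("path")
    # path is REQUIRED and MUST be absolute.
    if not isinstance(path, str) or not posixpath.isabs(path):
        problems.append("path is REQUIRED and MUST be absolute")
    timeout = entry.get("timeout")
    if timeout is not None:
        # bool is a subclass of int in Python, so exclude it explicitly.
        is_int = isinstance(timeout, int) and not isinstance(timeout, bool)
        if not is_int or timeout <= 0:
            problems.append("timeout, if set, MUST be greater than zero")
    return problems
```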
Hooks allow users to specify programs to run before or after various lifecycle events. Hooks MUST be called in the listed order. The state of the container MUST be passed to hooks over stdin so that they may do work appropriate to the current state of the container.
钩子允许用户指定在各种生命周期事件之前或之后运行的程序。钩子必须按照列出的顺序调用。容器的状态必须通过标准输入(stdin)传递给钩子,以便钩子能够根据容器的当前状态执行相应的操作。
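From the hook author's side, receiving the container state over stdin amounts to reading one JSON document. The sketch below assumes the field names from the State section of the runtime specification (`id`, `status`, `pid`, `bundle`), which lies outside this excerpt; `read_state` and `summarize` are hypothetical helper names, and the sample values are made up.

```python
import io
import json
import sys


def read_state(stream=None):
    """Parse the container state JSON that the runtime writes to stdin."""
    stream = sys.stdin if stream is None else stream
    return json.load(stream)


def summarize(state):
    """Build a one-line description a hook might log."""
    return f"{state.get('id')}: {state.get('status')}"


# Stand-in for stdin so the sketch can run outside a runtime.
sample = io.StringIO(
    '{"ociVersion": "1.2.1", "id": "demo", "status": "created",'
    ' "pid": 4242, "bundle": "/containers/demo"}'
)
print(summarize(read_state(sample)))  # prints "demo: created"
```

A real hook would call `read_state()` with no argument and then act on the reported status, for example by configuring the network namespace of the process identified by `pid`.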
8.9.1 Prestart (启动前)
The prestart hooks MUST be called as part of the create operation after the runtime environment has been created (according to the configuration in config.json) but before the pivot_root or any equivalent operation has been executed. On Linux, for example, they are called after the container namespaces are created, so they provide an opportunity to customize the container (e.g. the network namespace could be specified in this hook). The prestart hooks MUST be called before the createRuntime hooks.
启动前(prestart)钩子必须作为创建操作的一部分调用,调用时机为运行时环境已根据 config.json 中的配置创建完成,但尚未执行 pivot_root 或任何等效操作之前。例如,在 Linux 系统上,启动前钩子会在容器命名空间创建之后调用,因此它们为自定义容器提供了机会(例如,可在此钩子中指定网络命名空间)。启动前钩子必须在创建运行时(createRuntime)钩子之前调用。
Note: prestart hooks were deprecated in favor of createRuntime, createContainer and startContainer hooks, which allow more granular hook control during the create and start phase.
注意:启动前(prestart)钩子已被弃用,取而代之的是 createRuntime、createContainer 和 startContainer 钩子。这些新钩子能够在创建和启动阶段提供更精细的钩子控制。
The prestart hooks’ path MUST resolve in the runtime namespace. The prestart hooks MUST be executed in the runtime namespace.
启动前(prestart)钩子的路径必须在运行时命名空间中解析。启动前钩子必须在运行时命名空间中执行。
8.9.2 CreateRuntime Hooks (创建运行时钩子)
The createRuntime hooks MUST be called as part of the create operation after the runtime environment has been created (according to the configuration in config.json) but before the pivot_root or any equivalent operation has been executed.
创建运行时(createRuntime)钩子必须作为创建操作的一部分调用,调用时机为运行时环境已根据 config.json 中的配置创建完成,但尚未执行 pivot_root 或任何等效操作之前。
The createRuntime hooks’ path MUST resolve in the runtime namespace. The createRuntime hooks MUST be executed in the runtime namespace.
创建运行时(createRuntime)钩子的路径必须在运行时命名空间中解析。创建运行时钩子必须在运行时命名空间中执行。
On Linux, for example, they are called after the container namespaces are created, so they provide an opportunity to customize the container (e.g. the network namespace could be specified in this hook).
例如,在 Linux 系统上,它们会在容器命名空间创建之后被调用,因此这为自定义容器提供了机会(例如,可在此钩子中指定网络命名空间)。
The definition of createRuntime hooks is currently underspecified, and hook authors should only expect from the runtime that the mount namespace has been created and the mount operations performed. Other operations such as cgroups and SELinux/AppArmor labels might not have been performed by the runtime.
当前,创建运行时(createRuntime)钩子的定义尚未明确。钩子的编写者仅能预期运行时已完成挂载命名空间的创建和挂载操作,而诸如控制组(cgroups)及 SELinux/AppArmor 标签等其他操作,运行时可能尚未执行。
8.9.3 CreateContainer Hooks (容器创建钩子)
The createContainer hooks MUST be called as part of the create operation after the runtime environment has been created (according to the configuration in config.json) but before the pivot_root or any equivalent operation has been executed. The createContainer hooks MUST be called after the createRuntime hooks.
创建容器(createContainer)钩子必须作为创建操作的一部分调用,调用时机为运行时环境已根据 config.json 中的配置创建完成,但尚未执行 pivot_root 或任何等效操作之前。创建容器钩子必须在创建运行时(createRuntime)钩子之后调用。
The createContainer hooks’ path MUST resolve in the runtime namespace. The createContainer hooks MUST be executed in the container namespace.
创建容器(createContainer)钩子的路径必须在运行时命名空间中解析。创建容器钩子必须在容器命名空间中执行。
For example, on Linux this would happen before the pivot_root operation is executed but after the mount namespace was created and set up.
例如,在 Linux 系统上,这会在 pivot_root 操作执行之前发生,但在挂载命名空间已创建并设置完成之后。
The definition of createContainer hooks is currently underspecified, and hook authors should only expect from the runtime that the mount namespace and different mounts will be set up. Other operations such as cgroups and SELinux/AppArmor labels might not have been performed by the runtime.
当前,创建容器(createContainer)钩子的定义尚未明确。钩子编写者仅能预期运行时已完成挂载命名空间的创建以及各类挂载操作的设置,而诸如控制组(cgroups)及 SELinux/AppArmor 标签等其他操作,运行时可能尚未执行。
8.9.4 StartContainer Hooks(启动容器钩子)
The startContainer hooks MUST be called before the user-specified process is executed as part of the start operation. This hook can be used to execute some operations in the container, for example running the ldconfig binary on Linux before the container process is spawned.
启动容器(startContainer)钩子必须在作为启动操作一部分的用户指定进程执行之前调用。此钩子可用于在容器内执行一些操作,例如在 Linux 系统上,在容器进程启动前运行 ldconfig 二进制文件。
The startContainer hooks’ path MUST resolve in the container namespace. The startContainer hooks MUST be executed in the container namespace.
启动容器(startContainer)钩子的路径必须在容器命名空间中解析。启动容器钩子必须在容器命名空间中执行。
8.9.5 Poststart (启动后钩子)
The poststart hooks MUST be called after the user-specified process is executed but before the start operation returns. For example, this hook can notify the user that the container process is spawned.
启动后(poststart)钩子必须在用户指定的进程执行之后、但启动操作返回之前调用。例如,此钩子可用于通知用户容器进程已启动。
The poststart hooks’ path MUST resolve in the runtime namespace. The poststart hooks MUST be executed in the runtime namespace.
启动后(poststart)钩子的路径必须在运行时命名空间中解析。启动后钩子必须在运行时命名空间中执行。
8.9.6 Poststop (停止后钩子)
The poststop hooks MUST be called after the container is deleted but before the delete operation returns. Cleanup or debugging functions are examples of such a hook.
停止后(poststop)钩子必须在容器被删除之后、但删除操作返回之前调用。清理或调试功能就是此类钩子的示例。
The poststop hooks’ path MUST resolve in the runtime namespace. The poststop hooks MUST be executed in the runtime namespace.
停止后(poststop)钩子的路径必须在运行时命名空间中解析。停止后钩子必须在运行时命名空间中执行。
8.9.7 Summary (总结)
See the table below for a summary of hooks and when they are called:
Name | Namespace | When |
---|---|---|
prestart (Deprecated)(已弃用) | runtime | During the create operation, after the runtime environment has been created and before the pivot root or any equivalent operation.(在创建操作期间,运行时环境创建完成后,且 pivot_root 或任何等效操作执行之前。) |
createRuntime | runtime | During the create operation, after the runtime environment has been created and before the pivot root or any equivalent operation.(在创建操作期间,运行时环境创建完成后,且 pivot_root 或任何等效操作执行之前。) |
createContainer | container | During the create operation, after the runtime environment has been created and before the pivot root or any equivalent operation.(在创建操作期间,运行时环境创建完成后,且 pivot_root 或任何等效操作执行之前。) |
startContainer | container | After the start operation is called but before the user-specified program command is executed.(在启动操作被调用后,且用户指定的程序命令执行之前。) |
poststart | runtime | After the user-specified process is executed but before the start operation returns.(在用户指定的进程执行之后,且启动操作返回之前。) |
poststop | runtime | After the container is deleted but before the delete operation returns.(在容器被删除之后,且删除操作返回之前。) |
8.9.8 Example(示例)
1 | "hooks": { |
8.10 Annotations(注解)
annotations (object, OPTIONAL) contains arbitrary metadata for the container. This information MAY be structured or unstructured. Annotations MUST be a key-value map. If there are no annotations then this property MAY either be absent or an empty map.
annotations(对象类型,可选)包含容器的任意元数据。此类信息可以是结构化的,也可以是非结构化的。注释必须为键值对映射结构。若不存在注释,该属性可以省略,或设为一个空映射。
Keys MUST be strings. Keys MUST NOT be an empty string. Keys SHOULD be named using a reverse domain notation - e.g. com.example.myKey.
键(Keys)必须为字符串类型。键不能为空字符串。键建议使用反向域名表示法命名,例如:com.example.myKey。
The org.opencontainers namespace for keys is reserved for use by this specification; annotations using keys in this namespace MUST be as described in this section. The following keys in the org.opencontainers namespace MAY be used:
org.opencontainers 键命名空间为本规范专用预留。使用该命名空间中键的注释必须符合本节中的描述。org.opencontainers 命名空间中的以下键可以使用:
键(Key) | 定义(Definition) |
---|---|
org.opencontainers.image.os |
Indicates the operating system the container image was built to run on. The annotation value MUST have a valid value for the os property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification. (指示容器镜像构建时目标运行的操作系统。该注释的值必须符合 OCI 镜像规范中定义的 os 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.os.version |
Indicates the operating system version targeted by the container image. The annotation value MUST have a valid value for the os.version property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification. (指示容器镜像目标运行的操作系统版本。该注释的值必须符合 OCI 镜像规范中定义的 os.version 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.os.features |
Indicates mandatory operating system features required by the container image. The annotation value MUST have a valid value for the os.features property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification.(指示容器镜像所需的强制性操作系统功能。该注释的值必须符合 OCI 镜像规范中定义的 os.features 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.architecture |
Indicates the architecture that binaries in the container image are built to run on. The annotation value MUST have a valid value for the architecture property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification.(指示容器镜像中的二进制文件构建时目标运行的架构。该注释的值必须符合 OCI 镜像规范中定义的 architecture 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.variant |
Indicates the variant of the architecture that binaries in the container image are built to run on. The annotation value MUST have a valid value for the variant property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification. (指示容器镜像中的二进制文件构建时目标运行的架构变体。该注释的值必须符合 OCI 镜像规范中定义的 variant 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.author |
Indicates the author of the container image. The annotation value MUST have a valid value for the author property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification. (指示容器镜像的作者。该注释的值必须符合 OCI 镜像规范中定义的 author 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.created |
Indicates the date and time when the container image was created. The annotation value MUST have a valid value for the created property as defined in the OCIimage specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification.(指示容器镜像的创建日期和时间。该注释的值必须符合 OCI 镜像规范中定义的 created 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
org.opencontainers.image.stopSignal |
Indicates the signal that SHOULD be sent by container runtimes to kill the container. The annotation value MUST have a valid value for the config.StopSignal property as defined in the OCI image specification. This annotation SHOULD only be used in accordance with the OCI image specification’s runtime conversion specification. (指示容器运行时应发送以终止容器的信号。该注释的值必须符合 OCI 镜像规范中定义的 config.StopSignal 属性的有效值要求。此注释应仅按照 OCI 镜像规范的运行时转换规范使用。) |
All other keys in the org.opencontainers namespace not specified in the above table are reserved and MUST NOT be used by subsequent specifications. Runtimes MUST handle unknown annotation keys like any other unknown property.
org.opencontainers 命名空间中未在上述表格中指定的所有其他键均为预留键,后续规范不得使用。运行时必须像处理其他未知属性一样处理未知的注解键(参见 8.11 小节)。
Values MUST be strings. Values MAY be an empty string.
值必须是字符串,也可以是空字符串。
1 | "annotations": { |
8.11 Extensibility(拓展性)
Runtimes MAY log unknown properties but MUST otherwise ignore them. That includes not generating errors if they encounter an unknown property.
运行时可以记录未知属性,但在其他情况下必须忽略这些属性。这包括在遇到未知属性时不得生成错误。
8.12 Valid values (有效值)
Runtimes MUST generate an error when invalid or unsupported values are encountered. Unless support for a valid value is explicitly required, runtimes MAY choose which subset of the valid values it will support.
运行时在遇到无效值或不支持的值时必须生成错误。除非明确要求支持某个有效值,否则运行时可以选择支持有效值的子集。
8.13 Configuration Schema Example (配置Schema示例)
Here is a full example config.json for reference.
下面是一个完整的 config.json 示例,供参考。
1 | { |
9 Linux Container Configuration (Linux容器配置)
This document describes the schema for the Linux-specific section of the container configuration. The Linux container specification uses various kernel features like namespaces, cgroups, capabilities, LSM, and filesystem jails to fulfill the spec.
本文档描述了容器配置中 Linux 特定部分的模式(schema)。Linux 容器规范借助命名空间、控制组(cgroups)、权限能力(capabilities)、Linux 安全模块(LSM)和文件系统隔离等多种内核特性来实现其规范要求。
9.1 Default Filesystems (默认文件系统)
The Linux ABI includes both syscalls and several special file paths. Applications expecting a Linux environment will very likely expect these file paths to be set up correctly.
Linux ABI 包含系统调用(syscalls)和多个特殊文件路径。期望运行在 Linux 环境中的应用程序,很可能会要求这些文件路径被正确配置。
The following filesystems SHOULD be made available in each container’s filesystem:
以下文件系统应在每个容器的文件系统中提供:
Path | Type |
---|---|
/proc | proc |
/sys | sysfs |
/dev/pts | devpts |
/dev/shm | tmpfs |
9.2 Namespaces (命名空间)
A namespace wraps a global system resource in an abstraction that makes it appear to the processes within the namespace that they have their own isolated instance of the global resource. Changes to the global resource are visible to other processes that are members of the namespace, but are invisible to other processes. For more information, see the namespaces(7) man page.
命名空间通过抽象化包装全局系统资源,使命名空间内的进程看起来拥有该全局资源的独立隔离实例。对全局资源的修改仅对该命名空间内的其他进程可见,对外部进程则不可见。更多信息请参阅 namespaces (7) 手册页。
Namespaces are specified as an array of entries inside the namespaces root field. The following parameters can be specified to set up namespaces:
命名空间在 namespaces 根字段内以条目数组的形式指定。可通过以下参数来配置命名空间:
type (string, REQUIRED) - namespace type. The following namespace types SHOULD be supported:
type(字符串类型,必填)- 命名空间类型。以下命名空间类型应被支持:
  pid - processes inside the container will only be able to see other processes inside the same container or inside the same pid namespace.
  pid - 容器内的进程将只能看到同一容器内或同一 pid 命名空间内的其他进程。
  network - the container will have its own network stack.
  network - 容器将拥有自己的网络栈。
  mount - the container will have an isolated mount table.
  mount - 容器将拥有独立的挂载表。
  ipc - processes inside the container will only be able to communicate to other processes inside the same container via system level IPC.
  ipc - 容器内的进程将只能通过系统级 IPC(进程间通信)与同一容器内的其他进程进行通信。
  uts - the container will be able to have its own hostname and domain name.
  uts - 容器将能够拥有自己的主机名和域名。
  user - the container will be able to remap user and group IDs from the host to local users and groups within the container.
  user - 容器将能够将主机的用户 ID 和组 ID 重新映射为容器内部的本地用户和组。
  cgroup - the container will have an isolated view of the cgroup hierarchy.
  cgroup - 容器将拥有独立的控制组(cgroup)层级视图。
  time - the container will be able to have its own clocks.
  time - 容器将能够拥有自己的时钟。
path (string, OPTIONAL) - namespace file. This value MUST be an absolute path in the runtime mount namespace. The runtime MUST place the container process in the namespace associated with that path. The runtime MUST generate an error if path is not associated with a namespace of type type.
path(字符串类型,可选)- 命名空间文件。该值必须是运行时挂载命名空间中的绝对路径。运行时必须将容器进程放入与该路径关联的命名空间中。如果 path 未与 type 类型的命名空间关联,运行时必须生成错误。
  If path is not specified, the runtime MUST create a new container namespace of type type.
  如果未指定 path,运行时必须创建一个 type 类型的新容器命名空间。
If a namespace type is not specified in the namespaces array, the container MUST inherit the runtime namespace of that type. If a namespaces field contains duplicated namespaces with the same type, the runtime MUST generate an error.
如果命名空间数组中未指定某种命名空间类型,容器必须继承该类型的运行时命名空间。如果命名空间字段中包含相同类型的重复命名空间,运行时必须生成错误。
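The rules above can be sketched as a small planning function. This is a minimal illustration, not how any particular runtime implements it: the function name plan_namespaces and the ("create" / "join") result shape are hypothetical, and the check that path really refers to a namespace of the given type is left out because it needs kernel access.

```python
import posixpath

# Namespace types listed in this section.
KNOWN_TYPES = {"pid", "network", "mount", "ipc", "uts", "user", "cgroup", "time"}


def plan_namespaces(namespaces):
    """Return {type: action} for a `namespaces` array (hypothetical helper).

    The action is ("join", path) when `path` is set and ("create", None)
    otherwise. Types missing from the result are inherited from the runtime.
    """
    plan = {}
    for entry in namespaces:
        ns_type = entry["type"]  # `type` is REQUIRED
        if ns_type not in KNOWN_TYPES:
            raise ValueError(f"unsupported namespace type: {ns_type!r}")
        if ns_type in plan:
            # Duplicated types MUST produce an error.
            raise ValueError(f"duplicate namespace type: {ns_type!r}")
        path = entry.get("path")
        if path is None:
            plan[ns_type] = ("create", None)
        elif posixpath.isabs(path):
            plan[ns_type] = ("join", path)
        else:
            raise ValueError(f"namespace path must be absolute: {path!r}")
    return plan
```

For example, an array with a pid entry that has a path and a mount entry without one yields a "join" for pid and a "create" for mount, while the network namespace is inherited because it is not listed.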
9.2.1 Example(示例)
1 | "namespaces": [ |
9.3 User namespace mappings (用户命名空间映射)
uidMappings (array of objects, OPTIONAL) describes the user namespace uid mappings from the host to the container.
gidMappings (array of objects, OPTIONAL) describes the user namespace gid mappings from the host to the container.
uidMappings(对象数组,可选)描述从主机到容器的用户命名空间 UID 映射。
gidMappings(对象数组,可选)描述从主机到容器的用户命名空间 GID 映射。
Each entry has the following structure:
每个实体都具有以下结构:
containerID (uint32, REQUIRED) - is the starting uid/gid in the container.
containerID(uint32 类型,必填)- 是容器内的起始 UID/GID。
hostID (uint32, REQUIRED) - is the starting uid/gid on the host to be mapped to containerID.
hostID(uint32 类型,必填)- 是主机上要映射到 containerID 的起始 UID/GID。
size (uint32, REQUIRED) - is the number of ids to be mapped.
size(uint32 类型,必填)- 是要映射的 ID 数量。
The runtime SHOULD NOT modify the ownership of referenced filesystems to realize the mapping. Note that the number of mapping entries MAY be limited by the kernel.
运行时不应通过修改引用文件系统的所有权来实现映射。请注意,映射条目的数量可能会受到内核的限制。
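To make the three fields concrete, the following sketch translates a container uid or gid into the corresponding host id. The function name map_to_host is hypothetical; the arithmetic simply follows the containerID / hostID / size definitions above.

```python
def map_to_host(container_id, mappings):
    """Translate a container uid/gid to a host id, or None if unmapped."""
    for m in mappings:
        start = m["containerID"]
        # Each entry covers the range [containerID, containerID + size).
        if start <= container_id < start + m["size"]:
            return m["hostID"] + (container_id - start)
    return None
```

With a single entry of containerID 0, hostID 1000 and size 32000, container id 0 maps to host id 1000, and container id 32000 is outside the mapped range.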
9.3.1 Example(示例)
1 | "uidMappings": [ |
9.4 Offset for Time Namespace(time命名空间偏移量)
timeOffsets
(object, OPTIONAL) sets the offset for Time Namespace. For more information see the time_namespaces.
timeOffsets
(对象类型,可选)用于设置时间命名空间的偏移量。有关更多信息,请参阅 time_namespaces(时间命名空间相关文档)。
The name of the clock is the entry key. Entry values are objects with the following properties:
时钟的名称是实体的键。实体的值具有以下属性对象:
secs
(int64, OPTIONAL) - is the offset of clock (in seconds) in the container.secs
(int64 类型,可选)- 是容器内时钟的偏移量(以秒为单位)。nanosecs
(uint32, OPTIONAL) - is the offset of clock (in nanoseconds) in the container.nanosecs
(uint32 类型,可选)- 是容器内时钟的偏移量(以纳秒为单位)。
9.5 Devices(外设)
devices (array of objects, OPTIONAL) lists devices that MUST be available in the container. The runtime MAY supply them however it likes (with mknod, by bind mounting from the runtime mount namespace, using symlinks, etc.).
devices(对象数组,可选)列出了容器中必须可用的设备。运行时可以通过任意方式提供这些设备(例如使用 mknod 命令、从运行时挂载命名空间进行绑定挂载、使用符号链接等)。
Each entry has the following structure:
实体具有如下所示的结构:
type (string, REQUIRED) - type of device: c, b, u or p. More info in mknod(1).
type(字符串类型,必填)- 设备类型:c、b、u 或 p。更多信息请参见 mknod (1) 手册。
path (string, REQUIRED) - full path to device inside container. If a file already exists at path that does not match the requested device, the runtime MUST generate an error. The path MAY be anywhere in the container filesystem, notably outside of /dev.
path(字符串类型,必填)- 容器内设备的完整路径。如果该路径下已存在文件且与请求的设备不匹配,运行时必须生成错误。该路径可以位于容器文件系统的任何位置,尤其可以在 /dev 目录之外。
major, minor (int64, REQUIRED unless type is p) - major, minor numbers for the device.
major、minor(int64 类型,除非设备类型为 p,否则为必填项)- 设备的主设备号和次设备号。
fileMode (uint32, OPTIONAL) - file mode for the device. You can also control access to devices with cgroups.
fileMode(uint32 类型,可选)- 设备的文件权限模式。你也可以通过 cgroups(控制组)来控制对设备的访问权限。
uid (uint32, OPTIONAL) - id of device owner in the container namespace.
uid(uint32 类型,可选)- 容器命名空间中设备所有者的 ID。
gid (uint32, OPTIONAL) - id of device group in the container namespace.
gid(uint32 类型,可选)- 容器命名空间中设备所属组的 ID。
The same type, major and minor SHOULD NOT be used for multiple devices.
相同的类型、主设备号和次设备号不应用于多个设备。
Containers MAY NOT access any device node that is not either explicitly referenced in the devices array or listed as being part of the default devices. Rationale: runtimes based on virtual machines need to be able to adjust the node devices, and accessing device nodes that were not adjusted could have undefined behaviour.
容器不得访问任何未在 devices 数组中明确引用或未列为默认设备一部分的设备节点。理由:基于虚拟机的运行时需要能够调整节点设备,而访问未调整的设备节点可能会导致未定义的行为。
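A minimal validation sketch for the devices array follows. The helper name check_devices is hypothetical, and treating "full path" as "must start with /" is an assumption of this sketch rather than wording from the specification.

```python
from collections import Counter

DEVICE_TYPES = {"c", "b", "u", "p"}


def check_devices(devices):
    """Validate entries; return (type, major, minor) triples used more than once."""
    seen = Counter()
    for entry in devices:
        dev_type = entry.get("type")
        if dev_type not in DEVICE_TYPES:
            raise ValueError(f"invalid device type: {dev_type!r}")
        # Assumption: a "full path" is an absolute path inside the container.
        if not entry.get("path", "").startswith("/"):
            raise ValueError("device path must be a full path")
        if dev_type != "p":
            # major and minor are REQUIRED unless type is p.
            if "major" not in entry or "minor" not in entry:
                raise ValueError(f"major and minor are required for type {dev_type!r}")
            seen[(dev_type, entry["major"], entry["minor"])] += 1
    # Reusing the same triple is a SHOULD NOT, so it is reported, not raised.
    return sorted(key for key, count in seen.items() if count > 1)
```

Duplicate triples are returned rather than rejected because the specification only says they SHOULD NOT be reused; a caller could log them as warnings.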
9.5.1 Example(示例)
1 | "devices": [ |
9.5.2 Default Devices(默认外设)
In addition to any devices configured with this setting, the runtime MUST also supply:
除通过此设置配置的所有设备外,运行时还必须提供:
/dev/null
/dev/zero
/dev/full
/dev/random
/dev/urandom
/dev/tty
/dev/console is set up if terminal is enabled in the config by bind mounting the pseudoterminal pty to /dev/console.(若配置中启用了终端,则通过将伪终端(pty)绑定挂载到 /dev/console 来设置 /dev/console。)
/dev/ptmx. A bind-mount or symlink of the container’s /dev/pts/ptmx.(容器内 /dev/pts/ptmx 的绑定挂载或符号链接。)
9.6 Control groups (控制组)
Also known as cgroups, they are used to restrict resource usage for a container and handle device access. cgroups provide controls (through controllers) to restrict cpu, memory, IO, pids, network and RDMA resources for the container. For more information, see the kernel cgroups documentation.
控制组也称为 cgroups,用于限制容器的资源使用并管理设备访问。cgroups 通过控制器提供控制能力,以限制容器的 CPU、内存、IO、进程 ID(pids)、网络和 RDMA 资源。有关更多信息,请参阅内核 cgroups 文档。
A runtime MAY, during a particular container operation, such as create, start, or exec, check if the container cgroup is fit for purpose, and MUST generate an error if such a check fails. For example, a frozen cgroup or (for create operation) a non-empty cgroup. The reason for this is that accepting such configurations could cause container operation outcomes that users may not anticipate or understand, such as operation on one container inadvertently affecting other containers.
在特定的容器操作(如创建、启动或执行)过程中,运行时可以检查容器控制组(cgroup)是否适用,如果检查失败,运行时必须生成错误。例如,对于冻结的控制组,或者(针对创建操作而言)非空的控制组,就会出现这种情况。这样做的原因是,接受此类配置可能会导致容器操作结果超出用户的预期或理解范围,比如某个容器上的操作可能会无意中影响到其他容器。
9.6.1 Cgroups Path (Cgroup路径)
cgroupsPath (string, OPTIONAL) path to the cgroups. It can be used to either control the cgroups hierarchy for containers or to run a new process in an existing container.
cgroupsPath(字符串类型,可选)指的是控制组(cgroups)的路径。它可用于控制容器的控制组层级结构,或者在现有容器中运行新进程。
The value of cgroupsPath MUST be either an absolute path or a relative path.
cgroupsPath 的值必须是绝对路径或相对路径。
In the case of an absolute path (starting with /), the runtime MUST take the path to be relative to the cgroups mount point.
对于绝对路径(以 / 开头),运行时必须将该路径视为相对于控制组(cgroups)挂载点的路径。
In the case of a relative path (not starting with /), the runtime MAY interpret the path relative to a runtime-determined location in the cgroups hierarchy.
对于相对路径(不以 / 开头),运行时可以将该路径解释为相对于控制组(cgroups)层级结构中由运行时确定的位置的路径。
If the value is specified, the runtime MUST consistently attach to the same place in the cgroups hierarchy given the same value of cgroupsPath. If the value is not specified, the runtime MAY define the default cgroups path. Runtimes MAY consider certain cgroupsPath values to be invalid, and MUST generate an error if this is the case.
如果指定了该值,那么对于相同的 cgroupsPath 值,运行时必须始终附加到控制组(cgroups)层级结构中的同一位置。如果未指定该值,运行时可以定义默认的控制组路径。运行时可能会将某些 cgroupsPath 值视为无效,若出现这种情况,必须生成错误。
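The path rules can be illustrated with a pure function, which makes the "same value, same place" requirement easy to see. The mount point /sys/fs/cgroup and the base directory myRuntime are assumptions chosen for this sketch, and rejecting paths that escape the mount point is one example of a value a runtime might consider invalid.

```python
import posixpath


def resolve_cgroups_path(cgroups_path,
                         mount_point="/sys/fs/cgroup",  # assumed mount point
                         runtime_base="myRuntime"):     # hypothetical runtime location
    """Resolve a specified cgroupsPath to a directory under the cgroups mount."""
    if not cgroups_path:
        raise ValueError("this sketch only handles a specified cgroupsPath")
    if cgroups_path.startswith("/"):
        # Absolute: taken relative to the cgroups mount point.
        relative = cgroups_path.lstrip("/")
    else:
        # Relative: interpreted under a runtime-determined location.
        relative = posixpath.join(runtime_base, cgroups_path)
    resolved = posixpath.normpath(posixpath.join(mount_point, relative))
    # Example of a value treated as invalid: escaping the mount point.
    if resolved != mount_point and not resolved.startswith(mount_point + "/"):
        raise ValueError(f"cgroupsPath escapes the cgroups mount: {cgroups_path!r}")
    return resolved
```

Because the function has no hidden state, the same cgroupsPath always resolves to the same location.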
Implementations of the Spec can choose to name cgroups in any manner. The Spec does not include naming schema for cgroups. The Spec does not support per-controller paths for the reasons discussed in the cgroupv2 documentation. The cgroups will be created if they don’t exist.
本规范的实现可以选择以任何方式为控制组(cgroups)命名。本规范不包含控制组的命名方案。出于 cgroupv2 文档中讨论的原因,本规范不支持针对单个控制器的路径。如果控制组不存在,将会被创建。
You can configure a container’s cgroups via the resources
field of the Linux configuration. Do not specify resources
unless limits have to be updated. For example, to run a new process in an existing container without updating limits, resources
need not be specified.
你可以通过 Linux 配置中的 resources
字段来配置容器的控制组(cgroups)。除非必须更新限制,否则不要指定资源配置。例如,若要在现有容器中运行新进程且不更新限制,则无需指定资源配置。
Runtimes MAY attach the container process to additional cgroup controllers beyond those necessary to fulfill the resources
settings.
运行时可以将容器进程附加到除满足资源设置所需的控制器之外的其他控制组(cgroup)控制器上。
9.6.2 Cgroup ownership (Cgroup 所有权)
Runtimes MAY, according to the following rules, change (or cause to be changed) the owner of the container’s cgroup to the host uid that maps to the value of process.user.uid
in the container namespace; that is, the user that will execute the container process.
运行时可以根据以下规则,将容器控制组(cgroup)的所有者更改为(或促使其更改为)与容器命名空间中 process.user.uid
值相对应的主机用户 ID(uid);也就是说,更改为将执行容器进程的用户的 ID。
Runtimes SHOULD NOT change the ownership of container cgroups when cgroups v1 is in use. Cgroup delegation is not secure in cgroups v1.
当使用 cgroups v1 时,运行时不应更改容器控制组(cgroup)的所有权。在 cgroups v1 中,控制组的委托机制并不安全。
A runtime SHOULD NOT change the ownership of a container cgroup unless it will also create a new cgroup namespace for the container. Typically this occurs when the linux.namespaces
array contains an object with type
equal to "cgroup"
and path
unset.
运行时不应更改容器控制组(cgroup)的所有权,除非它同时为容器创建一个新的控制组命名空间。通常,这种情况发生在 linux.namespaces
数组包含一个类型为 “cgroup” 且未设置路径的对象时。
Runtimes SHOULD change the cgroup ownership if and only if the cgroup filesystem is to be mounted read/write; that is, when the configuration’s mounts array contains an object where:
运行时应当更改控制组(cgroup)的所有权,当且仅当控制组文件系统将以读写模式挂载时;也就是说,当配置的 mounts 数组包含满足以下条件的对象时:
The source field is equal to "cgroup"
source 字段的值等于 “cgroup” 。
The destination field is equal to "/sys/fs/cgroup"
destination 字段的值等于 “/sys/fs/cgroup” 。
The options field does not contain the value "ro"
options 字段不包含值 “ro” 。
If the configuration does not specify such a mount, the runtime SHOULD NOT change the cgroup ownership.
如果配置中未指定此类挂载,运行时则不应更改控制组(cgroup)的所有权。
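Taken together, the ownership rules amount to a three-part check. The sketch below combines them; the function name should_chown_cgroup and the cgroup_v2 flag (standing in for however a runtime detects the cgroup version) are hypothetical.

```python
def should_chown_cgroup(config, cgroup_v2):
    """Return True when none of the SHOULD NOT conditions above apply."""
    if not cgroup_v2:
        # Delegation is not secure in cgroups v1.
        return False
    namespaces = config.get("linux", {}).get("namespaces", [])
    # A new cgroup namespace is created when type is "cgroup" and path is unset.
    creates_cgroup_ns = any(
        ns.get("type") == "cgroup" and "path" not in ns for ns in namespaces
    )
    if not creates_cgroup_ns:
        return False
    # The cgroup filesystem must be mounted read/write at /sys/fs/cgroup.
    for mount in config.get("mounts", []):
        if (mount.get("source") == "cgroup"
                and mount.get("destination") == "/sys/fs/cgroup"
                and "ro" not in mount.get("options", [])):
            return True
    return False
```

Changing ownership is optional in the first place (MAY), so a True result means only that the rules do not advise against it.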
A runtime that changes the cgroup ownership SHOULD only change the ownership of the container’s cgroup directory and files within that directory that are listed in /sys/kernel/cgroup/delegate. See cgroups(7) for details about this file. Note that not all files listed in /sys/kernel/cgroup/delegate necessarily exist in every cgroup. Runtimes MUST NOT fail in this scenario, and SHOULD change the ownership of the listed files that do exist in the cgroup.
更改控制组(cgroup)所有权的运行时,仅应更改容器控制组目录的所有权,以及该目录下在 /sys/kernel/cgroup/delegate 中列出的文件的所有权。有关此文件的详细信息,请参阅 cgroups(7) 手册。请注意,并非 /sys/kernel/cgroup/delegate 中列出的所有文件都必然存在于每个控制组中。在这种情况下,运行时不得因此失败,且应更改控制组中实际存在的列出文件的所有权。
If the /sys/kernel/cgroup/delegate
file does not exist, the runtime MUST fall back to using the following list of files:
如果 /sys/kernel/cgroup/delegate
文件不存在,运行时必须回退使用以下文件列表:
1 | cgroup.procs |
The runtime SHOULD NOT change the ownership of any other files. Changing other files may allow the container to elevate its own resource limits or perform other unwanted behaviour.
运行时不应更改任何其他文件的所有权。更改其他文件可能会使容器得以提升自身的资源限制或执行其他不必要的操作。
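The selection of paths to re-own can be sketched as follows. The helper name paths_to_chown is hypothetical, and the fallback list is passed in by the caller rather than hard-coded here. The sketch assumes the delegate file lists one file name per line.

```python
import os


def paths_to_chown(cgroup_dir, fallback, delegate_file="/sys/kernel/cgroup/delegate"):
    """Return the cgroup directory plus the delegated files that exist in it."""
    try:
        with open(delegate_file) as f:
            names = f.read().split()
    except FileNotFoundError:
        # The delegate file does not exist: fall back to the caller's list.
        names = list(fallback)
    paths = [cgroup_dir]
    for name in names:
        candidate = os.path.join(cgroup_dir, name)
        # Listed files that are absent from this cgroup are skipped, not errors.
        if os.path.exists(candidate):
            paths.append(candidate)
    return paths
```

A caller would then change ownership of exactly these paths and leave every other file in the cgroup untouched.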
9.6.2.1 Example(示例)
1 | "cgroupsPath": "/myRuntime/myContainer", |
9.6.3 Allowed Device list (允许的设备列表)
devices
(array of objects, OPTIONAL) configures the allowed device list. The runtime MUST apply entries in the listed order.
devices
(对象数组,可选)用于配置允许的设备列表。运行时必须按照列表中的顺序应用条目。
Each entry has the following structure:
每一个实体有如下的结构:
allow (boolean, REQUIRED) - whether the entry is allowed or denied.
allow(布尔值,必填)—— 该条目是允许还是拒绝。
type (string, OPTIONAL) - type of device: a (all), c (char), or b (block). Unset values mean “all”, mapping to a.
type(字符串,可选)—— 设备类型:a(所有设备)、c(字符设备)或 b(块设备)。未设置值时表示 “所有设备”,对应 a。
major, minor (int64, OPTIONAL) - major, minor numbers for the device. Unset values mean “all”, mapping to * in the filesystem API.
major、minor(int64 类型,可选)—— 设备的主设备号、次设备号。未设置值时表示 “所有设备”,对应文件系统 API 中的 *。
access (string, OPTIONAL) - cgroup permissions for device. A composition of r (read), w (write), and m (mknod).
access(字符串类型,可选)—— 设备的控制组(cgroup)权限。由 r(读取)、w(写入)和 m(创建设备文件)组合而成。
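As an illustration of how unset values map to a and *, the sketch below renders one entry in the style of the cgroup v1 devices.allow and devices.deny control files. The function name render_device_rule is hypothetical, and defaulting an unset access to rwm is an assumption of this sketch, since the text above does not define that case.

```python
def render_device_rule(entry):
    """Return (control_file, rule) for one allowed-device entry (cgroup v1 style)."""
    dev_type = entry.get("type", "a")  # unset type means "all"
    if dev_type not in ("a", "c", "b"):
        raise ValueError(f"invalid device type: {dev_type!r}")
    access = entry.get("access", "rwm")  # assumption: unset access means rwm
    if not access or set(access) - set("rwm"):
        raise ValueError(f"invalid access string: {access!r}")
    major = entry.get("major")
    minor = entry.get("minor")
    # Unset major or minor maps to "*".
    major_text = "*" if major is None else str(major)
    minor_text = "*" if minor is None else str(minor)
    control_file = "devices.allow" if entry["allow"] else "devices.deny"
    return control_file, f"{dev_type} {major_text}:{minor_text} {access}"
```

Rendering the entries in their listed order, and writing them in that order, keeps the ordering requirement stated at the start of this section.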
9.6.3.1 Example(示例)
1 | "devices": [ |
9.6.4 Memory(内存)
memory
(object, OPTIONAL) represents the cgroup subsystem memory
and it’s used to set limits on the container’s memory usage. For more information, see the kernel cgroups documentation about memory.
memory
(对象类型,可选)代表控制组(cgroup)的内存子系统,用于对容器的内存使用设置限制。有关更多信息,请参阅内核控制组文档中关于内存的部分。
Values for memory specify the limit in bytes, or -1
for unlimited memory.
memory 的值用于指定内存限制(以字节为单位),或设为 -1
表示无内存限制。
limit (int64, OPTIONAL) - sets limit of memory usage
limit(int64 类型,可选)—— 设置内存使用的限制
reservation (int64, OPTIONAL) - sets soft limit of memory usage
reservation(int64 类型,可选)—— 设置内存使用的软限制
swap (int64, OPTIONAL) - sets limit of memory+Swap usage
swap(int64 类型,可选)—— 设置内存 + 交换分区(Swap)使用的限制
kernel (int64, OPTIONAL, NOT RECOMMENDED) - sets hard limit for kernel memory
kernel(int64 类型,可选,不推荐)—— 设置内核内存的硬限制
kernelTCP (int64, OPTIONAL, NOT RECOMMENDED) - sets hard limit for kernel TCP buffer memory
kernelTCP(int64 类型,可选,不推荐)—— 设置内核 TCP 缓冲区内存的硬限制
The following properties do not specify memory limits, but are covered by the memory
controller:
以下属性并非用于指定内存限制,但受内存控制器(memory controller)管控:
swappiness (uint64, OPTIONAL) - sets swappiness parameter of vmscan (see sysctl’s vm.swappiness). The values are from 0 to 100. Higher means more swappy.
swappiness(uint64 类型,可选)—— 设置 vmscan 机制的交换活跃度参数(参见 sysctl 的 vm.swappiness)。取值范围为 0 到 100,数值越高表示交换行为越频繁。
disableOOMKiller (bool, OPTIONAL) - enables or disables the OOM killer. If enabled (false), tasks that attempt to consume more memory than they are allowed are immediately killed by the OOM killer. The OOM killer is enabled by default in every cgroup using the memory subsystem. To disable it, specify a value of true.
disableOOMKiller(布尔值,可选)—— 启用或禁用 OOM killer。如果启用(值为 false),当任务尝试消耗超过允许额度的内存时,会立即被 OOM killer 终止。在所有使用内存子系统的控制组(cgroup)中,OOM killer 默认处于启用状态。若要禁用它,需将值指定为 true。
useHierarchy (bool, OPTIONAL) - enables or disables hierarchical memory accounting. If enabled (true), child cgroups will share the memory limits of this cgroup.
useHierarchy(布尔值,可选)—— 启用或禁用层级内存统计。如果启用(值为 true),子控制组(child cgroups)将共享该控制组的内存限制。
checkBeforeUpdate (bool, OPTIONAL) - enables container memory usage check before setting a new limit. If enabled (true), the runtime MAY check whether a new memory limit is lower than the current usage, and MUST reject the new limit if it is. Practically, when cgroup v1 is used, the kernel rejects the limit lower than the current usage, and when cgroup v2 is used, an OOM killer is invoked. This setting can be used on cgroup v2 to mimic the cgroup v1 behavior.
checkBeforeUpdate(布尔值,可选)—— 启用在设置新限制前对容器内存使用情况的检查。如果启用(值为 true),运行时可以检查新内存限制是否低于当前使用量;若低于,则必须拒绝该新限制。实际场景中,当使用 cgroup v1 时,内核会直接拒绝低于当前使用量的限制;而当使用 cgroup v2 时,系统会调用 OOM killer 。此设置可在 cgroup v2 上使用,以模拟 cgroup v1 的行为。
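The checkBeforeUpdate behaviour can be sketched as a guard that runs before a new limit is written. The function name check_memory_update is hypothetical, and current_usage stands for whatever usage figure the runtime reads from the cgroup.

```python
def check_memory_update(new_limit, current_usage, check_before_update):
    """Return the limit to apply, or raise if it has to be rejected."""
    if new_limit < -1:
        raise ValueError(f"invalid memory limit: {new_limit}")
    unlimited = new_limit == -1  # -1 means unlimited memory
    if check_before_update and not unlimited and new_limit < current_usage:
        raise ValueError(
            f"new limit {new_limit} is lower than current usage {current_usage}"
        )
    return new_limit
```

With the flag off, a limit below current usage is passed through and the outcome depends on the cgroup version, as described above.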
9.6.4.1 Example(示例)
1 | "memory": { |
9.6.5 CPU
cpu
(object, OPTIONAL) represents the cgroup subsystems cpu
and cpusets
. For more information, see the kernel cgroups documentation about cpusets.
cpu
(对象类型,可选)代表控制组(cgroup)的 cpu
和 cpusets
子系统。有关更多信息,请参阅内核控制组文档中关于 cpusets
的部分。
The following parameters can be specified to set up the controller:
可通过指定以下参数来配置该控制器:
shares (uint64, OPTIONAL) - specifies a relative share of CPU time available to the tasks in a cgroup
shares(uint64 类型,可选)—— 指定控制组(cgroup)中任务可使用的 CPU 时间相对份额。
quota (int64, OPTIONAL) - specifies the total amount of time in microseconds for which all tasks in a cgroup can run during one period (as defined by period below). If specified with any (valid) positive value, it MUST be no smaller than burst (runtimes MAY generate an error).
quota(int64 类型,可选)—— 指定控制组(cgroup)中所有任务在一个周期内(如下面定义的 period)可运行的总时间(以微秒为单位)。若指定为任何(有效的)正值,则其值必须不小于 burst(运行时可以生成错误)。
burst (uint64, OPTIONAL) - specifies the maximum amount of accumulated time in microseconds for which all tasks in a cgroup can run additionally for burst during one period (as defined by period below). If specified, this value MUST be no larger than any positive quota (runtimes MAY generate an error).
burst(uint64 类型,可选)—— 指定控制组(cgroup)中所有任务在一个周期内(如下面定义的 period)可额外累积使用的最大突发运行时间(以微秒为单位)。若指定此值,则其必须不大于任何正值的 quota(运行时可以生成错误)。
period (uint64, OPTIONAL) - specifies a period of time in microseconds for how regularly a cgroup’s access to CPU resources should be reallocated (CFS scheduler only)
period(uint64 类型,可选)—— 指定控制组(cgroup)CPU 资源访问权限的重新分配周期(以微秒为单位),仅适用于 CFS 调度器。
realtimeRuntime (int64, OPTIONAL) - specifies a period of time in microseconds for the longest continuous period in which the tasks in a cgroup have access to CPU resources
realtimeRuntime(int64 类型,可选)—— 指定控制组(cgroup)中的任务可连续访问 CPU 资源的最长时间(以微秒为单位)。
realtimePeriod (uint64, OPTIONAL) - same as period but applies to realtime scheduler only
realtimePeriod(uint64 类型,可选)—— 作用与 period 相同,但仅适用于实时调度器。
cpus (string, OPTIONAL) - list of CPUs the container will run on. This is a comma-separated list, with dashes to represent ranges. For example, 0-3,7 represents CPUs 0,1,2,3, and 7.
cpus(字符串类型,可选)—— 容器将运行的 CPU 列表。这是一个用逗号分隔的列表,使用短横线表示范围。例如,0-3,7 表示 CPU 0、1、2、3 和 7。
mems (string, OPTIONAL) - list of memory nodes the container will run on. This is a comma-separated list, with dashes to represent ranges. For example, 0-3,7 represents memory nodes 0,1,2,3, and 7.
mems(字符串类型,可选)—— 容器将运行的内存节点列表。这是一个用逗号分隔的列表,使用短横线表示范围。例如,0-3,7 表示内存节点 0、1、2、3 和 7。
idle (int64, OPTIONAL) - cgroups are configured with minimum weight, 0: default behavior, 1: SCHED_IDLE.
idle(int64 类型,可选)—— 控制组(cgroups)以最低权重配置,其中 0 表示默认行为,1 表示启用 SCHED_IDLE 调度策略。
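Two of the rules above lend themselves to a short sketch: expanding the cpus / mems list syntax, and checking burst against a positive quota. Both function names are hypothetical.

```python
def parse_cpu_list(spec):
    """Expand a list such as "0-3,7" into sorted integers."""
    ids = set()
    for part in spec.split(","):
        if "-" in part:
            low_text, high_text = part.split("-")
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part!r}")
            ids.update(range(low, high + 1))
        else:
            ids.add(int(part))
    return sorted(ids)


def burst_fits_quota(quota, burst):
    """Return True unless burst exceeds a positive quota."""
    if quota is None or burst is None or quota <= 0:
        return True  # the constraint only applies to a positive quota
    return burst <= quota
```

The same parser works for mems, since both fields use the comma-and-dash list syntax.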
9.6.5.1 Example(示例)
1 | "cpu": { |
9.6.6 Block IO (块设备 IO)
blockIO (object, OPTIONAL) represents the cgroup subsystem blkio which implements the block IO controller. For more information, see the kernel cgroups documentation about blkio of cgroup v1 or io of cgroup v2.
blockIO(对象类型,可选)代表控制组(cgroup)的 blkio 子系统,该子系统实现了块设备 IO 控制器。有关更多信息,请参阅内核控制组文档中关于 cgroup v1 的 blkio 或 cgroup v2 的 io 部分。
Note that I/O throttling settings in cgroup v1 apply only to Direct I/O due to kernel implementation constraints, while this limitation does not exist in cgroup v2.
需要注意的是,由于内核实现的限制,cgroup v1 中的 I/O 限流设置仅适用于直接 I/O(Direct I/O),而这一限制在 cgroup v2 中并不存在。
The following parameters can be specified to set up the controller:
可通过指定以下参数来配置该控制器:
weight (uint16, OPTIONAL) - specifies per-cgroup weight. This is the default weight of the group on all devices until and unless overridden by per-device rules.
weight(uint16 类型,可选)—— 指定每个控制组(cgroup)的权重。这是该组在所有设备上的默认权重,除非被针对特定设备的规则覆盖。
leafWeight (uint16, OPTIONAL) - equivalent of weight for the purpose of deciding how much weight tasks in the given cgroup have while competing with the cgroup’s child cgroups.
leafWeight(uint16 类型,可选)—— 作用等同于权重(weight),用于确定在给定控制组(cgroup)的任务与该控制组的子控制组竞争资源时,该控制组任务所拥有的权重大小。
weightDevice (array of objects, OPTIONAL) - an array of per-device bandwidth weights. Each entry has the following structure:
weightDevice(对象数组类型,可选)—— 按设备划分的带宽权重数组。每个条目具有以下结构:
  major, minor (int64, REQUIRED) - major, minor numbers for device. For more information, see the mknod(1) man page.
  major、minor(int64 类型,必填)—— 设备的主设备号和次设备号。有关更多信息,请参阅 mknod(1) 手册页。
  weight (uint16, OPTIONAL) - bandwidth weight for the device.
  weight(uint16 类型,可选)—— 该设备的带宽权重。
  leafWeight (uint16, OPTIONAL) - bandwidth weight for the device while competing with the cgroup’s child cgroups, CFQ scheduler only
  leafWeight(uint16 类型,可选)—— 当与控制组的子控制组竞争时,该设备的带宽权重,仅适用于 CFQ 调度器。
  You MUST specify at least one of weight or leafWeight in a given entry, and MAY specify both.
  在单个条目中,必须至少指定 weight 或 leafWeight 中的一项,也可以同时指定两者。
throttleReadBpsDevice, throttleWriteBpsDevice (array of objects, OPTIONAL) - an array of per-device bandwidth rate limits. Each entry has the following structure:
throttleReadBpsDevice、throttleWriteBpsDevice(对象数组类型,可选)—— 按设备划分的带宽速率限制数组。每个条目具有以下结构:
  major, minor (int64, REQUIRED) - major, minor numbers for device. For more information, see the mknod(1) man page.
  major、minor(int64 类型,必填)—— 设备的主设备号和次设备号。有关更多信息,请参阅 mknod(1) 手册页。
  rate (uint64, REQUIRED) - bandwidth rate limit in bytes per second for the device
  rate(uint64 类型,必填)—— 该设备的带宽速率限制,单位为字节 / 秒。
throttleReadIOPSDevice, throttleWriteIOPSDevice (array of objects, OPTIONAL) - an array of per-device IO rate limits. Each entry has the following structure:
throttleReadIOPSDevice、throttleWriteIOPSDevice(对象数组类型,可选)—— 按设备划分的 IO 速率限制数组。每个条目具有以下结构:
  major, minor (int64, REQUIRED) - major, minor numbers for device. For more information, see the mknod(1) man page.
  major、minor(int64 类型,必填)—— 设备的主设备号和次设备号。有关更多信息,请参阅 mknod(1) 手册页。
  rate (uint64, REQUIRED) - IO rate limit for the device
  rate(uint64 类型,必填)—— 该设备的 IO 速率限制。
9.6.6.1 Example(示例)
1 | "blockIO": { |
9.6.7 Huge page limits (大页内存限制)
hugepageLimits (array of objects, OPTIONAL) represents the hugetlb controller, which allows limiting HugeTLB reservations (if supported) or usage (page fault). By default, if supported by the kernel, hugepageLimits defines the hugepage sizes and limits for HugeTLB controller reservation accounting, which limits HugeTLB reservations per control group and enforces the controller limit at reservation time and at the fault of HugeTLB memory for which no reservation exists. Otherwise, if not supported by the kernel, this should fall back to page fault accounting, which allows users to limit HugeTLB usage (page fault) per control group and enforces the limit during page fault.
hugepageLimits(对象数组类型,可选)代表大页内存(HugeTLB)控制器,该控制器可限制大页内存的预留量(若内核支持)或使用量(基于页面故障)。默认情况下,若内核支持,hugepageLimits 用于定义大页内存的大小以及大页内存控制器预留量统计的限制,这使得可以按控制组限制大页内存预留量,并在预留时以及对未预留的大页内存发生页面故障时强制执行控制器限制。若内核不支持该功能,则会回退到页面故障统计模式,此时用户可按控制组限制大页内存的使用量(基于页面故障),并在页面故障发生时强制执行该限制。
Note that reservation limits are superior to page fault limits, since reservation limits are enforced at reservation time (on mmap or shmget), and never cause the application to get a SIGBUS signal if the memory was reserved beforehand. This allows for easier fallback to alternatives such as non-HugeTLB memory. In the case of page fault accounting, it’s very hard to avoid processes getting SIGBUS, since the sysadmin needs to know precisely the HugeTLB usage of all the tasks in the system and make sure there are enough pages to satisfy all requests. Avoiding tasks getting SIGBUS on overcommitted systems is practically impossible with page fault accounting.
需要注意的是,预留限制优于页面故障限制,因为预留限制在内存预留时(如 mmap 或 shmget 操作时)即被强制执行,且如果内存已预先预留,绝不会导致应用程序收到 SIGBUS 信号。这使得系统更容易回退到其他内存方案,例如非大页内存(non-HugeTLB memory)。而在页面故障统计模式下,很难避免进程收到 SIGBUS 信号,因为系统管理员需要精确掌握系统中所有任务的大页内存使用情况,并确保有足够的页面来满足所有请求。在内存超分配的系统中,依靠页面故障统计来避免任务收到 SIGBUS 信号实际上是不可能的。
For more information, see the kernel cgroups documentation about HugeTLB.
有关更多信息,请参阅内核控制组(cgroups)文档中关于大页内存(HugeTLB)的部分。
Each entry has the following structure:
每个实体具有以下结构:
pageSize (string, REQUIRED) - hugepage size. The value has the format <size><unit-prefix>B (64KB, 2MB, 1GB), and must match the <hugepagesize> of the corresponding control file found in /sys/fs/cgroup/hugetlb/hugetlb.<hugepagesize>.rsvd.limit_in_bytes (if hugetlb_cgroup reservation is supported) or /sys/fs/cgroup/hugetlb/hugetlb.<hugepagesize>.limit_in_bytes (if not supported). Values of <unit-prefix> are intended to be parsed using base 1024 (“1KB” = 1024, “1MB” = 1048576, etc).
pageSize(字符串类型,必填)—— 大页内存大小。其值的格式为 <size><unit-prefix>B(例如 64KB、2MB、1GB),且必须与 /sys/fs/cgroup/hugetlb/hugetlb.<hugepagesize>.rsvd.limit_in_bytes(若内核支持大页内存控制组预留功能)或 /sys/fs/cgroup/hugetlb/hugetlb.<hugepagesize>.li(若不支持该功能)对应的控制文件中的 <hugepagesize> 相匹配。<unit-prefix> 的取值需按 1024 进制解析(例如 “1KB”= 1024 字节,“1MB”= 1048576 字节等)。
limit (uint64, REQUIRED) - limit in bytes of hugepagesize HugeTLB reservations (if supported) or usage.
limit(uint64 类型,必填)—— 大页内存预留量(若支持预留功能)或使用量的字节限制。
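The pageSize format and the choice of control file can be sketched as below. The function names are hypothetical, and accepting only the KB, MB and GB prefixes shown in the examples above is an assumption of this sketch.

```python
import re

# Base-1024 multipliers for the unit prefixes used in the examples above.
_UNITS = {"KB": 1024, "MB": 1024 ** 2, "GB": 1024 ** 3}


def parse_page_size(page_size):
    """Convert a pageSize string such as "2MB" into bytes."""
    match = re.fullmatch(r"(\d+)(KB|MB|GB)", page_size)
    if match is None:
        raise ValueError(f"invalid pageSize: {page_size!r}")
    return int(match.group(1)) * _UNITS[match.group(2)]


def hugetlb_control_file(page_size, reservation_supported):
    """Name the control file that would receive the limit for this page size."""
    suffix = "rsvd.limit_in_bytes" if reservation_supported else "limit_in_bytes"
    return f"hugetlb.{page_size}.{suffix}"
```

The pageSize string is used unchanged in the file name, which is why it has to match the <hugepagesize> component exactly.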
9.6.7.1 Example(示例)
1 | "hugepageLimits": [ |
9.6.8 Network(网络)
network (object, OPTIONAL) represents the cgroup subsystems net_cls and net_prio. For more information, see the kernel cgroups documentation about net_cls cgroup and net_prio cgroup.
network
(对象类型,可选)代表控制组(cgroup)的 net_cls
和 net_prio
子系统。有关更多信息,请参阅内核控制组文档中关于 net_cls
控制组和 net_prio
控制组的部分。
The following parameters can be specified to set up the controller:
可通过指定以下参数来配置该控制器:
classID
(uint32, OPTIONAL) - is the network class identifier the cgroup’s network packets will be tagged withclassID
(uint32 类型,可选)—— 是该控制组(cgroup)的网络数据包将被标记的网络类别标识符。priorities
(array of objects, OPTIONAL)- specifies a list of objects of the priorities assigned to traffic originating from processes in the group and egressing the system on various interfaces. The following parameters can be specified per-priority:priorities
(对象数组类型,可选)—— 指定一组对象,这些对象定义了分配给源自该控制组内进程、且经各类接口流出系统的流量的优先级。每个优先级可通过以下参数进行配置:name
(string, REQUIRED) - interface name in runtime network namespacename
(字符串类型,必填)—— 运行时网络命名空间的接口名称。priority
(uint32, REQUIRED) - priority applied to the interfacepriority
(uint32 类型,必填)—— 应用于该接口的优先级。
9.6.8.1 Example(示例)
1 | "network": { |
9.6.9 PIDs (进程ID)
pids
(object, OPTIONAL) represents the cgroup subsystem pids
. For more information, see the kernel cgroups documentation about pids.
pids
(对象类型,可选)代表控制组(cgroup)的 pids
子系统。有关更多信息,请参阅内核控制组文档中关于 pids
子系统的部分。
The following parameters can be specified to set up the controller:
可通过指定以下参数来配置该控制器:
limit
(int64, REQUIRED) - specifies the maximum number of tasks in the cgrouplimit
(int64 类型,必填)—— 指定该控制组(cgroup)中任务的最大数量。
9.6.9.1 Example(示例)
1 | "pids": { |
9.6.10 RDMA(远程直接内存访问)
rdma
(object, OPTIONAL) represents the cgroup subsystem rdma
. For more information, see the kernel cgroups documentation about rdma.
rdma
(对象类型,可选)代表控制组(cgroup)的 rdma
子系统。有关更多信息,请参阅内核控制组文档中关于 rdma
子系统的部分。
The name of the device to limit is the entry key. Entry values are objects with the following properties:
需要限制的设备名称为条目的键。条目值为具有以下属性的对象:
hcaHandles
(uint32, OPTIONAL) - specifies the maximum number of hca_handles in the cgrouphcaHandles
(uint32 类型,可选)—— 指定该控制组(cgroup)中 hca_handles 的最大数量。hcaObjects
(uint32, OPTIONAL) - specifies the maximum number of hca_objects in the cgrouphcaObjects
(uint32 类型,可选)—— 指定该控制组(cgroup)中 hca_objects 的最大数量。
You MUST specify at least one of the hcaHandles
or hcaObjects
in a given entry, and MAY specify both.
*在单个条目中,必须至少指定 hcaHandles
或 hcaObjects
中的一项,也可以同时指定两者。*
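A sketch of turning one rdma entry into a limit line follows. The function name render_rdma_limit is hypothetical, and the output assumes the `<device> hca_handle=<n> hca_object=<n>` line format of the kernel's rdma.max file.

```python
def render_rdma_limit(device, limits):
    """Render one `rdma` entry as a single limit line for the named device."""
    parts = []
    if "hcaHandles" in limits:
        parts.append(f"hca_handle={limits['hcaHandles']}")
    if "hcaObjects" in limits:
        parts.append(f"hca_object={limits['hcaObjects']}")
    if not parts:
        # At least one of hcaHandles or hcaObjects MUST be specified.
        raise ValueError(f"no limits given for device {device!r}")
    return " ".join([device] + parts)
```

An entry that sets only one of the two properties produces a line with just that key.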
9.6.10.1 Example(示例)
1 | "rdma": { |
9.6.11 Unified
unified (object, OPTIONAL) allows cgroup v2 parameters to be set and modified for the container.
unified
(对象类型,可选)允许为容器设置和修改 cgroup v2 参数。
Each key in the map refers to a file in the cgroup unified hierarchy.
该映射中的每个键均对应 cgroup 统一层级结构中的一个文件。
The OCI runtime MUST ensure that the needed cgroup controllers are enabled for the cgroup.
OCI 运行时必须确保为该控制组(cgroup)启用所需的控制组控制器。
Configuration unknown to the runtime MUST still be written to the relevant file.
运行时未知的配置仍必须写入相关文件。
The runtime MUST generate an error when the configuration refers to a cgroup controller that is not present or that cannot be enabled.
当配置引用不存在或无法启用的控制组(cgroup)控制器时,运行时必须生成错误。
9.6.11.1 Example(示例)
1 | "unified": { |
If a controller is enabled on the cgroup v2 hierarchy but the configuration is provided for the cgroup v1 equivalent controller, the runtime MAY attempt a conversion.
如果某个控制器在 cgroup v2 层级结构中已启用,但配置是为对应的 cgroup v1 控制器提供的,运行时可以尝试进行转换。
If the conversion is not possible the runtime MUST generate an error.
如果转换无法实现,运行时必须生成错误。
9.6.12 IntelRdt (Intel Resource Director Technology)
intelRdt
(object, OPTIONAL) represents the Intel Resource Director Technology. If intelRdt
is set, the runtime MUST write the container process ID to the tasks
file in a proper sub-directory in a mounted resctrl
pseudo-filesystem. That sub-directory name is specified by closID
parameter. If no mounted resctrl
pseudo-filesystem is available in the runtime mount namespace, the runtime MUST generate an error.
intelRdt
(对象类型,可选)代表英特尔资源分配技术(Intel Resource Director Technology)。若设置了 intelRdt
,运行时必须将容器进程 ID 写入已挂载的 resctrl
伪文件系统中相应子目录下的 tasks
文件。该子目录的名称由 closID
参数指定。如果运行时的挂载命名空间中没有可用的已挂载 resctrl
伪文件系统,运行时必须生成错误。
If intelRdt
is not set, the runtime MUST NOT manipulate any resctrl
pseudo-filesystems.
若未设置 intelRdt
,运行时不得对任何 resctrl
伪文件系统进行操作。
The following parameters can be specified for the container:
可为此容器指定以下参数:
closID
(string, OPTIONAL) - specifies the identity for RDT Class of Service (CLOS).
closID
(字符串类型,可选)—— 指定 RDT 服务等级(Class of Service,CLOS)的标识。
l3CacheSchema
(string, OPTIONAL) - specifies the schema for L3 cache id and capacity bitmask (CBM). The value SHOULD start with L3: and SHOULD NOT contain newlines.
l3CacheSchema
(字符串类型,可选)—— 指定 L3 缓存 ID 和容量位掩码(CBM)的模式。该值应当以 L3: 开头,且不应包含换行符。
memBwSchema
(string, OPTIONAL) - specifies the schema of memory bandwidth per L3 cache id. The value MUST start with MB: and MUST NOT contain newlines.
memBwSchema
(字符串类型,可选)—— 指定每个 L3 缓存 ID 的内存带宽模式。该值必须以 MB: 开头,且不得包含换行符。
The following rules on parameters MUST be applied:
以下参数规则必须被遵守:
If both l3CacheSchema and memBwSchema are set, runtimes MUST write the combined value to the schemata file in the sub-directory discussed in closID.
若同时设置了 l3CacheSchema 和 memBwSchema,运行时必须将组合后的值写入 closID 中所述子目录下的 schemata 文件。
If l3CacheSchema contains a line beginning with MB:, the value written to the schemata file MUST be the non-MB: line(s) from l3CacheSchema and the line from memBwSchema.
若 l3CacheSchema 包含以 MB: 开头的行,则写入 schemata 文件的值必须为 l3CacheSchema 中所有非 MB: 开头的行,以及 memBwSchema 中的行。
If either l3CacheSchema or memBwSchema is set, runtimes MUST write the value to the schemata file in the sub-directory discussed in closID.
若 l3CacheSchema 或 memBwSchema 中任一参数被设置,运行时必须将对应的值写入 closID 中所述子目录下的 schemata 文件。
If neither l3CacheSchema nor memBwSchema is set, runtimes MUST NOT write to schemata files in any resctrl pseudo-filesystems.
若 l3CacheSchema 和 memBwSchema 均未设置,运行时不得向任何 resctrl 伪文件系统中的 schemata 文件写入内容。
If closID is not set, runtimes MUST use the container ID from start and create the <container-id> directory.
若未设置 closID,运行时必须使用启动(start)时的容器 ID 并创建 <container-id> 目录。
If closID is set, and l3CacheSchema and/or memBwSchema is set:
若已设置 closID,且设置了 l3CacheSchema 和/或 memBwSchema:
If the closID directory in a mounted resctrl pseudo-filesystem does not exist, runtimes MUST create it.
若已挂载的 resctrl 伪文件系统中不存在 closID 目录,运行时必须创建该目录。
If the closID directory in a mounted resctrl pseudo-filesystem exists, runtimes MUST compare the l3CacheSchema and/or memBwSchema value with the schemata file, and generate an error if they do not match.
若已挂载的 resctrl 伪文件系统中存在 closID 目录,运行时必须将 l3CacheSchema 和/或 memBwSchema 的值与 schemata 文件进行比对,若不匹配则生成错误。
If closID is set, and neither l3CacheSchema nor memBwSchema is set, the runtime MUST check whether the corresponding pre-configured closID directory is present in the mounted resctrl pseudo-filesystem. If such a pre-configured closID directory exists, the runtime MUST assign the container to this closID; if the directory does not exist, the runtime MUST generate an error.
若已设置 closID,且 l3CacheSchema 和 memBwSchema 均未设置,运行时必须检查已挂载的 resctrl 伪文件系统中是否存在对应的预配置目录 closID。如果该预配置目录 closID 存在,运行时必须将容器分配至该 closID;若该目录不存在,则运行时必须生成错误。
enableCMT
(boolean, OPTIONAL) - specifies if Intel RDT CMT should be enabled:
enableCMT
(布尔类型,可选)—— 指定是否应启用英特尔资源分配技术(Intel RDT)的缓存监控技术(CMT)。
CMT (Cache Monitoring Technology) supports monitoring of the last-level cache (LLC) occupancy for the container.
CMT(缓存监控技术,Cache Monitoring Technology)支持对容器的末级缓存(LLC,Last-Level Cache)占用情况进行监控。
enableMBM
(boolean, OPTIONAL) - specifies if Intel RDT MBM should be enabled:
enableMBM
(布尔类型,可选)—— 指定是否应启用英特尔资源分配技术(Intel RDT)的内存带宽监控(MBM)。
MBM (Memory Bandwidth Monitoring) supports monitoring of total and local memory bandwidth for the container.
MBM(内存带宽监控,Memory Bandwidth Monitoring)支持对容器的总内存带宽和本地内存带宽进行监控。
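The rules for combining l3CacheSchema and memBwSchema into the schemata file can be sketched as a small pure function. This is an illustrative reading of the rules above, not a runtime's actual code: the name build_schemata, the newline-joined output with a trailing newline, and the sample values in the usage note are assumptions.

```python
def build_schemata(l3_cache_schema=None, mem_bw_schema=None):
    """Return the text to write to `schemata`, or None when nothing is set."""
    if not l3_cache_schema and not mem_bw_schema:
        # Neither value set: the runtime MUST NOT write to schemata.
        return None
    lines = []
    if l3_cache_schema:
        for line in l3_cache_schema.splitlines():
            # When memBwSchema is also set, its line takes the place of any
            # MB: line carried inside l3CacheSchema.
            if mem_bw_schema and line.startswith("MB:"):
                continue
            if line:
                lines.append(line)
    if mem_bw_schema:
        lines.append(mem_bw_schema)
    # Assumption: one schema per line, terminated by a newline.
    return "\n".join(lines) + "\n"
```

For instance, build_schemata("L3:0=ff\nMB:0=50", "MB:0=20") keeps the L3: line and takes the MB: line from the second argument.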
9.6.12.1 Example(示例)
Consider a two-socket machine with two L3 caches where the default CBM is 0x7ff and the max CBM length is 11 bits, and minimum memory bandwidth of 10% with a memory bandwidth granularity of 10%.
假设有一台双插槽服务器,配备两个 L3 缓存,其默认容量位掩码(CBM)为 0x7ff,最大 CBM 长度为 11 位,且内存带宽的最小值为 10%,内存带宽粒度为 10%。
Tasks inside the container only have access to the “upper” 7/11 of L3 cache on socket 0 and the “lower” 5/11 L3 cache on socket 1, and may use a maximum memory bandwidth of 20% on socket 0 and 70% on socket 1.
容器内的任务仅能访问插槽 0 上 L3 缓存的 “高 7/11 部分” 和插槽 1 上 L3 缓存的 “低 5/11 部分”,且在插槽 0 上的最大内存带宽使用限制为 20%,在插槽 1 上的最大内存带宽使用限制为 70%。
1 | "linux": { |
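The capacity bitmasks implied by the description above can be derived with a little bit arithmetic. The sketch below assumes that the "upper" and "lower" portions are contiguous runs of bits at the top and bottom of the 11-bit mask; the helper name contiguous_cbm is hypothetical.

```python
def contiguous_cbm(total_bits, width, upper):
    """Return a mask with `width` contiguous bits set inside `total_bits` bits."""
    if not 0 < width <= total_bits:
        raise ValueError("width must be between 1 and total_bits")
    mask = (1 << width) - 1
    if upper:
        # Shift the run of ones to the most significant end of the mask.
        mask <<= total_bits - width
    return mask


# Socket 0: upper 7 of 11 bits; socket 1: lower 5 of 11 bits.
l3_cache_schema = "L3:0={:x};1={:x}".format(
    contiguous_cbm(11, 7, upper=True),
    contiguous_cbm(11, 5, upper=False),
)
# Socket 0: 20% memory bandwidth; socket 1: 70%.
mem_bw_schema = "MB:0={};1={}".format(20, 70)
```

Both percentages are multiples of the 10% granularity and are not below the 10% minimum stated in the example.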
9.7 Sysctl
sysctl
(object, OPTIONAL) allows kernel parameters to be modified at runtime for the container. For more information, see the sysctl(8) man page.
sysctl
(对象类型,可选)允许在运行时为容器修改内核参数。有关更多信息,请参阅 sysctl(8)
手册页。
9.7.1 Example (示例)
1 | "sysctl": { |
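On Linux, kernel parameters are exposed as files under /proc/sys, so a sysctl key can be turned into a path by replacing the dots with directory separators. The sketch below assumes dot-separated keys only and uses a hypothetical helper name, sysctl_path; it does not handle keys whose components themselves contain dots.

```python
import posixpath


def sysctl_path(key, proc_sys="/proc/sys"):
    """Map a dotted sysctl key such as 'net.ipv4.ip_forward' to its file path."""
    if (
        not key
        or "/" in key
        or ".." in key
        or key.startswith(".")
        or key.endswith(".")
    ):
        raise ValueError(f"unsupported sysctl key: {key!r}")
    return posixpath.join(proc_sys, *key.split("."))
```

A runtime would then write the configured string value to that file from within the namespaces the parameter belongs to.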
9.8 Seccomp
Seccomp provides an application sandboxing mechanism in the Linux kernel. Seccomp configuration allows one to configure actions to take for matched syscalls and also allows matching on values passed as arguments to syscalls. For more information about Seccomp, see the Seccomp kernel documentation. The actions, architectures, and operators are strings that match the definitions in seccomp.h from libseccomp and are translated to corresponding values.
Seccomp 提供了 Linux 内核中的应用程序沙箱机制。Seccomp 配置允许用户为匹配到的系统调用配置相应动作,此外还支持根据传递给系统调用的参数值进行匹配。有关 Seccomp 的更多信息,请参阅 Seccomp 内核文档。其中的动作、架构和运算符均为字符串,与 libseccomp 中 seccomp.h
里的定义一致,并会被转换为对应的数值。
seccomp
(object, OPTIONAL)(对象型,可选)
The following parameters can be specified to set up seccomp:
可指定以下参数来配置 seccomp:
defaultAction
(string, REQUIRED) - the default action for seccomp. Allowed values are the same as syscalls[].action.
defaultAction
(字符串类型,必填)—— seccomp 的默认动作。允许的值与 syscalls[] 中的 action 相同。
defaultErrnoRet
(uint, OPTIONAL) - the errno return code to use. Some actions like SCMP_ACT_ERRNO and SCMP_ACT_TRACE allow specifying the errno code to return. When the action doesn’t support an errno, the runtime MUST print an error and fail. The default is EPERM.
defaultErrnoRet
(无符号整数类型,可选)—— 要使用的错误号(errno)返回码。某些动作(如 SCMP_ACT_ERRNO 和 SCMP_ACT_TRACE)允许指定返回的错误号。当动作不支持错误号时,运行时必须打印错误信息并终止运行。默认值为 EPERM(操作不允许)。
architectures
(array of strings, OPTIONAL) - the architecture used for system calls. A valid list of constants as of libseccomp v2.6.0 is shown below.
architectures
(字符串数组类型,可选)—— 用于系统调用的架构。截至 libseccomp v2.6.0 版本,有效的常量列表如下所示。
SCMP_ARCH_X86
SCMP_ARCH_X86_64
SCMP_ARCH_X32
SCMP_ARCH_ARM
SCMP_ARCH_AARCH64
SCMP_ARCH_MIPS
SCMP_ARCH_MIPS64
SCMP_ARCH_MIPS64N32
SCMP_ARCH_MIPSEL
SCMP_ARCH_MIPSEL64
SCMP_ARCH_MIPSEL64N32
SCMP_ARCH_PPC
SCMP_ARCH_PPC64
SCMP_ARCH_PPC64LE
SCMP_ARCH_S390
SCMP_ARCH_S390X
SCMP_ARCH_PARISC
SCMP_ARCH_PARISC64
SCMP_ARCH_RISCV64
SCMP_ARCH_LOONGARCH64
SCMP_ARCH_M68K
SCMP_ARCH_SH
SCMP_ARCH_SHEB
flags
(array of strings, OPTIONAL) - list of flags to use with seccomp(2).
flags
(字符串数组类型,可选)—— 用于 seccomp(2) 的标志列表。
A valid list of constants is shown below.
有效的常量列表如下所示。
SECCOMP_FILTER_FLAG_TSYNC
SECCOMP_FILTER_FLAG_LOG
SECCOMP_FILTER_FLAG_SPEC_ALLOW
SECCOMP_FILTER_FLAG_WAIT_KILLABLE_RECV
listenerPath
(string, OPTIONAL) - specifies the path of the UNIX domain socket over which the runtime will send the container process state data structure when the SCMP_ACT_NOTIFY action is used. This socket MUST use the AF_UNIX domain and the SOCK_STREAM type. The runtime MUST send exactly one container process state per connection. The connection MUST NOT be reused and it MUST be closed after sending a seccomp state. If sending to this socket fails, the runtime MUST generate an error. If the SCMP_ACT_NOTIFY action is not used this value is ignored.
listenerPath
(字符串类型,可选)—— 指定 UNIX 域套接字的路径,当使用 SCMP_ACT_NOTIFY 动作时,运行时将通过该套接字发送容器进程状态数据结构。此套接字必须使用 AF_UNIX 域和 SOCK_STREAM 类型。运行时必须为每个连接发送恰好一个容器进程状态。该连接不得被复用,且在发送 seccomp 状态后必须关闭。若向此套接字发送数据失败,运行时必须生成错误。若未使用 SCMP_ACT_NOTIFY 动作,则此值将被忽略。
The runtime sends the following file descriptors using SCM_RIGHTS and sets their names in the fds array of the container process state:
运行时通过 SCM_RIGHTS 发送以下文件描述符,并在容器进程状态的 fds 数组中设置它们的名称:
seccompFd
(string, REQUIRED) is the seccomp file descriptor returned by the seccomp syscall.
seccompFd
(字符串类型,必填)是由 seccomp 系统调用返回的 seccomp 文件描述符。
listenerMetadata
(string, OPTIONAL) - specifies opaque data to pass to the seccomp agent. This string will be sent as the metadata field in the container process state. This field MUST NOT be set if listenerPath is not set.
listenerMetadata
(字符串类型,可选)—— 指定要传递给 seccomp 代理的不透明数据。该字符串将作为容器进程状态中的 metadata 字段发送。若未设置 listenerPath,则此字段不得设置。
syscalls
(array of objects, OPTIONAL) - match a syscall in seccomp. While this property is OPTIONAL, some values of defaultAction are not useful without syscalls entries. For example, if defaultAction is SCMP_ACT_KILL and syscalls is empty or unset, the kernel will kill the container process on its first syscall. Each entry has the following structure:
syscalls
(对象数组类型,可选)—— 匹配 seccomp 中的系统调用。尽管此属性为可选,但如果没有 syscalls 条目,某些 defaultAction 的值将失去实际意义。例如,若 defaultAction 设为 SCMP_ACT_KILL 且 syscalls 为空或未设置,内核会在容器进程执行第一个系统调用时将其终止。每个条目具有以下结构:
names
(array of strings, REQUIRED) - the names of the syscalls. names MUST contain at least one entry.
names
(字符串数组类型,必填)—— 系统调用的名称。names 必须包含至少一个条目。
action
(string, REQUIRED) - the action for seccomp rules. A valid list of constants as of libseccomp v2.6.0 is shown below.
action
(字符串类型,必填)—— seccomp 规则的动作。截至 libseccomp v2.6.0 版本,有效的常量列表如下所示。
SCMP_ACT_KILL
SCMP_ACT_KILL_PROCESS
SCMP_ACT_KILL_THREAD
SCMP_ACT_TRAP
SCMP_ACT_ERRNO
SCMP_ACT_TRACE
SCMP_ACT_ALLOW
SCMP_ACT_LOG
SCMP_ACT_NOTIFY
errnoRet
(uint, OPTIONAL) - the errno return code to use. Some actions like SCMP_ACT_ERRNO and SCMP_ACT_TRACE allow specifying the errno code to return. When the action doesn’t support an errno, the runtime MUST print an error and fail. The default is EPERM.
errnoRet
(无符号整数类型,可选)—— 要使用的错误号(errno)返回码。某些动作(如 SCMP_ACT_ERRNO 和 SCMP_ACT_TRACE)允许指定返回的错误号。当动作不支持错误号时,运行时必须打印错误信息并终止运行。默认值为 EPERM(操作不允许)。
args
(array of objects, OPTIONAL) - the specific syscall in seccomp. Each entry has the following structure:
args
(对象数组类型,可选)—— seccomp 中的特定系统调用参数。每个条目具有以下结构:
index
(uint, REQUIRED) - the index for syscall arguments in seccomp.
index
(无符号整数类型,必填)—— seccomp 中系统调用参数的索引。
value
(uint64, REQUIRED) - the value for syscall arguments in seccomp.
value
(64 位无符号整数类型,必填)—— seccomp 中系统调用参数的值。
valueTwo
(uint64, OPTIONAL) - the value for syscall arguments in seccomp.
valueTwo
(64 位无符号整数类型,可选)—— seccomp 中系统调用参数的值。
op
(string, REQUIRED) - the operator for syscall arguments in seccomp. A valid list of constants as of libseccomp v2.6.0 is shown below.
op
(字符串类型,必填)—— seccomp 中系统调用参数的运算符。截至 libseccomp v2.6.0 版本,有效的常量列表如下所示。
SCMP_CMP_NE
SCMP_CMP_LT
SCMP_CMP_LE
SCMP_CMP_EQ
SCMP_CMP_GE
SCMP_CMP_GT
SCMP_CMP_MASKED_EQ
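To make the role of value, valueTwo and op concrete, the sketch below evaluates a single args entry against an actual syscall argument in plain Python. It is an explanatory model only; real filtering happens in the kernel. It assumes that value and valueTwo are handed to libseccomp as the first and second datum of the comparison, so that for SCMP_CMP_MASKED_EQ the first acts as the mask and the second as the expected result. The function name arg_matches is hypothetical.

```python
MASK64 = (1 << 64) - 1  # syscall arguments are compared as 64-bit values


def arg_matches(op, arg, value, value_two=0):
    """Return True when syscall argument `arg` satisfies the comparison."""
    arg &= MASK64
    if op == "SCMP_CMP_NE":
        return arg != value
    if op == "SCMP_CMP_LT":
        return arg < value
    if op == "SCMP_CMP_LE":
        return arg <= value
    if op == "SCMP_CMP_EQ":
        return arg == value
    if op == "SCMP_CMP_GE":
        return arg >= value
    if op == "SCMP_CMP_GT":
        return arg > value
    if op == "SCMP_CMP_MASKED_EQ":
        # Assumption: `value` is the mask, `value_two` the expected result.
        return (arg & value) == value_two
    raise ValueError(f"unknown operator: {op}")
```

With these semantics, valueTwo only matters for SCMP_CMP_MASKED_EQ, which is consistent with it being OPTIONAL.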
9.8.1 Example(示例)
1 | "seccomp": { |
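Several of the constraints stated in this section are simple structural checks that can be run before a filter is built. The sketch below is a minimal validator for a seccomp object that has been parsed into a Python dict. The function name validate_seccomp and the wording of the messages are assumptions, and only the constraints named in the text are checked.

```python
ACTIONS = {
    "SCMP_ACT_KILL", "SCMP_ACT_KILL_PROCESS", "SCMP_ACT_KILL_THREAD",
    "SCMP_ACT_TRAP", "SCMP_ACT_ERRNO", "SCMP_ACT_TRACE",
    "SCMP_ACT_ALLOW", "SCMP_ACT_LOG", "SCMP_ACT_NOTIFY",
}


def validate_seccomp(cfg):
    """Return a list of problems found in a `seccomp` object (empty if none)."""
    problems = []
    if cfg.get("defaultAction") not in ACTIONS:
        problems.append("defaultAction is REQUIRED and must be a known action")
    if "listenerMetadata" in cfg and "listenerPath" not in cfg:
        problems.append("listenerMetadata MUST NOT be set without listenerPath")
    for i, rule in enumerate(cfg.get("syscalls") or []):
        if not rule.get("names"):
            problems.append(f"syscalls[{i}].names MUST contain at least one entry")
        if rule.get("action") not in ACTIONS:
            problems.append(f"syscalls[{i}].action is REQUIRED and must be a known action")
        for j, arg in enumerate(rule.get("args") or []):
            for field in ("index", "value", "op"):
                if field not in arg:
                    problems.append(f"syscalls[{i}].args[{j}].{field} is REQUIRED")
    return problems
```

Returning a list instead of raising on the first problem lets a caller report every issue at once; a runtime could equally fail fast.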
9.9 The Container Process State (容器进程状态)
The container process state is a data structure passed via a UNIX socket. The container runtime MUST send the container process state over the UNIX socket as regular payload serialized in JSON and file descriptors MUST be sent using SCM_RIGHTS
. The container runtime MAY use several sendmsg(2)
calls to send the aforementioned data. If more than one sendmsg(2)
is used, the file descriptors MUST be sent only in the first call.
容器进程状态是一种通过 UNIX 套接字传递的数据结构。容器运行时必须将容器进程状态作为 JSON 序列化的常规有效载荷通过 UNIX 套接字发送,且文件描述符必须使用 SCM_RIGHTS 发送。容器运行时可以使用多个 sendmsg(2)
调用发送上述数据。若使用不止一个 sendmsg(2)
调用,文件描述符必须仅在第一个调用中发送。
The container process state includes the following properties:
容器进程状态包含以下属性:
ociVersion
(string, REQUIRED) is the version of the Open Container Initiative Runtime Specification with which the container process state complies.
ociVersion
(字符串类型,必填)是容器进程状态所遵循的开放容器倡议(Open Container Initiative)运行时规范版本。
fds
(array, OPTIONAL) is a string array containing the names of the file descriptors passed. The index of the name in this array corresponds to the index of the file descriptor in the SCM_RIGHTS array.
fds
(数组类型,可选)是一个字符串数组,包含所传递文件描述符的名称。该数组中名称的索引与 SCM_RIGHTS 数组中文件描述符的索引相对应。
pid
(int, REQUIRED) is the container process ID, as seen by the runtime.
pid
(整数类型,必填)是容器进程 ID,即运行时所看到的进程 ID。
metadata
(string, OPTIONAL) opaque metadata.
metadata
(字符串类型,可选)是不透明的元数据。
state
(state, REQUIRED) is the state of the container.
state
(状态类型,必填)是容器的状态。
Example sending a single seccompFd
file descriptor in the SCM_RIGHTS
array:
示例:在 SCM_RIGHTS 数组中发送单个 seccompFd 文件描述符:
1 | { |
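The transport described in this section, a JSON payload plus file descriptors carried as SCM_RIGHTS ancillary data in a single sendmsg(2) call, can be sketched with Python's socket module. The function names are hypothetical, the field values a caller passes are illustrative only, and the receiver assumes the whole payload arrives in one read, which is a simplification that suits small messages.

```python
import array
import json
import socket


def send_process_state(sock, state, fds):
    """Send `state` as JSON and `fds` via SCM_RIGHTS in one sendmsg call."""
    if len(state.get("fds", [])) != len(fds):
        raise ValueError("state['fds'] must name every descriptor passed")
    payload = json.dumps(state).encode("utf-8")
    ancillary = [(socket.SOL_SOCKET, socket.SCM_RIGHTS,
                  array.array("i", fds).tobytes())]
    sock.sendmsg([payload], ancillary)


def recv_process_state(sock, max_fds=4, bufsize=65536):
    """Receive one state message; returns (state_dict, list_of_fds)."""
    fds = array.array("i")
    data, ancdata, _flags, _addr = sock.recvmsg(
        bufsize, socket.CMSG_LEN(max_fds * fds.itemsize))
    for level, kind, cdata in ancdata:
        if level == socket.SOL_SOCKET and kind == socket.SCM_RIGHTS:
            # Ignore a truncated trailing integer, if any.
            usable = len(cdata) - (len(cdata) % fds.itemsize)
            fds.frombytes(cdata[:usable])
    return json.loads(data.decode("utf-8")), list(fds)
```

The position of each name in state["fds"] matches the position of the descriptor in the list handed to SCM_RIGHTS, which is the index correspondence the fds property describes. When this transport is used for listenerPath, the sender closes the connection after this single message.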
9.10 Rootfs Mount Propagation(根文件系统挂载传播)
rootfsPropagation
(string, OPTIONAL) sets the rootfs’s mount propagation. Its value is either shared
, slave
, private
or unbindable
. It’s worth noting that a peer group is defined as a group of VFS mounts that propagate events to each other. A nested container is defined as a container launched inside an existing container.
rootfsPropagation
(字符串类型,可选)用于设置根文件系统(rootfs)的挂载传播属性。其取值可以是 shared
(共享)、slave
(从属)、private
(私有)或 unbindable
(不可绑定)。值得注意的是,对等组(peer group) 指的是一组会相互传播事件的虚拟文件系统(VFS)挂载集合。嵌套容器(nested container) 指的是在现有容器内部启动的容器。
shared
: the rootfs mount belongs to a new peer group. This means that further mounts (e.g. nested containers) will also belong to that peer group and will propagate events to the rootfs. Note this does not mean that it’s shared with the host.
shared
(共享):根文件系统(rootfs)挂载属于一个新的对等组。这意味着后续的挂载(例如嵌套容器的挂载)也将属于该对等组,并且会向根文件系统传播事件。请注意,这并不意味着它与主机共享。
slave
: the rootfs mount receives propagation events from the host (e.g. if something is mounted on the host it will also appear in the container) but not the other way around.
slave
(从属):根文件系统(rootfs)挂载会接收来自主机的传播事件(例如,若主机上挂载了某个内容,该内容也会出现在容器中),但不会反向传播(即容器内的挂载事件不会传递到主机)。
private
: the rootfs mount doesn’t receive mount propagation events from the host and further mounts in nested containers will be isolated from the host and from the rootfs (even if the nested container rootfsPropagation option is shared).
private
(私有):根文件系统(rootfs)挂载不会接收来自主机的挂载传播事件,且嵌套容器中的后续挂载将与主机以及根文件系统相互隔离(即使嵌套容器的 rootfsPropagation 选项设为 shared 也是如此)。
unbindable
: the rootfs mount is a private mount that cannot be bind-mounted.
unbindable
(不可绑定):根文件系统(rootfs)挂载是一种私有挂载,且不能进行绑定挂载(bind-mounted)。
The Shared Subtrees article in the kernel documentation has more information about mount propagation.
内核文档中的《Shared Subtrees》一文包含了更多关于挂载传播的信息。
9.10.1 Example (示例)
1 | "rootfsPropagation": "slave", |
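Each of the four values corresponds to one of the propagation flags accepted by mount(2). The sketch below maps the configuration string to the flag value, using the constants defined in the Linux headers; the helper name propagation_flag is hypothetical, and how a runtime combines the flag with others (for example a recursive flag) is outside what this section states.

```python
# Propagation type flags from the Linux mount(2) interface.
MS_UNBINDABLE = 1 << 17
MS_PRIVATE = 1 << 18
MS_SLAVE = 1 << 19
MS_SHARED = 1 << 20

_PROPAGATION = {
    "shared": MS_SHARED,
    "slave": MS_SLAVE,
    "private": MS_PRIVATE,
    "unbindable": MS_UNBINDABLE,
}


def propagation_flag(value):
    """Translate a rootfsPropagation value into its mount(2) flag."""
    try:
        return _PROPAGATION[value]
    except KeyError:
        raise ValueError(f"invalid rootfsPropagation: {value!r}") from None
```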
9.11 Masked Paths
maskedPaths
(array of strings, OPTIONAL) will mask over the provided paths inside the container so that they cannot be read. The values MUST be absolute paths in the container namespace.
maskedPaths
(字符串数组类型,可选)将对容器内提供的路径进行屏蔽,使其无法被读取。这些值必须是容器命名空间中的绝对路径。
9.11.1 Example(示例)
1 | "maskedPaths": [ |
9.12 Readonly Paths (只读路径)
readonlyPaths
(array of strings, OPTIONAL) will set the provided paths as readonly inside the container. The values MUST be absolute paths in the container namespace.
readonlyPaths
(字符串数组类型,可选)会将提供的路径在容器内设置为只读。这些值必须是容器命名空间中的绝对路径。
9.12.1 Example (示例)
1 | "readonlyPaths": [ |
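Both maskedPaths and readonlyPaths require absolute paths in the container namespace, which is easy to check up front. The following is a small sketch with a hypothetical helper name, check_container_paths; the sample paths in the usage note are illustrative.

```python
import posixpath


def check_container_paths(field, paths):
    """Raise ValueError if any entry of `paths` is not an absolute path."""
    bad = [p for p in paths if not posixpath.isabs(p)]
    if bad:
        raise ValueError(f"{field} entries MUST be absolute paths: {bad}")
    return list(paths)
```

For example, check_container_paths("readonlyPaths", ["/proc/sys"]) passes, while a relative entry such as "proc/sys" is rejected.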
9.13 Mount Label (挂载标签)
mountLabel
(string, OPTIONAL) will set the SELinux context for the mounts in the container.
mountLabel
(字符串类型,可选)将为容器内的挂载设置 SELinux 上下文。
9.13.1 Example (示例)
1 | "mountLabel": "system_u:object_r:svirt_sandbox_file_t:s0:c715,c811" |
9.14 Personality
personality
(object, OPTIONAL) sets the Linux execution personality. For more information see the personality syscall documentation. As most of the options are obsolete and rarely used, and some reduce security, the currently supported set is a small subset of the available options.
personality
(对象类型,可选)用于设置 Linux 的执行 personality。更多信息可参考 personality
系统调用的文档。由于大多数选项已过时且极少使用,同时部分选项会降低安全性,因此当前支持的选项仅为可用选项中的一小部分。
domain
(string, REQUIRED) - the execution domain. The valid list of constants is shown below. LINUX32 will set the uname system call to show a 32 bit CPU type, such as i686.
domain
(字符串类型,必填)—— 执行域。有效的常量列表如下所示。LINUX32 会设置 uname 系统调用,使其显示 32 位 CPU 类型(例如 i686)。
LINUX
LINUX32
flags
(array of strings, OPTIONAL) - the additional flags to apply. Currently no flag values are supported.
flags
(字符串数组类型,可选)—— 要应用的附加标志。目前没有支持的标志值。
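The two domain constants correspond to the PER_LINUX and PER_LINUX32 values accepted by the personality(2) system call. The sketch below resolves a personality object into that numeric value. The function name is hypothetical, and treating any supplied flag as an error is an assumption drawn from the statement that no flag values are currently supported.

```python
# Execution domains from <sys/personality.h>.
PER_LINUX = 0x0000
PER_LINUX32 = 0x0008

_DOMAINS = {"LINUX": PER_LINUX, "LINUX32": PER_LINUX32}


def personality_value(cfg):
    """Return the personality(2) argument for a `personality` object."""
    domain = cfg.get("domain")
    if domain not in _DOMAINS:
        raise ValueError(f"invalid or missing personality domain: {domain!r}")
    if cfg.get("flags"):
        # Assumption: unsupported flags are rejected instead of ignored.
        raise ValueError(f"unsupported personality flags: {cfg['flags']}")
    return _DOMAINS[domain]
```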
10 Solaris Application Container Configuration (Solaris 应用容器配置)
Solaris application containers can be configured using the following properties. All of the properties below, except milestone, map to properties specified in the zonecfg(1M) man page.
Solaris 应用容器可通过以下属性进行配置,除 “milestone”(里程碑)外,以下所有属性均与 zonecfg(1M)
手册页中指定的属性存在映射关系。
10.1 milestone
The SMF (Service Management Facility) FMRI which should go to the “online” state before the desired process is started within the container.
在容器内启动目标进程之前,SMF(服务管理工具,Service Management Facility)的 FMRI 对应的服务必须进入 “online”(在线)状态。
milestone
(string, OPTIONAL)
milestone
(字符串类型,可选)
10.1.1 Example (示例)
1 | "milestone": "svc:/milestone/container:default" |
10.2 limitpriv
The maximum set of privileges any process in this container can obtain. The property should consist of a comma-separated privilege set specification as described in priv_str_to_set(3C) man page for the respective release of Solaris.
容器中任何进程所能获取的最大权限集。此属性应包含一个以逗号分隔的权限集说明,具体格式可参考相应 Solaris 版本的 priv_str_to_set(3C)
手册页。
limitpriv
(string, OPTIONAL)
limitpriv
(字符串类型,可选)
10.2.1 Example(示例)
1 | "limitpriv": "default" |
10.3 maxShmMemory
The maximum amount of shared memory allowed for this application container. A scale (K, M, G, T) can be applied to the value for each of these numbers (for example, 1M is one megabyte). Mapped to max-shm-memory
in zonecfg(1M) man page.
此应用容器允许使用的最大共享内存量。可对数值应用单位换算(K、M、G、T)(例如,1M 表示 1 兆字节)。该属性对应 zonecfg(1M)
手册页中的 max-shm-memory
配置项。
maxShmMemory
(string, OPTIONAL)
maxShmMemory
(字符串类型,可选)
10.3.1 Example(示例)
1 | "maxShmMemory": "512m" |
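A value with a scale suffix can be converted to a byte count as sketched below. Two assumptions are made here that this section does not state: the suffixes are treated as binary multiples (1K = 1024 bytes), and they are matched case-insensitively because the example uses a lowercase suffix. The helper name parse_scaled is hypothetical.

```python
import re

_SCALE = {"": 1, "K": 1024, "M": 1024 ** 2, "G": 1024 ** 3, "T": 1024 ** 4}


def parse_scaled(text):
    """Convert strings such as '512m' or '1G' to a number of bytes."""
    match = re.fullmatch(r"(\d+)([KMGT]?)", text.strip().upper())
    if match is None:
        raise ValueError(f"invalid scaled value: {text!r}")
    number, suffix = match.groups()
    return int(number) * _SCALE[suffix]
```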
10.4 cappedCPU
Sets a limit on the amount of CPU time that can be used by a container. The unit used translates to the percentage of a single CPU that can be used by all user threads in a container, expressed as a fraction (for example, .75) or a mixed number (whole number and fraction, for example, 1.25). An ncpus value of 1 means 100% of a CPU, a value of 1.25 means 125%, .75 means 75%, and so forth. When projects within a capped container have their own caps, the minimum value takes precedence. cappedCPU is mapped to capped-cpu
in zonecfg(1M) man page.
设置容器可使用的 CPU 时间限制。所用单位表示容器中所有用户线程可使用单个 CPU 的百分比,以小数形式(例如,.75)或带分数形式(整数加小数,例如,1.25)表示。ncpu
值为 1 表示占用 100% 的 CPU,值为 1.25 表示占用 125%,值为 .75 表示占用 75%,依此类推。当受限容器内的项目自身设有上限时,以较小的数值为准。cappedCPU
对应 zonecfg(1M)
手册页中的 capped-cpu
配置项。
ncpus
(string, OPTIONAL)
ncpus
(字符串类型,可选)
10.4.1 Example(示例)
1 | "cappedCPU": { |
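The relationship between an ncpus string and a CPU percentage, and the rule that the smaller cap wins, can be written down directly. This is a sketch with hypothetical helper names; Decimal is used so that values such as ".75" convert exactly.

```python
from decimal import Decimal


def ncpus_to_percent(ncpus):
    """Convert an ncpus string such as '1.25' to a percentage of one CPU."""
    value = Decimal(ncpus)
    if value <= 0:
        raise ValueError(f"ncpus must be positive: {ncpus!r}")
    return value * 100


def effective_cap(container_ncpus, project_ncpus):
    """When both a container and a project cap exist, the minimum applies."""
    return min(Decimal(container_ncpus), Decimal(project_ncpus))
```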
10.5 cappedMemory
The physical and swap caps on the memory that can be used by this application container. A scale (K, M, G, T) can be applied to the value for each of these numbers (for example, 1M is one megabyte). cappedMemory is mapped to capped-memory
in zonecfg(1M) man page.
此应用容器可使用的物理内存和交换内存上限。可对数值应用单位换算(K、M、G、T)(例如,1M 表示 1 兆字节)。cappedMemory
对应 zonecfg(1M)
手册页中的 capped-memory
配置项。
physical
(string, OPTIONAL)
physical
(字符串类型,可选)
swap
(string, OPTIONAL)
swap
(字符串类型,可选)
10.5.1 Example(示例)
1 | "cappedMemory": { |
10.6 Network
10.6.1 Automatic Network (anet)
anet is specified as an array that is used to set up networking for Solaris application containers. The anet resource represents the automatic creation of a network resource for an application container. The zones administration daemon, zoneadmd, is the primary process for managing the container’s virtual platform. One of the daemon’s responsibilities is creation and teardown of the networks for the container. For more information on the daemon see the zoneadmd(1M) man page. When such a container is started, a temporary VNIC(Virtual NIC) is automatically created for the container. The VNIC is deleted when the container is torn down. The following properties can be used to set up automatic networks. For additional information on properties, check the zonecfg(1M) man page for the respective release of Solaris.
anet
被指定为一个数组,用于为 Solaris 应用容器配置网络。anet
资源表示为应用容器自动创建网络资源。区域管理守护进程 zoneadmd
是管理容器虚拟平台的主要进程,该守护进程的职责之一是为容器创建和销毁网络。有关该守护进程的更多信息,请参阅 zoneadmd(1M)
手册页。
当此类容器启动时,会自动为其创建一个临时的 VNIC(虚拟网卡);当容器被销毁时,该 VNIC 也会被删除。以下属性可用于配置自动网络。有关属性的更多信息,请查阅相应 Solaris 版本的 zonecfg(1M)
手册页。
linkname
(string, OPTIONAL) Specify a name for the automatically created VNIC datalink.
linkname
(字符串类型,可选)指定自动创建的 VNIC 数据链路的名称。
lowerLink
(string, OPTIONAL) Specify the link over which the VNIC will be created. Mapped to lower-link in the zonecfg(1M) man page.
lowerLink
(字符串类型,可选)指定用于创建 VNIC 的底层链路。对应 zonecfg(1M) 手册页中的 lower-link 配置项。
allowedAddress
(string, OPTIONAL) The set of IP addresses that the container can use might be constrained by specifying the allowedAddress property. If allowedAddress has not been specified, then the container can use any IP address on the associated physical interface for the network resource. Otherwise, when allowedAddress is specified, the container cannot use IP addresses that are not in the allowedAddress list for the physical address. Mapped to allowed-address in the zonecfg(1M) man page.
allowedAddress
(字符串类型,可选)通过指定 allowedAddress 属性,可限制容器能够使用的 IP 地址集。如果未指定 allowedAddress,则容器可以使用该网络资源关联物理接口上的任何 IP 地址;反之,当指定了 allowedAddress 时,容器不得使用不在该列表中的物理地址对应的 IP 地址。对应 zonecfg(1M) 手册页中的 allowed-address 配置项。
configureAllowedAddress
(string, OPTIONAL) If configureAllowedAddress is set to true, the addresses specified by allowedAddress are automatically configured on the interface each time the container starts. When it is set to false, the allowedAddress will not be configured on container start. Mapped to configure-allowed-address in the zonecfg(1M) man page.
configureAllowedAddress
(字符串类型,可选)若将 configureAllowedAddress 设置为 true,则每次容器启动时,allowedAddress 所指定的地址会自动配置到接口上;若设置为 false,则容器启动时不会配置 allowedAddress。该属性对应 zonecfg(1M) 手册页中的 configure-allowed-address 配置项。
defrouter
(string, OPTIONAL) The value for the OPTIONAL default router.
defrouter
(字符串类型,可选)这是可选默认路由器的值。
macAddress
(string, OPTIONAL) Set the VNIC’s MAC addresses based on the specified value or keyword. If not a keyword, it is interpreted as a unicast MAC address. For a list of the supported keywords please refer to the zonecfg(1M) man page of the respective Solaris release. Mapped to mac-address in the zonecfg(1M) man page.
macAddress
(字符串类型,可选)根据指定的值或关键字设置 VNIC 的 MAC 地址。如果该值不是关键字,则会被解析为单播 MAC 地址。有关支持的关键字列表,请参阅相应 Solaris 版本的 zonecfg(1M) 手册页。该属性对应 zonecfg(1M) 手册页中的 mac-address 配置项。
linkProtection
(string, OPTIONAL) Enables one or more types of link protection using comma-separated values. See the protection property in dladm(8) for supported values in the respective release of Solaris. Mapped to link-protection in the zonecfg(1M) man page.
linkProtection
(字符串类型,可选)通过逗号分隔的值启用一种或多种链路保护类型。有关相应 Solaris 版本支持的值,请参阅 dladm(8) 手册页中的 protection 属性说明。该属性对应 zonecfg(1M) 手册页中的 link-protection 配置项。
10.6.2 Example(示例)
1 | "anet": [ |
11 Features Structure (功能结构)
A runtime MAY provide a JSON structure about its implemented features to runtime callers. This JSON structure is called “Features structure”.
运行时环境可以向运行时调用方提供一个关于其已实现功能的 JSON 结构。该 JSON 结构被称为 “功能结构(Features structure)”。
The Features structure is irrelevant to the actual availability of the features in the host operating system. Hence, the content of the Features structure SHOULD be determined on the compilation time of the runtime, not on the execution time.
功能结构(Features structure)与主机操作系统中功能的实际可用性无关。因此,功能结构的内容应在运行时的编译阶段确定,而非执行阶段。
All properties in the Features structure except ociVersionMin
and ociVersionMax
MAY either be absent or have the null
value. The null
value MUST NOT be confused with an empty value such as 0
, false
, ""
, []
, and {}
.
功能结构(Features structure)中,除 ociVersionMin
和 ociVersionMax
之外的所有属性可以缺省,也可以取值为 null。null 值不得将其与 0、false、””、[] 和 {} 等空值混淆。
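The distinction between an absent or null property and an empty one matters when a caller inspects the Features structure. The sketch below shows one way a consumer might keep the two apart. The function name and the labels it returns are assumptions, as is the interpretation that absent and null are handled alike.

```python
import json


def feature_state(features, name):
    """Classify a Features property as 'not reported', 'empty' or 'reported'."""
    value = features.get(name)
    if value is None:
        # Absent and null are treated alike here: nothing is declared.
        return "not reported"
    if value in (0, False, "", [], {}):
        # An empty value is an explicit declaration, distinct from null.
        return "empty"
    return "reported"


features = json.loads(
    '{"ociVersionMin": "1.0.0", "ociVersionMax": "1.2.1",'
    ' "hooks": null, "mountOptions": []}'
)
```

Under this reading, "mountOptions": [] states that no mount options are recognized, whereas "hooks": null and a missing annotations key say nothing either way.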
11.1 Specification version
ociVersionMin
(string, REQUIRED) The minimum recognized version of the Open Container Initiative Runtime Specification. The runtime MUST accept this value as the ociVersion property of config.json.
ociVersionMin
(字符串类型,必填)Open Container Initiative(开放容器倡议)运行时规范的最低可识别版本。运行时必须接受此值作为 config.json 的 ociVersion 属性。
ociVersionMax
(string, REQUIRED) The maximum recognized version of the Open Container Initiative Runtime Specification. The runtime MUST accept this value as the ociVersion property of config.json. The value MUST NOT be less than the value of the ociVersionMin property. The Features structure MUST NOT contain properties that are not defined in this version of the Open Container Initiative Runtime Specification.
ociVersionMax
(字符串类型,必填)开放容器倡议(Open Container Initiative)运行时规范的最高可识别版本。运行时必须接受此值作为 config.json 的 ociVersion 属性。该值不得小于 ociVersionMin 属性的值。功能结构(Features structure)不得包含本版本开放容器倡议运行时规范中未定义的属性。
11.1.1 Example (示例)
1 | { |
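A caller can use the two version properties to decide whether a given config.json is likely to be accepted. The sketch below compares versions as numeric triples. The helper names are hypothetical, pre-release and build suffixes are ignored for simplicity, and treating every version between the two bounds as accepted is an assumption: the text above only requires the runtime to accept the minimum and maximum values themselves.

```python
def parse_version(version):
    """Parse 'MAJOR.MINOR.PATCH', ignoring any '-pre' or '+build' suffix."""
    core = version.split("-", 1)[0].split("+", 1)[0]
    parts = core.split(".")
    if len(parts) != 3:
        raise ValueError(f"not a MAJOR.MINOR.PATCH version: {version!r}")
    return tuple(int(p) for p in parts)


def within_declared_range(features, oci_version):
    """Check oci_version against ociVersionMin and ociVersionMax."""
    low = parse_version(features["ociVersionMin"])
    high = parse_version(features["ociVersionMax"])
    if high < low:
        raise ValueError("ociVersionMax MUST NOT be less than ociVersionMin")
    return low <= parse_version(oci_version) <= high
```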
11.2 Hooks (钩子)
hooks
(array of strings, OPTIONAL) The recognized names of the hooks. The runtime MUST support the elements in this array as the hooks property of config.json.
hooks
(字符串数组类型,可选)已识别的钩子名称。运行时必须支持此数组中的元素作为 config.json 的 hooks 属性。
11.2.1 Example(示例)
1 | "hooks": [ |
11.3 Mount Options (挂载选项)
mountOptions
(array of strings, OPTIONAL) The recognized names of the mount options, including options that might not be supported by the host operating system. The runtime MUST recognize the elements in this array as the options of mounts objects in config.json.
Linux: this array SHOULD NOT contain filesystem-specific mount options that are passed to the mount(2) syscall as const void *data.
mountOptions
(字符串数组类型,可选)已识别的挂载选项名称,包括主机操作系统可能不支持的选项。运行时必须将此数组中的元素识别为 config.json 中挂载对象(mounts objects)的选项(options)。
Linux:此数组不应包含作为 const void *data 参数传递给 mount(2) 系统调用的文件系统特定挂载选项。
11.3.1 Example(示例)
1 | "mountOptions": [ |
11.4 Platform-specific features
linux
(object, OPTIONAL) Linux-specific features. This MAY be set if the runtime supports the linux platform.
linux
(对象类型,可选)特定于 Linux 的功能。如果运行时支持 Linux 平台,可设置此属性。
11.5 Annotations (注解)
annotations
(object, OPTIONAL) contains arbitrary metadata of the runtime. This information MAY be structured or unstructured. Annotations MUST be a key-value map that follows the same convention as the Key and Values of the annotations
property of config.json
. However, annotations do not need to contain the possible values of the annotations
property of config.json
. The current version of the spec does not provide a way to enumerate the possible values of the annotations
property of config.json
.
annotations
(对象类型,可选)包含运行时的任意元数据。此类信息可以是结构化的,也可以是非结构化的。注解(annotations
)必须是一个键值映射,且需遵循与 config.json 中 annotations
属性的键(Key)和值(Values)相同的约定。不过,此处的注解无需包含 config.json 中 annotations
属性可能有的值。当前版本的规范未提供枚举 config.json 中 annotations
属性可能值的方法。
11.5.1 Example(示例)
1 | "annotations": { |
11.6 Unsafe annotations in config.json
potentiallyUnsafeConfigAnnotations
(array of strings, OPTIONAL) contains values of annotations
property of config.json
that may potentially change the behavior of the runtime.
potentiallyUnsafeConfigAnnotations
(字符串数组类型,可选)包含 config.json 中 annotations
属性的值,这些值可能会改变运行时的行为。
A value that ends with “.” is interpreted as a prefix of annotations.
以句号 “.” 结尾的值会被解析为注解的前缀。
11.6.1 Example (示例)
1 | "potentiallyUnsafeConfigAnnotations": [ |
The example above matches com.example.foo.bar
, org.systemd.property.ExecStartPre
, etc. The example does not match com.example.foo.bar.baz
.
上面的示例匹配 com.example.foo.bar、org.systemd.property.ExecStartPre 等注解。该示例不匹配 com.example.foo.bar.baz。
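The matching rule, exact match unless the value ends with ".", in which case it is a prefix, can be sketched as follows. The function name is hypothetical, and the pattern list in the usage note is an assumed one, chosen only to be consistent with the matches and the non-match described above.

```python
def is_potentially_unsafe(annotation, patterns):
    """Return True if `annotation` is covered by any declared pattern."""
    for pattern in patterns:
        if pattern.endswith("."):
            # A trailing "." turns the value into a prefix.
            if annotation.startswith(pattern):
                return True
        elif annotation == pattern:
            return True
    return False
```

With the assumed patterns ["com.example.foo.bar", "org.systemd.property."], the key com.example.foo.bar matches exactly and org.systemd.property.ExecStartPre matches by prefix, while com.example.foo.bar.baz is not matched because the first pattern is not a prefix.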
11.7 Example(示例)
Here is a full example for reference.
1 | { |
12 Linux Features Structure (Linux功能结构)
This document describes the Linux-specific section of the Features structure.
本小结基于 11 Feature structure 章节继续描述 Linux的功能结构
12.1 Namespaces (命名空间)
namespaces
(array of strings, OPTIONAL) The recognized names of the namespaces, including namespaces that might not be supported by the host operating system. The runtime MUST recognize the elements in this array as the type of linux.namespaces objects in config.json.
namespaces
(字符串数组,可选):指已识别的命名空间名称,其中包括可能不受宿主操作系统支持的命名空间。运行时必须将此数组中的元素识别为 config.json 中 linux.namespaces 对象的类型。
12.1.1 Example(示例)
1 | "namespaces": [ |
12.2 Capabilities (过滤能力集)
capabilities
(array of strings, OPTIONAL) The recognized names of the capabilities, including capabilities that might not be supported by the host operating system. The runtime MUST recognize the elements in this array in the process.capabilities object of config.json.
capabilities
(字符串数组,可选):指已识别的能力名称,其中包括可能不受宿主操作系统支持的能力。运行时必须在 config.json 的 process.capabilities 对象中识别此数组内的元素。
12.2.1 Example (示例)
1 | "capabilities": [ |
12.3 Cgroup
cgroup
(object, OPTIONAL) represents the runtime’s implementation status of cgroup managers. Irrelevant to the cgroup version of the host operating system.
cgroup
(对象类型,可选)表示运行时对 cgroup(控制组)管理器的实现状态。该状态与宿主操作系统的 cgroup 版本无关。
v1
(bool, OPTIONAL) represents whether the runtime supports cgroup v1.
v1
(布尔类型,可选)表示运行时是否支持 cgroup v1(控制组版本 1)。
v2
(bool, OPTIONAL) represents whether the runtime supports cgroup v2.
v2
(布尔类型,可选)表示运行时是否支持 cgroup v2(控制组版本 2)。
systemd
(bool, OPTIONAL) represents whether the runtime supports the system-wide systemd cgroup manager.
systemd
(布尔类型,可选)表示运行时是否支持系统级的 systemd 控制组(cgroup)管理器。
systemdUser
(bool, OPTIONAL) represents whether the runtime supports the user-scoped systemd cgroup manager.
systemdUser
(布尔类型,可选)表示运行时是否支持用户作用域的 systemd 控制组(cgroup)管理器。
rdma
(bool, OPTIONAL) represents whether the runtime supports the RDMA cgroup controller.
rdma
(布尔类型,可选)表示运行时是否支持 RDMA 控制组(cgroup)控制器。
12.3.1 Example(示例)
1 | "cgroup": { |
12.4 Seccomp
seccomp
(object, OPTIONAL) represents the runtime’s implementation status of seccomp. Irrelevant to the kernel version of the host operating system.
seccomp
(对象类型,可选)表示运行时对 seccomp(安全计算模式)的实现状态,与宿主操作系统的内核版本无关。
enabled
(bool, OPTIONAL) represents whether the runtime supports seccomp.
enabled
(布尔类型,可选)表示运行时是否支持 seccomp(安全计算模式)。
actions
(array of strings, OPTIONAL) The recognized names of the seccomp actions. The runtime MUST recognize the elements in this array in the syscalls[].action property of the linux.seccomp object in config.json.
actions
(字符串数组类型,可选)指已识别的 seccomp(安全计算模式)动作名称。运行时必须在 config.json 中 linux.seccomp 对象的 syscalls[].action 属性里,识别此数组中的元素。
operators
(array of strings, OPTIONAL) The recognized names of the seccomp operators. The runtime MUST recognize the elements in this array in the syscalls[].args[].op property of the linux.seccomp object in config.json.
operators
(字符串数组类型,可选)指已识别的 seccomp(安全计算模式)运算符名称。运行时必须在 config.json 中 linux.seccomp 对象的 syscalls[].args[].op 属性里,识别此数组中的元素。
archs
(array of strings, OPTIONAL) The recognized names of the seccomp architectures. The runtime MUST recognize the elements in this array in the architectures property of the linux.seccomp object in config.json.
archs
(字符串数组类型,可选)指已识别的 seccomp(安全计算模式)架构名称。运行时必须在 config.json 文件内 linux.seccomp 对象的 architectures 属性中,识别此数组中的元素。
knownFlags
(array of strings, OPTIONAL) The recognized names of the seccomp flags. The runtime MUST recognize the elements in this array in the flags property of the linux.seccomp object in config.json.
knownFlags
(字符串数组类型,可选):指已识别的 seccomp(安全计算模式)标志名称。运行时必须在 config.json 文件内 linux.seccomp 对象的 flags 属性中,识别此数组中的元素。
supportedFlags
(array of strings, OPTIONAL) The recognized and supported names of the seccomp flags. This list may be a subset of knownFlags because some flags may not be supported by the current kernel and/or libseccomp. The runtime MUST recognize and support the elements in this array in the flags property of the linux.seccomp object in config.json.
supportedFlags
(字符串数组类型,可选):指已识别且支持的 seccomp 标志名称。由于当前内核和/或 libseccomp(seccomp 库)可能不支持部分标志,此列表可能是 knownFlags 的子集。运行时必须在 config.json 文件内 linux.seccomp 对象的 flags 属性中,识别并支持此数组中的元素。
12.4.1 Example(示例)
1 | "seccomp": { |
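The difference between knownFlags and supportedFlags can be put to use by a caller that wants to know, before creating a container, how each requested seccomp flag will fare. The sketch below is one possible classification; the function name and the three labels are assumptions, and either list may be absent or null as permitted for Features properties.

```python
def classify_flags(requested, seccomp_features):
    """Label each requested flag using the seccomp Features object."""
    known = seccomp_features.get("knownFlags")
    supported = seccomp_features.get("supportedFlags")
    result = {}
    for flag in requested:
        if supported is not None and flag in supported:
            result[flag] = "supported"
        elif known is not None and flag in known:
            # Recognized by the runtime, but not reported as supported.
            result[flag] = "known"
        else:
            result[flag] = "unreported"
    return result
```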
12.5 AppArmor
apparmor
(object, OPTIONAL) represents the runtime’s implementation status of AppArmor. Irrelevant to the availability of AppArmor on the host operating system.
apparmor
(对象类型,可选)表示运行时对 AppArmor(应用程序盔甲)的实现状态,与宿主操作系统上 AppArmor 的可用情况无关。
enabled
(bool, OPTIONAL) represents whether the runtime supports AppArmor.
enabled
(布尔类型,可选)表示运行时是否支持 AppArmor(应用程序盔甲)。
12.5.1 Example(示例)
1 | "apparmor": { |
12.6 SELinux
selinux
(object, OPTIONAL) represents the runtime’s implementation status of SELinux. Irrelevant to the availability of SELinux on the host operating system.
selinux
(对象类型,可选)表示运行时对 SELinux(安全增强型 Linux)的实现状态,与宿主操作系统上 SELinux 的可用情况无关。
enabled
(bool, OPTIONAL) represents whether the runtime supports SELinux.
enabled
(布尔类型,可选)表示运行时是否支持 SELinux(安全增强型 Linux)。
12.6.1 Example(示例)
1 | "selinux": { |
12.7 Intel RDT
intelRdt
(object, OPTIONAL) represents the runtime’s implementation status of Intel RDT. Irrelevant to the availability of Intel RDT on the host operating system.
intelRdt
(对象类型,可选)表示运行时对 Intel RDT(英特尔资源导向技术)的实现状态,与宿主操作系统上 Intel RDT 的可用情况无关。
enabled
(bool, OPTIONAL) represents whether the runtime supports Intel RDT.
enabled
(布尔类型,可选)表示运行时是否支持 Intel RDT(英特尔资源导向技术)。
12.7.1 Example
"intelRdt": {
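The apparmor, selinux, and intelRdt objects share the same shape: an OPTIONAL object with an OPTIONAL enabled boolean. The sketch below shows one way a caller might read them. The sample values and the helper name runtime_supports are hypothetical. Because every level is OPTIONAL, the helper returns None when nothing is reported, and a True result only describes the runtime's implementation status, not whether the host actually provides the feature.

```python
import json

# Hypothetical Linux features fragment; the values are illustrative only.
LINUX_FEATURES = json.loads("""
{
  "apparmor": {"enabled": true},
  "selinux": {"enabled": false},
  "intelRdt": {}
}
""")


def runtime_supports(linux_features, name):
    """Return True or False when 'enabled' is reported for `name`, else None."""
    section = linux_features.get(name)
    if not isinstance(section, dict):
        return None  # the object itself is absent
    enabled = section.get("enabled")
    if isinstance(enabled, bool):
        return enabled
    return None  # the object is present but 'enabled' is not reported


if __name__ == "__main__":
    for name in ("apparmor", "selinux", "intelRdt"):
        print(name, runtime_supports(LINUX_FEATURES, name))
```

Keeping "not reported" distinct from False avoids treating a runtime that omits the field as one that lacks the feature.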
12.8 MountExtensions
mountExtensions
(object, OPTIONAL) represents whether the runtime supports certain mount features, irrespective of the availability of the features on the host operating system.
idmap
(object, OPTIONAL) represents whether the runtime supports idmap mounts using the uidMappings and gidMappings properties of the mount.
enabled
(bool, OPTIONAL) represents whether the runtime parses and attempts to use the uidMappings and gidMappings properties of mounts if provided. Note that runtimes may have partial implementations of id-mapped mount support (such as only allowing mounts whose mappings match the container’s user namespace, or only allowing id-mapped bind mounts). In such cases, runtimes MUST still set this value to true, to indicate that the runtime recognises the uidMappings and gidMappings properties.
12.8.1 Example
"mountExtensions": {
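As a rough illustration of how mountExtensions.idmap.enabled might be consumed, the sketch below lists the mounts in a configuration that carry uidMappings or gidMappings and checks whether the runtime reports recognizing those properties. The helper names and sample data are hypothetical. Since partial implementations still report true, a positive result does not guarantee that every id-mapped mount will succeed.

```python
def idmap_reported(mount_extensions):
    """True only when mountExtensions.idmap.enabled is explicitly true."""
    idmap = mount_extensions.get("idmap")
    return isinstance(idmap, dict) and idmap.get("enabled") is True


def mounts_using_idmap(mounts):
    """Return destinations of mounts that set uidMappings or gidMappings."""
    return [
        m["destination"]
        for m in mounts
        if m.get("uidMappings") or m.get("gidMappings")
    ]


def unrecognized_idmap_mounts(mount_extensions, mounts):
    """Return id-mapped mount destinations the runtime does not report handling."""
    if idmap_reported(mount_extensions):
        return []
    return mounts_using_idmap(mounts)


# Hypothetical mounts array from a config.json.
MOUNTS = [
    {"destination": "/proc", "type": "proc", "source": "proc"},
    {
        "destination": "/data",
        "type": "bind",
        "source": "/srv/data",
        "options": ["bind"],
        "uidMappings": [{"containerID": 0, "hostID": 1000, "size": 1}],
        "gidMappings": [{"containerID": 0, "hostID": 1000, "size": 1}],
    },
]

if __name__ == "__main__":
    print(unrecognized_idmap_mounts({"idmap": {"enabled": True}}, MOUNTS))
    print(unrecognized_idmap_mounts({}, MOUNTS))
```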
13 Glossary
13.1 Bundle
A directory structure that is written ahead of time, distributed, and used to seed the runtime for creating a container and launching a process within it.
13.2 Configuration
The config.json file in a bundle, which defines the intended container and container process.
13.3 Container
An environment for executing processes with configurable isolation and resource limitations. For example, namespaces, resource limits, and mounts are all part of the container environment.
13.4 Container namespace
On Linux, the namespaces in which the configured process executes.
13.5 Features Structure
A JSON structure that represents the implemented features of the runtime, irrespective of the actual availability of the features in the host operating system.
13.6 JSON
All configuration JSON MUST be encoded in UTF-8. JSON objects MUST NOT include duplicate names. The order of entries in JSON objects is not significant.
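The three JSON rules above can be checked mechanically. The following is a minimal sketch using only the Python standard library. The function name load_config_bytes and the exception type are hypothetical, and this is one possible way to enforce the rules rather than a prescribed one. Decoding the raw bytes as UTF-8 rejects other encodings, and an object_pairs_hook rejects duplicate names, which json.loads would otherwise silently resolve by keeping the last value.

```python
import json


class DuplicateNameError(ValueError):
    """Raised when a JSON object repeats a name."""


def _reject_duplicates(pairs):
    # `pairs` is the list of (name, value) tuples for one JSON object.
    seen = {}
    for name, value in pairs:
        if name in seen:
            raise DuplicateNameError(f"duplicate name in JSON object: {name!r}")
        seen[name] = value
    return seen


def load_config_bytes(raw: bytes):
    """Parse configuration JSON, enforcing UTF-8 and unique object names."""
    text = raw.decode("utf-8")  # raises UnicodeDecodeError for non-UTF-8 input
    return json.loads(text, object_pairs_hook=_reject_duplicates)


if __name__ == "__main__":
    # Entry order is not significant: both documents parse to equal dicts.
    a = load_config_bytes(b'{"ociVersion": "1.2.1", "hostname": "demo"}')
    b = load_config_bytes(b'{"hostname": "demo", "ociVersion": "1.2.1"}')
    print(a == b)
```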
13.7 Runtime
An implementation of this specification. It reads the configuration files from a bundle, uses that information to create a container, launches a process inside the container, and performs other lifecycle actions.
13.8 Runtime caller
An external program that executes a runtime, directly or indirectly.
Examples of direct callers include containerd, CRI-O, and Podman. Examples of indirect callers include Docker/Moby and Kubernetes.
Runtime callers often execute a runtime via a runc-compatible command-line interface; however, that interaction interface is currently out of the scope of the Open Container Initiative Runtime Specification.
13.9 Runtime namespace
On Linux, the namespaces from which new container namespaces are created and from which some configured resources are accessed.