Overview

This page contains the list of deprecations and important or breaking changes for Vault 1.13.x compared to 1.12. Please read it carefully.

Changes

Consul dataplane compatibility

If you are using Consul on Kubernetes, please be aware that upgrading to Consul 1.14.0 will impact Consul secrets, storage, and service registration. As of Consul 1.14.0, Consul on Kubernetes uses Consul Dataplane by default instead of client agents. Vault does not currently support Consul Dataplane. Please follow the Consul 1.14.0 upgrade guide to ensure that your Consul on Kubernetes deployment continues to use client agents.

Undo logs

Vault 1.13 introduced changes to add extra resiliency to log shipping with undo logs. These logs can help prevent several Merkle syncs from occurring due to rapid key changes in the primary Merkle tree as the secondary tries to synchronize. For integrated storage users, upgrading Vault to 1.13 enables this feature by default. For Consul storage users, Consul also needs to be upgraded to 1.14 to use this feature.

User lockout

As of version 1.13, Vault will stop trying to validate user credentials if the user submits multiple invalid credentials in quick succession. During lockout, Vault ignores requests from the barred user rather than responding with a permission denied error.

User lockout is enabled by default with a lockout threshold of 5 attempts, a lockout duration of 15 minutes, and a counter reset window of 15 minutes.
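
The defaults above can be stated explicitly in a user_lockout stanza of the Vault server configuration. The following is a minimal sketch that writes such a stanza to a standalone file. The file name user-lockout.hcl is an assumption for illustration; in practice the stanza would normally be merged into your existing server configuration file, and the values should be checked against the User lockout overview.

```shell
# Write a user_lockout stanza that restates the 1.13 defaults explicitly.
# The file name is hypothetical; merge the stanza into your real server config.
cat > user-lockout.hcl <<'EOF'
user_lockout "all" {
  lockout_threshold     = "5"
  lockout_duration      = "15m"
  lockout_counter_reset = "15m"
  disable_lockout       = "false"
}
EOF
```

Raising lockout_threshold or setting disable_lockout to "true" in this stanza is one way to relax the behavior if the defaults are too strict for your clients.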

For more information, refer to the User lockout overview.

Active directory secrets engine deprecation

The Active Directory (AD) secrets engine has been deprecated as of the Vault 1.13 release. We will continue to support the AD secrets engine in maintenance mode for six major Vault releases. Maintenance mode means that we will fix bugs and security issues but will not add new features. For additional information, see the deprecation table and migration guide.

AliCloud auth role parameter

The AliCloud auth plugin will now require the role parameter on login. This has always been documented as a required field but the requirement will now be enforced.

Mounts associated with removed builtin plugins will result in core shutdown on upgrade

As of 1.13.0 Standalone (logical) DB Engines and the AppId Auth Method have been marked with the Removed status. Any attempt to unseal Vault with mounts backed by one of these builtin plugins will result in an immediate shutdown of the Vault core.

NOTE In the event that an external plugin with the same name and type as a deprecated builtin is deregistered, any subsequent unseal will continue to unseal with an unusable auth backend, and a corresponding ERROR log.

$ vault plugin register -sha256=c805cf3b69f704dfcd5176ef1c7599f88adbfd7374e9c76da7f24a32a97abfe1 auth app-id
Success! Registered plugin: app-id
$ vault auth enable -plugin-name=app-id plugin
Success! Enabled app-id auth method at: app-id/
$ vault auth list -detailed | grep "app-id"
app-id/    app-id    auth_app-id_3a8f2e24    system         system     default-service    replicated     false        false                      map[]      n/a                        0018263c-0d64-7a70-fd5c-50e05c5f5dc3    n/a        n/a                      c805cf3b69f704dfcd5176ef1c7599f88adbfd7374e9c76da7f24a32a97abfe1    n/a
$ vault plugin deregister auth app-id
Success! Deregistered plugin (if it was registered): app-id
$ vault plugin list -detailed | grep "app-id"
app-id                               auth        v1.13.0+builtin.vault                                 removed
$ curl --header "X-Vault-Token: $VAULT_TOKEN" --request POST http://127.0.0.1:8200/v1/sys/seal
$ vault operator unseal <key1>
...
$ vault operator unseal <key2>
...
$ vault operator unseal <key3>
...
$ grep "app-id" /path/to/vault.log
[ERROR] core: skipping deprecated auth entry: name=app-id path=app-id/ error="mount entry associated with removed builtin"
[ERROR] core: skipping initialization for nil auth backend: path=app-id/ type=app-id version="v1.13.0+builtin.vault"

The remediation for affected mounts is to downgrade to the previously-used version of Vault and replace any Removed feature with the preferred alternative feature.
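
Before upgrading, it may help to check whether any existing mount uses one of the removed builtin types. The sketch below reads the tabular output of vault auth list or vault secrets list on standard input and flags matching rows. The function name flag_removed_mounts is hypothetical, and the list of type names (app-id plus the standalone database engines) is an assumption to verify against the deprecation table. The check matches on the Type column only, so it would also flag an external plugin registered under the same name.

```shell
# Flag mounts whose type is a builtin marked Removed in 1.13.
# Reads `vault auth list` or `vault secrets list` output on stdin
# (columns: Path, Type, ...). The type list below is an assumption.
flag_removed_mounts() {
  awk '
    BEGIN {
      n = split("app-id cassandra mongodb mssql mysql postgresql", t, " ")
      for (i = 1; i <= n; i++) removed[t[i]] = 1
    }
    ($2 in removed) { print $1 " uses removed builtin type " $2 }
  '
}

# Usage before upgrading (requires a logged-in vault CLI):
#   vault auth list    | flag_removed_mounts
#   vault secrets list | flag_removed_mounts
```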

For more information on the phases of deprecation, see the Deprecation Notices FAQ.

Impacted versions

Affects upgrading from any version of Vault to 1.13.x. All other upgrade paths are unaffected.

Application of Sentinel Role Governing Policies (RGPs) via identity groups

As of versions 1.15.0, 1.14.4, and 1.13.8, Sentinel RGPs derived from membership in identity groups apply only to entities in the same and child namespaces, relative to the identity group.

Also, the group_policy_application_mode only applies to ACL policies. Vault Sentinel Role Governing Policies (RGPs) are not affected by group policy application mode.

Known issues

Rotation configuration persistence issue could lose transform tokenization key versions

A rotation performed manually or via automatic time-based rotation after a Vault restart or leader change can result in the loss of intermediate key versions if the rotation configuration was changed after the tokenization transform was initially configured. Tokenized values from these versions would not be decodable. We recommend that customers who have enabled automatic rotation disable it, and that other customers avoid key rotation until the upcoming fix.

Affected versions

This issue affects Vault Enterprise with ADP versions 1.10.x and higher. A fix will be released in Vault 1.11.9, 1.12.5, and 1.13.1.

PKI OCSP GET requests can return HTTP redirect responses

If a base64 encoded OCSP request contains consecutive '/' characters, the GET request will return a 301 permanent redirect response. If the redirection is followed, the request will not decode as it will not be a properly base64 encoded request.

As a workaround, OCSP POST requests can be used which are unaffected.
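
A client that normally uses GET can apply the condition above before sending a request. This is a minimal sketch: the function name ocsp_method is hypothetical, and the sample string is a placeholder rather than a real OCSP request.

```shell
# Decide whether a base64-encoded OCSP request is safe to send via GET.
# Prints POST when the encoding contains consecutive '/' characters
# (the case that triggers the 301 redirect), otherwise GET.
ocsp_method() {
  case "$1" in
    *//*) echo "POST" ;;
    *)    echo "GET" ;;
  esac
}

ocsp_method "MEIwQDA//jA8MDow"   # placeholder input, prints POST
```

Always using POST is simpler and avoids the check entirely; the function only matters for clients that prefer GET for caching.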

Impacted versions

Affects all current versions of 1.12.x and 1.13.x.

PKI revocation request forwarding

If a revocation request comes in to a standby or performance secondary node, for a certificate that is present locally, the request will not be correctly forwarded to the active node of this cluster.

As a workaround, submit revocation requests to the active node only.

STS credentials do not return a lease_duration

Vault 1.13.0 introduced a change to the AWS Secrets Engine such that it no longer creates leases for STS credentials because they cannot be revoked or renewed. As part of this change, a bug was introduced which causes lease_duration to always return zero. This prevents the Vault Agent from refreshing STS credentials and may introduce undesired behaviour for anything which relies on a non-zero lease_duration.

For applications that can control what value to look for, the ttl value in the response can be used to know when to request STS credentials next.

An additional workaround for users rendering STS credentials via the Vault Agent is to set the static-secret-render-interval for a template using the credentials. Setting this configuration to 15 minutes accommodates the default minimum duration of an STS token and overrides the default render interval of 5 minutes.
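
As a sketch of the Vault Agent workaround, the snippet below writes a template_config stanza with a 15 minute render interval. It assumes the configuration key is spelled static_secret_render_interval (with underscores) inside template_config, and the file name agent-template-config.hcl is hypothetical. The stanza would be merged into your existing Agent configuration.

```shell
# Write a template_config stanza that re-renders static secrets every 15 minutes.
# The file name is hypothetical; merge the stanza into your real Agent config.
cat > agent-template-config.hcl <<'EOF'
template_config {
  static_secret_render_interval = "15m"
}
EOF
```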

Impacted versions

Affects Vault 1.13.0 only.

LDAP pagination issue

There was a regression introduced in 1.13.2 relating to LDAP maximum page sizes, resulting in an error no LDAP groups found in groupDN [...] only policies from locally-defined groups available. The issue occurs when upgrading Vault with an instance that has an existing LDAP Auth configuration.

As a workaround, disable paged searching using the following:

$ vault write auth/ldap/config max_page_size=-1

Impacted versions

Affects Vault 1.13.2.

PKI Cross-Cluster revocation requests and unified CRL/OCSP

When revoking certificates on a cluster that doesn't own the certificate, writing the revocation request will fail with a message like error persisting cross-cluster revocation request. Similar errors will appear in the log for failure to write unified CRL and unified delta CRL WAL entries.

As a workaround, submit revocation requests to the cluster which issued the certificate, or use BYOC revocation. Use cluster-local OCSP and CRLs until this is resolved.

Impacted versions

Affects Vault 1.13.0 to 1.13.2. Fixed in 1.13.3.

On upgrade, all local revocations will be synchronized between clusters; revocation requests are not persisted when failing to write cross-cluster.

Slow startup time when storing PKI certificates

There was a regression introduced in 1.13.0 where Vault is slow to start because the PKI secret engine performs a list operation on the stored certificates. If a large number of certificates are stored this can cause long start times on active and standby nodes.

There is currently no workaround for this other than limiting the number of certificates stored in Vault via PKI tidy or by using the no_store flag for PKI roles.

Impacted versions

Affects Vault 1.13.0+

Token creation with a new entity alias could silently fail

A regression caused token creation requests under specific circumstances to be forwarded from perf standbys (Enterprise only) to the active node incorrectly. They would appear to succeed, however no lease was created. The token would then be revoked on first use causing a 403 error.

This only happened when all of the following conditions were met:

the token is being created against a role
the request specifies an entity alias which has never been used before with the same role (for example for a brand new role or a unique alias)
the request happens to be made to a perf standby rather than the active node

Retrying token creation after the affected token is rejected would work since the entity alias has already been created.

Affected versions

Affects Vault 1.13.0 to 1.13.3. Fixed in 1.13.4.

API calls to update-primary may lead to data loss

Affected versions

All versions of Vault before 1.14.1, 1.13.5, 1.12.9, and 1.11.12.

Issue

The update-primary endpoint temporarily removes all mount entries except for those that are managed automatically by Vault (e.g. identity mounts). In certain situations, a race condition between mount table truncation and replication repairs may lead to data loss when updating secondary replication clusters.

Situations where the race condition may occur:

When the cluster has local data (e.g., PKI certificates, app role secret IDs) in shared mounts. Calling update-primary on a performance secondary with local data in shared mounts may corrupt the merkle tree on the secondary. The secondary still contains all the previously stored data, but the corruption means that downstream secondaries will not receive the shared data and will interpret the update as a request to delete the information. If the downstream secondary is promoted before the merkle tree is repaired, the newly promoted secondary will not contain the expected local data. The missing data may be unrecoverable if the original secondary is lost or destroyed.
When the cluster has Allow paths defined. As of Vault 1.0.3.1, startup, unseal, and calling update-primary all trigger a background job that looks at the current mount data and removes invalid entries based on path filters. When a secondary has Allow path filters, the cleanup code may misfire in the window of time after update-primary truncates the mount tables but before the mount tables are rewritten by replication. The cleanup code deletes data associated with the missing mount entries but does not modify the merkle tree. Because the merkle tree remains unchanged, replication will not know that the data is missing and needs to be repaired.

Workaround 1: PR secondary with local data in shared mounts

Watch for cleaning key in merkle tree in the TRACE log immediately after an update-primary call on a PR secondary to indicate the merkle tree may be corrupt. Repair the merkle tree by issuing a replication reindex request to the PR secondary.

If TRACE logs are no longer available, we recommend pre-emptively reindexing the PR secondary as a precaution.

Workaround 2: PR secondary with "Allow" path filters

Watch for deleted mistakenly stored mount entry from backend in the INFO log. Reindex the performance secondary to update the merkle tree with the missing data and allow replication to disseminate the changes. You will not be able to recover local data on shared mounts (e.g., PKI certificates).

If INFO logs are no longer available, query the shared mount in question to confirm whether your role and configuration data are present on the primary but missing from the secondary.
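
The two log messages named in the workarounds can be searched for in one pass. The following is a minimal sketch: the function name check_update_primary_log is hypothetical, and it assumes the log file was captured at TRACE level for the first message to be present.

```shell
# Scan a Vault log for the two messages named in the workarounds above.
# Prints a hint for each message found; returns non-zero if neither appears.
check_update_primary_log() {
  log_file="$1"
  status=1
  if grep -q "cleaning key in merkle tree" "$log_file"; then
    echo "workaround 1: merkle tree may be corrupt, reindex the PR secondary"
    status=0
  fi
  if grep -q "deleted mistakenly stored mount entry from backend" "$log_file"; then
    echo "workaround 2: mount data was removed, reindex the PR secondary"
    status=0
  fi
  return "$status"
}

# Usage: check_update_primary_log /path/to/vault.log
```

A non-zero return only means the messages were not found in that file. If the relevant logs have rotated away, the precautions described above still apply.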

PKI storage migration revives deleted issuers

Vault 1.11 introduced Storage v1, a new storage layout that supported multiple issuers within a single mount. Bug fixes in Vault 1.11.6, 1.12.2, and 1.13.0 corrected a write-ordering issue that led to invalid CA chains. Specifically, incorrectly ordered writes could fail due to load, resulting in the mount being re-migrated next time it was loaded or silently truncating CA chains. This collection of bug fixes introduced Storage v2.

Affected versions

Vault may incorrectly re-migrate legacy issuers created before Vault 1.11 that were migrated to Storage v1 and deleted before upgrading to a Vault version with Storage v2.

The migration fails when Vault finds managed keys associated with the legacy issuers that were removed from the managed key repository prior to the upgrade.

The migration error appears in Vault logs as:

Error during migration of PKI mount: failed to lookup public key from managed key: no managed key found with uuid

Note

Issuers created in Vault 1.11+ and direct upgrades to a Storage v2 layout are not affected.

The Storage v1 upgrade bug was fixed in Vault 1.14.1, 1.13.5, and 1.12.9.

Using 'update_primary_addrs' on a demoted cluster causes Vault to panic

Affected versions

1.13.3, 1.13.4 & 1.14.0

Issue

If the update_primary_addrs parameter is used on a recently demoted cluster, Vault will panic due to no longer having information about the primary cluster.

Workaround

Instead of using update_primary_addrs on the recently demoted cluster, provide an activation token.

Transit Encryption with Cloud KMS managed keys causes a panic

Affected versions

1.13.1+ up to 1.13.8 inclusive
1.14.0+ up to 1.14.4 inclusive
1.15.0

Issue

Vault panics when it receives a Transit encryption API call that is backed by a Cloud KMS managed key (Azure, GCP, AWS).

Note

The issue does not affect encryption and decryption with the following key types:

PKCS#11 managed keys
Transit native keys

Workaround

None at this time

Internal error when vault policy in namespace does not exist

If a user is a member of a group that gets a policy from a namespace other than the one they’re trying to log into, and that policy doesn’t exist, Vault returns an internal error. This impacts all auth methods.

Affected versions

1.13.8 and 1.13.9
1.14.4 and 1.14.5
1.15.0 and 1.15.1

A fix has been released in Vault 1.13.10, 1.14.6, and 1.15.2.

Workaround

During authentication, Vault derives inherited policies based on the groups an entity belongs to. Vault returns an internal error when attaching the derived policy to a token when:

the token belongs to a different namespace than the one handling authentication, and
the derived policy does not exist under the namespace.

You can resolve the error by adding the policy to the relevant namespace or deleting the group policy mapping that uses the derived policy.

As an example, consider the following userpass auth method failure. The error occurs because Vault expects a group policy that does not exist under the namespace.

# Failed login
$ vault login -method=userpass username=user1 password=123
Error authenticating: Error making API request.

URL: PUT http://127.0.0.1:8200/v1/auth/userpass/login/user1
Code: 500. Errors:

* internal error

To confirm the problem is a missing policy, start by identifying the relevant entity and group IDs:

$ vault read -format=json identity/entity/name/user1 | \
  jq '{"entity_id": .data.id, "group_ids": .data.group_ids} '
{
  "entity_id": "420c82de-57c3-df2e-2ef6-0690073b1636",
  "group_ids": [
    "6cb152b7-955d-272b-4dcf-a2ed668ca1ea"
  ]
}

Use the group ID to fetch the relevant policies for the group under the ns1 namespace:

$ vault read -format=json -namespace=ns1 \
  identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea | \
  jq '.data.policies'
[
  "group_policy"
]

Now that we know Vault is looking for a policy called group_policy, we can check whether that policy exists under the ns1 namespace:

$ vault policy list -namespace=ns1
default

The only policy in the ns1 namespace is default, which confirms that the missing policy (group_policy) is causing the error.

To fix the problem, we can either remove the missing policy from the 6cb152b7-955d-272b-4dcf-a2ed668ca1ea group or create the missing policy under the ns1 namespace.

To remove group_policy from group ID 6cb152b7-955d-272b-4dcf-a2ed668ca1ea, use the vault write command to set the applicable policies to just include default:

$ vault write                                             \
  -namespace=ns1                                          \
  identity/group/id/6cb152b7-955d-272b-4dcf-a2ed668ca1ea  \
  name="test"                                             \
  policies="default"

To create the missing policy, use vault policy write and define the appropriate capabilities:

$ vault policy write -namespace=ns1 group_policy - << EOF
    path "secret/data/*" {
        capabilities = ["create", "update"]
    }
EOF

Verify the fix by re-running the login command:

$ vault login -method=userpass username=user1 password=123

Vault is storing references to ephemeral sub-loggers leading to unbounded memory consumption

Affected versions

This memory consumption bug affects Vault Community and Enterprise versions:

1.13.7 - 1.13.9
1.14.3 - 1.14.5
1.15.0 - 1.15.1

The change that introduced this bug has been reverted as of 1.13.10, 1.14.6, and 1.15.2.

Issue

Vault is unexpectedly storing references to ephemeral sub-loggers which prevents them from being cleaned up, leading to unbounded memory consumption for loggers. This came about from a change to address a previously known issue around sub-logger levels not being adjusted on reload. This impacts many areas of Vault, but primarily logins in Enterprise.

Workaround

There is no workaround.

Sublogger levels not adjusted on reload

Affected versions

This issue affects all Vault Community and Vault Enterprise versions.

Issue

Vault does not honor a modified log_level configuration for certain subsystem loggers on SIGHUP.

The issue is known to specifically affect resolver.watcher and replication.index.* subloggers.

After modifying the log_level and issuing a reload (SIGHUP), some loggers are updated to reflect the new configuration, while some subsystem logger levels remain unchanged.

For example, after starting a server with log_level: "trace" and modifying it to log_level: "info" the following lines appear after reload:

[TRACE] resolver.watcher: dr mode doesn't have failover support, returning
...
[DEBUG] replication.index.perf: saved checkpoint: num_dirty=5
[DEBUG] replication.index.local: saved checkpoint: num_dirty=0
[DEBUG] replication.index.periodic: starting WAL GC: from=2531280 to=2531280 last=2531536
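
To see which loggers ignored a reload to log_level "info", one option is to list the logger names that still emit TRACE or DEBUG lines. This is a minimal sketch: the function name stale_subloggers is hypothetical, and it assumes the "[LEVEL] logger.name: message" layout shown in the sample lines above and a log file that contains only lines written after the reload.

```shell
# Print the unique logger names that still emit TRACE or DEBUG lines.
# Intended for a log captured after reloading with log_level set to "info".
stale_subloggers() {
  sed -n -E 's/.*\[(TRACE|DEBUG)\] +([^:]+):.*/\2/p' "$1" | sort -u
}

# Usage: stale_subloggers /path/to/vault.log
```

An empty result suggests every logger picked up the new level; any names printed are candidates for the restart workaround.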

Workaround

The workaround is to restart the Vault server.

Fatal error during expiration metrics gathering causing Vault crash

Affected versions

This issue affects Vault Community and Enterprise versions:

1.13.9
1.14.5
1.15.1

A fix has been issued in Vault 1.13.10, 1.14.6, and 1.15.2.

Issue

A recent change to Vault to improve state change speed (e.g. becoming active or standby) introduced a concurrency issue which can lead to a concurrent iteration and write on a map, causing a fatal error and crashing Vault. This error occurs when gathering lease and token metrics from the expiration manager. These metrics originate from the active node in an HA cluster. As such, a standby node will take over active duties and the cluster will remain functional should the original active node encounter this bug. The new active node will be vulnerable to the same bug, but may not encounter it immediately.

There is no workaround.

Deadlock can occur on performance secondary clusters with many mounts

Affected versions

1.15.0 - 1.15.5
1.14.5 - 1.14.9
1.13.9 - 1.13.13
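
The ranges above can be checked mechanically. The sketch below assumes a plain x.y.z version string (suffixes such as +ent are not handled) and relies on sort -V from GNU coreutils. The function names version_in_range and deadlock_affected are hypothetical.

```shell
# Return success if version $1 lies within [$2, $3] inclusive.
# Relies on `sort -V` (version sort), available in GNU coreutils.
version_in_range() {
  lowest=$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)
  highest=$(printf '%s\n' "$1" "$3" | sort -V | tail -n 1)
  [ "$lowest" = "$2" ] && [ "$highest" = "$3" ]
}

# Check a version against the three affected ranges listed above.
deadlock_affected() {
  version_in_range "$1" 1.13.9 1.13.13 ||
    version_in_range "$1" 1.14.5 1.14.9 ||
    version_in_range "$1" 1.15.0 1.15.5
}

deadlock_affected 1.14.7 && echo "1.14.7 is in an affected range"
```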

Issue

Vault 1.15.0, 1.14.5, and 1.13.9 introduced a worker pool to schedule periodic rollback operations on all mounts. This worker pool defaulted to using 256 workers. The worker pool introduced a risk of deadlocking on the active node of performance secondary clusters, leaving that cluster unable to service any requests.

The conditions required to cause the deadlock on the performance secondary:

Performance replication is enabled
The performance primary cluster has more than 256 non-local mounts. The more mounts the cluster has, the more likely the deadlock becomes
One of the following occurs:
- A replicated mount is unmounted or remounted OR
- A replicated namespace is deleted OR
- Replication path filters are used to filter at least one mount or namespace

Workaround

Set the VAULT_ROLLBACK_WORKERS environment variable to a number larger than the number of mounts in your Vault cluster and restart Vault:

$ export VAULT_ROLLBACK_WORKERS=1000
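
If you prefer to derive the value from the mount count rather than pick a fixed number, a small helper can do the arithmetic. This is a sketch only: the function name suggest_rollback_workers is hypothetical, and doubling the mount count is an arbitrary safety margin chosen for illustration, not a documented rule.

```shell
# Suggest a VAULT_ROLLBACK_WORKERS value from a mount count.
# Doubling the count is an arbitrary safety margin (an assumption, not a
# documented rule); the result never drops below the 256 default.
suggest_rollback_workers() {
  mounts="$1"
  suggested=$((mounts * 2))
  if [ "$suggested" -lt 256 ]; then
    suggested=256
  fi
  echo "$suggested"
}

# Example: a cluster with 400 mounts.
export VAULT_ROLLBACK_WORKERS="$(suggest_rollback_workers 400)"
echo "$VAULT_ROLLBACK_WORKERS"   # prints 800
```

The variable must be set in the environment of the Vault server process itself, and Vault must be restarted for it to take effect.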