Over the past year we have made a number of investments to strengthen the security of critical open source projects, and recently announced our $10 billion commitment to cybersecurity defense, including $100 million to support third-party foundations that manage open source security priorities and help fix vulnerabilities.

Today, we are excited to announce our sponsorship of the Secure Open Source (SOS) pilot program run by the Linux Foundation. This program financially rewards developers for enhancing the security of critical open source projects that we all depend on. We are starting with a $1 million investment and plan to expand the scope of the program based on community feedback.

Why SOS?
SOS rewards a very broad range of improvements that proactively harden critical open source projects and supporting infrastructure against application and supply chain attacks. It complements existing programs that reward vulnerability management by covering a comparatively wider range of work, with the aim of better supporting project developers.
What projects are in scope?

Since there is no single definition of what makes an open source project critical, our selection process will be holistic. During submission evaluation we will consider the National Institute of Standards and Technology’s definition of critical software, published in response to the recent Executive Order on Cybersecurity, along with the criteria listed below:

  • The impact of the project:
    • How many and what types of users will be affected by the security improvements?
    • Will the improvements have a significant impact on infrastructure and user security?
    • If the project were compromised, how serious or wide-reaching would the implications be?
  • The project’s rankings in existing open source criticality research.

What security improvements qualify? 

The program is initially focused on rewarding the following work:

  • Software supply chain security improvements including hardening CI/CD pipelines and distribution infrastructure. The SLSA framework suggests specific requirements to consider, such as basic provenance generation and verification.
  • Adoption of software artifact signing and verification. One option to consider is Sigstore’s set of utilities (e.g. cosign); a brief sketch appears below.
  • Project improvements that produce higher OpenSSF Scorecard results. For example, a contributor can follow remediation suggestions for the following Scorecard checks:
    • Code-Review
    • Branch-Protection
    • Pinned-Dependencies
    • Dependency-Update-Tool
    • Fuzzing
  • Use of OpenSSF Allstar and remediation of discovered issues.
  • Earning a CII Best Practice Badge (which also improves the Scorecard results).

We’ll continue adding to the above list, so check our FAQ for updates. You may also submit improvements not listed above, if you provide justification and evidence to help us understand the complexity and impact of the work.
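
For illustration, the artifact signing item above could be adopted with cosign’s key-pair flow. This is a minimal sketch, and the image name is a placeholder:

$ cosign generate-key-pair
$ cosign sign -key cosign.key registry.example.com/myproject/myimage:v1
$ cosign verify -key cosign.pub registry.example.com/myproject/myimage:v1

Here generate-key-pair writes cosign.key and cosign.pub to the current directory; anyone holding the public key can then verify the signed image.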

Only work completed after October 1, 2021 qualifies for SOS rewards.

Upfront funding is available on a limited, case-by-case basis for impactful improvements of moderate to high complexity that span a longer time frame. Such requests should explain why upfront funding is required and provide a detailed plan of how the improvements will be landed.

How to participate
Review our FAQ and fill out this form to submit your application.

Please include as much data or supporting evidence as possible to help us evaluate the significance of the project and your improvements. 


Reward amounts
Reward amounts are determined based on complexity and impact of work:

  • $10,000 or more for complicated, high-impact and lasting improvements that almost certainly prevent major vulnerabilities in the affected code or supporting infrastructure.
  • $5,000-$10,000 for moderately complex improvements that offer compelling security benefits.
  • $1,000-$5,000 for submissions of modest complexity and impact.
  • $505 for small improvements that nevertheless have merit from a security standpoint.


Looking Ahead

The SOS program is part of a broader effort to address a growing truth: the world relies on open source software, but widespread support and financial contributions are necessary to keep that software safe and secure. This $1 million investment is just the beginning—we envision the SOS pilot program as the starting point for future efforts that will hopefully bring together other large organizations and turn it into a sustainable, long-term initiative under the OpenSSF. We welcome community feedback and interest from others who want to contribute to the SOS program. Together we can pool our support to give back to the open source community that makes the modern internet possible.

A few months ago we announced that we started signing all distroless images with cosign, which allows users to verify that they have the correct image before starting the build process. Signing our images was our first step towards fully securing the distroless supply chain. Since then, we’ve implemented even more accountability in our supply chain and are excited to announce that distroless builds have achieved SLSA 2. SLSA is a framework for improving supply chain security, and Level 2 ensures that the build service is tamper resistant.

This means that in addition to a signature, each distroless image now has an associated signed provenance. This provenance is an in-toto attestation and includes information about how each image was built, what command was run, and what build system was used. It also includes any special parameters that were passed in, the exact commit at which the images were built, and more. This provenance is a useful tool for builds that need to be audited in the future.

SLSA 2 Requirement             | Distroless
-------------------------------|-----------------------------------------------------------------------------
Source – Version controlled    | Source code in GitHub
Build – Scripted build         | Build script exists as a Tekton Pipeline, invoked as a Google Cloud Build step
Build – Build service          | All steps run on Kubernetes with Tekton
Provenance – Available         | Provenance is available in the rekor transparency log as an in-toto attestation
Provenance – Authenticated     | Provenance is signed with the distroless GCP KMS key
Provenance – Service generated | Provenance is generated by Tekton Chains from a Tekton TaskRun


Achieving SLSA 2 required some changes to the distroless build pipeline: we set up Tekton Pipelines and Tekton Chains in a GKE cluster to automate building images and generating provenance. Every time a pull request is merged to the distroless GitHub repo, a Tekton Pipeline is triggered. This Pipeline builds the distroless images, and Tekton Chains is responsible for generating signed provenance for each image. Tekton Chains stores the signed provenance alongside the image in an OCI registry and also stores a record of the provenance in the rekor transparency log.
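
As a rough sketch, the Chains half of such a setup amounts to installing the controller in the cluster and pointing it at an attestation format and transparency log. The exact configuration keys can vary across Tekton Chains versions:

$ kubectl apply -f https://storage.googleapis.com/tekton-releases/chains/latest/release.yaml
$ kubectl patch configmap chains-config -n tekton-chains \
    -p '{"data": {"artifacts.taskrun.format": "in-toto", "transparency.enabled": "true"}}'

With that in place, Chains watches for completed TaskRuns, signs the results, and uploads the provenance automatically.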

Don’t trust us?

You can try the build yourself. Because distroless builds are reproducible, all the information needed to replicate the build is in the provenance, and you or a trusted third party can rebuild the image and verify the build is correct by matching image digests.

You can verify an attestation for a distroless image with cosign and the distroless public key:

$ cosign verify-attestation -key cosign.pub gcr.io/distroless/base@sha256:4f8aa0aba190e375a5a53bb71a303c89d9734c817714aeaca9bb23b82135ed91

Verification for gcr.io/distroless/base@sha256:4f8aa0aba190e375a5a53bb71a303c89d9734c817714aeaca9bb23b82135ed91 --
The following checks were performed on each of these signatures:
  - The cosign claims were validated
  - The signatures were verified against the specified public key
  - Any certificates were verified against the Fulcio roots.

And you can find the provenance for the image in the rekor transparency log with the rekor-cli tool. For example, you could find the provenance for the above image by using the image’s digest and running:

$ rekor-cli search --sha sha256:4f8aa0aba190e375a5a53bb71a303c89d9734c817714aeaca9bb23b82135ed91

af7a9687d263504ccdb2759169c9903d8760775045c6e7554e365ec2bf29f6f8

$ rekor-cli get --uuid af7a9687d263504ccdb2759169c9903d8760775045c6e7554e365ec2bf29f6f8 --format json | jq -r .Attestation | base64 --decode | jq

{
  "_type": "distroless-provenance",
  "predicateType": "https://tekton.dev/chains/provenance",
  "subject": [
    { "name": "gcr.io/distroless/base", "digest": { "sha256": "703a4726aedc9ec7a7e32251087565246db117bb9a141a7993d1c4bb4036660d" } },
    { "name": "gcr.io/distroless/base", "digest": { "sha256": "d322ed16d530596c37eee3eb57a039677502aa71f0e4739b0272b1ebd8be9bce" } },
    { "name": "gcr.io/distroless/base", "digest": { "sha256": "2dfdd5bf591d0da3f67a25f3fc96d929b256d5be3e0af084db10952e5da2c661" } },
    { "name": "gcr.io/distroless/base", "digest": { "sha256": "4f8aa0aba190e375a5a53bb71a303c89d9734c817714aeaca9bb23b82135ed91" } },
    { "name": "gcr.io/distroless/base", "digest": { "sha256": "dc0a793d83196a239abf3ba035b3d1a0c7a24184856c2649666e84bc82fc5980" } },
    { "name": "gcr.io/distroless/base-debian10", "digest": { "sha256": "2dfdd5bf591d0da3f67a25f3fc96d929b256d5be3e0af084db10952e5da2c661" } },
    { "name": "gcr.io/distroless/base-debian10", "digest": { "sha256": "703a4726aedc9ec7a7e32251087565246db117bb9a141a7993d1c4bb4036660d" } },
    { "name": "gcr.io/distroless/base-debian10", "digest": { "sha256": "4f8aa0aba190e375a5a53bb71a303c89d9734c817714aeaca9bb23b82135ed91" } },
    { "name": "gcr.io/distroless/base-debian10", "digest": { "sha256": "d322ed16d530596c37eee3eb57a039677502aa71f0e4739b0272b1ebd8be9bce" } },
    { "name": "gcr.io/distroless/base-debian10", "digest": { "sha256": "dc0a793d83196a239abf3ba035b3d1a0c7a24184856c2649666e84bc82fc5980" } },
    { "name": "gcr.io/distroless/base-debian11", "digest": { "sha256": "c9507268813f235b11e63a7ae01526b180c94858bd718d6b4746c9c0e8425f7a" } },
    { "name": "gcr.io/distroless/cc", "digest": { "sha256": "4af613acf571a1b86b1d3c50682caada0b82024e566c1c4c2fe485a70f3af47d" } },
    { "name": "gcr.io/distroless/cc", "digest": { "sha256": "2c4bb6b7236db0a55ec54ba8845e4031f5db2be957ac61867872bf42e56c4deb" } },
    { "name": "gcr.io/distroless/cc", "digest": { "sha256": "2c4bb6b7236db0a55ec54ba8845e4031f5db2be957ac61867872bf42e56c4deb" } },
    { "name": "gcr.io/distroless/cc-debian10", "digest": { "sha256": "4af613acf571a1b86b1d3c50682caada0b82024e566c1c4c2fe485a70f3af47d" } },
    { "name": "gcr.io/distroless/cc-debian10", "digest": { "sha256": "2c4bb6b7236db0a55ec54ba8845e4031f5db2be957ac61867872bf42e56c4deb" } },
    { "name": "gcr.io/distroless/cc-debian10", "digest": { "sha256": "2c4bb6b7236db0a55ec54ba8845e4031f5db2be957ac61867872bf42e56c4deb" } },
    { "name": "gcr.io/distroless/java", "digest": { "sha256": "deb41661be772c6256194eb1df6b526cc95a6f60e5f5b740dda2769b20778c51" } },
    { "name": "gcr.io/distroless/nodejs", "digest": { "sha256": "927dd07e7373e1883469c95f4ecb31fe63c3acd104aac1655e15cfa9ae0899bf" } },
    { "name": "gcr.io/distroless/nodejs", "digest": { "sha256": "927dd07e7373e1883469c95f4ecb31fe63c3acd104aac1655e15cfa9ae0899bf" } },
    { "name": "gcr.io/distroless/nodejs", "digest": { "sha256": "f106757268ab4e650b032e78df0372a35914ed346c219359b58b3d863ad9fb58" } },
    { "name": "gcr.io/distroless/nodejs-debian10", "digest": { "sha256": "927dd07e7373e1883469c95f4ecb31fe63c3acd104aac1655e15cfa9ae0899bf" } },
    { "name": "gcr.io/distroless/nodejs-debian10", "digest": { "sha256": "f106757268ab4e650b032e78df0372a35914ed346c219359b58b3d863ad9fb58" } },
    { "name": "gcr.io/distroless/nodejs-debian10", "digest": { "sha256": "927dd07e7373e1883469c95f4ecb31fe63c3acd104aac1655e15cfa9ae0899bf" } },
    { "name": "gcr.io/distroless/python3", "digest": { "sha256": "aa8a0358b2813e8b48a54c7504316c7dcea59d6ae50daa0228847de852c83878" } },
    { "name": "gcr.io/distroless/python3-debian10", "digest": { "sha256": "aa8a0358b2813e8b48a54c7504316c7dcea59d6ae50daa0228847de852c83878" } },
    { "name": "gcr.io/distroless/static", "digest": { "sha256": "9acfd1fdf62b26cbd4f3c31422cf1edf3b7b01a9ecee00a499ef8b7e3536914d" } },
    { "name": "gcr.io/distroless/static", "digest": { "sha256": "e50641dbb871f78831f9aa7ffa59ec8f44d4cc33ae4ee992c9f4b046040e97f2" } },
    { "name": "gcr.io/distroless/static-debian10", "digest": { "sha256": "9acfd1fdf62b26cbd4f3c31422cf1edf3b7b01a9ecee00a499ef8b7e3536914d" } },
    { "name": "gcr.io/distroless/static-debian10", "digest": { "sha256": "e50641dbb871f78831f9aa7ffa59ec8f44d4cc33ae4ee992c9f4b046040e97f2" } }
  ],
  "predicate": {
    "invocation": {
      "parameters": [
        "MANIFEST_SUBSECTION={string 0 []}",
        "CHAINS-GIT_COMMIT={string 976c1c9bc178ac0371d8888d69893145c3df09f0 []}",
        "CHAINS-GIT_URL={string https://github.com/GoogleContainerTools/distroless []}"
      ],
      "recipe_uri": "task://distroless-provenance",
      "event_id": "531c282f-806e-41e4-b3ad-b596c4283381",
      "builder.id": "tekton-chains"
    },
    "recipe": {
      "steps": [
        {
          "entryPoint": "#!/bin/sh\nset -ex\n\n# get the digests for a subset of images built, and store in the IMAGES result\ngo run provenance/provenance.go images $(params.MANIFEST_SUBSECTION) > $(results.IMAGES.path)\n",
          "arguments": null,
          "environment": {
            "container": "provenance",
            "image": "docker.io/library/golang@sha256:cb1a7482cb5cfc52527c5cdea5159419292360087d5249e3fe5472f3477be642"
          },
          "annotations": null
        }
      ]
    },
    "metadata": {
      "buildStartedOn": "2021-09-16T00:03:04Z",
      "buildFinishedOn": "2021-09-16T00:04:36Z"
    },
    "materials": [
      {
        "uri": "https://github.com/GoogleContainerTools/distroless",
        "digest": {
          "revision": "976c1c9bc178ac0371d8888d69893145c3df09f0"
        }
      }
    ]
  }
}

As you might guess, our next step is getting distroless to SLSA 3, which will require adding non-falsifiable provenance and isolated builds to the distroless supply chain. Stay tuned for more!

To borrow from an excellent analogy between the modern computer ecosystem and the US automotive industry of the 1960s, the Linux kernel runs well: when driving down the highway, you’re not sprayed in the face with oil and gasoline, and you quickly get where you want to go. However, in the face of failure, the car may end up on fire, flying off a cliff.

As it approaches its 30th anniversary, Linux remains the largest collaborative development project in the history of computing. The huge community surrounding Linux allows it to do amazing things and run smoothly. What’s still missing, though, is sufficient focus to make sure that Linux fails well too. There’s a strong link between code robustness and security: making it harder for any bugs to manifest makes it harder for security flaws to manifest. But that’s not the end of the story. When flaws do manifest, it’s important to handle them effectively.

Rather than only taking a one-bug-at-a-time perspective, preemptive actions can stop bugs from having bad effects. With Linux written in C, it will continue to have a long tail of associated problems. Linux must be designed to take proactive steps to defend itself from its own risks. Cars have seat belts not because we want to crash, but because crashes are guaranteed to happen sometimes.

Even though everyone wants a safe kernel running on their computer, phone, car, or interplanetary helicopter, not everyone is in a position to do something about it. Upstream kernel developers can fix bugs, but have no control over what a downstream vendor chooses to incorporate into their products. End users get to choose their products, but don’t usually have control over which bugs are fixed or which kernel is used (a problem in itself). Ultimately, vendors are responsible for keeping their product’s kernels safe.

What to fix?

The statistics of tracking and fixing distinct bugs are sobering. The stable kernel releases (“bug fixes only”) each contain close to 100 new fixes per week. Faced with this high rate of change, a vendor can choose to ignore all the fixes, pick out only “important” fixes, or face the daunting task of taking everything.

Fix nothing?

With the preponderance of malware, botnets, and state surveillance targeting flawed software, it’s clear that ignoring all fixes is the wrong “solution.” Unfortunately this is the very common stance of vendors who see their devices as just a physical product instead of a hybrid product/service that must be regularly updated.

Fix important flaws?

Between the dereliction of doing nothing and the assumed burden of fixing everything, the traditional vendor choice has been to cherry-pick only the “important” fixes. But what constitutes “important” or even relevant? Just determining whether to implement a fix takes developer time.

The prevailing wisdom has been to choose vulnerabilities to fix based on the Mitre CVE list, presuming all important flaws (and therefore fixes) would have an associated CVE. However, given the volume of flaws and their applicability to a particular system, not all security flaws have CVEs assigned, nor are they assigned in a timely manner. Evidence shows that for Linux CVEs, more than 40% had been fixed before the CVE was even assigned, with the average delay being over three months after the fix. Some fixes went years without having their security impact recognized. On top of this, product-relevant bugs may not even qualify for a CVE. Finally, upstream developers aren’t actually interested in CVE assignment; they spend their limited time actually fixing bugs.

A vendor relying on cherry-picking is all but guaranteed to miss important vulnerabilities that others are actively fixing, which is almost worse than doing nothing since it creates the illusion that security updates are being appropriately handled.

Fix everything!

So what is a vendor to do? The answer is simple, if painful: continuously update to the latest kernel release, either major or stable. Tracking major releases means gaining security improvements along with bug fixes, while stable releases are bug fixes only. For example, although modern Android phones ship with kernels that are based on major releases from almost two to four years earlier, Android vendors do now, thankfully, track stable kernel releases. So even though the features being added to newer major kernels will be missing, all the latest stable kernel fixes are present.

Performing continuous kernel updates (major or stable) understandably faces enormous resistance within an organization due to fear of regressions—will the update break the product? The answer is usually that a vendor doesn’t know, or that the update frequency is shorter than their time needed for testing. But the problem with updating is not that the kernel might cause regressions; it’s that vendors don’t have sufficient test coverage and automation to know the answer. Testing must take priority over individual fixes.

Make it happen

One question remains: how to possibly support all the work continuous updates require? As it turns out, it’s a simple resource allocation problem, and is more easily accomplished than might be imagined: downstream redundancy can be moved into greater upstream collaboration.

More engineers for fixing bugs earlier

With vendors using old kernels and backporting existing fixes, their engineering resources are doing redundant work. For example, instead of 10 companies each assigning one engineer to backport the same fix independently, those developer hours could be shifted to upstream work where 10 separate bugs could be fixed for everyone in the Linux ecosystem. This would help address the growing backlog of bugs. Looking at just one source of potential kernel security flaws, the syzkaller dashboard shows the number of open bugs is currently approaching 900 and growing by about 100 a year, even with about 400 a year being fixed.

More engineers for code review

Beyond just squashing bugs after the fact, more focus on upstream code review will help stem the tide of new bugs being introduced in the first place, with benefits extending beyond just the immediate bugs caught. Capable code review bandwidth is a limited resource. Without enough people dedicated to upstream code review and subsystem maintenance tasks, the entire kernel development process bottlenecks.

Long-term Linux robustness depends on developers, but especially on effective kernel maintainers. Although there is effort in the industry to train new developers, this has been traditionally justified only by the “feature driven” jobs they can get. But focusing only on product timelines ultimately leads Linux into the Tragedy of the Commons. Expanding the number of maintainers can avoid it. Luckily the “pipeline” for new maintainers is straightforward.

Maintainers are built not only from their depth of knowledge of a subsystem’s technology, but also from their experience with mentorship of other developers and code review. Training new reviewers must become the norm, motivated by making upstream review part of the job. Today’s reviewers become tomorrow’s maintainers. If each major kernel subsystem gained four more dedicated maintainers, we could double productivity.

More engineers for testing and infrastructure

Along with more reviewers, improving Linux’s development workflow is critical to expanding everyone’s ability to contribute. Linux’s “email only” workflow is showing its age, but the upstream development of more automated patch tracking, continuous integration, fuzzing, coverage, and testing will make the development process significantly more efficient.

Additionally, instead of testing kernels after they’re released, it’s more effective to test during development. When tests are performed against unreleased kernel versions (e.g. linux-next) and reported upstream, developers get immediate feedback about bugs. Fixes can be developed before a flaw is ever actually released; it’s always easier to fix a bug earlier than later.
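
As a concrete sketch, a vendor could point its existing test suite at the development tree on a regular cadence. Assuming a typical build environment, something like:

$ git clone --depth=1 https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
$ cd linux-next
$ make defconfig
$ make -j"$(nproc)"
$ make kselftest    # run the kernel's own selftests against the fresh build

Reporting any failures upstream at this stage means fixes can land before the code ever reaches a release.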

This “upstream first” approach to product kernel development and testing is extremely efficient. Google has been successfully doing this with Chrome OS and Android for a while now, and is hardly alone in the industry. It means feature development happens against the latest kernel, and devices are similarly tested as close as possible to the latest upstream kernels, all avoiding duplicated “in-house” effort.

More engineers for security and toolchain development

Besides reacting to individual bugs and existing maintenance needs, there is also a need to proactively eliminate entire classes of flaws, so developers cannot introduce these types of bugs ever again. Why fix the same kind of security vulnerability 10 times a year when we can stop it from ever appearing again?

Over the last few years, various fragile language features and kernel APIs have been eliminated or replaced (e.g. VLAs, switch fallthrough, addr_limit). However, there is still plenty more work to be done. One of the most time-consuming aspects has been the refactoring involved in making these usually invasive and context-sensitive changes across Linux’s 25 million lines of code.

Beyond kernel code itself, the compiler and toolchain also need to grow more defensive features (e.g. variable zeroing, CFI, sanitizers). With the toolchain technically “outside” the kernel, its development effort is often inappropriately overlooked and underinvested. Code safety burdens need to be shifted as much as possible to the toolchain, freeing humans to work in other areas. On the most progressive front, we must make sure Linux can be written in memory-safe languages like Rust.
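
For example, two of the toolchain-based defenses mentioned above can already be switched on when building the kernel with Clang. This is a sketch; option availability depends on kernel version and architecture:

$ ./scripts/config --enable INIT_STACK_ALL_ZERO   # auto-initialize stack variables to zero
$ ./scripts/config --enable CFI_CLANG             # control-flow integrity
$ make LLVM=1 olddefconfig
$ make LLVM=1 -j"$(nproc)"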

Don’t wait another minute

If you’re not using the latest kernel, you don’t have the most recently added security defenses (including bug fixes). In the face of newly discovered flaws, this leaves systems less secure than they could have been. Even when mitigated by careful system design, proper threat modeling, and other standard security practices, the magnitude of risk grows quickly over time, leaving vendors to do the calculus of determining how old a kernel they can tolerate exposing users to. Unless the answer is “just abandon our users,” engineering resources must be focused upstream on closing the gap by continuously deploying the latest kernel release.

Based on our most conservative estimates, the Linux kernel and its toolchains are currently underinvested by at least 100 engineers, so it’s up to everyone to bring their developer talent together upstream. This is the only solution that will ensure a balance of security at reasonable long-term cost.


A little over 10 years ago, we launched our Vulnerability Rewards Program (VRP). Our goal was to establish a channel for security researchers to report bugs to Google and offer an efficient way for us to thank them for helping make Google, our users, and the Internet a safer place. To recap our progress on these goals, here is a snapshot of what VRP has accomplished with the community over the past 10 years:

  • Total bugs rewarded: 11,055
  • Number of rewarded researchers: 2,022
  • Representing 84 different countries
  • Total rewards: $29,357,516

To celebrate our anniversary and ensure the next 10 years are just as (or even more) successful and collaborative, we are excited to announce the launch of our new platform, bughunters.google.com.

This new site brings all of our VRPs (Google, Android, Abuse, Chrome and Play) closer together and provides a single intake form that makes it easier for bug hunters to submit issues. Other improvements you will notice include:

  • More opportunities for interaction and a bit of healthy competition through gamification, per-country leaderboards, awards/badges for certain bugs and more!
  • A more functional and aesthetically pleasing leaderboard. We know a lot of you are using your achievements in the VRP to find jobs (we’re hiring!) and we hope this acts as a useful resource.
  • A stronger emphasis on learning: Bug hunters can improve their skills through the content available in our new Bug Hunter University
  • Streamlined publication process: we know the value that knowledge sharing brings to our community. That’s why we want to make it easier for you to publish your bug reports.
  • Swag will now be supported for special occasions (we heard you loud and clear!)

We also want to take a moment to shine a light on some aspects of the VRP that are not yet well-known, such as:

When we launched our very first VRP, we had no idea how many valid vulnerabilities – if any – would be submitted on the first day. Everyone on the team put in their estimate, with predictions ranging from zero to 20. In the end, we actually received more than 25 reports, taking all of us by surprise.

Since its inception, the VRP has not only grown significantly in terms of report volume, but the team of security engineers behind it has also expanded – including almost 20 bug hunters who reported vulnerabilities to us and ended up joining the Google VRP team.

That is why we are thrilled to bring you this new platform, continue to grow our community of bug hunters and support the skill development of up-and-coming vulnerability researchers.

Thanks again to the entire Google bug hunter community for making our vulnerability rewards program successful. As you continue to play around with the new site and reporting system, tell us about it – we would love to hear your feedback. Until next time, keep on finding those bugs!


The way we design and build software is continually evolving. Just as we now think of security as something we build into software from the start, we are also increasingly looking for new ways to minimize trust in that software. One of the ways we can do that is by designing software so that you can get cryptographic certainty of what the software has done.

In this post, we’ll introduce the concept of verifiable data structures that help us get this cryptographic certainty. We’ll describe some existing and new applications of verifiable data structures, and provide some additional resources we have created to help you use them in your own applications.
A verifiable data structure is a class of data structure that lets people efficiently agree, with cryptographic certainty, that the data contained within it is correct.

Merkle Trees are the most famous of these and have been used for decades because they can enable efficient verification that a particular piece of data is included among many records – as a result they also form the basis of most blockchains.
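
To make the inclusion idea concrete, here is a toy two-leaf Merkle tree built with standard command-line tools (illustrative only; production logs such as Certificate Transparency follow RFC 6962, which adds domain-separation prefixes to leaf and interior hashes):

$ printf 'record-1' | openssl dgst -sha256 -binary > leaf1.bin
$ printf 'record-2' | openssl dgst -sha256 -binary > leaf2.bin
$ cat leaf1.bin leaf2.bin | openssl dgst -sha256    # hash of the two leaf hashes = the root

An inclusion proof for record-1 is just the hash of the other leaf: anyone holding the root can recompute it from record-1 plus that one hash, and in a larger tree the proof grows only logarithmically with the number of records.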

Although these verifiable data structures are not new, we now have a new generation of developers who have discovered them and the designs they enable — further accelerating their adoption.
These verifiable data structures enable building a new class of software that has elements of verifiability and transparency built into the way it operates. This gives us new ways to defend against coercion, introduce accountability to existing and new ecosystems, and make it easier to demonstrate compliance to regulators, customers and partners.

Certificate Transparency is a great example of a non-blockchain use of these verifiable data structures at scale to secure core internet infrastructure. By using these patterns, we have been able to introduce transparency and accountability to an existing system used by everyone without breaking the web.
Unfortunately, despite the capabilities of verifiable data structures and the associated patterns, there are not many resources developers can use to design, build, and deploy scalable and production-quality systems based on them.

To address this gap we have generalized the platform we used to build Certificate Transparency so it can be applied to other classes of problems as well. Since this infrastructure has been used for years as part of this ecosystem, it is well understood and can be deployed confidently in production systems.
This is why we have seen solutions in areas of healthcare, financial services, and supply chain leverage this platform. Beyond that, we have also applied these patterns to bring these transparency and accountability properties to other problems within our own products and services.

To this end, in 2019, we used this platform to bring supply chain integrity to the Go language ecosystem via the Go Checksum Database. This system allows developers to have confidence that the package management systems supporting the Go ecosystem can’t intentionally, arbitrarily, or accidentally start giving out the wrong code without getting caught. The reproducibility of Go builds makes this particularly powerful, as it enables the developer to ensure what is in the source repository matches what is in the package management system. This solution delivers a verifiable chain all the way from the source repositories to the final compiled artifacts.
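
Recent Go toolchains consult the checksum database automatically, so developers largely get this protection for free. A small sketch:

$ go env GOSUMDB     # shows the configured checksum database, sum.golang.org by default
$ go mod verify      # re-checks downloaded module contents against their recorded hashes

If a module's content ever disagrees with the recorded hash, the toolchain refuses to use it.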

Another example of using these patterns is our recently announced partnership with the Linux Foundation on Sigstore. This project is a response to the ever-increasing influx of supply chain attacks on the Open Source ecosystem.

Supply chain attacks have been possible because there are weaknesses at every link in the chain. Components like build systems, source code management tools, and artifact repositories all need to be treated as critical production environments, because they are. To address this, we first need to make it possible to verify provenance along the entire chain, and the goal of the Sigstore effort is to enable just that.

We are now working on using these patterns and tools to enable hardware-enforced supply chain integrity for device firmware, which we hope will discourage supply chain attacks on the devices, like smartphones, that we rely on every day by bringing transparency and accountability to their firmware supply chain.

In all of the above examples, we are using these verifiable data structures to ensure the integrity of artifacts in the supply chain. This enables customers, auditors, and internal security teams to be confident that each actor in the supply chain has lived up to their responsibilities. This helps earn the trust of those that rely on the supply chain, discourages insiders from abusing their positions (since it increases the chance they will get caught), introduces accountability, and enables proving the associated systems continually meet their compliance obligations.

When using these patterns the most important task is defining what data should be logged. This is why we put together a taxonomy and modeling framework which we have found to be helpful in designing verifiability into the systems we discussed above, and which we hope you will find valuable too.
Please take a look at the transparency.dev website to learn about these verifiable data structures, and the tools and guidance we have put together to help use them in your own applications.

In recent months, Google has launched several efforts to strengthen open-source security on multiple fronts. One important focus is improving how we identify and respond to known security vulnerabilities without doing extensive manual work. It is essential to have a precise common data format to triage and remediate security vulnerabilities, particularly when communicating about risks to affected dependencies—it enables easier automation and empowers consumers of open-source software to know when they are impacted and make security fixes as soon as possible.

We released the Open Source Vulnerabilities (OSV) database in February with the goal of automating and improving vulnerability triage for developers and users of open source software. This initial effort was bootstrapped with a dataset of a few thousand vulnerabilities from the OSS-Fuzz project. Implementing OSV to communicate precise vulnerability data for hundreds of critical open-source projects proved the success and utility of the format, and garnered feedback to help us improve the project; for example, we dropped the Cloud API key requirement, making the database even easier to access by more users. The community response also showed that there was broad interest in extending the effort further.

Today, we’re excited to announce a new milestone in expanding OSV to several key open-source ecosystems: Go, Rust, Python, and DWF. This expansion unites and aggregates four important vulnerability databases, giving software developers a better way to track and remediate the security issues that affect them. Our effort also aligns with the recent US Executive Order on Improving the Nation’s Cybersecurity, which emphasized the need to remove barriers to sharing threat information in order to strengthen national infrastructure. This expanded shared vulnerability database marks an important step toward creating a more secure open-source environment for all users.
A simple, unified schema for describing vulnerabilities precisely

As with open source development, vulnerability databases in open source follow a distributed model, with many ecosystems and organizations creating their own databases. Since each uses its own format to describe vulnerabilities, a client tracking vulnerabilities across multiple databases must handle each completely separately. Sharing vulnerabilities between databases is also difficult.

The Google Open Source Security team, Go team, and the broader open-source community have been developing a simple vulnerability interchange schema for describing vulnerabilities that’s designed from the beginning for open-source ecosystems. After starting work on the schema a few months ago, we requested public feedback and received hundreds of comments. We have incorporated the input from readers to arrive at the current schema:

{
    "id": string,
    "modified": string,
    "published": string,
    "withdrawn": string,
    "aliases": [ string ],
    "related": [ string ],
    "package": {
        "ecosystem": string,
        "name": string,
        "purl": string
    },
    "summary": string,
    "details": string,
    "affects": [ {
        "ranges": [ {
            "type": string,
            "repo": string,
            "introduced": string,
            "fixed": string
        } ],
        "versions": [ string ]
    } ],
    "references": [ {
        "type": string,
        "url": string
    } ],
    "ecosystem_specific": { see spec },
    "database_specific": { see spec }
}

This new vulnerability schema aims to address some key problems with managing vulnerabilities in open source. We found that there was no existing standard format which:

  • Enforces a version specification that precisely matches the naming and versioning schemes used in actual open-source package ecosystems. For instance, matching a vulnerability such as a CVE to a package name and set of versions in a package manager is difficult to do in an automated way using existing mechanisms such as CPEs.
  • Can be used to describe vulnerabilities in any open source ecosystem, while not requiring ecosystem-dependent logic to process them.
  • Is easy to use by both automated systems and humans.

With this schema we hope to define a format that all vulnerability databases can export. A unified format means that vulnerability databases, open source users, and security researchers can easily share tooling and consume vulnerabilities across all of open source. This means a more complete view of vulnerabilities in open source for everyone, as well as faster detection and remediation times resulting from easier automation.
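
As a small illustration of that shared tooling (a sketch, not part of the spec), a client could check an installed package against an entry's explicit versions list; a production client would also evaluate the ranges field using ecosystem-appropriate version ordering:

  # Minimal sketch: decide whether an installed package version is affected,
  # using only the explicit "versions" list from an OSV-format entry.
  # A real client would also evaluate "ranges" with ecosystem-aware
  # version comparison.
  def is_affected(entry: dict, ecosystem: str, name: str, version: str) -> bool:
      pkg = entry.get("package", {})
      if pkg.get("ecosystem") != ecosystem or pkg.get("name") != name:
          return False
      return any(version in affected.get("versions", [])
                 for affected in entry.get("affects", []))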

The current state


The vulnerability schema spec has gone through several iterations, and we are inviting further feedback as it gets closer to being finalized. A number of public vulnerability databases, including those for the Go, Rust, Python, and DWF ecosystems mentioned above, are already exporting this format, with more in the pipeline.

The OSV service has also aggregated all of these vulnerability databases, which are viewable in our web UI. They can also be queried with a single command via the same existing APIs:


  curl -X POST -d \
      '{"commit": "a46c08c533cfdf10260e74e2c03fa84a13b6c456"}' \
      "https://api.osv.dev/v1/query"

  curl -X POST -d \
      '{"version": "2.4.1", "package": {"name": "jinja2", "ecosystem": "PyPI"}}' \
      "https://api.osv.dev/v1/query"
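
The same query can also be made programmatically. Here is a short sketch using Python's third-party requests library, assuming the response body contains a vulns list as returned by the queries above:

  # Query the OSV API for known vulnerabilities affecting jinja2 2.4.1.
  # Assumes the third-party "requests" package; an empty "vulns" list
  # means no known vulnerabilities matched the query.
  import requests

  resp = requests.post(
      "https://api.osv.dev/v1/query",
      json={"version": "2.4.1",
            "package": {"name": "jinja2", "ecosystem": "PyPI"}},
  )
  resp.raise_for_status()
  for vuln in resp.json().get("vulns", []):
      print(vuln["id"], vuln.get("summary", ""))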


Automating vulnerability database maintenance


Producing quality vulnerability data is also difficult. In addition to OSV’s existing automation, we built more automation tools for vulnerability database maintenance, and used these tools to bootstrap the community Python advisory database. This automation takes existing feeds, accurately matches them to packages, and generates entries containing precise, validated version ranges with minimal human intervention. We plan to extend this tooling to other ecosystems for which there is no existing vulnerability database, or little support for ongoing database maintenance.
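
The details of that tooling are beyond the scope of this post, but conceptually the pipeline resembles the following sketch. The helper functions and feed fields here are hypothetical, for illustration only:

  # Purely illustrative sketch of converting an upstream advisory feed
  # entry into an OSV-format record. The helpers and feed fields are
  # hypothetical; the real tooling does careful, validated matching.
  def match_to_package(feed_entry: dict) -> str:
      # Hypothetical: real matching must map advisory text to an actual
      # package name in the ecosystem, which is the hard part.
      return feed_entry["package_name"]

  def derive_version_range(feed_entry: dict) -> tuple:
      # Hypothetical: real ranges are computed and validated against the
      # package's actual release history.
      return feed_entry["introduced"], feed_entry["fixed"]

  def advisory_to_osv(feed_entry: dict) -> dict:
      name = match_to_package(feed_entry)
      introduced, fixed = derive_version_range(feed_entry)
      return {
          "id": feed_entry["id"],
          "package": {"ecosystem": feed_entry["ecosystem"], "name": name},
          "affects": [{
              "ranges": [{"type": "ECOSYSTEM",
                          "introduced": introduced,
                          "fixed": fixed}],
          }],
          "references": [{"type": "ADVISORY", "url": feed_entry["url"]}],
      }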


Get involved


Thank you to all the open-source developers who have provided feedback and adopted this format. We're continuing to work with open-source communities to develop it further and to encourage wider adoption across all ecosystems. If you are interested in adopting this format, we'd appreciate any feedback on our public spec.
