Azure DevOps Pipelines with Caching

Introduction to the Cache task

Cache tasks are extremely useful in scenarios where we need to repeatedly download dependencies for each pipeline run. Without caching, this process can be highly time-consuming, often involving hundreds or even thousands of network calls. By using caching, we can significantly reduce build times, optimize pipeline efficiency, and minimize unnecessary network overhead.

With Microsoft-hosted agents in Azure Pipelines, each job runs on a fresh virtual machine that is discarded as soon as the job finishes.

As a result, every pipeline run starts from a clean environment. The agent image provides its preinstalled tools, but nothing downloaded by earlier runs, such as your project's dependencies, carries over from one run to the next. This is where caching becomes important.

However, caching improves pipeline execution time only if restoring and saving the cache takes less time than recreating the outputs from scratch or downloading the dependencies again.

In some scenarios, therefore, adding a cache can actually increase your execution time.

Using Caching in Azure Pipelines

Caching can be added to a pipeline using the Cache@2 pipeline task.

Below is an example configuration for caching the node_modules folder of a Node.js project that uses Yarn:

- task: Cache@2
  displayName: 'Cache node_modules'
  inputs:
    key: 'cachekey | "$(Agent.OS)" | $(Build.SourcesDirectory)/package.json'
    restoreKeys: |
      cachekey | "$(Agent.OS)"
      cachekey
    path: node_modules
    cacheHitVar: NODE_MODULE_CACHE_RESTORED # Explained in the cache hit variable section below

Key Inputs of the Cache Task

The Cache@2 task has two required inputs, key and path. The restoreKeys and cacheHitVar inputs are optional.

  1. Path:

    • The path input specifies the directory whose contents are saved to the cache and into which a cached copy is restored.

    • In this example, the path is set to node_modules, relative to the pipeline's default working directory.

    • On a cache hit, the task restores the cached files into this directory, creating it if it does not already exist. On a cache miss, a post-job step saves the directory's contents under the current key.

  2. Key:

    • The key input identifies the cache entry. It is made up of segments separated by the | character.

    • In the example, the key combines the literal string cachekey, the predefined variable $(Agent.OS), and the path to the dependency file, $(Build.SourcesDirectory)/package.json.

    • A segment that is a file path is replaced by a hash of that file's contents. Whenever the file changes, the key changes, so the run misses the cache and saves a fresh one under the new key. This keeps the restored dependencies in step with the dependency file.

  3. Restore Keys:

    • The restoreKeys input is a fallback that is used when the primary key does not produce an exact cache hit.

    • Restore keys are tried in the order they are listed, and each one is matched as a prefix. The first restore key that matches an existing cache entry is used, and the most recently created entry with that prefix is restored.

  4. Cache Hit Variable (cacheHitVar):

    • The cacheHitVar input names a custom variable that the Cache task sets to report whether a cache was restored. Its value is true on an exact match of the primary key, inexact when the cache was restored through one of the restoreKeys, and false when no cache was found.

    - task: CmdLine@2
      displayName: install dependencies
      retryCountOnTaskFailure: 3
      condition: and(succeeded(), ne(variables['NODE_MODULE_CACHE_RESTORED'], 'true'))
      inputs:
        script: yarn install --frozen-lockfile

ne(variables['NODE_MODULE_CACHE_RESTORED'], 'true'): This condition runs the install step unless the cache was restored with an exact key match. Checking only for 'false' would also skip the install after an inexact hit, leaving node_modules exactly as it was cached under an older key, possibly with outdated or missing packages.
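To check which of the three values (true, inexact, or false) a particular run produced, a small diagnostic step can print the variable. This is a hypothetical addition, not part of the original pipeline, and it assumes the variable name NODE_MODULE_CACHE_RESTORED used in the example:

```yaml
# Hypothetical diagnostic step (not part of the original pipeline).
# Prints the value the Cache@2 task assigned to the cacheHitVar variable.
# A block scalar is used so the ": " inside the message is not parsed as YAML.
- script: |
    echo "Cache restore result: $(NODE_MODULE_CACHE_RESTORED)"
  displayName: 'Show cache restore result'
```

Because $(NODE_MODULE_CACHE_RESTORED) uses macro syntax, Azure Pipelines substitutes the value before the script runs, so the log shows the literal result of the cache lookup.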

Understanding How yarn.lock is Generated and Used in Pipelines

Since this example is a JavaScript project, I am using Yarn as the package manager. Projects built on other technology stacks use different package managers; the Microsoft documentation linked at the end of this article covers caching for several of them.

For example:

  • For a Python project, the equivalent tool could be pip or Poetry.

  • For a .NET project, it could be NuGet.

  • For a Java project, it could be Maven or Gradle.

How is the yarn.lock file generated in our scenario?

  • The yarn.lock file is automatically created when the command yarn install is executed.

  • When the yarn install command runs, it:

    • Reads the package.json file in the source directory.

    • Resolves all dependencies listed in the package.json file.

    • Locks these resolved dependencies (along with their exact versions and sub-dependencies) in a file called yarn.lock.

yarn install

  • Installs all dependencies listed in package.json.

  • Updates the yarn.lock file if there are changes in package.json.

  • Creates a new yarn.lock file if one doesn't exist.

  • Used for local development or when updating dependencies and the lock file.

How is yarn.lock used in pipelines?

  • Once yarn.lock is committed to the repository, every pipeline run installs the same dependency versions, avoiding discrepancies between runs and between local and CI builds.

  • During a pipeline run:

    • If the yarn.lock file exists, Yarn uses it to install the exact versions of dependencies defined in the file.

yarn install --frozen-lockfile

  • Strictly adheres to the existing yarn.lock file without modifying it.

  • Does not generate or update the lock file; if package.json and yarn.lock are out of sync, the command fails instead.

  • Ideal for CI/CD pipelines to ensure consistent dependency versions and prevent accidental lock file updates.
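Because yarn.lock pins every resolved version, it is also a natural input for the cache key. Microsoft's caching documentation shows a Yarn variant that hashes yarn.lock and caches Yarn's own package cache folder (set through the YARN_CACHE_FOLDER variable) instead of node_modules. The sketch below follows that pattern and assumes Yarn v1 (classic) and a yarn.lock committed at the repository root; it is an alternative to, not a copy of, the pipeline used in this article:

```yaml
# Sketch based on the Yarn example in Microsoft's pipeline caching docs.
# Assumptions: Yarn v1 (classic), which honors YARN_CACHE_FOLDER, and a
# committed yarn.lock at the repository root.
variables:
  # Pipeline variables are exposed to scripts as environment variables,
  # so Yarn picks this folder up as its package cache.
  YARN_CACHE_FOLDER: $(Pipeline.Workspace)/.yarn

steps:
  - task: Cache@2
    displayName: 'Cache Yarn packages'
    inputs:
      # The yarn.lock segment is hashed, so any lock file change yields a new key.
      key: 'yarn | "$(Agent.OS)" | yarn.lock'
      restoreKeys: |
        yarn | "$(Agent.OS)"
        yarn
      path: $(YARN_CACHE_FOLDER)

  # The install always runs; with a warm cache Yarn installs packages from
  # the local cache folder instead of downloading them.
  - script: yarn install --frozen-lockfile
    displayName: 'Install dependencies'
```

The design trade-off: caching node_modules lets an exact hit skip the install entirely, while caching the package folder always runs yarn install but stays correct after an inexact restore, because Yarn reconciles node_modules against yarn.lock on every run.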

Demo - Yarn in local

Demo - Cache in Pipelines
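The snippets in this article fit together into a single pipeline roughly as follows. This is a minimal sketch, not the exact pipeline used in the demo: the trigger branch, the ubuntu-latest pool, and the yarn build step are assumptions, and Node.js and Yarn are assumed to be available on the agent already:

```yaml
# Hypothetical minimal azure-pipelines.yml assembling the caching steps.
trigger:
  - main                      # assumed branch name

pool:
  vmImage: 'ubuntu-latest'    # assumed Microsoft-hosted image

steps:
  # Restore node_modules keyed on the OS and a hash of package.json.
  - task: Cache@2
    displayName: 'Cache node_modules'
    inputs:
      key: 'cachekey | "$(Agent.OS)" | $(Build.SourcesDirectory)/package.json'
      restoreKeys: |
        cachekey | "$(Agent.OS)"
        cachekey
      path: node_modules
      cacheHitVar: NODE_MODULE_CACHE_RESTORED

  # Skip the install only on an exact cache hit; an inexact restore
  # from restoreKeys still runs yarn install to bring node_modules up to date.
  - task: CmdLine@2
    displayName: 'Install dependencies'
    retryCountOnTaskFailure: 3
    condition: and(succeeded(), ne(variables['NODE_MODULE_CACHE_RESTORED'], 'true'))
    inputs:
      script: yarn install --frozen-lockfile

  # Hypothetical build step; replace with your project's own script.
  - script: yarn build
    displayName: 'Build'
```

On the first run (or after package.json changes), the Cache@2 task adds a post-job step that saves node_modules under the new key; later runs with an unchanged package.json restore it and skip the install.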

Comparison of Pipelines:

No Cache Pipeline

  • Total Time: 22 seconds

  • Dependency Installation Time: 13 seconds (59%)

Cache Miss Pipeline

  • Total Time: 28 seconds

  • Dependency Installation Time: 16 seconds (57%)

Cache Hit Pipeline

  • Total Time: 19 seconds

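  • Dependency Installation Time: 6 seconds (32%)

In this small project, the cache-hit run finished 3 seconds faster than the run without caching, while the cache-miss run took 6 seconds longer, most likely because it still had to install every dependency and then save the new cache. This is the trade-off described at the start: caching pays off only when the time saved on hits outweighs the extra cost of misses.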

For more information about the Cache task, see the Microsoft documentation:

https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops


Support @CNTKR's blog by becoming a sponsor. Any amount is appreciated!