Introduction to cache (Beta) task
Cache tasks are extremely useful in scenarios where we need to repeatedly download dependencies for each pipeline run. Without caching, this process can be highly time-consuming, often involving hundreds or even thousands of network calls. By using caching, we can significantly reduce build times, optimize pipeline efficiency, and minimize unnecessary network overhead.
When it comes to Azure Pipelines, especially when using Microsoft-hosted agents, the environments are destroyed immediately after the pipeline execution ends.
Whenever you start a new pipeline run, a fresh environment is created specifically for that run. This environment contains only the executables and services required for the CI process. This is where caching becomes important.
However, remember that caching is effective in improving pipeline execution time only if the time needed to restore and save the cache is actually less than the time required to recreate the outputs from scratch or download the dependencies.
As a result, there may be scenarios where setting up caching can negatively impact your execution time.
Using Caching in Azure Pipelines
Caching can be added to a pipeline using the Cache@2 pipeline task.
Below is an example configuration for caching the node_modules folder of a Node.js project:
- task: Cache@2
  displayName: 'Cache node_modules'
  inputs:
    key: 'cachekey | "$(Agent.OS)" | $(Build.SourcesDirectory)/packages.json'
    restoreKeys: |
      cachekey | "$(Agent.OS)"
      cachekey
    path: node_modules
    cacheHitVar: NODE_MODULE_CACHE_RESTORED # Explained under CacheHitVariable below
Key Inputs of the Cache Task
The Cache@2 task has two primary required inputs: Key and Path.
Path:
The path input specifies the directory used to populate the cache during save operations and retrieve files during restore operations. In this example, the path is set to “node_modules”.
If this directory does not already exist, the pipeline will create it during execution and store the cached files there.
Key:
The key input serves as a unique identifier for the cache. In the example, the key is built from three segments separated by |: the fixed string cachekey, the system variable $(Agent.OS), and the path to a dependency file under the build sources directory ($(Build.SourcesDirectory)/packages.json). Because the last segment is a file path, the task hashes the file's content and uses that hash in the key. Whenever the file changes, a new key is generated and a new cache is created, ensuring you always retrieve the appropriate cache. The file must exist when the task runs, so the name has to match the project's actual manifest (package.json in a standard Node.js project).
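Because the install step later in this post runs yarn install --frozen-lockfile, the lock file is what actually pins the installed versions, which makes it a natural candidate for the file segment of the key. A minimal sketch, assuming yarn.lock sits at the repository root (the "yarn" string segment and the file location are illustrative assumptions, not part of the original example):

```yaml
- task: Cache@2
  displayName: 'Cache node_modules'
  inputs:
    # Segments are separated by "|".
    # "yarn" and "$(Agent.OS)" are quoted, so they are used as plain strings.
    # yarn.lock looks like a file path, so the task hashes the file's contents;
    # a relative path is resolved against $(System.DefaultWorkingDirectory).
    key: '"yarn" | "$(Agent.OS)" | yarn.lock'
    path: node_modules
```

With this key, a new cache entry is created only when the locked dependency set changes, not on every edit to package.json.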
Restore Keys:
The restoreKeys input is a fallback mechanism for retrieving a cache when the primary key (key) does not produce a cache hit. If no cache is found for the primary key, the restoreKeys are checked in the order they are defined. The first matching restore key is used to retrieve the cache.
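Restore keys are matched as prefixes of existing cache keys, and when several entries share a prefix the most recently created one is returned. The sketch below annotates the example from earlier with the lookup order; the comments are explanatory additions, not part of the original configuration:

```yaml
- task: Cache@2
  inputs:
    # Step 1: exact lookup. All three segments, including the file hash, must match.
    key: 'cachekey | "$(Agent.OS)" | $(Build.SourcesDirectory)/packages.json'
    # Step 2: fallbacks, tried top to bottom, each matched as a key prefix.
    #   First line:  any cache created on the same OS, whatever the file hash was.
    #   Second line: any cache whose key starts with "cachekey", on any OS.
    restoreKeys: |
      cachekey | "$(Agent.OS)"
      cachekey
    path: node_modules
```

The last fallback can restore a node_modules folder built on a different operating system, which may not be usable if the project has native dependencies, so it may be worth dropping in that case.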
CacheHitVariable:
- The cache hit variable (cacheHitVar) in the Cache task is a custom variable name you define to indicate whether a cache was restored during the pipeline run. It is set to true if the cache was restored from an exact match on key, inexact if it was restored through one of the restoreKeys, and false otherwise.
- task: CmdLine@2
  displayName: install dependencies
  retryCountOnTaskFailure: 3
  condition: and(succeeded(), ne(variables['NODE_MODULE_CACHE_RESTORED'], 'true'))
  inputs:
    script: yarn install --frozen-lockfile
ne(variables['NODE_MODULE_CACHE_RESTORED'], 'true'): This condition checks that the NODE_MODULE_CACHE_RESTORED variable is anything other than 'true'. The variable indicates whether the cache for node_modules was restored by an exact key match. If it is 'false' (no cache restored) or 'inexact' (an older cache restored through a restore key), node_modules may be missing or stale, so the task installs dependencies. Comparing against 'false' instead would skip the install after an inexact restore.
Understanding How yarn.lock is Generated and Used in Pipelines
Since this example involves a JavaScript-based project, I am using Yarn as the package manager. If the project is based on a different technology stack, the package manager will be different; the Microsoft documentation linked at the end of this post has more information.
For example:
For a Python project, the equivalent tool could be pip or Poetry.
For a .NET project, it could be NuGet.
For a Java project, it could be Maven or Gradle.
Let's see how the yarn.lock file is generated in our scenario.
The yarn.lock file is automatically created when the command yarn install is executed. When the yarn install command runs, it:
- Reads the package.json file in the source directory.
- Resolves all dependencies listed in the package.json file.
- Locks these resolved dependencies (along with their exact versions and sub-dependencies) in a file called yarn.lock.
yarn install
- Installs all dependencies listed in package.json.
- Updates the yarn.lock file if there are changes in package.json.
- Creates a new yarn.lock file if one doesn't exist.
- Used for local development or when updating dependencies and the lock file.
How is yarn.lock used in pipelines?
In subsequent pipeline runs, the yarn.lock file ensures the same dependency versions are installed, avoiding discrepancies. During a pipeline run:
- If the yarn.lock file exists, Yarn uses it to install the exact versions of dependencies defined in the file.
yarn install --frozen-lockfile
- Strictly adheres to the existing yarn.lock file without modifying it.
- Does not generate a new lock file, and fails if yarn.lock needs to be updated.
- Ideal for CI/CD pipelines to ensure consistent dependency versions and prevent accidental lock file updates.
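An alternative arrangement, sketched here on the assumption that the project uses Yarn 1 (Classic), is to cache Yarn's download cache instead of node_modules and run the install step on every build. YARN_CACHE_FOLDER is the environment variable Yarn Classic reads for its cache location; the .yarn folder name under $(Pipeline.Workspace) and the display names are arbitrary choices for illustration:

```yaml
variables:
  # Pipeline variables are exposed to scripts as environment variables,
  # so Yarn picks this up as its cache directory.
  YARN_CACHE_FOLDER: $(Pipeline.Workspace)/.yarn

steps:
- task: Cache@2
  displayName: 'Cache Yarn packages'
  inputs:
    key: '"yarn" | "$(Agent.OS)" | yarn.lock'
    restoreKeys: |
      "yarn" | "$(Agent.OS)"
      "yarn"
    path: $(YARN_CACHE_FOLDER)

# Runs on every build. With a warm cache Yarn copies packages from disk
# instead of downloading them, and it still fails if yarn.lock is out of date.
- script: yarn install --frozen-lockfile
  displayName: 'Install dependencies'
```

In this layout a partial restore through a restore key is harmless, because the install step always reconciles node_modules against yarn.lock and only downloads what the restored cache is missing.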
Demo - Yarn in local
Explain—————————————————————!
Demo - Cache in Pipelines
Explain—————————————————————!
Comparison of Pipelines:
No Cache Pipeline
Total Time: 22 seconds
Dependency Installation Time: 13 seconds (59%)
Cache Miss Pipeline
Total Time: 28 seconds
Dependency Installation Time: 16 seconds (57%)
Cache Hit Pipeline
Total Time: 19 seconds
Dependency Installation Time: 6 seconds (32%)
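These figures also give a rough break-even point for the earlier remark that caching can hurt execution time. As a back-of-the-envelope sketch, assume every cached run is either a hit (19 seconds) or a miss (28 seconds), and let h be the fraction of runs that hit the cache (h is a symbol introduced here for illustration, not a measured value):

```latex
\begin{align*}
T_{\text{cached}}(h) &= 19h + 28(1-h) = 28 - 9h \\
T_{\text{cached}}(h) < 22 &\iff 9h > 6 \iff h > \tfrac{2}{3}
\end{align*}
```

Under these single-run timings, caching beats the 22-second no-cache pipeline on average only when more than about two thirds of runs hit the cache. Real pipelines will vary, so the threshold is indicative rather than exact.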
Check out the Microsoft documentation for more information about the Cache task:
https://learn.microsoft.com/en-us/azure/devops/pipelines/release/caching?view=azure-devops