diff --git a/.coveragerc b/.coveragerc
index 1bf19c310aa..3ba0b9591e0 100644
--- a/.coveragerc
+++ b/.coveragerc
@@ -1,7 +1,7 @@
[run]
omit =
- xarray/tests/*
- xarray/core/dask_array_compat.py
- xarray/core/npcompat.py
- xarray/core/pdcompat.py
- xarray/core/pycompat.py
+ */xarray/tests/*
+ */xarray/core/dask_array_compat.py
+ */xarray/core/npcompat.py
+ */xarray/core/pdcompat.py
+ */xarray/core/pycompat.py
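+# the leading */ lets these omit patterns match both relative paths (source
+# checkouts) and absolute paths (e.g. an installed copy under site-packages)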
diff --git a/.github/ISSUE_TEMPLATE/bug-report.md b/.github/ISSUE_TEMPLATE/bug-report.md
new file mode 100644
index 00000000000..02bc5d0f7b0
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/bug-report.md
@@ -0,0 +1,39 @@
+---
+name: Bug report
+about: Create a report to help us improve
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+
+
+**What happened**:
+
+**What you expected to happen**:
+
+**Minimal Complete Verifiable Example**:
+
+```python
+# Put your MCVE code here
+```
+
+**Anything else we need to know?**:
+
+**Environment**:
+
+Output of `xr.show_versions()`
+
+
+
+
+
diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
deleted file mode 100644
index c712cf27979..00000000000
--- a/.github/ISSUE_TEMPLATE/bug_report.md
+++ /dev/null
@@ -1,35 +0,0 @@
----
-name: Bug report / Feature request
-about: 'Post a problem or idea'
-title: ''
-labels: ''
-assignees: ''
-
----
-
-
-
-
-#### MCVE Code Sample
-
-
-```python
-# Your code here
-
-```
-
-#### Expected Output
-
-
-#### Problem Description
-
-
-
-#### Versions
-
-Output of xr.show_versions()
-
-
-
-
-
diff --git a/.github/ISSUE_TEMPLATE/config.yml b/.github/ISSUE_TEMPLATE/config.yml
new file mode 100644
index 00000000000..0ad7e5f3e13
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/config.yml
@@ -0,0 +1,8 @@
+blank_issues_enabled: false
+contact_links:
+ - name: Usage question
+ url: https://github.com/pydata/xarray/discussions
+ about: |
+ Ask questions and discuss with other community members here.
+ If you have a question like "How do I concatenate a list of datasets?" then
+ please include a self-contained reproducible example if possible.
diff --git a/.github/ISSUE_TEMPLATE/feature-request.md b/.github/ISSUE_TEMPLATE/feature-request.md
new file mode 100644
index 00000000000..7021fe490aa
--- /dev/null
+++ b/.github/ISSUE_TEMPLATE/feature-request.md
@@ -0,0 +1,22 @@
+---
+name: Feature request
+about: Suggest an idea for this project
+title: ''
+labels: ''
+assignees: ''
+
+---
+
+
+
+**Is your feature request related to a problem? Please describe.**
+A clear and concise description of what the problem is. Ex. I'm always frustrated when [...]
+
+**Describe the solution you'd like**
+A clear and concise description of what you want to happen.
+
+**Describe alternatives you've considered**
+A clear and concise description of any alternative solutions or features you've considered.
+
+**Additional context**
+Add any other context about the feature request here.
diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
index a921bddaa23..09ef053bb39 100644
--- a/.github/PULL_REQUEST_TEMPLATE.md
+++ b/.github/PULL_REQUEST_TEMPLATE.md
@@ -1,6 +1,15 @@
- - [ ] Closes #xxxx
- - [ ] Tests added
- - [ ] Passes `isort -rc . && black . && mypy . && flake8`
- - [ ] Fully documented, including `whats-new.rst` for all changes and `api.rst` for new API
+- [ ] Closes #xxxx
+- [ ] Tests added
+- [ ] Passes `pre-commit run --all-files`
+- [ ] User visible changes (including notable bug fixes) are documented in `whats-new.rst`
+- [ ] New functions/methods are listed in `api.rst`
+
+
+
+
+ **Overriding CI behaviors**
+
+ By default, the upstream dev CI is disabled on pull request and push events. You can override this behavior per commit by adding a `[test-upstream]` tag to the first line of the commit message. For documentation-only commits, you can skip the CI per commit by adding a `[skip-ci]` tag to the first line of the commit message.
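+
+ For example: `git commit -m "[skip-ci] update whats-new.rst"`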
+
diff --git a/.github/actions/detect-ci-trigger/action.yaml b/.github/actions/detect-ci-trigger/action.yaml
new file mode 100644
index 00000000000..c255d0c57cc
--- /dev/null
+++ b/.github/actions/detect-ci-trigger/action.yaml
@@ -0,0 +1,29 @@
+name: Detect CI Trigger
+description: |
+ Detect a keyword used to control the CI in the subject line of a commit message.
+inputs:
+ keyword:
+ description: |
+ The keyword to detect.
+ required: true
+outputs:
+ trigger-found:
+ description: |
+ true if the keyword has been found in the subject line of the commit message
+ value: ${{ steps.detect-trigger.outputs.CI_TRIGGERED }}
+runs:
+ using: "composite"
+ steps:
+ - name: detect trigger
+ id: detect-trigger
+ run: |
+ bash $GITHUB_ACTION_PATH/script.sh ${{ github.event_name }} ${{ inputs.keyword }}
+ shell: bash
+ - name: show detection result
+ run: |
+ echo "::group::final summary"
+ echo "commit message: ${{ steps.detect-trigger.outputs.COMMIT_MESSAGE }}"
+ echo "trigger keyword: ${{ inputs.keyword }}"
+ echo "trigger found: ${{ steps.detect-trigger.outputs.CI_TRIGGERED }}"
+ echo "::endgroup::"
+ shell: bash
diff --git a/.github/actions/detect-ci-trigger/script.sh b/.github/actions/detect-ci-trigger/script.sh
new file mode 100644
index 00000000000..c98175a5a08
--- /dev/null
+++ b/.github/actions/detect-ci-trigger/script.sh
@@ -0,0 +1,47 @@
+#!/usr/bin/env bash
+event_name="$1"
+keyword="$2"
+
+echo "::group::fetch a sufficient number of commits"
+echo "skipped"
+# git log -n 5 2>&1
+# if [[ "$event_name" == "pull_request" ]]; then
+# ref=$(git log -1 --format='%H')
+# git -c protocol.version=2 fetch --deepen=2 --no-tags --prune --progress -q origin $ref 2>&1
+# git log FETCH_HEAD
+# git checkout FETCH_HEAD
+# else
+# echo "nothing to do."
+# fi
+# git log -n 5 2>&1
+echo "::endgroup::"
+
+echo "::group::extracting the commit message"
+echo "event name: $event_name"
+if [[ "$event_name" == "pull_request" ]]; then
+ ref="HEAD^2"
+else
+ ref="HEAD"
+fi
+
+commit_message="$(git log -n 1 --pretty=format:%s "$ref")"
+
+if [[ $(echo "$commit_message" | wc -l) -le 1 ]]; then
+ echo "commit message: '$commit_message'"
+else
+ echo -e "commit message:\n--- start ---\n$commit_message\n--- end ---"
+fi
+echo "::endgroup::"
+
+echo "::group::scanning for the keyword"
+echo "searching for: '$keyword'"
+if echo "$commit_message" | grep -qF "$keyword"; then
+ result="true"
+else
+ result="false"
+fi
+echo "keyword detected: $result"
+echo "::endgroup::"
+
+echo "::set-output name=COMMIT_MESSAGE::$commit_message"
+echo "::set-output name=CI_TRIGGERED::$result"
diff --git a/.github/stale.yml b/.github/stale.yml
index f4835b5eeec..f4057844d01 100644
--- a/.github/stale.yml
+++ b/.github/stale.yml
@@ -56,4 +56,4 @@ limitPerRun: 1 # start with a small number
# issues:
# exemptLabels:
-# - confirmed
\ No newline at end of file
+# - confirmed
diff --git a/.github/workflows/ci-additional.yaml b/.github/workflows/ci-additional.yaml
new file mode 100644
index 00000000000..fdc61f2f4f7
--- /dev/null
+++ b/.github/workflows/ci-additional.yaml
@@ -0,0 +1,191 @@
+name: CI Additional
+on:
+ push:
+ branches:
+ - "*"
+ pull_request:
+ branches:
+ - "*"
+ workflow_dispatch: # allows you to trigger manually
+
+jobs:
+ detect-ci-trigger:
+ name: detect ci trigger
+ runs-on: ubuntu-latest
+ if: github.event_name == 'push' || github.event_name == 'pull_request'
+ outputs:
+ triggered: ${{ steps.detect-trigger.outputs.trigger-found }}
+ steps:
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 2
+ - uses: ./.github/actions/detect-ci-trigger
+ id: detect-trigger
+ with:
+ keyword: "[skip-ci]"
+
+ test:
+ name: ${{ matrix.os }} ${{ matrix.env }}
+ runs-on: ${{ matrix.os }}
+ needs: detect-ci-trigger
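+ # skip the whole test matrix when the commit message contains "[skip-ci]"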
+ if: needs.detect-ci-trigger.outputs.triggered == 'false'
+ defaults:
+ run:
+ shell: bash -l {0}
+ strategy:
+ fail-fast: false
+ matrix:
+ os: ["ubuntu-latest"]
+ env:
+ [
+ "py37-bare-minimum",
+ "py37-min-all-deps",
+ "py37-min-nep18",
+ "py38-all-but-dask",
+ "py38-backend-api-v2",
+ "py38-flaky",
+ ]
+ steps:
+ - name: Cancel previous runs
+ uses: styfle/cancel-workflow-action@0.6.0
+ with:
+ access_token: ${{ github.token }}
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 0 # Fetch all history for all branches and tags.
+
+ - name: Set environment variables
+ run: |
+ if [[ ${{ matrix.env }} == "py38-backend-api-v2" ]] ;
+ then
+ echo "CONDA_ENV_FILE=ci/requirements/environment.yml" >> $GITHUB_ENV
+ echo "XARRAY_BACKEND_API=v2" >> $GITHUB_ENV
+
+ elif [[ ${{ matrix.env }} == "py38-flaky" ]] ;
+ then
+ echo "CONDA_ENV_FILE=ci/requirements/environment.yml" >> $GITHUB_ENV
+ echo "PYTEST_EXTRA_FLAGS=--run-flaky --run-network-tests" >> $GITHUB_ENV
+
+ else
+ echo "CONDA_ENV_FILE=ci/requirements/${{ matrix.env }}.yml" >> $GITHUB_ENV
+ fi
+ - name: Cache conda
+ uses: actions/cache@v2
+ with:
+ path: ~/conda_pkgs_dir
+ key:
+ ${{ runner.os }}-conda-${{ matrix.env }}-${{
+ hashFiles('ci/requirements/**.yml') }}
+
+ - uses: conda-incubator/setup-miniconda@v2
+ with:
+ channels: conda-forge
+ channel-priority: strict
+ mamba-version: "*"
+ activate-environment: xarray-tests
+ auto-update-conda: false
+ python-version: 3.8
+ use-only-tar-bz2: true
+
+ - name: Install conda dependencies
+ run: |
+ mamba env update -f $CONDA_ENV_FILE
+
+ - name: Install xarray
+ run: |
+ python -m pip install --no-deps -e .
+
+ - name: Version info
+ run: |
+ conda info -a
+ conda list
+ python xarray/util/print_versions.py
+ - name: Import xarray
+ run: |
+ python -c "import xarray"
+ - name: Run tests
+ run: |
+ python -m pytest -n 4 \
+ --cov=xarray \
+ --cov-report=xml \
+ $PYTEST_EXTRA_FLAGS
+
+ - name: Upload code coverage to Codecov
+ uses: codecov/codecov-action@v1
+ with:
+ file: ./coverage.xml
+ flags: unittests,${{ matrix.env }}
+ env_vars: RUNNER_OS
+ name: codecov-umbrella
+ fail_ci_if_error: false
+ doctest:
+ name: Doctests
+ runs-on: "ubuntu-latest"
+ needs: detect-ci-trigger
+ if: needs.detect-ci-trigger.outputs.triggered == 'false'
+ defaults:
+ run:
+ shell: bash -l {0}
+
+ steps:
+ - name: Cancel previous runs
+ uses: styfle/cancel-workflow-action@0.6.0
+ with:
+ access_token: ${{ github.token }}
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 0 # Fetch all history for all branches and tags.
+ - uses: conda-incubator/setup-miniconda@v2
+ with:
+ channels: conda-forge
+ channel-priority: strict
+ mamba-version: "*"
+ activate-environment: xarray-tests
+ auto-update-conda: false
+ python-version: "3.8"
+
+ - name: Install conda dependencies
+ run: |
+ mamba env update -f ci/requirements/environment.yml
+ - name: Install xarray
+ run: |
+ python -m pip install --no-deps -e .
+ - name: Version info
+ run: |
+ conda info -a
+ conda list
+ python xarray/util/print_versions.py
+ - name: Run doctests
+ run: |
+ python -m pytest --doctest-modules xarray --ignore xarray/tests
+
+ min-version-policy:
+ name: Minimum Version Policy
+ runs-on: "ubuntu-latest"
+ needs: detect-ci-trigger
+ if: needs.detect-ci-trigger.outputs.triggered == 'false'
+ defaults:
+ run:
+ shell: bash -l {0}
+
+ steps:
+ - name: Cancel previous runs
+ uses: styfle/cancel-workflow-action@0.6.0
+ with:
+ access_token: ${{ github.token }}
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 0 # Fetch all history for all branches and tags.
+ - uses: conda-incubator/setup-miniconda@v2
+ with:
+ channels: conda-forge
+ channel-priority: strict
+ mamba-version: "*"
+ auto-update-conda: false
+ python-version: "3.8"
+
+ - name: minimum versions policy
+ run: |
+ mamba install -y pyyaml conda
+ python ci/min_deps_check.py ci/requirements/py37-bare-minimum.yml
+ python ci/min_deps_check.py ci/requirements/py37-min-all-deps.yml
diff --git a/.github/workflows/ci-pre-commit.yml b/.github/workflows/ci-pre-commit.yml
new file mode 100644
index 00000000000..1ab5642367e
--- /dev/null
+++ b/.github/workflows/ci-pre-commit.yml
@@ -0,0 +1,16 @@
+name: linting
+
+on:
+ push:
+ branches: "*"
+ pull_request:
+ branches: "*"
+
+jobs:
+ linting:
+ name: "pre-commit hooks"
+ runs-on: ubuntu-latest
+ steps:
+ - uses: actions/checkout@v2
+ - uses: actions/setup-python@v2
+ - uses: pre-commit/action@v2.0.0
diff --git a/.github/workflows/ci.yaml b/.github/workflows/ci.yaml
new file mode 100644
index 00000000000..7d7326eb5c2
--- /dev/null
+++ b/.github/workflows/ci.yaml
@@ -0,0 +1,104 @@
+name: CI
+on:
+ push:
+ branches:
+ - "*"
+ pull_request:
+ branches:
+ - "*"
+ workflow_dispatch: # allows you to trigger manually
+
+jobs:
+ detect-ci-trigger:
+ name: detect ci trigger
+ runs-on: ubuntu-latest
+ if: github.event_name == 'push' || github.event_name == 'pull_request'
+ outputs:
+ triggered: ${{ steps.detect-trigger.outputs.trigger-found }}
+ steps:
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 2
+ - uses: ./.github/actions/detect-ci-trigger
+ id: detect-trigger
+ with:
+ keyword: "[skip-ci]"
+ test:
+ name: ${{ matrix.os }} py${{ matrix.python-version }}
+ runs-on: ${{ matrix.os }}
+ needs: detect-ci-trigger
+ if: needs.detect-ci-trigger.outputs.triggered == 'false'
+ defaults:
+ run:
+ shell: bash -l {0}
+ strategy:
+ fail-fast: false
+ matrix:
+ os: ["ubuntu-latest", "macos-latest", "windows-latest"]
+ python-version: ["3.7", "3.8"]
+ steps:
+ - name: Cancel previous runs
+ uses: styfle/cancel-workflow-action@0.6.0
+ with:
+ access_token: ${{ github.token }}
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 0 # Fetch all history for all branches and tags.
+ - name: Set environment variables
+ run: |
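+ # choose the OS-specific conda environment file and expose it (plus the
+ # python version) to later steps via $GITHUB_ENV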
+ if [[ ${{ matrix.os }} == windows* ]] ;
+ then
+ echo "CONDA_ENV_FILE=ci/requirements/environment-windows.yml" >> $GITHUB_ENV
+ else
+ echo "CONDA_ENV_FILE=ci/requirements/environment.yml" >> $GITHUB_ENV
+
+ fi
+ echo "PYTHON_VERSION=${{ matrix.python-version }}" >> $GITHUB_ENV
+
+ - name: Cache conda
+ uses: actions/cache@v2
+ with:
+ path: ~/conda_pkgs_dir
+ key:
+ ${{ runner.os }}-conda-py${{ matrix.python-version }}-${{
+ hashFiles('ci/requirements/**.yml') }}
+ - uses: conda-incubator/setup-miniconda@v2
+ with:
+ channels: conda-forge
+ channel-priority: strict
+ mamba-version: "*"
+ activate-environment: xarray-tests
+ auto-update-conda: false
+ python-version: ${{ matrix.python-version }}
+ use-only-tar-bz2: true
+
+ - name: Install conda dependencies
+ run: |
+ mamba env update -f $CONDA_ENV_FILE
+
+ - name: Install xarray
+ run: |
+ python -m pip install --no-deps -e .
+
+ - name: Version info
+ run: |
+ conda info -a
+ conda list
+ python xarray/util/print_versions.py
+ - name: Import xarray
+ run: |
+ python -c "import xarray"
+ - name: Run tests
+ run: |
+ python -m pytest -n 4 \
+ --cov=xarray \
+ --cov-report=xml
+
+ - name: Upload code coverage to Codecov
+ uses: codecov/codecov-action@v1
+ with:
+ file: ./coverage.xml
+ flags: unittests
+ env_vars: RUNNER_OS,PYTHON_VERSION
+ name: codecov-umbrella
+ fail_ci_if_error: false
diff --git a/.github/workflows/parse_logs.py b/.github/workflows/parse_logs.py
new file mode 100644
index 00000000000..4d3bea54e50
--- /dev/null
+++ b/.github/workflows/parse_logs.py
@@ -0,0 +1,57 @@
+# type: ignore
+import argparse
+import itertools
+import pathlib
+import textwrap
+
+parser = argparse.ArgumentParser()
+parser.add_argument("filepaths", nargs="+", type=pathlib.Path)
+args = parser.parse_args()
+
+filepaths = sorted(p for p in args.filepaths if p.is_file())
+
+
+def extract_short_test_summary_info(lines):
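+ # drop everything before the "short test summary info" header, skip the
+ # header line itself, then keep the consecutive FAILED lines that follow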
+ up_to_start_of_section = itertools.dropwhile(
+ lambda l: "=== short test summary info ===" not in l,
+ lines,
+ )
+ up_to_section_content = itertools.islice(up_to_start_of_section, 1, None)
+ section_content = itertools.takewhile(
+ lambda l: l.startswith("FAILED"), up_to_section_content
+ )
+ content = "\n".join(section_content)
+
+ return content
+
+
+def format_log_message(path):
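+ # log files are named like "output-<python version>-log"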
+ py_version = path.name.split("-")[1]
+ summary = f"Python {py_version} Test Summary Info"
+ with open(path) as f:
+ data = extract_short_test_summary_info(line.rstrip() for line in f)
+ message = (
+ textwrap.dedent(
+ """\
+ {summary}
+
+ ```
+ {data}
+ ```
+
+
+ """
+ )
+ .rstrip()
+ .format(summary=summary, data=data)
+ )
+
+ return message
+
+
+print("Parsing logs ...")
+message = "\n\n".join(format_log_message(path) for path in filepaths)
+
+output_file = pathlib.Path("pytest-logs.txt")
+print(f"Writing output file to: {output_file.absolute()}")
+output_file.write_text(message)
diff --git a/.github/workflows/upstream-dev-ci.yaml b/.github/workflows/upstream-dev-ci.yaml
new file mode 100644
index 00000000000..dda762878c5
--- /dev/null
+++ b/.github/workflows/upstream-dev-ci.yaml
@@ -0,0 +1,174 @@
+name: CI Upstream
+on:
+ push:
+ branches:
+ - master
+ pull_request:
+ branches:
+ - master
+ schedule:
+ - cron: "0 0 * * *" # Daily “At 00:00” UTC
+ workflow_dispatch: # allows you to trigger the workflow run manually
+
+jobs:
+ detect-ci-trigger:
+ name: detect upstream-dev ci trigger
+ runs-on: ubuntu-latest
+ if: github.event_name == 'push' || github.event_name == 'pull_request'
+ outputs:
+ triggered: ${{ steps.detect-trigger.outputs.trigger-found }}
+ steps:
+ - uses: actions/checkout@v2
+ with:
+ fetch-depth: 2
+ - uses: ./.github/actions/detect-ci-trigger
+ id: detect-trigger
+ with:
+ keyword: "[test-upstream]"
+
+ upstream-dev:
+ name: upstream-dev
+ runs-on: ubuntu-latest
+ needs: detect-ci-trigger
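+ # run on scheduled/manual triggers in the main repository, or when a
+ # push/PR commit message explicitly opts in with "[test-upstream]"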
+ if: |
+ always()
+ && github.repository == 'pydata/xarray'
+ && (
+ (github.event_name == 'schedule' || github.event_name == 'workflow_dispatch')
+ || needs.detect-ci-trigger.outputs.triggered == 'true'
+ )
+ defaults:
+ run:
+ shell: bash -l {0}
+ strategy:
+ fail-fast: false
+ matrix:
+ python-version: ["3.8"]
+ outputs:
+ artifacts_availability: ${{ steps.status.outputs.ARTIFACTS_AVAILABLE }}
+ steps:
+ - name: Cancel previous runs
+ uses: styfle/cancel-workflow-action@0.6.0
+ with:
+ access_token: ${{ github.token }}
+ - uses: actions/checkout@v2
+ - uses: conda-incubator/setup-miniconda@v2
+ with:
+ channels: conda-forge
+ channel-priority: strict
+ mamba-version: "*"
+ activate-environment: xarray-tests
+ auto-update-conda: false
+ python-version: ${{ matrix.python-version }}
+ - name: Set up conda environment
+ run: |
+ mamba env update -f ci/requirements/environment.yml
+ bash ci/install-upstream-wheels.sh
+ - name: Version info
+ run: |
+ conda info -a
+ conda list
+ python xarray/util/print_versions.py
+ - name: import xarray
+ run: |
+ python -c 'import xarray'
+ - name: Run Tests
+ if: success()
+ id: status
+ run: |
+ set -euo pipefail
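+ # tee the pytest output into a log file; on failure, mark the log for
+ # upload as an artifact and propagate the non-zero exit status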
+ python -m pytest -rf | tee output-${{ matrix.python-version }}-log || (
+ echo '::set-output name=ARTIFACTS_AVAILABLE::true' && false
+ )
+ - name: Upload artifacts
+ if: |
+ failure()
+ && steps.status.outcome == 'failure'
+ && github.event_name == 'schedule'
+ && github.repository == 'pydata/xarray'
+ uses: actions/upload-artifact@v2
+ with:
+ name: output-${{ matrix.python-version }}-log
+ path: output-${{ matrix.python-version }}-log
+ retention-days: 5
+
+ report:
+ name: report
+ needs: upstream-dev
+ if: |
+ always()
+ && github.event_name == 'schedule'
+ && github.repository == 'pydata/xarray'
+ && needs.upstream-dev.outputs.artifacts_availability == 'true'
+ runs-on: ubuntu-latest
+ defaults:
+ run:
+ shell: bash
+ steps:
+ - uses: actions/checkout@v2
+ - uses: actions/setup-python@v2
+ with:
+ python-version: "3.x"
+ - uses: actions/download-artifact@v2
+ with:
+ path: /tmp/workspace/logs
+ - name: Move all log files into a single directory
+ run: |
+ rsync -a /tmp/workspace/logs/output-*/ ./logs
+ ls -R ./logs
+ - name: Parse logs
+ run: |
+ shopt -s globstar
+ python .github/workflows/parse_logs.py logs/**/*-log
+ - name: Report failures
+ uses: actions/github-script@v3
+ with:
+ github-token: ${{ secrets.GITHUB_TOKEN }}
+ script: |
+ const fs = require('fs');
+ const pytest_logs = fs.readFileSync('pytest-logs.txt', 'utf8');
+ const title = "⚠️ Nightly upstream-dev CI failed ⚠️"
+ const workflow_url = `https://github.com/${process.env.GITHUB_REPOSITORY}/actions/runs/${process.env.GITHUB_RUN_ID}`
+ const issue_body = `[Workflow Run URL](${workflow_url})\n${pytest_logs}`
+
+ // Run GraphQL query against GitHub API to find the most recent open issue used for reporting failures
+ const query = `query($owner:String!, $name:String!, $creator:String!, $label:String!){
+ repository(owner: $owner, name: $name) {
+ issues(first: 1, states: OPEN, filterBy: {createdBy: $creator, labels: [$label]}, orderBy: {field: CREATED_AT, direction: DESC}) {
+ edges {
+ node {
+ body
+ id
+ number
+ }
+ }
+ }
+ }
+ }`;
+
+ const variables = {
+ owner: context.repo.owner,
+ name: context.repo.repo,
+ label: 'CI',
+ creator: "github-actions[bot]"
+ }
+ const result = await github.graphql(query, variables)
+
+ // If no issue is open, create a new issue,
+ // else update the body of the existing issue.
+ if (result.repository.issues.edges.length === 0) {
+ github.issues.create({
+ owner: variables.owner,
+ repo: variables.name,
+ body: issue_body,
+ title: title,
+ labels: [variables.label]
+ })
+ } else {
+ github.issues.update({
+ owner: variables.owner,
+ repo: variables.name,
+ issue_number: result.repository.issues.edges[0].node.number,
+ body: issue_body
+ })
+ }
diff --git a/.landscape.yml b/.landscape.yml
deleted file mode 100644
index 754c5715463..00000000000
--- a/.landscape.yml
+++ /dev/null
@@ -1,14 +0,0 @@
-doc-warnings: yes
-test-warnings: yes
-strictness: medium
-max-line-length: 79
-autodetect: yes
-ignore-paths:
- - ci
- - doc
- - examples
- - LICENSES
- - notebooks
-pylint:
- disable:
- - dangerous-default-value
diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml
index 26bf4803ef6..b0fa21a7bf9 100644
--- a/.pre-commit-config.yaml
+++ b/.pre-commit-config.yaml
@@ -1,24 +1,34 @@
# https://pre-commit.com/
repos:
+ - repo: https://github.com/pre-commit/pre-commit-hooks
+ rev: v3.4.0
+ hooks:
+ - id: trailing-whitespace
+ - id: end-of-file-fixer
+ - id: check-yaml
# isort should run before black as black sometimes tweaks the isort output
- - repo: https://github.com/timothycrosley/isort
- rev: 4.3.21-2
+ - repo: https://github.com/PyCQA/isort
+ rev: 5.7.0
hooks:
- id: isort
- files: .+\.py$
# https://github.com/python/black#version-control-integration
- - repo: https://github.com/python/black
- rev: stable
+ - repo: https://github.com/psf/black
+ rev: 20.8b1
hooks:
- id: black
+ - repo: https://github.com/keewis/blackdoc
+ rev: v0.3.2
+ hooks:
+ - id: blackdoc
- repo: https://gitlab.com/pycqa/flake8
- rev: 3.7.9
+ rev: 3.8.4
hooks:
- id: flake8
- repo: https://github.com/pre-commit/mirrors-mypy
- rev: v0.761 # Must match ci/requirements/*.yml
+ rev: v0.790 # Must match ci/requirements/*.yml
hooks:
- id: mypy
+ exclude: "properties|asv_bench"
# run this occasionally, ref discussion https://github.com/pydata/xarray/pull/3194
# - repo: https://github.com/asottile/pyupgrade
# rev: v1.22.1
diff --git a/CONTRIBUTING.md b/CONTRIBUTING.md
new file mode 100644
index 00000000000..7a909aefd08
--- /dev/null
+++ b/CONTRIBUTING.md
@@ -0,0 +1 @@
+Xarray's contributor guidelines [can be found in our online documentation](http://xarray.pydata.org/en/stable/contributing.html)
diff --git a/HOW_TO_RELEASE.md b/HOW_TO_RELEASE.md
index 3fdd1d7236d..5352d427909 100644
--- a/HOW_TO_RELEASE.md
+++ b/HOW_TO_RELEASE.md
@@ -1,70 +1,105 @@
-How to issue an xarray release in 16 easy steps
+# How to issue an xarray release in 21 easy steps
Time required: about an hour.
+These instructions assume that `upstream` refers to the main repository:
+
+```sh
+$ git remote -v
+{...}
+upstream https://github.com/pydata/xarray (fetch)
+upstream https://github.com/pydata/xarray (push)
+```
+
+
+
1. Ensure your master branch is synced to upstream:
- ```
- git pull upstream master
- ```
- 2. Look over whats-new.rst and the docs. Make sure "What's New" is complete
- (check the date!) and consider adding a brief summary note describing the
- release at the top.
+ ```sh
+ git switch master
+ git pull upstream master
+ ```
+ 2. Confirm there are no commits on stable that are not yet merged
+ ([ref](https://github.com/pydata/xarray/pull/4440)):
+ ```sh
+ git merge upstream/stable
+ ```
+ 3. Add a list of contributors with:
+ ```sh
+ git log "$(git tag --sort="v:refname" | sed -n 'x;$p').." --format=%aN | sort -u | perl -pe 's/\n/$1, /'
+ ```
+ or by substituting the _previous_ release for {0.X.Y-1}:
+ ```sh
+ git log v{0.X.Y-1}.. --format=%aN | sort -u | perl -pe 's/\n/$1, /'
+ ```
+ This will return the number of contributors:
+ ```sh
+ git log v{0.X.Y-1}.. --format=%aN | sort -u | wc -l
+ ```
+ 4. Write a release summary: ~50 words describing the high level features. This
+ will be used in the release emails, tweets, GitHub release notes, etc.
+ 5. Look over whats-new.rst and the docs. Make sure "What's New" is complete
+ (check the date!) and add the release summary at the top.
Things to watch out for:
- Important new features should be highlighted towards the top.
- Function/method references should include links to the API docs.
- Sometimes notes get added in the wrong section of whats-new, typically
due to a bad merge. Check for these before a release by using git diff,
- e.g., `git diff v0.X.Y whats-new.rst` where 0.X.Y is the previous
+ e.g., `git diff v{0.X.Y-1} whats-new.rst` where {0.X.Y-1} is the previous
release.
- 3. If you have any doubts, run the full test suite one final time!
- ```
+ 6. If possible, open a PR with the release summary and whatsnew changes.
+ 7. After merging, again ensure your master branch is synced to upstream:
+ ```sh
+ git pull upstream master
+ ```
+ 8. If you have any doubts, run the full test suite one final time!
+ ```sh
pytest
```
- 4. Check that the ReadTheDocs build is passing.
- 5. On the master branch, commit the release in git:
- ```
- git commit -am 'Release v0.X.Y'
+ 9. Check that the ReadTheDocs build is passing.
+10. On the master branch, commit the release in git:
+ ```sh
+ git commit -am 'Release v{0.X.Y}'
```
- 6. Tag the release:
+11. Tag the release:
+ ```sh
+ git tag -a v{0.X.Y} -m 'v{0.X.Y}'
```
- git tag -a v0.X.Y -m 'v0.X.Y'
- ```
- 7. Build source and binary wheels for pypi:
- ```
- git clean -xdf # this deletes all uncommited changes!
+12. Build source and binary wheels for PyPI:
+ ```sh
+ git clean -xdf # this deletes all uncommitted changes!
python setup.py bdist_wheel sdist
```
- 8. Use twine to check the package build:
+13. Use twine to check the package build:
+ ```sh
+ twine check dist/xarray-{0.X.Y}*
```
- twine check dist/xarray-0.X.Y*
- ```
- 9. Use twine to register and upload the release on pypi. Be careful, you can't
+14. Use twine to register and upload the release on PyPI. Be careful, you can't
take this back!
- ```
- twine upload dist/xarray-0.X.Y*
+ ```sh
+ twine upload dist/xarray-{0.X.Y}*
```
You will need to be listed as a package owner at
- https://pypi.python.org/pypi/xarray for this to work.
-10. Push your changes to master:
- ```
+ <https://pypi.python.org/pypi/xarray> for this to work.
+15. Push your changes to master:
+ ```sh
git push upstream master
git push upstream --tags
```
-11. Update the stable branch (used by ReadTheDocs) and switch back to master:
- ```
- git checkout stable
+16. Update the stable branch (used by ReadTheDocs) and switch back to master:
+ ```sh
+ git switch stable
git rebase master
- git push upstream stable
- git checkout master
+ git push --force upstream stable
+ git switch master
```
- It's OK to force push to 'stable' if necessary. (We also update the stable
- branch with `git cherrypick` for documentation only fixes that apply the
+ It's OK to force push to `stable` if necessary. (We also update the stable
+ branch with `git cherry-pick` for documentation-only fixes that apply to the
current released version.)
-12. Add a section for the next release (v.X.Y+1) to doc/whats-new.rst:
- ```
- .. _whats-new.0.X.Y+1:
+17. Add a section for the next release {0.X.Y+1} to doc/whats-new.rst:
+ ```rst
+ .. _whats-new.{0.X.Y+1}:
- v0.X.Y+1 (unreleased)
+ v{0.X.Y+1} (unreleased)
---------------------
Breaking changes
@@ -86,20 +121,20 @@ Time required: about an hour.
Internal Changes
~~~~~~~~~~~~~~~~
```
-13. Commit your changes and push to master again:
- ```
+18. Commit your changes and push to master again:
+ ```sh
git commit -am 'New whatsnew section'
git push upstream master
```
You're done pushing to master!
-14. Issue the release on GitHub. Click on "Draft a new release" at
- https://github.com/pydata/xarray/releases. Type in the version number, but
- don't bother to describe it -- we maintain that on the docs instead.
-15. Update the docs. Login to https://readthedocs.org/projects/xray/versions/
+19. Issue the release on GitHub. Click on "Draft a new release" at
+ <https://github.com/pydata/xarray/releases>. Type in the version number
+ and paste the release summary in the notes.
+20. Update the docs. Login to <https://readthedocs.org/projects/xray/versions/>
and switch your new release tag (at the bottom) from "Inactive" to "Active".
It should now build automatically.
-16. Issue the release announcement! For bug fix releases, I usually only email
- xarray@googlegroups.com. For major/feature releases, I will email a broader
+21. Issue the release announcement to mailing lists & Twitter. For bug fix releases, I
+ usually only email xarray@googlegroups.com. For major/feature releases, I will email a broader
list (no more than once every 3-6 months):
- pydata@googlegroups.com
- xarray@googlegroups.com
@@ -109,18 +144,10 @@ Time required: about an hour.
Google search will turn up examples of prior release announcements (look for
"ANN xarray").
- You can get a list of contributors with:
- ```
- git log "$(git tag --sort="v:refname" | sed -n 'x;$p').." --format="%aN" | sort -u
- ```
- or by substituting the _previous_ release in:
- ```
- git log v0.X.Y-1.. --format="%aN" | sort -u
- ```
- NB: copying this output into a Google Groups form can cause
- [issues](https://groups.google.com/forum/#!topic/xarray/hK158wAviPs) with line breaks, so take care
-Note on version numbering:
+
+
+## Note on version numbering
We follow a rough approximation of semantic version. Only major releases (0.X.0)
should include breaking changes. Minor releases (0.X.Y) are for bug fixes and
diff --git a/MANIFEST.in b/MANIFEST.in
deleted file mode 100644
index cbfb8c8cdca..00000000000
--- a/MANIFEST.in
+++ /dev/null
@@ -1,7 +0,0 @@
-include LICENSE
-recursive-include licenses *
-recursive-include doc *
-prune doc/_build
-prune doc/generated
-global-exclude .DS_Store
-recursive-include xarray/static *
diff --git a/README.rst b/README.rst
index 5ee7234f221..e258a8ccd23 100644
--- a/README.rst
+++ b/README.rst
@@ -1,8 +1,8 @@
xarray: N-D labeled arrays and datasets
=======================================
-.. image:: https://dev.azure.com/xarray/xarray/_apis/build/status/pydata.xarray?branchName=master
- :target: https://dev.azure.com/xarray/xarray/_build/latest?definitionId=1&branchName=master
+.. image:: https://github.com/pydata/xarray/workflows/CI/badge.svg?branch=master
+ :target: https://github.com/pydata/xarray/actions?query=workflow%3ACI
.. image:: https://codecov.io/gh/pydata/xarray/branch/master/graph/badge.svg
:target: https://codecov.io/gh/pydata/xarray
.. image:: https://readthedocs.org/projects/xray/badge/?version=latest
@@ -13,6 +13,8 @@ xarray: N-D labeled arrays and datasets
:target: https://pypi.python.org/pypi/xarray/
.. image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/python/black
+.. image:: https://zenodo.org/badge/DOI/10.5281/zenodo.598201.svg
+ :target: https://doi.org/10.5281/zenodo.598201
**xarray** (formerly **xray**) is an open source project and Python package
diff --git a/asv_bench/benchmarks/indexing.py b/asv_bench/benchmarks/indexing.py
index c4cfbbbdfdf..859c41c913d 100644
--- a/asv_bench/benchmarks/indexing.py
+++ b/asv_bench/benchmarks/indexing.py
@@ -1,3 +1,5 @@
+import os
+
import numpy as np
import pandas as pd
@@ -138,3 +140,22 @@ def setup(self):
def time_indexing(self):
self.ds.isel(time=self.time_filter)
+
+
+class HugeAxisSmallSliceIndexing:
+ # https://github.com/pydata/xarray/pull/4560
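+ # regression benchmark: a small slice along a huge axis should not
+ # scale with the axis size (see the PR linked above)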
+ def setup(self):
+ self.filepath = "test_indexing_huge_axis_small_slice.nc"
+ if not os.path.isfile(self.filepath):
+ xr.Dataset(
+ {"a": ("x", np.arange(10_000_000))},
+ coords={"x": np.arange(10_000_000)},
+ ).to_netcdf(self.filepath, format="NETCDF4")
+
+ self.ds = xr.open_dataset(self.filepath)
+
+ def time_indexing(self):
+ self.ds.isel(x=slice(100))
+
+ def cleanup(self):
+ self.ds.close()
diff --git a/asv_bench/benchmarks/pandas.py b/asv_bench/benchmarks/pandas.py
new file mode 100644
index 00000000000..42ef18ac0c2
--- /dev/null
+++ b/asv_bench/benchmarks/pandas.py
@@ -0,0 +1,24 @@
+import numpy as np
+import pandas as pd
+
+from . import parameterized
+
+
+class MultiIndexSeries:
+ def setup(self, dtype, subset):
+ data = np.random.rand(100000).astype(dtype)
+ index = pd.MultiIndex.from_product(
+ [
+ list("abcdefhijk"),
+ list("abcdefhijk"),
+ pd.date_range(start="2000-01-01", periods=1000, freq="B"),
+ ]
+ )
+ series = pd.Series(data, index)
+ if subset:
+ series = series[::3]
+ self.series = series
+
+ @parameterized(["dtype", "subset"], ([int, float], [True, False]))
+ def time_to_xarray(self, dtype, subset):
+ self.series.to_xarray()
diff --git a/azure-pipelines.yml b/azure-pipelines.yml
deleted file mode 100644
index ff85501c555..00000000000
--- a/azure-pipelines.yml
+++ /dev/null
@@ -1,128 +0,0 @@
-variables:
- pytest_extra_flags: ''
- allow_failure: false
- upstream_dev: false
-
-jobs:
-
-- job: Linux
- strategy:
- matrix:
- py36-bare-minimum:
- conda_env: py36-bare-minimum
- py36-min-all-deps:
- conda_env: py36-min-all-deps
- py36-min-nep18:
- conda_env: py36-min-nep18
- py36:
- conda_env: py36
- py37:
- conda_env: py37
- py38:
- conda_env: py38
- py38-all-but-dask:
- conda_env: py38-all-but-dask
- py38-upstream-dev:
- conda_env: py38
- upstream_dev: true
- py38-flaky:
- conda_env: py38
- pytest_extra_flags: --run-flaky --run-network-tests
- allow_failure: true
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - template: ci/azure/unit-tests.yml
-
-- job: MacOSX
- strategy:
- matrix:
- py38:
- conda_env: py38
- pool:
- vmImage: 'macOS-10.15'
- steps:
- - template: ci/azure/unit-tests.yml
-
-- job: Windows
- strategy:
- matrix:
- py37:
- conda_env: py37-windows
- pool:
- vmImage: 'vs2017-win2016'
- steps:
- - template: ci/azure/unit-tests.yml
-
-- job: LintFlake8
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - task: UsePythonVersion@0
- - bash: python -m pip install flake8
- displayName: Install flake8
- - bash: flake8
- displayName: flake8 lint checks
-
-- job: FormattingBlack
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - task: UsePythonVersion@0
- - bash: python -m pip install black
- displayName: Install black
- - bash: black --check .
- displayName: black formatting check
-
-- job: TypeChecking
- variables:
- conda_env: py38
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - template: ci/azure/install.yml
- - bash: |
- source activate xarray-tests
- mypy .
- displayName: mypy type checks
-
-- job: isort
- variables:
- conda_env: py38
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - template: ci/azure/install.yml
- - bash: |
- source activate xarray-tests
- isort -rc --check .
- displayName: isort formatting checks
-
-- job: MinimumVersionsPolicy
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - template: ci/azure/add-conda-to-path.yml
- - bash: |
- conda install -y pyyaml
- python ci/min_deps_check.py ci/requirements/py36-bare-minimum.yml
- python ci/min_deps_check.py ci/requirements/py36-min-all-deps.yml
- displayName: minimum versions policy
-
-- job: Docs
- pool:
- vmImage: 'ubuntu-16.04'
- steps:
- - template: ci/azure/install.yml
- parameters:
- env_file: ci/requirements/doc.yml
- - bash: |
- source activate xarray-tests
- # Replicate the exact environment created by the readthedocs CI
- conda install --yes --quiet -c pkgs/main mock pillow sphinx sphinx_rtd_theme
- displayName: Replicate readthedocs CI environment
- - bash: |
- source activate xarray-tests
- cd doc
- sphinx-build -W --keep-going -j auto -b html -d _build/doctrees . _build/html
- displayName: Build HTML docs
diff --git a/ci/azure/add-conda-to-path.yml b/ci/azure/add-conda-to-path.yml
deleted file mode 100644
index e5173835388..00000000000
--- a/ci/azure/add-conda-to-path.yml
+++ /dev/null
@@ -1,18 +0,0 @@
-# https://docs.microsoft.com/en-us/azure/devops/pipelines/languages/anaconda
-steps:
-
-- bash: |
- echo "##vso[task.prependpath]$CONDA/bin"
- displayName: Add conda to PATH (Linux)
- condition: eq(variables['Agent.OS'], 'Linux')
-
-- bash: |
- echo "##vso[task.prependpath]$CONDA/bin"
- sudo chown -R $USER $CONDA
- displayName: Add conda to PATH (OS X)
- condition: eq(variables['Agent.OS'], 'Darwin')
-
-- powershell: |
- Write-Host "##vso[task.prependpath]$env:CONDA\Scripts"
- displayName: Add conda to PATH (Windows)
- condition: eq(variables['Agent.OS'], 'Windows_NT')
diff --git a/ci/azure/install.yml b/ci/azure/install.yml
deleted file mode 100644
index 60559dd2064..00000000000
--- a/ci/azure/install.yml
+++ /dev/null
@@ -1,47 +0,0 @@
-parameters:
- env_file: ci/requirements/$CONDA_ENV.yml
-
-steps:
-
-- template: add-conda-to-path.yml
-
-- bash: |
- conda update -y conda
- conda env create -n xarray-tests --file ${{ parameters.env_file }}
- displayName: Install conda dependencies
-
-- bash: |
- source activate xarray-tests
- python -m pip install \
- -f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com \
- --no-deps \
- --pre \
- --upgrade \
- matplotlib \
- numpy \
- scipy
- python -m pip install \
- --no-deps \
- --upgrade \
- git+https://github.com/dask/dask \
- git+https://github.com/dask/distributed \
- git+https://github.com/zarr-developers/zarr \
- git+https://github.com/Unidata/cftime \
- git+https://github.com/mapbox/rasterio \
- git+https://github.com/hgrecco/pint \
- git+https://github.com/pydata/bottleneck \
- git+https://github.com/pandas-dev/pandas
- condition: eq(variables['UPSTREAM_DEV'], 'true')
- displayName: Install upstream dev dependencies
-
-- bash: |
- source activate xarray-tests
- python -m pip install --no-deps -e .
- displayName: Install xarray
-
-- bash: |
- source activate xarray-tests
- conda info -a
- conda list
- python xarray/util/print_versions.py
- displayName: Version info
diff --git a/ci/azure/unit-tests.yml b/ci/azure/unit-tests.yml
deleted file mode 100644
index 7ee5132632f..00000000000
--- a/ci/azure/unit-tests.yml
+++ /dev/null
@@ -1,34 +0,0 @@
-steps:
-
-- template: install.yml
-
-- bash: |
- source activate xarray-tests
- python -OO -c "import xarray"
- displayName: Import xarray
-
-# Work around for allowed test failures:
-# https://github.com/microsoft/azure-pipelines-tasks/issues/9302
-- bash: |
- source activate xarray-tests
- pytest \
- --junitxml=junit/test-results.xml \
- --cov=xarray \
- --cov-report=xml \
- $(pytest_extra_flags) || [ "$ALLOW_FAILURE" = "true" ]
- displayName: Run tests
-
-- bash: |
- curl https://codecov.io/bash > codecov.sh
- bash codecov.sh -t 688f4d53-31bb-49b5-8370-4ce6f792cf3d
- displayName: Upload coverage to codecov.io
-
-# TODO: publish coverage results to Azure, once we can merge them across
-# multiple jobs: https://stackoverflow.com/questions/56776185
-
-- task: PublishTestResults@2
- condition: succeededOrFailed()
- inputs:
- testResultsFiles: '**/test-*.xml'
- failTaskOnFailedTests: false
- testRunTitle: '$(Agent.JobName)'
diff --git a/ci/install-upstream-wheels.sh b/ci/install-upstream-wheels.sh
new file mode 100755
index 00000000000..fe3e706f6a6
--- /dev/null
+++ b/ci/install-upstream-wheels.sh
@@ -0,0 +1,43 @@
+#!/usr/bin/env bash
+
+# TODO: add sparse back in, once Numba works with the development version of
+# NumPy again: https://github.com/pydata/xarray/issues/4146
+
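+# force-remove the stable conda builds first so that the nightly wheels and
+# git installs below actually take effect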
+conda uninstall -y --force \
+ numpy \
+ scipy \
+ pandas \
+ matplotlib \
+ dask \
+ distributed \
+ zarr \
+ cftime \
+ rasterio \
+ pint \
+ bottleneck \
+ sparse
+python -m pip install \
+ -i https://pypi.anaconda.org/scipy-wheels-nightly/simple \
+ --no-deps \
+ --pre \
+ --upgrade \
+ numpy \
+ scipy \
+ pandas
+python -m pip install \
+ -f https://7933911d6844c6c53a7d-47bd50c35cd79bd838daf386af554a83.ssl.cf2.rackcdn.com \
+ --no-deps \
+ --pre \
+ --upgrade \
+ matplotlib
+python -m pip install \
+ --no-deps \
+ --upgrade \
+ git+https://github.com/dask/dask \
+ git+https://github.com/dask/distributed \
+ git+https://github.com/zarr-developers/zarr \
+ git+https://github.com/Unidata/cftime \
+ git+https://github.com/mapbox/rasterio \
+ git+https://github.com/hgrecco/pint \
+ git+https://github.com/pydata/bottleneck # \
+ # git+https://github.com/pydata/sparse
diff --git a/ci/min_deps_check.py b/ci/min_deps_check.py
index 527093cf5bc..3ffab645e8e 100755
--- a/ci/min_deps_check.py
+++ b/ci/min_deps_check.py
@@ -1,15 +1,16 @@
"""Fetch from conda database all available versions of the xarray dependencies and their
-publication date. Compare it against requirements/py36-min-all-deps.yml to verify the
+publication date. Compare it against requirements/py37-min-all-deps.yml to verify the
policy on obsolete dependencies is being followed. Print a pretty report :)
"""
-import subprocess
+import itertools
import sys
-from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timedelta
from typing import Dict, Iterator, Optional, Tuple
+import conda.api
import yaml
+CHANNELS = ["conda-forge", "defaults"]
IGNORE_DEPS = {
"black",
"coveralls",
@@ -21,11 +22,26 @@
"pytest",
"pytest-cov",
"pytest-env",
+ "pytest-xdist",
}
-POLICY_MONTHS = {"python": 42, "numpy": 24, "pandas": 12, "scipy": 12}
-POLICY_MONTHS_DEFAULT = 6
-
+POLICY_MONTHS = {"python": 42, "numpy": 24, "setuptools": 42}
+POLICY_MONTHS_DEFAULT = 12
+POLICY_OVERRIDE = {
+ # dask < 2.9 has trouble with nan-reductions
+ # TODO remove this special case and the matching note in installing.rst
+ # after January 2021.
+ "dask": (2, 9),
+ "distributed": (2, 9),
+ # setuptools-scm doesn't work with setuptools < 36.7 (Nov 2017).
+ # The conda metadata is malformed for setuptools < 38.4 (Jan 2018)
+ # (it's missing a timestamp which prevents this tool from working).
+ # setuptools < 40.4 (Sep 2018) from conda-forge cannot be installed into a py37
+ # environment
+ # TODO remove this special case and the matching note in installing.rst
+ # after March 2022.
+ "setuptools": (40, 4),
+}
has_errors = False
@@ -40,7 +56,7 @@ def warning(msg: str) -> None:
def parse_requirements(fname) -> Iterator[Tuple[str, int, int, Optional[int]]]:
- """Load requirements/py36-min-all-deps.yml
+ """Load requirements/py37-min-all-deps.yml
Yield (package name, major version, minor version, [patch version])
"""
@@ -76,30 +92,23 @@ def query_conda(pkg: str) -> Dict[Tuple[int, int], datetime]:
Return map of {(major version, minor version): publication date}
"""
- stdout = subprocess.check_output(
- ["conda", "search", pkg, "--info", "-c", "defaults", "-c", "conda-forge"]
- )
- out = {} # type: Dict[Tuple[int, int], datetime]
- major = None
- minor = None
-
- for row in stdout.decode("utf-8").splitlines():
- label, _, value = row.partition(":")
- label = label.strip()
- if label == "file name":
- value = value.strip()[len(pkg) :]
- smajor, sminor = value.split("-")[1].split(".")[:2]
- major = int(smajor)
- minor = int(sminor)
- if label == "timestamp":
- assert major is not None
- assert minor is not None
- ts = datetime.strptime(value.split()[0].strip(), "%Y-%m-%d")
-
- if (major, minor) in out:
- out[major, minor] = min(out[major, minor], ts)
- else:
- out[major, minor] = ts
+
+ def metadata(entry):
+ version = entry.version
+
+ time = datetime.fromtimestamp(entry.timestamp)
+ major, minor = map(int, version.split(".")[:2])
+
+ return (major, minor), time
+
+ raw_data = conda.api.SubdirData.query_all(pkg, channels=CHANNELS)
+ data = sorted(metadata(entry) for entry in raw_data if entry.timestamp != 0)
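+ # sort first: itertools.groupby below only groups consecutive identical keys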
+
+ release_dates = {
+ version: [time for _, time in group if time is not None]
+ for version, group in itertools.groupby(data, key=lambda x: x[0])
+ }
+ out = {version: min(dates) for version, dates in release_dates.items() if dates}
# Hardcoded fix to work around incorrect dates in conda
if pkg == "python":
@@ -151,6 +160,11 @@ def process_pkg(
policy_minor = minor
policy_published_actual = published
+ try:
+ policy_major, policy_minor = POLICY_OVERRIDE[pkg]
+ except KeyError:
+ pass
+
if (req_major, req_minor) < (policy_major, policy_minor):
status = "<"
elif (req_major, req_minor) > (policy_major, policy_minor):
@@ -182,16 +196,14 @@ def fmt_version(major: int, minor: int, patch: int = None) -> str:
def main() -> None:
fname = sys.argv[1]
- with ThreadPoolExecutor(8) as ex:
- futures = [
- ex.submit(process_pkg, pkg, major, minor, patch)
- for pkg, major, minor, patch in parse_requirements(fname)
- ]
- rows = [f.result() for f in futures]
-
- print("Package Required Policy Status")
- print("------------- -------------------- -------------------- ------")
- fmt = "{:13} {:7} ({:10}) {:7} ({:10}) {}"
+ rows = [
+ process_pkg(pkg, major, minor, patch)
+ for pkg, major, minor, patch in parse_requirements(fname)
+ ]
+
+ print("Package Required Policy Status")
+ print("----------------- -------------------- -------------------- ------")
+ fmt = "{:17} {:7} ({:10}) {:7} ({:10}) {}"
for row in rows:
print(fmt.format(*row))
diff --git a/ci/requirements/doc.yml b/ci/requirements/doc.yml
index 2987303c92a..e092272654b 100644
--- a/ci/requirements/doc.yml
+++ b/ci/requirements/doc.yml
@@ -2,6 +2,7 @@ name: xarray-docs
channels:
# Don't change to pkgs/main, as it causes random timeouts in readthedocs
- conda-forge
+ - nodefaults
dependencies:
- python=3.8
- bottleneck
@@ -13,15 +14,21 @@ dependencies:
- ipython
- iris>=2.3
- jupyter_client
+ - matplotlib-base
- nbsphinx
- netcdf4>=1.5
- numba
- numpy>=1.17
- - numpydoc
- pandas>=1.0
- rasterio>=1.1
- seaborn
- setuptools
- - sphinx>=2.3
+ - sphinx=3.3
- sphinx_rtd_theme>=0.4
- - zarr>=2.4
\ No newline at end of file
+ - sphinx-autosummary-accessors
+ - zarr>=2.4
+ - pip
+ - pip:
+ - scanpydoc
+ # install xarray itself (path relative to this file); it must be an editable install to be accepted.
+ - -e ../..
diff --git a/ci/requirements/py37-windows.yml b/ci/requirements/environment-windows.yml
similarity index 73%
rename from ci/requirements/py37-windows.yml
rename to ci/requirements/environment-windows.yml
index e9e5c7a900a..6de2bc8dc64 100644
--- a/ci/requirements/py37-windows.yml
+++ b/ci/requirements/environment-windows.yml
@@ -2,27 +2,21 @@ name: xarray-tests
channels:
- conda-forge
dependencies:
- - python=3.7
- - black
- boto3
- bottleneck
- cartopy
# - cdms2 # Not available on Windows
- # - cfgrib # Causes Python interpreter crash on Windows
+ # - cfgrib # Causes Python interpreter crash on Windows: https://github.com/pydata/xarray/pull/3340
- cftime
- - coveralls
- dask
- distributed
- - flake8
- h5netcdf
- - h5py
+ - h5py=2
- hdf5
- hypothesis
- iris
- - isort
- lxml # Optional dep of pydap
- - matplotlib
- - mypy=0.761 # Must match .pre-commit-config.yaml
+ - matplotlib-base
- nc-time-axis
- netcdf4
- numba
@@ -30,12 +24,14 @@ dependencies:
- pandas
- pint
- pip
+ - pre-commit
- pseudonetcdf
- pydap
# - pynio # Not available on Windows
- pytest
- pytest-cov
- pytest-env
+ - pytest-xdist
- rasterio
- scipy
- seaborn
diff --git a/ci/requirements/py37.yml b/ci/requirements/environment.yml
similarity index 73%
rename from ci/requirements/py37.yml
rename to ci/requirements/environment.yml
index dba3926596e..0f59d9570c8 100644
--- a/ci/requirements/py37.yml
+++ b/ci/requirements/environment.yml
@@ -1,41 +1,38 @@
name: xarray-tests
channels:
- conda-forge
+ - nodefaults
dependencies:
- - python=3.7
- - black
- boto3
- bottleneck
- cartopy
- cdms2
- cfgrib
- cftime
- - coveralls
- dask
- distributed
- - flake8
- h5netcdf
- - h5py
+ - h5py=2
- hdf5
- hypothesis
- iris
- - isort
- lxml # Optional dep of pydap
- - matplotlib
- - mypy=0.761 # Must match .pre-commit-config.yaml
+ - matplotlib-base
- nc-time-axis
- netcdf4
- numba
- numpy
- pandas
- pint
- - pip
+ - pip=20.2
+ - pre-commit
- pseudonetcdf
- pydap
- - pynio
+ # - pynio: not compatible with netCDF4>1.5.3; only tested in py37-bare-minimum
- pytest
- pytest-cov
- pytest-env
+ - pytest-xdist
- rasterio
- scipy
- seaborn
diff --git a/ci/requirements/py36.yml b/ci/requirements/py36.yml
deleted file mode 100644
index a500173f277..00000000000
--- a/ci/requirements/py36.yml
+++ /dev/null
@@ -1,47 +0,0 @@
-name: xarray-tests
-channels:
- - conda-forge
-dependencies:
- - python=3.6
- - black
- - boto3
- - bottleneck
- - cartopy
- - cdms2
- - cfgrib
- - cftime
- - coveralls
- - dask
- - distributed
- - flake8
- - h5netcdf
- - h5py
- - hdf5
- - hypothesis
- - iris
- - isort
- - lxml # Optional dep of pydap
- - matplotlib
- - mypy=0.761 # Must match .pre-commit-config.yaml
- - nc-time-axis
- - netcdf4
- - numba
- - numpy
- - pandas
- - pint
- - pip
- - pseudonetcdf
- - pydap
- - pynio
- - pytest
- - pytest-cov
- - pytest-env
- - rasterio
- - scipy
- - seaborn
- - setuptools
- - sparse
- - toolz
- - zarr
- - pip:
- - numbagg
diff --git a/ci/requirements/py36-bare-minimum.yml b/ci/requirements/py37-bare-minimum.yml
similarity index 69%
rename from ci/requirements/py36-bare-minimum.yml
rename to ci/requirements/py37-bare-minimum.yml
index 00fef672855..fbeb87032b7 100644
--- a/ci/requirements/py36-bare-minimum.yml
+++ b/ci/requirements/py37-bare-minimum.yml
@@ -1,13 +1,15 @@
name: xarray-tests
channels:
- conda-forge
+ - nodefaults
dependencies:
- - python=3.6
+ - python=3.7
- coveralls
- pip
- pytest
- pytest-cov
- pytest-env
+ - pytest-xdist
- numpy=1.15
- pandas=0.25
- - setuptools=41.2
+ - setuptools=40.4
diff --git a/ci/requirements/py36-min-all-deps.yml b/ci/requirements/py37-min-all-deps.yml
similarity index 73%
rename from ci/requirements/py36-min-all-deps.yml
rename to ci/requirements/py37-min-all-deps.yml
index 86540197dcc..feef86ddf5c 100644
--- a/ci/requirements/py36-min-all-deps.yml
+++ b/ci/requirements/py37-min-all-deps.yml
@@ -1,12 +1,13 @@
name: xarray-tests
channels:
- conda-forge
+ - nodefaults
dependencies:
# MINIMUM VERSIONS POLICY: see doc/installing.rst
# Run ci/min_deps_check.py to verify that this file respects the policy.
# When upgrading python, numpy, or pandas, must also change
# doc/installing.rst and setup.py.
- - python=3.6
+ - python=3.7
- black
- boto3=1.9
- bottleneck=1.2
@@ -15,8 +16,8 @@ dependencies:
- cfgrib=0.9
- cftime=1.0
- coveralls
- - dask=2.2
- - distributed=2.2
+ - dask=2.9
+ - distributed=2.9
- flake8
- h5netcdf=0.7
- h5py=2.9 # Policy allows for 2.10, but it's a conflict-fest
@@ -25,15 +26,14 @@ dependencies:
- iris=2.2
- isort
- lxml=4.4 # Optional dep of pydap
- - matplotlib=3.1
- - msgpack-python=0.6 # remove once distributed is bumped. distributed GH3491
- - mypy=0.761 # Must match .pre-commit-config.yaml
+ - matplotlib-base=3.1
+ - mypy=0.782 # Must match .pre-commit-config.yaml
- nc-time-axis=1.2
- netcdf4=1.4
- - numba=0.44
+ - numba=0.46
- numpy=1.15
- pandas=0.25
- # - pint # See py36-min-nep18.yml
+ # - pint # See py37-min-nep18.yml
- pip
- pseudonetcdf=3.0
- pydap=3.2
@@ -41,11 +41,12 @@ dependencies:
- pytest
- pytest-cov
- pytest-env
+ - pytest-xdist
- rasterio=1.0
- scipy=1.3
- seaborn=0.9
- - setuptools=41.2
- # - sparse # See py36-min-nep18.yml
+ - setuptools=40.4
+ # - sparse # See py37-min-nep18.yml
- toolz=0.10
- zarr=2.3
- pip:
diff --git a/ci/requirements/py36-min-nep18.yml b/ci/requirements/py37-min-nep18.yml
similarity index 62%
rename from ci/requirements/py36-min-nep18.yml
rename to ci/requirements/py37-min-nep18.yml
index a5eded49cd4..aea86261a0e 100644
--- a/ci/requirements/py36-min-nep18.yml
+++ b/ci/requirements/py37-min-nep18.yml
@@ -1,21 +1,22 @@
name: xarray-tests
channels:
- conda-forge
+ - nodefaults
dependencies:
# Optional dependencies that require NEP18, such as sparse and pint,
# require drastically newer packages than everything else
- - python=3.6
+ - python=3.7
- coveralls
- - dask=2.4
- - distributed=2.4
- - msgpack-python=0.6 # remove once distributed is bumped. distributed GH3491
+ - dask=2.9
+ - distributed=2.9
- numpy=1.17
- pandas=0.25
- - pint=0.11
+ - pint=0.15
- pip
- pytest
- pytest-cov
- pytest-env
- - scipy=1.2
- - setuptools=41.2
+ - pytest-xdist
+ - scipy=1.3
+ - setuptools=40.4
- sparse=0.8
diff --git a/ci/requirements/py38-all-but-dask.yml b/ci/requirements/py38-all-but-dask.yml
index a375d9e1e5a..14930f5272d 100644
--- a/ci/requirements/py38-all-but-dask.yml
+++ b/ci/requirements/py38-all-but-dask.yml
@@ -1,6 +1,7 @@
name: xarray-tests
channels:
- conda-forge
+ - nodefaults
dependencies:
- python=3.8
- black
@@ -13,13 +14,13 @@ dependencies:
- coveralls
- flake8
- h5netcdf
- - h5py
+ - h5py=2
- hdf5
- hypothesis
- isort
- lxml # Optional dep of pydap
- - matplotlib
- - mypy=0.761 # Must match .pre-commit-config.yaml
+ - matplotlib-base
+ - mypy=0.790 # Must match .pre-commit-config.yaml
- nc-time-axis
- netcdf4
- numba
@@ -29,10 +30,11 @@ dependencies:
- pip
- pseudonetcdf
- pydap
- - pynio
+ # - pynio: not compatible with netCDF4>1.5.3; only tested in py37-bare-minimum
- pytest
- pytest-cov
- pytest-env
+ - pytest-xdist
- rasterio
- scipy
- seaborn
diff --git a/ci/requirements/py38.yml b/ci/requirements/py38.yml
deleted file mode 100644
index 24602f884e9..00000000000
--- a/ci/requirements/py38.yml
+++ /dev/null
@@ -1,47 +0,0 @@
-name: xarray-tests
-channels:
- - conda-forge
-dependencies:
- - python=3.8
- - black
- - boto3
- - bottleneck
- - cartopy
- - cdms2
- - cfgrib
- - cftime
- - coveralls
- - dask
- - distributed
- - flake8
- - h5netcdf
- - h5py
- - hdf5
- - hypothesis
- - iris
- - isort
- - lxml # Optional dep of pydap
- - matplotlib
- - mypy=0.761 # Must match .pre-commit-config.yaml
- - nc-time-axis
- - netcdf4
- - numba
- - numpy
- - pandas
- - pint
- - pip
- - pseudonetcdf
- - pydap
- - pynio
- - pytest
- - pytest-cov
- - pytest-env
- - rasterio
- - scipy
- - seaborn
- - setuptools
- - sparse
- - toolz
- - zarr
- - pip:
- - numbagg
diff --git a/conftest.py b/conftest.py
index 712af1d3759..862a1a1d0bc 100644
--- a/conftest.py
+++ b/conftest.py
@@ -19,16 +19,23 @@ def pytest_runtest_setup(item):
pytest.skip("set --run-flaky option to run flaky tests")
if "network" in item.keywords and not item.config.getoption("--run-network-tests"):
pytest.skip(
- "set --run-network-tests to run test requiring an " "internet connection"
+ "set --run-network-tests to run test requiring an internet connection"
)
@pytest.fixture(autouse=True)
-def add_standard_imports(doctest_namespace):
+def add_standard_imports(doctest_namespace, tmpdir):
import numpy as np
import pandas as pd
+
import xarray as xr
doctest_namespace["np"] = np
doctest_namespace["pd"] = pd
doctest_namespace["xr"] = xr
+
+ # always seed numpy.random to make the examples deterministic
+ np.random.seed(0)
+
+ # always switch to the temporary directory, so files get written there
+ tmpdir.chdir()
diff --git a/doc/_templates/autosummary/accessor.rst b/doc/_templates/autosummary/accessor.rst
new file mode 100644
index 00000000000..4ba745cd6fd
--- /dev/null
+++ b/doc/_templates/autosummary/accessor.rst
@@ -0,0 +1,6 @@
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module.split('.')[0] }}
+
+.. autoaccessor:: {{ (module.split('.')[1:] + [objname]) | join('.') }}
diff --git a/doc/_templates/autosummary/accessor_attribute.rst b/doc/_templates/autosummary/accessor_attribute.rst
new file mode 100644
index 00000000000..b5ad65d6a73
--- /dev/null
+++ b/doc/_templates/autosummary/accessor_attribute.rst
@@ -0,0 +1,6 @@
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module.split('.')[0] }}
+
+.. autoaccessorattribute:: {{ (module.split('.')[1:] + [objname]) | join('.') }}
diff --git a/doc/_templates/autosummary/accessor_callable.rst b/doc/_templates/autosummary/accessor_callable.rst
new file mode 100644
index 00000000000..7a3301814f5
--- /dev/null
+++ b/doc/_templates/autosummary/accessor_callable.rst
@@ -0,0 +1,6 @@
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module.split('.')[0] }}
+
+.. autoaccessorcallable:: {{ (module.split('.')[1:] + [objname]) | join('.') }}.__call__
diff --git a/doc/_templates/autosummary/accessor_method.rst b/doc/_templates/autosummary/accessor_method.rst
new file mode 100644
index 00000000000..aefbba6ef1b
--- /dev/null
+++ b/doc/_templates/autosummary/accessor_method.rst
@@ -0,0 +1,6 @@
+{{ fullname }}
+{{ underline }}
+
+.. currentmodule:: {{ module.split('.')[0] }}
+
+.. autoaccessormethod:: {{ (module.split('.')[1:] + [objname]) | join('.') }}
diff --git a/doc/_templates/autosummary/base.rst b/doc/_templates/autosummary/base.rst
new file mode 100644
index 00000000000..53f2a29c193
--- /dev/null
+++ b/doc/_templates/autosummary/base.rst
@@ -0,0 +1,3 @@
+:github_url: {{ fullname | github_url | escape_underscores }}
+
+{% extends "!autosummary/base.rst" %}
diff --git a/doc/api-hidden.rst b/doc/api-hidden.rst
index cc9517a98ba..e5492ec73a4 100644
--- a/doc/api-hidden.rst
+++ b/doc/api-hidden.rst
@@ -9,8 +9,6 @@
.. autosummary::
:toctree: generated/
- auto_combine
-
Dataset.nbytes
Dataset.chunks
@@ -18,6 +16,8 @@
Dataset.any
Dataset.argmax
Dataset.argmin
+ Dataset.idxmax
+ Dataset.idxmin
Dataset.max
Dataset.min
Dataset.mean
@@ -41,8 +41,6 @@
core.rolling.DatasetCoarsen.all
core.rolling.DatasetCoarsen.any
- core.rolling.DatasetCoarsen.argmax
- core.rolling.DatasetCoarsen.argmin
core.rolling.DatasetCoarsen.count
core.rolling.DatasetCoarsen.max
core.rolling.DatasetCoarsen.mean
@@ -54,6 +52,7 @@
core.rolling.DatasetCoarsen.var
core.rolling.DatasetCoarsen.boundary
core.rolling.DatasetCoarsen.coord_func
+ core.rolling.DatasetCoarsen.keep_attrs
core.rolling.DatasetCoarsen.obj
core.rolling.DatasetCoarsen.side
core.rolling.DatasetCoarsen.trim_excess
@@ -68,8 +67,6 @@
core.groupby.DatasetGroupBy.where
core.groupby.DatasetGroupBy.all
core.groupby.DatasetGroupBy.any
- core.groupby.DatasetGroupBy.argmax
- core.groupby.DatasetGroupBy.argmin
core.groupby.DatasetGroupBy.count
core.groupby.DatasetGroupBy.max
core.groupby.DatasetGroupBy.mean
@@ -85,8 +82,6 @@
core.resample.DatasetResample.all
core.resample.DatasetResample.any
core.resample.DatasetResample.apply
- core.resample.DatasetResample.argmax
- core.resample.DatasetResample.argmin
core.resample.DatasetResample.assign
core.resample.DatasetResample.assign_coords
core.resample.DatasetResample.bfill
@@ -123,11 +118,15 @@
core.rolling.DatasetRolling.var
core.rolling.DatasetRolling.center
core.rolling.DatasetRolling.dim
+ core.rolling.DatasetRolling.keep_attrs
core.rolling.DatasetRolling.min_periods
core.rolling.DatasetRolling.obj
core.rolling.DatasetRolling.rollings
core.rolling.DatasetRolling.window
+ core.weighted.DatasetWeighted.obj
+ core.weighted.DatasetWeighted.weights
+
core.rolling_exp.RollingExp.mean
Dataset.argsort
@@ -160,6 +159,8 @@
DataArray.any
DataArray.argmax
DataArray.argmin
+ DataArray.idxmax
+ DataArray.idxmin
DataArray.max
DataArray.min
DataArray.mean
@@ -183,8 +184,6 @@
core.rolling.DataArrayCoarsen.all
core.rolling.DataArrayCoarsen.any
- core.rolling.DataArrayCoarsen.argmax
- core.rolling.DataArrayCoarsen.argmin
core.rolling.DataArrayCoarsen.count
core.rolling.DataArrayCoarsen.max
core.rolling.DataArrayCoarsen.mean
@@ -196,6 +195,7 @@
core.rolling.DataArrayCoarsen.var
core.rolling.DataArrayCoarsen.boundary
core.rolling.DataArrayCoarsen.coord_func
+ core.rolling.DataArrayCoarsen.keep_attrs
core.rolling.DataArrayCoarsen.obj
core.rolling.DataArrayCoarsen.side
core.rolling.DataArrayCoarsen.trim_excess
@@ -209,8 +209,6 @@
core.groupby.DataArrayGroupBy.where
core.groupby.DataArrayGroupBy.all
core.groupby.DataArrayGroupBy.any
- core.groupby.DataArrayGroupBy.argmax
- core.groupby.DataArrayGroupBy.argmin
core.groupby.DataArrayGroupBy.count
core.groupby.DataArrayGroupBy.max
core.groupby.DataArrayGroupBy.mean
@@ -226,8 +224,6 @@
core.resample.DataArrayResample.all
core.resample.DataArrayResample.any
core.resample.DataArrayResample.apply
- core.resample.DataArrayResample.argmax
- core.resample.DataArrayResample.argmin
core.resample.DataArrayResample.assign_coords
core.resample.DataArrayResample.bfill
core.resample.DataArrayResample.count
@@ -263,11 +259,15 @@
core.rolling.DataArrayRolling.var
core.rolling.DataArrayRolling.center
core.rolling.DataArrayRolling.dim
+ core.rolling.DataArrayRolling.keep_attrs
core.rolling.DataArrayRolling.min_periods
core.rolling.DataArrayRolling.obj
core.rolling.DataArrayRolling.window
core.rolling.DataArrayRolling.window_labels
+ core.weighted.DataArrayWeighted.obj
+ core.weighted.DataArrayWeighted.weights
+
DataArray.argsort
DataArray.clip
DataArray.conj
@@ -291,6 +291,14 @@
core.accessor_dt.DatetimeAccessor.days_in_month
core.accessor_dt.DatetimeAccessor.daysinmonth
core.accessor_dt.DatetimeAccessor.hour
+ core.accessor_dt.DatetimeAccessor.is_leap_year
+ core.accessor_dt.DatetimeAccessor.is_month_end
+ core.accessor_dt.DatetimeAccessor.is_month_start
+ core.accessor_dt.DatetimeAccessor.is_quarter_end
+ core.accessor_dt.DatetimeAccessor.is_quarter_start
+ core.accessor_dt.DatetimeAccessor.is_year_end
+ core.accessor_dt.DatetimeAccessor.is_year_start
+ core.accessor_dt.DatetimeAccessor.isocalendar
core.accessor_dt.DatetimeAccessor.microsecond
core.accessor_dt.DatetimeAccessor.minute
core.accessor_dt.DatetimeAccessor.month
@@ -305,6 +313,14 @@
core.accessor_dt.DatetimeAccessor.weekofyear
core.accessor_dt.DatetimeAccessor.year
+ core.accessor_dt.TimedeltaAccessor.ceil
+ core.accessor_dt.TimedeltaAccessor.floor
+ core.accessor_dt.TimedeltaAccessor.round
+ core.accessor_dt.TimedeltaAccessor.days
+ core.accessor_dt.TimedeltaAccessor.microseconds
+ core.accessor_dt.TimedeltaAccessor.nanoseconds
+ core.accessor_dt.TimedeltaAccessor.seconds
+
core.accessor_str.StringAccessor.capitalize
core.accessor_str.StringAccessor.center
core.accessor_str.StringAccessor.contains
@@ -379,6 +395,7 @@
Variable.min
Variable.no_conflicts
Variable.notnull
+ Variable.pad
Variable.prod
Variable.quantile
Variable.rank
@@ -452,6 +469,7 @@
IndexVariable.min
IndexVariable.no_conflicts
IndexVariable.notnull
+ IndexVariable.pad
IndexVariable.prod
IndexVariable.quantile
IndexVariable.rank
@@ -554,6 +572,16 @@
ufuncs.tanh
ufuncs.trunc
+ plot.plot
+ plot.line
+ plot.step
+ plot.hist
+ plot.contour
+ plot.contourf
+ plot.imshow
+ plot.pcolormesh
+ plot.scatter
+
plot.FacetGrid.map_dataarray
plot.FacetGrid.set_titles
plot.FacetGrid.set_ticks
@@ -562,14 +590,17 @@
CFTimeIndex.all
CFTimeIndex.any
CFTimeIndex.append
+ CFTimeIndex.argsort
CFTimeIndex.argmax
CFTimeIndex.argmin
- CFTimeIndex.argsort
CFTimeIndex.asof
CFTimeIndex.asof_locs
CFTimeIndex.astype
+ CFTimeIndex.calendar
+ CFTimeIndex.ceil
CFTimeIndex.contains
CFTimeIndex.copy
+ CFTimeIndex.days_in_month
CFTimeIndex.delete
CFTimeIndex.difference
CFTimeIndex.drop
@@ -580,6 +611,7 @@
CFTimeIndex.equals
CFTimeIndex.factorize
CFTimeIndex.fillna
+ CFTimeIndex.floor
CFTimeIndex.format
CFTimeIndex.get_indexer
CFTimeIndex.get_indexer_for
@@ -620,6 +652,7 @@
CFTimeIndex.reindex
CFTimeIndex.rename
CFTimeIndex.repeat
+ CFTimeIndex.round
CFTimeIndex.searchsorted
CFTimeIndex.set_names
CFTimeIndex.set_value
@@ -656,6 +689,7 @@
CFTimeIndex.dayofyear
CFTimeIndex.dtype
CFTimeIndex.empty
+ CFTimeIndex.freq
CFTimeIndex.has_duplicates
CFTimeIndex.hasnans
CFTimeIndex.hour
@@ -683,13 +717,10 @@
backends.NetCDF4DataStore.encode
backends.NetCDF4DataStore.encode_attribute
backends.NetCDF4DataStore.encode_variable
- backends.NetCDF4DataStore.get
backends.NetCDF4DataStore.get_attrs
backends.NetCDF4DataStore.get_dimensions
backends.NetCDF4DataStore.get_encoding
backends.NetCDF4DataStore.get_variables
- backends.NetCDF4DataStore.items
- backends.NetCDF4DataStore.keys
backends.NetCDF4DataStore.load
backends.NetCDF4DataStore.open
backends.NetCDF4DataStore.open_store_variable
@@ -703,28 +734,26 @@
backends.NetCDF4DataStore.store
backends.NetCDF4DataStore.store_dataset
backends.NetCDF4DataStore.sync
- backends.NetCDF4DataStore.values
- backends.NetCDF4DataStore.attrs
backends.NetCDF4DataStore.autoclose
- backends.NetCDF4DataStore.dimensions
backends.NetCDF4DataStore.ds
backends.NetCDF4DataStore.format
backends.NetCDF4DataStore.is_remote
backends.NetCDF4DataStore.lock
- backends.NetCDF4DataStore.variables
+ backends.H5NetCDFStore.autoclose
backends.H5NetCDFStore.close
backends.H5NetCDFStore.encode
backends.H5NetCDFStore.encode_attribute
backends.H5NetCDFStore.encode_variable
- backends.H5NetCDFStore.get
+ backends.H5NetCDFStore.format
backends.H5NetCDFStore.get_attrs
backends.H5NetCDFStore.get_dimensions
backends.H5NetCDFStore.get_encoding
backends.H5NetCDFStore.get_variables
- backends.H5NetCDFStore.items
- backends.H5NetCDFStore.keys
+ backends.H5NetCDFStore.is_remote
backends.H5NetCDFStore.load
+ backends.H5NetCDFStore.lock
+ backends.H5NetCDFStore.open
backends.H5NetCDFStore.open_store_variable
backends.H5NetCDFStore.prepare_variable
backends.H5NetCDFStore.set_attribute
@@ -736,39 +765,25 @@
backends.H5NetCDFStore.store
backends.H5NetCDFStore.store_dataset
backends.H5NetCDFStore.sync
- backends.H5NetCDFStore.values
- backends.H5NetCDFStore.attrs
- backends.H5NetCDFStore.dimensions
backends.H5NetCDFStore.ds
- backends.H5NetCDFStore.variables
backends.PydapDataStore.close
- backends.PydapDataStore.get
backends.PydapDataStore.get_attrs
backends.PydapDataStore.get_dimensions
backends.PydapDataStore.get_encoding
backends.PydapDataStore.get_variables
- backends.PydapDataStore.items
- backends.PydapDataStore.keys
backends.PydapDataStore.load
backends.PydapDataStore.open
backends.PydapDataStore.open_store_variable
- backends.PydapDataStore.values
- backends.PydapDataStore.attrs
- backends.PydapDataStore.dimensions
- backends.PydapDataStore.variables
backends.ScipyDataStore.close
backends.ScipyDataStore.encode
backends.ScipyDataStore.encode_attribute
backends.ScipyDataStore.encode_variable
- backends.ScipyDataStore.get
backends.ScipyDataStore.get_attrs
backends.ScipyDataStore.get_dimensions
backends.ScipyDataStore.get_encoding
backends.ScipyDataStore.get_variables
- backends.ScipyDataStore.items
- backends.ScipyDataStore.keys
backends.ScipyDataStore.load
backends.ScipyDataStore.open_store_variable
backends.ScipyDataStore.prepare_variable
@@ -781,11 +796,7 @@
backends.ScipyDataStore.store
backends.ScipyDataStore.store_dataset
backends.ScipyDataStore.sync
- backends.ScipyDataStore.values
- backends.ScipyDataStore.attrs
- backends.ScipyDataStore.dimensions
backends.ScipyDataStore.ds
- backends.ScipyDataStore.variables
backends.FileManager.acquire
backends.FileManager.acquire_context
diff --git a/doc/api.rst b/doc/api.rst
index b37c84e7a81..ceab7dcc976 100644
--- a/doc/api.rst
+++ b/doc/api.rst
@@ -21,14 +21,16 @@ Top-level functions
broadcast
concat
merge
- auto_combine
combine_by_coords
combine_nested
where
set_options
+ infer_freq
full_like
zeros_like
ones_like
+ cov
+ corr
dot
polyval
map_blocks
@@ -173,6 +175,7 @@ Computation
Dataset.quantile
Dataset.differentiate
Dataset.integrate
+ Dataset.map_blocks
Dataset.polyfit
**Aggregation**:
@@ -229,6 +232,15 @@ Reshaping and reorganizing
Dataset.sortby
Dataset.broadcast_like
+Plotting
+--------
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_method.rst
+
+ Dataset.plot.scatter
+
DataArray
=========
@@ -348,7 +360,6 @@ Computation
DataArray.rolling_exp
DataArray.weighted
DataArray.coarsen
- DataArray.dt
DataArray.resample
DataArray.get_axis_num
DataArray.diff
@@ -357,7 +368,8 @@ Computation
DataArray.differentiate
DataArray.integrate
DataArray.polyfit
- DataArray.str
+ DataArray.map_blocks
+
**Aggregation**:
:py:attr:`~DataArray.all`
@@ -397,6 +409,121 @@ Computation
:py:attr:`~core.groupby.DataArrayGroupBy.where`
:py:attr:`~core.groupby.DataArrayGroupBy.quantile`
+
+String manipulation
+-------------------
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_method.rst
+
+ DataArray.str.capitalize
+ DataArray.str.center
+ DataArray.str.contains
+ DataArray.str.count
+ DataArray.str.decode
+ DataArray.str.encode
+ DataArray.str.endswith
+ DataArray.str.find
+ DataArray.str.get
+ DataArray.str.index
+ DataArray.str.isalnum
+ DataArray.str.isalpha
+ DataArray.str.isdecimal
+ DataArray.str.isdigit
+ DataArray.str.isnumeric
+ DataArray.str.isspace
+ DataArray.str.istitle
+ DataArray.str.isupper
+ DataArray.str.len
+ DataArray.str.ljust
+ DataArray.str.lower
+ DataArray.str.lstrip
+ DataArray.str.match
+ DataArray.str.pad
+ DataArray.str.repeat
+ DataArray.str.replace
+ DataArray.str.rfind
+ DataArray.str.rindex
+ DataArray.str.rjust
+ DataArray.str.rstrip
+ DataArray.str.slice
+ DataArray.str.slice_replace
+ DataArray.str.startswith
+ DataArray.str.strip
+ DataArray.str.swapcase
+ DataArray.str.title
+ DataArray.str.translate
+ DataArray.str.upper
+ DataArray.str.wrap
+ DataArray.str.zfill
+
+Datetimelike properties
+-----------------------
+
+**Datetime properties**:
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_attribute.rst
+
+ DataArray.dt.year
+ DataArray.dt.month
+ DataArray.dt.day
+ DataArray.dt.hour
+ DataArray.dt.minute
+ DataArray.dt.second
+ DataArray.dt.microsecond
+ DataArray.dt.nanosecond
+ DataArray.dt.dayofweek
+ DataArray.dt.weekday
+ DataArray.dt.weekday_name
+ DataArray.dt.dayofyear
+ DataArray.dt.quarter
+ DataArray.dt.days_in_month
+ DataArray.dt.daysinmonth
+ DataArray.dt.season
+ DataArray.dt.time
+ DataArray.dt.is_month_start
+ DataArray.dt.is_month_end
+ DataArray.dt.is_quarter_end
+ DataArray.dt.is_year_start
+ DataArray.dt.is_leap_year
+
+**Datetime methods**:
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_method.rst
+
+ DataArray.dt.floor
+ DataArray.dt.ceil
+ DataArray.dt.isocalendar
+ DataArray.dt.round
+ DataArray.dt.strftime
+
+**Timedelta properties**:
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_attribute.rst
+
+ DataArray.dt.days
+ DataArray.dt.seconds
+ DataArray.dt.microseconds
+ DataArray.dt.nanoseconds
+
+**Timedelta methods**:
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_method.rst
+
+ DataArray.dt.floor
+ DataArray.dt.ceil
+ DataArray.dt.round
+
+
Reshaping and reorganizing
--------------------------
@@ -413,6 +540,27 @@ Reshaping and reorganizing
DataArray.sortby
DataArray.broadcast_like
+Plotting
+--------
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_callable.rst
+
+ DataArray.plot
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_method.rst
+
+ DataArray.plot.contourf
+ DataArray.plot.contour
+ DataArray.plot.hist
+ DataArray.plot.imshow
+ DataArray.plot.line
+ DataArray.plot.pcolormesh
+ DataArray.plot.step
+
.. _api.ufuncs:
Universal functions
@@ -423,7 +571,9 @@ Universal functions
With recent versions of numpy, dask and xarray, NumPy ufuncs are now
supported directly on all xarray and dask objects. This obviates the need
for the ``xarray.ufuncs`` module, which should not be used for new code
- unless compatibility with versions of NumPy prior to v1.13 is required.
+ unless compatibility with versions of NumPy prior to v1.13 is
+ required. They will be removed once support for NumPy prior to
+ v1.17 is dropped.
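+
+For example, applying a NumPy ufunc directly to a ``DataArray`` (a minimal
+illustration, not tied to any particular dataset):
+
+.. code-block:: python
+
+    import numpy as np
+    import xarray as xr
+
+    da = xr.DataArray([0.0, 1.0, 2.0], dims="x")
+    np.sin(da)  # returns a DataArray; no need for xarray.ufuncs.sin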
These functions are copied from NumPy, but extended to work on NumPy arrays,
dask arrays and all xarray objects. You can find them in the ``xarray.ufuncs``
@@ -518,7 +668,6 @@ Dataset methods
Dataset.load
Dataset.chunk
Dataset.unify_chunks
- Dataset.map_blocks
Dataset.filter_by_attrs
Dataset.info
@@ -550,7 +699,6 @@ DataArray methods
DataArray.load
DataArray.chunk
DataArray.unify_chunks
- DataArray.map_blocks
Coordinates objects
===================
@@ -660,25 +808,6 @@ Creating custom indexes
cftime_range
-Plotting
-========
-
-.. autosummary::
- :toctree: generated/
-
- Dataset.plot
- plot.scatter
- DataArray.plot
- plot.plot
- plot.contourf
- plot.contour
- plot.hist
- plot.imshow
- plot.line
- plot.pcolormesh
- plot.step
- plot.FacetGrid
-
Faceting
--------
.. autosummary::
@@ -766,3 +895,10 @@ Deprecated / Pending Deprecation
Dataset.apply
core.groupby.DataArrayGroupBy.apply
core.groupby.DatasetGroupBy.apply
+
+.. autosummary::
+ :toctree: generated/
+ :template: autosummary/accessor_attribute.rst
+
+ DataArray.dt.weekofyear
+ DataArray.dt.week
diff --git a/doc/combining.rst b/doc/combining.rst
index 05b7f2efc50..edd34826e6d 100644
--- a/doc/combining.rst
+++ b/doc/combining.rst
@@ -4,11 +4,12 @@ Combining data
--------------
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
* For combining datasets or data arrays along a single dimension, see concatenate_.
@@ -28,20 +29,22 @@ that dimension:
.. ipython:: python
- arr = xr.DataArray(np.random.randn(2, 3),
- [('x', ['a', 'b']), ('y', [10, 20, 30])])
- arr[:, :1]
- # this resembles how you would use np.concatenate
- xr.concat([arr[:, :1], arr[:, 1:]], dim='y')
+ da = xr.DataArray(
+ np.arange(6).reshape(2, 3), [("x", ["a", "b"]), ("y", [10, 20, 30])]
+ )
+ da.isel(y=slice(0, 1)) # same as da[:, :1]
+ # This resembles how you would use np.concatenate:
+ xr.concat([da[:, :1], da[:, 1:]], dim="y")
+ # For more friendly pandas-like indexing you can use:
+ xr.concat([da.isel(y=slice(0, 1)), da.isel(y=slice(1, None))], dim="y")
In addition to combining along an existing dimension, ``concat`` can create a
new dimension by stacking lower dimensional arrays together:
.. ipython:: python
- arr[0]
- # to combine these 1d arrays into a 2d array in numpy, you would use np.array
- xr.concat([arr[0], arr[1]], 'x')
+ da.sel(x="a")
+ xr.concat([da.isel(x=0), da.isel(x=1)], "x")
If the second argument to ``concat`` is a new dimension name, the arrays will
be concatenated along that new dimension, which is always inserted as the first
@@ -49,7 +52,7 @@ dimension:
.. ipython:: python
- xr.concat([arr[0], arr[1]], 'new_dim')
+ xr.concat([da.isel(x=0), da.isel(x=1)], "new_dim")
The second argument to ``concat`` can also be an :py:class:`~pandas.Index` or
:py:class:`~xarray.DataArray` object as well as a string, in which case it is
@@ -57,14 +60,14 @@ used to label the values along the new dimension:
.. ipython:: python
- xr.concat([arr[0], arr[1]], pd.Index([-90, -100], name='new_dim'))
+ xr.concat([da.isel(x=0), da.isel(x=1)], pd.Index([-90, -100], name="new_dim"))
Of course, ``concat`` also works on ``Dataset`` objects:
.. ipython:: python
- ds = arr.to_dataset(name='foo')
- xr.concat([ds.sel(x='a'), ds.sel(x='b')], 'x')
+ ds = da.to_dataset(name="foo")
+ xr.concat([ds.sel(x="a"), ds.sel(x="b")], "x")
:py:func:`~xarray.concat` has a number of options which provide deeper control
over which variables are concatenated and how it handles conflicting variables
@@ -84,8 +87,8 @@ To combine variables and coordinates between multiple ``DataArray`` and/or
.. ipython:: python
- xr.merge([ds, ds.rename({'foo': 'bar'})])
- xr.merge([xr.DataArray(n, name='var%d' % n) for n in range(5)])
+ xr.merge([ds, ds.rename({"foo": "bar"})])
+ xr.merge([xr.DataArray(n, name="var%d" % n) for n in range(5)])
If you merge another dataset (or a dictionary including data array objects), by
default the resulting dataset will be aligned on the **union** of all index
@@ -93,7 +96,7 @@ coordinates:
.. ipython:: python
- other = xr.Dataset({'bar': ('x', [1, 2, 3, 4]), 'x': list('abcd')})
+ other = xr.Dataset({"bar": ("x", [1, 2, 3, 4]), "x": list("abcd")})
xr.merge([ds, other])
This ensures that ``merge`` is non-destructive. ``xarray.MergeError`` is raised
@@ -116,7 +119,7 @@ used in the :py:class:`~xarray.Dataset` constructor:
.. ipython:: python
- xr.Dataset({'a': arr[:-1], 'b': arr[1:]})
+ xr.Dataset({"a": da.isel(x=slice(0, 1)), "b": da.isel(x=slice(1, 2))})
.. _combine:
@@ -131,8 +134,8 @@ are filled with ``NaN``. For example:
.. ipython:: python
- ar0 = xr.DataArray([[0, 0], [0, 0]], [('x', ['a', 'b']), ('y', [-1, 0])])
- ar1 = xr.DataArray([[1, 1], [1, 1]], [('x', ['b', 'c']), ('y', [0, 1])])
+ ar0 = xr.DataArray([[0, 0], [0, 0]], [("x", ["a", "b"]), ("y", [-1, 0])])
+ ar1 = xr.DataArray([[1, 1], [1, 1]], [("x", ["b", "c"]), ("y", [0, 1])])
ar0.combine_first(ar1)
ar1.combine_first(ar0)
@@ -152,7 +155,7 @@ variables with new values:
.. ipython:: python
- ds.update({'space': ('space', [10.2, 9.4, 3.9])})
+ ds.update({"space": ("space", [10.2, 9.4, 3.9])})
However, dimensions are still required to be consistent between different
Dataset variables, so you cannot change the size of a dimension unless you
@@ -170,7 +173,7 @@ syntax:
.. ipython:: python
- ds['baz'] = xr.DataArray([9, 9, 9, 9, 9], coords=[('x', list('abcde'))])
+ ds["baz"] = xr.DataArray([9, 9, 9, 9, 9], coords=[("x", list("abcde"))])
ds.baz
Equals and identical
@@ -186,14 +189,14 @@ values:
.. ipython:: python
- arr.equals(arr.copy())
+ da.equals(da.copy())
:py:attr:`~xarray.Dataset.identical` also checks attributes, and the name of each
object:
.. ipython:: python
- arr.identical(arr.rename('bar'))
+ da.identical(da.rename("bar"))
:py:attr:`~xarray.Dataset.broadcast_equals` does a more relaxed form of equality
check that allows variables to have different dimensions, as long as values
@@ -201,8 +204,8 @@ are constant along those new dimensions:
.. ipython:: python
- left = xr.Dataset(coords={'x': 0})
- right = xr.Dataset({'x': [0, 0, 0]})
+ left = xr.Dataset(coords={"x": 0})
+ right = xr.Dataset({"x": [0, 0, 0]})
left.broadcast_equals(right)
Like pandas objects, two xarray objects are still equal or identical if they have
@@ -213,7 +216,7 @@ numpy):
.. ipython:: python
- arr == arr.copy()
+ da == da.copy()
Note that ``NaN`` does not compare equal to ``NaN`` in element-wise comparison;
you may need to deal with missing values explicitly.
@@ -231,9 +234,9 @@ coordinates as long as any non-missing values agree or are disjoint:
.. ipython:: python
- ds1 = xr.Dataset({'a': ('x', [10, 20, 30, np.nan])}, {'x': [1, 2, 3, 4]})
- ds2 = xr.Dataset({'a': ('x', [np.nan, 30, 40, 50])}, {'x': [2, 3, 4, 5]})
- xr.merge([ds1, ds2], compat='no_conflicts')
+ ds1 = xr.Dataset({"a": ("x", [10, 20, 30, np.nan])}, {"x": [1, 2, 3, 4]})
+ ds2 = xr.Dataset({"a": ("x", [np.nan, 30, 40, 50])}, {"x": [2, 3, 4, 5]})
+ xr.merge([ds1, ds2], compat="no_conflicts")
Note that due to the underlying representation of missing values as floating
point numbers (``NaN``), variable data type is not always preserved when merging
@@ -244,16 +247,6 @@ in this manner.
Combining along multiple dimensions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
-.. note::
-
- There are currently three combining functions with similar names:
- :py:func:`~xarray.auto_combine`, :py:func:`~xarray.combine_by_coords`, and
- :py:func:`~xarray.combine_nested`. This is because
- ``auto_combine`` is in the process of being deprecated in favour of the other
- two functions, which are more general. If your code currently relies on
- ``auto_combine``, then you will be able to get similar functionality by using
- ``combine_nested``.
-
For combining many objects along multiple dimensions xarray provides
:py:func:`~xarray.combine_nested` and :py:func:`~xarray.combine_by_coords`. These
functions use a combination of ``concat`` and ``merge`` across different
@@ -273,10 +266,12 @@ datasets into a doubly-nested list, e.g:
.. ipython:: python
- arr = xr.DataArray(name='temperature', data=np.random.randint(5, size=(2, 2)), dims=['x', 'y'])
+ arr = xr.DataArray(
+ name="temperature", data=np.random.randint(5, size=(2, 2)), dims=["x", "y"]
+ )
arr
ds_grid = [[arr, arr], [arr, arr]]
- xr.combine_nested(ds_grid, concat_dim=['x', 'y'])
+ xr.combine_nested(ds_grid, concat_dim=["x", "y"])
:py:func:`~xarray.combine_nested` can also be used to explicitly merge datasets
with different variables. For example if we have 4 datasets, which are divided
@@ -286,10 +281,10 @@ we wish to use ``merge`` instead of ``concat``:
.. ipython:: python
- temp = xr.DataArray(name='temperature', data=np.random.randn(2), dims=['t'])
- precip = xr.DataArray(name='precipitation', data=np.random.randn(2), dims=['t'])
+ temp = xr.DataArray(name="temperature", data=np.random.randn(2), dims=["t"])
+ precip = xr.DataArray(name="precipitation", data=np.random.randn(2), dims=["t"])
ds_grid = [[temp, precip], [temp, precip]]
- xr.combine_nested(ds_grid, concat_dim=['t', None])
+ xr.combine_nested(ds_grid, concat_dim=["t", None])
:py:func:`~xarray.combine_by_coords` is for combining objects which have dimension
coordinates which specify their relationship to and order relative to one
@@ -302,8 +297,8 @@ coordinates, not on their position in the list passed to ``combine_by_coords``.
.. ipython:: python
:okwarning:
- x1 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [0, 1, 2])])
- x2 = xr.DataArray(name='foo', data=np.random.randn(3), coords=[('x', [3, 4, 5])])
+ x1 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [0, 1, 2])])
+ x2 = xr.DataArray(name="foo", data=np.random.randn(3), coords=[("x", [3, 4, 5])])
xr.combine_by_coords([x2, x1])
These functions can be used by :py:func:`~xarray.open_mfdataset` to open many
diff --git a/doc/computation.rst b/doc/computation.rst
index 4b8014c4782..dcfe270a942 100644
--- a/doc/computation.rst
+++ b/doc/computation.rst
@@ -18,17 +18,19 @@ Arithmetic operations with a single DataArray automatically vectorize (like
numpy) over all array values:
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
.. ipython:: python
- arr = xr.DataArray(np.random.RandomState(0).randn(2, 3),
- [('x', ['a', 'b']), ('y', [10, 20, 30])])
+ arr = xr.DataArray(
+ np.random.RandomState(0).randn(2, 3), [("x", ["a", "b"]), ("y", [10, 20, 30])]
+ )
arr - 3
abs(arr)
@@ -45,7 +47,7 @@ Use :py:func:`~xarray.where` to conditionally switch between values:
.. ipython:: python
- xr.where(arr > 0, 'positive', 'negative')
+ xr.where(arr > 0, "positive", "negative")
Use `@` to perform matrix multiplication:
@@ -73,14 +75,14 @@ methods for working with missing data from pandas:
.. ipython:: python
- x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'])
+ x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=["x"])
x.isnull()
x.notnull()
x.count()
- x.dropna(dim='x')
+ x.dropna(dim="x")
x.fillna(-1)
- x.ffill('x')
- x.bfill('x')
+ x.ffill("x")
+ x.bfill("x")
Like pandas, xarray uses the float value ``np.nan`` (not-a-number) to represent
missing values.
@@ -90,9 +92,12 @@ for filling missing values via 1D interpolation.
.. ipython:: python
- x = xr.DataArray([0, 1, np.nan, np.nan, 2], dims=['x'],
- coords={'xx': xr.Variable('x', [0, 1, 1.1, 1.9, 3])})
- x.interpolate_na(dim='x', method='linear', use_coordinate='xx')
+ x = xr.DataArray(
+ [0, 1, np.nan, np.nan, 2],
+ dims=["x"],
+ coords={"xx": xr.Variable("x", [0, 1, 1.1, 1.9, 3])},
+ )
+ x.interpolate_na(dim="x", method="linear", use_coordinate="xx")
Note that xarray slightly diverges from the pandas ``interpolate`` syntax by
providing the ``use_coordinate`` keyword which facilitates a clear specification
@@ -110,8 +115,8 @@ applied along particular dimension(s):
.. ipython:: python
- arr.sum(dim='x')
- arr.std(['x', 'y'])
+ arr.sum(dim="x")
+ arr.std(["x", "y"])
arr.min()
@@ -121,7 +126,7 @@ for wrapping code designed to work with numpy arrays), you can use the
.. ipython:: python
- arr.get_axis_num('y')
+ arr.get_axis_num("y")
These operations automatically skip missing values, like in pandas:
@@ -142,8 +147,7 @@ method supports rolling window aggregation:
.. ipython:: python
- arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5),
- dims=('x', 'y'))
+ arr = xr.DataArray(np.arange(0, 7.5, 0.5).reshape(3, 5), dims=("x", "y"))
arr
:py:meth:`~xarray.DataArray.rolling` is applied along one dimension using the
@@ -184,9 +188,16 @@ a value when aggregating:
r = arr.rolling(y=3, center=True, min_periods=2)
r.mean()
+From version 0.17, xarray supports multidimensional rolling,
+
+.. ipython:: python
+
+ r = arr.rolling(x=2, y=3, min_periods=2)
+ r.mean()
+
.. tip::
- Note that rolling window aggregations are faster and use less memory when bottleneck_ is installed. This only applies to numpy-backed xarray objects.
+ Note that rolling window aggregations are faster and use less memory when bottleneck_ is installed. This only applies to numpy-backed xarray objects when using 1D rolling.
.. _bottleneck: https://github.com/pydata/bottleneck/
@@ -194,8 +205,9 @@ We can also manually iterate through ``Rolling`` objects:
.. code:: python
- for label, arr_window in r:
- # arr_window is a view of x
+ for label, arr_window in r:
+ # arr_window is a view of x
+ ...
.. _comput.rolling_exp:
@@ -222,9 +234,9 @@ windowed rolling, convolution, short-time FFT etc.
.. ipython:: python
# rolling with 2-point stride
- rolling_da = r.construct('window_dim', stride=2)
+ rolling_da = r.construct(x="x_win", y="y_win", stride=2)
rolling_da
- rolling_da.mean('window_dim', skipna=False)
+ rolling_da.mean(["x_win", "y_win"], skipna=False)
Because the ``DataArray`` given by ``r.construct(x="x_win", y="y_win")`` is a view
of the original array, it is memory efficient.
@@ -232,8 +244,8 @@ You can also use ``construct`` to compute a weighted rolling sum:
.. ipython:: python
- weight = xr.DataArray([0.25, 0.5, 0.25], dims=['window'])
- arr.rolling(y=3).construct('window').dot(weight)
+ weight = xr.DataArray([0.25, 0.5, 0.25], dims=["window"])
+ arr.rolling(y=3).construct(y="window").dot(weight)
.. note::
numpy's Nan-aggregation functions such as ``nansum`` copy the original array.
@@ -254,52 +266,52 @@ support weighted ``sum`` and weighted ``mean``.
.. ipython:: python
- coords = dict(month=('month', [1, 2, 3]))
+ coords = dict(month=("month", [1, 2, 3]))
- prec = xr.DataArray([1.1, 1.0, 0.9], dims=('month', ), coords=coords)
- weights = xr.DataArray([31, 28, 31], dims=('month', ), coords=coords)
+ prec = xr.DataArray([1.1, 1.0, 0.9], dims=("month",), coords=coords)
+ weights = xr.DataArray([31, 28, 31], dims=("month",), coords=coords)
Create a weighted object:
.. ipython:: python
- weighted_prec = prec.weighted(weights)
- weighted_prec
+ weighted_prec = prec.weighted(weights)
+ weighted_prec
Calculate the weighted sum:
.. ipython:: python
- weighted_prec.sum()
+ weighted_prec.sum()
Calculate the weighted mean:
.. ipython:: python
- weighted_prec.mean(dim="month")
+ weighted_prec.mean(dim="month")
The weighted sum corresponds to:
.. ipython:: python
- weighted_sum = (prec * weights).sum()
- weighted_sum
+ weighted_sum = (prec * weights).sum()
+ weighted_sum
and the weighted mean to:
.. ipython:: python
- weighted_mean = weighted_sum / weights.sum()
- weighted_mean
+ weighted_mean = weighted_sum / weights.sum()
+ weighted_mean
However, the functions also take missing values in the data into account:
.. ipython:: python
- data = xr.DataArray([np.NaN, 2, 4])
- weights = xr.DataArray([8, 1, 1])
+ data = xr.DataArray([np.NaN, 2, 4])
+ weights = xr.DataArray([8, 1, 1])
- data.weighted(weights).mean()
+ data.weighted(weights).mean()
Using ``(data * weights).sum() / weights.sum()`` would (incorrectly) result
in 0.6.
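
To see where the 0.6 comes from, compare both computations directly (an
illustrative aside; ``data`` and ``weights`` as defined just above):

.. ipython:: python

    (data * weights).sum() / weights.sum()  # the NaN is skipped, but its weight still counts
    data.weighted(weights).mean()  # the weight belonging to the NaN is ignored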
@@ -309,16 +321,16 @@ If the weights add up to 0, ``sum`` returns 0:
.. ipython:: python
- data = xr.DataArray([1.0, 1.0])
- weights = xr.DataArray([-1.0, 1.0])
+ data = xr.DataArray([1.0, 1.0])
+ weights = xr.DataArray([-1.0, 1.0])
- data.weighted(weights).sum()
+ data.weighted(weights).sum()
and ``mean`` returns ``NaN``:
.. ipython:: python
- data.weighted(weights).mean()
+ data.weighted(weights).mean()
.. note::
@@ -336,18 +348,21 @@ methods. This supports the block aggregation along multiple dimensions,
.. ipython:: python
- x = np.linspace(0, 10, 300)
- t = pd.date_range('15/12/1999', periods=364)
- da = xr.DataArray(np.sin(x) * np.cos(np.linspace(0, 1, 364)[:, np.newaxis]),
- dims=['time', 'x'], coords={'time': t, 'x': x})
- da
+ x = np.linspace(0, 10, 300)
+ t = pd.date_range("15/12/1999", periods=364)
+ da = xr.DataArray(
+ np.sin(x) * np.cos(np.linspace(0, 1, 364)[:, np.newaxis]),
+ dims=["time", "x"],
+ coords={"time": t, "x": x},
+ )
+ da
In order to take a block mean for every 7 days along the ``time`` dimension and
every 2 points along the ``x`` dimension,
.. ipython:: python
- da.coarsen(time=7, x=2).mean()
+ da.coarsen(time=7, x=2).mean()
:py:meth:`~xarray.DataArray.coarsen` raises a ``ValueError`` if the data
length is not a multiple of the corresponding window size.
@@ -356,14 +371,14 @@ the excess entries or padding ``nan`` to insufficient entries,
.. ipython:: python
- da.coarsen(time=30, x=2, boundary='trim').mean()
+ da.coarsen(time=30, x=2, boundary="trim").mean()
If you want to apply a specific function to a coordinate, you can pass the
function or method name to the ``coord_func`` option,
.. ipython:: python
- da.coarsen(time=7, x=2, coord_func={'time': 'min'}).mean()
+ da.coarsen(time=7, x=2, coord_func={"time": "min"}).mean()
.. _compute.using_coordinates:
@@ -377,24 +392,25 @@ central finite differences using their coordinates,
.. ipython:: python
- a = xr.DataArray([0, 1, 2, 3], dims=['x'], coords=[[0.1, 0.11, 0.2, 0.3]])
+ a = xr.DataArray([0, 1, 2, 3], dims=["x"], coords=[[0.1, 0.11, 0.2, 0.3]])
a
- a.differentiate('x')
+ a.differentiate("x")
This method can also be used for multidimensional arrays,
.. ipython:: python
- a = xr.DataArray(np.arange(8).reshape(4, 2), dims=['x', 'y'],
- coords={'x': [0.1, 0.11, 0.2, 0.3]})
- a.differentiate('x')
+ a = xr.DataArray(
+ np.arange(8).reshape(4, 2), dims=["x", "y"], coords={"x": [0.1, 0.11, 0.2, 0.3]}
+ )
+ a.differentiate("x")
:py:meth:`~xarray.DataArray.integrate` computes integration based on the
trapezoidal rule, using the coordinates,
.. ipython:: python
- a.integrate('x')
+ a.integrate("x")
.. note::
These methods are limited to simple cartesian geometry. Differentiation
@@ -412,9 +428,9 @@ best fitting coefficients along a given dimension and for a given order,
.. ipython:: python
- x = xr.DataArray(np.arange(10), dims=['x'], name='x')
- a = xr.DataArray(3 + 4 * x, dims=['x'], coords={'x': x})
- out = a.polyfit(dim='x', deg=1, full=True)
+ x = xr.DataArray(np.arange(10), dims=["x"], name="x")
+ a = xr.DataArray(3 + 4 * x, dims=["x"], coords={"x": x})
+ out = a.polyfit(dim="x", deg=1, full=True)
out
The method outputs a dataset containing the coefficients (and more if `full=True`).
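
The coefficients can then be evaluated with :py:func:`xarray.polyval` (a brief
sketch, reusing ``x`` and ``out`` from above):

.. ipython:: python

    xr.polyval(coord=x, coeffs=out.polyfit_coefficients)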
@@ -443,9 +459,9 @@ arrays with different sizes aligned along different dimensions:
.. ipython:: python
- a = xr.DataArray([1, 2], [('x', ['a', 'b'])])
+ a = xr.DataArray([1, 2], [("x", ["a", "b"])])
a
- b = xr.DataArray([-1, -2, -3], [('y', [10, 20, 30])])
+ b = xr.DataArray([-1, -2, -3], [("y", [10, 20, 30])])
b
With xarray, we can apply binary mathematical operations to these arrays, and
@@ -460,7 +476,7 @@ appeared:
.. ipython:: python
- c = xr.DataArray(np.arange(6).reshape(3, 2), [b['y'], a['x']])
+ c = xr.DataArray(np.arange(6).reshape(3, 2), [b["y"], a["x"]])
c
a + c
@@ -494,7 +510,7 @@ operations. The default result of a binary operation is by the *intersection*
.. ipython:: python
- arr = xr.DataArray(np.arange(3), [('x', range(3))])
+ arr = xr.DataArray(np.arange(3), [("x", range(3))])
arr + arr[:-1]
If coordinate values for a dimension are missing on either argument, all
@@ -503,7 +519,7 @@ matching dimensions must have the same size:
.. ipython::
:verbatim:
- In [1]: arr + xr.DataArray([1, 2], dims='x')
+ In [1]: arr + xr.DataArray([1, 2], dims="x")
ValueError: arguments without labels along dimension 'x' cannot be aligned because they have different dimension size(s) {2} than the size of the aligned dimension labels: 3
@@ -562,16 +578,20 @@ variables:
.. ipython:: python
- ds = xr.Dataset({'x_and_y': (('x', 'y'), np.random.randn(3, 5)),
- 'x_only': ('x', np.random.randn(3))},
- coords=arr.coords)
+ ds = xr.Dataset(
+ {
+ "x_and_y": (("x", "y"), np.random.randn(3, 5)),
+ "x_only": ("x", np.random.randn(3)),
+ },
+ coords=arr.coords,
+ )
ds > 0
Datasets support most of the same methods found on data arrays:
.. ipython:: python
- ds.mean(dim='x')
+ ds.mean(dim="x")
abs(ds)
Datasets also support NumPy ufuncs (requires NumPy v1.13 or newer), or
@@ -594,7 +614,7 @@ Arithmetic between two datasets matches data variables of the same name:
.. ipython:: python
- ds2 = xr.Dataset({'x_and_y': 0, 'x_only': 100})
+ ds2 = xr.Dataset({"x_and_y": 0, "x_only": 100})
ds - ds2
Similarly to index based alignment, the result has the intersection of all
@@ -638,7 +658,7 @@ any additional arguments:
.. ipython:: python
squared_error = lambda x, y: (x - y) ** 2
- arr1 = xr.DataArray([0, 1, 2, 3], dims='x')
+ arr1 = xr.DataArray([0, 1, 2, 3], dims="x")
xr.apply_ufunc(squared_error, arr1, 1)
For using more complex operations that consider some array values collectively,
@@ -658,21 +678,21 @@ to set ``axis=-1``. As an example, here is how we would wrap
.. code-block:: python
def vector_norm(x, dim, ord=None):
- return xr.apply_ufunc(np.linalg.norm, x,
- input_core_dims=[[dim]],
- kwargs={'ord': ord, 'axis': -1})
+ return xr.apply_ufunc(
+ np.linalg.norm, x, input_core_dims=[[dim]], kwargs={"ord": ord, "axis": -1}
+ )
.. ipython:: python
- :suppress:
+ :suppress:
def vector_norm(x, dim, ord=None):
- return xr.apply_ufunc(np.linalg.norm, x,
- input_core_dims=[[dim]],
- kwargs={'ord': ord, 'axis': -1})
+ return xr.apply_ufunc(
+ np.linalg.norm, x, input_core_dims=[[dim]], kwargs={"ord": ord, "axis": -1}
+ )
.. ipython:: python
- vector_norm(arr1, dim='x')
+ vector_norm(arr1, dim="x")
Because ``apply_ufunc`` follows a standard convention for ufuncs, it plays
nicely with tools for building vectorized functions, like
diff --git a/doc/conf.py b/doc/conf.py
index 578f9cf550d..d83e966f3fa 100644
--- a/doc/conf.py
+++ b/doc/conf.py
@@ -1,4 +1,3 @@
-# -*- coding: utf-8 -*-
#
# xarray documentation build configuration file, created by
# sphinx-quickstart on Thu Feb 6 18:57:54 2014.
@@ -20,12 +19,10 @@
import sys
from contextlib import suppress
-# make sure the source version is preferred (#3567)
-root = pathlib.Path(__file__).absolute().parent.parent
-os.environ["PYTHONPATH"] = str(root)
-sys.path.insert(0, str(root))
+import sphinx_autosummary_accessors
+from jinja2.defaults import DEFAULT_FILTERS
-import xarray # isort:skip
+import xarray
allowed_failures = set()
@@ -39,7 +36,7 @@
print("pip environment:")
subprocess.run(["pip", "list"])
-print("xarray: %s, %s" % (xarray.__version__, xarray.__file__))
+print(f"xarray: {xarray.__version__}, {xarray.__file__}")
with suppress(ImportError):
import matplotlib
@@ -47,14 +44,14 @@
matplotlib.use("Agg")
try:
- import rasterio
+ import rasterio # noqa: F401
except ImportError:
allowed_failures.update(
["gallery/plot_rasterio_rgb.py", "gallery/plot_rasterio.py"]
)
try:
- import cartopy
+ import cartopy # noqa: F401
except ImportError:
allowed_failures.update(
[
@@ -79,10 +76,11 @@
"sphinx.ext.extlinks",
"sphinx.ext.mathjax",
"sphinx.ext.napoleon",
- "numpydoc",
"IPython.sphinxext.ipython_directive",
"IPython.sphinxext.ipython_console_highlighting",
"nbsphinx",
+ "sphinx_autosummary_accessors",
+ "scanpydoc.rtd_github_links",
]
extlinks = {
@@ -102,16 +100,78 @@
"""
autosummary_generate = True
+
+# for scanpydoc's jinja filter
+project_dir = pathlib.Path(__file__).parent.parent
+html_context = {
+ "github_user": "pydata",
+ "github_repo": "xarray",
+ "github_version": "master",
+}
+
autodoc_typehints = "none"
-napoleon_use_param = True
-napoleon_use_rtype = True
+napoleon_google_docstring = False
+napoleon_numpy_docstring = True
+
+napoleon_use_param = False
+napoleon_use_rtype = False
+napoleon_preprocess_types = True
+napoleon_type_aliases = {
+ # general terms
+ "sequence": ":term:`sequence`",
+ "iterable": ":term:`iterable`",
+ "callable": ":py:func:`callable`",
+ "dict_like": ":term:`dict-like `",
+ "dict-like": ":term:`dict-like `",
+ "mapping": ":term:`mapping`",
+ "file-like": ":term:`file-like `",
+ # special terms
+ # "same type as caller": "*same type as caller*", # does not work, yet
+ # "same type as values": "*same type as values*", # does not work, yet
+ # stdlib type aliases
+ "MutableMapping": "~collections.abc.MutableMapping",
+ "sys.stdout": ":obj:`sys.stdout`",
+ "timedelta": "~datetime.timedelta",
+ "string": ":class:`string `",
+ # numpy terms
+ "array_like": ":term:`array_like`",
+ "array-like": ":term:`array-like `",
+ "scalar": ":term:`scalar`",
+ "array": ":term:`array`",
+ "hashable": ":term:`hashable `",
+ # matplotlib terms
+ "color-like": ":py:func:`color-like `",
+ "matplotlib colormap name": ":doc:matplotlib colormap name ",
+ "matplotlib axes object": ":py:class:`matplotlib axes object `",
+ "colormap": ":py:class:`colormap `",
+ # objects without namespace
+ "DataArray": "~xarray.DataArray",
+ "Dataset": "~xarray.Dataset",
+ "Variable": "~xarray.Variable",
+ "ndarray": "~numpy.ndarray",
+ "MaskedArray": "~numpy.ma.MaskedArray",
+ "dtype": "~numpy.dtype",
+ "ComplexWarning": "~numpy.ComplexWarning",
+ "Index": "~pandas.Index",
+ "MultiIndex": "~pandas.MultiIndex",
+ "CategoricalIndex": "~pandas.CategoricalIndex",
+ "TimedeltaIndex": "~pandas.TimedeltaIndex",
+ "DatetimeIndex": "~pandas.DatetimeIndex",
+ "Series": "~pandas.Series",
+ "DataFrame": "~pandas.DataFrame",
+ "Categorical": "~pandas.Categorical",
+ "Path": "~~pathlib.Path",
+ # objects with abbreviated namespace (from pandas)
+ "pd.Index": "~pandas.Index",
+ "pd.NaT": "~pandas.NaT",
+}
numpydoc_class_members_toctree = True
numpydoc_show_class_members = False
# Add any paths that contain templates here, relative to this directory.
-templates_path = ["_templates"]
+templates_path = ["_templates", sphinx_autosummary_accessors.templates_path]
# The suffix of source filenames.
source_suffix = ".rst"
@@ -270,21 +330,21 @@
# -- Options for LaTeX output ---------------------------------------------
-latex_elements = {
- # The paper size ('letterpaper' or 'a4paper').
- # 'papersize': 'letterpaper',
- # The font size ('10pt', '11pt' or '12pt').
- # 'pointsize': '10pt',
- # Additional stuff for the LaTeX preamble.
- # 'preamble': '',
-}
+# latex_elements = {
+# # The paper size ('letterpaper' or 'a4paper').
+# # 'papersize': 'letterpaper',
+# # The font size ('10pt', '11pt' or '12pt').
+# # 'pointsize': '10pt',
+# # Additional stuff for the LaTeX preamble.
+# # 'preamble': '',
+# }
# Grouping the document tree into LaTeX files. List of tuples
# (source start file, target name, title,
# author, documentclass [howto, manual, or own class]).
-latex_documents = [
- ("index", "xarray.tex", "xarray Documentation", "xarray Developers", "manual")
-]
+# latex_documents = [
+# ("index", "xarray.tex", "xarray Documentation", "xarray Developers", "manual")
+# ]
# The name of an image file (relative to this directory) to place at the top of
# the title page.
@@ -311,7 +371,7 @@
# One entry per manual page. List of tuples
# (source start file, name, description, authors, manual section).
-man_pages = [("index", "xarray", "xarray Documentation", ["xarray Developers"], 1)]
+# man_pages = [("index", "xarray", "xarray Documentation", ["xarray Developers"], 1)]
# If true, show URL addresses after external links.
# man_show_urls = False
@@ -322,17 +382,17 @@
# Grouping the document tree into Texinfo files. List of tuples
# (source start file, target name, title, author,
# dir menu entry, description, category)
-texinfo_documents = [
- (
- "index",
- "xarray",
- "xarray Documentation",
- "xarray Developers",
- "xarray",
- "N-D labeled arrays and datasets in Python.",
- "Miscellaneous",
- )
-]
+# texinfo_documents = [
+# (
+# "index",
+# "xarray",
+# "xarray Documentation",
+# "xarray Developers",
+# "xarray",
+# "N-D labeled arrays and datasets in Python.",
+# "Miscellaneous",
+# )
+# ]
# Documents to append as an appendix to all manuals.
# texinfo_appendices = []
@@ -352,10 +412,20 @@
"python": ("https://docs.python.org/3/", None),
"pandas": ("https://pandas.pydata.org/pandas-docs/stable", None),
"iris": ("https://scitools.org.uk/iris/docs/latest", None),
- "numpy": ("https://docs.scipy.org/doc/numpy", None),
+ "numpy": ("https://numpy.org/doc/stable", None),
"scipy": ("https://docs.scipy.org/doc/scipy/reference", None),
"numba": ("https://numba.pydata.org/numba-doc/latest", None),
"matplotlib": ("https://matplotlib.org", None),
"dask": ("https://docs.dask.org/en/latest", None),
"cftime": ("https://unidata.github.io/cftime", None),
+ "rasterio": ("https://rasterio.readthedocs.io/en/latest", None),
+ "sparse": ("https://sparse.pydata.org/en/latest/", None),
}
+
+
+def escape_underscores(string):
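+    # escape underscores so rst does not interpret them as markup in generated pages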
+ return string.replace("_", r"\_")
+
+
+def setup(app):
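+    # register the filter with jinja so autosummary templates (e.g. base.rst) can use it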
+ DEFAULT_FILTERS["escape_underscores"] = escape_underscores
diff --git a/doc/contributing.rst b/doc/contributing.rst
index f581bcd9741..9c4ce5a0af2 100644
--- a/doc/contributing.rst
+++ b/doc/contributing.rst
@@ -40,8 +40,8 @@ report will allow others to reproduce the bug and provide insight into fixing.
`this stackoverflow article `_ for tips on
writing a good bug report.
-Trying the bug-producing code out on the *master* branch is often a worthwhile exercise
-to confirm the bug still exists. It is also worth searching existing bug reports and
+Trying out the bug-producing code on the *master* branch is often a worthwhile exercise
+to confirm that the bug still exists. It is also worth searching existing bug reports and
pull requests to see if the issue has already been reported and/or fixed.
Bug reports must:
@@ -51,8 +51,9 @@ Bug reports must:
`_::
```python
- >>> import xarray as xr
- >>> df = xr.Dataset(...)
+ import xarray as xr
+ df = xr.Dataset(...)
+
...
```
@@ -148,11 +149,16 @@ We'll now kick off a two-step process:
1. Install the build dependencies
2. Build and install xarray
-.. code-block:: none
+.. code-block:: sh
# Create and activate the build environment
- # This is for Linux and MacOS. On Windows, use py37-windows.yml instead.
- conda env create -f ci/requirements/py37.yml
+ conda create -c conda-forge -n xarray-tests python=3.8
+
+ # This is for Linux and MacOS
+ conda env update -f ci/requirements/environment.yml
+
+ # On Windows, use environment-windows.yml instead
+ conda env update -f ci/requirements/environment-windows.yml
conda activate xarray-tests
@@ -162,7 +168,10 @@ We'll now kick off a two-step process:
# Build and install xarray
pip install -e .
-At this point you should be able to import *xarray* from your locally built version::
+At this point you should be able to import *xarray* from your locally
+built version:
+
+.. code-block:: sh
$ python # start an interpreter
>>> import xarray
@@ -186,7 +195,7 @@ Creating a branch
-----------------
You want your master branch to reflect only production-ready code, so create a
-feature branch for making your changes. For example::
+feature branch before making your changes. For example::
git branch shiny-new-feature
git checkout shiny-new-feature
@@ -203,12 +212,12 @@ and switch in between them using the ``git checkout`` command.
To update this branch, you need to retrieve the changes from the master branch::
git fetch upstream
- git rebase upstream/master
+ git merge upstream/master
-This will replay your commits on top of the latest *xarray* git master. If this
+This will combine your commits with the latest *xarray* git master. If this
leads to merge conflicts, you must resolve these before submitting your pull
request. If you have uncommitted changes, you will need to ``git stash`` them
-prior to updating. This will effectively store your changes and they can be
+prior to updating. This will effectively store your changes, which can be
reapplied after updating.
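
A typical sequence looks like this (illustrative)::

    git stash              # set your uncommitted changes aside
    git fetch upstream
    git merge upstream/master
    git stash pop          # re-apply the stashed changes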
.. _contributing.documentation:
@@ -249,30 +258,32 @@ Some other important things to know about the docs:
- The docstrings follow the **Numpy Docstring Standard**, which is used widely
in the Scientific Python community. This standard specifies the format of
the different sections of the docstring. See `this document
- `_
+ `_
for a detailed explanation, or look at some of the existing functions to
extend it in a similar manner.
- The tutorials make heavy use of the `ipython directive
`_ sphinx extension.
This directive lets you put code in the documentation which will be run
- during the doc build. For example::
+ during the doc build. For example:
+
+ .. code:: rst
.. ipython:: python
x = 2
- x**3
+ x ** 3
will be rendered as::
In [1]: x = 2
- In [2]: x**3
+ In [2]: x ** 3
Out[2]: 8
Almost all code examples in the docs are run (and the output saved) during the
doc build. This approach means that code examples will always be up to date,
- but it does make the doc building a bit more complex.
+ but it does make building the docs a bit more complex.
- Our API documentation in ``doc/api.rst`` houses the auto-generated
documentation from the docstrings. For classes, there are a few subtleties
@@ -290,7 +301,7 @@ Requirements
Make sure to follow the instructions on :ref:`creating a development environment above `, but
to build the docs you need to use the environment file ``ci/requirements/doc.yml``.
-.. code-block:: none
+.. code-block:: sh
# Create and activate the docs environment
conda env create -f ci/requirements/doc.yml
@@ -313,7 +324,7 @@ Then you can find the HTML output in the folder ``xarray/doc/_build/html/``.
The first time you build the docs, it will take quite a while because it has to run
all the code examples and build all the generated docstring pages. In subsequent
-evocations, sphinx will try to only build the pages that have been modified.
+invocations, Sphinx will try to build only the pages that have been modified.
If you want to do a full clean build, do::
@@ -347,34 +358,19 @@ Code Formatting
xarray uses several tools to ensure a consistent code format throughout the project:
-- `Black `_ for standardized code formatting
+- `Black `_ for standardized
+ code formatting
+- `blackdoc `_ for
+ standardized code formatting in documentation
- `Flake8 `_ for general code quality
- `isort `_ for standardized order in imports.
See also `flake8-isort `_.
- `mypy `_ for static type checking on `type hints
`_
-``pip``::
-
- pip install black flake8 isort mypy
-
-and then run from the root of the Xarray repository::
-
- isort -rc .
- black -t py36 .
- flake8
- mypy .
-
-to auto-format your code. Additionally, many editors have plugins that will
-apply ``black`` as you edit files.
-
-Optionally, you may wish to setup `pre-commit hooks `_
+We highly recommend that you set up `pre-commit hooks `_
to automatically run all the above tools every time you make a git commit. This
-can be done by installing ``pre-commit``::
-
- pip install pre-commit
-
-and then running::
+can be done by running::
pre-commit install
@@ -396,12 +392,8 @@ Testing With Continuous Integration
-----------------------------------
The *xarray* test suite runs automatically on the
-`Azure Pipelines `__,
-continuous integration service, once your pull request is submitted. However,
-if you wish to run the test suite on a branch prior to submitting the pull
-request, then Azure Pipelines
-`needs to be configured `_
-for your GitHub repository.
+`GitHub Actions `__
+continuous integration service, once your pull request is submitted.
A pull request will be considered for merging when you have an all-green build. If any
tests are failing, then you will get a red 'X', where you can click through to see the
@@ -431,7 +423,7 @@ taken from the original GitHub issue. However, it is always worth considering a
use cases and writing corresponding tests.
Adding tests is one of the most common requests after code is pushed to *xarray*. Therefore,
-it is worth getting in the habit of writing tests ahead of time so this is never an issue.
+it is worth getting in the habit of writing tests ahead of time so that this is never an issue.
Like many packages, *xarray* uses `pytest
`_ and the convenient
@@ -467,7 +459,7 @@ typically find tests wrapped in a class.
.. code-block:: python
class TestReallyCoolFeature:
- ....
+ ...
Going forward, we are moving to a more *functional* style using the
`pytest `__ framework, which offers a richer
@@ -477,7 +469,7 @@ writing test classes, we will write test functions like this:
.. code-block:: python
def test_really_cool_feature():
- ....
+ ...
Using ``pytest``
~~~~~~~~~~~~~~~~
@@ -508,17 +500,23 @@ We would name this file ``test_cool_feature.py`` and put in an appropriate place
from xarray.testing import assert_equal
- @pytest.mark.parametrize('dtype', ['int8', 'int16', 'int32', 'int64'])
+ @pytest.mark.parametrize("dtype", ["int8", "int16", "int32", "int64"])
def test_dtypes(dtype):
assert str(np.dtype(dtype)) == dtype
- @pytest.mark.parametrize('dtype', ['float32',
- pytest.param('int16', marks=pytest.mark.skip),
- pytest.param('int32', marks=pytest.mark.xfail(
- reason='to show how it works'))])
+ @pytest.mark.parametrize(
+ "dtype",
+ [
+ "float32",
+ pytest.param("int16", marks=pytest.mark.skip),
+ pytest.param(
+ "int32", marks=pytest.mark.xfail(reason="to show how it works")
+ ),
+ ],
+ )
def test_mark(dtype):
- assert str(np.dtype(dtype)) == 'float32'
+ assert str(np.dtype(dtype)) == "float32"
@pytest.fixture
@@ -526,7 +524,7 @@ We would name this file ``test_cool_feature.py`` and put in an appropriate place
return xr.DataArray([1, 2, 3])
- @pytest.fixture(params=['int8', 'int16', 'int32', 'int64'])
+ @pytest.fixture(params=["int8", "int16", "int32", "int64"])
def dtype(request):
return request.param
@@ -610,7 +608,7 @@ need to install `pytest-xdist` via::
pip install pytest-xdist
-Then, run pytest with the optional -n argument:
+Then, run pytest with the optional -n argument::
pytest xarray -n 4
@@ -797,7 +795,7 @@ release. To submit a pull request:
This request then goes to the repository maintainers, and they will review
the code. If you need to make more changes, you can make them in
your branch, add them to a new commit, push them to GitHub, and the pull request
-will be automatically updated. Pushing them to GitHub again is done by::
+will automatically be updated. Pushing them to GitHub again is done by::
git push origin shiny-new-feature
@@ -809,8 +807,7 @@ Delete your merged branch (optional)
------------------------------------
Once your feature branch is accepted into upstream, you'll probably want to get rid of
-the branch. First, merge upstream master into your branch so git knows it is safe to
-delete your branch::
+the branch. First, update your ``master`` branch to check that the merge was successful::
git fetch upstream
git checkout master
@@ -818,12 +815,14 @@ delete your branch::
Then you can do::
- git branch -d shiny-new-feature
+ git branch -D shiny-new-feature
-Make sure you use a lower-case ``-d``, or else git won't warn you if your feature
-branch has not actually been merged.
+You need to use an upper-case ``-D`` because the branch was squashed into a
+single commit before merging. Be careful with this because ``git`` won't warn
+you if you accidentally delete an unmerged branch.
-The branch will still exist on GitHub, so to delete it there do::
+If you didn't delete your branch using GitHub's interface, then it will still exist on
+GitHub. To delete it there do::
git push origin --delete shiny-new-feature
@@ -840,8 +839,7 @@ PR checklist
- **Properly format your code** and verify that it passes the formatting guidelines set by `Black `_ and `Flake8 `_. See `"Code formatting" `_. You can use `pre-commit `_ to run these automatically on each commit.
- - Run ``black .`` in the root directory. This may modify some files. Confirm and commit any formatting changes.
- - Run ``flake8`` in the root directory. If this fails, it will log an error message.
+ - Run ``pre-commit run --all-files`` in the root directory. This may modify some files. Confirm and commit any formatting changes.
- **Push your code and** `create a PR on GitHub `_.
-- **Use a helpful title for your pull request** by summarizing the main contributions rather than using the latest commit message. If this addresses an `issue `_, please `reference it `_.
+- **Use a helpful title for your pull request** by summarizing the main contributions rather than using the latest commit message. If the PR addresses an `issue `_, please `reference it `_.
diff --git a/doc/dask.rst b/doc/dask.rst
index 07b3939af6e..4844967350b 100644
--- a/doc/dask.rst
+++ b/doc/dask.rst
@@ -1,3 +1,5 @@
+.. currentmodule:: xarray
+
.. _dask:
Parallel computing with Dask
@@ -56,19 +58,26 @@ argument to :py:func:`~xarray.open_dataset` or using the
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
np.set_printoptions(precision=3, linewidth=100, threshold=100, edgeitems=3)
- ds = xr.Dataset({'temperature': (('time', 'latitude', 'longitude'),
- np.random.randn(30, 180, 180)),
- 'time': pd.date_range('2015-01-01', periods=30),
- 'longitude': np.arange(180),
- 'latitude': np.arange(89.5, -90.5, -1)})
- ds.to_netcdf('example-data.nc')
+ ds = xr.Dataset(
+ {
+ "temperature": (
+ ("time", "latitude", "longitude"),
+ np.random.randn(30, 180, 180),
+ ),
+ "time": pd.date_range("2015-01-01", periods=30),
+ "longitude": np.arange(180),
+ "latitude": np.arange(89.5, -90.5, -1),
+ }
+ )
+ ds.to_netcdf("example-data.nc")
.. ipython:: python
- ds = xr.open_dataset('example-data.nc', chunks={'time': 10})
+ ds = xr.open_dataset("example-data.nc", chunks={"time": 10})
ds
In this example ``latitude`` and ``longitude`` do not appear in the ``chunks``
@@ -83,7 +92,7 @@ use :py:func:`~xarray.open_mfdataset`::
xr.open_mfdataset('my/files/*.nc', parallel=True)
This function will automatically concatenate and merge datasets into one in
-the simple cases that it understands (see :py:func:`~xarray.auto_combine`
+the simple cases that it understands (see :py:func:`~xarray.combine_by_coords`
for the full disclaimer). By default, :py:meth:`~xarray.open_mfdataset` will chunk each
netCDF file into a single Dask array; again, supply the ``chunks`` argument to
control the size of the resulting Dask arrays. In more complex cases, you can
@@ -106,7 +115,7 @@ usual way.
.. ipython:: python
- ds.to_netcdf('manipulated-example-data.nc')
+ ds.to_netcdf("manipulated-example-data.nc")
By setting the ``compute`` argument to ``False``, :py:meth:`~xarray.Dataset.to_netcdf`
will return a ``dask.delayed`` object that can be computed later.
@@ -114,8 +123,9 @@ will return a ``dask.delayed`` object that can be computed later.
.. ipython:: python
from dask.diagnostics import ProgressBar
+
# or distributed.progress when using the distributed scheduler
- delayed_obj = ds.to_netcdf('manipulated-example-data.nc', compute=False)
+ delayed_obj = ds.to_netcdf("manipulated-example-data.nc", compute=False)
with ProgressBar():
results = delayed_obj.compute()
@@ -141,8 +151,9 @@ Dask DataFrames do not support multi-indexes so the coordinate variables from th
:suppress:
import os
- os.remove('example-data.nc')
- os.remove('manipulated-example-data.nc')
+
+ os.remove("example-data.nc")
+ os.remove("manipulated-example-data.nc")
Using Dask with xarray
----------------------
@@ -199,7 +210,7 @@ Dask arrays using the :py:meth:`~xarray.Dataset.persist` method:
.. ipython:: python
- ds = ds.persist()
+ ds = ds.persist()
:py:meth:`~xarray.Dataset.persist` is particularly useful when using a
distributed cluster because the data will be loaded into distributed memory
@@ -224,11 +235,11 @@ sizes of Dask arrays is done with the :py:meth:`~xarray.Dataset.chunk` method:
.. ipython:: python
:suppress:
- ds = ds.chunk({'time': 10})
+ ds = ds.chunk({"time": 10})
.. ipython:: python
- rechunked = ds.chunk({'latitude': 100, 'longitude': 100})
+ rechunked = ds.chunk({"latitude": 100, "longitude": 100})
You can view the size of existing chunks on an array by viewing the
:py:attr:`~xarray.Dataset.chunks` attribute:
@@ -256,6 +267,7 @@ lazy Dask arrays, in the :ref:`xarray.ufuncs ` module:
.. ipython:: python
import xarray.ufuncs as xu
+
xu.sin(rechunked)
To access Dask arrays directly, use the new
@@ -274,12 +286,21 @@ loaded into Dask or not:
.. _dask.automatic-parallelization:
-Automatic parallelization
--------------------------
+Automatic parallelization with ``apply_ufunc`` and ``map_blocks``
+-----------------------------------------------------------------
Almost all of xarray's built-in operations work on Dask arrays. If you want to
-use a function that isn't wrapped by xarray, one option is to extract Dask
-arrays from xarray objects (``.data``) and use Dask directly.
+use a function that isn't wrapped by xarray, and have it applied in parallel on
+each block of your xarray object, you have three options:
+
+1. Extract Dask arrays from xarray objects (``.data``) and use Dask directly.
+2. Use :py:func:`~xarray.apply_ufunc` to apply functions that consume and return NumPy arrays.
+3. Use :py:func:`~xarray.map_blocks`, :py:meth:`Dataset.map_blocks` or :py:meth:`DataArray.map_blocks`
+ to apply functions that consume and return xarray objects (see the sketch below).
+
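+As a quick illustration of option 3, here is a minimal sketch (the helper name
+``center`` is hypothetical):
+
+.. code:: python
+
+    def center(block):
+        # receives one chunk as a DataArray and must return a DataArray
+        return block - block.mean()
+
+
+    # lazily applies ``center`` to every Dask block of the data
+    centered = ds.temperature.map_blocks(center)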
+
+``apply_ufunc``
+~~~~~~~~~~~~~~~
:py:func:`~xarray.apply_ufunc` can
automate `embarrassingly parallel
@@ -302,24 +323,32 @@ we use to calculate `Spearman's rank-correlation coefficient ` and
@@ -453,15 +470,15 @@ dataset variables:
.. ipython:: python
- ds.rename({'temperature': 'temp', 'precipitation': 'precip'})
+ ds.rename({"temperature": "temp", "precipitation": "precip"})
The related :py:meth:`~xarray.Dataset.swap_dims` method allows you to swap
dimension and non-dimension variables:
.. ipython:: python
- ds.coords['day'] = ('time', [6, 7, 8])
- ds.swap_dims({'time': 'day'})
+ ds.coords["day"] = ("time", [6, 7, 8])
+ ds.swap_dims({"time": "day"})
.. _coordinates:
@@ -519,8 +536,8 @@ To convert back and forth between data and coordinates, you can use the
.. ipython:: python
ds.reset_coords()
- ds.set_coords(['temperature', 'precipitation'])
- ds['temperature'].reset_coords(drop=True)
+ ds.set_coords(["temperature", "precipitation"])
+ ds["temperature"].reset_coords(drop=True)
Notice that these operations skip coordinates with names given by dimensions,
as used for indexing. This is mostly because we are not entirely sure how to
@@ -544,7 +561,7 @@ logic used for merging coordinates in arithmetic operations
.. ipython:: python
- alt = xr.Dataset(coords={'z': [10], 'lat': 0, 'lon': 0})
+ alt = xr.Dataset(coords={"z": [10], "lat": 0, "lon": 0})
ds.coords.merge(alt.coords)
The ``coords.merge`` method may be useful if you want to implement your own
@@ -560,7 +577,7 @@ To convert a coordinate (or any ``DataArray``) into an actual
.. ipython:: python
- ds['time'].to_index()
+ ds["time"].to_index()
A useful shortcut is the ``indexes`` property (on both ``DataArray`` and
``Dataset``), which lazily constructs a dictionary whose keys are given by each
@@ -577,9 +594,10 @@ Xarray supports labeling coordinate values with a :py:class:`pandas.MultiIndex`:
.. ipython:: python
- midx = pd.MultiIndex.from_arrays([['R', 'R', 'V', 'V'], [.1, .2, .7, .9]],
- names=('band', 'wn'))
- mda = xr.DataArray(np.random.rand(4), coords={'spec': midx}, dims='spec')
+ midx = pd.MultiIndex.from_arrays(
+ [["R", "R", "V", "V"], [0.1, 0.2, 0.7, 0.9]], names=("band", "wn")
+ )
+ mda = xr.DataArray(np.random.rand(4), coords={"spec": midx}, dims="spec")
mda
For convenience multi-index levels are directly accessible as "virtual" or
@@ -587,8 +605,8 @@ For convenience multi-index levels are directly accessible as "virtual" or
.. ipython:: python
- mda['band']
- mda.wn
+ mda["band"]
+ mda.wn
Indexing with multi-index levels is also possible using the ``sel`` method
(see :ref:`multi-level indexing`).
diff --git a/doc/duckarrays.rst b/doc/duckarrays.rst
new file mode 100644
index 00000000000..ba13d5160ae
--- /dev/null
+++ b/doc/duckarrays.rst
@@ -0,0 +1,65 @@
+.. currentmodule:: xarray
+
+Working with numpy-like arrays
+==============================
+
+.. warning::
+
+ This feature should be considered experimental. Please report any bugs you
+ may find on xarray’s GitHub repository.
+
+Numpy-like arrays (:term:`duck array`) extend the :py:class:`numpy.ndarray` with
+additional features, like propagating physical units or a different layout in memory.
+
+:py:class:`DataArray` and :py:class:`Dataset` objects can wrap these duck arrays, as
+long as they satisfy certain conditions (see :ref:`internals.duck_arrays`).
+
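+For example, a minimal sketch wrapping a ``pint`` quantity (assuming ``pint``
+is installed; the choice of unit is arbitrary):
+
+.. code:: python
+
+    import numpy as np
+    import pint
+    import xarray as xr
+
+    ureg = pint.UnitRegistry()
+    # the DataArray wraps the pint quantity and keeps its units attached
+    da = xr.DataArray(np.arange(4) * ureg.m, dims="x")
+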
+.. note::
+
+ For ``dask`` support see :ref:`dask`.
+
+
+Missing features
+----------------
+Most of the API does support :term:`duck array` objects, but there are a few areas where
+the code will still cast to ``numpy`` arrays:
+
+- dimension coordinates, and thus all indexing operations:
+
+ * :py:meth:`Dataset.sel` and :py:meth:`DataArray.sel`
+ * :py:meth:`Dataset.loc` and :py:meth:`DataArray.loc`
+ * :py:meth:`Dataset.drop_sel` and :py:meth:`DataArray.drop_sel`
+ * :py:meth:`Dataset.reindex`, :py:meth:`Dataset.reindex_like`,
+ :py:meth:`DataArray.reindex` and :py:meth:`DataArray.reindex_like`: duck arrays in
+ data variables and non-dimension coordinates won't be cast
+
+- functions and methods that depend on external libraries or features of ``numpy`` not
+ covered by ``__array_function__`` / ``__array_ufunc__``:
+
+ * :py:meth:`Dataset.ffill` and :py:meth:`DataArray.ffill` (uses ``bottleneck``)
+ * :py:meth:`Dataset.bfill` and :py:meth:`DataArray.bfill` (uses ``bottleneck``)
+ * :py:meth:`Dataset.interp`, :py:meth:`Dataset.interp_like`,
+ :py:meth:`DataArray.interp` and :py:meth:`DataArray.interp_like` (uses ``scipy``):
+ duck arrays in data variables and non-dimension coordinates will be cast in
+ addition to not supporting duck arrays in dimension coordinates
+ * :py:meth:`Dataset.rolling_exp` and :py:meth:`DataArray.rolling_exp` (uses
+ ``numbagg``)
+ * :py:meth:`Dataset.rolling` and :py:meth:`DataArray.rolling` (uses internal functions
+ of ``numpy``)
+ * :py:meth:`Dataset.interpolate_na` and :py:meth:`DataArray.interpolate_na` (uses
+ :py:class:`numpy.vectorize`)
+ * :py:func:`apply_ufunc` with ``vectorize=True`` (uses :py:class:`numpy.vectorize`)
+
+- incompatibilities between different :term:`duck array` libraries:
+
+ * :py:meth:`Dataset.chunk` and :py:meth:`DataArray.chunk`: this fails if the data was
+ not already chunked and the :term:`duck array` (e.g. a ``pint`` quantity) would need to
+ wrap the new ``dask`` array; changing the chunk sizes of already-chunked data works.
+
+
+Extensions using duck arrays
+----------------------------
+Here's a list of libraries extending ``xarray`` to make working with wrapped duck arrays
+easier:
+
+- `pint-xarray `_
diff --git a/doc/examples.rst b/doc/examples.rst
index 1d48d29bcc5..102138b6e4e 100644
--- a/doc/examples.rst
+++ b/doc/examples.rst
@@ -2,7 +2,7 @@ Examples
========
.. toctree::
- :maxdepth: 2
+ :maxdepth: 1
examples/weather-data
examples/monthly-means
@@ -15,7 +15,7 @@ Examples
Using apply_ufunc
------------------
.. toctree::
- :maxdepth: 2
+ :maxdepth: 1
examples/apply_ufunc_vectorize_1d
diff --git a/doc/examples/apply_ufunc_vectorize_1d.ipynb b/doc/examples/apply_ufunc_vectorize_1d.ipynb
index 6d18d48fdb5..a79a4868b63 100644
--- a/doc/examples/apply_ufunc_vectorize_1d.ipynb
+++ b/doc/examples/apply_ufunc_vectorize_1d.ipynb
@@ -333,7 +333,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
- "Now our function currently only works on one vector of data which is not so useful given our 3D dataset.\n",
+ "Now our function currently only works on one vector of data which is not so useful given our 3D dataset.\n",
"Let's try passing the whole dataset. We add a `print` statement so we can see what our function receives."
]
},
diff --git a/doc/examples/area_weighted_temperature.ipynb b/doc/examples/area_weighted_temperature.ipynb
index 72876e3fc29..de705966583 100644
--- a/doc/examples/area_weighted_temperature.ipynb
+++ b/doc/examples/area_weighted_temperature.ipynb
@@ -106,7 +106,7 @@
"source": [
"### Creating weights\n",
"\n",
- "For a for a rectangular grid the cosine of the latitude is proportional to the grid cell area."
+ "For a rectangular grid the cosine of the latitude is proportional to the grid cell area."
]
},
{
diff --git a/doc/faq.rst b/doc/faq.rst
index 576cec5c2b1..a2b8be47e06 100644
--- a/doc/faq.rst
+++ b/doc/faq.rst
@@ -4,11 +4,12 @@ Frequently Asked Questions
==========================
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
@@ -103,21 +104,21 @@ code fragment
.. ipython:: python
arr = xr.DataArray([1, 2, 3])
- pd.Series({'x': arr[0], 'mean': arr.mean(), 'std': arr.std()})
+ pd.Series({"x": arr[0], "mean": arr.mean(), "std": arr.std()})
does not yield the pandas DataFrame we expected. We need to specify the type
conversion ourselves:
.. ipython:: python
- pd.Series({'x': arr[0], 'mean': arr.mean(), 'std': arr.std()}, dtype=float)
+ pd.Series({"x": arr[0], "mean": arr.mean(), "std": arr.std()}, dtype=float)
Alternatively, we could use the ``item`` method or the ``float`` constructor to
convert values one at a time
.. ipython:: python
- pd.Series({'x': arr[0].item(), 'mean': float(arr.mean())})
+ pd.Series({"x": arr[0].item(), "mean": float(arr.mean())})
.. _approach to metadata:
diff --git a/doc/gallery/README.txt b/doc/gallery/README.txt
index b17f803696b..63f7d477cf4 100644
--- a/doc/gallery/README.txt
+++ b/doc/gallery/README.txt
@@ -2,4 +2,3 @@
Gallery
=======
-
diff --git a/doc/groupby.rst b/doc/groupby.rst
index 223185bd0d5..d0c0b1849f9 100644
--- a/doc/groupby.rst
+++ b/doc/groupby.rst
@@ -26,11 +26,12 @@ Split
Let's create a simple example dataset:
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
.. ipython:: python
@@ -47,20 +48,20 @@ use a DataArray directly), we get back a ``GroupBy`` object:
.. ipython:: python
- ds.groupby('letters')
+ ds.groupby("letters")
This object works very similarly to a pandas GroupBy object. You can view
the group indices with the ``groups`` attribute:
.. ipython:: python
- ds.groupby('letters').groups
+ ds.groupby("letters").groups
You can also iterate over groups in ``(label, group)`` pairs:
.. ipython:: python
- list(ds.groupby('letters'))
+ list(ds.groupby("letters"))
Just like in pandas, creating a GroupBy object is cheap: it does not actually
split the data until you access particular values.
@@ -75,8 +76,8 @@ a customized coordinate, but xarray facilitates this via the
.. ipython:: python
- x_bins = [0,25,50]
- ds.groupby_bins('x', x_bins).groups
+ x_bins = [0, 25, 50]
+ ds.groupby_bins("x", x_bins).groups
The binning is implemented via :func:`pandas.cut`, whose documentation details how
the bins are assigned. As seen in the example above, by default, the bins are
@@ -86,8 +87,8 @@ choose `float` labels which identify the bin centers:
.. ipython:: python
- x_bin_labels = [12.5,37.5]
- ds.groupby_bins('x', x_bins, labels=x_bin_labels).groups
+ x_bin_labels = [12.5, 37.5]
+ ds.groupby_bins("x", x_bins, labels=x_bin_labels).groups
Apply
@@ -102,7 +103,8 @@ concatenated back together along the group axis:
def standardize(x):
return (x - x.mean()) / x.std()
- arr.groupby('letters').map(standardize)
+
+ arr.groupby("letters").map(standardize)
GroupBy objects also have a :py:meth:`~xarray.core.groupby.DatasetGroupBy.reduce` method and
methods like :py:meth:`~xarray.core.groupby.DatasetGroupBy.mean` as shortcuts for applying an
@@ -110,19 +112,19 @@ aggregation function:
.. ipython:: python
- arr.groupby('letters').mean(dim='x')
+ arr.groupby("letters").mean(dim="x")
Using a groupby is thus also a convenient shortcut for aggregating over all
dimensions *other than* the provided one:
.. ipython:: python
- ds.groupby('x').std(...)
+ ds.groupby("x").std(...)
.. note::
We use an ellipsis (`...`) here to indicate we want to reduce over all
- other dimensions
+ other dimensions
First and last
@@ -134,7 +136,7 @@ values for group along the grouped dimension:
.. ipython:: python
- ds.groupby('letters').first(...)
+ ds.groupby("letters").first(...)
By default, they skip missing values (control this with ``skipna``).
@@ -149,9 +151,9 @@ coordinates. For example:
.. ipython:: python
- alt = arr.groupby('letters').mean(...)
+ alt = arr.groupby("letters").mean(...)
alt
- ds.groupby('letters') - alt
+ ds.groupby("letters") - alt
This last line is roughly equivalent to the following::
@@ -169,11 +171,11 @@ the ``squeeze`` parameter:
.. ipython:: python
- next(iter(arr.groupby('x')))
+ next(iter(arr.groupby("x")))
.. ipython:: python
- next(iter(arr.groupby('x', squeeze=False)))
+ next(iter(arr.groupby("x", squeeze=False)))
Although xarray will attempt to automatically
:py:attr:`~xarray.DataArray.transpose` dimensions back into their original order
@@ -197,13 +199,17 @@ __ http://cfconventions.org/cf-conventions/v1.6.0/cf-conventions.html#_two_dimen
.. ipython:: python
- da = xr.DataArray([[0,1],[2,3]],
- coords={'lon': (['ny','nx'], [[30,40],[40,50]] ),
- 'lat': (['ny','nx'], [[10,10],[20,20]] ),},
- dims=['ny','nx'])
+ da = xr.DataArray(
+ [[0, 1], [2, 3]],
+ coords={
+ "lon": (["ny", "nx"], [[30, 40], [40, 50]]),
+ "lat": (["ny", "nx"], [[10, 10], [20, 20]]),
+ },
+ dims=["ny", "nx"],
+ )
da
- da.groupby('lon').sum(...)
- da.groupby('lon').map(lambda x: x - x.mean(), shortcut=False)
+ da.groupby("lon").sum(...)
+ da.groupby("lon").map(lambda x: x - x.mean(), shortcut=False)
Because multidimensional groups have the ability to generate a very large
number of bins, coarse-binning via :py:meth:`~xarray.Dataset.groupby_bins`
@@ -211,13 +217,13 @@ may be desirable:
.. ipython:: python
- da.groupby_bins('lon', [0,45,50]).sum()
+ da.groupby_bins("lon", [0, 45, 50]).sum()
These methods group by `lon` values. It is also possible to groupby each
-cell in a grid, regardless of value, by stacking multiple dimensions,
+cell in a grid, regardless of value, by stacking multiple dimensions,
applying your function, and then unstacking the result:
.. ipython:: python
- stacked = da.stack(gridcell=['ny', 'nx'])
- stacked.groupby('gridcell').sum(...).unstack('gridcell')
+ stacked = da.stack(gridcell=["ny", "nx"])
+ stacked.groupby("gridcell").sum(...).unstack("gridcell")
diff --git a/doc/howdoi.rst b/doc/howdoi.rst
index 84c0c786027..3604d66bd0c 100644
--- a/doc/howdoi.rst
+++ b/doc/howdoi.rst
@@ -59,4 +59,3 @@ How do I ...
- ``obj.dt.ceil``, ``obj.dt.floor``, ``obj.dt.round``. See :ref:`dt_accessor` for more.
* - make a mask that is ``True`` where an object contains any of the values in an array
- :py:meth:`Dataset.isin`, :py:meth:`DataArray.isin`
-
diff --git a/doc/index.rst b/doc/index.rst
index 972eb0a732e..ee44d0ad4d9 100644
--- a/doc/index.rst
+++ b/doc/index.rst
@@ -60,6 +60,7 @@ Documentation
* :doc:`io`
* :doc:`dask`
* :doc:`plotting`
+* :doc:`duckarrays`
.. toctree::
:maxdepth: 1
@@ -80,6 +81,7 @@ Documentation
io
dask
plotting
+ duckarrays
**Help & reference**
@@ -107,6 +109,7 @@ Documentation
See also
--------
+- `Xarray's Tutorial`_ presented at the 2020 SciPy Conference (`video recording`_).
- Stephan Hoyer and Joe Hamman's `Journal of Open Research Software paper`_ describing the xarray project.
- The `UW eScience Institute's Geohackweek`_ tutorial on xarray for geospatial data scientists.
- Stephan Hoyer's `SciPy2015 talk`_ introducing xarray to a general audience.
@@ -114,6 +117,8 @@ See also
xarray to users familiar with netCDF.
- `Nicolas Fauchereau's tutorial`_ on xarray for netCDF users.
+.. _Xarray's Tutorial: https://xarray-contrib.github.io/xarray-tutorial/
+.. _video recording: https://youtu.be/mecN-Ph_-78
.. _Journal of Open Research Software paper: http://doi.org/10.5334/jors.148
.. _UW eScience Institute's Geohackweek : https://geohackweek.github.io/nDarrays/
.. _SciPy2015 talk: https://www.youtube.com/watch?v=X0pAhJgySxk
diff --git a/doc/indexing.rst b/doc/indexing.rst
index cfbb84a8343..78766b8fd81 100644
--- a/doc/indexing.rst
+++ b/doc/indexing.rst
@@ -4,11 +4,12 @@ Indexing and selecting data
===========================
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
xarray offers extremely flexible indexing routines that combine the best
@@ -60,9 +61,13 @@ DataArray:
.. ipython:: python
- da = xr.DataArray(np.random.rand(4, 3),
- [('time', pd.date_range('2000-01-01', periods=4)),
- ('space', ['IA', 'IL', 'IN'])])
+ da = xr.DataArray(
+ np.random.rand(4, 3),
+ [
+ ("time", pd.date_range("2000-01-01", periods=4)),
+ ("space", ["IA", "IL", "IN"]),
+ ],
+ )
da[:2]
da[0, 0]
da[:, [2, 1]]
@@ -81,7 +86,7 @@ fast. To do label based indexing, use the :py:attr:`~xarray.DataArray.loc` attri
.. ipython:: python
- da.loc['2000-01-01':'2000-01-02', 'IA']
+ da.loc["2000-01-01":"2000-01-02", "IA"]
In this example, the selection is a subpart of the array
in the range '2000-01-01':'2000-01-02' along the first coordinate `time`
@@ -98,7 +103,7 @@ Setting values with label based indexing is also supported:
.. ipython:: python
- da.loc['2000-01-01', ['IL', 'IN']] = -10
+ da.loc["2000-01-01", ["IL", "IN"]] = -10
da
@@ -117,7 +122,7 @@ use them explicitly to slice data. There are two ways to do this:
da[dict(space=0, time=slice(None, 2))]
# index by dimension coordinate labels
- da.loc[dict(time=slice('2000-01-01', '2000-01-02'))]
+ da.loc[dict(time=slice("2000-01-01", "2000-01-02"))]
2. Use the :py:meth:`~xarray.DataArray.sel` and :py:meth:`~xarray.DataArray.isel`
convenience methods:
@@ -128,7 +133,7 @@ use them explicitly to slice data. There are two ways to do this:
da.isel(space=0, time=slice(None, 2))
# index by dimension coordinate labels
- da.sel(time=slice('2000-01-01', '2000-01-02'))
+ da.sel(time=slice("2000-01-01", "2000-01-02"))
The arguments to these methods can be any objects that could index the array
along the dimension given by the keyword, e.g., labels for an individual value,
@@ -156,16 +161,16 @@ enabling nearest neighbor (inexact) lookups by use of the methods ``'pad'``,
.. ipython:: python
- da = xr.DataArray([1, 2, 3], [('x', [0, 1, 2])])
- da.sel(x=[1.1, 1.9], method='nearest')
- da.sel(x=0.1, method='backfill')
- da.reindex(x=[0.5, 1, 1.5, 2, 2.5], method='pad')
+ da = xr.DataArray([1, 2, 3], [("x", [0, 1, 2])])
+ da.sel(x=[1.1, 1.9], method="nearest")
+ da.sel(x=0.1, method="backfill")
+ da.reindex(x=[0.5, 1, 1.5, 2, 2.5], method="pad")
Tolerance limits the maximum distance for valid matches with an inexact lookup:
.. ipython:: python
- da.reindex(x=[1.1, 1.5], method='nearest', tolerance=0.2)
+ da.reindex(x=[1.1, 1.5], method="nearest", tolerance=0.2)
The method parameter is not yet supported if any of the arguments
to ``.sel()`` is a ``slice`` object:
@@ -173,7 +178,7 @@ to ``.sel()`` is a ``slice`` object:
.. ipython::
:verbatim:
- In [1]: da.sel(x=slice(1, 3), method='nearest')
+ In [1]: da.sel(x=slice(1, 3), method="nearest")
NotImplementedError
However, you don't need to use ``method`` to do inexact slicing. Slicing
@@ -182,15 +187,15 @@ labels are monotonic increasing:
.. ipython:: python
- da.sel(x=slice(0.9, 3.1))
+ da.sel(x=slice(0.9, 3.1))
Indexing axes with monotonic decreasing labels also works, as long as the
``slice`` or ``.loc`` arguments are also decreasing:
.. ipython:: python
- reversed_da = da[::-1]
- reversed_da.loc[3.1:0.9]
+ reversed_da = da[::-1]
+ reversed_da.loc[3.1:0.9]
.. note::
@@ -227,7 +232,7 @@ arrays). However, you can do normal indexing with dimension names:
.. ipython:: python
ds[dict(space=[0], time=[0])]
- ds.loc[dict(time='2000-01-01')]
+ ds.loc[dict(time="2000-01-01")]
Using indexing to *assign* values to a subset of dataset (e.g.,
``ds[dict(space=0)] = 1``) is not yet supported.
@@ -240,7 +245,7 @@ index labels along a dimension dropped:
.. ipython:: python
- ds.drop_sel(space=['IN', 'IL'])
+ ds.drop_sel(space=["IN", "IL"])
``drop_sel`` is both a ``Dataset`` and ``DataArray`` method.
@@ -249,7 +254,7 @@ Any variables with these dimensions are also dropped:
.. ipython:: python
- ds.drop_dims('time')
+ ds.drop_dims("time")
.. _masking with where:
@@ -263,7 +268,7 @@ xarray, use :py:meth:`~xarray.DataArray.where`:
.. ipython:: python
- da = xr.DataArray(np.arange(16).reshape(4, 4), dims=['x', 'y'])
+ da = xr.DataArray(np.arange(16).reshape(4, 4), dims=["x", "y"])
da.where(da.x + da.y < 4)
This is particularly useful for ragged indexing of multi-dimensional data,
@@ -296,7 +301,7 @@ multiple values, use :py:meth:`~xarray.DataArray.isin`:
.. ipython:: python
- da = xr.DataArray([1, 2, 3, 4, 5], dims=['x'])
+ da = xr.DataArray([1, 2, 3, 4, 5], dims=["x"])
da.isin([2, 4])
:py:meth:`~xarray.DataArray.isin` works particularly well with
@@ -305,7 +310,7 @@ already labels of an array:
.. ipython:: python
- lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=['x'])
+ lookup = xr.DataArray([-1, -2, -3, -4, -5], dims=["x"])
da.where(lookup.isin([-2, -4]), drop=True)
However, some caution is in order: when done repeatedly, this type of indexing
@@ -328,14 +333,13 @@ MATLAB, or after using the :py:func:`numpy.ix_` helper:
.. ipython:: python
-
da = xr.DataArray(
np.arange(12).reshape((3, 4)),
dims=["x", "y"],
coords={"x": [0, 1, 2], "y": ["a", "b", "c", "d"]},
)
da
- da[[0, 1], [1, 1]]
+ da[[0, 2, 2], [1, 3]]
For more flexibility, you can supply :py:meth:`~xarray.DataArray` objects
as indexers.
@@ -344,8 +348,8 @@ dimensions:
.. ipython:: python
- ind_x = xr.DataArray([0, 1], dims=['x'])
- ind_y = xr.DataArray([0, 1], dims=['y'])
+ ind_x = xr.DataArray([0, 1], dims=["x"])
+ ind_y = xr.DataArray([0, 1], dims=["y"])
da[ind_x, ind_y] # orthogonal indexing
da[ind_x, ind_x] # vectorized indexing
@@ -364,7 +368,7 @@ indexers' dimension:
.. ipython:: python
- ind = xr.DataArray([[0, 1], [0, 1]], dims=['a', 'b'])
+ ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"])
da[ind]
Similar to how NumPy's `advanced indexing`_ works, vectorized
@@ -378,18 +382,18 @@ Vectorized indexing also works with ``isel``, ``loc``, and ``sel``:
.. ipython:: python
- ind = xr.DataArray([[0, 1], [0, 1]], dims=['a', 'b'])
+ ind = xr.DataArray([[0, 1], [0, 1]], dims=["a", "b"])
da.isel(y=ind) # same as da[:, ind]
- ind = xr.DataArray([['a', 'b'], ['b', 'a']], dims=['a', 'b'])
+ ind = xr.DataArray([["a", "b"], ["b", "a"]], dims=["a", "b"])
da.loc[:, ind] # same as da.sel(y=ind)
These methods may also be applied to ``Dataset`` objects
.. ipython:: python
- ds = da.to_dataset(name='bar')
- ds.isel(x=xr.DataArray([0, 1, 2], dims=['points']))
+ ds = da.to_dataset(name="bar")
+ ds.isel(x=xr.DataArray([0, 1, 2], dims=["points"]))
.. tip::
@@ -476,8 +480,8 @@ Like ``numpy.ndarray``, value assignment sometimes works differently from what o
.. ipython:: python
- da = xr.DataArray([0, 1, 2, 3], dims=['x'])
- ind = xr.DataArray([0, 0, 0], dims=['x'])
+ da = xr.DataArray([0, 1, 2, 3], dims=["x"])
+ ind = xr.DataArray([0, 0, 0], dims=["x"])
da[ind] -= 1
da
@@ -511,7 +515,7 @@ __ https://docs.scipy.org/doc/numpy/user/basics.indexing.html#assigning-values-t
.. ipython:: python
- da = xr.DataArray([0, 1, 2, 3], dims=['x'])
+ da = xr.DataArray([0, 1, 2, 3], dims=["x"])
# DO NOT do this
da.isel(x=[0, 1, 2])[1] = -1
da
@@ -544,7 +548,7 @@ you can supply a :py:class:`~xarray.DataArray` with a coordinate,
x=xr.DataArray([0, 1, 6], dims="z", coords={"z": ["a", "b", "c"]}),
y=xr.DataArray([0, 1, 0], dims="z"),
)
-
+
Analogously, label-based pointwise-indexing is also possible by the ``.sel``
method:
@@ -581,15 +585,15 @@ To reindex a particular dimension, use :py:meth:`~xarray.DataArray.reindex`:
.. ipython:: python
- da.reindex(space=['IA', 'CA'])
+ da.reindex(space=["IA", "CA"])
The :py:meth:`~xarray.DataArray.reindex_like` method is a useful shortcut.
To demonstrate, we will make a subset DataArray with new values:
.. ipython:: python
- foo = da.rename('foo')
- baz = (10 * da[:2, :2]).rename('baz')
+ foo = da.rename("foo")
+ baz = (10 * da[:2, :2]).rename("baz")
baz
Reindexing ``foo`` with ``baz`` selects out the first two values along each
@@ -611,8 +615,8 @@ The :py:func:`~xarray.align` function lets us perform more flexible database-lik
.. ipython:: python
- xr.align(foo, baz, join='inner')
- xr.align(foo, baz, join='outer')
+ xr.align(foo, baz, join="inner")
+ xr.align(foo, baz, join="outer")
Both ``reindex_like`` and ``align`` work interchangeably between
:py:class:`~xarray.DataArray` and :py:class:`~xarray.Dataset` objects, and with any number of matching dimension names:
@@ -621,7 +625,7 @@ Both ``reindex_like`` and ``align`` work interchangeably between
ds
ds.reindex_like(baz)
- other = xr.DataArray(['a', 'b', 'c'], dims='other')
+ other = xr.DataArray(["a", "b", "c"], dims="other")
# this is a no-op, because there are no shared dimension names
ds.reindex_like(other)
@@ -636,7 +640,7 @@ integer-based indexing as a fallback for dimensions without a coordinate label:
.. ipython:: python
- da = xr.DataArray([1, 2, 3], dims='x')
+ da = xr.DataArray([1, 2, 3], dims="x")
da.sel(x=[0, -1])
Alignment between xarray objects where one or both do not have coordinate labels
@@ -675,9 +679,9 @@ labels:
.. ipython:: python
- da = xr.DataArray([1, 2, 3], dims='x')
+ da = xr.DataArray([1, 2, 3], dims="x")
da
- da.get_index('x')
+ da.get_index("x")
.. _copies_vs_views:
@@ -721,7 +725,6 @@ pandas:
.. ipython:: python
-
midx = pd.MultiIndex.from_product([list("abc"), [0, 1]], names=("one", "two"))
mda = xr.DataArray(np.random.rand(6, 3), [("x", midx), ("y", range(3))])
mda
@@ -732,20 +735,20 @@ a slice of tuples:
.. ipython:: python
- mda.sel(x=[('a', 0), ('b', 1)])
+ mda.sel(x=[("a", 0), ("b", 1)])
Additionally, xarray supports dictionaries:
.. ipython:: python
- mda.sel(x={'one': 'a', 'two': 0})
+ mda.sel(x={"one": "a", "two": 0})
For convenience, ``sel`` also accepts multi-index levels directly
as keyword arguments:
.. ipython:: python
- mda.sel(one='a', two=0)
+ mda.sel(one="a", two=0)
Note that using ``sel`` it is not possible to mix a dimension
indexer with level indexers for that dimension
@@ -757,7 +760,7 @@ multi-index is reduced to a single index.
.. ipython:: python
- mda.loc[{'one': 'a'}, ...]
+ mda.loc[{"one": "a"}, ...]
Unlike pandas, xarray does not guess whether you provide index levels or
dimensions when using ``loc`` in some ambiguous cases. For example, for
diff --git a/doc/installing.rst b/doc/installing.rst
index a25bf65e342..99b8b621aed 100644
--- a/doc/installing.rst
+++ b/doc/installing.rst
@@ -6,8 +6,8 @@ Installation
Required dependencies
---------------------
-- Python (3.6 or later)
-- setuptools
+- Python (3.7 or later)
+- setuptools (40.4 or later)
- `numpy `__ (1.15 or later)
- `pandas `__ (0.25 or later)
@@ -16,6 +16,12 @@ Required dependencies
Optional dependencies
---------------------
+.. note::
+
+ If you are using pip to install xarray, optional dependencies can be installed by
+ specifying *extras*. :ref:`installation-instructions` for both pip and conda
+ are given below.
+
For netCDF and IO
~~~~~~~~~~~~~~~~~
@@ -25,8 +31,9 @@ For netCDF and IO
- `pydap `__: used as a fallback for accessing OPeNDAP
- `h5netcdf `__: an alternative library for
reading and writing netCDF4 files that does not use the netCDF-C libraries
-- `pynio `__: for reading GRIB and other
- geoscience specific file formats. Note that pynio is not available for Windows.
+- `PyNIO `__: for reading GRIB and other
+ geoscience specific file formats. Note that PyNIO is not available for Windows and
+ that the PyNIO backend may be moved outside of xarray in the future.
- `zarr `__: for chunked, compressed, N-dimensional arrays.
- `cftime `__: recommended if you
want to encode/decode datetimes for non-standard calendars or dates before
@@ -93,16 +100,16 @@ dependencies:
- **Python:** 42 months
(`NEP-29 `_)
+- **setuptools:** 42 months (but no older than 40.4)
- **numpy:** 24 months
(`NEP-29 `_)
-- **pandas:** 12 months
-- **scipy:** 12 months
+- **dask and dask.distributed:** 12 months (but no older than 2.9)
- **sparse, pint** and other libraries that rely on
`NEP-18 `_
  for integration: very latest available versions only, until the technology has
  matured. This extends to dask when used in conjunction with any of these libraries.
numpy >=1.17.
-- **all other libraries:** 6 months
+- **all other libraries:** 12 months
The above should be interpreted as *the minor version (X.Y) initially published no more
than N months ago*. Patch versions (x.y.Z) are not pinned, and only the latest available
@@ -111,10 +118,11 @@ at the moment of publishing the xarray release is guaranteed to work.
You can see the actual minimum tested versions:
- `For NEP-18 libraries
- `_
+ `_
- `For everything else
- `_
+ `_
+.. _installation-instructions:
Instructions
------------
@@ -138,6 +146,26 @@ pandas) installed first. Then, install xarray with pip::
$ pip install xarray
+We also maintain other dependency sets for different subsets of functionality::
+
+ $ pip install "xarray[io]" # Install optional dependencies for handling I/O
+ $ pip install "xarray[accel]" # Install optional dependencies for accelerating xarray
+ $ pip install "xarray[parallel]" # Install optional dependencies for dask arrays
+ $ pip install "xarray[viz]" # Install optional dependencies for visualization
+ $ pip install "xarray[complete]" # Install all the above
+
+The above commands should install most of the `optional dependencies`_. However,
+some packages that are not listed on PyPI or that require extra
+installation steps are excluded. To know which dependencies would be
+installed, take a look at the ``[options.extras_require]`` section in
+``setup.cfg``:
+
+.. literalinclude:: ../setup.cfg
+ :language: ini
+ :start-at: [options.extras_require]
+ :end-before: [options.package_data]
+
+
Testing
-------
diff --git a/doc/internals.rst b/doc/internals.rst
index a4870f2316a..60d32128c60 100644
--- a/doc/internals.rst
+++ b/doc/internals.rst
@@ -42,15 +42,49 @@ xarray objects via the (readonly) :py:attr:`Dataset.variables
` and
:py:attr:`DataArray.variable ` attributes.
+
+.. _internals.duck_arrays:
+
+Integrating with duck arrays
+----------------------------
+
+.. warning::
+
+ This is an experimental feature.
+
+xarray can wrap custom :term:`duck array` objects as long as they define numpy's
+``shape``, ``dtype`` and ``ndim`` properties and the ``__array__``,
+``__array_ufunc__`` and ``__array_function__`` methods.
+
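+A skeleton of that interface might look like the following (an illustration only;
+the attribute and method bodies are placeholders):
+
+.. code:: python
+
+    class MyDuckArray:
+        # numpy-style metadata
+        shape = ...
+        dtype = ...
+        ndim = ...
+
+        def __array__(self, dtype=None):
+            ...
+
+        def __array_ufunc__(self, ufunc, method, *inputs, **kwargs):
+            ...
+
+        def __array_function__(self, func, types, args, kwargs):
+            ...
+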
+In certain situations (e.g. when printing the collapsed preview of
+variables of a ``Dataset``), xarray will display the repr of a :term:`duck array`
+in a single line, truncating it to a certain number of characters. If that
+would drop too much information, the :term:`duck array` may define a
+``_repr_inline_`` method that takes ``max_width`` (number of characters) as an
+argument:
+
+.. code:: python
+
+    class MyDuckArray:
+        ...
+
+        def _repr_inline_(self, max_width):
+            """format to a single line with at most max_width characters"""
+            ...
+
+        ...
+
+
Extending xarray
----------------
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
xarray is designed as a general purpose library, and hence tries to avoid
@@ -82,16 +116,21 @@ xarray:
.. literalinclude:: examples/_code/accessor_example.py
+In general, the only restriction on the accessor class is that the ``__init__`` method
+must have a single parameter: the ``Dataset`` or ``DataArray`` object it is supposed
+to work on.
+
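+For example, a minimal skeleton (the accessor name ``my_accessor`` is hypothetical):
+
+.. code:: python
+
+    @xr.register_dataset_accessor("my_accessor")
+    class MyAccessor:
+        def __init__(self, xarray_obj):
+            # the single required parameter: the wrapped Dataset
+            self._obj = xarray_obj
+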
This achieves the same result as if the ``Dataset`` class had a cached property
defined that returns an instance of your class:
.. code-block:: python
- class Dataset:
- ...
- @property
- def geo(self)
- return GeoAccessor(self)
+    class Dataset:
+        ...
+
+        @property
+        def geo(self):
+            return GeoAccessor(self)
However, using the register accessor decorators is preferable to simply adding
your own ad-hoc property (i.e., ``Dataset.geo = property(...)``), for several
@@ -116,14 +155,13 @@ reasons:
Back in an interactive IPython session, we can use these properties:
.. ipython:: python
- :suppress:
+ :suppress:
- exec(open("examples/_code/accessor_example.py").read())
+ exec(open("examples/_code/accessor_example.py").read())
.. ipython:: python
- ds = xr.Dataset({'longitude': np.linspace(0, 10),
- 'latitude': np.linspace(0, 20)})
+ ds = xr.Dataset({"longitude": np.linspace(0, 10), "latitude": np.linspace(0, 20)})
ds.geo.center
ds.geo.plot()
@@ -137,3 +175,59 @@ To help users keep things straight, please `let us know
`_ if you plan to write a new accessor
for an open source library. In the future, we will maintain a list of accessors
and the libraries that implement them on this page.
+
+To make documenting accessors with ``sphinx`` and ``sphinx.ext.autosummary``
+easier, you can use `sphinx-autosummary-accessors`_.
+
+.. _sphinx-autosummary-accessors: https://sphinx-autosummary-accessors.readthedocs.io/
+
+.. _zarr_encoding:
+
+Zarr Encoding Specification
+---------------------------
+
+In implementing support for the `Zarr `_ storage
+format, Xarray developers made some *ad hoc* choices about how to store
+NetCDF data in Zarr.
+Future versions of the Zarr spec will likely include a more formal convention
+for the storage of the NetCDF data model in Zarr; see
+`Zarr spec repo `_ for ongoing
+discussion.
+
+First, Xarray can only read and write Zarr groups. There is currently no support
+for reading / writing individual Zarr arrays. Zarr groups are mapped to
+Xarray ``Dataset`` objects.
+
+Second, from Xarray's point of view, the key difference between
+NetCDF and Zarr is that all NetCDF arrays have *dimension names* while Zarr
+arrays do not. Therefore, in order to store NetCDF data in Zarr, Xarray must
+somehow encode and decode the name of each array's dimensions.
+
+To accomplish this, Xarray developers decided to define a special Zarr array
+attribute: ``_ARRAY_DIMENSIONS``. The value of this attribute is a list of
+dimension names (strings), for example ``["time", "lon", "lat"]``. When writing
+data to Zarr, Xarray sets this attribute on all variables based on the variable
+dimensions. When reading a Zarr group, Xarray looks for this attribute on all
+arrays, raising an error if it can't be found. The attribute is used to define
+the variable dimension names and then removed from the attributes dictionary
+returned to the user.
+
+Because of these choices, Xarray cannot read arbitrary array data, but only
+Zarr data with valid ``_ARRAY_DIMENSIONS`` attributes on each array.
+
+After decoding the ``_ARRAY_DIMENSIONS`` attribute and assigning the variable
+dimensions, Xarray proceeds to (optionally) decode each variable using its
+standard CF decoding machinery used for NetCDF data (see :py:func:`decode_cf`).
+
+As a concrete example, here we write a tutorial dataset to Zarr and then
+re-open it directly with Zarr:
+
+.. ipython:: python
+
+ ds = xr.tutorial.load_dataset("rasm")
+ ds.to_zarr("rasm.zarr", mode="w")
+ import zarr
+
+ zgroup = zarr.open("rasm.zarr")
+ print(zgroup.tree())
+ dict(zgroup["Tair"].attrs)
diff --git a/doc/interpolation.rst b/doc/interpolation.rst
index 4cf39807e5a..9a3b7a7ee2d 100644
--- a/doc/interpolation.rst
+++ b/doc/interpolation.rst
@@ -4,11 +4,12 @@ Interpolating data
==================
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
xarray offers flexible interpolation routines, which have a similar interface
@@ -27,9 +28,10 @@ indexing of a :py:class:`~xarray.DataArray`,
.. ipython:: python
- da = xr.DataArray(np.sin(0.3 * np.arange(12).reshape(4, 3)),
- [('time', np.arange(4)),
- ('space', [0.1, 0.2, 0.3])])
+ da = xr.DataArray(
+ np.sin(0.3 * np.arange(12).reshape(4, 3)),
+ [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
+ )
# label lookup
da.sel(time=3)
@@ -52,20 +54,21 @@ To interpolate data with a :py:doc:`numpy.datetime64
.. ipython:: python
- da_dt64 = xr.DataArray([1, 3],
- [('time', pd.date_range('1/1/2000', '1/3/2000', periods=2))])
- da_dt64.interp(time='2000-01-02')
+ da_dt64 = xr.DataArray(
+ [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
+ )
+ da_dt64.interp(time="2000-01-02")
The interpolated data can be merged into the original :py:class:`~xarray.DataArray`
by specifying the time periods required.
.. ipython:: python
- da_dt64.interp(time=pd.date_range('1/1/2000', '1/3/2000', periods=3))
+ da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))
Interpolation of data indexed by a :py:class:`~xarray.CFTimeIndex` is also
allowed. See :ref:`CFTimeIndex` for examples.
-
+
.. note::
Currently, our interpolation only works for regular grids.
@@ -108,9 +111,10 @@ different coordinates,
.. ipython:: python
- other = xr.DataArray(np.sin(0.4 * np.arange(9).reshape(3, 3)),
- [('time', [0.9, 1.9, 2.9]),
- ('space', [0.15, 0.25, 0.35])])
+ other = xr.DataArray(
+ np.sin(0.4 * np.arange(9).reshape(3, 3)),
+ [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
+ )
it might be a good idea to first interpolate ``da`` so that it will stay on the
same coordinates of ``other``, and then subtract it.
@@ -118,9 +122,9 @@ same coordinates of ``other``, and then subtract it.
.. ipython:: python
- # interpolate da along other's coordinates
- interpolated = da.interp_like(other)
- interpolated
+ # interpolate da along other's coordinates
+ interpolated = da.interp_like(other)
+ interpolated
It is now possible to safely compute the difference ``other - interpolated``.
@@ -135,12 +139,15 @@ The interpolation method can be specified by the optional ``method`` argument.
.. ipython:: python
- da = xr.DataArray(np.sin(np.linspace(0, 2 * np.pi, 10)), dims='x',
- coords={'x': np.linspace(0, 1, 10)})
+ da = xr.DataArray(
+ np.sin(np.linspace(0, 2 * np.pi, 10)),
+ dims="x",
+ coords={"x": np.linspace(0, 1, 10)},
+ )
- da.plot.line('o', label='original')
- da.interp(x=np.linspace(0, 1, 100)).plot.line(label='linear (default)')
- da.interp(x=np.linspace(0, 1, 100), method='cubic').plot.line(label='cubic')
+ da.plot.line("o", label="original")
+ da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
+ da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
@savefig interpolation_sample1.png width=4in
plt.legend()
@@ -149,15 +156,16 @@ Additional keyword arguments can be passed to scipy's functions.
.. ipython:: python
# fill 0 for the outside of the original coordinates.
- da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={'fill_value': 0.0})
+ da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})
# 1-dimensional extrapolation
- da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={'fill_value': 'extrapolate'})
+ da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})
# multi-dimensional extrapolation
- da = xr.DataArray(np.sin(0.3 * np.arange(12).reshape(4, 3)),
- [('time', np.arange(4)),
- ('space', [0.1, 0.2, 0.3])])
+ da = xr.DataArray(
+ np.sin(0.3 * np.arange(12).reshape(4, 3)),
+ [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
+ )
- da.interp(time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={'fill_value': None})
+ da.interp(time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": None})
Advanced Interpolation
@@ -181,17 +189,18 @@ For example:
.. ipython:: python
- da = xr.DataArray(np.sin(0.3 * np.arange(20).reshape(5, 4)),
- [('x', np.arange(5)),
- ('y', [0.1, 0.2, 0.3, 0.4])])
+ da = xr.DataArray(
+ np.sin(0.3 * np.arange(20).reshape(5, 4)),
+ [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
+ )
# advanced indexing
- x = xr.DataArray([0, 2, 4], dims='z')
- y = xr.DataArray([0.1, 0.2, 0.3], dims='z')
+ x = xr.DataArray([0, 2, 4], dims="z")
+ y = xr.DataArray([0.1, 0.2, 0.3], dims="z")
da.sel(x=x, y=y)
# advanced interpolation
- x = xr.DataArray([0.5, 1.5, 2.5], dims='z')
- y = xr.DataArray([0.15, 0.25, 0.35], dims='z')
+ x = xr.DataArray([0.5, 1.5, 2.5], dims="z")
+ y = xr.DataArray([0.15, 0.25, 0.35], dims="z")
da.interp(x=x, y=y)
where values on the original coordinates
@@ -203,9 +212,8 @@ If you want to add a coordinate to the new dimension ``z``, you can supply
.. ipython:: python
- x = xr.DataArray([0.5, 1.5, 2.5], dims='z', coords={'z': ['a', 'b','c']})
- y = xr.DataArray([0.15, 0.25, 0.35], dims='z',
- coords={'z': ['a', 'b','c']})
+ x = xr.DataArray([0.5, 1.5, 2.5], dims="z", coords={"z": ["a", "b", "c"]})
+ y = xr.DataArray([0.15, 0.25, 0.35], dims="z", coords={"z": ["a", "b", "c"]})
da.interp(x=x, y=y)
For the details of the advanced indexing,
@@ -224,19 +232,18 @@ while other methods such as ``cubic`` or ``quadratic`` return all NaN arrays.
.. ipython:: python
- da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims='x',
- coords={'x': range(5)})
+ da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})
da.interp(x=[0.5, 1.5, 2.5])
- da.interp(x=[0.5, 1.5, 2.5], method='cubic')
+ da.interp(x=[0.5, 1.5, 2.5], method="cubic")
To avoid this, you can drop NaN by :py:meth:`~xarray.DataArray.dropna`, and
then make the interpolation
.. ipython:: python
- dropped = da.dropna('x')
+ dropped = da.dropna("x")
dropped
- dropped.interp(x=[0.5, 1.5, 2.5], method='cubic')
+ dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")
If NaNs are distributed randomly in your multidimensional array,
dropping all the columns containing more than one NaN by
@@ -246,7 +253,7 @@ which is similar to :py:meth:`pandas.Series.interpolate`.
.. ipython:: python
- filled = da.interpolate_na(dim='x')
+ filled = da.interpolate_na(dim="x")
filled
This fills NaN by interpolating along the specified dimension.
@@ -254,7 +261,7 @@ After filling NaNs, you can interpolate:
.. ipython:: python
- filled.interp(x=[0.5, 1.5, 2.5], method='cubic')
+ filled.interp(x=[0.5, 1.5, 2.5], method="cubic")
For the details of :py:meth:`~xarray.DataArray.interpolate_na`,
see :ref:`Missing values `.
@@ -268,18 +275,18 @@ Let's see how :py:meth:`~xarray.DataArray.interp` works on real data.
.. ipython:: python
# Raw data
- ds = xr.tutorial.open_dataset('air_temperature').isel(time=0)
+ ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
ds.air.plot(ax=axes[0])
- axes[0].set_title('Raw data')
+ axes[0].set_title("Raw data")
# Interpolated data
- new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims['lon'] * 4)
- new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims['lat'] * 4)
+ new_lon = np.linspace(ds.lon[0], ds.lon[-1], ds.dims["lon"] * 4)
+ new_lat = np.linspace(ds.lat[0], ds.lat[-1], ds.dims["lat"] * 4)
dsi = ds.interp(lat=new_lat, lon=new_lon)
dsi.air.plot(ax=axes[1])
@savefig interpolation_sample3.png width=8in
- axes[1].set_title('Interpolated data')
+ axes[1].set_title("Interpolated data")
Our advanced interpolation can be used to remap the data to the new coordinate.
Consider the new coordinates x and z on the two dimensional plane.
@@ -291,20 +298,23 @@ The remapping can be done as follows
x = np.linspace(240, 300, 100)
z = np.linspace(20, 70, 100)
# relation between new and original coordinates
- lat = xr.DataArray(z, dims=['z'], coords={'z': z})
- lon = xr.DataArray((x[:, np.newaxis]-270)/np.cos(z*np.pi/180)+270,
- dims=['x', 'z'], coords={'x': x, 'z': z})
+ lat = xr.DataArray(z, dims=["z"], coords={"z": z})
+ lon = xr.DataArray(
+ (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
+ dims=["x", "z"],
+ coords={"x": x, "z": z},
+ )
fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
ds.air.plot(ax=axes[0])
# draw the new coordinate on the original coordinates.
for idx in [0, 33, 66, 99]:
- axes[0].plot(lon.isel(x=idx), lat, '--k')
+ axes[0].plot(lon.isel(x=idx), lat, "--k")
for idx in [0, 33, 66, 99]:
- axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), '--k')
- axes[0].set_title('Raw data')
+ axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
+ axes[0].set_title("Raw data")
dsi = ds.interp(lon=lon, lat=lat)
dsi.air.plot(ax=axes[1])
@savefig interpolation_sample4.png width=8in
- axes[1].set_title('Remapped data')
+ axes[1].set_title("Remapped data")
diff --git a/doc/io.rst b/doc/io.rst
index 0c666099df8..2e46879929b 100644
--- a/doc/io.rst
+++ b/doc/io.rst
@@ -9,11 +9,12 @@ simple :ref:`io.pickle` files to the more flexible :ref:`io.netcdf`
format (recommended).
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
.. _io.netcdf:
@@ -25,7 +26,7 @@ The recommended way to store xarray data structures is `netCDF`__, which
is a binary file format for self-described datasets that originated
in the geosciences. xarray is based on the netCDF data model, so netCDF files
on disk directly correspond to :py:class:`Dataset` objects (more accurately,
-a group in a netCDF file directly corresponds to a to :py:class:`Dataset` object.
+a group in a netCDF file directly corresponds to a :py:class:`Dataset` object.
See :ref:`io.netcdf_groups` for more.)
NetCDF is supported on almost all platforms, and parsers exist
@@ -42,7 +43,7 @@ __ http://www.unidata.ucar.edu/software/netcdf/
.. _netCDF FAQ: http://www.unidata.ucar.edu/software/netcdf/docs/faq.html#What-Is-netCDF
Reading and writing netCDF files with xarray requires scipy or the
-`netCDF4-Python`__ library to be installed (the later is required to
+`netCDF4-Python`__ library to be installed (the latter is required to
read/write netCDF V4 files and use the compression options described below).
__ https://github.com/Unidata/netcdf4-python
@@ -52,12 +53,16 @@ We can save a Dataset to disk using the
.. ipython:: python
- ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 5))},
- coords={'x': [10, 20, 30, 40],
- 'y': pd.date_range('2000-01-01', periods=5),
- 'z': ('x', list('abcd'))})
+ ds = xr.Dataset(
+ {"foo": (("x", "y"), np.random.rand(4, 5))},
+ coords={
+ "x": [10, 20, 30, 40],
+ "y": pd.date_range("2000-01-01", periods=5),
+ "z": ("x", list("abcd")),
+ },
+ )
- ds.to_netcdf('saved_on_disk.nc')
+ ds.to_netcdf("saved_on_disk.nc")
By default, the file is saved as netCDF4 (assuming netCDF4-Python is
installed). You can control the format and engine used to write the file with
@@ -76,7 +81,7 @@ We can load netCDF files to create a new Dataset using
.. ipython:: python
- ds_disk = xr.open_dataset('saved_on_disk.nc')
+ ds_disk = xr.open_dataset("saved_on_disk.nc")
ds_disk
Similarly, a DataArray can be saved to disk using the
@@ -100,6 +105,12 @@ Dataset and DataArray objects, and no array values are loaded into memory until
you try to perform some sort of actual computation. For an example of how these
lazy arrays work, see the OPeNDAP section below.
+There may be minor differences in the :py:class:`Dataset` object returned
+when reading a NetCDF file with different engines. For example,
+single-valued attributes are returned as scalars by the default
+``engine=netcdf4``, but as arrays of size ``(1,)`` when reading with
+``engine=h5netcdf``.
+
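+A quick way to compare the two (a sketch, assuming ``h5netcdf`` is installed; the
+file name comes from the example above)::
+
+    ds_nc4 = xr.open_dataset("saved_on_disk.nc")  # engine="netcdf4" by default
+    ds_h5 = xr.open_dataset("saved_on_disk.nc", engine="h5netcdf")
+    # compare attribute types: scalars with netcdf4, size-(1,) arrays with h5netcdf
+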
It is important to note that when you modify values of a Dataset, even one
linked to files on disk, only the in-memory copy you are manipulating in xarray
is modified: the original file on disk is never touched.
@@ -117,7 +128,7 @@ netCDF file. However, it's often cleaner to use a ``with`` statement:
.. ipython:: python
# this automatically closes the dataset after use
- with xr.open_dataset('saved_on_disk.nc') as ds:
+ with xr.open_dataset("saved_on_disk.nc") as ds:
print(ds.keys())
Although xarray provides reasonable support for incremental reads of files on
@@ -171,7 +182,7 @@ You can view this encoding information (among others) in the
.. ipython::
:verbatim:
- In [1]: ds_disk['y'].encoding
+ In [1]: ds_disk["y"].encoding
Out[1]:
{'zlib': False,
'shuffle': False,
@@ -230,7 +241,7 @@ See its docstring for more details.
.. note::
A common use-case involves a dataset distributed across a large number of files with
- each file containing a large number of variables. Commonly a few of these variables
+ each file containing a large number of variables. Commonly, a few of these variables
need to be concatenated along a dimension (say ``"time"``), while the rest are equal
across the datasets (ignoring floating point differences). The following command
with suitable modifications (such as ``parallel=True``) works well with such datasets::
@@ -287,8 +298,8 @@ library::
combined = read_netcdfs('/all/my/files/*.nc', dim='time')
This function will work in many cases, but it's not very robust. First, it
-never closes files, which means it will fail one you need to load more than
-a few thousands file. Second, it assumes that you want all the data from each
+never closes files, which means it will fail if you need to load more than
+a few thousand files. Second, it assumes that you want all the data from each
file and that it can all fit into memory. In many situations, you only need
a small subset or an aggregated summary of the data from each file.
@@ -340,7 +351,7 @@ default encoding, or the options in the ``encoding`` attribute, if set.
This works perfectly fine in most cases, but encoding can be useful for
additional control, especially for enabling compression.
-In the file on disk, these encodings as saved as attributes on each variable, which
+In the file on disk, these encodings are saved as attributes on each variable, which
allow xarray and other CF-compliant tools that work with netCDF files to correctly
read the data.
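+
+For example, compression can be enabled per variable through ``encoding`` (a
+minimal sketch; the variable name ``foo`` is from the example dataset above)::
+
+    ds.to_netcdf("compressed.nc", encoding={"foo": {"zlib": True, "complevel": 4}})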
@@ -353,7 +364,7 @@ These encoding options work on any version of the netCDF file format:
or ``'float32'``. This controls the type of the data written on disk.
- ``_FillValue``: Values of ``NaN`` in xarray variables are remapped to this value when
saved on disk. This is important when converting floating point with missing values
- to integers on disk, because ``NaN`` is not a valid value for integer dtypes. As a
+ to integers on disk, because ``NaN`` is not a valid value for integer dtypes. By
default, variables with float types are attributed a ``_FillValue`` of ``NaN`` in the
output file, unless explicitly disabled with an encoding ``{'_FillValue': None}``.
- ``scale_factor`` and ``add_offset``: Used to convert from encoded data on disk to
@@ -395,8 +406,8 @@ If character arrays are used:
by setting the ``_Encoding`` field in ``encoding``. But
`we don't recommend it `_.
- The character dimension name can be specifed by the ``char_dim_name`` field of a variable's
- ``encoding``. If this is not specified the default name for the character dimension is
- ``'string%s' % data.shape[-1]``. When decoding character arrays from existing files, the
+ ``encoding``. If the name of the character dimension is not specified, the default is
+ ``f'string{data.shape[-1]}'``. When decoding character arrays from existing files, the
``char_dim_name`` is added to the variable's ``encoding`` so it is preserved if re-encoding happens, but
the field can be edited by the user.
@@ -458,7 +469,7 @@ This is not CF-compliant but again facilitates roundtripping of xarray datasets.
Invalid netCDF files
~~~~~~~~~~~~~~~~~~~~
-The library ``h5netcdf`` allows writing some dtypes (booleans, complex, ...) that aren't
+The library ``h5netcdf`` allows writing some dtypes (booleans, complex, ...) that aren't
allowed in netCDF4 (see
`h5netcdf documentation `_).
This feature is available through :py:meth:`DataArray.to_netcdf` and
@@ -469,7 +480,7 @@ and currently raises a warning unless ``invalid_netcdf=True`` is set:
:okwarning:
# Writing complex valued data
- da = xr.DataArray([1.+1.j, 2.+2.j, 3.+3.j])
+ da = xr.DataArray([1.0 + 1.0j, 2.0 + 2.0j, 3.0 + 3.0j])
da.to_netcdf("complex.nc", engine="h5netcdf", invalid_netcdf=True)
# Reading it back
@@ -479,7 +490,8 @@ and currently raises a warning unless ``invalid_netcdf=True`` is set:
:suppress:
import os
- os.remove('complex.nc')
+
+ os.remove("complex.nc")
.. warning::
@@ -494,14 +506,16 @@ Iris
The Iris_ tool allows easy reading of common meteorological and climate model formats
(including GRIB and UK MetOffice PP files) into ``Cube`` objects which are in many ways very
similar to ``DataArray`` objects, while enforcing a CF-compliant data model. If iris is
-installed xarray can convert a ``DataArray`` into a ``Cube`` using
+installed, xarray can convert a ``DataArray`` into a ``Cube`` using
:py:meth:`DataArray.to_iris`:
.. ipython:: python
- da = xr.DataArray(np.random.rand(4, 5), dims=['x', 'y'],
- coords=dict(x=[10, 20, 30, 40],
- y=pd.date_range('2000-01-01', periods=5)))
+ da = xr.DataArray(
+ np.random.rand(4, 5),
+ dims=["x", "y"],
+ coords=dict(x=[10, 20, 30, 40], y=pd.date_range("2000-01-01", periods=5)),
+ )
cube = da.to_iris()
cube
@@ -548,8 +562,9 @@ __ http://iri.columbia.edu/
:verbatim:
In [3]: remote_data = xr.open_dataset(
- ...: 'http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods',
- ...: decode_times=False)
+ ...: "http://iridl.ldeo.columbia.edu/SOURCES/.OSU/.PRISM/.monthly/dods",
+ ...: decode_times=False,
+ ...: )
In [4]: remote_data
Out[4]:
@@ -587,7 +602,7 @@ over the network until we look at particular values:
.. ipython::
:verbatim:
- In [4]: tmax = remote_data['tmax'][:500, ::3, ::3]
+ In [4]: tmax = remote_data["tmax"][:500, ::3, ::3]
In [5]: tmax
Out[5]:
@@ -701,7 +716,7 @@ require external libraries and dicts can easily be pickled, or converted to
json, or geojson. All the values are converted to lists, so dicts might
be quite large.
-To export just the dataset schema, without the data itself, use the
+To export just the dataset schema without the data itself, use the
``data=False`` option:
.. ipython:: python
@@ -715,7 +730,8 @@ search indices or other automated data discovery tools.
:suppress:
import os
- os.remove('saved_on_disk.nc')
+
+ os.remove("saved_on_disk.nc")
.. _io.rasterio:
@@ -729,7 +745,7 @@ rasterio is installed. Here is an example of how to use
.. ipython::
:verbatim:
- In [7]: rio = xr.open_rasterio('RGB.byte.tif')
+ In [7]: rio = xr.open_rasterio("RGB.byte.tif")
In [8]: rio
Out[8]:
@@ -756,7 +772,7 @@ for an example of how to convert these to longitudes and latitudes.
.. warning::
This feature has been added in xarray v0.9.6 and should still be
- considered as being experimental. Please report any bug you may find
+ considered experimental. Please report any bugs you may find
on xarray's github repository.
@@ -769,7 +785,7 @@ GDAL readable raster data using `rasterio`_ as well as for exporting to a geoTIF
In [1]: import rioxarray
- In [2]: rds = rioxarray.open_rasterio('RGB.byte.tif')
+ In [2]: rds = rioxarray.open_rasterio("RGB.byte.tif")
In [3]: rds
Out[3]:
@@ -794,12 +810,12 @@ GDAL readable raster data using `rasterio`_ as well as for exporting to a geoTIF
In [4]: rds.rio.crs
Out[4]: CRS.from_epsg(32618)
- In [5]: rds4326 = rio.rio.reproject("epsg:4326")
+ In [5]: rds4326 = rds.rio.reproject("epsg:4326")
In [6]: rds4326.rio.crs
Out[6]: CRS.from_epsg(4326)
- In [7]: rds4326.rio.to_raster('RGB.byte.4326.tif')
+ In [7]: rds4326.rio.to_raster("RGB.byte.4326.tif")
.. _rasterio: https://rasterio.readthedocs.io/en/latest/
@@ -812,12 +828,14 @@ GDAL readable raster data using `rasterio`_ as well as for exporting to a geoTIF
Zarr
----
-`Zarr`_ is a Python package providing an implementation of chunked, compressed,
+`Zarr`_ is a Python package that provides an implementation of chunked, compressed,
N-dimensional arrays.
Zarr has the ability to store arrays in a range of ways, including in memory,
in files, and in cloud-based object storage such as `Amazon S3`_ and
`Google Cloud Storage`_.
-Xarray's Zarr backend allows xarray to leverage these capabilities.
+Xarray's Zarr backend allows xarray to leverage these capabilities, including
+the ability to store and analyze datasets far too large to fit onto disk
+(particularly :ref:`in combination with dask `).
.. warning::
@@ -827,58 +845,44 @@ Xarray's Zarr backend allows xarray to leverage these capabilities.
Xarray can't open just any zarr dataset, because xarray requires special
metadata (attributes) describing the dataset dimensions and coordinates.
At this time, xarray can only open zarr datasets that have been written by
-xarray. To write a dataset with zarr, we use the :py:attr:`Dataset.to_zarr` method.
-To write to a local directory, we pass a path to a directory
+xarray. For implementation details, see :ref:`zarr_encoding`.
+
+To write a dataset with zarr, we use the :py:meth:`Dataset.to_zarr` method.
+
+To write to a local directory, we pass a path to a directory:
.. ipython:: python
- :suppress:
+ :suppress:
! rm -rf path/to/directory.zarr
.. ipython:: python
- ds = xr.Dataset({'foo': (('x', 'y'), np.random.rand(4, 5))},
- coords={'x': [10, 20, 30, 40],
- 'y': pd.date_range('2000-01-01', periods=5),
- 'z': ('x', list('abcd'))})
- ds.to_zarr('path/to/directory.zarr')
+ ds = xr.Dataset(
+ {"foo": (("x", "y"), np.random.rand(4, 5))},
+ coords={
+ "x": [10, 20, 30, 40],
+ "y": pd.date_range("2000-01-01", periods=5),
+ "z": ("x", list("abcd")),
+ },
+ )
+ ds.to_zarr("path/to/directory.zarr")
(The suffix ``.zarr`` is optional--just a reminder that a zarr store lives
there.) If the directory does not exist, it will be created. If a zarr
store is already present at that path, an error will be raised, preventing it
from being overwritten. To override this behavior and overwrite an existing
-store, add ``mode='w'`` when invoking ``to_zarr``.
-
-It is also possible to append to an existing store. For that, set
-``append_dim`` to the name of the dimension along which to append. ``mode``
-can be omitted as it will internally be set to ``'a'``.
-
-.. ipython:: python
- :suppress:
-
- ! rm -rf path/to/directory.zarr
-
-.. ipython:: python
+store, add ``mode='w'`` when invoking :py:meth:`~Dataset.to_zarr`.
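+
+For example, re-running the write above with ``mode='w'`` replaces whatever is
+already stored at that path (a minimal sketch, shown verbatim):
+
+.. ipython::
+    :verbatim:
+
+    In [1]: ds.to_zarr("path/to/directory.zarr", mode="w")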
- ds1 = xr.Dataset({'foo': (('x', 'y', 't'), np.random.rand(4, 5, 2))},
- coords={'x': [10, 20, 30, 40],
- 'y': [1,2,3,4,5],
- 't': pd.date_range('2001-01-01', periods=2)})
- ds1.to_zarr('path/to/directory.zarr')
- ds2 = xr.Dataset({'foo': (('x', 'y', 't'), np.random.rand(4, 5, 2))},
- coords={'x': [10, 20, 30, 40],
- 'y': [1,2,3,4,5],
- 't': pd.date_range('2001-01-03', periods=2)})
- ds2.to_zarr('path/to/directory.zarr', append_dim='t')
-
-To store variable length strings use ``dtype=object``.
+To store variable length strings, convert them to object arrays first with
+``dtype=object``.
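+
+For example, a minimal sketch (``strings.zarr`` is a hypothetical store path):
+
+.. ipython::
+    :verbatim:
+
+    In [1]: ds_str = xr.Dataset({"s": ("x", np.array(["one", "two"], dtype=object))})
+
+    In [2]: ds_str.to_zarr("strings.zarr")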
To read back a zarr dataset that has been created this way, we use the
:py:func:`open_zarr` function:
.. ipython:: python
- ds_zarr = xr.open_zarr('path/to/directory.zarr')
+ ds_zarr = xr.open_zarr("path/to/directory.zarr")
ds_zarr
Cloud Storage Buckets
@@ -912,15 +916,16 @@ These options can be passed to the ``to_zarr`` method as variable encoding.
For example:
.. ipython:: python
- :suppress:
+ :suppress:
! rm -rf foo.zarr
.. ipython:: python
import zarr
- compressor = zarr.Blosc(cname='zstd', clevel=3, shuffle=2)
- ds.to_zarr('foo.zarr', encoding={'foo': {'compressor': compressor}})
+
+ compressor = zarr.Blosc(cname="zstd", clevel=3, shuffle=2)
+ ds.to_zarr("foo.zarr", encoding={"foo": {"compressor": compressor}})
.. note::
@@ -956,34 +961,137 @@ Xarray can't perform consolidation on pre-existing zarr datasets. This should
be done directly from zarr, as described in the
`zarr docs `_.
+.. _io.zarr.appending:
+
+Appending to existing Zarr stores
+~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
+
+Xarray supports several ways of incrementally writing variables to a Zarr
+store. These options are useful when it is infeasible or undesirable to
+write your entire dataset at once.
+
+.. tip::
+
+ If you can load all of your data into a single ``Dataset`` using dask, a
+ single call to ``to_zarr()`` will write all of your data in parallel.
+
+.. warning::
+
+ Alignment of coordinates is currently not checked when modifying an
+ existing Zarr store. It is up to the user to ensure that coordinates are
+ consistent.
+
+To add or overwrite entire variables, simply call :py:meth:`~Dataset.to_zarr`
+with ``mode='a'`` on a Dataset containing the new variables, passing in an
+existing Zarr store or path to a Zarr store.
+
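+For example, a sketch that adds a hypothetical new variable ``bar`` to the
+store created earlier in this section:
+
+.. ipython::
+    :verbatim:
+
+    In [1]: ds_new = xr.Dataset({"bar": ("x", [-1, -2, -3, -4])})
+
+    In [2]: ds_new.to_zarr("path/to/directory.zarr", mode="a")
+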
+To resize and then append values along an existing dimension in a store, set
+``append_dim``. This is a good option if data always arrives in a particular
+order, e.g., for time-stepping a simulation:
+
+.. ipython:: python
+ :suppress:
+
+ ! rm -rf path/to/directory.zarr
+
+.. ipython:: python
+
+ ds1 = xr.Dataset(
+ {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))},
+ coords={
+ "x": [10, 20, 30, 40],
+ "y": [1, 2, 3, 4, 5],
+ "t": pd.date_range("2001-01-01", periods=2),
+ },
+ )
+ ds1.to_zarr("path/to/directory.zarr")
+ ds2 = xr.Dataset(
+ {"foo": (("x", "y", "t"), np.random.rand(4, 5, 2))},
+ coords={
+ "x": [10, 20, 30, 40],
+ "y": [1, 2, 3, 4, 5],
+ "t": pd.date_range("2001-01-03", periods=2),
+ },
+ )
+ ds2.to_zarr("path/to/directory.zarr", append_dim="t")
+
+Finally, you can use ``region`` to write to limited regions of existing arrays
+in an existing Zarr store. This is a good option for writing data in parallel
+from independent processes.
+
+To scale this up to writing large datasets, the first step is creating an
+initial Zarr store without writing all of its array data. This can be done by
+first creating a ``Dataset`` with dummy values stored in :ref:`dask `,
+and then calling ``to_zarr`` with ``compute=False`` to write only metadata
+(including ``attrs``) to Zarr:
+
+.. ipython:: python
+ :suppress:
+
+ ! rm -rf path/to/directory.zarr
+
+.. ipython:: python
+
+ import dask.array
+
+ # The values of this dask array are entirely irrelevant; only the dtype,
+ # shape and chunks are used
+ dummies = dask.array.zeros(30, chunks=10)
+ ds = xr.Dataset({"foo": ("x", dummies)})
+ path = "path/to/directory.zarr"
+ # Now we write the metadata without computing any array values
+ ds.to_zarr(path, compute=False, consolidated=True)
+
+Now, a Zarr store with the correct variable shapes and attributes exists that
+can be filled out by subsequent calls to ``to_zarr``. The ``region`` keyword
+provides a mapping from dimension names to Python ``slice`` objects indicating
+where the data should be written (in index space, not coordinate space), e.g.,
+
+.. ipython:: python
+
+ # For convenience, we'll slice a single dataset, but in the real use-case
+ # we would create them separately, possibly even from separate processes.
+ ds = xr.Dataset({"foo": ("x", np.arange(30))})
+ ds.isel(x=slice(0, 10)).to_zarr(path, region={"x": slice(0, 10)})
+ ds.isel(x=slice(10, 20)).to_zarr(path, region={"x": slice(10, 20)})
+ ds.isel(x=slice(20, 30)).to_zarr(path, region={"x": slice(20, 30)})
+
+Concurrent writes with ``region`` are safe as long as they modify distinct
+chunks in the underlying Zarr arrays (or use an appropriate ``lock``).
+
+As a safety check to make it harder to inadvertently override existing values,
+if you set ``region`` then *all* variables included in a Dataset must have
+dimensions included in ``region``. Other variables (typically coordinates)
+need to be explicitly dropped and/or written in separate calls to ``to_zarr``
+with ``mode='a'``.
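+
+For example, a sketch of dropping a hypothetical non-dimension coordinate
+``lat`` before writing one region:
+
+.. ipython::
+    :verbatim:
+
+    In [1]: ds.drop_vars("lat").isel(x=slice(0, 10)).to_zarr(
+       ...:     path, region={"x": slice(0, 10)}
+       ...: )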
+
.. _io.cfgrib:
.. ipython:: python
- :suppress:
+ :suppress:
import shutil
- shutil.rmtree('foo.zarr')
- shutil.rmtree('path/to/directory.zarr')
+
+ shutil.rmtree("foo.zarr")
+ shutil.rmtree("path/to/directory.zarr")
GRIB format via cfgrib
----------------------
-xarray supports reading GRIB files via ECMWF cfgrib_ python driver and ecCodes_
-C-library, if they are installed. To open a GRIB file supply ``engine='cfgrib'``
+xarray supports reading GRIB files via the ECMWF cfgrib_ Python driver,
+if it is installed. To open a GRIB file supply ``engine='cfgrib'``
to :py:func:`open_dataset`:
.. ipython::
:verbatim:
- In [1]: ds_grib = xr.open_dataset('example.grib', engine='cfgrib')
+ In [1]: ds_grib = xr.open_dataset("example.grib", engine="cfgrib")
-We recommend installing ecCodes via conda::
+We recommend installing cfgrib via conda::
- conda install -c conda-forge eccodes
- pip install cfgrib
+ conda install -c conda-forge cfgrib
.. _cfgrib: https://github.com/ecmwf/cfgrib
-.. _ecCodes: https://confluence.ecmwf.int/display/ECC/ecCodes+Home
.. _io.pynio:
@@ -998,6 +1106,11 @@ We recommend installing PyNIO via conda::
conda install -c conda-forge pynio
+.. note::
+
+    PyNIO is no longer actively maintained and conflicts with netcdf4 > 1.5.3.
+    The PyNIO backend may be moved outside of xarray in the future.
+
.. _PyNIO: https://www.pyngl.ucar.edu/Nio.shtml
.. _io.PseudoNetCDF:
@@ -1010,7 +1123,7 @@ formats supported by PseudoNetCDF_, if PseudoNetCDF is installed.
PseudoNetCDF can also provide Climate Forecasting Conventions to
CMAQ files. In addition, PseudoNetCDF can automatically register custom
readers that subclass PseudoNetCDF.PseudoNetCDFFile. PseudoNetCDF can
-identify readers heuristically, or format can be specified via a key in
+identify readers either heuristically or by a format specified via a key in
`backend_kwargs`.
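+
+For example, a sketch of pinning the reader via a ``format`` key in
+``backend_kwargs`` (``"uamiv"`` is an illustrative PseudoNetCDF format name):
+
+.. ipython::
+    :verbatim:
+
+    In [1]: camx = xr.open_dataset(
+       ...:     "example.uamiv", engine="pseudonetcdf", backend_kwargs={"format": "uamiv"}
+       ...: )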
To use PseudoNetCDF to read such files, supply
@@ -1032,3 +1145,11 @@ For CSV files, one might also consider `xarray_extras`_.
.. _xarray_extras: https://xarray-extras.readthedocs.io/en/latest/api/csv.html
.. _IO tools: http://pandas.pydata.org/pandas-docs/stable/io.html
+
+
+Third party libraries
+---------------------
+
+More formats are supported by extension libraries:
+
+- `xarray-mongodb `_: Store xarray objects on MongoDB
diff --git a/doc/pandas.rst b/doc/pandas.rst
index b0ec2a117dc..acf1d16b6ee 100644
--- a/doc/pandas.rst
+++ b/doc/pandas.rst
@@ -20,6 +20,7 @@ __ http://seaborn.pydata.org/
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
Hierarchical and tidy data
@@ -47,10 +48,15 @@ To convert any dataset to a ``DataFrame`` in tidy form, use the
.. ipython:: python
- ds = xr.Dataset({'foo': (('x', 'y'), np.random.randn(2, 3))},
- coords={'x': [10, 20], 'y': ['a', 'b', 'c'],
- 'along_x': ('x', np.random.randn(2)),
- 'scalar': 123})
+ ds = xr.Dataset(
+ {"foo": (("x", "y"), np.random.randn(2, 3))},
+ coords={
+ "x": [10, 20],
+ "y": ["a", "b", "c"],
+ "along_x": ("x", np.random.randn(2)),
+ "scalar": 123,
+ },
+ )
ds
df = ds.to_dataframe()
df
@@ -91,7 +97,7 @@ DataFrames:
.. ipython:: python
- s = ds['foo'].to_series()
+ s = ds["foo"].to_series()
s
# or equivalently, with Series.to_xarray()
xr.DataArray.from_series(s)
@@ -117,8 +123,9 @@ available in pandas (i.e., a 1D array is converted to a
.. ipython:: python
- arr = xr.DataArray(np.random.randn(2, 3),
- coords=[('x', [10, 20]), ('y', ['a', 'b', 'c'])])
+ arr = xr.DataArray(
+ np.random.randn(2, 3), coords=[("x", [10, 20]), ("y", ["a", "b", "c"])]
+ )
df = arr.to_pandas()
df
@@ -136,9 +143,10 @@ preserve all use of multi-indexes:
.. ipython:: python
- index = pd.MultiIndex.from_arrays([['a', 'a', 'b'], [0, 1, 2]],
- names=['one', 'two'])
- df = pd.DataFrame({'x': 1, 'y': 2}, index=index)
+ index = pd.MultiIndex.from_arrays(
+ [["a", "a", "b"], [0, 1, 2]], names=["one", "two"]
+ )
+ df = pd.DataFrame({"x": 1, "y": 2}, index=index)
ds = xr.Dataset(df)
ds
@@ -175,9 +183,9 @@ Let's take a look:
.. ipython:: python
data = np.random.RandomState(0).rand(2, 3, 4)
- items = list('ab')
- major_axis = list('mno')
- minor_axis = pd.date_range(start='2000', periods=4, name='date')
+ items = list("ab")
+ major_axis = list("mno")
+ minor_axis = pd.date_range(start="2000", periods=4, name="date")
With old versions of pandas (prior to 0.25), this could be stored in a ``Panel``:
@@ -207,7 +215,7 @@ You can also easily convert this data into ``Dataset``:
.. ipython:: python
- array.to_dataset(dim='dim_0')
+ array.to_dataset(dim="dim_0")
Here, there are two data variables, each representing a DataFrame on panel's
``items`` axis, and labeled as such. Each variable is a 2D array of the
diff --git a/doc/plotting.rst b/doc/plotting.rst
index f3d9c0213de..3699f794ae8 100644
--- a/doc/plotting.rst
+++ b/doc/plotting.rst
@@ -13,7 +13,7 @@ labels can also be used to easily create informative plots.
xarray's plotting capabilities are centered around
:py:class:`DataArray` objects.
To plot :py:class:`Dataset` objects
-simply access the relevant DataArrays, ie ``dset['var1']``.
+simply access the relevant DataArrays, i.e. ``dset['var1']``.
Dataset specific plotting routines are also available (see :ref:`plot-dataset`).
Here we focus mostly on arrays 2d or larger. If your data fits
nicely into a pandas DataFrame then you're better off using one of the more
@@ -37,7 +37,7 @@ For more extensive plotting applications consider the following projects:
Integrates well with pandas.
- `HoloViews `_
- and `GeoViews `_: "Composable, declarative
+ and `GeoViews `_: "Composable, declarative
data structures for building even complex visualizations easily." Includes
native support for xarray objects.
@@ -56,6 +56,7 @@ Imports
# Use defaults so we don't get gridlines in generated docs
import matplotlib as mpl
+
mpl.rcdefaults()
The following imports are necessary for all of the examples.
@@ -71,7 +72,7 @@ For these examples we'll use the North American air temperature dataset.
.. ipython:: python
- airtemps = xr.tutorial.open_dataset('air_temperature')
+ airtemps = xr.tutorial.open_dataset("air_temperature")
airtemps
# Convert to celsius
@@ -79,7 +80,7 @@ For these examples we'll use the North American air temperature dataset.
# copy attributes to get nice figure labels and change Kelvin to Celsius
air.attrs = airtemps.air.attrs
- air.attrs['units'] = 'deg C'
+ air.attrs["units"] = "deg C"
.. note::
Until :issue:`1614` is solved, you might need to copy over the metadata in ``attrs`` to get informative figure labels (as was done above).
@@ -98,13 +99,14 @@ One Dimension
The simplest way to make a plot is to call the :py:func:`DataArray.plot()` method.
.. ipython:: python
+ :okwarning:
air1d = air.isel(lat=10, lon=10)
@savefig plotting_1d_simple.png width=4in
air1d.plot()
-xarray uses the coordinate name along with metadata ``attrs.long_name``, ``attrs.standard_name``, ``DataArray.name`` and ``attrs.units`` (if available) to label the axes. The names ``long_name``, ``standard_name`` and ``units`` are copied from the `CF-conventions spec `_. When choosing names, the order of precedence is ``long_name``, ``standard_name`` and finally ``DataArray.name``. The y-axis label in the above plot was constructed from the ``long_name`` and ``units`` attributes of ``air1d``.
+xarray uses the coordinate name along with metadata ``attrs.long_name``, ``attrs.standard_name``, ``DataArray.name`` and ``attrs.units`` (if available) to label the axes. The names ``long_name``, ``standard_name`` and ``units`` are copied from the `CF-conventions spec `_. When choosing names, the order of precedence is ``long_name``, ``standard_name`` and finally ``DataArray.name``. The y-axis label in the above plot was constructed from the ``long_name`` and ``units`` attributes of ``air1d``.
.. ipython:: python
@@ -124,9 +126,10 @@ can be used:
.. _matplotlib.pyplot.plot: http://matplotlib.org/api/pyplot_api.html#matplotlib.pyplot.plot
.. ipython:: python
+ :okwarning:
@savefig plotting_1d_additional_args.png width=4in
- air1d[:200].plot.line('b-^')
+ air1d[:200].plot.line("b-^")
.. note::
Not all xarray plotting methods support passing positional arguments
@@ -136,9 +139,10 @@ can be used:
Keyword arguments work the same way, and are more explicit.
.. ipython:: python
+ :okwarning:
@savefig plotting_example_sin3.png width=4in
- air1d[:200].plot.line(color='purple', marker='o')
+ air1d[:200].plot.line(color="purple", marker="o")
=========================
Adding to Existing Axis
@@ -150,6 +154,7 @@ In this example ``axes`` is an array consisting of the left and right
axes created by ``plt.subplots``.
.. ipython:: python
+ :okwarning:
fig, axes = plt.subplots(ncols=2)
@@ -177,6 +182,7 @@ support the ``aspect`` and ``size`` arguments which control the size of the
resulting image via the formula ``figsize = (aspect * size, size)``:
.. ipython:: python
+ :okwarning:
air1d.plot(aspect=2, size=3)
@savefig plotting_example_size_and_aspect.png
@@ -208,6 +214,48 @@ entire figure (as for matplotlib's ``figsize`` argument).
.. _plotting.multiplelines:
+=========================
+ Determine x-axis values
+=========================
+
+By default, dimension coordinates are used for the x-axis (here the time coordinates).
+However, you can also use non-dimension coordinates, MultiIndex levels, and dimensions
+without coordinates along the x-axis. To illustrate this, let's calculate a 'decimal day' (epoch)
+from the time and assign it as a non-dimension coordinate:
+
+.. ipython:: python
+ :okwarning:
+
+ decimal_day = (air1d.time - air1d.time[0]) / pd.Timedelta("1d")
+ air1d_multi = air1d.assign_coords(decimal_day=("time", decimal_day))
+ air1d_multi
+
+To use ``'decimal_day'`` as the x coordinate, it must be explicitly specified:
+
+.. ipython:: python
+ :okwarning:
+
+ air1d_multi.plot(x="decimal_day")
+
+After creating a new MultiIndex named ``'date'`` from ``'time'`` and
+``'decimal_day'``, it is also possible to use a MultiIndex level as the x-axis:
+
+.. ipython:: python
+ :okwarning:
+
+ air1d_multi = air1d_multi.set_index(date=("time", "decimal_day"))
+ air1d_multi.plot(x="decimal_day")
+
+Finally, if a dataset does not have any coordinates, the x-axis simply enumerates all data points:
+
+.. ipython:: python
+ :okwarning:
+
+ air1d_multi = air1d_multi.drop("date")
+ air1d_multi.plot()
+
+The same applies to 2D plots below.
+
====================================================
Multiple lines showing variation along a dimension
====================================================
@@ -217,9 +265,10 @@ with appropriate arguments. Consider the 3D variable ``air`` defined above. We c
plots to check the variation of air temperature at three different latitudes along a longitude line:
.. ipython:: python
+ :okwarning:
@savefig plotting_example_multiple_lines_x_kwarg.png
- air.isel(lon=10, lat=[19,21,22]).plot.line(x='time')
+ air.isel(lon=10, lat=[19, 21, 22]).plot.line(x="time")
It is required to explicitly specify either
@@ -238,9 +287,10 @@ If required, the automatic legend can be turned off using ``add_legend=False``.
It is also possible to make line plots such that the data are on the x-axis and a dimension is on the y-axis. This can be done by specifying the appropriate ``y`` keyword argument.
.. ipython:: python
+ :okwarning:
@savefig plotting_example_xy_kwarg.png
- air.isel(time=10, lon=[10, 11]).plot(y='lat', hue='lon')
+ air.isel(time=10, lon=[10, 11]).plot(y="lat", hue="lon")
============
Step plots
@@ -253,23 +303,24 @@ made using 1D data.
:okwarning:
@savefig plotting_example_step.png width=4in
- air1d[:20].plot.step(where='mid')
+ air1d[:20].plot.step(where="mid")
The argument ``where`` defines where the steps should be placed; options are
``'pre'`` (default), ``'post'``, and ``'mid'``. This is particularly handy
when plotting data grouped with :py:meth:`Dataset.groupby_bins`.
.. ipython:: python
+ :okwarning:
- air_grp = air.mean(['time','lon']).groupby_bins('lat',[0,23.5,66.5,90])
+ air_grp = air.mean(["time", "lon"]).groupby_bins("lat", [0, 23.5, 66.5, 90])
air_mean = air_grp.mean()
air_std = air_grp.std()
air_mean.plot.step()
- (air_mean + air_std).plot.step(ls=':')
- (air_mean - air_std).plot.step(ls=':')
- plt.ylim(-20,30)
+ (air_mean + air_std).plot.step(ls=":")
+ (air_mean - air_std).plot.step(ls=":")
+ plt.ylim(-20, 30)
@savefig plotting_example_step_groupby.png width=4in
- plt.title('Zonal mean temperature')
+ plt.title("Zonal mean temperature")
In this case, the actual boundaries of the bins are used and the ``where`` argument
is ignored.
@@ -282,9 +333,12 @@ Other axes kwargs
The keyword arguments ``xincrease`` and ``yincrease`` let you control the axes direction.
.. ipython:: python
+ :okwarning:
@savefig plotting_example_xincrease_yincrease_kwarg.png
- air.isel(time=10, lon=[10, 11]).plot.line(y='lat', hue='lon', xincrease=False, yincrease=False)
+ air.isel(time=10, lon=[10, 11]).plot.line(
+ y="lat", hue="lon", xincrease=False, yincrease=False
+ )
In addition, one can use ``xscale, yscale`` to set axes scaling; ``xticks, yticks`` to set axes ticks and ``xlim, ylim`` to set axes limits. These accept the same values as the matplotlib methods ``Axes.set_(x,y)scale()``, ``Axes.set_(x,y)ticks()``, ``Axes.set_(x,y)lim()`` respectively.
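+
+For example, a minimal sketch combining a few of these keywords (the values
+are purely illustrative):
+
+.. ipython::
+    :verbatim:
+
+    In [1]: air1d.plot(ylim=(-20, 40), yticks=[-20, 0, 20, 40])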
@@ -299,6 +353,7 @@ Two Dimensions
The default method :py:meth:`DataArray.plot` calls :py:func:`xarray.plot.pcolormesh` by default when the data is two-dimensional.
.. ipython:: python
+ :okwarning:
air2d = air.isel(time=500)
@@ -309,6 +364,7 @@ All 2d plots in xarray allow the use of the keyword arguments ``yincrease``
and ``xincrease``.
.. ipython:: python
+ :okwarning:
@savefig 2d_simple_yincrease.png width=4in
air2d.plot(yincrease=False)
@@ -328,6 +384,7 @@ and ``xincrease``.
xarray plots data with :ref:`missing_values`.
.. ipython:: python
+ :okwarning:
bad_air2d = air2d.copy()
@@ -345,10 +402,11 @@ It's not necessary for the coordinates to be evenly spaced. Both
produce plots with nonuniform coordinates.
.. ipython:: python
+ :okwarning:
b = air2d.copy()
# Apply a nonlinear transformation to one of the coords
- b.coords['lat'] = np.log(b.coords['lat'])
+ b.coords["lat"] = np.log(b.coords["lat"])
@savefig plotting_nonuniform_coords.png width=4in
b.plot()
@@ -361,11 +419,12 @@ Since this is a thin wrapper around matplotlib, all the functionality of
matplotlib is available.
.. ipython:: python
+ :okwarning:
air2d.plot(cmap=plt.cm.Blues)
- plt.title('These colors prove North America\nhas fallen in the ocean')
- plt.ylabel('latitude')
- plt.xlabel('longitude')
+ plt.title("These colors prove North America\nhas fallen in the ocean")
+ plt.ylabel("latitude")
+ plt.xlabel("longitude")
plt.tight_layout()
@savefig plotting_2d_call_matplotlib.png width=4in
@@ -380,8 +439,9 @@ matplotlib is available.
``d_ylog.plot()`` updates the xlabel.
.. ipython:: python
+ :okwarning:
- plt.xlabel('Never gonna see this.')
+ plt.xlabel("Never gonna see this.")
air2d.plot()
@savefig plotting_2d_call_matplotlib2.png width=4in
@@ -395,6 +455,7 @@ xarray borrows logic from Seaborn to infer what kind of color map to use. For
example, consider the original data in Kelvins rather than Celsius:
.. ipython:: python
+ :okwarning:
@savefig plotting_kelvin.png width=4in
airtemps.air.isel(time=0).plot()
@@ -413,6 +474,7 @@ Here we add two bad data points. This affects the color scale,
washing out the plot.
.. ipython:: python
+ :okwarning:
air_outliers = airtemps.air.isel(time=0).copy()
air_outliers[0, 0] = 100
@@ -428,6 +490,7 @@ This will use the 2nd and 98th
percentiles of the data to compute the color limits.
.. ipython:: python
+ :okwarning:
@savefig plotting_robust2.png width=4in
air_outliers.plot(robust=True)
@@ -446,6 +509,7 @@ rather than the default continuous colormaps that matplotlib uses. The
colormaps. For example, to make a plot with 8 discrete color intervals:
.. ipython:: python
+ :okwarning:
@savefig plotting_discrete_levels.png width=4in
air2d.plot(levels=8)
@@ -454,6 +518,7 @@ It is also possible to use a list of levels to specify the boundaries of the
discrete colormap:
.. ipython:: python
+ :okwarning:
@savefig plotting_listed_levels.png width=4in
air2d.plot(levels=[0, 12, 18, 30])
@@ -461,6 +526,7 @@ discrete colormap:
You can also specify a list of discrete colors through the ``colors`` argument:
.. ipython:: python
+ :okwarning:
flatui = ["#9b59b6", "#3498db", "#95a5a6", "#e74c3c", "#34495e", "#2ecc71"]
@savefig plotting_custom_colors_levels.png width=4in
@@ -473,10 +539,10 @@ if using ``imshow`` or ``pcolormesh`` (but not with ``contour`` or ``contourf``,
since levels are chosen automatically).
.. ipython:: python
- :okwarning:
+ :okwarning:
@savefig plotting_seaborn_palette.png width=4in
- air2d.plot(levels=10, cmap='husl')
+ air2d.plot(levels=10, cmap="husl")
plt.draw()
.. _plotting.faceting:
@@ -518,16 +584,20 @@ arguments to the xarray plotting methods/functions. This returns a
:py:class:`xarray.plot.FacetGrid` object.
.. ipython:: python
+ :okwarning:
@savefig plot_facet_dataarray.png
- g_simple = t.plot(x='lon', y='lat', col='time', col_wrap=3)
+ g_simple = t.plot(x="lon", y="lat", col="time", col_wrap=3)
Faceting also works for line plots.
.. ipython:: python
+ :okwarning:
@savefig plot_facet_dataarray_line.png
- g_simple_line = t.isel(lat=slice(0,None,4)).plot(x='lon', hue='lat', col='time', col_wrap=3)
+ g_simple_line = t.isel(lat=slice(0, None, 4)).plot(
+ x="lon", hue="lat", col="time", col_wrap=3
+ )
===============
4 dimensional
@@ -539,14 +609,15 @@ a fixed amount. Now we can see how the temperature maps would compare if
one were much hotter.
.. ipython:: python
+ :okwarning:
t2 = t.isel(time=slice(0, 2))
- t4d = xr.concat([t2, t2 + 40], pd.Index(['normal', 'hot'], name='fourth_dim'))
+ t4d = xr.concat([t2, t2 + 40], pd.Index(["normal", "hot"], name="fourth_dim"))
# This is a 4d array
t4d.coords
@savefig plot_facet_4d.png
- t4d.plot(x='lon', y='lat', col='time', row='fourth_dim')
+ t4d.plot(x="lon", y="lat", col="time", row="fourth_dim")
================
Other features
@@ -555,20 +626,27 @@ one were much hotter.
Faceted plotting supports other arguments common to xarray 2d plots.
.. ipython:: python
- :suppress:
+ :suppress:
- plt.close('all')
+ plt.close("all")
.. ipython:: python
+ :okwarning:
hasoutliers = t.isel(time=slice(0, 5)).copy()
hasoutliers[0, 0, 0] = -100
hasoutliers[-1, -1, -1] = 400
@savefig plot_facet_robust.png
- g = hasoutliers.plot.pcolormesh('lon', 'lat', col='time', col_wrap=3,
- robust=True, cmap='viridis',
- cbar_kwargs={'label': 'this has outliers'})
+ g = hasoutliers.plot.pcolormesh(
+ "lon",
+ "lat",
+ col="time",
+ col_wrap=3,
+ robust=True,
+ cmap="viridis",
+ cbar_kwargs={"label": "this has outliers"},
+ )
===================
FacetGrid Objects
@@ -594,20 +672,21 @@ It's possible to select the :py:class:`xarray.DataArray` or
.. ipython:: python
- g.data.loc[g.name_dicts[0, 0]]
+ g.data.loc[g.name_dicts[0, 0]]
Here is an example of using the lower level API and then modifying the axes after
they have been plotted.
.. ipython:: python
+ :okwarning:
- g = t.plot.imshow('lon', 'lat', col='time', col_wrap=3, robust=True)
+ g = t.plot.imshow("lon", "lat", col="time", col_wrap=3, robust=True)
for i, ax in enumerate(g.axes.flat):
- ax.set_title('Air Temperature %d' % i)
+ ax.set_title("Air Temperature %d" % i)
bottomright = g.axes[-1, -1]
- bottomright.annotate('bottom right', (240, 40))
+ bottomright.annotate("bottom right", (240, 40))
@savefig plot_facet_iterator.png
plt.draw()
@@ -632,23 +711,25 @@ Consider this dataset
.. ipython:: python
- ds = xr.tutorial.scatter_example_dataset()
- ds
+ ds = xr.tutorial.scatter_example_dataset()
+ ds
Suppose we want to scatter ``A`` against ``B``
.. ipython:: python
+ :okwarning:
@savefig ds_simple_scatter.png
- ds.plot.scatter(x='A', y='B')
+ ds.plot.scatter(x="A", y="B")
The ``hue`` kwarg lets you vary the color by variable value
.. ipython:: python
+ :okwarning:
@savefig ds_hue_scatter.png
- ds.plot.scatter(x='A', y='B', hue='w')
+ ds.plot.scatter(x="A", y="B", hue="w")
When ``hue`` is specified, a colorbar is added for numeric ``hue`` DataArrays by
default and a legend is added for non-numeric ``hue`` DataArrays (as above).
@@ -656,24 +737,27 @@ You can force a legend instead of a colorbar by setting ``hue_style='discrete'``
Additionally, the boolean kwarg ``add_guide`` can be used to prevent the display of a legend or colorbar (as appropriate).
.. ipython:: python
+ :okwarning:
ds = ds.assign(w=[1, 2, 3, 5])
@savefig ds_discrete_legend_hue_scatter.png
- ds.plot.scatter(x='A', y='B', hue='w', hue_style='discrete')
+ ds.plot.scatter(x="A", y="B", hue="w", hue_style="discrete")
The ``markersize`` kwarg lets you vary the point's size by variable value. You can additionally pass ``size_norm`` to control how the variable's values are mapped to point sizes.
.. ipython:: python
+ :okwarning:
@savefig ds_hue_size_scatter.png
- ds.plot.scatter(x='A', y='B', hue='z', hue_style='discrete', markersize='z')
+ ds.plot.scatter(x="A", y="B", hue="z", hue_style="discrete", markersize="z")
Faceting is also possible
.. ipython:: python
+ :okwarning:
@savefig ds_facet_scatter.png
- ds.plot.scatter(x='A', y='B', col='x', row='z', hue='w', hue_style='discrete')
+ ds.plot.scatter(x="A", y="B", col="x", row="z", hue="w", hue_style="discrete")
For more advanced scatter plots, we recommend converting the relevant data variables to a pandas DataFrame and using the extensive plotting capabilities of ``seaborn``.
@@ -689,27 +773,38 @@ To follow this section you'll need to have Cartopy installed and working.
This script will plot the air temperature on a map.
.. ipython:: python
+ :okwarning:
import cartopy.crs as ccrs
- air = xr.tutorial.open_dataset('air_temperature').air
- ax = plt.axes(projection=ccrs.Orthographic(-80, 35))
- air.isel(time=0).plot.contourf(ax=ax, transform=ccrs.PlateCarree());
+
+ air = xr.tutorial.open_dataset("air_temperature").air
+
+ p = air.isel(time=0).plot(
+ subplot_kws=dict(projection=ccrs.Orthographic(-80, 35), facecolor="gray"),
+ transform=ccrs.PlateCarree(),
+ )
+ p.axes.set_global()
+
@savefig plotting_maps_cartopy.png width=100%
- ax.set_global(); ax.coastlines();
+ p.axes.coastlines()
When faceting on maps, the projection can be transferred to the ``plot``
function using the ``subplot_kws`` keyword. The axes for the subplots created
by faceting are accessible in the object returned by ``plot``:
.. ipython:: python
+ :okwarning:
- p = air.isel(time=[0, 4]).plot(transform=ccrs.PlateCarree(), col='time',
- subplot_kws={'projection': ccrs.Orthographic(-80, 35)})
+ p = air.isel(time=[0, 4]).plot(
+ transform=ccrs.PlateCarree(),
+ col="time",
+ subplot_kws={"projection": ccrs.Orthographic(-80, 35)},
+ )
for ax in p.axes.flat:
ax.coastlines()
ax.gridlines()
@savefig plotting_maps_cartopy_facetting.png width=100%
- plt.draw();
+ plt.draw()
Details
@@ -730,8 +825,10 @@ There are three ways to use the xarray plotting functionality:
These are provided for user convenience; they all call the same code.
.. ipython:: python
+ :okwarning:
import xarray.plot as xplt
+
da = xr.DataArray(range(5))
fig, axes = plt.subplots(ncols=2, nrows=2)
da.plot(ax=axes[0, 0])
@@ -766,8 +863,7 @@ read on.
.. ipython:: python
- a0 = xr.DataArray(np.zeros((4, 3, 2)), dims=('y', 'x', 'z'),
- name='temperature')
+ a0 = xr.DataArray(np.zeros((4, 3, 2)), dims=("y", "x", "z"), name="temperature")
a0[0, 0, 0] = 1
a = a0.isel(z=0)
a
@@ -779,6 +875,7 @@ think carefully about what the limits, labels, and orientation for
each of the axes should be.
.. ipython:: python
+ :okwarning:
@savefig plotting_example_2d_simple.png width=4in
a.plot()
@@ -799,16 +896,19 @@ xarray, but you'll have to tell the plot function to use these coordinates
instead of the default ones:
.. ipython:: python
+ :okwarning:
lon, lat = np.meshgrid(np.linspace(-20, 20, 5), np.linspace(0, 30, 4))
- lon += lat/10
- lat += lon/10
- da = xr.DataArray(np.arange(20).reshape(4, 5), dims=['y', 'x'],
- coords = {'lat': (('y', 'x'), lat),
- 'lon': (('y', 'x'), lon)})
+ lon += lat / 10
+ lat += lon / 10
+ da = xr.DataArray(
+ np.arange(20).reshape(4, 5),
+ dims=["y", "x"],
+ coords={"lat": (("y", "x"), lat), "lon": (("y", "x"), lon)},
+ )
@savefig plotting_example_2d_irreg.png width=4in
- da.plot.pcolormesh('lon', 'lat');
+ da.plot.pcolormesh("lon", "lat")
Note that in this case, xarray still follows the pixel centered convention.
This might be undesirable in some cases, for example when your data is defined
@@ -816,24 +916,29 @@ on a polar projection (:issue:`781`). This is why the default is to not follow
this convention when plotting on a map:
.. ipython:: python
+ :okwarning:
import cartopy.crs as ccrs
- ax = plt.subplot(projection=ccrs.PlateCarree());
- da.plot.pcolormesh('lon', 'lat', ax=ax);
- ax.scatter(lon, lat, transform=ccrs.PlateCarree());
+
+ ax = plt.subplot(projection=ccrs.PlateCarree())
+ da.plot.pcolormesh("lon", "lat", ax=ax)
+ ax.scatter(lon, lat, transform=ccrs.PlateCarree())
+ ax.coastlines()
@savefig plotting_example_2d_irreg_map.png width=4in
- ax.coastlines(); ax.gridlines(draw_labels=True);
+ ax.gridlines(draw_labels=True)
You can however decide to infer the cell boundaries and use the
``infer_intervals`` keyword:
.. ipython:: python
+ :okwarning:
- ax = plt.subplot(projection=ccrs.PlateCarree());
- da.plot.pcolormesh('lon', 'lat', ax=ax, infer_intervals=True);
- ax.scatter(lon, lat, transform=ccrs.PlateCarree());
+ ax = plt.subplot(projection=ccrs.PlateCarree())
+ da.plot.pcolormesh("lon", "lat", ax=ax, infer_intervals=True)
+ ax.scatter(lon, lat, transform=ccrs.PlateCarree())
+ ax.coastlines()
@savefig plotting_example_2d_irreg_map_infer.png width=4in
- ax.coastlines(); ax.gridlines(draw_labels=True);
+ ax.gridlines(draw_labels=True)
.. note::
The data model of xarray does not support datasets with `cell boundaries`_
@@ -845,8 +950,9 @@ You can however decide to infer the cell boundaries and use the
One can also make line plots with multidimensional coordinates. In this case, ``hue`` must be a dimension name, not a coordinate name.
.. ipython:: python
+ :okwarning:
f, ax = plt.subplots(2, 1)
- da.plot.line(x='lon', hue='y', ax=ax[0]);
+ da.plot.line(x="lon", hue="y", ax=ax[0])
@savefig plotting_example_2d_hue_xy.png
- da.plot.line(x='lon', hue='x', ax=ax[1]);
+ da.plot.line(x="lon", hue="x", ax=ax[1])
diff --git a/doc/quick-overview.rst b/doc/quick-overview.rst
index 741b3d1a5fe..1a2bc809550 100644
--- a/doc/quick-overview.rst
+++ b/doc/quick-overview.rst
@@ -22,16 +22,14 @@ array or list, with optional *dimensions* and *coordinates*:
.. ipython:: python
- data = xr.DataArray(np.random.randn(2, 3),
- dims=('x', 'y'),
- coords={'x': [10, 20]})
+ data = xr.DataArray(np.random.randn(2, 3), dims=("x", "y"), coords={"x": [10, 20]})
data
In this case, we have generated a 2D array, assigned the names *x* and *y* to the two dimensions respectively and associated two *coordinate labels* '10' and '20' with the two locations along the x dimension. If you supply a pandas :py:class:`~pandas.Series` or :py:class:`~pandas.DataFrame`, metadata is copied directly:
.. ipython:: python
- xr.DataArray(pd.Series(range(3), index=list('abc'), name='foo'))
+ xr.DataArray(pd.Series(range(3), index=list("abc"), name="foo"))
Here are the key properties for a ``DataArray``:
@@ -48,7 +46,7 @@ Here are the key properties for a ``DataArray``:
Indexing
--------
-xarray supports four kind of indexing. Since we have assigned coordinate labels to the x dimension we can use label-based indexing along that dimension just like pandas. The four examples below all yield the same result (the value at `x=10`) but at varying levels of convenience and intuitiveness.
+xarray supports four kinds of indexing. Since we have assigned coordinate labels to the x dimension we can use label-based indexing along that dimension just like pandas. The four examples below all yield the same result (the value at `x=10`) but at varying levels of convenience and intuitiveness.
.. ipython:: python
@@ -75,13 +73,13 @@ While you're setting up your DataArray, it's often a good idea to set metadata a
.. ipython:: python
- data.attrs['long_name'] = 'random velocity'
- data.attrs['units'] = 'metres/sec'
- data.attrs['description'] = 'A random variable created as an example.'
- data.attrs['random_attribute'] = 123
+ data.attrs["long_name"] = "random velocity"
+ data.attrs["units"] = "metres/sec"
+ data.attrs["description"] = "A random variable created as an example."
+ data.attrs["random_attribute"] = 123
data.attrs
# you can add metadata to coordinates too
- data.x.attrs['units'] = 'x units'
+ data.x.attrs["units"] = "x units"
Computation
@@ -102,15 +100,15 @@ numbers:
.. ipython:: python
- data.mean(dim='x')
+ data.mean(dim="x")
Arithmetic operations broadcast based on dimension name. This means you don't
need to insert dummy dimensions for alignment:
.. ipython:: python
- a = xr.DataArray(np.random.randn(3), [data.coords['y']])
- b = xr.DataArray(np.random.randn(4), dims='z')
+ a = xr.DataArray(np.random.randn(3), [data.coords["y"]])
+ b = xr.DataArray(np.random.randn(4), dims="z")
a
b
@@ -139,9 +137,9 @@ xarray supports grouped operations using a very similar API to pandas (see :ref:
.. ipython:: python
- labels = xr.DataArray(['E', 'F', 'E'], [data.coords['y']], name='labels')
+ labels = xr.DataArray(["E", "F", "E"], [data.coords["y"]], name="labels")
labels
- data.groupby(labels).mean('y')
+ data.groupby(labels).mean("y")
data.groupby(labels).map(lambda x: x - x.min())
Plotting
@@ -155,7 +153,7 @@ Visualizing your datasets is quick and convenient:
data.plot()
Note the automatic labeling with names and units. Our effort in adding metadata attributes has paid off! Many aspects of these figures are customizable: see :ref:`plotting`.
-
+
pandas
------
@@ -178,7 +176,7 @@ objects. You can think of it as a multi-dimensional generalization of the
.. ipython:: python
- ds = xr.Dataset({'foo': data, 'bar': ('x', [1, 2]), 'baz': np.pi})
+ ds = xr.Dataset({"foo": data, "bar": ("x", [1, 2]), "baz": np.pi})
ds
@@ -186,7 +184,7 @@ This creates a dataset with three DataArrays named ``foo``, ``bar`` and ``baz``.
.. ipython:: python
- ds['foo']
+ ds["foo"]
ds.foo
@@ -216,14 +214,15 @@ You can directly read and write xarray objects to disk using :py:meth:`~xarray.D
.. ipython:: python
- ds.to_netcdf('example.nc')
- xr.open_dataset('example.nc')
+ ds.to_netcdf("example.nc")
+ xr.open_dataset("example.nc")
.. ipython:: python
- :suppress:
+ :suppress:
import os
- os.remove('example.nc')
+
+ os.remove("example.nc")
It is common for datasets to be distributed across multiple files (commonly one file per timestep). xarray supports this use-case by providing the :py:meth:`~xarray.open_mfdataset` and the :py:meth:`~xarray.save_mfdataset` methods. For more, see :ref:`io`.
diff --git a/doc/related-projects.rst b/doc/related-projects.rst
index 57b8da0c447..456cb64197f 100644
--- a/doc/related-projects.rst
+++ b/doc/related-projects.rst
@@ -3,9 +3,11 @@
Xarray related projects
-----------------------
-Here below is a list of existing open source projects that build
+Below is a list of existing open source projects that build
functionality upon xarray. See also section :ref:`internals` for more
-details on how to build xarray extensions.
+details on how to build xarray extensions. We also maintain the
+`xarray-contrib `_ GitHub organization
+as a place to curate projects that build upon xarray.
Geosciences
~~~~~~~~~~~
@@ -36,10 +38,11 @@ Geosciences
harmonic wind analysis in Python.
- `wrf-python `_: A collection of diagnostic and interpolation routines for use with output of the Weather Research and Forecasting (WRF-ARW) Model.
- `xarray-simlab `_: xarray extension for computer model simulations.
+- `xarray-spatial `_: Numba-accelerated raster-based spatial processing tools (NDVI, curvature, zonal-statistics, proximity, hillshading, viewshed, etc.)
- `xarray-topo `_: xarray extension for topographic analysis and modelling.
- `xbpch `_: xarray interface for bpch files.
- `xclim `_: A library for calculating climate science indices with unit handling built from xarray and dask.
-- `xESMF `_: Universal Regridder for Geospatial Data.
+- `xESMF `_: Universal regridder for geospatial data.
- `xgcm `_: Extends the xarray data model to understand finite volume grid cells (common in General Circulation Models) and provides interpolation and difference operations for such grids.
- `xmitgcm `_: a python package for reading `MITgcm `_ binary MDS files into xarray data structures.
- `xshape `_: Tools for working with shapefiles, topographies, and polygons in xarray.
@@ -55,6 +58,7 @@ Other domains
~~~~~~~~~~~~~
- `ptsa `_: EEG Time Series Analysis
- `pycalphad `_: Computational Thermodynamics in Python
+- `pyomeca `_: Python framework for biomechanical analysis
Extend xarray capabilities
~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -62,19 +66,23 @@ Extend xarray capabilities
- `eofs `_: EOF analysis in Python.
- `hypothesis-gufunc `_: Extension to hypothesis. Makes it easy to write unit tests with xarray objects as input.
- `nxarray `_: NeXus input/output capability for xarray.
+- `xarray-compare `_: xarray extension for data comparison.
+- `xarray-custom `_: Data classes for custom xarray creation.
- `xarray_extras `_: Advanced algorithms for xarray objects (e.g. integrations/interpolations).
- `xpublish `_: Publish Xarray Datasets via a Zarr compatible REST API.
- `xrft `_: Fourier transforms for xarray data.
- `xr-scipy `_: A lightweight scipy wrapper for xarray.
- `X-regression `_: Multiple linear regression from Statsmodels library coupled with Xarray library.
-- `xskillscore `_: Metrics for verifying forecasts.
+- `xskillscore `_: Metrics for verifying forecasts.
- `xyzpy `_: Easily generate high dimensional data, including parallelization.
Visualization
~~~~~~~~~~~~~
-- `Datashader `_, `geoviews `_, `holoviews `_, : visualization packages for large data.
+- `datashader `_, `geoviews `_, `holoviews `_: visualization packages for large data.
- `hvplot `_ : A high-level plotting API for the PyData ecosystem built on HoloViews.
- `psyplot `_: Interactive data visualization with python.
+- `xarray-leaflet `_: An xarray extension for tiled map plotting based on ipyleaflet.
+- `xtrude `_: An xarray extension for 3D terrain visualization based on pydeck.
Non-Python projects
~~~~~~~~~~~~~~~~~~~
diff --git a/doc/reshaping.rst b/doc/reshaping.rst
index 465ca14dfc2..81fd4a6d35e 100644
--- a/doc/reshaping.rst
+++ b/doc/reshaping.rst
@@ -7,25 +7,26 @@ Reshaping and reorganizing data
These methods allow you to reorganize
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
Reordering dimensions
---------------------
To reorder dimensions on a :py:class:`~xarray.DataArray` or across all variables
-on a :py:class:`~xarray.Dataset`, use :py:meth:`~xarray.DataArray.transpose`. An
+on a :py:class:`~xarray.Dataset`, use :py:meth:`~xarray.DataArray.transpose`. An
ellipsis (`...`) can be used to represent all other dimensions:
.. ipython:: python
- ds = xr.Dataset({'foo': (('x', 'y', 'z'), [[[42]]]), 'bar': (('y', 'z'), [[24]])})
- ds.transpose('y', 'z', 'x')
- ds.transpose(..., 'x') # equivalent
+ ds = xr.Dataset({"foo": (("x", "y", "z"), [[[42]]]), "bar": (("y", "z"), [[24]])})
+ ds.transpose("y", "z", "x")
+ ds.transpose(..., "x") # equivalent
ds.transpose() # reverses all dimensions
Expand and squeeze dimensions
@@ -37,7 +38,7 @@ use :py:meth:`~xarray.DataArray.expand_dims`
.. ipython:: python
- expanded = ds.expand_dims('w')
+ expanded = ds.expand_dims("w")
expanded
This method attaches a new dimension with size 1 to all data variables.
@@ -48,7 +49,7 @@ use :py:meth:`~xarray.DataArray.squeeze`
.. ipython:: python
- expanded.squeeze('w')
+ expanded.squeeze("w")
Converting between datasets and arrays
--------------------------------------
@@ -69,14 +70,14 @@ To convert back from a DataArray to a Dataset, use
.. ipython:: python
- arr.to_dataset(dim='variable')
+ arr.to_dataset(dim="variable")
The broadcasting behavior of ``to_array`` means that the resulting array
includes the union of data variable dimensions:
.. ipython:: python
- ds2 = xr.Dataset({'a': 0, 'b': ('x', [3, 4, 5])})
+ ds2 = xr.Dataset({"a": 0, "b": ("x", [3, 4, 5])})
# the input dataset has 4 elements
ds2
@@ -90,7 +91,7 @@ If you use ``to_dataset`` without supplying the ``dim`` argument, the DataArray
.. ipython:: python
- arr.to_dataset(name='combined')
+ arr.to_dataset(name="combined")
.. _reshape.stack:
@@ -103,11 +104,12 @@ implemented :py:meth:`~xarray.DataArray.stack` and
.. ipython:: python
- array = xr.DataArray(np.random.randn(2, 3),
- coords=[('x', ['a', 'b']), ('y', [0, 1, 2])])
- stacked = array.stack(z=('x', 'y'))
+ array = xr.DataArray(
+ np.random.randn(2, 3), coords=[("x", ["a", "b"]), ("y", [0, 1, 2])]
+ )
+ stacked = array.stack(z=("x", "y"))
stacked
- stacked.unstack('z')
+ stacked.unstack("z")
As elsewhere in xarray, an ellipsis (`...`) can be used to represent all unlisted dimensions:
@@ -128,15 +130,15 @@ possible levels. Missing levels are filled in with ``NaN`` in the resulting obje
stacked2 = stacked[::2]
stacked2
- stacked2.unstack('z')
+ stacked2.unstack("z")
However, xarray's ``stack`` has an important difference from pandas: unlike
pandas, it does not automatically drop missing values. Compare:
.. ipython:: python
- array = xr.DataArray([[np.nan, 1], [2, 3]], dims=['x', 'y'])
- array.stack(z=('x', 'y'))
+ array = xr.DataArray([[np.nan, 1], [2, 3]], dims=["x", "y"])
+ array.stack(z=("x", "y"))
array.to_pandas().stack()
We departed from pandas's behavior here because predictable shapes for new
@@ -166,16 +168,15 @@ like this:
.. ipython:: python
- data = xr.Dataset(
- data_vars={'a': (('x', 'y'), [[0, 1, 2], [3, 4, 5]]),
- 'b': ('x', [6, 7])},
- coords={'y': ['u', 'v', 'w']}
- )
- data
- stacked = data.to_stacked_array("z", sample_dims=['x'])
- stacked
- unstacked = stacked.to_unstacked_dataset("z")
- unstacked
+ data = xr.Dataset(
+ data_vars={"a": (("x", "y"), [[0, 1, 2], [3, 4, 5]]), "b": ("x", [6, 7])},
+ coords={"y": ["u", "v", "w"]},
+ )
+ data
+ stacked = data.to_stacked_array("z", sample_dims=["x"])
+ stacked
+ unstacked = stacked.to_unstacked_dataset("z")
+ unstacked
In this example, ``stacked`` is a two-dimensional array that we can easily pass to scikit-learn or another generic
numerical method.
@@ -202,19 +203,23 @@ coordinates using :py:meth:`~xarray.DataArray.set_index`:
.. ipython:: python
- da = xr.DataArray(np.random.rand(4),
- coords={'band': ('x', ['a', 'a', 'b', 'b']),
- 'wavenumber': ('x', np.linspace(200, 400, 4))},
- dims='x')
- da
- mda = da.set_index(x=['band', 'wavenumber'])
- mda
+ da = xr.DataArray(
+ np.random.rand(4),
+ coords={
+ "band": ("x", ["a", "a", "b", "b"]),
+ "wavenumber": ("x", np.linspace(200, 400, 4)),
+ },
+ dims="x",
+ )
+ da
+ mda = da.set_index(x=["band", "wavenumber"])
+ mda
These coordinates can now be used for indexing, e.g.,
.. ipython:: python
- mda.sel(band='a')
+ mda.sel(band="a")
Conversely, you can use :py:meth:`~xarray.DataArray.reset_index`
to extract multi-index levels as coordinates (this is mainly useful
@@ -222,27 +227,27 @@ for serialization):
.. ipython:: python
- mda.reset_index('x')
+ mda.reset_index("x")
:py:meth:`~xarray.DataArray.reorder_levels` allows changing the order
of multi-index levels:
.. ipython:: python
- mda.reorder_levels(x=['wavenumber', 'band'])
+ mda.reorder_levels(x=["wavenumber", "band"])
As of xarray v0.9 coordinate labels for each dimension are optional.
-You can also use ``.set_index`` / ``.reset_index`` to add / remove
+You can also use ``.set_index`` / ``.reset_index`` to add / remove
labels for one or several dimensions:
.. ipython:: python
- array = xr.DataArray([1, 2, 3], dims='x')
+ array = xr.DataArray([1, 2, 3], dims="x")
array
- array['c'] = ('x', ['a', 'b', 'c'])
- array.set_index(x='c')
- array = array.set_index(x='c')
- array = array.reset_index('x', drop=True)
+ array["c"] = ("x", ["a", "b", "c"])
+ array.set_index(x="c")
+ array = array.set_index(x="c")
+ array = array.reset_index("x", drop=True)
.. _reshape.shift_and_roll:
@@ -254,9 +259,9 @@ To adjust coordinate labels, you can use the :py:meth:`~xarray.Dataset.shift` an
.. ipython:: python
- array = xr.DataArray([1, 2, 3, 4], dims='x')
- array.shift(x=2)
- array.roll(x=2, roll_coords=True)
+ array = xr.DataArray([1, 2, 3, 4], dims="x")
+ array.shift(x=2)
+ array.roll(x=2, roll_coords=True)
.. _reshape.sort:
@@ -269,17 +274,18 @@ One may sort a DataArray/Dataset via :py:meth:`~xarray.DataArray.sortby` and
.. ipython:: python
- ds = xr.Dataset({'A': (('x', 'y'), [[1, 2], [3, 4]]),
- 'B': (('x', 'y'), [[5, 6], [7, 8]])},
- coords={'x': ['b', 'a'], 'y': [1, 0]})
- dax = xr.DataArray([100, 99], [('x', [0, 1])])
- day = xr.DataArray([90, 80], [('y', [0, 1])])
- ds.sortby([day, dax])
+ ds = xr.Dataset(
+ {"A": (("x", "y"), [[1, 2], [3, 4]]), "B": (("x", "y"), [[5, 6], [7, 8]])},
+ coords={"x": ["b", "a"], "y": [1, 0]},
+ )
+ dax = xr.DataArray([100, 99], [("x", [0, 1])])
+ day = xr.DataArray([90, 80], [("y", [0, 1])])
+ ds.sortby([day, dax])
As a shortcut, you can refer to existing coordinates by name:
.. ipython:: python
- ds.sortby('x')
- ds.sortby(['y', 'x'])
- ds.sortby(['y', 'x'], ascending=False)
+ ds.sortby("x")
+ ds.sortby(["y", "x"])
+ ds.sortby(["y", "x"], ascending=False)
diff --git a/doc/roadmap.rst b/doc/roadmap.rst
index 401dac779ad..1cbbaf8ef42 100644
--- a/doc/roadmap.rst
+++ b/doc/roadmap.rst
@@ -224,6 +224,8 @@ Current core developers
- Tom Nicholas
- Guido Imperiale
- Justus Magin
+- Mathias Hauser
+- Anderson Banihirwe
NumFOCUS
~~~~~~~~
diff --git a/doc/terminology.rst b/doc/terminology.rst
index ab6d856920a..3cfc211593f 100644
--- a/doc/terminology.rst
+++ b/doc/terminology.rst
@@ -4,40 +4,111 @@
Terminology
===========
-*Xarray terminology differs slightly from CF, mathematical conventions, and pandas; and therefore using xarray, understanding the documentation, and parsing error messages is easier once key terminology is defined. This glossary was designed so that more fundamental concepts come first. Thus for new users, this page is best read top-to-bottom. Throughout the glossary,* ``arr`` *will refer to an xarray* :py:class:`DataArray` *in any small examples. For more complete examples, please consult the relevant documentation.*
-
-----
-
-**DataArray:** A multi-dimensional array with labeled or named dimensions. ``DataArray`` objects add metadata such as dimension names, coordinates, and attributes (defined below) to underlying "unlabeled" data structures such as numpy and Dask arrays. If its optional ``name`` property is set, it is a *named DataArray*.
-
-----
-
-**Dataset:** A dict-like collection of ``DataArray`` objects with aligned dimensions. Thus, most operations that can be performed on the dimensions of a single ``DataArray`` can be performed on a dataset. Datasets have data variables (see **Variable** below), dimensions, coordinates, and attributes.
-
-----
-
-**Variable:** A `NetCDF-like variable `_ consisting of dimensions, data, and attributes which describe a single array. The main functional difference between variables and numpy arrays is that numerical operations on variables implement array broadcasting by dimension name. Each ``DataArray`` has an underlying variable that can be accessed via ``arr.variable``. However, a variable is not fully described outside of either a ``Dataset`` or a ``DataArray``.
-
-.. note::
-
- The :py:class:`Variable` class is low-level interface and can typically be ignored. However, the word "variable" appears often enough in the code and documentation that is useful to understand.
-
-----
-
-**Dimension:** In mathematics, the *dimension* of data is loosely the number of degrees of freedom for it. A *dimension axis* is a set of all points in which all but one of these degrees of freedom is fixed. We can think of each dimension axis as having a name, for example the "x dimension". In xarray, a ``DataArray`` object's *dimensions* are its named dimension axes, and the name of the ``i``-th dimension is ``arr.dims[i]``. If an array is created without dimensions, the default dimension names are ``dim_0``, ``dim_1``, and so forth.
-
-----
-
-**Coordinate:** An array that labels a dimension or set of dimensions of another ``DataArray``. In the usual one-dimensional case, the coordinate array's values can loosely be thought of as tick labels along a dimension. There are two types of coordinate arrays: *dimension coordinates* and *non-dimension coordinates* (see below). A coordinate named ``x`` can be retrieved from ``arr.coords[x]``. A ``DataArray`` can have more coordinates than dimensions because a single dimension can be labeled by multiple coordinate arrays. However, only one coordinate array can be a assigned as a particular dimension's dimension coordinate array. As a consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
-
-----
-
-**Dimension coordinate:** A one-dimensional coordinate array assigned to ``arr`` with both a name and dimension name in ``arr.dims``. Dimension coordinates are used for label-based indexing and alignment, like the index found on a :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact, dimension coordinates use :py:class:`pandas.Index` objects under the hood for efficient computation. Dimension coordinates are marked by ``*`` when printing a ``DataArray`` or ``Dataset``.
-
-----
-
-**Non-dimension coordinate:** A coordinate array assigned to ``arr`` with a name in ``arr.coords`` but *not* in ``arr.dims``. These coordinates arrays can be one-dimensional or multidimensional, and they are useful for auxiliary labeling. As an example, multidimensional coordinates are often used in geoscience datasets when :doc:`the data's physical coordinates (such as latitude and longitude) differ from their logical coordinates `. However, non-dimension coordinates are not indexed, and any operation on non-dimension coordinates that leverages indexing will fail. Printing ``arr.coords`` will print all of ``arr``'s coordinate names, with the corresponding dimension(s) in parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.
-
-----
-
-**Index:** An *index* is a data structure optimized for efficient selecting and slicing of an associated array. Xarray creates indexes for dimension coordinates so that operations along dimensions are fast, while non-dimension coordinates are not indexed. Under the hood, indexes are implemented as :py:class:`pandas.Index` objects. The index associated with dimension name ``x`` can be retrieved by ``arr.indexes[x]``. By construction, ``len(arr.dims) == len(arr.indexes)``
+*Xarray terminology differs slightly from CF, mathematical conventions, and
+pandas, so we've put together a glossary of its terms. Here,* ``arr`` *
+refers to an xarray* :py:class:`DataArray` *in the examples. For more
+complete examples, please consult the relevant documentation.*
+
+.. glossary::
+
+ DataArray
+ A multi-dimensional array with labeled or named
+ dimensions. ``DataArray`` objects add metadata such as dimension names,
+ coordinates, and attributes (defined below) to underlying "unlabeled"
+ data structures such as numpy and Dask arrays. If its optional ``name``
+ property is set, it is a *named DataArray*.
+
+ Dataset
+ A dict-like collection of ``DataArray`` objects with aligned
+ dimensions. Thus, most operations that can be performed on the
+ dimensions of a single ``DataArray`` can be performed on a
+ dataset. Datasets have data variables (see **Variable** below),
+ dimensions, coordinates, and attributes.
+
+ Variable
+ A `NetCDF-like variable
+ `_
+ consisting of dimensions, data, and attributes which describe a single
+ array. The main functional difference between variables and numpy arrays
+ is that numerical operations on variables implement array broadcasting
+ by dimension name. Each ``DataArray`` has an underlying variable that
+ can be accessed via ``arr.variable``. However, a variable is not fully
+ described outside of either a ``Dataset`` or a ``DataArray``.
+
+ .. note::
+
+         The :py:class:`Variable` class is a low-level interface and can
+         typically be ignored. However, the word "variable" appears often
+         enough in the code and documentation that it is useful to understand.
+
+ Dimension
+ In mathematics, the *dimension* of data is loosely the number of degrees
+ of freedom for it. A *dimension axis* is a set of all points in which
+ all but one of these degrees of freedom is fixed. We can think of each
+ dimension axis as having a name, for example the "x dimension". In
+ xarray, a ``DataArray`` object's *dimensions* are its named dimension
+ axes, and the name of the ``i``-th dimension is ``arr.dims[i]``. If an
+ array is created without dimension names, the default dimension names are
+ ``dim_0``, ``dim_1``, and so forth.
+
+ Coordinate
+ An array that labels a dimension or set of dimensions of another
+ ``DataArray``. In the usual one-dimensional case, the coordinate array's
+ values can loosely be thought of as tick labels along a dimension. There
+ are two types of coordinate arrays: *dimension coordinates* and
+ *non-dimension coordinates* (see below). A coordinate named ``x`` can be
+ retrieved from ``arr.coords[x]``. A ``DataArray`` can have more
+ coordinates than dimensions because a single dimension can be labeled by
+      multiple coordinate arrays. However, only one coordinate array can be
+      assigned as a particular dimension's dimension coordinate array. As a
+      consequence, ``len(arr.dims) <= len(arr.coords)`` in general.
+
+ Dimension coordinate
+ A one-dimensional coordinate array assigned to ``arr`` with both a name
+ and dimension name in ``arr.dims``. Dimension coordinates are used for
+ label-based indexing and alignment, like the index found on a
+ :py:class:`pandas.DataFrame` or :py:class:`pandas.Series`. In fact,
+ dimension coordinates use :py:class:`pandas.Index` objects under the
+ hood for efficient computation. Dimension coordinates are marked by
+ ``*`` when printing a ``DataArray`` or ``Dataset``.
+
+ Non-dimension coordinate
+ A coordinate array assigned to ``arr`` with a name in ``arr.coords`` but
+      *not* in ``arr.dims``. These coordinate arrays can be one-dimensional
+ or multidimensional, and they are useful for auxiliary labeling. As an
+ example, multidimensional coordinates are often used in geoscience
+ datasets when :doc:`the data's physical coordinates (such as latitude
+ and longitude) differ from their logical coordinates
+ `. However, non-dimension coordinates
+ are not indexed, and any operation on non-dimension coordinates that
+ leverages indexing will fail. Printing ``arr.coords`` will print all of
+ ``arr``'s coordinate names, with the corresponding dimension(s) in
+ parentheses. For example, ``coord_name (dim_name) 1 2 3 ...``.
+
+ Index
+ An *index* is a data structure optimized for efficient selecting and
+ slicing of an associated array. Xarray creates indexes for dimension
+ coordinates so that operations along dimensions are fast, while
+ non-dimension coordinates are not indexed. Under the hood, indexes are
+ implemented as :py:class:`pandas.Index` objects. The index associated
+ with dimension name ``x`` can be retrieved by ``arr.indexes[x]``. By
+      construction, ``len(arr.dims) == len(arr.indexes)``.
+
+ name
+ The names of dimensions, coordinates, DataArray objects and data
+ variables can be anything as long as they are :term:`hashable`. However,
+ it is preferred to use :py:class:`str` typed names.
+
+ scalar
+ By definition, a scalar is not an :term:`array` and when converted to
+ one, it has 0 dimensions. That means that, e.g., :py:class:`int`,
+ :py:class:`float`, and :py:class:`str` objects are "scalar" while
+ :py:class:`list` or :py:class:`tuple` are not.
+
+ duck array
+ `Duck arrays`__ are array implementations that behave
+ like numpy arrays. They have to define the ``shape``, ``dtype`` and
+ ``ndim`` properties. For integration with ``xarray``, the ``__array__``,
+ ``__array_ufunc__`` and ``__array_function__`` protocols are also required.
+
+ __ https://numpy.org/neps/nep-0022-ndarray-duck-typing-overview.html
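+
+As a quick, illustrative sketch of how several of these terms fit together
+(the array and names below are invented for the example):
+
+.. ipython:: python
+
+    import numpy as np
+    import xarray as xr
+
+    # "x" is a dimension coordinate (backed by a pandas.Index), while
+    # "label" is a non-dimension coordinate along the dimension "x"
+    arr = xr.DataArray(
+        np.arange(3),
+        dims="x",
+        coords={"x": [10, 20, 30], "label": ("x", ["a", "b", "c"])},
+    )
+    arr.dims
+    arr.coords
+    arr.indexes["x"]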
diff --git a/doc/time-series.rst b/doc/time-series.rst
index d838dbbd4cd..96a2edc0ea5 100644
--- a/doc/time-series.rst
+++ b/doc/time-series.rst
@@ -10,11 +10,12 @@ data in pandas such a joy to xarray. In most cases, we rely on pandas for the
core functionality.
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xr
+
np.random.seed(123456)
Creating datetime64 data
@@ -29,8 +30,8 @@ using :py:func:`pandas.to_datetime` and :py:func:`pandas.date_range`:
.. ipython:: python
- pd.to_datetime(['2000-01-01', '2000-02-02'])
- pd.date_range('2000-01-01', periods=365)
+ pd.to_datetime(["2000-01-01", "2000-02-02"])
+ pd.date_range("2000-01-01", periods=365)
Alternatively, you can supply arrays of Python ``datetime`` objects. These get
converted automatically when used as arguments in xarray objects:
@@ -38,7 +39,8 @@ converted automatically when used as arguments in xarray objects:
.. ipython:: python
import datetime
- xr.Dataset({'time': datetime.datetime(2000, 1, 1)})
+
+ xr.Dataset({"time": datetime.datetime(2000, 1, 1)})
When reading or writing netCDF files, xarray automatically decodes datetime and
timedelta arrays using `CF conventions`_ (that is, by using a ``units``
@@ -62,8 +64,8 @@ You can manual decode arrays in this form by passing a dataset to
.. ipython:: python
- attrs = {'units': 'hours since 2000-01-01'}
- ds = xr.Dataset({'time': ('time', [0, 1, 2, 3], attrs)})
+ attrs = {"units": "hours since 2000-01-01"}
+ ds = xr.Dataset({"time": ("time", [0, 1, 2, 3], attrs)})
xr.decode_cf(ds)
One unfortunate limitation of using ``datetime64[ns]`` is that it limits the
@@ -87,10 +89,10 @@ items and with the `slice` object:
.. ipython:: python
- time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)
- ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
- ds.sel(time='2000-01')
- ds.sel(time=slice('2000-06-01', '2000-06-10'))
+ time = pd.date_range("2000-01-01", freq="H", periods=365 * 24)
+ ds = xr.Dataset({"foo": ("time", np.arange(365 * 24)), "time": time})
+ ds.sel(time="2000-01")
+ ds.sel(time=slice("2000-06-01", "2000-06-10"))
You can also select a particular time by indexing with a
:py:class:`datetime.time` object:
@@ -113,8 +115,8 @@ given ``DataArray`` can be quickly computed using a special ``.dt`` accessor.
.. ipython:: python
- time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
- ds = xr.Dataset({'foo': ('time', np.arange(365 * 4)), 'time': time})
+ time = pd.date_range("2000-01-01", freq="6H", periods=365 * 4)
+ ds = xr.Dataset({"foo": ("time", np.arange(365 * 4)), "time": time})
ds.time.dt.hour
ds.time.dt.dayofweek
@@ -130,16 +132,16 @@ __ http://pandas.pydata.org/pandas-docs/stable/api.html#time-date-components
.. ipython:: python
- ds['time.month']
- ds['time.dayofyear']
+ ds["time.month"]
+ ds["time.dayofyear"]
For use as a derived coordinate, xarray adds ``'season'`` to the list of
datetime components supported by pandas:
.. ipython:: python
- ds['time.season']
- ds['time'].dt.season
+ ds["time.season"]
+ ds["time"].dt.season
The set of valid seasons consists of 'DJF', 'MAM', 'JJA' and 'SON', labeled by
the first letters of the corresponding months.
@@ -152,7 +154,7 @@ __ http://pandas.pydata.org/pandas-docs/stable/timeseries.html#offset-aliases
.. ipython:: python
- ds['time'].dt.floor('D')
+ ds["time"].dt.floor("D")
The ``.dt`` accessor can also be used to generate formatted datetime strings
for arrays utilising the same formatting as the standard `datetime.strftime`_.
@@ -161,7 +163,7 @@ for arrays utilising the same formatting as the standard `datetime.strftime`_.
.. ipython:: python
- ds['time'].dt.strftime('%a, %b %d %H:%M')
+ ds["time"].dt.strftime("%a, %b %d %H:%M")
.. _resampling:
@@ -173,9 +175,9 @@ Datetime components couple particularly well with grouped operations (see
calculate the mean by time of day:
.. ipython:: python
- :okwarning:
+ :okwarning:
- ds.groupby('time.hour').mean()
+ ds.groupby("time.hour").mean()
For upsampling or downsampling temporal resolutions, xarray offers a
:py:meth:`~xarray.Dataset.resample` method building on the core functionality
@@ -187,25 +189,25 @@ same api as ``resample`` `in pandas`_.
For example, we can downsample our dataset from hourly to 6-hourly:
.. ipython:: python
- :okwarning:
+ :okwarning:
- ds.resample(time='6H')
+ ds.resample(time="6H")
This will create a specialized ``Resample`` object which saves information
necessary for resampling. All of the reduction methods which work with
``Resample`` objects can also be used for resampling:
.. ipython:: python
- :okwarning:
+ :okwarning:
- ds.resample(time='6H').mean()
+ ds.resample(time="6H").mean()
You can also supply an arbitrary reduction function to aggregate over each
resampling group:
.. ipython:: python
- ds.resample(time='6H').reduce(np.mean)
+ ds.resample(time="6H").reduce(np.mean)
For upsampling, xarray provides six methods: ``asfreq``, ``ffill``, ``bfill``, ``pad``,
``nearest`` and ``interpolate``. ``interpolate`` extends ``scipy.interpolate.interp1d``
@@ -218,7 +220,7 @@ Data that has indices outside of the given ``tolerance`` are set to ``NaN``.
.. ipython:: python
- ds.resample(time='1H').nearest(tolerance='1H')
+ ds.resample(time="1H").nearest(tolerance="1H")
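+
+As a brief sketch (reusing the 6-hourly ``ds`` defined above), upsampling by
+interpolation works the same way; ``interpolate`` accepts the ``kind`` options of
+``scipy.interpolate.interp1d``:
+
+.. ipython:: python
+
+    ds.resample(time="1H").interpolate("linear")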
For more examples of using grouped operations on a time dimension, see
diff --git a/doc/weather-climate.rst b/doc/weather-climate.rst
index 768cf6556f9..db612d74859 100644
--- a/doc/weather-climate.rst
+++ b/doc/weather-climate.rst
@@ -4,7 +4,7 @@ Weather and climate data
========================
.. ipython:: python
- :suppress:
+ :suppress:
import xarray as xr
@@ -56,11 +56,14 @@ coordinate with dates from a no-leap calendar and a
.. ipython:: python
- from itertools import product
- from cftime import DatetimeNoLeap
- dates = [DatetimeNoLeap(year, month, 1) for year, month in
- product(range(1, 3), range(1, 13))]
- da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
+ from itertools import product
+ from cftime import DatetimeNoLeap
+
+ dates = [
+ DatetimeNoLeap(year, month, 1)
+ for year, month in product(range(1, 3), range(1, 13))
+ ]
+ da = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")
xarray also includes a :py:func:`~xarray.cftime_range` function, which enables
creating a :py:class:`~xarray.CFTimeIndex` with regularly-spaced dates. For
@@ -68,30 +71,50 @@ instance, we can create the same dates and DataArray we created above using:
.. ipython:: python
- dates = xr.cftime_range(start='0001', periods=24, freq='MS', calendar='noleap')
- da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'], name='foo')
+ dates = xr.cftime_range(start="0001", periods=24, freq="MS", calendar="noleap")
+ da = xr.DataArray(np.arange(24), coords=[dates], dims=["time"], name="foo")
+
+Mirroring pandas' method with the same name, :py:meth:`~xarray.infer_freq` allows one to
+infer the sampling frequency of a :py:class:`~xarray.CFTimeIndex` or a 1-D
+:py:class:`~xarray.DataArray` containing cftime objects. It also works transparently with
+``np.datetime64[ns]`` and ``np.timedelta64[ns]`` data.
+
+.. ipython:: python
+
+ xr.infer_freq(dates)
With :py:meth:`~xarray.CFTimeIndex.strftime` we can also easily generate formatted strings from
the datetime values of a :py:class:`~xarray.CFTimeIndex` directly or through the
-:py:meth:`~xarray.DataArray.dt` accessor for a :py:class:`~xarray.DataArray`
+``dt`` accessor for a :py:class:`~xarray.DataArray`
using the same formatting as the standard `datetime.strftime`_ convention .
.. _datetime.strftime: https://docs.python.org/3/library/datetime.html#strftime-strptime-behavior
.. ipython:: python
- dates.strftime('%c')
- da['time'].dt.strftime('%Y%m%d')
+ dates.strftime("%c")
+ da["time"].dt.strftime("%Y%m%d")
For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
-- `Partial datetime string indexing`_ using strictly `ISO 8601-format`_ partial
- datetime strings:
+- `Partial datetime string indexing`_:
.. ipython:: python
- da.sel(time='0001')
- da.sel(time=slice('0001-05', '0002-02'))
+ da.sel(time="0001")
+ da.sel(time=slice("0001-05", "0002-02"))
+
+.. note::
+
+ For specifying full or partial datetime strings in cftime
+ indexing, xarray supports two versions of the `ISO 8601 standard`_, the
+ basic pattern (YYYYMMDDhhmmss) or the extended pattern
+ (YYYY-MM-DDThh:mm:ss), as well as the default cftime string format
+ (YYYY-MM-DD hh:mm:ss). This is somewhat more restrictive than pandas;
+ in other words, some datetime strings that would be valid for a
+ :py:class:`pandas.DatetimeIndex` are not valid for an
+ :py:class:`~xarray.CFTimeIndex`.
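+
+   For instance, both of the following select the same date (a sketch reusing
+   the ``da`` defined above):
+
+   .. ipython:: python
+
+      da.sel(time="0001-01-01")  # extended pattern
+      da.sel(time="00010101")  # basic pattern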
- Access of basic datetime components via the ``dt`` accessor (in this case
just "year", "month", "day", "hour", "minute", "second", "microsecond",
@@ -99,64 +122,65 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
.. ipython:: python
- da.time.dt.year
- da.time.dt.month
- da.time.dt.season
- da.time.dt.dayofyear
- da.time.dt.dayofweek
- da.time.dt.days_in_month
+ da.time.dt.year
+ da.time.dt.month
+ da.time.dt.season
+ da.time.dt.dayofyear
+ da.time.dt.dayofweek
+ da.time.dt.days_in_month
- Rounding of datetimes to fixed frequencies via the ``dt`` accessor:
.. ipython:: python
- da.time.dt.ceil('3D')
- da.time.dt.floor('5D')
- da.time.dt.round('2D')
-
+ da.time.dt.ceil("3D")
+ da.time.dt.floor("5D")
+ da.time.dt.round("2D")
+
- Group-by operations based on datetime accessor attributes (e.g. by month of
the year):
.. ipython:: python
- da.groupby('time.month').sum()
+ da.groupby("time.month").sum()
- Interpolation using :py:class:`cftime.datetime` objects:
.. ipython:: python
- da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])
+ da.interp(time=[DatetimeNoLeap(1, 1, 15), DatetimeNoLeap(1, 2, 15)])
- Interpolation using datetime strings:
.. ipython:: python
- da.interp(time=['0001-01-15', '0001-02-15'])
+ da.interp(time=["0001-01-15", "0001-02-15"])
- Differentiation:
.. ipython:: python
- da.differentiate('time')
+ da.differentiate("time")
- Serialization:
.. ipython:: python
- da.to_netcdf('example-no-leap.nc')
- xr.open_dataset('example-no-leap.nc')
+ da.to_netcdf("example-no-leap.nc")
+ xr.open_dataset("example-no-leap.nc")
.. ipython:: python
:suppress:
import os
- os.remove('example-no-leap.nc')
+
+ os.remove("example-no-leap.nc")
- And resampling along the time dimension for data indexed by a :py:class:`~xarray.CFTimeIndex`:
.. ipython:: python
- da.resample(time='81T', closed='right', label='right', base=3).mean()
+ da.resample(time="81T", closed="right", label="right", base=3).mean()
.. note::
@@ -168,13 +192,13 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
method:
.. ipython:: python
- :okwarning:
+ :okwarning:
- modern_times = xr.cftime_range('2000', periods=24, freq='MS', calendar='noleap')
- da = xr.DataArray(range(24), [('time', modern_times)])
+ modern_times = xr.cftime_range("2000", periods=24, freq="MS", calendar="noleap")
+ da = xr.DataArray(range(24), [("time", modern_times)])
da
- datetimeindex = da.indexes['time'].to_datetimeindex()
- da['time'] = datetimeindex
+ datetimeindex = da.indexes["time"].to_datetimeindex()
+ da["time"] = datetimeindex
However in this case one should use caution to only perform operations which
do not depend on differences between dates (e.g. differentiation,
@@ -182,6 +206,6 @@ For data indexed by a :py:class:`~xarray.CFTimeIndex` xarray currently supports:
and silent errors due to the difference in calendar types between the dates
encoded in your data and the dates stored in memory.
-.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#timestamp-limitations
-.. _ISO 8601-format: https://en.wikipedia.org/wiki/ISO_8601
-.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/timeseries.html#partial-string-indexing
+.. _Timestamp-valid range: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#timestamp-limitations
+.. _ISO 8601 standard: https://en.wikipedia.org/wiki/ISO_8601
+.. _partial datetime string indexing: https://pandas.pydata.org/pandas-docs/stable/user_guide/timeseries.html#partial-string-indexing
diff --git a/doc/whats-new.rst b/doc/whats-new.rst
index c9a2ca2e41c..063e7cd9b64 100644
--- a/doc/whats-new.rst
+++ b/doc/whats-new.rst
@@ -4,28 +4,436 @@ What's New
==========
.. ipython:: python
- :suppress:
+ :suppress:
import numpy as np
import pandas as pd
import xarray as xray
import xarray
import xarray as xr
+
np.random.seed(123456)
+
+.. _whats-new.0.16.3:
+
+v0.16.3 (unreleased)
+--------------------
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+- xarray no longer supports Python 3.6
+
+  The minimum versions of some other dependencies were changed:
+
+  ============ ====== ====
+  Package      Old    New
+  ============ ====== ====
+  Python       3.6    3.7
+  setuptools   38.4   40.4
+  ============ ====== ====
+
+ (:issue:`4688`, :pull:`4720`)
+ By `Justus Magin `_.
+- As a result of :pull:`4684` the default units encoding for
+ datetime-like values (``np.datetime64[ns]`` or ``cftime.datetime``) will now
+ always be set such that ``int64`` values can be used. In the past, no units
+ finer than "seconds" were chosen, which would sometimes mean that ``float64``
+ values were required, which would lead to inaccurate I/O round-trips.
+- Remove the deprecated ``autoclose`` kwarg from :py:func:`open_dataset` (:pull:`4725`).
+  By `Aureliana Barghini `_.
+
+
+New Features
+~~~~~~~~~~~~
+- Performance improvement when constructing DataArrays. Significantly speeds up the repr for Datasets with a large number of variables.
+  By `Deepak Cherian `_.
+
+Bug fixes
+~~~~~~~~~
+- :py:meth:`DataArray.resample` and :py:meth:`Dataset.resample` do not trigger computations anymore if :py:meth:`Dataset.weighted` or :py:meth:`DataArray.weighted` are applied (:issue:`4625`, :pull:`4668`). By `Julius Busecke `_.
+- :py:func:`merge` with ``combine_attrs='override'`` now makes a copy of the attrs (:issue:`4627`).
+- By default, when possible, xarray will now always use values of type ``int64`` when encoding
+ and decoding ``numpy.datetime64[ns]`` datetimes. This ensures that maximum
+ precision and accuracy are maintained in the round-tripping process
+ (:issue:`4045`, :pull:`4684`). It also enables encoding and decoding standard calendar
+ dates with time units of nanoseconds (:pull:`4400`). By `Spencer Clark
+ `_ and `Mark Harfouche `_.
+- :py:meth:`DataArray.astype`, :py:meth:`Dataset.astype` and :py:meth:`Variable.astype` support
+ the ``order`` and ``subok`` parameters again. This fixes a regression introduced in version 0.16.1
+ (:issue:`4644`, :pull:`4683`).
+  By `Richard Kleijn `_.
+- Remove dictionary unpacking when using ``.loc`` to avoid collision with ``.sel`` parameters (:pull:`4695`).
+ By `Anderson Banihirwe `_
+- Fix the legend created by :py:meth:`Dataset.plot.scatter` (:issue:`4641`, :pull:`4723`).
+ By `Justus Magin `_.
+- Fix a crash in orthogonal indexing on geographic coordinates with ``engine='cfgrib'`` (:issue:`4733`, :pull:`4737`).
+  By `Alessandro Amici `_.
+- Coordinates with dtype ``str`` or ``bytes`` now retain their dtype on many operations,
+  e.g. ``reindex``, ``align``, ``concat``, ``assign``; previously they were cast to an object dtype
+  (:issue:`2658` and :issue:`4543`). By `Mathias Hauser `_.
+- Limit the number of data rows shown when printing large datasets (:issue:`4736`, :pull:`4750`). By `Jimmy Westling `_.
+- Add a ``missing_dims`` parameter to ``transpose`` (:issue:`4647`, :pull:`4767`). By `Daniel Mesejo `_.
+- Resolve intervals before appending other metadata to labels when plotting (:issue:`4322`, :pull:`4794`).
+ By `Justus Magin `_.
+- Fix regression when decoding a variable with a ``scale_factor`` and ``add_offset`` given
+ as a list of length one (:issue:`4631`) by `Mathias Hauser `_.
+- Expand user directory paths (e.g. ``~/``) in :py:func:`open_mfdataset` and
+ :py:meth:`Dataset.to_zarr` (:issue:`4783`, :pull:`4795`).
+ By `Julien Seguinot `_.
+
+Documentation
+~~~~~~~~~~~~~
+- Add information about requirements for accessor classes (:issue:`2788`, :pull:`4657`).
+  By `Justus Magin `_.
+- Start a list of external I/O libraries that integrate with ``xarray`` (:issue:`683`, :pull:`4566`).
+  By `Justus Magin `_.
+- Add concat examples and improve combining documentation (:issue:`4620`, :pull:`4645`).
+  By `Ray Bell `_ and
+  `Justus Magin `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+- Speed up the continuous integration tests on Azure.
+
+ - Switched to mamba and use matplotlib-base for a faster installation of all dependencies (:pull:`4672`).
+  - Use ``pytest.mark.skip`` instead of ``pytest.mark.xfail`` for some tests that cannot currently
+    succeed (:pull:`4685`).
+ - Run the tests in parallel using pytest-xdist (:pull:`4694`).
+
+ By `Justus Magin `_ and `Mathias Hauser `_.
+
+- Replace all usages of ``assert x.identical(y)`` with ``assert_identical(x, y)``
+  for clearer error messages (:pull:`4752`).
+  By `Maximilian Roos `_.
+- Speed up attribute style access (e.g. ``ds.somevar`` instead of ``ds["somevar"]``) and tab completion
+ in ipython (:issue:`4741`, :pull:`4742`). By `Richard Kleijn `_.
+
+.. _whats-new.0.16.2:
+
+v0.16.2 (30 Nov 2020)
+---------------------
+
+This release brings the ability to write to limited regions of ``zarr`` files, open zarr files with :py:func:`open_dataset` and :py:func:`open_mfdataset`, increased support for propagating ``attrs`` using the ``keep_attrs`` flag, as well as numerous bugfixes and documentation improvements.
+
+Many thanks to the 31 contributors who contributed to this release:
+Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, alexamici, Alexandre Poux, Anderson Banihirwe, Andrew Pauling, Ashwin Vishnu, aurghs, Brian Ward, Caleb, crusaderky, Dan Nowacki, darikg, David Brochart, David Huard, Deepak Cherian, Dion Häfner, Gerardo Rivera, Gerrit Holl, Illviljan, inakleinbottle, Jacob Tomlinson, James A. Bednar, jenssss, Joe Hamman, johnomotani, Joris Van den Bossche, Julia Kent, Julius Busecke, Kai Mühlbauer, keewis, Keisuke Fujii, Kyle Cranmer, Luke Volpatti, Mathias Hauser, Maximilian Roos, Michaël Defferrard, Michal Baumgartner, Nick R. Papior, Pascal Bourgault, Peter Hausamann, PGijsbers, Ray Bell, Romain Martinez, rpgoldman, Russell Manser, Sahid Velji, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer, Thomas Zilio, Tobias Kölling, Tom Augspurger, Wei Ji, Yash Saboo, Zeb Nicholls.
+
+Deprecations
+~~~~~~~~~~~~
+
+- :py:attr:`~core.accessor_dt.DatetimeAccessor.weekofyear` and :py:attr:`~core.accessor_dt.DatetimeAccessor.week`
+ have been deprecated. Use ``DataArray.dt.isocalendar().week``
+ instead (:pull:`4534`). By `Mathias Hauser `_,
+ `Maximilian Roos `_, and `Spencer Clark `_.
+- :py:attr:`DataArray.rolling` and :py:attr:`Dataset.rolling` no longer support passing ``keep_attrs``
+  via their constructor. Pass ``keep_attrs`` via the applied function, i.e. use
+  ``ds.rolling(...).mean(keep_attrs=False)`` instead of ``ds.rolling(..., keep_attrs=False).mean()``.
+  Rolling operations now keep their attributes by default (:pull:`4510`).
+  By `Mathias Hauser `_.
+
+New Features
+~~~~~~~~~~~~
+
+- :py:func:`open_dataset` and :py:func:`open_mfdataset`
+  now work with ``engine="zarr"`` (:issue:`3668`, :pull:`4003`, :pull:`4187`).
+ By `Miguel Jimenez `_ and `Wei Ji Leong `_.
+- Unary & binary operations follow the ``keep_attrs`` flag (:issue:`3490`, :issue:`4065`, :issue:`3433`, :issue:`3595`, :pull:`4195`).
+ By `Deepak Cherian `_.
+- Added :py:meth:`~core.accessor_dt.DatetimeAccessor.isocalendar()` that returns a Dataset
+  with year, week, and weekday calculated according to the ISO 8601 calendar (see the
+  sketch after this list). Requires pandas version 1.1.0 or greater (:pull:`4534`).
+  By `Mathias Hauser `_,
+  `Maximilian Roos `_, and `Spencer Clark `_.
+- :py:meth:`Dataset.to_zarr` now supports a ``region`` keyword for writing to
+ limited regions of existing Zarr stores (:pull:`4035`).
+ See :ref:`io.zarr.appending` for full details.
+ By `Stephan Hoyer `_.
+- Added type hints in :py:func:`align` to reflect that the same type received in the ``objects``
+  argument will be returned (:pull:`4522`).
+  By `Michal Baumgartner `_.
+- :py:meth:`Dataset.weighted` and :py:meth:`DataArray.weighted` now execute value checks lazily if weights are provided as dask arrays (:issue:`4541`, :pull:`4559`).
+  By `Julius Busecke `_.
+- Added the ``keep_attrs`` keyword to ``rolling_exp.mean()``; it now keeps attributes
+  by default. By `Mathias Hauser `_ (:pull:`4592`).
+- Added ``freq`` as a property of :py:class:`CFTimeIndex`, which is now shown in the
+  ``CFTimeIndex`` repr (:issue:`2416`, :pull:`4597`).
+  By `Aaron Spring `_.
+
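+A minimal sketch of the ``isocalendar`` accessor method mentioned above
+(illustrative only; assumes pandas >= 1.1.0 is installed):
+
+.. ipython:: python
+
+    import pandas as pd
+    import xarray as xr
+
+    times = xr.DataArray(pd.date_range("2000-01-01", periods=3), dims="time")
+    # returns a Dataset with "year", "week" and "weekday" variables
+    times.dt.isocalendar()
+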
+Bug fixes
+~~~~~~~~~
+
+- Fix bug where reference times without padded years (e.g. ``since 1-1-1``) would lose their units when
+ being passed by ``encode_cf_datetime`` (:issue:`4422`, :pull:`4506`). Such units are ambiguous
+ about which digit represents the years (is it YMD or DMY?). Now, if such formatting is encountered,
+ it is assumed that the first digit is the years, they are padded appropriately (to e.g. ``since 0001-1-1``)
+ and a warning that this assumption is being made is issued. Previously, without ``cftime``, such times
+ would be silently parsed incorrectly (at least based on the CF conventions) e.g. "since 1-1-1" would
+ be parsed (via ``pandas`` and ``dateutil``) to ``since 2001-1-1``.
+ By `Zeb Nicholls `_.
+- Fix :py:meth:`DataArray.plot.step`. By `Deepak Cherian `_.
+- Fix bug where reading a scalar value from a NetCDF file opened with the ``h5netcdf`` backend would raise a ``ValueError`` when ``decode_cf=True`` (:issue:`4471`, :pull:`4485`).
+ By `Gerrit Holl `_.
+- Fix bug where datetime64 times are silently changed to incorrect values if they are outside the valid date range for ns precision when provided in some other units (:issue:`4427`, :pull:`4454`).
+ By `Andrew Pauling `_
+- Fix silently overwriting the ``engine`` key when passing :py:func:`open_dataset` a file object
+ to an incompatible netCDF (:issue:`4457`). Now incompatible combinations of files and engines raise
+ an exception instead. By `Alessandro Amici `_.
+- The ``min_count`` argument to :py:meth:`DataArray.sum()` and :py:meth:`DataArray.prod()`
+ is now ignored when not applicable, i.e. when ``skipna=False`` or when ``skipna=None``
+ and the dtype does not have a missing value (:issue:`4352`).
+ By `Mathias Hauser `_.
+- :py:func:`combine_by_coords` now raises an informative error when passing coordinates
+ with differing calendars (:issue:`4495`). By `Mathias Hauser `_.
+- :py:attr:`DataArray.rolling` and :py:attr:`Dataset.rolling` now also keep the attributes and names of (wrapped)
+  ``DataArray`` objects; previously only the global attributes were retained (:issue:`4497`, :pull:`4510`).
+  By `Mathias Hauser `_.
+- Improve performance where reading small slices from huge dimensions was slower than necessary (:pull:`4560`). By `Dion Häfner `_.
+- Fix bug where ``dask_gufunc_kwargs`` was silently changed in :py:func:`apply_ufunc` (:pull:`4576`). By `Kai Mühlbauer `_.
+
+Documentation
+~~~~~~~~~~~~~
+- Document which parts of the API are not supported with duck arrays (:pull:`4530`).
+  By `Justus Magin `_.
+- Mention the possibility to pass functions to :py:meth:`Dataset.where` or
+ :py:meth:`DataArray.where` in the parameter documentation (:issue:`4223`, :pull:`4613`).
+ By `Justus Magin `_.
+- Update the docstring of :py:class:`DataArray` and :py:class:`Dataset` (:pull:`4532`).
+  By `Jimmy Westling `_.
+- Raise a more informative error when :py:meth:`DataArray.to_dataframe` is
+  called on a scalar (:issue:`4228`).
+  By `Pieter Gijsbers `_.
+- Fix grammar and typos in the :doc:`contributing` guide (:pull:`4545`).
+ By `Sahid Velji `_.
+- Fix grammar and typos in the :doc:`io` guide (:pull:`4553`).
+ By `Sahid Velji `_.
+- Update link to NumPy docstring standard in the :doc:`contributing` guide (:pull:`4558`).
+ By `Sahid Velji `_.
+- Add docstrings to ``isnull`` and ``notnull``, and fix the displayed signature
+ (:issue:`2760`, :pull:`4618`).
+ By `Justus Magin `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- Optional dependencies can be installed along with xarray by specifying
+ extras as ``pip install "xarray[extra]"`` where ``extra`` can be one of ``io``,
+ ``accel``, ``parallel``, ``viz`` and ``complete``. See docs for updated
+ :ref:`installation instructions `.
+ (:issue:`2888`, :pull:`4480`).
+ By `Ashwin Vishnu `_, `Justus Magin
+ `_ and `Mathias Hauser
+ `_.
+- Removed stray spaces that stem from black removing new lines (:pull:`4504`).
+ By `Mathias Hauser `_.
+- Ensure tests are not skipped in the ``py38-all-but-dask`` test environment
+ (:issue:`4509`). By `Mathias Hauser `_.
+- Ignore select numpy warnings around missing values, where xarray handles
+  the values appropriately (:pull:`4536`).
+  By `Maximilian Roos `_.
+- Replace the internal use of ``pd.Index.__or__`` and ``pd.Index.__and__`` with ``pd.Index.union``
+ and ``pd.Index.intersection`` as they will stop working as set operations in the future
+ (:issue:`4565`). By `Mathias Hauser `_.
+- Add GitHub action for running nightly tests against upstream dependencies (:pull:`4583`).
+ By `Anderson Banihirwe `_.
+- Ensure all figures are closed properly in plot tests (:pull:`4600`).
+ By `Yash Saboo `_, `Nirupam K N
+ `_ and `Mathias Hauser
+ `_.
+
+.. _whats-new.0.16.1:
+
+v0.16.1 (2020-09-20)
+---------------------
+
+This patch release fixes an incompatibility with a recent pandas change, which
+was causing an issue indexing with a ``datetime64``. It also includes
+improvements to ``rolling``, ``to_dataframe``, ``cov`` & ``corr`` methods and
+bug fixes. Our documentation has a number of improvements, including fixing all
+doctests and confirming their accuracy on every commit.
+
+Many thanks to the 36 contributors who contributed to this release:
+
+Aaron Spring, Akio Taniguchi, Aleksandar Jelenak, Alexandre Poux,
+Caleb, Dan Nowacki, Deepak Cherian, Gerardo Rivera, Jacob Tomlinson, James A.
+Bednar, Joe Hamman, Julia Kent, Kai Mühlbauer, Keisuke Fujii, Mathias Hauser,
+Maximilian Roos, Nick R. Papior, Pascal Bourgault, Peter Hausamann, Romain
+Martinez, Russell Manser, Samnan Rahee, Sander, Spencer Clark, Stephan Hoyer,
+Thomas Zilio, Tobias Kölling, Tom Augspurger, alexamici, crusaderky, darikg,
+inakleinbottle, jenssss, johnomotani, keewis, and rpgoldman.
+
+Breaking changes
+~~~~~~~~~~~~~~~~
+
+- :py:meth:`DataArray.astype` and :py:meth:`Dataset.astype` now preserve attributes. Keep the
+  old behavior by passing ``keep_attrs=False`` (:issue:`2049`, :pull:`4314`).
+ By `Dan Nowacki `_ and `Gabriel Joel Mitchell `_.
+
+New Features
+~~~~~~~~~~~~
+
+- :py:meth:`~xarray.DataArray.rolling` and :py:meth:`~xarray.Dataset.rolling`
+  now accept more than one dimension (:pull:`4219`).
+ By `Keisuke Fujii `_.
+- :py:meth:`~xarray.DataArray.to_dataframe` and :py:meth:`~xarray.Dataset.to_dataframe`
+  now accept a ``dim_order`` parameter for specifying the order of the resulting
+  dataframe's dimensions (:issue:`4331`, :pull:`4333`).
+  By `Thomas Zilio `_.
+- Support multiple outputs in :py:func:`xarray.apply_ufunc` when using
+  ``dask='parallelized'`` (:issue:`1815`, :pull:`4060`).
+ By `Kai Mühlbauer `_.
+- ``min_count`` can be supplied to reductions such as ``.sum`` when specifying
+  multiple dimensions to reduce over (:pull:`4356`).
+ By `Maximilian Roos `_.
+- :py:func:`xarray.cov` and :py:func:`xarray.corr` now handle missing values (:pull:`4351`).
+ By `Maximilian Roos `_.
+- Add support for parsing datetime strings formatted following the default
+ string representation of cftime objects, i.e. YYYY-MM-DD hh:mm:ss, in
+ partial datetime string indexing, as well as :py:meth:`~xarray.cftime_range`
+ (:issue:`4337`). By `Spencer Clark `_.
+- Build ``CFTimeIndex.__repr__`` explicitly as :py:class:`pandas.Index`. Add ``calendar`` as a new
+ property for :py:class:`CFTimeIndex` and show ``calendar`` and ``length`` in
+  ``CFTimeIndex.__repr__`` (:issue:`2416`, :pull:`4092`).
+  By `Aaron Spring `_.
+- Use a wrapped array's ``_repr_inline_`` method to construct the collapsed ``repr``
+ of :py:class:`DataArray` and :py:class:`Dataset` objects and
+  document the new method in :doc:`internals` (:pull:`4248`).
+ By `Justus Magin `_.
+- Allow per-variable fill values in most functions (:pull:`4237`); see the sketch
+  after this list. By `Justus Magin `_.
+- Expose ``use_cftime`` option in :py:func:`~xarray.open_zarr` (:issue:`2886`, :pull:`3229`)
+ By `Samnan Rahee `_ and `Anderson Banihirwe `_.
+
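+A minimal sketch of the per-variable fill values mentioned above (the dataset
+and fill values are invented for illustration):
+
+.. ipython:: python
+
+    import xarray as xr
+
+    ds = xr.Dataset({"a": ("x", [1, 2]), "b": ("x", [3.0, 4.0])}, coords={"x": [0, 1]})
+    # pass a dict to use a different fill value for each data variable
+    ds.reindex(x=[0, 1, 2], fill_value={"a": -1, "b": -99.9})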
+
+Bug fixes
+~~~~~~~~~
+
+- Fix indexing with datetime64 scalars with pandas 1.1 (:issue:`4283`).
+ By `Stephan Hoyer `_ and
+ `Justus Magin `_.
+- Variables that are chunked with dask along only some dimensions can now be chunked along previously
+  unchunked dimensions when storing with zarr (:pull:`4312`). By `Tobias Kölling `_.
+- Fixed a bug in the backends caused by a basic installation of Dask (:issue:`4164`, :pull:`4318`).
+  By `Sam Morley `_.
+- Fixed a few bugs with :py:meth:`Dataset.polyfit` when encountering deficient matrix ranks (:issue:`4190`, :pull:`4193`). By `Pascal Bourgault `_.
+- Fixed inconsistencies between docstring and functionality for :py:meth:`DataArray.str.get`
+ and :py:meth:`DataArray.str.wrap` (:issue:`4334`). By `Mathias Hauser `_.
+- Fixed overflow issue causing incorrect results in computing means of :py:class:`cftime.datetime`
+ arrays (:issue:`4341`). By `Spencer Clark `_.
+- Fixed :py:meth:`Dataset.coarsen`, :py:meth:`DataArray.coarsen` dropping attributes on original object (:issue:`4120`, :pull:`4360`). By `Julia Kent `_.
+- Fix the signature of the plot methods (:pull:`4359`). By `Justus Magin `_.
+- Fix :py:func:`xarray.apply_ufunc` with ``vectorize=True`` and ``exclude_dims`` (:issue:`3890`).
+ By `Mathias Hauser `_.
+- Fix ``KeyError`` when doing linear interpolation to an nd ``DataArray``
+  that contains NaNs (:pull:`4233`).
+  By `Jens Svensmark `_.
+- Fix incorrect legend labels for :py:meth:`Dataset.plot.scatter` (:issue:`4126`).
+ By `Peter Hausamann `_.
+- Fix ``dask.optimize`` on ``DataArray`` producing an invalid Dask task graph (:issue:`3698`).
+  By `Tom Augspurger `_.
+- Fix ``pip install .`` when no ``.git`` directory exists; namely when the xarray source
+ directory has been rsync'ed by PyCharm Professional for a remote deployment over SSH.
+ By `Guido Imperiale `_
+- Preserve dimension and coordinate order during :py:func:`xarray.concat` (:issue:`2811`, :issue:`4072`, :pull:`4419`).
+ By `Kai Mühlbauer `_.
+- Avoid relying on :py:class:`set` objects for the ordering of the coordinates (:pull:`4409`)
+ By `Justus Magin `_.
+
+Documentation
+~~~~~~~~~~~~~
+
+- Update the docstring of :py:meth:`DataArray.copy` to remove an incorrect mention of 'dataset' (:issue:`3606`).
+  By `Sander van Rijn `_.
+- Removed the ``skipna`` argument from :py:meth:`DataArray.count`, :py:meth:`DataArray.any`, :py:meth:`DataArray.all` (:issue:`755`).
+  By `Sander van Rijn `_.
+- Update the contributing guide to use merges instead of rebasing and state
+ that we squash-merge. (:pull:`4355`). By `Justus Magin `_.
+- Make sure the examples from the docstrings actually work (:pull:`4408`).
+ By `Justus Magin `_.
+- Updated the Vectorized Indexing documentation with a clearer example.
+  By `Maximilian Roos `_.
+
+Internal Changes
+~~~~~~~~~~~~~~~~
+
+- Fixed all doctests and enabled their running in CI.
+ By `Justus Magin `_.
+- Relaxed the :ref:`mindeps_policy` to support:
+
+ - all versions of setuptools released in the last 42 months (but no older than 38.4)
+ - all versions of dask and dask.distributed released in the last 12 months (but no
+ older than 2.9)
+ - all versions of other packages released in the last 12 months
+
+  All are up from 6 months (:issue:`4295`).
+  By `Guido Imperiale `_.
+- Use :py:func:`dask.array.apply_gufunc ` instead of
+ :py:func:`dask.array.blockwise` in :py:func:`xarray.apply_ufunc` when using
+ ``dask='parallelized'``. (:pull:`4060`, :pull:`4391`, :pull:`4392`)
+ By `Kai Mühlbauer `_.
+- Align ``mypy`` versions to ``0.782`` across ``requirements`` and
+  ``.pre-commit-config.yml`` files (:pull:`4390`).
+  By `Maximilian Roos `_.
+- Only load resource files when running inside a Jupyter Notebook
+  (:issue:`4294`). By `Guido Imperiale `_.
+- Silenced most ``numpy`` warnings such as ``Mean of empty slice`` (:pull:`4369`).
+  By `Maximilian Roos `_.
+- Enable type checking for :py:func:`concat` (:issue:`4238`).
+  By `Mathias Hauser `_.
+- Updated plot functions for matplotlib version 3.3 and silenced warnings in the
+  plot tests (:pull:`4365`). By `Mathias Hauser `_.
+- Versions in ``pre-commit.yaml`` are now pinned, to reduce the chances of
+  conflicting versions (:pull:`4388`).
+  By `Maximilian Roos `_.
+
+
+
.. _whats-new.0.16.0:
-v0.16.0 (unreleased)
+v0.16.0 (2020-07-11)
---------------------
+This release adds ``xarray.cov`` & ``xarray.corr`` for covariance & correlation
+respectively; the ``idxmax`` & ``idxmin`` methods; the ``polyfit`` method &
+``xarray.polyval`` for fitting polynomials; as well as a number of documentation
+improvements, other features, and bug fixes. Many thanks to all 44 contributors
+who contributed to this release:
+
+Akio Taniguchi, Andrew Williams, Aurélien Ponte, Benoit Bovy, Dave Cole, David
+Brochart, Deepak Cherian, Elliott Sales de Andrade, Etienne Combrisson, Hossein
+Madadi, Huite, Joe Hamman, Kai Mühlbauer, Keisuke Fujii, Maik Riechert, Marek
+Jacob, Mathias Hauser, Matthieu Ancellin, Maximilian Roos, Noah D Brenowitz,
+Oriol Abril, Pascal Bourgault, Phillip Butcher, Prajjwal Nijhara, Ray Bell, Ryan
+Abernathey, Ryan May, Spencer Clark, Spencer Hill, Srijan Saurav, Stephan Hoyer,
+Taher Chegini, Todd, Tom Nicholas, Yohai Bar Sinai, Yunus Sevinchan,
+arabidopsis, aurghs, clausmichele, dmey, johnomotani, keewis, raphael dussin,
+risebell
+
Breaking changes
~~~~~~~~~~~~~~~~
+
+- Minimum supported versions for the following packages have changed: ``dask>=2.9``,
+  ``distributed>=2.9``.
+  By `Deepak Cherian `_.
+- ``groupby`` operations will restore coord dimension order. Pass ``restore_coord_dims=False``
+ to revert to previous behavior.
+- :meth:`DataArray.transpose` will now transpose coordinates by default.
+ Pass ``transpose_coords=False`` to revert to previous behaviour.
+ By `Maximilian Roos `_
- Alternate draw styles for :py:meth:`plot.step` must be passed using the
``drawstyle`` (or ``ds``) keyword argument, instead of the ``linestyle`` (or
``ls``) keyword argument, in line with the `upstream change in Matplotlib
`_.
(:pull:`3274`)
By `Elliott Sales de Andrade `_
+- The old ``auto_combine`` function has now been removed in
+ favour of the :py:func:`combine_by_coords` and
+ :py:func:`combine_nested` functions. This also means that
+ the default behaviour of :py:func:`open_mfdataset` has changed to use
+ ``combine='by_coords'`` as the default argument value. (:issue:`2616`, :pull:`3926`)
+ By `Tom Nicholas `_.
+- The ``DataArray`` and ``Variable`` HTML reprs now expand the data section by
+ default (:issue:`4176`)
+ By `Stephan Hoyer `_.
- New deprecations (behavior will be changed in xarray 0.17):
- ``dim`` argument to :py:meth:`DataArray.integrate` is being deprecated in
@@ -35,25 +443,47 @@ Breaking changes
New Features
~~~~~~~~~~~~
-- Added :py:meth:`DataArray.polyfit` and :py:func:`xarray.polyval` for fitting polynomials. (:issue:`3349`)
+- :py:meth:`DataArray.argmin` and :py:meth:`DataArray.argmax` now support
+  sequences of ``dim`` arguments. If a sequence is passed, they return a dict of
+  the indices for each dimension of the minimum or maximum of a DataArray, which
+  can be passed to :py:meth:`DataArray.isel` to retrieve that value (see the
+  sketch after this list). (:pull:`3936`)
+  By `John Omotani `_, thanks to `Keisuke Fujii
+  `_ for work in :pull:`1469`.
+- Added :py:func:`xarray.cov` and :py:func:`xarray.corr` (:issue:`3784`, :pull:`3550`, :pull:`4089`).
+ By `Andrew Williams `_ and `Robin Beer `_.
+- Implement :py:meth:`DataArray.idxmax`, :py:meth:`DataArray.idxmin`,
+ :py:meth:`Dataset.idxmax`, :py:meth:`Dataset.idxmin`. (:issue:`60`, :pull:`3871`)
+ By `Todd Jennings `_
+- Added :py:meth:`DataArray.polyfit` and :py:func:`xarray.polyval` for fitting
+ polynomials. (:issue:`3349`, :pull:`3733`, :pull:`4099`)
+ By `Pascal Bourgault `_.
+- Added :py:meth:`xarray.infer_freq` for extending frequency inferring to CFTime indexes and data (:pull:`4033`).
By `Pascal Bourgault `_.
+- ``chunks='auto'`` is now supported in the ``chunks`` argument of
+  :py:meth:`Dataset.chunk` (:issue:`4055`).
+  By `Andrew Williams `_.
- Control over attributes of result in :py:func:`merge`, :py:func:`concat`,
:py:func:`combine_by_coords` and :py:func:`combine_nested` using
combine_attrs keyword argument. (:issue:`3865`, :pull:`3877`)
By `John Omotani `_
-- 'missing_dims' argument to :py:meth:`Dataset.isel`,
- `:py:meth:`DataArray.isel` and :py:meth:`Variable.isel` to allow replacing
+- ``missing_dims`` argument to :py:meth:`Dataset.isel`,
+ :py:meth:`DataArray.isel` and :py:meth:`Variable.isel` to allow replacing
the exception when a dimension passed to ``isel`` is not present with a
warning, or just ignore the dimension. (:issue:`3866`, :pull:`3923`)
By `John Omotani `_
-- Limited the length of array items with long string reprs to a
- reasonable width (:pull:`3900`)
- By `Maximilian Roos `_
-- Implement :py:meth:`DataArray.idxmax`, :py:meth:`DataArray.idxmin`,
- :py:meth:`Dataset.idxmax`, :py:meth:`Dataset.idxmin`. (:issue:`60`, :pull:`3871`)
- By `Todd Jennings `_
+- Support dask handling for :py:meth:`DataArray.idxmax`, :py:meth:`DataArray.idxmin`,
+ :py:meth:`Dataset.idxmax`, :py:meth:`Dataset.idxmin`. (:pull:`3922`, :pull:`4135`)
+ By `Kai Mühlbauer `_ and `Pascal Bourgault `_.
+- More support for unit-aware arrays with pint (:pull:`3643`, :pull:`3975`, :pull:`4163`).
+  By `Justus Magin `_.
+- Support overriding existing variables in ``to_zarr()`` with ``mode='a'`` even
+ without ``append_dim``, as long as dimension sizes do not change.
+ By `Stephan Hoyer `_.
- Allow plotting of boolean arrays. (:pull:`3766`)
By `Marek Jacob `_
+- Enable using MultiIndex levels as coordinates in 1D and 2D plots (:issue:`3927`).
+ By `Mathias Hauser `_.
- A ``days_in_month`` accessor for :py:class:`xarray.CFTimeIndex`, analogous to
the ``days_in_month`` accessor for a :py:class:`pandas.DatetimeIndex`, which
returns the days in the month each datetime in the index. Now days in month
@@ -61,16 +491,63 @@ New Features
the :py:class:`~core.accessor_dt.DatetimeAccessor` (:pull:`3935`). This
feature requires cftime version 1.1.0 or greater. By
`Spencer Clark `_.
+- For the netCDF3 backend, added dtype coercions for unsigned integer types
+  (:issue:`4014`, :pull:`4018`).
+  By `Yunus Sevinchan `_.
+- :py:meth:`map_blocks` now accepts a ``template`` kwarg. This allows use cases
+ where the result of a computation could not be inferred automatically.
+ By `Deepak Cherian `_
+- :py:meth:`map_blocks` can now handle dask-backed xarray objects in ``args``. (:pull:`3818`)
+ By `Deepak Cherian `_
+- Add keyword ``decode_timedelta`` to :py:func:`xarray.open_dataset`
+  (as well as :py:func:`xarray.open_dataarray` and :py:func:`xarray.decode_cf`),
+  which allows the decoding of timedeltas to be enabled or disabled
+  independently of time decoding (:issue:`1621`).
+  By `Aureliana Barghini `_.
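+
+A minimal sketch of the dict-returning ``argmin``/``argmax`` behavior noted
+earlier in this list (array values invented for illustration):
+
+.. ipython:: python
+
+    import numpy as np
+    import xarray as xr
+
+    da = xr.DataArray(np.array([[3, 1], [2, 0]]), dims=("x", "y"))
+    idx = da.argmin(dim=["x", "y"])  # dict with one index per dimension
+    idx
+    da.isel(idx)  # the minimum value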
+
+Enhancements
+~~~~~~~~~~~~
+- Performance improvement of :py:meth:`DataArray.interp` and :py:func:`Dataset.interp`.
+  We now perform independent interpolation sequentially rather than interpolating in
+  one large multidimensional space (:issue:`2223`).
+ By `Keisuke Fujii `_.
+- :py:meth:`DataArray.interp` now supports interpolation over chunked dimensions (:pull:`4155`). By `Alexandre Poux `_.
+- Major performance improvement for :py:meth:`Dataset.from_dataframe` when the
+ dataframe has a MultiIndex (:pull:`4184`).
+ By `Stephan Hoyer `_.
+- :py:meth:`DataArray.reset_index` and :py:meth:`Dataset.reset_index` now keep
+  coordinate attributes (:pull:`4103`). By `Oriol Abril `_.
+- Axes kwargs such as ``facecolor`` can now be passed to :py:meth:`DataArray.plot` in ``subplot_kws``.
+ This works for both single axes plots and FacetGrid plots.
+ By `Raphael Dussin `_.
+- Array items with long string reprs are now limited to a
+ reasonable width (:pull:`3900`)
+ By `Maximilian Roos `_
+- Large arrays whose numpy reprs would have greater than 40 lines are now
+ limited to a reasonable length.
+ (:pull:`3905`)
+ By `Maximilian Roos `_
Bug fixes
~~~~~~~~~
-- Fix wrong order in converting a ``pd.Series`` with a MultiIndex to ``DataArray``. (:issue:`3951`)
+- Fix errors combining attrs in :py:func:`open_mfdataset` (:issue:`4009`, :pull:`4173`).
+  By `John Omotani `_.
+- If groupby receives a ``DataArray`` with ``name=None``, a default name is now assigned (:issue:`158`).
+  By `Phil Butcher `_.
+- Support dark mode in VS Code (:issue:`4024`)
By `Keisuke Fujii `_.
+- Fix bug when converting multiindexed Pandas objects to sparse xarray objects. (:issue:`4019`)
+ By `Deepak Cherian `_.
+- ``ValueError`` is raised when ``fill_value`` is not a scalar in :py:meth:`full_like` (:issue:`3977`).
+ By `Huite Bootsma `_.
+- Fix wrong order in converting a ``pd.Series`` with a MultiIndex to ``DataArray``.
+ (:issue:`3951`, :issue:`4186`)
+ By `Keisuke Fujii `_ and `Stephan Hoyer `_.
- Fix renaming of coords when one or more stacked coords is not in
sorted order during stack+groupby+apply operations. (:issue:`3287`,
:pull:`3906`) By `Spencer Hill `_
- Fix a regression where deleting a coordinate from a copied :py:class:`DataArray`
- can affect the original :py:class:`Dataarray`. (:issue:`3899`, :pull:`3871`)
+ can affect the original :py:class:`DataArray`. (:issue:`3899`, :pull:`3871`)
By `Todd Jennings `_
- Fix :py:class:`~xarray.plot.FacetGrid` plots with a single contour. (:issue:`3569`, :pull:`3915`).
By `Deepak Cherian `_
@@ -78,15 +555,30 @@ Bug fixes
By `Deepak Cherian `_
- Fix :py:class:`~xarray.plot.FacetGrid` when ``vmin == vmax``. (:issue:`3734`)
By `Deepak Cherian `_
+- Fix plotting when ``levels`` is a scalar and ``norm`` is provided. (:issue:`3735`)
+ By `Deepak Cherian `_
- Fix bug where plotting line plots with 2D coordinates depended on dimension
order. (:issue:`3933`)
By `Tom Nicholas `_.
- Fix ``RasterioDeprecationWarning`` when using a ``vrt`` in ``open_rasterio``. (:issue:`3964`)
By `Taher Chegini `_.
+- Fix ``AttributeError`` on displaying a :py:class:`Variable`
+ in a notebook context. (:issue:`3972`, :pull:`3973`)
+ By `Ian Castleden `_.
- Fix bug causing :py:meth:`DataArray.interpolate_na` to always drop attributes,
and added `keep_attrs` argument. (:issue:`3968`)
By `Tom Nicholas `_.
-
+- Fix bug in time parsing failing to fall back to cftime. This was causing time
+  variables with a time unit of ``'msecs'`` to fail to parse (:pull:`3998`).
+  By `Ryan May `_.
+- Fix weighted mean when passing boolean weights (:issue:`4074`).
+ By `Mathias Hauser `_.
+- Fix html repr in untrusted notebooks: fall back to plain text repr (:pull:`4053`).
+ By `Benoit Bovy `_.
+- Fix :py:meth:`DataArray.to_unstacked_dataset` for single-dimension variables. (:issue:`4049`)
+ By `Deepak Cherian `_
+- Fix :py:func:`open_rasterio` for ``WarpedVRT`` with specified ``src_crs``. (:pull:`4104`)
+ By `Dave Cole `_.
Documentation
~~~~~~~~~~~~~
@@ -108,18 +600,33 @@ Documentation
of ``kwargs`` in :py:meth:`Dataset.interp` and :py:meth:`DataArray.interp`
for 1-d and n-d interpolation (:pull:`3956`).
By `Matthias Riße `_.
+- Apply ``black`` to all the code in the documentation (:pull:`4012`)
+ By `Justus Magin `_.
+- Narrative documentation now describes :py:meth:`map_blocks`: :ref:`dask.automatic-parallelization`.
+ By `Deepak Cherian