Releases: google/weather-tools

v0.3.2

27 Jun 16:53
a6a9c06

weather-mv is now much faster and can ingest Cloud-Optimized GeoTIFFs (COGs). weather-dl now supports MARS syntax in JSON config files and caps the maximum number of workers.

We're happy to welcome @deepgabani8 to the weather-tools dev team!

Current Status

weather-dl: Fixes and parser system improvements

  • Fixed error while parsing new-line separated date-values.
  • JSON config files now support MARS syntax.
  • New syntax supported: users can now specify MARS range syntax in reverse order as well (e.g. 2020-01-01/to/2018-01-01/by/-1).
  • Prevent exhaustion of quotas: with the downloader's current approach, the maximum number of workers is now capped at N, i.e. the number of possible simultaneous requests plus a fudge factor.
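
To illustrate the reverse range syntax mentioned above, here is a minimal sketch of how a START/to/END/by/STEP date expression could be expanded. The function name expand_mars_date_range is hypothetical and this is not weather-dl's actual parser; it only assumes ISO-formatted dates and an integer step in days.

```python
from datetime import date, timedelta
from typing import List


def expand_mars_date_range(expr: str) -> List[str]:
    """Expand 'START/to/END[/by/STEP]' into ISO dates (illustrative only)."""
    parts = expr.split("/")
    if len(parts) not in (3, 5) or parts[1] != "to":
        raise ValueError(f"not a range expression: {expr!r}")
    start = date.fromisoformat(parts[0])
    end = date.fromisoformat(parts[2])

    step = 1  # assumed default when no '/by/STEP' suffix is given
    if len(parts) == 5:
        if parts[3] != "by":
            raise ValueError(f"expected 'by' in: {expr!r}")
        step = int(parts[4])
    if step == 0:
        raise ValueError("step must be non-zero")
    # A forward range needs a positive step, a reverse range a negative one.
    if (end - start).days * step < 0:
        raise ValueError("step direction does not match the range")

    dates = []
    current = start
    while (current <= end) if step > 0 else (current >= end):
        dates.append(current.isoformat())
        current += timedelta(days=step)
    return dates


print(expand_mars_date_range("2020-01-03/to/2020-01-01/by/-1"))
```

Rejecting a step whose sign contradicts the range direction is a design choice of this sketch; a real parser might instead return an empty selection.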

weather-mv: Performance improvements and support for COGs ingestion

  • Substantial performance improvement!
  • Added a flag to control in-memory copying of the dataset. By default, the dataset is opened in memory; users can disable this by passing the --disable_in_memory_copy flag.
  • Added validation to alert the user earlier that the BigQuery table and temp location (cloud bucket) need to be in the same region. Users can skip this validation by passing the -s, --skip-region-validation flag.
  • Added support for ingestion of COGs into BigQuery.
  • Updated the tool's documentation (README.md) to remove duplicate flags from the sample examples.

General

  • Fixed typo in contribution guide (CONTRIBUTING.md).

Full Changelog: v0.3.1...v0.3.2

v0.3.1

29 Apr 19:07
c2cb9ec

Improvements to the weather-dl parser.

  • Fixed a bug where numbers with leading zeros were not parsed (useful for date ranges).
  • Corrected an additional issue with singleton partition values (e.g. getting only one day of every month).
  • New syntax added: users can now specify day=all to get all the days in a month.
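
As a rough illustration of the day=all behavior and of values with leading zeros, the sketch below expands a day specification for a given month. The helper expand_days and the "/"-separated value format are assumptions made for this example, not the parser's actual implementation.

```python
import calendar
from typing import List


def expand_days(year: int, month: int, day_spec: str) -> List[str]:
    """Return zero-padded day strings for 'all' or a '/'-separated list."""
    if day_spec == "all":
        # monthrange returns (weekday of first day, number of days in month).
        _, days_in_month = calendar.monthrange(year, month)
        return [f"{d:02d}" for d in range(1, days_in_month + 1)]
    # int() accepts leading zeros, so '01' and '1' both normalize to '01'.
    return [f"{int(d):02d}" for d in day_spec.split("/")]


print(len(expand_days(2020, 2, "all")))  # leap-year February
print(expand_days(2020, 1, "01/15"))
```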

What's Changed

  • New Syntax for download configs: day=all by @alxmrs in #150
  • Proper handling of singleton partition dimensions. by @alxmrs in #151
  • Incrementing weather-dl version to cover recent parser changes. by @alxmrs in #153

Full Changelog: v0.3.0...v0.3.1

v0.3.0

24 Apr 19:40
c574e54

The weather splitter has a new API that allows for partitioning weather data by any dimension (we intentionally exclude lat/lngs). weather-dl now has a simpler, more Pythonic interface for expressing target paths. The weather-mv tool now supports dry runs and BigQuery geopoints.

We're happy to welcome @mahrsee1997 and @ksic8 to the weather-tools dev team!

Current Status

weather-dl: Fixes and DSL usability improvements

  • Specifying templates is much simpler. Only target_path is needed, and we fully support Python string formatting syntax.
  • A significant error was fixed, and now downloads have better skipping and retry logic.
  • Log ergonomics were improved by adding timestamps and removing needless warnings (thanks, @pbattaglia!).
  • Internal code refactors were included to improve maintainability.
  • Data source clients (currently only ECMWF) now include important license information regarding terms of data use.
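
To show what Python string formatting in a target_path can look like, here is a small sketch. The bucket name, the placeholder names, and the format_target helper are hypothetical; how weather-dl maps partition values onto placeholders is not described in these notes.

```python
# Hypothetical template: named fields with standard format specs.
target_path = "gs://example-bucket/era5/{year}/{month:02d}/{day:02d}.nc"


def format_target(template: str, **partition: int) -> str:
    """Fill a target path template using str.format (illustrative helper)."""
    return template.format(**partition)


print(format_target(target_path, year=2020, month=1, day=5))
```

Because the template is plain str.format syntax, padding and alignment specs such as {month:02d} work without any extra templating layer.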

weather-mv: Schema & usage improvements

  • The default schemas were improved to include BigQuery Geography-type columns. Now, lat/lngs will be represented as POINTs.
  • The weather mover now has dry runs! Users will be able to preview their data ingestion into BigQuery before making use of infrastructure.
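
For readers unfamiliar with geography points, the sketch below builds a WKT POINT string from a latitude/longitude pair, with longitude first as WKT expects. The to_wkt_point helper and the wrapping of 0..360 longitudes into -180..180 are assumptions for illustration; the release notes do not say how weather-mv constructs its POINT values.

```python
def to_wkt_point(lat: float, lng: float) -> str:
    """Return 'POINT(lng lat)' in WKT order (x = longitude, y = latitude)."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    # Assumption: wrap longitudes given on a 0..360 grid into -180..180.
    wrapped = ((lng + 180.0) % 360.0) - 180.0
    return f"POINT({wrapped} {lat})"


print(to_wkt_point(45.0, 270.0))
```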

weather-sp: Flexible splits

  • A new version of the splitter was introduced to allow for flexible splits of weather data: you can now divide Grib and NetCDF data by any dimension except latitude and longitude (great work, @uhager!).
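
Conceptually, splitting by arbitrary dimensions amounts to grouping data by the values of the chosen dimensions. The sketch below does this over plain dictionaries; split_by, the record layout, and the dimension names are hypothetical, and the real splitter operates on Grib and NetCDF files rather than in-memory records.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Sequence, Tuple

# Dimensions that may not be used as split keys, mirroring the note above.
EXCLUDED_DIMS = {"latitude", "longitude"}


def split_by(records: Iterable[dict], dims: Sequence[str]) -> Dict[Tuple, List[dict]]:
    """Group records by the values of the given dimensions."""
    excluded = EXCLUDED_DIMS.intersection(dims)
    if excluded:
        raise ValueError(f"cannot split by: {sorted(excluded)}")
    groups = defaultdict(list)
    for record in records:
        key = tuple(record[d] for d in dims)
        groups[key].append(record)
    return dict(groups)


records = [
    {"variable": "t", "level": 500, "value": 251.3},
    {"variable": "t", "level": 850, "value": 271.9},
    {"variable": "u", "level": 500, "value": 12.4},
]
print(sorted(split_by(records, ["variable"]).keys()))
```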

General

  • Pip install instructions include debugging advice for long installs.
  • We've removed open meetings from our contributing guide due to low attendance.

Full Changelog: v0.2.2...v0.3.0

v0.2.2

29 Mar 00:59
c7a6c4f

A re-release of v0.2.1.

v0.2.1

29 Mar 00:12
c7a6c4f

Improvements and bugfixes for all weather tools. weather-dl is much faster and more robust. weather-mv now uses a pluggable infrastructure, which makes iterations faster. weather-sp is mid-transition to arbitrary splits.

Thanks to our new OSS contributors, @pranay101 and @pbattaglia!

Current Status

weather-dl: Major fixes

  • This release introduces a fix to #98, which makes the downloader faster and more robust. With this change, there is no need to override the autoscaling algorithm, so it now has fewer moving parts.
  • Uploads have better retry logic. Users should experience fewer crashes from network errors in the pipeline.
  • The downloader structure has been refactored to be more testable.
  • Examples of JSON configs were added.
  • Addressed a critical NameError bug that was introduced during a refactor.

weather-mv: Refactor

  • The mover has been refactored to use a pluggable infrastructure. This makes it easier to develop local runs, and to write weather data to other sources besides BQ.

weather-sp: Skipping logic

  • A non-API-changing feature has been added to the splitter: data that has already been split is now skipped. Users can override this behavior with -f, --force.
  • The documentation for the splitter has been improved.
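
The skipping behavior described above can be pictured as a simple existence check with a force override. This is only a sketch: should_split and its exists parameter are hypothetical names, and the actual splitter checks cloud or local filesystems through its own I/O layer.

```python
import os
from typing import Callable, Sequence


def should_split(
    output_paths: Sequence[str],
    force: bool = False,
    exists: Callable[[str], bool] = os.path.exists,
) -> bool:
    """Decide whether an input file still needs to be split."""
    if force or not output_paths:
        # Forced runs always split; unknown outputs cannot be skipped safely.
        return True
    # Skip only when every expected output is already present.
    return not all(exists(path) for path in output_paths)


already_written = {"out/t.nc", "out/u.nc"}
print(should_split(["out/t.nc", "out/u.nc"], exists=already_written.__contains__))
```

Passing the existence check as a parameter keeps the decision logic testable without touching a real filesystem.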

General

  • The release process now produces smaller binaries (we're now ignoring test data).

What's Changed

  • Added example JSON config by @pranay101 in #52
  • Refactored weather-mv to work with pluggable Data Sinks. by @alxmrs in #101
  • Update weather-sp's templating system to allow users to specify level and shortname. by @alxmrs in #105
  • Upload to cloud is robust to socket timeout errors. by @alxmrs in #110
  • Fix wrong output file example in weather-sp readme. by @uhager in #111
  • Added skipping logic to weather-sp by @alxmrs in #108
  • Lower default num-requests for MARS to make it more robust. by @alxmrs in #113
  • New data-oriented task distribution strategy. by @alxmrs in #116
  • Downloader refactor: extracted out partitioning; tested pipeline args. by @alxmrs in #117
  • Fix minor bug: main session needs to be saved. by @alxmrs in #120
  • (#123) Fixed beam not being able to access global namespace + minor related bug. by @pbattaglia in #124
  • Shrinking the size of the package release artifacts by @alxmrs in #122

Full Changelog: v0.2.0...v0.2.1

v0.2.0

28 Jan 22:29
2d3df59

New version of weather-sp. Fixes and improvements to weather-dl and weather-mv.

Thanks to our volunteer open source contributors and Google 20%ers!

Current State

All three tools are still in their beta and alpha stages. In this release, the stability of weather-mv was especially improved. We've been able to execute streaming ingestion of Grib data into BigQuery. Users of weather-sp will now have greater control to express the output location of split files through a file pattern template.

weather-dl: Minor fixes

  • We fixed GCS timeout issues experienced intermittently.
  • Issue with mandatory partition keys was fixed.

weather-mv: Major fixes for tool stability

  • Grib support added.
  • Row extraction is faster by loading weather data into memory.
  • Log messages were improved.
  • Writes to BigQuery will use the most efficient method (streaming vs file upload).
  • XArray Open step is made generic.
  • Several fixes were introduced.
    • JSON serialization fixes.
    • The Dataflow environment will now have ecCodes installed so we can run cfgrib.
    • Tarballs are smaller / faster to upload to Dataflow (or another Beam runner).
    • BigQuery write errors were fixed.

weather-sp: New version

The splitter now supports flexible specification of output files.

General project improvements

  • Documentation was groomed.
  • Windows developer pathway was documented.
  • Fixed developer scripts (we can now better dev-test different branches of the project) and slow CI.
  • Announced open developer meetings.

What's Changed

  • weather-dl: Fix GCS timeout issues the pipelines intermittently experiences. by @alxmrs in #72
  • Improve grib file processing speed by @pramodg in #74
  • Default behavior is better by @lakshmanok in #77
  • Updating script to use new package name by @CillianFn in #79
  • Better progress logs for weather-mv. by @alxmrs in #82
  • weather-mv fix: Serializing all numpy float and int types to JSON. by @alxmrs in #83
  • Documented windows workaround. by @alxmrs in #85
  • Updated weather-mv install process to setup ecCodes on worker machine. by @alxmrs in #86
  • Groomed documentation by @alxmrs in #88
  • Coercing timedelta to float by @alxmrs in #89
  • weather-mv: Allow users to pass in keyword arguments to xarray.open_dataset by @alxmrs in #87
  • weather-splitter: allow for more flexible output files by @uhager in #65
  • Fix slow test runs by @CillianFn in #92
  • Add check for partition_keys when using append_date_dirs by @CillianFn in #90
  • Exclude test data from tarball by @CillianFn in #93
  • weather-mv – Fixed error writing to BigQuery: Excluding non-coordinate indexes if they don't appear in the Schema by @alxmrs in #95
  • Updating tool versions in prep for release. by @alxmrs in #97
  • Announcing open developer meetings. by @alxmrs in #96

Full Changelog: v0.1.1...v0.2.0

Hotfix for issue found in `weather-mv`.

11 Jan 01:42
e4a00f7

What's Changed

  • weather-mv: Fixed variable referenced before assignment. by @alxmrs in #71

Full Changelog: v0.1.0...v0.1.1

Initial Release of weather-tools

11 Jan 01:16
3bae7b3

The inaugural release of weather-tools.

Current State

Currently, there are three tools in development: weather-dl, weather-mv, and weather-sp. The first tool is in its beta stage, and the latter two are in alpha. Since this is the start of the project's changelog, I will now quickly summarize the features of each tool:

weather-dl: the Weather Downloader

Weather Downloader ingests weather data to cloud buckets.

  • Downloads weather data from ECMWF through their MARS and CDS APIs.
  • Supports pipeline dry runs.
  • Downloads are filesystem agnostic. Data can be ingested to GCS, S3, Azure Blobstore, or a local filesystem.
  • Manifests of downloads are recorded in Firebase.
  • A ConfigParser-based DSL lets users select data to download and control how data is sharded in a general manner.
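
To give a feel for a ConfigParser-based download config, the sketch below parses an INI-style config and shards a selection by a list of partition keys. The section names, key names, and the partitions helper are illustrative assumptions; they are not guaranteed to match the config schema of this release.

```python
import configparser
import itertools
from typing import Dict, Iterator, List

# Hypothetical config: '/'-separated values, sharded by year and month.
EXAMPLE_CONFIG = """
[parameters]
partition_keys =
    year
    month

[selection]
year = 2019/2020
month = 01/02/03
param = 2t
"""


def partitions(config_text: str) -> Iterator[Dict[str, List[str]]]:
    """Yield one selection per combination of partition-key values."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    keys = parser["parameters"]["partition_keys"].split()
    selection = {k: v.split("/") for k, v in parser["selection"].items()}
    for combo in itertools.product(*(selection[k] for k in keys)):
        shard = dict(selection)
        # Pin each partition key to a single value for this shard.
        shard.update({k: [v] for k, v in zip(keys, combo)})
        yield shard


for shard in partitions(EXAMPLE_CONFIG):
    print(shard)
```

Each yielded dictionary would correspond to one download request in this sketch, which is one way to read "control how data is sharded".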

weather-mv: the Weather Mover

Weather Mover loads weather data from cloud storage into Google BigQuery.

  • Weather data from any filesystem can be uploaded in batch to Google BigQuery.
  • Both NetCDF and Grib data are explicitly supported. Later, any XArray-readable dataset will be supported.
  • All rows include an "import time" to keep track of when the data was ingested.
  • Weather data can be filtered by geographic area or by variable type.
  • Supports inference of the BigQuery schema from parts of the dataset.
  • Streaming pipelines for ingesting real-time data into BigQuery are supported.

weather-sp: the Weather Splitter

Splits NetCDF and Grib files into several files by variable.

  • NetCDF and Grib data splitting is supported.
  • Grib data is split by variable and leveltype.
  • Buckets with mixtures of data types (Grib and NetCDF) can be processed at once.
  • The root of the output path is computed for you; users have control over the parent directory.
  • Dry-runs of splits are supported.

Recent Changes

  • Adding back an example config. by @alxmrs in #30
  • Handle NaNs in data by @pramodg in #33
  • Add utf-8 encoding to file read in setup by @CillianFn in #36
  • Bump urllib3 from 1.25.11 to 1.26.5 in /weather_dl by @dependabot in #37
  • Support un-indexed / single valued coordinates. by @pramodg in #39
  • Set up empty dataset, not table by @lakshmanok in #41
  • Docs fix - typo & stale links by @CillianFn in #44
  • Basic support for grib files. by @pramodg in #40
  • Test example configs by @CillianFn in #56
  • Read the docs config by @CillianFn in #61
  • weather-mv: Now using Streaming Inserts into BQ by @alxmrs in #62
  • weather-mv: Implemented streaming import of data into BigQuery. by @alxmrs in #58
  • Added script to help contributors test each other's updates to weather-tools. by @alxmrs in #63
  • Github action to publish package by @saveriogzz in #31
  • Updated python package name to google-weather-tools. by @alxmrs in #67
  • Updated the standard example configs to use Reanalyses instead of Ensemble Means. by @alxmrs in #66
  • Setting initial versions of each weather-tool. by @alxmrs in #68

Full Changelog: https://github.com/google/weather-tools/commits/v0.1.0