Releases: CouncilDataProject/cdp-backend

Prep for Matter Text Extraction

16 Jun 23:35
d6d71d1

Reduced Complexity Pipeline

01 Jun 18:10
703d7f1

What's Changed

  • feature/reduce-event-gather-complexity by @evamaxfield in #232
  • Remove Unnecessary Re-Encode If Video Is Already H264 by @whargrove in #234
  • Use requests stream and shutil.copyfileobj to constrain memory usage during resource copy by @whargrove in #236
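
The memory-constrained copy from #236 can be pictured roughly as follows. This is a minimal sketch, not the code from the pull request: the names `copy_stream` and `download_resource` and the 1 MiB chunk size are assumptions made for illustration. Only the use of `requests` streaming and `shutil.copyfileobj` comes from the change title.

```python
import shutil
from typing import BinaryIO

CHUNK_SIZE = 1024 * 1024  # assumed buffer size (1 MiB), not taken from the PR


def copy_stream(src: BinaryIO, dst_path: str, chunk_size: int = CHUNK_SIZE) -> str:
    """Copy a readable binary stream to dst_path one chunk at a time."""
    with open(dst_path, "wb") as dst:
        # copyfileobj reads at most chunk_size bytes per iteration, so peak
        # memory stays near chunk_size regardless of the total file size.
        shutil.copyfileobj(src, dst, chunk_size)
    return dst_path


def download_resource(url: str, dst_path: str) -> str:
    """Hypothetical helper: stream a remote resource to disk."""
    import requests  # third-party; imported lazily so copy_stream works without it

    # stream=True defers fetching the body until it is read from response.raw.
    with requests.get(url, stream=True) as response:
        response.raise_for_status()
        return copy_stream(response.raw, dst_path)
```

Without streaming, the whole response body would be held in memory before being written, which is costly for multi-gigabyte meeting videos. Note that `response.raw` yields bytes as they arrive over the wire, so any content encoding applied by the server is not decoded in this sketch.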

Full Changelog: v4.0.9...v4.1.0

Use PyPI Faster Whisper Release, Better Word Level Timestamps, Other Minor Bugfixes

25 Mar 23:15
ad09d9c

Pull in faster-whisper directly from PyPI. The new faster-whisper release also pulls in the base library's changes that allow word-level timestamps (we no longer have to linearly interpolate!). Finally, this release attempts to fix a JSON decode error during config reading.
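
For context on what "linearly interpolate" means here: when a model only returns start and end times for a whole segment, per-word times have to be estimated. The sketch below shows one simple way to do that by spreading the segment duration evenly across its words. It illustrates the general idea only and is not the code cdp-backend previously used; the function name `interpolate_word_times` is hypothetical.

```python
from typing import List, Tuple


def interpolate_word_times(
    text: str, start: float, end: float
) -> List[Tuple[str, float, float]]:
    """Estimate (word, start, end) by splitting the segment evenly per word."""
    words = text.split()
    if not words:
        return []
    step = (end - start) / len(words)  # every word gets the same share of time
    return [
        (word, start + i * step, start + (i + 1) * step)
        for i, word in enumerate(words)
    ]


for word, w_start, w_end in interpolate_word_times("the motion is approved", 10.0, 12.0):
    print(f"{word}: {w_start:.2f}s to {w_end:.2f}s")
```

Estimates like these drift whenever a speaker pauses or draws out a word. In faster-whisper, word-level timestamps can instead be requested from the model with the `word_timestamps=True` argument to `WhisperModel.transcribe`, which makes this estimation step unnecessary.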

Full Changelog: v4.0.8...v4.0.9

Try to handle missing www video uris

24 Mar 21:48
ad09d9c
Pre-release

Full Changelog: v4.0.9.rc0...v4.0.9.rc1

Faster Whisper from PyPI, Better Word Timestamps, Fix JSON Load

24 Mar 17:03
0b860f5

Pull in faster-whisper directly from PyPI. The new faster-whisper release also pulls in the base library's changes that allow word-level timestamps (we no longer have to linearly interpolate!). Finally, this release attempts to fix a JSON decode error during config reading.

Full Changelog: v4.0.0...v4.0.9.rc0

Google Speech-to-Text Out, Whisper In

21 Feb 06:52

CouncilDataProject cdp-backend v4.0.0

⚠️ ⚠️ This is a major breaking release. Instance maintainers should update the instance with just update-from-cookiecutter. ⚠️ ⚠️

You should re-read the SETUP/README.md document, as some minor new configuration is required. Specifically, the new PERSONAL_ACCESS_TOKEN and Quota Increase request should be the only things that need to be updated for existing instances.

You should also lower how often your CRON event gather runs prior to running just update-from-cookiecutter. All of the instances maintained by the CDP Core Team will be lowered to running only once per day.


Council Data Project is a backend, frontend, and cookiecutter deployment for creating a whole database, storage system, and website, for archiving, exploring, and tracking municipal council action.

This library, cdp-backend, maintains the pipelines, database models, infrastructure configuration, etc.

v4.0.0

There are two main changes for this release.

  1. We are swapping out Google Speech-to-Text for OpenAI's Whisper.

Specifically, we are using a forked version called faster-whisper. This new speech-to-text model performs much better (ranging from ~3.6% word-error-rate to ~9% word-error-rate on long audio files).

To use this new model efficiently, we need access to a GPU. Since GitHub Actions do not have GPUs available, we are using a system that spins up a Google Cloud Compute Engine instance, connects to it, runs our job, and then tears it down, all within a single GitHub Action workflow. Multiple tests suggest this should reduce both cost and processing time; with this release we will do more testing to get a better estimate.

  2. We have switched from MIT to MPLv2 License.

Unless you are trying to fork our code and take it private, this won't affect you.

Bugfix for Trimmed Videos During Parallel Processing

01 Jan 21:25
aa8ff15

In v3.2.10, we introduced video trimming during processing for cases where users want to process only part of a larger video. That functionality broke when processing events in parallel because all trimmed sections were stored under the same file name. This release fixes that behavior by giving the temporary file for the clipped portion a random, UUID-based name.
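
The collision and its fix can be illustrated with a small sketch. The helper name `clipped_video_path`, the `-clipped` naming pattern, and the `.mp4` fallback are assumptions for illustration, not the library's actual naming scheme.

```python
from pathlib import Path
from uuid import uuid4


def clipped_video_path(source: str, work_dir: str = ".") -> Path:
    """Return a unique path for the trimmed copy of source (hypothetical helper)."""
    suffix = Path(source).suffix or ".mp4"  # assumed default extension
    # uuid4() yields a fresh random identifier on every call, so parallel
    # workers trimming different events never write to the same file.
    return Path(work_dir) / f"{uuid4()}-clipped{suffix}"


a = clipped_video_path("council-meeting.mp4")
b = clipped_video_path("council-meeting.mp4")
assert a != b  # same source, two distinct temporary files
```

With a fixed name, two events processed at the same time would overwrite each other's clip; a per-call UUID removes that shared state without requiring any coordination between workers.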

Full Changelog: v3.2.10...v3.2.11

Trimming Video Prior to Processing

06 Dec 21:30
b00f378

What's Changed

  • Add transcription range fields to database and ingestion models, add … by @chrisjkhan in #221

New Contributors

Additionally I would like to thank: @dphoria and @smai-f

Full Changelog: v3.2.8...v3.2.9

Event Index Chunk Upload Fix

04 Oct 23:15
5bb5488

About a month ago, @phildini reported that the Alameda instance event index was missing a lot of n-grams, and @conantp later filed a second report. @conantp investigated the issue and found that we had a bug in our index chunk upload code, which ultimately meant that parts of the index were simply never updated. This was a serious bug, and many thanks go to @conantp for investigating, finding, fixing, and testing the changes needed.

@conantp has already run an index generation and upload to the Asheville instance: https://sunshine-request.github.io/cdp-asheville/#/events

Full Changelog: v3.2.6...v3.2.7

Further fix infra deployment due to bad import management

03 Oct 18:38

Further fixes the bad library import to protect infrastructure deployments.

Full Changelog: v3.2.5...v3.2.6