Shrodingers/destination databricks dbt (#1)
* octavia-cli: fix workspace not having anonymous_data_collection property (#13869)

* Update connection update calls to use central utility to ensure connection update has all data (#13564)

* Update connection updates with build update utility
* Add buildConnectionUpdate utility
* Update components that update the connection to use utility when necessary

* Use connection name when saving connection from replication view to prevent override from refreshed catalog

* Improve connection check on ReplicationView onSubmit function

* Display connection state in connection setting page (#13394)

* Display Connection State in Setting page

* memoize callback

* rendering and confirmation

* setState API

* Input validation

* remove JSON step

* rename apiMethod to `updateState`

* test and adjust route

* skip if sync is running

* prevent state update when sync is running

* code editor component

* errors fixed

* scss style

* make linter happy

* Back to monaco editor

* Remove ability to edit state

* Adjust FE code

* Fix CSS problem

* Update airbyte-webapp/src/locales/en.json

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* just use PRE to render state for now

Co-authored-by: Tim Roes <tim@airbyte.io>
Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* update api for per stream (#13835)

* Update airbyte-protocol.md (#13892)

* Update airbyte-protocol.md

* Fix typo

* Fix prose

* Add protocol reviewers for protocol documentation

* Remove duplicate

* Edited Amplitude, Mailchimp, and Zendesk Support docs (#13897)

* deleting SUMMARY.md since we don't need it for docusaurus builds (#13901)

* Do not hide unexpected errors in the check connection (#13903)

* Do not hide unexpected errors in the check connection

* Fix test

* Common code to deserialize a state message in the new format (#13772)

* Common code to deserialize a state message in the new format

* PR comments and type changed to typed

* Format

* Add StateType and StateWrapper objects to the model

* Use state wrapper instead of Either

* Switch to optional

* PR comments

* Support array legacy state

* format

Co-authored-by: Jimmy Ma <jimmy@airbyte.io>

* 🐛 Source Amazon Seller Partner: handle start date for financial stream (#13633)

* start and end date for financial stream should not be more than 180 days apart

* improve unit tests

* make changes to start date for finance stream

* update tests

* lint changes

* update version to 0.2.22 for source-amazon-seller-partner
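
The 180-day limit on the financial stream's date window can be illustrated with a minimal sketch. The helper name `clamp_window` and the choice to pull the end date back (rather than move the start date forward) are assumptions made for illustration; the connector's actual code is not shown in this commit message.

```python
from datetime import datetime, timedelta
from typing import Tuple

# Maximum span between start and end, taken from the commit note above.
MAX_WINDOW = timedelta(days=180)


def clamp_window(start: datetime, end: datetime) -> Tuple[datetime, datetime]:
    """Return (start, end) with end pulled back so the span is at most 180 days.

    Hypothetical helper: the real connector may adjust the start date instead.
    """
    if end < start:
        raise ValueError("end must not precede start")
    return start, min(end, start + MAX_WINDOW)
```

A window that is already 180 days or shorter is returned unchanged.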

* Normalization: Fix incorrect jinja2 macro `json_extract_array` call (#13894)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Docs: fixed the broken links (#13915)

* 0.2.5 -> 0.2.6 (#13924)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* 13546 Fix integration tests source-postgres Mac OS (#13872)

* 13546 Fix integration tests source-postgres Mac OS

* 13548 Fixed integration tests source-tidb Mac OS (#13927)

* Source MsSql : incr ver to include changes #13854 (#13887)

* incr version

* put PR id

* docker ver

* connectors that published (#13932)

* Deprecate PART_SIZE_MB in connectors using S3/GCS storage (#13753)

* Removed part_size from connectors that use StreamTransferManager

* fixed S3DestinationConfigTest

* fixed S3JsonlFormatConfigTest

* update changelog and bump version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* update changelog and bump version for Redshift and Snowflake destinations

* auto-bump connector version

* fix GCS staging test

* fix GCS staging test

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Reverted changes in SshBastionContainer (#13934)

* 🎉 New Source Dockerhub (#13931)

* init

* implement working source + tests

* add docs

* add docs

* fix bad comments

* Update airbyte-integrations/connectors/source-dockerhub/acceptance-test-config.yml

* Update airbyte-integrations/connectors/source-dockerhub/Dockerfile

* Update airbyte-integrations/connectors/source-dockerhub/.dockerignore

* Apply suggestions from code review

* Update docs/integrations/sources/dockerhub.md

* Update airbyte-integrations/connectors/source-dockerhub/integration_tests/acceptance.py

Co-authored-by: George Claireaux <george@airbyte.io>

* address @Phlair's feedback

* address @Phlair's feedback

* each record is now a Docker image rather than response page

* format

* fix unit tests

* fix acceptance tests

* add icon, definition and generate seed spec

* add requests to requirements

Co-authored-by: sw-yx <shawnthe1@gmail.com>

* commented out non-relevant tests (#13940)

* Bump Airbyte version from 0.39.20-alpha to 0.39.21-alpha (#13938)

Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>

* newaction (#13942)

* remove test action (#13944)

* 🎉Source-mysql: aligned datatype test (#13945)

* [13607] source-mysql: aligned datatype tests for regular and CDC ways + added CHAR fix to CDC processing

* #13958 Source Stripe: fix configured catalogs (#13959)

* 🐛 Source: Typeform - Update schema for Responses stream (#13935)

* Upd responses schema

* Upd docs

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* :window: Updated email invitation flow that enables invited users to set name and create password (#12788)

* First pass accepting email link invitation
* Update Auth service with signInWithEmailLink calls
* Add AcceptEmailInvite component
* Update FirebaseActionRoute to handle sign in mode
* Rename ResetPasswordAction to FirebaseActionRoute

* Add create password step to AcceptEmailInvite component

* Remove continueURL from invite fetch

* Update accept email invite for user to enter both email and password together

* Set name during email link signup

* Update AcceptEmailInvite to send name
* Add updateName to UserService
* Update AuthService to set name during sign up

* Remove steps from AcceptEmailInvite component
Remove setPassword from AuthService

* Add header and title to accept invite page

* Move invite error messages to en file

* For invite link pages, show login link instead of sign up

* Disable name update on sign in via email link

* Resend email invite when the invite link is expired

* Fix status message in accept email invite page

* Re-enable set user's name during sign up email invite

* Update signUpWithEmailLink so that sign up is successful even if we fail to update the user's name

* Update comments on GoogleAuthService signInWithEmailLink

* Add newsletter and accept terms checkboxes to accept email invite component
* Extract signup form from signup page
* Extract fields from signup form
* Update accept email invite component to use field components from signup form
* Ensure that sign up button is disabled until form is valid and security checkbox is checked

* Make error status text color in accept email link red

* Update workspace check in DefaultView so that user lands in workspace selector when there are no workspaces

* Add comment around continueUrl param usage in UserService

* Remove useless default case in GoogleAuthService

* Source Marketo: process fail during creation of an export job (#13930)

* #9322 source Marketo: process fail during creation of an export job

* #9322 source marketo: upd changelog

* #9322 source marketo: fix unit test

* #9322 source marketo: fix SATs

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* :window: :wrench: Add eslint rules for CSS modules (#13952)

* add eslint-plugin-css-modules rules

* Fixes:
- turn on eslint css modules rule as error
- remove unused styles

* add warning message if styled components is used

* Revert "add warning message if styled components is used"

This reverts commit 4e92b8b2110142bb679f15aeb034e377e0dcc69c.

* replace rule severity with words

* Update salesforce.md

Fixed broken link

* :window: 🔧 Add auto-fixable linting rules to webapp (#13462)

* Add new eslint rules that fit with our code style and downgrade rules to warn

* allowExpressions in fragment eslint rule

* Enable function-component-definition in eslint and fix styles

* Cleanup lint file

* Fix react/function-component-definition warnings manually

* Add more auto-fixable rules and fix

* Fix functions that require useless returns

* Update array-type rule to array-simple

* Fix eslint errors manually
disable assignmentExpression for arrays in prefer-destructuring rule

* Auto fix new linting issues after rebase

* Enhance /publish to allow for multiple connectors and parallel execution (#13864)

* start

* revert

* azblob

* bq

* bq denorm

* megapublish baaaabyyyy

* fix needs

* matrix connectors

* auto-bump connector version

* dont failfast and max parallel 5

* multi runno

* minor

* testing matrix agents

* name

* testing multi agents

* tmp fix

* new multi agents

* multi test

* tryy

* let's do this

* magico

* fix

* label test

* couple more connector bumps

* temp

* things

* check this

* lets gooo

* more connectors

* Delete TEMP-testing-command.yml

* auto-bump connector version

* added comment describing bash part

* running single thread

* catch sentry cli

* auto-bump connector version

* destinations

* + snowflake

* saved

* auto-bump connector version

* auto-bump connector version

* java source bumps

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* remove twice-defined methods

* label things

* revert action

* using the new test action

* point at action

* wrong tag on action

* update pool label

* update to use new ec2-github-runner fork

* this needs to be more generic than publisher

* change publish to run on pool

* add comment about runner-pool usage

* updated publish command docs for multi & parallel connector runs

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* unbump failed publish versions

* missed dockerfiles

* remove failed docs

* mssql fix

* overhauled the git comment output

* bumping a test connector that should work

* slight order switcheroo

* output connectors properly in first message

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Bump Airbyte version from 0.39.21-alpha to 0.39.22-alpha (#13979)

Co-authored-by: Phlair <Phlair@users.noreply.github.com>

* Parker/temporal cloud (#13243)

* switch to temporal cloud client for now

* format

* use client cert/key env secret instead of path to secret

* add TODO comments

* format

* add logging to debug timeout issue

* add more logging

* change workflow task timeout

* PR feedback: consolidate as much as possible, add missing javadoc

* fix acceptance test, needs to specify localhost

* add internal-use only comments

* format

* refactor to clean up TemporalClient and prepare it for future dependency injection framework

* remove extraneous log statements

* PR feedback

* fix test

* return isInitialized true in test

* 📄  Postgres source: fix CDC setup order in docs (#13949)

* postgres source: fix CDC setup order docs

* Update docs/integrations/sources/postgres.md

Co-authored-by: Liren Tu <tuliren@gmail.com>

* Per-stream state support for Postgres source (#13609)

* WIP Per-stream state support for Postgres source

* Fix failing test

* Improve code coverage

* Make global the default state manager

* Add legacy adapter state manager

* Formatting

* Include legacy state for backwards compatibility

* Add global state manager

* Implement Global/CDC state handling

* Fix test issues

* Fix issue with updated method signature

* Handle empty state case in global state manager

* Adjust to protocol changes

* Fix failing acceptance tests

* Fix failing test

* Fix unmodifiable list issue

* Fix unmodifiable exception

* PR feedback

* Abstract global state manager selection

* Handle conversion between different state types

* Handle invalid conversion

* Rename parameter

* Refactor state manager creation

* Fix failing tests

* Fix failing integration tests

* Add CDC test

* Fix failing integration test

* Revert change

* Fix failing integration test

* Use per-stream for postgres tests

* Formatting

* Correct stream descriptor validation

* Correct permalink

* PR feedback

* Bump Airbyte version from 0.39.22-alpha to 0.39.23-alpha (#13984)

Co-authored-by: pmossman <pmossman@users.noreply.github.com>

* Adds test for new workflow (#13986)

* Adds test for new workflow

* Adds airbyte repo

* remove testing line

* Add new InterpolatedRequestOptionsProvider that encapsulates all variations of request arguments (#13472)

* write out new request options provider and refactor components and parts of the YAML config

* fix formatting

* pr feedback to consolidate body_data_provider to simplify the code

* pr feedback get rid of extraneous optional

* publish oss for cloud (#13978)

workflow to publish oss artifacts that cloud needs to build against
use docker buildx to create arm images for local development

* skip debezium engine startup in case no table is in INCREMENTAL mode (#13870)

* 🎉 Source Github: break point added for workflows_runs stream (#13926)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* 6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#13866)

* 6339: debug info

* 6339: not using 'USE' on Azure SQL servers

* 6339: cleanup

* 6339: cleanup2

* 6339: cleanup3

* 6339: versions/changelogs updated

* 6339: merge from master (consolidation issue)

* 6339: dev connector version (for testing in airbyte cloud)

* 6339: code review implementation

* 6339: apply formatting

* in case runners fail to spin up, this needs to run on github-hosted (#13996)

* 12708: Add an option to use encryption with staging in Redshift Destination (#13675)

* 12708: Add an option to use encryption with staging in Redshift Destination

* 12708: docs/docker configs updated

* 12708: merge with master

* 12708: merge fix

* 12708: code review implementation

* 12708: fix for older configs

* 12708: fix for older configs in check

* 12708: merge from master (consolidation issue)

* 12708: versions updated

* :tada: New Source: Webflow (#13617)

* Added webflow code

* Updated readme

* Updated README

* Added webflow to source_definitions.yaml

* Enhanced documentation for the Webflow source connector

* Improved webflow source connector instructions

* Moved Site ID to before API token in Spec.yaml (for presentation in the UI)

* Addressed comments in PR.

* Changes to address requests in PR review

* Removed version from config

* Minor update to spec.yaml for clarity

* Updated to pass the accept-version as a constant rather than parameter

* Updated check_connection to hit the collections API that requires both site id and the authentication token.

* Fixed the test_check_connection to use the new check_connection function

* Added a streams test for generate_streams

* Re-named "autentication" object to "auth" to be more consistent with the way it is created by the CDK

* Added in an explicit line to instantiate an "auth" object from WebflowTokenAuthenticator, to make it easier to describe in the blog

* Fixed a typo in a comment

* Renamed some classes to be more intuitive

* Renamed class to be more intuitive

* Minor change to an internal method name

* Made _get_collection_name_to_id_dict staticmethod

* Fixed a unit-test error that only appeared when running "python -m pytest -s unit_tests".
This was caused by mocked settings from test_source.py leaking into test_streams.py

* format: add double quotes and remove unused import

* readme: remove semantic version naming of connector in build commands

* Updated spec.yaml

* auto-bump connector version

* format files

* add changelog

* update dockerfile

* auto-bump connector version

Co-authored-by: sajarin <sajarindider@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>

* Source-oracle: fixed tests + checkstyle (#13997)

* Source-oracle: fixed tests + checkstyle

* 🐛Destination-mysql: fixed integration test and build process (#13302)

* [13180] destination-mysql: fixed integration test

* update changelog to include debezium version upgrade (#13844)

* make table headers look less like successes (#13999)

* source-twilio: implement lookback windows (#13896)

* Revert "12708: Add an option to use encryption with staging in Redshift Destination (#13675)" (#14010)

This reverts commit aa28d448d820df9d79c2c0d06b38978d1108fb2c.

* Revert "6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#13866)" (#14011)

This reverts commit 0d870bd37bc3b5cd798b92115d73bcc45a42d8f7.

* [low-code connectors] BasicHttpAuthenticator (#13733)

* implement basichttpauthenticator

* add optional refresh access token authenticator

* remove prints

* type hints

* Fix and unit test

* missing test

* Add class to __init__ file

* Add comment

* migrate JsonSchemas to use basic path instead of JSONPath (#13917)

* scaffold for catalog diff, needs fixing on type handling and tests (#13786)

* Prepare release of JDBC connectors (#13987)

* Prepare release of JDBC connectors

* Update source definitions manually

* use built in check for if path is definite (#13834)

* 13535 Fixed bastion network for integration tests (#14007)

* doc: add error troubleshooting `docker-compose up` (#13765)

* fix: duplicate resource allocations in `airbyte-temporal` deployment (#13816)

* helm-chart: Fix worker deployment format error (#13839)

* add catalog diff connection read (#13918)

* doc: fix small typo on Shopify documentation (#13992)

* add streams to reset to job info (#13919)

* Generate api for changes in #13370 and make code compatible (#14014)

* Generate api for per-stream updates #13835 (#14021)

* Revert "Prepare release of JDBC connectors (#13987)" (#14029)

This reverts commit df759b30778082508e2872513800fac34d98ff7c.

* Fix per stream state protocol backward compatibility (#14032)

* rename state type field to fix backwards compatibility issue

* replace usages of stateType with type

* support semi incremental by adding extractor record filter (#13520)

* support semi incremental by adding extractor record filter

* refactor extractor into a record_selector that supports extraction and filtering of response records

* Remove pydantic spec from amazon ads and use YAML spec (#13988)

* add EdDSA support in SSH tunnel (#9494)

* add EdDSA support

* verify EdDSA support works correctly

Co-authored-by: Yurii Bidiuk <yura.bidyuk@gmail.com>

* 🎉New source connector: source-metabase (#13752)

* Add docs

* Close metabase session when sync finishes

* Close session in check_connection

* Add source definition to seed

* Add icon

* improve cdc check for connectors (#14005)

* improve should use cdc check

* Revert "improve should use cdc check"

This reverts commit 7d01727279d21d33a6c18ed3227ee94432636120.

* improve should use cdc check

* add unit test

* Update webflow.md

* Update webflow.md

* Update webflow.md

* Remove legacy sentry code from cdk (#14016)

* rip sentry out of cdk

* remove sentry dsn from gsc

* Update webflow.md

* Update webflow.md

* Fixed broken links (#14071)

* 🪟Persist unsaved changes on schema refresh (#13895)

* add form values tracker context

* add clarifying comment

* add same functionality to create connection

* Update airbyte-webapp/src/components/CreateConnectionContent/CreateConnectionContent.tsx

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* Fixes broken links so we can deploy again (#14075)

also adds better error message for when this happens to others

* Adds SUMMARY.md to gitignore (#14078)

* Added webflow icon (#14069)

* Added webflow icon

* Added icon

* Build create connection form build failure (#14081)

* Fix CDK obfuscation of nested secrets (#14035)

* Added Buy Credits section to Managing Airbyte Cloud (#13905)

* Added Buy Credits section to Managing Airbyte Cloud

* Made some style changes

* Made edits based on Natalie's suggestions

* Deleted link

* Deleted line

* Edited email address

* Updated reaching out to sales sentence

* disable es-lint to fix build (#14087)

* Release source connectors (#14077)

* Release source connectors

* Fix issue with database connection in test

* Fix failing tests due to authentication

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Bump Airbyte version from 0.39.23-alpha to 0.39.24-alpha (#14094)

Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com>

* Emit the state to remove in the airbyte empty source (#13725)

What
This PR updates the EmptyAirbyteSource in order to perform a partial update and handle the new state message format.

How
The empty source will now emit different messages based on the type of state being provided:

Per stream: it will emit one message per stream that has been reset
Global: it will emit one global message that will contain null for the streams that have been reset, including the shared state
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
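
The per-stream and global behaviours described for the EmptyAirbyteSource can be sketched roughly as follows. This is an illustration only: the function name `reset_messages` and the dictionary shapes are hypothetical simplifications, not the Airbyte protocol types, and nulling the shared state in the global case is one reading of the description above.

```python
from typing import Any, Dict, List, Optional


def reset_messages(
    state_type: str,
    stream_states: Dict[str, Any],
    streams_to_reset: List[str],
    shared_state: Optional[Any] = None,
) -> List[Dict[str, Any]]:
    """Build the state messages an empty source would emit for a reset (hypothetical shapes)."""
    reset = set(streams_to_reset)
    if state_type == "STREAM":
        # One message per reset stream, each carrying a null state.
        return [
            {"type": "STREAM", "stream": {"name": name, "state": None}}
            for name in streams_to_reset
        ]
    if state_type == "GLOBAL":
        # A single message: reset streams are nulled, other streams keep their state.
        # Assumption: the shared state is nulled as well whenever something is reset.
        return [
            {
                "type": "GLOBAL",
                "global": {
                    "shared_state": None if reset else shared_state,
                    "stream_states": {
                        name: (None if name in reset else state)
                        for name, state in stream_states.items()
                    },
                },
            }
        ]
    raise ValueError(f"unsupported state type: {state_type}")
```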

* Add StatePersistence object (#13900)

Add a StatePersistence object that supports Read/Writes of States to the DB with StreamDescriptor fields

The only migrations that are supported are:
* moving from LEGACY to GLOBAL
* moving from LEGACY to STREAM

All other state type migrations are expected to go through an explicit reset beforehand.
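
The migration rule above can be expressed as a small check. This is a sketch under assumptions: the names `StateType`, `ALLOWED_MIGRATIONS`, and `can_write` are hypothetical, and treating "no existing state" and "same type" as always writable is an assumption not stated in the commit message.

```python
from enum import Enum
from typing import Optional


class StateType(Enum):
    LEGACY = "LEGACY"
    GLOBAL = "GLOBAL"
    STREAM = "STREAM"


# The only two type changes the description lists as supported.
ALLOWED_MIGRATIONS = {
    (StateType.LEGACY, StateType.GLOBAL),
    (StateType.LEGACY, StateType.STREAM),
}


def can_write(current: Optional[StateType], incoming: StateType) -> bool:
    """Return True if a state of type `incoming` may replace one of type `current`."""
    if current is None or current == incoming:
        # Assumption: first writes and same-type updates need no migration.
        return True
    return (current, incoming) in ALLOWED_MIGRATIONS
```

Any pair that returns `False` here corresponds to the case where an explicit reset is expected first.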

* secret-persistence: Hashicorp Vault Secret Store (#13616)

Co-authored-by: Amanda Murphy <amanda.murphy@heapanalytics.com>
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>

* 🐛 Source Hubspot: remove `AirbyteSentry` dependency (#14102)

* fixed

* updated changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* fix: format VaultSecretPersistenceTest.java (#14110)

* Source Hubspot: extend error logging (#14054)

* #291 incall - source Hubspot: extend error logging

* hubspot: upd changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Update webflow.md (#14083)

* Update webflow.md

Removed a description that is only applicable to people that are writing connector code, not to _users_ of the connector.

* Update webflow.md

* Update webflow.md

* Update webflow.md

* Update webflow.md

* 12708: Add an option to use encryption with staging in Redshift Desti… (#14013)

* 12708: Add an option to use encryption with staging in Redshift Destination (#13675)

* 12708: Add an option to use encryption with staging in Redshift Destination

* 12708: docs/docker configs updated

* 12708: merge with master

* 12708: merge fix

* 12708: code review implementation

* 12708: fix for older configs

* 12708: fix for older configs in check

* 12708: merge from master (consolidation issue)

* 12708: versions updated

* 12708: specs updated

* 12708: specs updated

* 12708: removing autogenerated files from PR

* 12708: changelog updated

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Source PayPal Transaction: Update Transaction Schema (#13682)

* Update transaction schema.
* Transform money values from strings to floats or integers.

Co-authored-by: nataly <nataly@airbyte.io>
Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
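
The string-to-number transformation of money values in the PayPal transaction schema could look something like the sketch below. The function name `parse_money` and the rule for choosing between integer and float (whole amounts become integers) are assumptions; the commit message does not say how the connector decides.

```python
from typing import Union


def parse_money(value: str) -> Union[int, float]:
    """Convert a money string such as "10.00" or "12.34" to a number.

    Hypothetical rule: whole amounts become int, everything else stays float.
    """
    number = float(value)
    return int(number) if number.is_integer() else number
```

For amounts where binary floating-point rounding matters, `decimal.Decimal` would be the safer choice; floats are used here only because the commit message names them.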

* fix(jsonSchemas): raise error when items property not provided (#14018)

* fix stream name in stream transformation update (#14044)

* 🐛 Destination Redshift: Improved discovery for redshift-destination not SUPER streams (#13690)

airbyte-12843: Improved discovery for redshift-destination not SUPER tables, excluded views from discovery.

* Remove skiptests option (#14100)

* update sentry release script (#14123)

* Remove "additionalProperties": false from specs for connectors with staging (#14114)

* Remove "additionalProperties": false from spec for connectors with staging

* Remove "additionalProperties": false from spec for Redshift destination

* bump versions

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* [14003] source-oracle: added custom jdbc field (#14092)

* [14003] source-oracle: added custom jdbc field

* Add JobErrorReporter for sending sync job connector failures to Sentry (#13899)

* skeleton for reporting connector errors to sentry

* report on job failures instead of attempt failures

* report sync job failures with relevant metadata using JobErrorReporter

* send stack traces from python connectors to sentry

* test JobCreationAndStatusUpdate and JobErrorReporter

* logs

* refactor into helper, initial tests

* using sentry

* run format

* load reporting client from env

* load sentry dsn from env

* send java stack traces to sentry

* test sentryclient, refactor to use Hub instance

* ErrorReportingClient.report -> .reportJobFailureReason

* inject exception helper, test stack trace parse error tagging

* rm logs

* more stack trace tests

* remove logs

* fix failing tests

* rename ErrorReportingClient to JobErrorReportingClient

* rename vars in docker-compose

* Return an Optional instead of null when parsing stack traces

* dont remove airbyte prefix when setting release name

* from_trace_message static

* remove failureSummary from jobfailure input, get from Job

* send stacktrace string if we weren't able to parse

* set deployment mode tag

* update .env

* just log if something goes wrong

* Use StateMessageHelper in source (#14125)

* Use StateMessageHelper in source

* PR feedback and formatting

* More PR feedback

* Revert change

* Revert changes

* Bump Airbyte version from 0.39.24-alpha to 0.39.25-alpha (#14124)

Co-authored-by: brianjlai <brianjlai@users.noreply.github.com>

* Refactor acceptance tests and utils (#13950)

* Refactor Basic acceptance tests and utils

* Refactor Advanced acceptance tests and utils

* Remove unused code

* Clear destination db data during cleanup

* Cleanup comments

* cleanup init code

* test creating new destination db for each test

* cleanup destination db init

* Allow to edit api client

* pull in temporal cloud changes

* Rename helper to harness; set some funcs to private; turn init into constructor

* add func to set env vars instead of using static vars and move some functionality out of init into acceptance tests

* update javadoc

Co-authored-by: Davin Chia <davinchia@gmail.com>

* fix javadoc formatting

* fix var naming

Co-authored-by: Davin Chia <davinchia@gmail.com>

* Bump Airbyte version from 0.39.25-alpha to 0.39.26-alpha (#14141)

Co-authored-by: terencecho <terencecho@users.noreply.github.com>

* 🎉 octavia-cli: Add ability to get existing resources (#13254)

* 13541 Fixed integration tests source-db2 Mac OS (#14133)

* 13523 Fix integration tests destination-cassandra Mac OS (#14134)

* 🐛 Source Hubspot: fixed SAT test, commented out expected_records (#14140)

* :bug: Source Intercom: extend `Contacts` schema with new properties (#14099)

* Source Twilio: adopt best practices (#14000)

* #1946 Source twilio: adopt best practices - tune tests

* #1946 add expected_records to acceptance-test-config.yml

* #1946 source twilio - upd schema and changelog

* #1946 fix expected_records

* #1946 source twilio: rm alerts from expected records as they expire in 30 days

* #1946 source twilio: bump version

* 🎉 Source BingAds:  expose hourly/daily/weekly/monthly options from configuration (#13801)

* #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings

removed:  config settings for hourly/daily/weekly/monthly reports
added:    default value for all periodic reports to True

* #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings

removed:  unused class variables, if-statement

* #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings

removed:  unused variables from config

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* remove VersionMismatchServer (#14076)

* remove VersionMismatchServer

* remove VersionMismatchServerTest

* revert intended changes

* Increase instance termination time limit to 3 hours to accommodate connector builds. (#14181)

* Use correct bash comment symbol. (#14183)

* 🎉 New Source: Orbit.love (#13390)

* source-orbit: add definition and specs (#14189)

* 🎉 Base Normalization: clean-up Redshift `tmp_schemas` after SAT (#14015)

After the `base-normalization` SAT runs, test leftovers are now automatically cleaned up from Destination Redshift. Other destinations are not covered yet.

* Source Salesforce: fix customIntegrationTest for SAT (#14172)

* Source Amazon Ads: increase timeout for SAT (#14167)

* 🎉  Introduce Google Analytics Data API source (#12701)

* Introduce Google Analytics Data API source

https://developers.google.com/analytics/devguides/reporting/data/v1

* Add Google Analytics Data API source PR link

* Add `client` class for Google Analytics Data API

* Move dimensions and metrics extraction to the `client` class

In the Google Analytics Data API

* Change the copyright date to 2022 in Google Analytics Data API

* fix: removing incremental syncs

* fix: change project_id to string

* fix: flake check is failing

* chore: added it to source definitions

* chore: update seed file

Co-authored-by: Harshith Mullapudi <harshithmullapudi@gmail.com>

* 🐛 Destination Redshift: use s3 bucket path for s3 staging operations (#13916)

* Publish acceptance test utils maven artifact (#14142)

* Fix StatePersistence Legacy read/write (#14129)

StatePersistence will wrap/unwrap legacy state on write/read to ensure
compatibility with the old behavior/data.

* 🎉 Destination connectors: Improved "SecondSync" checks in Standard Destination Acceptance tests (#14184)

* [11731] Improved "SecondSync" checks in Standard Destination Acceptance tests

* 🐛 Source Zendesk Support: fixed "Retry-After" non integer value (#14112)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>
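
A non-integer "Retry-After" value, as mentioned in the Zendesk Support fix above, can be handled along these lines. This is a minimal sketch, not the connector's code: the function name `retry_after_seconds`, the 60-second fallback, and rounding up to whole seconds are all assumptions.

```python
import math
from typing import Optional


def retry_after_seconds(header_value: Optional[str], default: int = 60) -> int:
    """Parse a Retry-After header into whole seconds, tolerating values like "93.5"."""
    try:
        seconds = float(header_value)
    except (TypeError, ValueError):
        # Missing header or a non-numeric value: fall back to the default.
        return default
    if not math.isfinite(seconds) or seconds < 0:
        return default
    # Round up so the client never retries earlier than the server asked.
    return math.ceil(seconds)
```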

* Source Tiktok Marketing: Videometrics (#13650)

* added video metrics in streams.py

* common metrics list updated.

* updated streams.py with extended metrics required.

* updated stream_test

* updated configured_catalog

* video metrics required list updated.

* chore: formatting

* chore: bump version in source definitions

* chore: update seed file

Co-authored-by: Harshith Mullapudi <harshithmullapudi@gmail.com>

* 🎉 Source Github: secondary rate limits has to retry (#13955)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Harshith/test pr 13118 (#14192)

* Firebolt destination

* feat: Write method dropdown

* feat: Use future-proof Auth in SDK

* refactor: Move writer instantiation

* fix: tests are failing

* fix: tests are failing

* fix: tests are failing

* chore: added connector to definitions

* fix: formatting and spec

* fix: formatting for orbit

Co-authored-by: ptiurin <petro.tiurin@firebolt.io>

* 🪟 :art: Show credit usage on chart's specific day (#13503)

* add tooltip to chart

* Fixes:
- update main chart color;
- change onHover background color

* change chart color palette to grey 500

* update color reference

* remove opacity from UsageCell

* 🐛 destination-redshift: use s3 bucket path for s3 cleanup (#14190)

* Improve documentation for Postgres Source (#13830)

* Improve documentation for Postgres Source
 * add information about additional JDBC params
 * add anchors for doc sections
 * fix link to CDC on Bare Metal
 * add more details about parsing date/time values
 * add doc link to SSH fields

* Handle null reset source config (#14202)

* handle null reset source config

* format

* Wait indefinitely if connection is not active (#14200)

* also wait indefinitely if connection is deleted

* fix test

* Bump Airbyte version from 0.39.26-alpha to 0.39.27-alpha (#14204)

Co-authored-by: lmossman <lmossman@users.noreply.github.com>

* Bmoric/feature flag for state deserialization (#14127)

* Add Feature flag

* Add default feature flag value

* Update test

* remove unused

* tmp

* Update tests

* rm unwanted change

* PR comments

* [low-code connectors] default types and default values (#14004)

* default types and default values

* cleanup

* fixes so read works

* remove prints and trycatch

* comment

* remove unused param

* split file

* extract method

* extract methods

* comment

* optional

* fix test

* cleanup

* delete interpolated request header provider

* simplify next page url paginator interface

* comment

* format

* add state type endpoint (#14111)

* Bump Airbyte version from 0.39.27-alpha to 0.39.28-alpha (#14210)

Co-authored-by: sherifnada <sherifnada@users.noreply.github.com>

* 🐛 source-orbit: remove workspace_old.json (#14208)

* Fix: Docs plural login redirecting to wrong URL (#14207)

* [docs] fix numbering and incorrect filename in CDK docs (#13045)

* [docs] fix numbering in CDK docs

* Update 5-declare-schema.md

* Update 5-declare-schema.md

* Update 6-read-data.md

* Update 8-test-your-connector.md

* Remove the old scheduler from HelmCharts helper (#14187)

* Remove the old scheduler from HelmCharts helper

The old scheduler was removed as part of https://github.com/airbytehq/airbyte/pull/13400

* Remove legacy `scheduler` comment in HelmCharts

* Source Gitlab: add GroupIssueBoards stream (#13252)

* GitLab Source: add GroupIssueBoards stream

* Address stream schema comments

* Address comments

* Bump version

* Add as empty stream

* run seed file source (#14215)

* fix 'cannot reach server' error on demo instance (#10020)

* Update CODEOWNERS (#14209)

* 🎉 Source Github: use GraphQL for `reviews` stream (#13989)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* workflow for publishing artifacts for cloud (#14199)

* fix sentry org slug change (#14218)

* Source File: correct spec json to match json format (#13738)

* Upgrade spotless version and remove jvmargs workaround (#13705)

* Source Zendesk Chat: Process large amount of data in batches for incremental  (#14214)

* increased the limit of items in request

* Configuration for max api pages on requests

* included api_pagination_limit in sample

* included api_pagination_limit in invalid_config

* creating new table for chat_session

* reverted api_pagination_limit approach

* removed api_pagination_limit from TimeIncrementalStream

* correct chat json

* bump connector version

* add changelog

* run format

* auto-bump connector version

Co-authored-by: Roberto Bonnet <robertojuarezwp@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Remove all @ts-ignore (#14221)

* Bump hadoop to use version 3.3.3 (#14182)

* Change the persistence activity to use the new persistence layer (#14205)

* Change the persistence activity to use the new persistence layer

* Use lombok

* format

* Use new State message helper

* Fix build (#14225)

* Fix build

* Fix test

* Use new state persistence for state reads (#14126)

* Inject StatePersistence into DefaultJobCreator
* Read the state from StatePersistence instead of ConfigRepository
* Add a conversion helper to convert StateWrapper to State
* Remove unused ConfigRepository.getConnectionState

* Temporal per stream resets (#13990)

* remove reset flags from workflow state + refactor

* bring back cancelledForReset, since we need to distinguish between that case and a normal cancel

* delete reset job streams on cancel or success

* extract isResetJob to method

* merge with master

* set sync modes on streams in reset job correctly

* format

* Add test for getAllStreamsForConnection

* fix tests

* update more tests

* add StreamResetActivityTests

* fix tests for default job creator

* remove outdated comment

* remove debug lines

* remove unused enum value

* fix tests

* fix constant equals ordering

* make job mock not static

* DRY and add comments

* add comment about deleted streams

* Remove io.airbyte.config.StreamDescriptor

* register stream reset activity impl

* refetch connection workflow when checking job id, since it may have been restarted

* only cancel if workflow is running, to allow reset signal to always succeed even if batched with a workflow start

* fix reset signal to use new doneWaiting workflow state prop

* try to fix tests

* fix reset cancel case

* add acceptance test for resetting while sync is running

* format

* fix new acceptance test

* lower sleep on test

* raise sleep

* increase sleep and timeout, and remove repeated test

* use CatalogHelpers to extract stream descriptors

* raise sleep and timeout to prevent transient failures

* format

Co-authored-by: alovew <anne@airbyte.io>

* fix PostgresJdbcSourceAcceptanceTest by activating the feature flag (#14240)

* fix PostgresJdbcSourceAcceptanceTest by activating the feature flag

* fix AbstractJdbcSourceAcceptanceTest as well

* fix expected_spec for strict encrypt

* [13539] Fix integration tests source-clickhouse Mac OS (#14201)

* [13539] Fix integration tests source-clickhouse Mac OS
fixed unit tests

* [13524] Fix integration tests destination-clickhouse Mac OS
fixed unit tests

* 6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#14121)

* 6339: implementation

* 6339: changelog updated

* 6339: definitions updated

* 6339: definitions reverted

* 6339: still struggling with publishing

* auto-bump connector version

* 6339: definitions reverted - correct

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🪟 🎨 Update favicon and table row image styles (#14020)

* style changes to favicon and imageblock

* fix import

* revert component and props names to block

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.tsx

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.module.scss

Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.tsx

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>

* Update airbyte-webapp/src/components/ImageBlock/ImageBlock.module.scss

Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>

* add storybook

Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>
Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>

* upgrade postgresql version to fix default timestamp handling (#14211)

* implement logic to trigger snapshot of new tables via debezium (#13994)

* implement logic to trigger snapshot of new tables via debezium

* format

* improve test condition

* fix build

* BigQuery Denormalized "airbyte_type": "big_integer" to INT64 (#14079)

* BigQuery Denormalized "airbyte_type": "big_integer" to INT64

* updated changelog

* added unit test

* removed star import

* fixed checkstyle

* bump version

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Add Metrics section to Scaling Airbyte doc (#14224)

* Added metrics section to scaling airbyte doc

* Updated URL in doc

* Deleted link

* Added link

* Added backslashes before brackets that aren't links

* Edited note about tagged metrics

* Changed list

* Changed spacing

* Changed spacing

* Changed spacing

* Deleted period

* Fixed broken firebolt link

* Added tables

* Cleaned up wording in tables

* Add ability to provide source/destination connector docker image (#14266)

* Add ability to provide source/destination connector docker image

* Make constant public

* Bump Airbyte version from 0.39.28-alpha to 0.39.29-alpha (#14232)

* disable flaky cmw test temporarily (#14269)

* release new postgres source connector version 0.4.29 (#14265)

* release new postgres source connector version 0.4.29

* add changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* :tada: Source Tiktok marketing - remove granularity config option (#13890)

* Removed granularity config option from spec, added corresponding streams for each supported granularity (hourly, daily, lifetime), updated unit tests and SAT

* auto-formatting

* auto-formatting

* removed AdvertisersIds stream from list of exposed streams, updated docs

* expose new style streams since 0.1.13, expose old streams for config for older version

* update spec

* fixed path to catalog

* increased timeout

* source bing-ads to ga (#13679)

* Source Tiktok marketing - increase connector version (#14272)

* increased connector version

* increased connector version in seed

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Fix flaky connection manager workflow test (#14271)

* try thread sleep instead of test env, and run 100 times

* replace testEnv.sleep with Thread.sleep in several tests

* replace RepeatedTest with Test

* replace testEnv.sleep with Thread.sleep after signals are executed

* run each test 100 times to see if any are flaky

* add log

* change repetitions to 100 to avoid out of memory

* format

* swap repeated test for normal test

* 13532 Fixed integration tests destination-mssql Mac OS (#14252)

* 13532 Fixed integration tests destination-mssql Mac OS

* Source Google Analytics: Specify integer for dimension ga:dateHourMinute (#14298)

* Specify integer for dimension ga:dateHourMinute
* Update changelog

* 🎉 Source Github: rename field `mergeable` to `is_mergeable` (#14274)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Update Airbyte Client (#14270)

* #12668 #13198 enable full refresh, disable incremental and expected_records (#14191)

* 🎉 Destination S3: update INSTANCE_PROFILE to use AWSDefaultProfileCredential (#14231)

Co-authored-by: Mike Balmer <remlabm@users.noreply.github.com>

* Source Zendesk Support: pagination group membership (#14304)

* add next_page_token and request

* correct group_membership pagination

* update doc

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🪟 🐛 Fix OAuth validation not allowing to create source or destination (#14197)

* Enable "Set up source/destination" button only if the form is valid

* Update how ServiceForm initial values are patched so that it correctly patches the configuration with default values

* Update initial values patching in service form to use initialValues to preserve already set values
Update useOAuthFlowAdapter to correctly merge the values from the oauth response

* Remove unused values var from ServiceForm

* Add acceptance tests for per-stream state updates (#14263)

* Add acceptance tests for per-stream state updates

* PR feedback

* Formatting

* More PR feedback

* PR feedback

* Remove unused constant

* Make sure that the feature flag is transferred to the container (#14314)

* Make sure that the feature flag is transferred to the container

* propagate the feature flags

* Avoid propagating the feature flags

* Fix tests

* Source Postgres : use more simple and comprehensive query to get selectable tables (#14251)

* use more simple and comprehensive query to get selectable tables

* cover case when schema is not specified

* add test to check discover with different ways of grants

* format

* incr ver

* incr ver

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Fixed broken link

* Fix for deleting stream resets (#14322)

* Fix for deleting stream resets

* Fix build by updating var (#14321)

* Edited formatting (#14275)

* Avoid error when creating dupl stream reset (#14328)

* Bump Airbyte version from 0.39.29-alpha to 0.39.30-alpha (#14329)

Co-authored-by: lmossman <lmossman@users.noreply.github.com>

* Release new postgres strict encrypt version (#14331)

* Bump postgres strict encrypt version

* Update changelogs

* Update doc

* Release new destination s3 version to pick up latest change (#14332)

* Bump s3 version

* Update pr id

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 13538 Fix integration tests destination-scylla Mac OS (#14308)

* 13538 Fix integration tests destination-scylla Mac OS

* Update cdk-speedrun.md (#14258)

Added a link at the bottom of the article so that the user can find the more in-depth tutorial about building a real-world connector.

* Update README.md (#14303)

Added a link to https://airbyte.com/tutorials/extract-data-from-the-webflow-api in Webflow's README.md

* Update building-a-python-source.md (#14262)

* Update webflow.md (#14254)

Added a link to the new blog - https://airbyte.com/tutorials/extract-data-from-the-webflow-api

Co-authored-by: Simon Späti <simu@sspaeti.com>

* Alex/declarative stream incremental fix (#14268)

* checkout files from test branch

* read_incremental works

* reset to master

* remove dead code

* comment

* fix

* Add test

* comments

* utc

* format

* small fix

* Add test with rfc3339

* remove unused param

* fix test

* 🐛 SingerSource: Fix incompatibilities and typing issues (#14148)

* Use logging.Logger in SingerSource

* Fix SingerSource ConfigContainer

This fixes typing issues with `ConfigContainer` and makes it compatible
with `split_config`. Fixes #8710.

* Fix SingerSource state and catalog typing issues

* Rename SingerSource method args to match parent classes

* Remove old comment about excluding Singer

Co-authored-by: Alexandre Girard <alexandre@airbyte.io>

* Update source postgres release stage to beta (#14326)

* fix NPE (#14353)

* fix NPE

* Add test

* Fix trailing

* 🎉 octavia-cli: Add ability to import existing resources (#14137)

* helm chart: Add Image Pull Secrets Param  (#14031)

* fix format (#14354)

* Bump Airbyte version from 0.39.30-alpha to 0.39.31-alpha (#14355)

Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>

* tiktok to ga (#14358)

* Update state.state type (#14360)

* Run some DATs as part of base-normalization tests (#14312)

* Revert "🎉 Source Github: rename field `mergeable` to `is_mergeable` (#14274)" (#14338)

* Revert "🎉 Source Github: rename field `mergeable` to `is_mergeable` (#14274)"

* Properly update the hasEmitted state (#14367)

* Bmoric/state aggregator (#14364)

* Update state.state type

* Add state aggregator

* Test and format

* PR comments

* Move to its own package

* Update airbyte-workers/src/test/java/io/airbyte/workers/internal/state_aggregator/StateAggregatorTest.java

Co-authored-by: Lake Mossman <lake@airbyte.io>

* format

* Update airbyte-workers/src/main/java/io/airbyte/workers/internal/state_aggregator/DefaultStateAggregator.java

Co-authored-by: Lake Mossman <lake@airbyte.io>

* format

Co-authored-by: Lake Mossman <lake@airbyte.io>

* Bump Airbyte version from 0.39.31-alpha to 0.39.32-alpha (#14383)

Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>

* 🐛 Source Mixpanel: fix SAT tests (#14349)

* Call the new revoke_user_session endpoint from the FE (#13165)

* Source Instagram: change releaseStage to GA (#14162)

* Source Google Analytics: Change releaseStage to GA (#13957)

* source-outreach: fix record parsing and cursor field access (#14386)

* Kustomize: Use `resources` since `bases` is deprecated (#14037)

* fix: clone api doesn't take update configurations (#13592)

* fix: clone api doesn't take update configurations

* fix: you will be able to create clone in different workspace

* fix: added description to source/destination body

* cdk: Attach namespace to stream  in catalog (#13923)

* Source TiDB: correct jdbc string builder (#14243)

* add icon for tidb-connector

* Fix TiDB source connector

* bump connector version

* auto-bump connector version

Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* Source Google Ads: use docusaurus feature for warn/note and update doc (#14392)

* use docusaurus feature for warn/note and update doc

* update description in supported streams

* Source Facebook Marketing: allow configuration of MAX_BATCH_SIZE (#14267)

* Add max batch size config

* Bump semver

* add changelog

* auto-bump connector version

Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>

* 🎉 Source Github: add Retry for GraphQL API Resource limitations (#14376)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* Add more metadata to the JobErrorReporter (#14395)

* add workspace_id and connector_repository as tags

* add tag for connection url

* fix urls for job notifier

* format

* fix failing test

* beta -> generally_available (#14315)

Signed-off-by: Sergey Chvalyuk <grubberr@gmail.com>

* helm chart: Fix/double printing of extra volume mounts (#14091)

* SentryJobErrorReporter: better handling of multiline chained java exceptions (#14398)

* Docs: deploy on gcp use docusaurus tabs (#14401)

* Revert "Kustomize: Use `resources` since `bases` is deprecated (#14037)" (#14415)

This reverts commit 5c9a6a5fc655a9e597f755be8fc8ccf805a2537a.

* Use Debezium Postgres image for CDC tests (#14318)

* Use Debezium Postgres image for CDC tests

* Formatting

* 🎉 octavia-cli: Add ability to import all resources (#14374)

* Bump Airbyte version from 0.39.32-alpha to 0.39.33-alpha (#14419)

Co-authored-by: pedroslopez <pedroslopez@users.noreply.github.com>

* 📝 MySql source: clarify tinyint to number conversion when size > 1 (#14424)

* 🪟 🐛 Fix Setup Source Button on OAuth Sources (#14413)

* don't disable setup button

* make eslint happy

* one more cleanup

* use the spec to decide how to create config object

* Bump Airbyte version from 0.39.33-alpha to 0.39.34-alpha (#14428)

Co-authored-by: timroes <timroes@users.noreply.github.com>

* [low-code cdk] Enable configurable state checkpointing (#14317)

* checkout files from test branch

* read_incremental works

* reset to master

* remove dead code

* comment

* fix

* Add test

* comments

* utc

* format

* small fix

* Add test with rfc3339

* remove unused param

* fix test

* configurable state checkpointing

* update test

* fix type hints (#14352)

* normalization: Do not return NULL for MySQL column values > 512 chars  (#11694)

Co-authored-by: Augustin <augustin.lafanechere@gmail.com>
Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com>
Co-authored-by: Evan Tahler <evan@airbyte.io>
Co-authored-by: Tim Roes <tim@airbyte.io>
Co-authored-by: Charles <charles@airbyte.io>
Co-authored-by: Jonathan Pearlin <jonathan@airbyte.io>
Co-authored-by: Amruta Ranade <11484018+Amruta-Ranade@users.noreply.github.com>
Co-authored-by: Benoit Moriceau <benoit@airbyte.io>
Co-authored-by: Jimmy Ma <jimmy@airbyte.io>
Co-authored-by: Ganpat Agarwal <gagarwal@artica.com>
Co-authored-by: Serhii Chvaliuk <grubberr@gmail.com>
Co-authored-by: Rajakavitha Kodhandapani <krajakavitha@gmail.com>
Co-authored-by: Yevhen Sukhomud <suhomud@gmail.com>
Co-authored-by: Andrii Leonets <30464745+DoNotPanicUA@users.noreply.github.com>
Co-authored-by: George Claireaux <george@claireaux.co.uk>
Co-authored-by: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com>
Co-authored-by: Octavia Squidington III <octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: sw-yx <shawnthe1@gmail.com>
Co-authored-by: Baz <oleksandr.bazarnov@globallogic.com>
Co-authored-by: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com>
Co-authored-by: alafanechere <alafanechere@users.noreply.github.com>
Co-authored-by: Eugene <etsybaev@gmail.com>
Co-authored-by: Denis Davydov <davydov.den18@gmail.com>
Co-authored-by: Anna Lvova <37615075+annalvova05@users.noreply.github.com>
Co-authored-by: Vladimir <volodymyr.s.petrov@globallogic.com>
Co-authored-by: Phlair <Phlair@users.noreply.github.com>
Co-authored-by: Parker Mossman <parker@airbyte.io>
Co-authored-by: Adam <adam-bloom@users.noreply.github.com>
Co-authored-by: Liren Tu <tuliren@gmail.com>
Co-authored-by: pmossman <pmossman@users.noreply.github.com>
Co-authored-by: Topher Lubaway <asimplechris@gmail.com>
Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com>
Co-authored-by: Peter Hu <peter@airbyte.io>
Co-authored-by: Subodh Kant Chaturvedi <subodh1810@gmail.com>
Co-authored-by: Tuhai Maksym <kimerinn@gmail.com>
Co-authored-by: Alexander Marquardt <alexander.marquardt@gmail.com>
Co-authored-by: sajarin <sajarindider@gmail.com>
Co-authored-by: marcosmarxm <marcosmarxm@gmail.com>
Co-authored-by: Alexandre Girard <alexandre@airbyte.io>
Co-authored-by: steve withington <steve@digitalmine.com>
Co-authored-by: Leo Sussan <leosussan@gmail.com>
Co-authored-by: cenegd <cenegd@live.com>
Co-authored-by: Tomas Perez Alvarez <72174660+Tomperez98@users.noreply.github.com>
Co-authored-by: Lake Mossman <lake@airbyte.io>
Co-authored-by: Sherif A. Nada <snadalive@gmail.com>
Co-authored-by: Edward Gao <edward.gao@airbyte.io>
Co-authored-by: Yurii Bidiuk <yura.bidyuk@gmail.com>
Co-authored-by: Christophe Duong <christophe.duong@gmail.com>
Co-authored-by: Teal Larson <LARSON.TEAL@GMAIL.COM>
Co-authored-by: Sophia Wiley <106352739+sophia-wiley@users.noreply.github.com>
Co-authored-by: jdpgrailsdev <jdpgrailsdev@users.noreply.github.com>
Co-authored-by: Jimmy Ma <gosusnp@users.noreply.github.com>
Co-authored-by: Stella Chung <schung507@gmail.com>
Co-authored-by: Amanda Murphy <amanda.murphy@heapanalytics.com>
Co-authored-by: Mohamed Magdy <mohamed.magdy@canary.is>
Co-authored-by: nataly <nataly@airbyte.io>
Co-authored-by: Tyler Russell <tylerrussell85@gmail.com>
Co-authored-by: Alexander Tsukanov <alexander.tsukanovvv@gmail.com>
Co-authored-by: Pedro S. Lopez <pedroslopez@me.com>
Co-authored-by: brianjlai <brianjlai@users.noreply.github.com>
Co-authored-by: terencecho <terence@airbyte.io>
Co-authored-by: Davin Chia <davinchia@gmail.com>
Co-authored-by: terencecho <terencecho@users.noreply.github.com>
Co-authored-by: Daniel Diamond <33811744+danieldiamond@users.noreply.github.com>
Co-authored-by: drrest <dr.rest@gmail.com>
Co-authored-by: Marcos Marx <marcosmarxm@users.noreply.github.com>
Co-authored-by: Abhi Vaidyanatha <abhi@airbyte.io>
Co-authored-by: Harshith Mullapudi <harshithmullapudi@gmail.com>
Co-authored-by: Zawar Khan <zawar.khan@getmercury.io>
Co-authored-by: ptiurin <petro.tiurin@firebolt.io>
Co-authored-by: Greg Solovyev <grishick@users.noreply.github.com>
Co-authored-by: lmossman <lmossman@users.noreply.github.com>
Co-authored-by: sherifnada <sherifnada@users.noreply.github.com>
Co-authored-by: Sachin Jangid <sachinjangid832@gmail.com>
Co-authored-by: Chris Wu <chris@faros.ai>
Co-authored-by: Jared Rhizor <me@jaredrhizor.com>
Co-authored-by: tison <wander4096@gmail.com>
Co-authored-by: Roberto Bonnet <robertojuarezwp@gmail.com>
Co-authored-by: Malik Diarra <malik@airbyte.io>
Co-authored-by: alovew <anne@airbyte.io>
Co-authored-by: Oleksandr Sheheda <alexandr-shegeda@users.noreply.github.com>
Co-authored-by: midavadim <midavadim@yahoo.com>
Co-authored-by: Arsen Losenko <20901439+arsenlosenko@users.noreply.github.com>
Co-authored-by: Ryan Lewon <ryan@segv.net>
Co-authored-by: Mike Balmer <remlabm@users.noreply.github.com>
Co-authored-by: Anne <102554163+alovew@users.noreply.github.com>
Co-authored-by: Liren Tu <tuliren.git@outlook.com>
Co-authored-by: Simon Späti <simu@sspaeti.com>
Co-authored-by: Albin Skott <cstruct@users.noreply.github.com>
Co-authored-by: Caleb Fornari <calebfornari@gmail.com>
Co-authored-by: benmoriceau <benmoriceau@users.noreply.github.com>
Co-authored-by: Christian Martin <christian@ctmartin.me>
Co-authored-by: jordan-glitch <65691557+jordan-glitch@users.noreply.github.com>
Co-authored-by: Daemonxiao <35677990+Daemonxiao@users.noreply.github.com>
Co-authored-by: Keith Thompson <keithjoethompson@gmail.com>
Co-authored-by: Leo Sussan <leo@reach.vote>
Co-authored-by: pedroslopez <pedroslopez@users.noreply.github.com>
Co-authored-by: timroes <timroes@users.noreply.github.com>
Co-authored-by: Johannes Nicolai <jonico@planetscale.com>
Showing 109 changed files with 3,005 additions and 327 deletions.
@@ -267,6 +267,11 @@ public DataSource build() {
* will preserve existing behavior that tests for the connection on first use, not on creation.
*/
config.setInitializationFailTimeout(Integer.MIN_VALUE);
/*
* The default timeout is 30 seconds, which is too short for cloud data warehouse clusters
* that can take 4-5 minutes to start up. Set it to 30 minutes to be safe.
*/
config.setConnectionTimeout(30 * 60 * 1000);

connectionProperties.forEach(config::addDataSourceProperty);
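
The timeout literal in this hunk is expressed in milliseconds. As a quick check of the arithmetic, here is a minimal Python sketch that compares the new value with the 30-second default named in the comment. The helper `minutes_to_millis` and the constant names are hypothetical and are not part of the Airbyte code base.

```python
def minutes_to_millis(minutes: int) -> int:
    """Convert whole minutes to milliseconds."""
    return minutes * 60 * 1000


# Default named in the code comment: 30 seconds.
DEFAULT_TIMEOUT_MS = 30 * 1000
# New value from the diff: 30 * 60 * 1000.
NEW_TIMEOUT_MS = minutes_to_millis(30)

print(NEW_TIMEOUT_MS)                        # 1800000
print(NEW_TIMEOUT_MS // DEFAULT_TIMEOUT_MS)  # 60
```

The new limit is 60 times the default, which leaves a wide margin over the 4-5 minute cluster start-up the comment mentions.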

@@ -10,4 +10,5 @@
!dbt-project-template-oracle
!dbt-project-template-clickhouse
!dbt-project-template-snowflake
!dbt-project-template-databricks
!dbt-project-template-redshift
6 changes: 6 additions & 0 deletions airbyte-integrations/bases/base-normalization/build.gradle
@@ -75,6 +75,10 @@ task airbyteDockerSnowflake(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('snowflake')
dependsOn assemble
}
task airbyteDockerDatabricks(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('databricks')
dependsOn assemble
}
task airbyteDockerRedshift(type: Exec, dependsOn: checkSshScriptCopy) {
configure buildAirbyteDocker('redshift')
dependsOn assemble
@@ -85,6 +89,7 @@ airbyteDocker.dependsOn(airbyteDockerMySql)
airbyteDocker.dependsOn(airbyteDockerOracle)
airbyteDocker.dependsOn(airbyteDockerClickhouse)
airbyteDocker.dependsOn(airbyteDockerSnowflake)
airbyteDocker.dependsOn(airbyteDockerDatabricks)
airbyteDocker.dependsOn(airbyteDockerRedshift)

task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs) {
@@ -100,6 +105,7 @@ task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs
dependsOn ':airbyte-integrations:connectors:destination-oracle:airbyteDocker'
dependsOn ':airbyte-integrations:connectors:destination-mssql:airbyteDocker'
dependsOn ':airbyte-integrations:connectors:destination-clickhouse:airbyteDocker'
dependsOn ':airbyte-integrations:connectors:destination-databricks:airbyteDocker'
}

// DATs have some additional tests that exercise normalization code paths,
@@ -0,0 +1,34 @@
FROM fishtownanalytics/dbt:1.0.0
COPY --from=airbyte/base-airbyte-protocol-python:0.1.1 /airbyte /airbyte

# Install SSH Tunneling dependencies
RUN apt-get update && apt-get install -y jq sshpass

WORKDIR /airbyte
COPY entrypoint.sh .
COPY build/sshtunneling.sh .

WORKDIR /airbyte/normalization_code
COPY normalization ./normalization
COPY setup.py .
COPY dbt-project-template/ ./dbt-template/
COPY dbt-project-template-databricks/* ./dbt-template/

# Install python dependencies
WORKDIR /airbyte/base_python_structs
RUN pip install .

WORKDIR /airbyte/normalization_code
RUN pip install .

WORKDIR /airbyte/normalization_code/dbt-template/
# Download external dbt dependencies
RUN pip install dbt-databricks==1.0.0
RUN dbt deps

WORKDIR /airbyte
ENV AIRBYTE_ENTRYPOINT "/airbyte/entrypoint.sh"
ENTRYPOINT ["/airbyte/entrypoint.sh"]

LABEL io.airbyte.version=0.1.73
LABEL io.airbyte.name=airbyte/normalization-databricks
@@ -0,0 +1,72 @@
# This file is necessary to install dbt-utils with dbt deps
# the content will be overwritten by the transform function

# Name your package! Package names should contain only lowercase characters
# and underscores. A good package name should reflect your organization's
# name or the intended use of these models
name: "airbyte_utils"
version: "1.0"
config-version: 2

# This setting configures which "profile" dbt uses for this project. Profiles contain
# database connection information, and should be configured in the ~/.dbt/profiles.yml file
profile: "normalize"

# These configurations specify where dbt should look for different types of files.
# The `model-paths` config, for example, states that source models can be found
# in the "models/" directory. You probably won't need to change these!
model-paths: ["models"]
docs-paths: ["docs"]
analysis-paths: ["analysis"]
test-paths: ["tests"]
seed-paths: ["data"]
macro-paths: ["macros"]

target-path: "../build" # directory which will store compiled SQL files
log-path: "../logs" # directory which will store DBT logs
packages-install-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies

clean-targets: # directories to be removed by `dbt clean`
- "build"
- "dbt_modules"

quoting:
database: true
# Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785)
# all schemas should be unquoted
schema: false
identifier: false

# You can define configurations for models in the `model-paths` directory here.
# Using these configurations, you can enable or disable models, change how they
# are materialized, and more!
models:
+transient: false
airbyte_utils:
+materialized: table
generated:
airbyte_ctes:
+tags: airbyte_internal_cte
+materialized: ephemeral
airbyte_incremental:
+tags: incremental_tables
+materialized: incremental
+incremental_strategy: merge
# schema change test is supported automatically by the merge operation
# need to be run against a cluster with spark.databricks.delta.schema.autoMerge.enabled = True
# schema merge being handled at the final step, if a schema changes in one of the primary keys
# that coalesce differently to string, unicity will be broken
+on_schema_change: "ignore"
+file_format: delta
+pre-hook: 'SET spark.databricks.delta.schema.autoMerge.enabled = True'
airbyte_tables:
+tags: normalized_tables
+materialized: table
+file_format: delta
airbyte_views:
+tags: airbyte_internal_views
+materialized: view

dispatch:
- macro_namespace: dbt_utils
search_order: ["airbyte_utils", "dbt_utils"]
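
The `dispatch` block above asks dbt to look in `airbyte_utils` before `dbt_utils` when resolving macros from the `dbt_utils` namespace. The following is a simplified Python model of that lookup, not dbt's actual implementation. The adapter prefix list (`databricks`, `spark`, `default`), the function `resolve_macro`, and the macro names `example_macro` and `other_macro` are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Set


def resolve_macro(
    macro_name: str,
    search_order: List[str],
    adapter_prefixes: List[str],
    available: Dict[str, Set[str]],
) -> Optional[str]:
    """Return the first '<package>.<prefix>__<macro>' found, or None.

    Packages are tried in search order; within each package the
    adapter-specific prefixes are tried before 'default'.
    """
    for package in search_order:
        for prefix in adapter_prefixes:
            candidate = f"{prefix}__{macro_name}"
            if candidate in available.get(package, set()):
                return f"{package}.{candidate}"
    return None


# Hypothetical macro inventory, for illustration only.
available = {
    "airbyte_utils": {"databricks__example_macro"},
    "dbt_utils": {"default__example_macro", "default__other_macro"},
}
search_order = ["airbyte_utils", "dbt_utils"]  # taken from the config above
prefixes = ["databricks", "spark", "default"]  # assumed adapter chain

print(resolve_macro("example_macro", search_order, prefixes, available))
print(resolve_macro("other_macro", search_order, prefixes, available))
```

Under these assumptions the first call resolves to the `airbyte_utils` override and the second falls through to `dbt_utils`. This is the behaviour that lets the project shadow selected `dbt_utils` macros without forking the package.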
@@ -6,6 +6,7 @@
- postgres: unnest() -> https://www.postgresqltutorial.com/postgresql-array/
- MSSQL: openjson() -> https://docs.microsoft.com/en-us/sql/relational-databases/json/validate-query-and-change-json-data-with-built-in-functions-sql-server?view=sql-server-ver15
- ClickHouse: ARRAY JOIN -> https://clickhouse.com/docs/zh/sql-reference/statements/select/array-join/
- Databricks: LATERAL VIEW -> https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-qry-select-lateral-view.html
#}

{# cross_join_unnest ------------------------------------------------- #}
@@ -50,6 +51,10 @@
cross join table(flatten({{ array_col }})) as {{ array_col }}
{%- endmacro %}

{% macro databricks__cross_join_unnest(stream_name, array_col) -%}
lateral view outer explode(from_json({{ array_col }}, 'array<string>')) as _airbyte_nested_data
{%- endmacro %}

{% macro sqlserver__cross_join_unnest(stream_name, array_col) -%}
{# https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15#option-1---openjson-with-the-default-output #}
CROSS APPLY (
@@ -87,6 +92,10 @@
_airbyte_nested_data
{%- endmacro %}

{% macro databricks__unnested_column_value(column_col) -%}
_airbyte_nested_data
{%- endmacro %}

{% macro oracle__unnested_column_value(column_col) -%}
{{ column_col }}
{%- endmacro %}
@@ -14,3 +14,31 @@
{% endcall %}

{% endmacro %}

{#
This changes the behaviour of the default adapter macro, since DBT defaults to 256 when there are no explicit varchar limits
(cf : https://github.com/dbt-labs/dbt-core/blob/3996a69861d5ba9a460092c93b7e08d8e2a63f88/core/dbt/adapters/base/column.py#L91)
Since normalization code uses varchar for string type (and not text) on postgres, we need to set the max length possible when using unlimited varchars
(cf : https://dba.stackexchange.com/questions/189876/size-limit-of-character-varying-postgresql)
#}

{% macro postgres__get_columns_in_relation(relation) -%}
{% call statement('get_columns_in_relation', fetch_result=True) %}
select
column_name,
data_type,
COALESCE(character_maximum_length, 10485760),
numeric_precision,
numeric_scale

from {{ relation.information_schema('columns') }}
where table_name = '{{ relation.identifier }}'
{% if relation.schema %}
and table_schema = '{{ relation.schema }}'
{% endif %}
order by ordinal_position

{% endcall %}
{% set table = load_result('get_columns_in_relation').table %}
{{ return(sql_convert_columns_in_relation(table)) }}
{% endmacro %}
@@ -5,3 +5,7 @@
{% macro oracle__current_timestamp() %}
CURRENT_TIMESTAMP
{% endmacro %}

{% macro databricks__current_timestamp() %}
CURRENT_TIMESTAMP
{% endmacro %}
@@ -8,6 +8,10 @@
string
{% endmacro %}

{%- macro databricks__type_json() -%}
string
{%- endmacro -%}

{%- macro redshift__type_json() -%}
{%- if redshift_super_type() -%}
super
@@ -91,6 +95,10 @@
INT
{% endmacro %}

{% macro databricks__type_int() %}
INT
{% endmacro %}


{# bigint ------------------------------------------------- #}
{% macro mysql__type_bigint() %}
@@ -105,6 +113,10 @@
BIGINT
{% endmacro %}

{% macro databricks__type_bigint() %}
BIGINT
{% endmacro %}


{# numeric ------------------------------------------------- --#}
{% macro mysql__type_numeric() %}
@@ -115,6 +127,10 @@
Float64
{% endmacro %}

{% macro databricks__type_numeric() %}
FLOAT
{% endmacro %}


{# timestamp ------------------------------------------------- --#}
{% macro mysql__type_timestamp() %}
@@ -146,6 +162,12 @@
timestamp
{% endmacro %}

{#-- Spark timestamps are already 'point in time', even if converted / stored without the original tz info, relative to session tz --#}
{#-- cf: https://docs.databricks.com/spark/latest/dataframes-datasets/dates-timestamps.html --#}
{% macro databricks__type_timestamp_with_timezone() %}
timestamp
{% endmacro %}

{#-- MySQL doesn't allow cast operation to work with TIMESTAMP so we have to use char --#}
{%- macro mysql__type_timestamp_with_timezone() -%}
char
@@ -6,6 +6,7 @@
- Postgres: json_extract_path_text(<from_json>, 'path' [, 'path' [, ...]]) -> https://www.postgresql.org/docs/12/functions-json.html
- MySQL: JSON_EXTRACT(json_doc, 'path' [, 'path'] ...) -> https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html
- ClickHouse: JSONExtractString(json_doc, 'path' [, 'path'] ...) -> https://clickhouse.com/docs/en/sql-reference/functions/json-functions/
- Databricks: get_json_object(json_txt, 'path') -> https://spark.apache.org/docs/latest/api/sql/#get_json_object
#}

{# format_json_path -------------------------------------------------- #}
@@ -42,6 +43,15 @@
{{ "'$.\"" ~ json_path_list|join(".") ~ "\"'" }}
{%- endmacro %}

{% macro databricks__format_json_path(json_path_list) -%}
{# -- '$.x.y.z' #}
{%- set str_list = [] -%}
{%- for json_path in json_path_list -%}
{%- if str_list.append(json_path.replace("'", "\\'")) -%} {%- endif -%}
{%- endfor -%}
{{ "'$." ~ str_list|join(".") ~ "'" }}
{%- endmacro %}
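
As a rough illustration of what `databricks__format_json_path` renders, the sketch below mirrors the macro's logic in plain Python: each path segment has its single quotes backslash-escaped, the segments are joined with dots, and the result is wrapped as a quoted `'$.x.y.z'` literal for `get_json_object`. The function name `format_json_path_sketch` is hypothetical and is not part of the commit.

```python
def format_json_path_sketch(json_path_list):
    """Hypothetical Python mirror of the databricks__format_json_path macro."""
    # Escape single quotes so the path can sit inside a single-quoted SQL literal.
    escaped = [segment.replace("'", "\\'") for segment in json_path_list]
    # Join the segments with dots and wrap them as '$.x.y.z'.
    return "'$." + ".".join(escaped) + "'"


if __name__ == "__main__":
    print(format_json_path_sketch(["x", "y", "z"]))
```

Unlike the variant just above it in the diff, which wraps the joined path in double quotes, this form leaves the segments unquoted inside the single-quoted literal.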

{% macro redshift__format_json_path(json_path_list) -%}
{%- set quote = '"' if redshift_super_type() else "'" -%}
{%- set str_list = [] -%}
@@ -86,6 +96,14 @@
json_extract({{ from_table}}.{{ json_column }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro databricks__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%}
{%- if from_table|string() == '' %}
get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }})
{% else %}
get_json_object({{ from_table }}.{{ json_column }}, {{ format_json_path(json_path_list) }})
{% endif -%}
{%- endmacro %}

{% macro oracle__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%}
json_value({{ json_column }}, {{ format_json_path(normalized_json_path) }})
{%- endmacro %}
@@ -191,6 +209,10 @@
JSONExtractRaw(assumeNotNull({{ json_column }}), {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro databricks__json_extract_scalar(json_column, json_path_list, normalized_json_path) -%}
get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{# json_extract_array ------------------------------------------------- #}

{% macro json_extract_array(json_column, json_path_list, normalized_json_path) -%}
@@ -237,6 +259,10 @@
JSONExtractArrayRaw(assumeNotNull({{ json_column }}), {{ format_json_path(json_path_list) }})
{%- endmacro %}

{% macro databricks__json_extract_array(json_column, json_path_list, normalized_json_path) -%}
get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }})
{%- endmacro %}

{# json_extract_string_array ------------------------------------------------- #}

{% macro json_extract_string_array(json_column, json_path_list, normalized_json_path) -%}
@@ -4,12 +4,43 @@
- the column _airbyte_ab_id does not exist in the normalized tables and make sure it is well populated.
#}

{%- macro get_columns_in_relation_if_exist(target_table) -%}
{{ return(adapter.dispatch('get_columns_in_relation_if_exist')(target_table)) }}
{%- endmacro -%}

{%- macro default__get_columns_in_relation_if_exist(target_table) -%}
{{ return(adapter.get_columns_in_relation(target_table)) }}
{%- endmacro -%}

{%- macro databricks__get_columns_in_relation_if_exist(target_table) -%}
{%- if target_table.schema is none -%}
{%- set found_table = True %}
{%- else -%}
{% call statement('list_table_infos', fetch_result=True) -%}
show tables in {{ target_table.schema }} like '*'
{% endcall %}
{%- set existing_tables = load_result('list_table_infos').table -%}
{%- set found_table = [] %}
{%- for table in existing_tables -%}
{%- if table.tableName == target_table.identifier -%}
{% do found_table.append(table.tableName) %}
{%- endif -%}
{%- endfor -%}
{%- endif -%}
{%- if found_table -%}
{%- set cols = adapter.get_columns_in_relation(target_table) -%}
{{ return(cols) }}
{%- else -%}
{{ return ([]) }}
{%- endif -%}
{%- endmacro -%}
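
The control flow of `databricks__get_columns_in_relation_if_exist` can be sketched in plain Python as follows. This is only an illustration under stated assumptions: `show_tables_rows` stands in for the result of the `show tables in <schema> like '*'` statement, `get_columns` stands in for `adapter.get_columns_in_relation`, and the function name itself is hypothetical.

```python
def get_columns_if_exist_sketch(schema, identifier, show_tables_rows, get_columns):
    """Hypothetical mirror of the macro: only describe the table if it is listed."""
    if schema is None:
        # No schema to list: the macro assumes the table is there.
        found = True
    else:
        # Look for the target identifier among the rows returned by SHOW TABLES.
        found = any(row["tableName"] == identifier for row in show_tables_rows)
    # Query the columns only when the table was found; otherwise return no columns.
    return get_columns() if found else []


if __name__ == "__main__":
    rows = [{"tableName": "users"}, {"tableName": "orders"}]
    print(get_columns_if_exist_sketch("public", "users", rows, lambda: ["_airbyte_ab_id"]))
    print(get_columns_if_exist_sketch("public", "missing", rows, lambda: ["_airbyte_ab_id"]))
```

Returning an empty list for a missing table leaves `need_full_refresh` with no columns to scan, so it reports that `_airbyte_ab_id` was not found.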

{%- macro need_full_refresh(col_ab_id, target_table=this) -%}
{%- if not execute -%}
{{ return(false) }}
{%- endif -%}
{%- set found_column = [] %}
- {%- set cols = adapter.get_columns_in_relation(target_table) -%}
+ {%- set cols = get_columns_in_relation_if_exist(target_table) -%}
{%- for col in cols -%}
{%- if col.column == col_ab_id -%}
{% do found_column.append(col.column) %}
@@ -18,7 +49,7 @@
{%- if found_column -%}
{{ return(false) }}
{%- else -%}
- {{ dbt_utils.log_info(target_table ~ "." ~ col_ab_id ~ " does not exist yet. The table will be created or rebuilt with dbt.full_refresh") }}
+ {{ dbt_utils.log_info(target_table ~ "." ~ col_ab_id ~ " does not exist. The table needs to be rebuilt in full_refresh") }}
{{ return(true) }}
{%- endif -%}
{%- endmacro -%}

0 comments on commit 0232182
