From 0232182c36a70851cea23d5bc0630a211d0124f3 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?L=C3=A9on=20Stefani?= Date: Wed, 6 Jul 2022 14:04:55 +0200 Subject: [PATCH] Shrodingers/destination databricks dbt (#1) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * octavia-cli: fix workspace not having anonymous_data_collection property (#13869) * Update connection update calls to use central utility to ensure connection update has all data (#13564) * Update connection updates with build update utility * Add buildConnectionUpdate utility * Update components that update the connection to use utility when necessary * Use conection name when saving connection from replication view to prevent override from refreshed catalog * Improve connection check on ReplicationView onSubmit function * Display connection state in connection setting page (#13394) * Display Connection State in Setting page * memoize callback * rendering and confirmaton * setState API * Input validation * remove JSON step * rename apiMethod to `updateState` * test and adjust route * skip if sync is running * prevent state update when sync is running * code editor component * errors fixed * scss style * make linter happy * Back to monaco editor * Remove ability to edit state * Adjust FE code * Fix CSS problem * Update airbyte-webapp/src/locales/en.json Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * just use PRE to render state for now Co-authored-by: Tim Roes Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * update api for per stream (#13835) * Update airbyte-protocol.md (#13892) * Update airbyte-protocol.md * Fix typo * Fix prose * Add protocol reviewers for protocol documentation * Remove duplicate * Edited Amplitude, Mailchimp, and Zendesk Support docs (#13897) * deleting SUMMARY.md since we don't need it for docusaurus builds (#13901) * Do not hide unexpected errors in the check connection (#13903) * Do not hide unexpected errors in the check connection * Fix test * Common code to deserialize a state message in the new format (#13772) * Common code to deserialize a state message in the new format * PR comments and type changed to typed * Format * Add StateType and StateWrapper objects to the model * Use state wrapper instead of Either * Switch to optional * PR comments * Support array legacy state * format Co-authored-by: Jimmy Ma * πŸ› Source Amazon Seller Partner: handle start date for financial stream (#13633) * start and end date for finacial stream should not be more than 180 days apart * improve unit tests * make changes to start date for finance stream * update tests * lint changes * update version to 0.2.22 for source-amazon-seller-partner * Normalization: Fix incorrect jinja2 macro `json_extract_array` call (#13894) Signed-off-by: Sergey Chvalyuk * Docs: fixed the broken links (#13915) * 0.2.5 -> 0.2.6 (#13924) Signed-off-by: Sergey Chvalyuk * 13546 Fix integration tests source-postgres Mac OS (#13872) * 13546 Fix integration tests source-postgres Mac OS * 13548 Fixed integration tests source-tidb Mac OS (#13927) * Source MsSql : incr ver to include changes #13854 (#13887) * incr version * put PR id * docker ver * connectors that published (#13932) * Deprecate PART_SIZE_MB in connectors using S3/GCS storage (#13753) * Removed part_size from connectors that use StreamTransferManager * fixed S3DestinationConfigTest * fixed S3JsonlFormatConfigTest * upadate changelog and bump version * auto-bump connector version * 
auto-bump connector version * auto-bump connector version * auto-bump connector version * upadate changelog and bump version for Redshift and Snowflake destinations * auto-bump connector version * fix GCS staging test * fix GCS staging test * auto-bump connector version Co-authored-by: Octavia Squidington III * Reverted changes in SshBastionContainer (#13934) * πŸŽ‰ New Source Dockerhub (#13931) * init * implement working source + tests * add docs * add docs * fix bad comments * Update airbyte-integrations/connectors/source-dockerhub/acceptance-test-config.yml * Update airbyte-integrations/connectors/source-dockerhub/Dockerfile * Update airbyte-integrations/connectors/source-dockerhub/.dockerignore * Apply suggestions from code review * Update docs/integrations/sources/dockerhub.md * Update airbyte-integrations/connectors/source-dockerhub/integration_tests/acceptance.py Co-authored-by: George Claireaux * address @Phlair's feedback * address @Phlair's feedback * each record is now a Docker image rather than response page * format * fix unit tests * fix acceptance tests * add icon, definition and generate seed spec * add requests to requirements Co-authored-by: sw-yx * commented out non-relevant tests (#13940) * Bump Airbyte version from 0.39.20-alpha to 0.39.21-alpha (#13938) Co-authored-by: alafanechere * newaction (#13942) * remove test action (#13944) * πŸŽ‰Source-mysql: aligned datatype test (#13945) * [13607] source-mysql: aligned datatype tests for regular and CDC ways + added CHAR fix to CDC processing * #13958 Source Stripe: fix configured catalogs (#13959) * πŸ› Source: Typeform - Update schema for Responses stream (#13935) * Upd responses schema * Upd docs * auto-bump connector version Co-authored-by: Octavia Squidington III * :window: Updated email invitation flow that enables invited users to set name and create password (#12788) * First pass accepting email link invitation * Update Auth service with signInWithEmailLink calls * Add AcceptEmailInvite component * Update FirebaseActionRoute to handle sign in mode * Rename ResetPasswordAction to FirebseActionRoute * Add create password setp to AcceptEmailInvite component * Remove continueURL from invite fetch * Update accept email invite for user to enter both email and password together * Set name during email link signup * Update AcceptEmailInvite to send name * Add updateName to UserService * Update AuthService to set name during sign up * Remove steps from AcceptEmailInvite component Remove setPassword from AuthService * Add header and title to accept invite page * Move invite error messages to en file * For invite link pages, show login link instead of sign up * Disable name update on sign in via email lnk * Resend email invite when the invite link is expired * Fix status message in accept email invite page * Re-enable set user's name during sign up email invite * Update signUpWithEmailLink so that sign up is successful even if we fail to update the user's name * Update comments on GoogleAuthService signInWithEmailLink * Add newsletter and accept terms checkboxes to accept email invite component * Extract signup form from signup page * Extract fields from signup form * Update accept email invite component to use field components from signup form * Ensure that sign up button is disable until form is valid and security checkbox is checked * Make error status text color in accept email link red * Update workspace check in DefaultView so that user lands in workspace selector when there are no workspaces * Add coment around 
continueUrl param usage in UserService * Remove usless default case in GoogleAuthService * Source Marketo: process fail during creation of an export job (#13930) * #9322 source Marketo: process fail during creation of an export job * #9322 source marketo: upd changelog * #9322 source marketo: fix unit test * #9322 source marketo: fix SATs * auto-bump connector version Co-authored-by: Octavia Squidington III * :window: :wrench: Add eslint rules for CSS modules (#13952) * add eslint-plugin-css-modules rules * Fixes: - turn on eslint css modules rule as error - remove unused styles * add warning message if styled components is used * Revert "add warning message if styled components is used" This reverts commit 4e92b8b2110142bb679f15aeb034e377e0dcc69c. * replace rule severity with words * Update salesforce.md Fixed broken link * :window: πŸ”§ Add auto-fixable linting rules to webapp (#13462) * Add new eslint rules that fit with our code style and downgrade rules to warn * allowExpressions in fragment eslint rule * Enable function-component-definition in eslint and fix styles * Cleanup lint file * Fix react/function-component-definition warnings manually * Add more auto-fixable rules and fix * Fix functions that require usless returns * Update array-type rule to array-simple * Fix eslint errors manually disable assignmentExpression for arrays in prefer-destructuring rule * Auto fix new linting issues after rebase * Enhance /publish to allow for multiple connectors and parallel execution (#13864) * start * revert * azblob * bq * bq denorm * megapublish baaaabyyyy * fix needs * matrix connectors * auto-bump connector version * dont failfast and max parallel 5 * multi runno * minor * testing matrix agents * name * testing multi agents * tmp fix * new multi agents * multi test * tryy * let's do this * magico * fix * label test * couple more connector bumps * temp * things * check this * lets gooo * more connectors * Delete TEMP-testing-command.yml * auto-bump connector version * added comment describing bash part * running single thread * catch sentry cli * auto-bump connector version * destinations * + snowflake * saved * auto-bump connector version * auto-bump connector version * java source bumps * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * remove twice-defined methods * label things * revert action * using the new test action * point at action * wrong tag on action * update pool label * update to use new ec2-github-runner fork * this needs to be more generic than publisher * change publish to run on pool * add comment about runner-pool usage * updated publish command docs for multi & parallel connector runs * auto-bump connector version * auto-bump connector version * auto-bump connector version * unbump failed publish versions * missed dockerfiles * remove failed docs * mssql fix * overhauled the git comment output * bumping a test connector that should work * slight order switcheroo * output connectors properly in first message * auto-bump connector version Co-authored-by: 
Octavia Squidington III * Bump Airbyte version from 0.39.21-alpha to 0.39.22-alpha (#13979) Co-authored-by: Phlair * Parker/temporal cloud (#13243) * switch to temporal cloud client for now * format * use client cert/key env secret instead of path to secret * add TODO comments * format * add logging to debug timeout issue * add more logging * change workflow task timeout * PR feedback: consolidate as much as possible, add missing javadoc * fix acceptance test, needs to specify localhost * add internal-use only comments * format * refactor to clean up TemporalClient and prepare it for future dependency injection framework * remove extraneous log statements * PR feedback * fix test * return isInitialized true in test * πŸ“„ Postgres source: fix CDC setup order in docs (#13949) * postgres source: fix CDC setup order docs * Update docs/integrations/sources/postgres.md Co-authored-by: Liren Tu * Per-stream state support for Postgres source (#13609) * WIP Per-stream state support for Postgres source * Fix failing test * Improve code coverage * Make global the default state manager * Add legacy adapter state manager * Formatting * Include legacy state for backwards compatibility * Add global state manager * Implement Global/CDC state handling * Fix test issues * Fix issue with updated method signature * Handle empty state case in global state manager * Adjust to protocol changes * Fix failing acceptance tests * Fix failing test * Fix unmodifiable list issue * Fix unmodifiable exception * PR feedback * Abstract global state manager selection * Handle conversion between different state types * Handle invalid conversion * Rename parameter * Refactor state manager creation * Fix failing tests * Fix failing integration tests * Add CDC test * Fix failing integration test * Revert change * Fix failing integration test * Use per-stream for postgres tests * Formatting * Correct stream descriptor validation * Correct permalink * PR feedback * Bump Airbyte version from 0.39.22-alpha to 0.39.23-alpha (#13984) Co-authored-by: pmossman * Adds test for new workflow (#13986) * Adds test for new workflow * Adds airbyte repo * remove testing line * Add new InterpolatedRequestOptionsProvider that encapsulates all variations of request arguments (#13472) * write out new request options provider and refactor components and parts of the YAML config * fix formatting * pr feedback to consolidate body_data_provider to simplify the code * pr feedback get rid of extraneous optional * publish oss for cloud (#13978) workflow to publish oss artifacts that cloud needs to build against use docker buildx to create arm images for local development * skip debezium engine startup in case no table is in INCREMENTAL mode (#13870) * πŸŽ‰ Source Github: break point added for workflows_runs stream (#13926) Signed-off-by: Sergey Chvalyuk * 6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#13866) * 6339: debug info * 6339: not using 'USE' on Azure SQL servers * 6339: cleanup * 6339: cleanup2 * 6339: cleanup3 * 6339: versions/changelogs updated * 6339: merge from master (consolidation issue) * 6339: dev connector version (for testing in airbyte cloud) * 6339: code review implementation * 6339: apply formatting * in case runners fail to spin up, this needs to run on github-hosted (#13996) * 12708: Add an option to use encryption with staging in Redshift Destination (#13675) * 12708: Add an option to use encryption with staging in Redshift Destination * 12708: docs/docker 
configs updated * 12708: merge with master * 12708: merge fix * 12708: code review implementation * 12708: fix for older configs * 12708: fix for older configs in check * 12708: merge from master (consolidation issue) * 12708: versions updated * :tada: New Source: Webflow (#13617) * Added webflow code * Updated readme * Updated README * Added webflow to source_definitions.yaml * Enhanced documentation for the Webflow source connector * Improved webflow source connector instructions * Moved Site ID to before API token in Spec.yaml (for presentation in the UI) * Addressed comments in PR. * Changes to address requests in PR review * Removed version from config * Minor udpate to spec.yaml for clarity * Updated to pass the accept-version as a constant rather than parameter * Updated check_connection to hit the collections API that requires both site id and the authentication token. * Fixed the test_check_connection to use the new check_connection function * Added a streams test for generate_streams * Re-named "autentication" object to "auth" to be more consistent with the way it is created by the CDK * Added in an explict line to instantiante an "auth" object from WebflowTokenAuthenticator, to make it easier to describe in the blog * Fixed a typo in a comment * Renamed some classes to be more intuitive * Renamed class to be more intuitive * Minor change to an internal method name * Made _get_collection_name_to_id_dict staticmethod * Fixed a unit-test error that only appeared when running " python -m pytest -s unit_tests". This was caused by Mocked settings from test_source.py leaking into test_streams.py * format: add double quotes and remove unused import * readme: remove semantic version naming of connector in build commands * Updated spec.yaml * auto-bump connector version * format files * add changelog * update dockerfile * auto-bump connector version Co-authored-by: sajarin Co-authored-by: Octavia Squidington III Co-authored-by: marcosmarxm * Source-oracle: fixed tests + checkstyle (#13997) * Source-oracle: fixed tests + checkstyle * πŸ›Destination-mysql: fixed integration test and build process (#13302) * [13180] destination-mysql: fixed integration test * update changelog to include debezium version upgrade (#13844) * make table headers look less like successes (#13999) * source-twilio: implement lookback windows (#13896) * Revert "12708: Add an option to use encryption with staging in Redshift Destination (#13675)" (#14010) This reverts commit aa28d448d820df9d79c2c0d06b38978d1108fb2c. * Revert "6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#13866)" (#14011) This reverts commit 0d870bd37bc3b5cd798b92115d73bcc45a42d8f7. 
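
The Webflow source notes above describe wiring stream authentication through an `auth` object built from `WebflowTokenAuthenticator`, with the `accept-version` header passed as a constant. A minimal Python sketch of that pattern, assuming the Airbyte CDK's `TokenAuthenticator` base class; the header value, class body, and usage lines are illustrative, not the connector's actual code:

```python
# Illustrative sketch of the authenticator pattern described above.
# Assumes the Airbyte CDK's TokenAuthenticator; not the connector's actual code.
from airbyte_cdk.sources.streams.http.auth import TokenAuthenticator

WEBFLOW_ACCEPT_VERSION = "1.0.0"  # hypothetical constant; the real value lives in the connector


class WebflowTokenAuthenticator(TokenAuthenticator):
    """Adds Webflow's required accept-version header alongside the bearer token."""

    def get_auth_header(self) -> dict:
        return {
            **super().get_auth_header(),  # {"Authorization": "Bearer <token>"}
            "accept-version": WEBFLOW_ACCEPT_VERSION,
        }


# Usage inside a source's streams() method (sketch only; stream class names are hypothetical):
# auth = WebflowTokenAuthenticator(token=config["api_key"])
# streams = [CollectionsStream(authenticator=auth, site_id=config["site_id"])]
```
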
* [low-code connectors] BasicHttpAuthenticator (#13733) * implement basichttpauthenticator * add optional refresh access token authenticator * remove prints * type hints * Fix and unit test * missing test * Add class to __init__ file * Add comment * migrate JsonSchemas to use basic path instead of JSONPath (#13917) * scaffold for catalog diff, needs fixing on type handling and tests (#13786) * Prepare release of JDBC connectors (#13987) * Prepare release of JDBC connectors * Update source definitions manually * use built in check for if path is definite (#13834) * 13535 Fixed bastion network for integration tests (#14007) * doc: add error troubleshooting `docker-compose up` (#13765) * fix: duplicate resource allocations in `airbyte-temporal` deployment (#13816) * helm-chart: Fix worker deployment format error (#13839) * add catalog diff connection read (#13918) * doc: fix small typo on Shopify documentation (#13992) * add streams to reset to job info (#13919) * Generate api for changes in #13370 and make code compatible (#14014) * Generate api for per-stream updates #13835 (#14021) * Revert "Prepare release of JDBC connectors (#13987)" (#14029) This reverts commit df759b30778082508e2872513800fac34d98ff7c. * Fix per stream state protocol backward compatibility (#14032) * rename state type field to fix backwards compatibility issue * replace usages of stateType with type * support semi incremental by adding extractor record filter (#13520) * support semi incremental by adding extractor record filter * refactor extractor into a record_selector that supports extraction and filtering of response records * Remove pydantic spec from amazon ads and use YAML spec (#13988) * add EdDSA support in SSH tunnel (#9494) * add EdDSA support * verify EdDSA support works correct Co-authored-by: Yurii Bidiuk * πŸŽ‰New source connector: source-metabase (#13752) * Add docs * Close metabase session when sync finishes * Close session in check_connection * Add source definition to seed * Add icon * improve cdc check for connectors (#14005) * improve should use cdc check * Revert "improve should use cdc check" This reverts commit 7d01727279d21d33a6c18ed3227ee94432636120. 
* improve should use cdc check * add unit test * Update webflow.md * Update webflow.md * Update webflow.md * Remove legacy sentry code from cdk (#14016) * rip sentry out of cdk * remove sentry dsn from gsc * Update webflow.md * Update webflow.md * Fixed broken links (#14071) * πŸͺŸPersist unsaved changes on schema refresh (#13895) * add form values tracker context * add clarifying comment * add same functionality to create connection * Update airbyte-webapp/src/components/CreateConnectionContent/CreateConnectionContent.tsx Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * Fixes broken links so we can deploy again (#14075) also adds better error message for when this happens to others * Adds symmary.md to gitignore (#14078) * Added webflow icon (#14069) * Added webflow icon * Added icon * Build create connection form build failure (#14081) * Fix CDK obfuscation of nested secrets (#14035) * Added Buy Credits section to Managing Airbyte Cloud (#13905) * Added Buy Credits section to Managing Airbyte Cloud * Made some style changes * Made edits based on Natalie's suggestions * Deleted link * Deleted line * Edited email address * Updated reaching out to sales sentence * disable es-lit to fix build (#14087) * Release source connectors (#14077) * Release source connectors * Fix issue with database connection in test * Fix failing tests due to authentication * auto-bump connector version * auto-bump connector version * auto-bump connector version Co-authored-by: Octavia Squidington III * Bump Airbyte version from 0.39.23-alpha to 0.39.24-alpha (#14094) Co-authored-by: jdpgrailsdev * Emit the state to remove in the airbyte empty source (#13725) What This PR updates the EmptyAirbyteSource in order to perform a partial update and handle the new state message format. How The empty will now emit different messages based on the type of state being provided: Per stream: it will emit one message per stream that have been reset Global: It will emit one global message that will contain null for the stream that have been reset including the shared state Co-authored-by: Jimmy Ma * Add StatePersistence object (#13900) Add a StatePersistence object that supports Read/Writes of States to the DB with StreamDescriptor fields The only migrations that is supported are * moving from LEGACY to GLOBAL * moving from LEGACY to STREAM * All other state type migrations are expected to go through an explicit reset beforehand. * secret-persistence: Hashicorp Vault Secret Store (#13616) Co-authored-by: Amanda Murphy Co-authored-by: Benoit Moriceau * πŸ› Source Hubspot: remove `AirbyteSentry` dependency (#14102) * fixed * updated changelog * auto-bump connector version Co-authored-by: Octavia Squidington III * fix: format VaultSecretPersistenceTest.java (#14110) * Source Hubspot: extend error logging (#14054) * #291 incall - source Hubspot: extend error logging * huspot: upd changelog * auto-bump connector version Co-authored-by: Octavia Squidington III * Update webflow.md (#14083) * Update webflow.md Removed a description that is only applicable to people that are writing connector code, not to _users_ of the connector. 
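
The StatePersistence entry above spells out which state-type migrations are accepted without an explicit reset. A minimal sketch of that rule, written in Python for brevity (the actual implementation is Java in the Airbyte config persistence layer); the names here are illustrative:

```python
# Illustrative sketch of the migration rule described above (the real code is Java).
from enum import Enum, auto


class StateType(Enum):
    LEGACY = auto()
    GLOBAL = auto()
    STREAM = auto()


# Per the PR description: only LEGACY -> GLOBAL and LEGACY -> STREAM are accepted;
# any other change of state type must go through an explicit reset first.
_ALLOWED_MIGRATIONS = {
    (StateType.LEGACY, StateType.GLOBAL),
    (StateType.LEGACY, StateType.STREAM),
}


def can_migrate(current: StateType, requested: StateType) -> bool:
    """Return True if writing `requested` state over `current` state is allowed."""
    return current == requested or (current, requested) in _ALLOWED_MIGRATIONS


assert can_migrate(StateType.LEGACY, StateType.STREAM)
assert not can_migrate(StateType.GLOBAL, StateType.STREAM)  # requires an explicit reset
```
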
* Update webflow.md * Update webflow.md * Update webflow.md * Update webflow.md * 12708: Add an option to use encryption with staging in Redshift Desti… (#14013) * 12708: Add an option to use encryption with staging in Redshift Destination (#13675) * 12708: Add an option to use encryption with staging in Redshift Destination * 12708: docs/docker configs updated * 12708: merge with master * 12708: merge fix * 12708: code review implementation * 12708: fix for older configs * 12708: fix for older configs in check * 12708: merge from master (consolidation issue) * 12708: versions updated * 12708: specs updated * 12708: specs updated * 12708: removing autogenerated files from PR * 12708: changelog updated * auto-bump connector version Co-authored-by: Octavia Squidington III * Source PayPal Transaction: Update Transaction Schema (#13682) * Update transaction schema. * Transform money values from strings to floats or integers. Co-authored-by: nataly Co-authored-by: Augustin * fix(jsonSchemas): raise error when items property not provided (#14018) * fix stream name in stream transformation update (#14044) * πŸ› Destination Redshift: Improved discovery for redshift-destination not SUPER streams (#13690) airbyte-12843: Improved discovery for redshift-destination not SUPER tables, excluded views from discovery. * Remove skiptests option (#14100) * update sentry release script (#14123) * Remove "additionalProperties": false from specs for connectors with staging (#14114) * Remove "additionalProperties": false from spec for connectors with staging * Remove "additionalProperties": false from spec for Redshift destination * bump versions * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version * auto-bump connector version Co-authored-by: Octavia Squidington III * [14003] source-oracle: added custom jdbc field (#14092) * [14003] source-oracle: added custom jdbc field * Add JobErrorReporter for sending sync job connector failures to Sentry (#13899) * skeleton for reporting connector errors to sentry * report on job failures instead of attempt failures * report sync job failures with relevant metadata using JobErrorReporter * send stack traces from python connectors to sentry * test JobCreationAndStatusUpdate and JobErrorReporter * logs * refactor into helper, initial tests * using sentry * run format * load reporting client from env * load sentry dsn from env * send java stack traces to sentry * test sentryclient, refactor to use Hub instance * ErrorReportingClient.report -> .reportJobFailureReason * inject exception helper, test stack trace parse error tagging * rm logs * more stack trace tests * remove logs * fix failing tests * rename ErrorReportingClient to JobErrorReportingClient * rename vars in docker-compose * Return an Optional instead of null when parsing stack traces * dont remove airbyte prefix when setting release name * from_trace_message static * remove failureSummary from jobfailure input, get from Job * send stacktrace string if we weren't able to parse * set deployment mode tag * update .env * just log if something goes wrong * Use StateMessageHelper in source (#14125) * Use StateMessageHelper in source * PR feedback and formatting * More PR feedback * Revert change * Revert changes * Bump Airbyte version from 0.39.24-alpha to 0.39.25-alpha (#14124) Co-authored-by: brianjlai * Refactor acceptance tests and utils (#13950) * Refactor Basic acceptance tests and utils * Refactor Advanced acceptance 
tests and utils * Remove unused code * Clear destination db data during cleanup * Cleanup comments * cleanup init code * test creating new desintation db for each test * cleanup desintation db init * Allow to edit api client * pull in temporal cloud changes * Rename helper to harness; set some funcs to private; turn init into constructor * add func to set env vars instead of using static vars and move some functionality out of init into acceptance tests * update javadoc Co-authored-by: Davin Chia * fix javadoc formatting * fix var naming Co-authored-by: Davin Chia * Bump Airbyte version from 0.39.25-alpha to 0.39.26-alpha (#14141) Co-authored-by: terencecho * πŸŽ‰ octavia-cli: Add ability to get existing resources (#13254) * 13541 Fixed integration tests source-db2 Mac OS (#14133) * 13523 Fix integration tests destination-cassandra Mac OS (#14134) * πŸ› Source Hubspot: fixed SAT test, commented out expected_records (#14140) * :bug: Source Intercom: extend `Contacts` schema with new properties (#14099) * Source Twilio: adopt best practices (#14000) * #1946 Source twilio: aopt best practices - tune tests * #1946 add expected_records to acceptance-test-config.yml * #1946 source twilio - upd schema and changelog * #1946 fix expected_records * #1946 source twilio: rm alerts from expected records as they expire in 30 days * #1946 source twilio: bump version * πŸŽ‰ Source BingAds: expose hourly/daily/weekly/monthly options from configuration (#13801) * #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings removed: config settings for hourly/daily/weekly/monthly reports added: default value for all periodic reports to True * #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings removed: unused class variables, if-statement * #12489 - expose hourly/daily/weekly/monthly reports in discovery by default instead of in the connector's configuration settings removed: unused variables from config * auto-bump connector version Co-authored-by: Octavia Squidington III * remove VersionMismatchServer (#14076) * remove VersionMismatchServer * remove VersionMismatchServerTest * revert intended changes * Increase instance termination time limit to 3 hours to accommodate connector builds. (#14181) * Use correct bash comment symbol. (#14183) * πŸŽ‰ New Source: Orbit.love (#13390) * source-orbit: add definition and specs (#14189) * πŸŽ‰ Base Norrmalization: clean-up Redshift `tmp_schemas` after SAT (#14015) Now after `base-normalization` SAT the Destination Redshift will be automatically cleaned up from test leftovers. Other destinations are not covered yet. 
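
The base-normalization entry above says Redshift test leftovers (tmp schemas) are now dropped after the standard acceptance tests. A rough sketch of that kind of cleanup, assuming schemas are created with a known test prefix and using psycopg2; the prefix, DSN, and function name are illustrative, not the repo's actual test harness:

```python
# Rough sketch of post-test schema cleanup on Redshift; prefix and connection details are illustrative.
import psycopg2

TEST_SCHEMA_PREFIX = "test_normalization_"  # hypothetical prefix used by a test run


def drop_test_schemas(dsn: str) -> None:
    """Drop every schema whose name starts with the test prefix."""
    with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
        cur.execute(
            "SELECT nspname FROM pg_namespace WHERE nspname LIKE %s",
            (TEST_SCHEMA_PREFIX + "%",),
        )
        for (schema,) in cur.fetchall():
            # CASCADE removes the tables/views normalization created inside the schema.
            cur.execute(f'DROP SCHEMA IF EXISTS "{schema}" CASCADE')
    # psycopg2 commits the transaction when the connection context exits cleanly
```
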
* Source Salesforce: fix customIntegrationTest for SAT (#14172) * Source Amazon Ads: increase timeout for SAT (#14167) * πŸŽ‰ Introduce Google Analytics Data API source (#12701) * Introduce Google Analytics Data API source https://developers.google.com/analytics/devguides/reporting/data/v1 * Add Google Analytics Data API source PR link * Add `client` class for Google Analytics Data API * Move dimensions and metrics extraction to the `client` class In the Google Analytics Data API * Change the copyright date to 2022 in Google Analytics Data API * fix: removing incremental syncs * fix: change project_id to string * fix: flake check is failing * chore: added it to source definitions * chore: update seed file Co-authored-by: Harshith Mullapudi * πŸ› Destination Redshift: use s3 bucket path for s3 staging operations (#13916) * Publish acceptance test utils maven artifact (#14142) * Fix StatePersistence Legacy read/write (#14129) StatePersistence will wrap/unwrap legacy state on write/read to ensure compatibility with the old behavior/data. * πŸŽ‰ Destination connectors: Improved "SecondSync" checks in Standard Destination Acceptance tests (#14184) * [11731] Improved "SecondSync" checks in Standard Destination Acceptance tests * πŸ› Source Zendesk Support: fixed "Retry-After" non integer value (#14112) Signed-off-by: Sergey Chvalyuk * Source Tiktok Marketing: Videometrics (#13650) * added video metrics in streams.py * common metrics list updated. * updated streams.py with extended metrics required. * updated stream_test * updated configured_catalog * video metrics required list updated. * chore: formatting * chore: bump version in source definitions * chore: update seed file Co-authored-by: Harshith Mullapudi * πŸŽ‰ Source Github: secondary rate limits has to retry (#13955) Signed-off-by: Sergey Chvalyuk * Harshith/test pr 13118 (#14192) * Firebolt destination * feat: Write method dropdown * feat: Use future-proof Auth in SDK * refactor: Move writer instantiation * fix: tests are failing * fix: tests are failing * fix: tests are failing * chore: added connector to definitions * fix: formatting and spec * fix: formatting for orbit Co-authored-by: ptiurin * πŸͺŸ :art: Show credit usage on chart's specific day (#13503) * add tooltip to chart * Fixes: - update main chart color; - change onHover background color * change chart color pallet to grey 500 * update color reference * remove opacity from UsageCell * πŸ› destination-redshift: use s3 bucket path for s3 cleanup (#14190) * Improve documentation for Postgres Source (#13830) * Improve documentation for Postgres Source * add information about additional JDBC params * add anchors for doc sections * fix link to CDC on Bare Metal * add more details about parsing date/time values * add doc link to SSH fields * Handle null reset source config (#14202) * handle null reset source config * format * Wait indefinitely if connection is not active (#14200) * also wait indefinitely if connection is deleted * fix test * Bump Airbyte version from 0.39.26-alpha to 0.39.27-alpha (#14204) Co-authored-by: lmossman * Bmoric/feature flag for state deserialization (#14127) * Add Feature flag * Add default feature flag value * Update test * remove unsused * tmp * Update tests * rm unwanted change * PR comments * [low-code connectors] default types and default values (#14004) * default types and default values * cleanup * fixes so read works * remove prints and trycatch * comment * remove unused param * split file * extract method * extract methods * comment * 
optional * fix test * cleanup * delete interpolated request header provider * simplify next page url paginator interface * comment * format * add state type endpoint (#14111) * Bump Airbyte version from 0.39.27-alpha to 0.39.28-alpha (#14210) Co-authored-by: sherifnada * πŸ› source-orbit: remove workspace_old.json (#14208) * Fix: Docs plural login redirecting to wrong URL (#14207) * [docs] fix numbering and incorrect filename in CDK docs (#13045) * [docs] fix numbering in CDK docs * Update 5-declare-schema.md * Update 5-declare-schema.md * Update 6-read-data.md * Update 8-test-your-connector.md * Remove the old scheduler from HelmCharts helper (#14187) * Remove the old scheduler from HelmCharts helper The old scheduler was removed as part of https://github.com/airbytehq/airbyte/pull/13400 * Remove legacy `scheduler` comment in HelmCharts * Source Gitlab: add GroupIssueBoards stream (#13252) * GitLab Source: add GroupIssueBoards stream * Address stream schema comments * Address comments * Bump version * Add as empty stream * run seed file source (#14215) * fix 'cannot reach server' error on demo instance (#10020) * Update CODEOWNERS (#14209) * πŸŽ‰ Source Github: use GraphQL for `reviews` stream (#13989) Signed-off-by: Sergey Chvalyuk * workflow for publishing artifacts for cloud (#14199) * fix sentry org slug change (#14218) * Source File: correct spec json to match json format (#13738) * Upgrade spotless version and remove jvmargs workaround (#13705) * Source Zendesk Chat: Process large amount of data in batches for incremental (#14214) * increased the limit of itens in request * Configuration for max api pages on requests * included api_pagination_limit in sample * included api_pagination_limit in invalid_config * creating new table for chat_session * reverted api_pagination_limit approach * removed api_pagination_limit from TimeIncrementalStream * correct chat json * bump connector version * add changelog * run format * auto-bump connector version Co-authored-by: Roberto Bonnet Co-authored-by: Octavia Squidington III * Remove all @ts-ignore (#14221) * Bump hadoop to use version 3.3.3 (#14182) * Change the persistence activity to use the new persistence layer (#14205) * Change the persistence activity to use the new persistence layer * Use lombok * format * Use new State message helper * Fix build (#14225) * Fix build * Fix test * Use new state persistence for state reads (#14126) * Inject StatePersistence into DefaultJobCreator * Read the state from StatePersistence instead of ConfigRepository * Add a conversion helper to convert StateWrapper to State * Remove unused ConfigRepository.getConnectionState * Temporal per stream resets (#13990) * remove reset flags from workflow state + refactor * bring back cancelledForReset, since we need to distinguish between that case and a normal cancel * delete reset job streams on cancel or success * extract isResetJob to method * merge with master * set sync modes on streams in reset job correctly * format * Add test for getAllStreamsForConnection * fix tests * update more tests * add StreamResetActivityTests * fix tests for default job creator * remove outdated comment * remove debug lines * remove unused enum value * fix tests * fix constant equals ordering * make job mock not static * DRY and add comments * add comment about deleted streams * Remove io.airbyte.config.StreamDescriptor * regisster stream reset activity impl * refetch connection workflow when checking job id, since it may have been restarted * only cancel if workflow is running, to 
allow reset signal to always succeed even if batched with a workflow start * fix reset signal to use new doneWaiting workflow state prop * try to fix tests * fix reset cancel case * add acceptance test for resetting while sync is running * format * fix new acceptance test * lower sleep on test * raise sleep * increase sleep and timeout, and remove repeated test * use CatalogHelpers to extract stream descriptors * raise sleep and timeout to prevent transient failures * format Co-authored-by: alovew * fix PostgresJdbcSourceAcceptanceTest by activating the feature flag (#14240) * fix PostgresJdbcSourceAcceptanceTest by activating the feature flag * fix AbstractJdbcSourceAcceptanceTest as well * fix expected_spec for strict encrypt * [13539] Fix integration tests source-clickhouse Mac OS (#14201) * [13539] Fix integration tests source-clickhouse Mac OS fixed unit tests * [13524] Fix integration tests destination-clickhouse Mac OS fixed unit tests * 6339: error when attempting to use azure sql database within an elastic pool as source for cdc based replication (#14121) * 6339: implementation * 6339: changelog updated * 6339: definitions updated * 6339: definitions reverted * 6339: still struggling with publishing * auto-bump connector version * 6339: definitions reverted - correct * auto-bump connector version Co-authored-by: Octavia Squidington III * πŸͺŸ 🎨 Update favicon and table row image styles (#14020) * style changes to favicon and imageblock * fix import * revert component and props names to block * Update airbyte-webapp/src/components/ImageBlock/ImageBlock.tsx Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * Update airbyte-webapp/src/components/ImageBlock/ImageBlock.module.scss Co-authored-by: Vladimir * Update airbyte-webapp/src/components/ImageBlock/ImageBlock.tsx Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> * Update airbyte-webapp/src/components/ImageBlock/ImageBlock.module.scss Co-authored-by: Vladimir * add storybook Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Co-authored-by: Vladimir * upgrade potgresql version to fix default timestamp handling (#14211) * implement logic to trigger snapshot of new tables via debezium (#13994) * implement logic to trigger snapshot of new tables via debezium * format * improve test condition * fix build * BigQuery Denormalized "airbyte_type": "big_integer" to INT64 (#14079) * BigQuery Denormalized "airbyte_type": "big_integer" to INT64 * updated changelog * added unit test * removed star import * fixed checkstyle * bump version * auto-bump connector version Co-authored-by: Octavia Squidington III * Add Metrics section to Scaling Airbyte doc (#14224) * Added metrics section to scaling airbyte doc * Updated URL in doc * Deleted link * Added link * Added backslashes before brackets that aren't links * Edited note about tagged metrics * Changed list * Changed spacing * Changed spacing * Changed spacing * Deleted period * Fixed broken firebolt link * Added tables * Cleaned up wording in tables * Add ability to provide source/destination connector docker image (#14266) * Add ability to provide source/destination connector docker image * Make constant public * Bump Airbyte version from 0.39.28-alpha to 0.39.29-alpha (#14232) * disable flaky cmw test temporarily (#14269) * release new postgres source connector version 0.4.29 (#14265) * release new postgres source connector version 0.4.29 * add changelog * auto-bump connector version Co-authored-by: 
Octavia Squidington III * :tada: Source Tiktok marketing - remove granularity config option (#13890) * Removed granularity config option from spec, added corresponsing streams for each support granularity (hourly daily, lifetime), updated unittests, SAT * auto-formating * auto-formating * removed AdvertisersIds stream from list of exposed streams, updated docs * expose new style streams since 0.1.13, expose old streams for config for older version * update spec * fixed path to catalog * increased timeout * source bing-ads to ga (#13679) * Source Tiktok marketing - increase connector version (#14272) * increased connector version * increased connector version in seed * auto-bump connector version Co-authored-by: Octavia Squidington III * Fix flaky connection manager workflow test (#14271) * try thread sleep instead of test env, and run 100 times * replace testEnv.sleep with Thread.sleep in several tests * replace RepeatedTest with Test * replace testEnv.sleep with Thread.sleep after signals are executed * run each test 100 times to see if any are flaky * add log * change repetitions to 100 to avoid out of memory * format * swap repeated test for normal test * 13532 Fixed integration tests destination-mssql Mac OS (#14252) * 13532 Fixed integration tests destination-mssql Mac OS * Source Google Analytics: Specify integer for dimension ga:dateHourMinute (#14298) * Specify integer for dimension ga:dateHourMinute * Update changelog * πŸŽ‰ Source Github: rename field `mergeable` to `is_mergeable` (#14274) Signed-off-by: Sergey Chvalyuk * Update Airbyte Client (#14270) * #12668 #13198 enable full refresh, disable incremental and expected_records (#14191) * πŸŽ‰ Destination S3: update INSTANCE_PROFILE to use AWSDefaultProfileCredential (#14231) Co-authored-by: Mike Balmer * Source Zendesk Support: pagination group membership (#14304) * add next_page_tooken and request * correct group_membership paginatin * update doc * auto-bump connector version Co-authored-by: Octavia Squidington III * πŸͺŸ πŸ› Fix OAuth validation not allowing to create source or destination (#14197) * Enable "Set up source/destination" button only if the form is valid * Update how ServiceForm initial values are patched so that it correctly patches the configuration with default values * Update initial values patching in service form to use initialValues to preserve already set values Update useOAuthFlowAdapter to correctly merge the values from the oauth response * Remove unused values var from ServiceForm * Add acceptance tests for per-stream state updates (#14263) * Add acceptance tests for per-stream state updates * PR feedback * Formatting * More PR feedback * PR feedback * Remove unused constant * Make sure that the feature flag is transfer to container (#14314) * Make sure that the feature flag is transfer to container * propagate the feature flags * Avoid propagating the feature flags * Fix tests * Source Postgres : use more simple and comprehensive query to get selectable tables (#14251) * use more simple and comprehensive query to get selectable tables * cover case when schema is not specified * add test to check discover with different ways of grants * format * incr ver * incr ver * auto-bump connector version Co-authored-by: Octavia Squidington III * Fixed broken link * Fix for deleting stream resets (#14322) * Fix for deleting stream resets * Fix build by updating var (#14321) * Edited formatting (#14275) * Avoid error when creating dupl stream reset (#14328) * Bump Airbyte version from 0.39.29-alpha to 
0.39.30-alpha (#14329) Co-authored-by: lmossman * Release new postgres strict encrypt version (#14331) * Bump postgres strict encrypt version * Update changelogs * Update doc * Release new destination s3 version to pick up latest change (#14332) * Bump s3 version * Update pr id * auto-bump connector version Co-authored-by: Octavia Squidington III * 13538 Fix integration tests destination-scylla Mac OS (#14308) * 13538 Fix integration tests destination-scylla Mac OS * Update cdk-speedrun.md (#14258) Added a link at the bottom of the article , so the user may find the more in-depth tutorial about building a real-world connector. * Update README.md (#14303) Added a link to https://airbyte.com/tutorials/extract-data-from-the-webflow-api in Webflow's README.md * Update building-a-python-source.md (#14262) * Update webflow.md (#14254) Added a link to the new blog - https://airbyte.com/tutorials/extract-data-from-the-webflow-api Co-authored-by: Simon SpΓ€ti * Alex/declarative stream incremental fix (#14268) * checkout files from test branch * read_incremental works * reset to master * remove dead code * comment * fix * Add test * comments * utc * format * small fix * Add test with rfc3339 * remove unused param * fix test * πŸ› SingerSource: Fix incompatibilities and typing issues (#14148) * Use logging.Logger in SingerSource * Fix SingerSource ConfigContainer This fixes typing issues with `ConfigContainer` and makes it compatible with `split_config`. Fixes #8710. * Fix SingerSource state and catalog typer issues * Rename SingerSource method args to match parent classes * Remove old comment about excluding Singer Co-authored-by: Alexandre Girard * Update source postgres release stage to beta (#14326) * fix NPE (#14353) * fix NPE * Add test * Fix trailing * πŸŽ‰ octavia-cli: Add ability to import existing resources (#14137) * helm chart: Add Image Pull Secrets Param (#14031) * fix format (#14354) * Bump Airbyte version from 0.39.30-alpha to 0.39.31-alpha (#14355) Co-authored-by: benmoriceau * tiktok to ga (#14358) * Update state.state type (#14360) * Run some DATs as part of base-normalization tests (#14312) * Revert "πŸŽ‰ Source Github: rename field `mergeable` to `is_mergeable` (#14274)" (#14338) * Revert "πŸŽ‰ Source Github: rename field `mergeable` to `is_mergeable` (#14274)" * Properly update the hasEmitted state (#14367) * Bmoric/state aggregator (#14364) * Update state.state type * Add state aggregator * Test and format * PR comments * Move to its own package * Update airbyte-workers/src/test/java/io/airbyte/workers/internal/state_aggregator/StateAggregatorTest.java Co-authored-by: Lake Mossman * format * Update airbyte-workers/src/main/java/io/airbyte/workers/internal/state_aggregator/DefaultStateAggregator.java Co-authored-by: Lake Mossman * format Co-authored-by: Lake Mossman * Bump Airbyte version from 0.39.31-alpha to 0.39.32-alpha (#14383) Co-authored-by: alafanechere * πŸ› Source Mixpanel: fix SAT tests (#14349) * Call the new revoke_user_session endpoint from the FE (#13165) * Source Instagram: change releaseStage to GA (#14162) * Source Google Analytics: Change releaseStage to GA (#13957) * source-outreach: fix record parsing and cursor field access (#14386) * Kustomize: Use `resources` since `bases` is deprecated (#14037) * fix: clone api doesn't take update configurations (#13592) * fix: clone api doesn't take update configurations * fix: you will be able to create clone in different workspace * fix: added description to source/destination body * cdk: Attach namespace to stream 
in catalog (#13923) * Source TiDB: correct jdbc string builder (#14243) * add icon for tidb-connector * Fix TiDB source connector * bump connector version * auto-bump connector version Co-authored-by: marcosmarxm Co-authored-by: Octavia Squidington III * Source Google Ads: use docsaurus feature for warn/note and udpdate doc (#14392) * use docsaurus feature for warn/note and udpdate doc * update description in supported streams * Source Facebook Marketing: allow configuration of MAX_BATCH_SIZE (#14267) * Add max batch size config * Bump semver * add changelog * auto-bump connector version Co-authored-by: Octavia Squidington III * πŸŽ‰ Source Github: add Retry for GraphQL API Resource limitations (#14376) Signed-off-by: Sergey Chvalyuk * Add more metadata to the JobErrorReporter (#14395) * add workspace_id and connector_repository as tags * add tag for connection url * fix urls for job notifier * format * fix failing test * beta -> generally_available (#14315) Signed-off-by: Sergey Chvalyuk * helm chart: Fix/double printing of extra volume mounts (#14091) * SentryJobErrorReporter: better handling of multiline chained java exceptions (#14398) * Docs: deploy on gcp use docusaurus tabs (#14401) * Revert "Kustomize: Use `resources` since `bases` is deprecated (#14037)" (#14415) This reverts commit 5c9a6a5fc655a9e597f755be8fc8ccf805a2537a. * Use Debezium Postgres image for CDC tests (#14318) * Use Debezium Postgres image for CDC tests * Formatting * πŸŽ‰ octavia-cli: Add ability to import all resources (#14374) * Bump Airbyte version from 0.39.32-alpha to 0.39.33-alpha (#14419) Co-authored-by: pedroslopez * πŸ“ MySql source: clarify tinyint to number conversion when size > 1 (#14424) * πŸͺŸ πŸ› Fix Setup Source Button on OAuth Sources (#14413) * don't disable setup button * make eslint happy * one more cleanup * use the spec to decide how to create config object * Bump Airbyte version from 0.39.33-alpha to 0.39.34-alpha (#14428) Co-authored-by: timroes * [low-code cdk] Enable configurable state checkpointing (#14317) * checkout files from test branch * read_incremental works * reset to master * remove dead code * comment * fix * Add test * comments * utc * format * small fix * Add test with rfc3339 * remove unused param * fix test * configurable state checkpointing * update test * fix type hints (#14352) * normalization: Do not return NULL for MySQL column values > 512 chars (#11694) Co-authored-by: Augustin Co-authored-by: Edmundo Ruiz Ghanem <168664+edmundito@users.noreply.github.com> Co-authored-by: Evan Tahler Co-authored-by: Tim Roes Co-authored-by: Charles Co-authored-by: Jonathan Pearlin Co-authored-by: Amruta Ranade <11484018+Amruta-Ranade@users.noreply.github.com> Co-authored-by: Benoit Moriceau Co-authored-by: Jimmy Ma Co-authored-by: Ganpat Agarwal Co-authored-by: Serhii Chvaliuk Co-authored-by: Rajakavitha Kodhandapani Co-authored-by: Yevhen Sukhomud Co-authored-by: Andrii Leonets <30464745+DoNotPanicUA@users.noreply.github.com> Co-authored-by: George Claireaux Co-authored-by: VitaliiMaltsev <39538064+VitaliiMaltsev@users.noreply.github.com> Co-authored-by: Octavia Squidington III Co-authored-by: sw-yx Co-authored-by: Baz Co-authored-by: Octavia Squidington III <90398440+octavia-squidington-iii@users.noreply.github.com> Co-authored-by: alafanechere Co-authored-by: Eugene Co-authored-by: Denis Davydov Co-authored-by: Anna Lvova <37615075+annalvova05@users.noreply.github.com> Co-authored-by: Vladimir Co-authored-by: Phlair Co-authored-by: Parker Mossman Co-authored-by: Adam 
Co-authored-by: Liren Tu Co-authored-by: pmossman Co-authored-by: Topher Lubaway Co-authored-by: Brian Lai <51336873+brianjlai@users.noreply.github.com> Co-authored-by: Peter Hu Co-authored-by: Subodh Kant Chaturvedi Co-authored-by: Tuhai Maksym Co-authored-by: Alexander Marquardt Co-authored-by: sajarin Co-authored-by: marcosmarxm Co-authored-by: Alexandre Girard Co-authored-by: steve withington Co-authored-by: Leo Sussan Co-authored-by: cenegd Co-authored-by: Tomas Perez Alvarez <72174660+Tomperez98@users.noreply.github.com> Co-authored-by: Lake Mossman Co-authored-by: Sherif A. Nada Co-authored-by: Edward Gao Co-authored-by: Yurii Bidiuk Co-authored-by: Christophe Duong Co-authored-by: Teal Larson Co-authored-by: Sophia Wiley <106352739+sophia-wiley@users.noreply.github.com> Co-authored-by: jdpgrailsdev Co-authored-by: Jimmy Ma Co-authored-by: Stella Chung Co-authored-by: Amanda Murphy Co-authored-by: Mohamed Magdy Co-authored-by: nataly Co-authored-by: Tyler Russell Co-authored-by: Alexander Tsukanov Co-authored-by: Pedro S. Lopez Co-authored-by: brianjlai Co-authored-by: terencecho Co-authored-by: Davin Chia Co-authored-by: terencecho Co-authored-by: Daniel Diamond <33811744+danieldiamond@users.noreply.github.com> Co-authored-by: drrest Co-authored-by: Marcos Marx Co-authored-by: Abhi Vaidyanatha Co-authored-by: Harshith Mullapudi Co-authored-by: Zawar Khan Co-authored-by: ptiurin Co-authored-by: Greg Solovyev Co-authored-by: lmossman Co-authored-by: sherifnada Co-authored-by: Sachin Jangid Co-authored-by: Chris Wu Co-authored-by: Jared Rhizor Co-authored-by: tison Co-authored-by: Roberto Bonnet Co-authored-by: Malik Diarra Co-authored-by: alovew Co-authored-by: Oleksandr Sheheda Co-authored-by: midavadim Co-authored-by: Arsen Losenko <20901439+arsenlosenko@users.noreply.github.com> Co-authored-by: Ryan Lewon Co-authored-by: Mike Balmer Co-authored-by: Anne <102554163+alovew@users.noreply.github.com> Co-authored-by: Liren Tu Co-authored-by: Simon SpΓ€ti Co-authored-by: Albin Skott Co-authored-by: Caleb Fornari Co-authored-by: benmoriceau Co-authored-by: Christian Martin Co-authored-by: jordan-glitch <65691557+jordan-glitch@users.noreply.github.com> Co-authored-by: Daemonxiao <35677990+Daemonxiao@users.noreply.github.com> Co-authored-by: Keith Thompson Co-authored-by: Leo Sussan Co-authored-by: pedroslopez Co-authored-by: timroes Co-authored-by: Johannes Nicolai --- .../airbyte/db/factory/DataSourceFactory.java | 5 + .../bases/base-normalization/.dockerignore | 1 + .../bases/base-normalization/build.gradle | 6 + .../base-normalization/databricks.Dockerfile | 34 +++ .../dbt_project.yml | 72 +++++ .../macros/cross_db_utils/array.sql | 9 + .../macros/cross_db_utils/columns.sql | 28 ++ .../cross_db_utils/current_timestamp.sql | 4 + .../macros/cross_db_utils/datatypes.sql | 22 ++ .../macros/cross_db_utils/json_operations.sql | 26 ++ .../macros/should_full_refresh.sql | 35 ++- .../integration_tests/dbt_integration_test.py | 4 + .../test_nested_streams/dbt_project.yml | 67 +++++ ..._columns_resulting_into_long_names_scd.sql | 91 ++++++ ...plex_columns_resulting_into_long_names.sql | 29 ++ ...ns_resulting_into_long_names_partition.sql | 81 ++++++ ...long_names_partition_double_array_data.sql | 80 ++++++ ..._columns_resulting_into_long_names_ab1.sql | 19 ++ ..._columns_resulting_into_long_names_ab2.sql | 19 ++ ...esulting_into_long_names_partition_ab1.sql | 19 ++ ..._names_partition_double_array_data_ab1.sql | 20 ++ ..._columns_resulting_into_long_names_scd.sql | 116 ++++++++ 
...plex_columns_resulting_into_long_names.sql | 22 ++ ...ns_resulting_into_long_names_partition.sql | 19 ++ ...long_names_partition_double_array_data.sql | 18 ++ .../models/generated/sources.yml | 22 ++ ..._columns_resulting_into_long_names_scd.sql | 17 ++ ...plex_columns_resulting_into_long_names.sql | 17 ++ ...ns_resulting_into_long_names_partition.sql | 17 ++ ...long_names_partition_double_array_data.sql | 17 ++ .../test_simple_streams/dbt_project.yml | 67 +++++ .../test_simple_streams/first_dbt_project.yml | 67 +++++ .../dedup_exchange_rate_scd.sql | 109 ++++++++ .../dedup_exchange_rate.sql | 34 +++ .../test_normalization/exchange_rate.sql | 125 +++++++++ .../dedup_exchange_rate_stg.sql | 91 ++++++ .../multiple_column_names_conflicts_stg.sql | 85 ++++++ .../dedup_exchange_rate_ab1.sql | 24 ++ .../dedup_exchange_rate_ab2.sql | 24 ++ .../dedup_exchange_rate_scd.sql | 130 +++++++++ .../dedup_exchange_rate.sql | 27 ++ .../test_normalization/exchange_rate.sql | 25 ++ .../dedup_exchange_rate_stg.sql | 24 ++ .../models/generated/sources.yml | 15 + .../dedup_exchange_rate_ab1.sql | 24 ++ .../dedup_exchange_rate_ab2.sql | 24 ++ .../dedup_exchange_rate_scd.sql | 130 +++++++++ .../dedup_exchange_rate.sql | 27 ++ .../test_normalization/exchange_rate.sql | 25 ++ .../dedup_exchange_rate_stg.sql | 24 ++ .../modified_models/generated/sources.yml | 11 + .../dedup_exchange_rate_scd.sql | 17 ++ .../dedup_exchange_rate.sql | 17 ++ .../test_normalization/exchange_rate.sql | 125 +++++++++ .../dedup_exchange_rate_stg.sql | 91 ++++++ .../data_input/replace_identifiers.json | 3 + .../data_input/replace_identifiers.json | 3 + .../integration_tests/test_ephemeral.py | 2 + .../integration_tests/test_normalization.py | 5 +- .../normalization/destination_type.py | 1 + .../destination_name_transformer.py | 16 +- .../transform_catalog/reserved_keywords.py | 258 ++++++++++++++++++ .../transform_catalog/stream_processor.py | 131 +++++---- .../transform_config/transform.py | 17 ++ ...ons_catalog_expected_databricks_names.json | 32 +++ .../test_destination_name_transformer.py | 6 + .../unit_tests/test_table_name_registry.py | 1 - .../unit_tests/test_transform_config.py | 22 ++ .../destination-databricks/build.gradle | 5 +- .../DatabricksDestinationConfig.java | 4 +- .../databricks/DatabricksSqlOperations.java | 15 +- .../databricks/DatabricksStreamCopier.java | 191 ++++++------- .../DatabricksStreamCopierFactory.java | 4 +- .../src/main/resources/spec.json | 6 +- .../destination/s3/csv/S3CsvWriter.java | 8 +- .../src/components/EntityTable/utils.tsx | 4 +- .../src/config/ConfigServiceProvider.tsx | 2 +- .../src/core/domain/catalog/fieldUtil.ts | 3 +- airbyte-webapp/src/core/jsonSchema/types.ts | 3 +- .../src/core/request/AirbyteClient.ts | 3 +- .../Analytics/useAnalyticsService.tsx | 8 +- .../ConfirmationModalService.tsx | 4 +- .../services/Feature/FeatureService.test.tsx | 4 +- .../hooks/services/Feature/FeatureService.tsx | 8 +- .../src/hooks/services/useConnectionHook.tsx | 8 +- .../src/hooks/services/useConnector.tsx | 7 +- .../src/hooks/services/useConnectorAuth.tsx | 16 +- .../src/hooks/useTypesafeReducer.ts | 2 +- .../packages/cloud/services/auth/reducer.ts | 72 +++-- .../components/UsagePerConnectionTable.tsx | 8 +- .../components/ReplicationView.tsx | 8 +- .../CreationFormPage/CreationFormPage.tsx | 12 +- .../components/DestinationStep.tsx | 5 +- .../OnboardingPage/components/SourceStep.tsx | 5 +- .../connector/DestinationDefinitionService.ts | 8 +- ...tinationDefinitionSpecificationService.tsx | 8 +- 
.../connector/SourceDefinitionService.ts | 8 +- .../SourceDefinitionSpecificationService.tsx | 8 +- .../Connection/CatalogTree/CatalogSection.tsx | 28 +- .../CatalogTree/components/BulkHeader.tsx | 7 +- .../ConnectionForm/ConnectionForm.tsx | 4 +- .../calculateInitialCatalog.test.ts | 8 +- .../components/SyncCatalogField.tsx | 7 +- .../Connection/ConnectionForm/formConfig.tsx | 4 +- .../Controls/ConnectorServiceTypeControl.tsx | 8 +- .../Sections/auth/useOauthFlowAdapter.tsx | 12 +- .../ServiceForm/serviceFormContext.tsx | 8 +- .../DefaultNormalizationRunner.java | 3 +- .../NormalizationRunnerFactory.java | 1 + 109 files changed, 3005 insertions(+), 327 deletions(-) create mode 100644 airbyte-integrations/bases/base-normalization/databricks.Dockerfile create mode 100644 airbyte-integrations/bases/base-normalization/dbt-project-template-databricks/dbt_project.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/dbt_project.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab1.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab2.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql create mode 100644 
airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/sources.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/dbt_project.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_dbt_project.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_tables/test_normalization/exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/multiple_column_names_conflicts_stg.sql create mode 100644 
airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_tables/test_normalization/exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/sources.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_tables/test_normalization/exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/sources.yml create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql create mode 100644 
airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_tables/test_normalization/exchange_rate.sql create mode 100644 airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql create mode 100644 airbyte-integrations/bases/base-normalization/unit_tests/resources/long_name_truncate_collisions_catalog_expected_databricks_names.json diff --git a/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DataSourceFactory.java b/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DataSourceFactory.java index f1b9a7a2fadf2..c0c933a89f840 100644 --- a/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DataSourceFactory.java +++ b/airbyte-db/db-lib/src/main/java/io/airbyte/db/factory/DataSourceFactory.java @@ -267,6 +267,11 @@ public DataSource build() { * will preserve existing behavior that tests for the connection on first use, not on creation. */ config.setInitializationFailTimeout(Integer.MIN_VALUE); + /* + * The default connection timeout is 30 seconds, which is too short when working with cloud data warehouse clusters + * that can take 4-5 minutes to start up. Set it to 30 minutes to be safe. + */ + config.setConnectionTimeout(30 * 60 * 1000); connectionProperties.forEach(config::addDataSourceProperty); diff --git a/airbyte-integrations/bases/base-normalization/.dockerignore b/airbyte-integrations/bases/base-normalization/.dockerignore index 1af2d8606be8f..4a7771a96c7e4 100644 --- a/airbyte-integrations/bases/base-normalization/.dockerignore +++ b/airbyte-integrations/bases/base-normalization/.dockerignore @@ -10,4 +10,5 @@ !dbt-project-template-oracle !dbt-project-template-clickhouse !dbt-project-template-snowflake +!dbt-project-template-databricks !dbt-project-template-redshift diff --git a/airbyte-integrations/bases/base-normalization/build.gradle b/airbyte-integrations/bases/base-normalization/build.gradle index 65cbb49440fa6..b4718f6cb9e18 100644 --- a/airbyte-integrations/bases/base-normalization/build.gradle +++ b/airbyte-integrations/bases/base-normalization/build.gradle @@ -75,6 +75,10 @@ task airbyteDockerSnowflake(type: Exec, dependsOn: checkSshScriptCopy) { configure buildAirbyteDocker('snowflake') dependsOn assemble } +task airbyteDockerDatabricks(type: Exec, dependsOn: checkSshScriptCopy) { + configure buildAirbyteDocker('databricks') + dependsOn assemble +} task airbyteDockerRedshift(type: Exec, dependsOn: checkSshScriptCopy) { configure buildAirbyteDocker('redshift') dependsOn assemble @@ -85,6 +89,7 @@ airbyteDocker.dependsOn(airbyteDockerMySql) airbyteDocker.dependsOn(airbyteDockerOracle) airbyteDocker.dependsOn(airbyteDockerClickhouse) airbyteDocker.dependsOn(airbyteDockerSnowflake) +airbyteDocker.dependsOn(airbyteDockerDatabricks) airbyteDocker.dependsOn(airbyteDockerRedshift) task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs) { @@ -100,6 +105,7 @@ task("customIntegrationTestPython", type: PythonTask, dependsOn: installTestReqs dependsOn ':airbyte-integrations:connectors:destination-oracle:airbyteDocker' dependsOn ':airbyte-integrations:connectors:destination-mssql:airbyteDocker' dependsOn ':airbyte-integrations:connectors:destination-clickhouse:airbyteDocker' + dependsOn ':airbyte-integrations:connectors:destination-databricks:airbyteDocker' } // DATs have some additional tests that exercise normalization code paths, diff --git
a/airbyte-integrations/bases/base-normalization/databricks.Dockerfile b/airbyte-integrations/bases/base-normalization/databricks.Dockerfile new file mode 100644 index 0000000000000..8c85d56df6332 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/databricks.Dockerfile @@ -0,0 +1,34 @@ +FROM fishtownanalytics/dbt:1.0.0 +COPY --from=airbyte/base-airbyte-protocol-python:0.1.1 /airbyte /airbyte + +# Install SSH Tunneling dependencies +RUN apt-get update && apt-get install -y jq sshpass + +WORKDIR /airbyte +COPY entrypoint.sh . +COPY build/sshtunneling.sh . + +WORKDIR /airbyte/normalization_code +COPY normalization ./normalization +COPY setup.py . +COPY dbt-project-template/ ./dbt-template/ +COPY dbt-project-template-databricks/* ./dbt-template/ + +# Install python dependencies +WORKDIR /airbyte/base_python_structs +RUN pip install . + +WORKDIR /airbyte/normalization_code +RUN pip install . + +WORKDIR /airbyte/normalization_code/dbt-template/ +# Download external dbt dependencies +RUN pip install dbt-databricks==1.0.0 +RUN dbt deps + +WORKDIR /airbyte +ENV AIRBYTE_ENTRYPOINT "/airbyte/entrypoint.sh" +ENTRYPOINT ["/airbyte/entrypoint.sh"] + +LABEL io.airbyte.version=0.1.73 +LABEL io.airbyte.name=airbyte/normalization-databricks diff --git a/airbyte-integrations/bases/base-normalization/dbt-project-template-databricks/dbt_project.yml b/airbyte-integrations/bases/base-normalization/dbt-project-template-databricks/dbt_project.yml new file mode 100644 index 0000000000000..ffc34638ecdc3 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template-databricks/dbt_project.yml @@ -0,0 +1,72 @@ +# This file is necessary to install dbt-utils with dbt deps +# the content will be overwritten by the transform function + +# Name your package! Package names should contain only lowercase characters +# and underscores. A good package name should reflect your organization's +# name or the intended use of these models +name: "airbyte_utils" +version: "1.0" +config-version: 2 + +# This setting configures which "profile" dbt uses for this project. Profiles contain +# database connection information, and should be configured in the ~/.dbt/profiles.yml file +profile: "normalize" + +# These configurations specify where dbt should look for different types of files. +# The `model-paths` config, for example, states that source models can be found +# in the "models/" directory. You probably won't need to change these! +model-paths: ["models"] +docs-paths: ["docs"] +analysis-paths: ["analysis"] +test-paths: ["tests"] +seed-paths: ["data"] +macro-paths: ["macros"] + +target-path: "../build" # directory which will store compiled SQL files +log-path: "../logs" # directory which will store DBT logs +packages-install-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies + +clean-targets: # directories to be removed by `dbt clean` + - "build" + - "dbt_modules" + +quoting: + database: true + # Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785) + # all schemas should be unquoted + schema: false + identifier: false + +# You can define configurations for models in the `model-paths` directory here. +# Using these configurations, you can enable or disable models, change how they +# are materialized, and more! 
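+# Illustrative note (added for clarity, not produced by the transform function): the "normalize"
+# profile referenced above is generated at runtime from the destination config (see
+# transform_config/transform.py in this change). A minimal dbt-databricks 1.0 profile would look
+# roughly like the sketch below; every value is a placeholder, and the exact keys emitted by the
+# transform function may differ.
+#
+#   normalize:
+#     target: prod
+#     outputs:
+#       prod:
+#         type: databricks
+#         schema: <destination schema>
+#         host: <databricks server hostname>
+#         http_path: <cluster or SQL endpoint HTTP path>
+#         token: <personal access token>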
+models: + +transient: false + airbyte_utils: + +materialized: table + generated: + airbyte_ctes: + +tags: airbyte_internal_cte + +materialized: ephemeral + airbyte_incremental: + +tags: incremental_tables + +materialized: incremental + +incremental_strategy: merge + # the schema change test is supported automatically by the merge operation + # it needs to be run against a cluster with spark.databricks.delta.schema.autoMerge.enabled = True + # schema merging is handled at the final step; if the schema of one of the primary key columns changes + # in a way that coalesces differently to string, uniqueness will be broken + +on_schema_change: "ignore" + +file_format: delta + +pre-hook: 'SET spark.databricks.delta.schema.autoMerge.enabled = True' + airbyte_tables: + +tags: normalized_tables + +materialized: table + +file_format: delta + airbyte_views: + +tags: airbyte_internal_views + +materialized: view + +dispatch: + - macro_namespace: dbt_utils + search_order: ["airbyte_utils", "dbt_utils"] diff --git a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/array.sql b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/array.sql index 35df40780d876..03fd0f9035d6c 100644 --- a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/array.sql +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/array.sql @@ -6,6 +6,7 @@ - postgres: unnest() -> https://www.postgresqltutorial.com/postgresql-array/ - MSSQL: openjson() –> https://docs.microsoft.com/en-us/sql/relational-databases/json/validate-query-and-change-json-data-with-built-in-functions-sql-server?view=sql-server-ver15 - ClickHouse: ARRAY JOIN –> https://clickhouse.com/docs/zh/sql-reference/statements/select/array-join/ + - Databricks: LATERAL VIEW -> https://docs.databricks.com/spark/latest/spark-sql/language-manual/sql-ref-syntax-qry-select-lateral-view.html #} {# cross_join_unnest ------------------------------------------------- #} @@ -50,6 +51,10 @@ cross join table(flatten({{ array_col }})) as {{ array_col }} {%- endmacro %} +{% macro databricks__cross_join_unnest(stream_name, array_col) -%} + lateral view outer explode(from_json({{ array_col }}, 'array')) as _airbyte_nested_data +{%- endmacro %} + {% macro sqlserver__cross_join_unnest(stream_name, array_col) -%} {# https://docs.microsoft.com/en-us/sql/relational-databases/json/convert-json-data-to-rows-and-columns-with-openjson-sql-server?view=sql-server-ver15#option-1---openjson-with-the-default-output #} CROSS APPLY ( @@ -87,6 +92,10 @@ _airbyte_nested_data {%- endmacro %} +{% macro databricks__unnested_column_value(column_col) -%} + _airbyte_nested_data +{%- endmacro %} + {% macro oracle__unnested_column_value(column_col) -%} {{ column_col }} {%- endmacro %} diff --git a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/columns.sql b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/columns.sql index 0b695c1c576e7..453fde7e5a947 100644 --- a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/columns.sql +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/columns.sql @@ -14,3 +14,31 @@ {% endcall %} {% endmacro %} + +{# + This changes the behaviour of the default adapter macro, since DBT defaults to 256 when there are no explicit varchar limits + (cf : 
https://github.com/dbt-labs/dbt-core/blob/3996a69861d5ba9a460092c93b7e08d8e2a63f88/core/dbt/adapters/base/column.py#L91) + Since normalization code uses varchar for string type (and not text) on postgres, we need to set the max length possible when using unlimited varchars + (cf : https://dba.stackexchange.com/questions/189876/size-limit-of-character-varying-postgresql) +#} + +{% macro postgres__get_columns_in_relation(relation) -%} + {% call statement('get_columns_in_relation', fetch_result=True) %} + select + column_name, + data_type, + COALESCE(character_maximum_length, 10485760), + numeric_precision, + numeric_scale + + from {{ relation.information_schema('columns') }} + where table_name = '{{ relation.identifier }}' + {% if relation.schema %} + and table_schema = '{{ relation.schema }}' + {% endif %} + order by ordinal_position + + {% endcall %} + {% set table = load_result('get_columns_in_relation').table %} + {{ return(sql_convert_columns_in_relation(table)) }} +{% endmacro %} \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/current_timestamp.sql b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/current_timestamp.sql index a9df34c9e4979..945ea1c65909d 100644 --- a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/current_timestamp.sql +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/current_timestamp.sql @@ -5,3 +5,7 @@ {% macro oracle__current_timestamp() %} CURRENT_TIMESTAMP {% endmacro %} + +{% macro databricks__current_timestamp() %} + CURRENT_TIMESTAMP +{% endmacro %} \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/datatypes.sql b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/datatypes.sql index d03bf3613dc43..40576fd43e5fb 100644 --- a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/datatypes.sql +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/datatypes.sql @@ -8,6 +8,10 @@ string {% endmacro %} +{%- macro databricks__type_json() -%} + string +{%- endmacro -%} + {%- macro redshift__type_json() -%} {%- if redshift_super_type() -%} super @@ -91,6 +95,10 @@ INT {% endmacro %} +{% macro databricks__type_int() %} + INT +{% endmacro %} + {# bigint ------------------------------------------------- #} {% macro mysql__type_bigint() %} @@ -105,6 +113,10 @@ BIGINT {% endmacro %} +{% macro databricks__type_bigint() %} + BIGINT +{% endmacro %} + {# numeric ------------------------------------------------- --#} {% macro mysql__type_numeric() %} @@ -115,6 +127,10 @@ Float64 {% endmacro %} +{% macro databricks__type_numeric() %} + FLOAT +{% endmacro %} + {# timestamp ------------------------------------------------- --#} {% macro mysql__type_timestamp() %} @@ -146,6 +162,12 @@ timestamp {% endmacro %} +{#-- Spark timestamps are already 'point in time', even if converted / stored without the original tz info, relative to session tz --#} +{#-- cf: https://docs.databricks.com/spark/latest/dataframes-datasets/dates-timestamps.html --#} +{% macro databricks__type_timestamp_with_timezone() %} + timestamp +{% endmacro %} + {#-- MySQL doesnt allow cast operation to work with TIMESTAMP so we have to use char --#} {%- macro mysql__type_timestamp_with_timezone() -%} char diff --git 
a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/json_operations.sql b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/json_operations.sql index f6bfa26d22852..71c3fb0387f7e 100644 --- a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/json_operations.sql +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/cross_db_utils/json_operations.sql @@ -6,6 +6,7 @@ - Postgres: json_extract_path_text(, 'path' [, 'path' [, ...}}) -> https://www.postgresql.org/docs/12/functions-json.html - MySQL: JSON_EXTRACT(json_doc, 'path' [, 'path'] ...) -> https://dev.mysql.com/doc/refman/8.0/en/json-search-functions.html - ClickHouse: JSONExtractString(json_doc, 'path' [, 'path'] ...) -> https://clickhouse.com/docs/en/sql-reference/functions/json-functions/ + - Databricks: get_json_object(json_txt, 'path') -> https://spark.apache.org/docs/latest/api/sql/#get_json_object #} {# format_json_path -------------------------------------------------- #} @@ -42,6 +43,15 @@ {{ "'$.\"" ~ json_path_list|join(".") ~ "\"'" }} {%- endmacro %} +{% macro databricks__format_json_path(json_path_list) -%} + {# -- '$.x.y.z' #} + {%- set str_list = [] -%} + {%- for json_path in json_path_list -%} + {%- if str_list.append(json_path.replace("'", "\\'")) -%} {%- endif -%} + {%- endfor -%} + {{ "'$." ~ str_list|join(".") ~ "'" }} +{%- endmacro %} + {% macro redshift__format_json_path(json_path_list) -%} {%- set quote = '"' if redshift_super_type() else "'" -%} {%- set str_list = [] -%} @@ -86,6 +96,14 @@ json_extract({{ from_table}}.{{ json_column }}, {{ format_json_path(json_path_list) }}) {%- endmacro %} +{% macro databricks__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%} + {%- if from_table|string() == '' %} + get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }}) + {% else %} + get_json_object({{ from_table }}.{{ json_column }}, {{ format_json_path(json_path_list) }}) + {% endif -%} +{%- endmacro %} + {% macro oracle__json_extract(from_table, json_column, json_path_list, normalized_json_path) -%} json_value({{ json_column }}, {{ format_json_path(normalized_json_path) }}) {%- endmacro %} @@ -191,6 +209,10 @@ JSONExtractRaw(assumeNotNull({{ json_column }}), {{ format_json_path(json_path_list) }}) {%- endmacro %} +{% macro databricks__json_extract_scalar(json_column, json_path_list, normalized_json_path) -%} + get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }}) +{%- endmacro %} + {# json_extract_array ------------------------------------------------- #} {% macro json_extract_array(json_column, json_path_list, normalized_json_path) -%} @@ -237,6 +259,10 @@ JSONExtractArrayRaw(assumeNotNull({{ json_column }}), {{ format_json_path(json_path_list) }}) {%- endmacro %} +{% macro databricks__json_extract_array(json_column, json_path_list, normalized_json_path) -%} + get_json_object({{ json_column }}, {{ format_json_path(json_path_list) }}) +{%- endmacro %} + {# json_extract_string_array ------------------------------------------------- #} {% macro json_extract_string_array(json_column, json_path_list, normalized_json_path) -%} diff --git a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/should_full_refresh.sql b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/should_full_refresh.sql index ff2c6d54ecce3..b84cfb76d01e8 100644 --- 
a/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/should_full_refresh.sql +++ b/airbyte-integrations/bases/base-normalization/dbt-project-template/macros/should_full_refresh.sql @@ -4,12 +4,43 @@ - the column _airbyte_ab_id does not exists in the normalized tables and make sure it is well populated. #} +{%- macro get_columns_in_relation_if_exist(target_table) -%} + {{ return(adapter.dispatch('get_columns_in_relation_if_exist')(target_table)) }} +{%- endmacro -%} + +{%- macro default__get_columns_in_relation_if_exist(target_table) -%} + {{ return(adapter.get_columns_in_relation(target_table)) }} +{%- endmacro -%} + +{%- macro databricks__get_columns_in_relation_if_exist(target_table) -%} + {%- if target_table.schema is none -%} + {%- set found_table = True %} + {%- else -%} + {% call statement('list_table_infos', fetch_result=True) -%} + show tables in {{ target_table.schema }} like '*' + {% endcall %} + {%- set existing_tables = load_result('list_table_infos').table -%} + {%- set found_table = [] %} + {%- for table in existing_tables -%} + {%- if table.tableName == target_table.identifier -%} + {% do found_table.append(table.tableName) %} + {%- endif -%} + {%- endfor -%} + {%- endif -%} + {%- if found_table -%} + {%- set cols = adapter.get_columns_in_relation(target_table) -%} + {{ return(cols) }} + {%- else -%} + {{ return ([]) }} + {%- endif -%} +{%- endmacro -%} + {%- macro need_full_refresh(col_ab_id, target_table=this) -%} {%- if not execute -%} {{ return(false) }} {%- endif -%} {%- set found_column = [] %} - {%- set cols = adapter.get_columns_in_relation(target_table) -%} + {%- set cols = get_columns_in_relation_if_exist(target_table) -%} {%- for col in cols -%} {%- if col.column == col_ab_id -%} {% do found_column.append(col.column) %} @@ -18,7 +49,7 @@ {%- if found_column -%} {{ return(false) }} {%- else -%} - {{ dbt_utils.log_info(target_table ~ "." ~ col_ab_id ~ " does not exist yet. The table will be created or rebuilt with dbt.full_refresh") }} + {{ dbt_utils.log_info(target_table ~ "." ~ col_ab_id ~ " does not exist. 
The table needs to be rebuilt in full_refresh") }} {{ return(true) }} {%- endif -%} {%- endmacro -%} diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py b/airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py index ead7e2ad0d0db..7ca915f51aa13 100644 --- a/airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py +++ b/airbyte-integrations/bases/base-normalization/integration_tests/dbt_integration_test.py @@ -342,6 +342,8 @@ def generate_profile_yaml_file( } elif destination_type.value == DestinationType.MYSQL.value: profiles_config["database"] = self.target_schema + elif destination_type.value == DestinationType.DATABRICKS.value: + profiles_config["database_schema"] = self.target_schema elif destination_type.value == DestinationType.REDSHIFT.value: profiles_config["schema"] = self.target_schema if random_schema: @@ -394,6 +396,8 @@ def get_normalization_image(destination_type: DestinationType) -> str: return "airbyte/normalization-clickhouse:dev" elif DestinationType.SNOWFLAKE.value == destination_type.value: return "airbyte/normalization-snowflake:dev" + elif DestinationType.DATABRICKS.value == destination_type.value: + return "airbyte/normalization-databricks:dev" elif DestinationType.REDSHIFT.value == destination_type.value: return "airbyte/normalization-redshift:dev" else: diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/dbt_project.yml b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/dbt_project.yml new file mode 100644 index 0000000000000..cf22d38101f7e --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/dbt_project.yml @@ -0,0 +1,67 @@ +# This file is necessary to install dbt-utils with dbt deps +# the content will be overwritten by the transform function + +# Name your package! Package names should contain only lowercase characters +# and underscores. A good package name should reflect your organization's +# name or the intended use of these models +name: "airbyte_utils" +version: "1.0" +config-version: 2 + +# This setting configures which "profile" dbt uses for this project. Profiles contain +# database connection information, and should be configured in the ~/.dbt/profiles.yml file +profile: "normalize" + +# These configurations specify where dbt should look for different types of files. +# The `model-paths` config, for example, states that source models can be found +# in the "models/" directory. You probably won't need to change these! +model-paths: ["models"] +docs-paths: ["docs"] +analysis-paths: ["analysis"] +test-paths: ["tests"] +seed-paths: ["data"] +macro-paths: ["macros"] + +target-path: "../build" # directory which will store compiled SQL files +log-path: "../logs" # directory which will store DBT logs +packages-install-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies + +clean-targets: # directories to be removed by `dbt clean` + - "build" + - "dbt_modules" + +quoting: + database: true + # Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785) + # all schemas should be unquoted + schema: false + identifier: false + +# You can define configurations for models in the `model-paths` directory here. 
+# Using these configurations, you can enable or disable models, change how they +# are materialized, and more! +models: + +transient: false + airbyte_utils: + +materialized: table + generated: + airbyte_ctes: + +tags: airbyte_internal_cte + +materialized: ephemeral + airbyte_incremental: + +tags: incremental_tables + +materialized: incremental + +on_schema_change: sync_all_columns + +incremental_strategy: merge + +file_format: delta + airbyte_tables: + +tags: normalized_tables + +materialized: table + +file_format: delta + airbyte_views: + +tags: airbyte_internal_views + +materialized: view + +dispatch: + - macro_namespace: dbt_utils + search_order: ["airbyte_utils", "dbt_utils"] diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql new file mode 100644 index 0000000000000..9083d5e11f633 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql @@ -0,0 +1,91 @@ + + create or replace table test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd + + + using delta + + + + + + as + +-- depends_on: ref('nested_stream_with_complex_columns_resulting_into_long_names_stg') +with + +input_data as ( + select * + from _airbyte_test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_stg + -- nested_stream_with_complex_columns_resulting_into_long_names from test_normalization._airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names +), + +scd_data as ( + -- SQL model to build a Type 2 Slowly Changing Dimension (SCD) table for each record identified by their primary key + select + md5(cast(coalesce(cast(id as + string +), '') as + string +)) as _airbyte_unique_key, + id, + date, + `partition`, + date as _airbyte_start_at, + lag(date) over ( + partition by id + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) as _airbyte_end_at, + case when row_number() over ( + partition by id + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) = 1 then 1 else 0 end as _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid + from input_data +), +dedup_data as ( + select + -- we need to ensure de-duplicated rows for merge/update queries + -- additionally, we generate a unique key for the scd table + row_number() over ( + partition by + _airbyte_unique_key, + _airbyte_start_at, + _airbyte_emitted_at + order by _airbyte_active_row desc, _airbyte_ab_id + ) as _airbyte_row_num, + md5(cast(coalesce(cast(_airbyte_unique_key as + string +), '') || '-' || coalesce(cast(_airbyte_start_at as + string +), '') || '-' || coalesce(cast(_airbyte_emitted_at as + string +), '') as + string +)) as _airbyte_unique_key_scd, + scd_data.* + from scd_data +) +select + _airbyte_unique_key, + _airbyte_unique_key_scd, + id, + date, + `partition`, + _airbyte_start_at, + _airbyte_end_at, + 
_airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid +from dedup_data where _airbyte_row_num = 1 \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql new file mode 100644 index 0000000000000..9f218983cfdc1 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql @@ -0,0 +1,29 @@ + + create or replace table test_normalization.nested_stream_with_complex_columns_resulting_into_long_names + + + using delta + + + + + + as + +-- Final base SQL model +-- depends_on: test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd +select + _airbyte_unique_key, + id, + date, + `partition`, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid +from test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd +-- nested_stream_with_complex_columns_resulting_into_long_names from test_normalization._airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names +where 1 = 1 +and _airbyte_active_row = 1 diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql new file mode 100644 index 0000000000000..2d29fca350add --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql @@ -0,0 +1,81 @@ + + create or replace table test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition + + + using delta + + + + + + as + +with __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1 as ( + +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd +select + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid, + get_json_object(`partition`, '$.double_array_data') as double_array_data, + get_json_object(`partition`, '$.DATA') as `DATA`, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from 
test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd as table_alias +-- partition at nested_stream_with_complex_columns_resulting_into_long_names/partition +where 1 = 1 +and `partition` is not null + +), __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1 +select + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid, + double_array_data, + `DATA`, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1 +-- partition at nested_stream_with_complex_columns_resulting_into_long_names/partition +where 1 = 1 + +), __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab3 as ( + +-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab2 +select + md5(cast(coalesce(cast(_airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid as + string +), '') || '-' || coalesce(cast(double_array_data as + string +), '') || '-' || coalesce(cast(`DATA` as + string +), '') as + string +)) as _airbyte_partition_hashid, + tmp.* +from __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab2 tmp +-- partition at nested_stream_with_complex_columns_resulting_into_long_names/partition +where 1 = 1 + +)-- Final base SQL model +-- depends_on: __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab3 +select + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid, + double_array_data, + `DATA`, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_partition_hashid +from __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_ab3 +-- partition at nested_stream_with_complex_columns_resulting_into_long_names/partition from test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd +where 1 = 1 diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql new file mode 100644 index 0000000000000..deed6ea87cef5 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/first_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql @@ -0,0 +1,80 @@ + + create or replace table test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data + + + using delta + + + + + + as + +with __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1 as ( + +-- 
SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition + +select + _airbyte_partition_hashid, + get_json_object(_airbyte_nested_data, '$.id') as id, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition as table_alias +-- double_array_data at nested_stream_with_complex_columns_resulting_into_long_names/partition/double_array_data +lateral view outer explode(from_json(double_array_data, 'array')) as _airbyte_nested_data +where 1 = 1 +and double_array_data is not null + +), __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1 +select + _airbyte_partition_hashid, + cast(id as + string +) as id, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1 +-- double_array_data at nested_stream_with_complex_columns_resulting_into_long_names/partition/double_array_data +where 1 = 1 + +), __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab3 as ( + +-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab2 +select + md5(cast(coalesce(cast(_airbyte_partition_hashid as + string +), '') || '-' || coalesce(cast(id as + string +), '') as + string +)) as _airbyte_double_array_data_hashid, + tmp.* +from __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab2 tmp +-- double_array_data at nested_stream_with_complex_columns_resulting_into_long_names/partition/double_array_data +where 1 = 1 + +)-- Final base SQL model +-- depends_on: __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab3 +select + _airbyte_partition_hashid, + id, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_double_array_data_hashid +from __dbt__cte__nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab3 +-- double_array_data at nested_stream_with_complex_columns_resulting_into_long_names/partition/double_array_data from test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition +where 1 = 1 diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab1.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab1.sql new file mode 100644 index 0000000000000..5b485431be318 --- /dev/null +++ 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab1.sql @@ -0,0 +1,19 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: {{ source('test_normalization', '_airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names') }} +select + {{ json_extract_scalar('_airbyte_data', ['id'], ['id']) }} as id, + {{ json_extract_scalar('_airbyte_data', ['date'], ['date']) }} as date, + {{ json_extract('table_alias', '_airbyte_data', ['partition'], ['partition']) }} as {{ adapter.quote('partition') }}, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ source('test_normalization', '_airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names') }} as table_alias +-- nested_stream_with_complex_columns_resulting_into_long_names +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab2.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab2.sql new file mode 100644 index 0000000000000..41ce342165382 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_ab2.sql @@ -0,0 +1,19 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_ab1') }} +select + cast(id as {{ dbt_utils.type_string() }}) as id, + cast(date as {{ dbt_utils.type_string() }}) as date, + cast({{ adapter.quote('partition') }} as {{ type_json() }}) as {{ adapter.quote('partition') }}, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_ab1') }} +-- nested_stream_with_complex_columns_resulting_into_long_names +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1.sql new file mode 100644 index 0000000000000..7f038d1b7f84e --- /dev/null +++ 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_ab1.sql @@ -0,0 +1,19 @@ +{{ config( + schema = "_airbyte_test_normalization", + tags = [ "nested-intermediate" ] +) }} +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_scd') }} +select + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid, + {{ json_extract_array(adapter.quote('partition'), ['double_array_data'], ['double_array_data']) }} as double_array_data, + {{ json_extract_array(adapter.quote('partition'), ['DATA'], ['DATA']) }} as {{ adapter.quote('DATA') }}, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_scd') }} as table_alias +-- partition at nested_stream_with_complex_columns_resulting_into_long_names/partition +where 1 = 1 +and {{ adapter.quote('partition') }} is not null +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1.sql new file mode 100644 index 0000000000000..261d3facb3c08 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_ctes/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab1.sql @@ -0,0 +1,20 @@ +{{ config( + schema = "_airbyte_test_normalization", + tags = [ "nested-intermediate" ] +) }} +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition') }} +{{ unnest_cte(ref('nested_stream_with_complex_columns_resulting_into_long_names_partition'), 'partition', 'double_array_data') }} +select + _airbyte_partition_hashid, + {{ json_extract_scalar(unnested_column_value('double_array_data'), ['id'], ['id']) }} as id, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition') }} as table_alias +-- double_array_data at nested_stream_with_complex_columns_resulting_into_long_names/partition/double_array_data +{{ cross_join_unnest('partition', 'double_array_data') }} +where 1 = 1 +and double_array_data is not null +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql new file mode 100644 index 0000000000000..e2f937dbd4064 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql @@ -0,0 +1,116 @@ +{{ config( + unique_key = "_airbyte_unique_key_scd", + schema = "test_normalization", + post_hook = ["drop view _airbyte_test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_stg"], + tags = [ "top-level" ] +) }} +-- depends_on: ref('nested_stream_with_complex_columns_resulting_into_long_names_stg') +with +{% if is_incremental() %} +new_data as ( + -- retrieve incremental "new" data + select + * + from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_stg') }} + -- nested_stream_with_complex_columns_resulting_into_long_names from {{ source('test_normalization', '_airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names') }} + where 1 = 1 + {{ incremental_clause('_airbyte_emitted_at') }} +), +new_data_ids as ( + -- build a subset of _airbyte_unique_key from rows that are new + select distinct + {{ dbt_utils.surrogate_key([ + 'id', + ]) }} as _airbyte_unique_key + from new_data +), +empty_new_data as ( + -- build an empty table to only keep the table's column types + select * from new_data where 1 = 0 +), +previous_active_scd_data as ( + -- retrieve "incomplete old" data that needs to be updated with an end date because of new changes + select + {{ star_intersect(ref('nested_stream_with_complex_columns_resulting_into_long_names_stg'), this, from_alias='inc_data', intersect_alias='this_data') }} + from {{ this }} as this_data + -- make a join with new_data using primary key to filter active data that need to be updated only + join new_data_ids on this_data._airbyte_unique_key = new_data_ids._airbyte_unique_key + -- force left join to NULL values (we just need to transfer column types only for the star_intersect macro on schema changes) + left join empty_new_data as inc_data on this_data._airbyte_ab_id = inc_data._airbyte_ab_id + where _airbyte_active_row = 1 +), +input_data as ( + select {{ dbt_utils.star(ref('nested_stream_with_complex_columns_resulting_into_long_names_stg')) }} from new_data + union all + select {{ dbt_utils.star(ref('nested_stream_with_complex_columns_resulting_into_long_names_stg')) }} from previous_active_scd_data +), +{% else %} +input_data as ( + select * + from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_stg') }} + -- nested_stream_with_complex_columns_resulting_into_long_names from {{ source('test_normalization', '_airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names') }} +), +{% endif %} +scd_data as ( + -- SQL model to build a Type 2 Slowly Changing Dimension (SCD) table for each record identified by their primary key + select + {{ dbt_utils.surrogate_key([ + 'id', + ]) }} as _airbyte_unique_key, + id, + date, + {{ adapter.quote('partition') }}, + date as _airbyte_start_at, + lag(date) over ( + partition by id + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) as _airbyte_end_at, + case when row_number() over ( + partition by id + order by + date is 
null asc, + date desc, + _airbyte_emitted_at desc + ) = 1 then 1 else 0 end as _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid + from input_data +), +dedup_data as ( + select + -- we need to ensure de-duplicated rows for merge/update queries + -- additionally, we generate a unique key for the scd table + row_number() over ( + partition by + _airbyte_unique_key, + _airbyte_start_at, + _airbyte_emitted_at + order by _airbyte_active_row desc, _airbyte_ab_id + ) as _airbyte_row_num, + {{ dbt_utils.surrogate_key([ + '_airbyte_unique_key', + '_airbyte_start_at', + '_airbyte_emitted_at' + ]) }} as _airbyte_unique_key_scd, + scd_data.* + from scd_data +) +select + _airbyte_unique_key, + _airbyte_unique_key_scd, + id, + date, + {{ adapter.quote('partition') }}, + _airbyte_start_at, + _airbyte_end_at, + _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid +from dedup_data where _airbyte_row_num = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql new file mode 100644 index 0000000000000..110c82cffa963 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql @@ -0,0 +1,22 @@ +{{ config( + unique_key = "_airbyte_unique_key", + schema = "test_normalization", + tags = [ "top-level" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_scd') }} +select + _airbyte_unique_key, + id, + date, + {{ adapter.quote('partition') }}, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid +from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_scd') }} +-- nested_stream_with_complex_columns_resulting_into_long_names from {{ source('test_normalization', '_airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names') }} +where 1 = 1 +and _airbyte_active_row = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql new file mode 100644 index 0000000000000..01ff1b155f297 --- /dev/null +++ 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql @@ -0,0 +1,19 @@ +{{ config( + schema = "test_normalization", + tags = [ "nested" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition_ab3') }} +select + _airbyte_nested_stream_with_complex_columns_resulting_into_long_names_hashid, + double_array_data, + {{ adapter.quote('DATA') }}, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_partition_hashid +from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition_ab3') }} +-- partition at nested_stream_with_complex_columns_resulting_into_long_names/partition from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_scd') }} +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql new file mode 100644 index 0000000000000..6d3622700fd72 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql @@ -0,0 +1,18 @@ +{{ config( + schema = "test_normalization", + tags = [ "nested" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab3') }} +select + _airbyte_partition_hashid, + id, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_double_array_data_hashid +from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data_ab3') }} +-- double_array_data at nested_stream_with_complex_columns_resulting_into_long_names/partition/double_array_data from {{ ref('nested_stream_with_complex_columns_resulting_into_long_names_partition') }} +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/sources.yml b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/sources.yml new file mode 100644 index 0000000000000..92fa4c9a2580e --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/models/generated/sources.yml @@ -0,0 +1,22 @@ +version: 2 +sources: +- name: test_normalization + quoting: + database: true + schema: false + identifier: false + tables: + - name: _airbyte_raw_conflict_stream_array + - name: _airbyte_raw_conflict_stream_name + - name: 
_airbyte_raw_conflict_stream_scalar + - name: _airbyte_raw_nested_stream_with_complex_columns_resulting_into_long_names + - name: _airbyte_raw_non_nested_stream_without_namespace_resulting_into_long_names + - name: _airbyte_raw_some_stream_that_was_empty + - name: _airbyte_raw_unnest_alias +- name: test_normalization_namespace + quoting: + database: true + schema: false + identifier: false + tables: + - name: _airbyte_raw_simple_stream_with_namespace_resulting_into_long_names diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql new file mode 100644 index 0000000000000..30ea23a378124 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/scd/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_scd.sql @@ -0,0 +1,17 @@ + + + + + merge into test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_scd as DBT_INTERNAL_DEST + using nested_stream_with_complex_columns_resulting_into_long_names_scd__dbt_tmp as DBT_INTERNAL_SOURCE + + + + on DBT_INTERNAL_SOURCE._airbyte_unique_key_scd = DBT_INTERNAL_DEST._airbyte_unique_key_scd + + + + when matched then update set + * + + when not matched then insert * diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql new file mode 100644 index 0000000000000..11dc986b5efa1 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names.sql @@ -0,0 +1,17 @@ + + + + + merge into test_normalization.nested_stream_with_complex_columns_resulting_into_long_names as DBT_INTERNAL_DEST + using nested_stream_with_complex_columns_resulting_into_long_names__dbt_tmp as DBT_INTERNAL_SOURCE + + + + on DBT_INTERNAL_SOURCE._airbyte_unique_key = DBT_INTERNAL_DEST._airbyte_unique_key + + + + when matched then update set + * + + when not matched then insert * diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql new file mode 100644 index 0000000000000..63390765ebc19 --- /dev/null +++ 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition.sql @@ -0,0 +1,17 @@ + + + + + merge into test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition as DBT_INTERNAL_DEST + using nested_stream_with_complex_columns_resulting_into_long_names_partition__dbt_tmp as DBT_INTERNAL_SOURCE + + + + on false + + + + when matched then update set + * + + when not matched then insert * diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql new file mode 100644 index 0000000000000..86f02366efef1 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_nested_streams/second_output/airbyte_incremental/test_normalization/nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data.sql @@ -0,0 +1,17 @@ + + + + + merge into test_normalization.nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data as DBT_INTERNAL_DEST + using nested_stream_with_complex_columns_resulting_into_long_names_partition_double_array_data__dbt_tmp as DBT_INTERNAL_SOURCE + + + + on false + + + + when matched then update set + * + + when not matched then insert * diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/dbt_project.yml b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/dbt_project.yml new file mode 100644 index 0000000000000..45d191f6c5256 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/dbt_project.yml @@ -0,0 +1,67 @@ +# This file is necessary to install dbt-utils with dbt deps +# the content will be overwritten by the transform function + +# Name your package! Package names should contain only lowercase characters +# and underscores. A good package name should reflect your organization's +# name or the intended use of these models +name: "airbyte_utils" +version: "1.0" +config-version: 2 + +# This setting configures which "profile" dbt uses for this project. Profiles contain +# database connection information, and should be configured in the ~/.dbt/profiles.yml file +profile: "normalize" + +# These configurations specify where dbt should look for different types of files. +# The `model-paths` config, for example, states that source models can be found +# in the "models/" directory. You probably won't need to change these! 
+model-paths: ["modified_models"] +docs-paths: ["docs"] +analysis-paths: ["analysis"] +test-paths: ["tests"] +seed-paths: ["data"] +macro-paths: ["macros"] + +target-path: "../build" # directory which will store compiled SQL files +log-path: "../logs" # directory which will store DBT logs +packages-install-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies + +clean-targets: # directories to be removed by `dbt clean` + - "build" + - "dbt_modules" + +quoting: + database: true + # Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785) + # all schemas should be unquoted + schema: false + identifier: true + +# You can define configurations for models in the `model-paths` directory here. +# Using these configurations, you can enable or disable models, change how they +# are materialized, and more! +models: + +transient: false + airbyte_utils: + +materialized: table + generated: + airbyte_ctes: + +tags: airbyte_internal_cte + +materialized: ephemeral + airbyte_incremental: + +tags: incremental_tables + +materialized: incremental + +on_schema_change: sync_all_columns + +incremental_strategy: merge + +file_format: delta + airbyte_tables: + +tags: normalized_tables + +materialized: table + +file_format: delta + airbyte_views: + +tags: airbyte_internal_views + +materialized: view + +dispatch: + - macro_namespace: dbt_utils + search_order: ["airbyte_utils", "dbt_utils"] diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_dbt_project.yml b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_dbt_project.yml new file mode 100644 index 0000000000000..4e4f23938ff7a --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_dbt_project.yml @@ -0,0 +1,67 @@ +# This file is necessary to install dbt-utils with dbt deps +# the content will be overwritten by the transform function + +# Name your package! Package names should contain only lowercase characters +# and underscores. A good package name should reflect your organization's +# name or the intended use of these models +name: "airbyte_utils" +version: "1.0" +config-version: 2 + +# This setting configures which "profile" dbt uses for this project. Profiles contain +# database connection information, and should be configured in the ~/.dbt/profiles.yml file +profile: "normalize" + +# These configurations specify where dbt should look for different types of files. +# The `model-paths` config, for example, states that source models can be found +# in the "models/" directory. You probably won't need to change these! 
+model-paths: ["models"] +docs-paths: ["docs"] +analysis-paths: ["analysis"] +test-paths: ["tests"] +seed-paths: ["data"] +macro-paths: ["macros"] + +target-path: "../build" # directory which will store compiled SQL files +log-path: "../logs" # directory which will store DBT logs +packages-install-path: "/tmp/dbt_modules" # directory which will store external DBT dependencies + +clean-targets: # directories to be removed by `dbt clean` + - "build" + - "dbt_modules" + +quoting: + database: true + # Temporarily disabling the behavior of the ExtendedNameTransformer on table/schema names, see (issue #1785) + # all schemas should be unquoted + schema: false + identifier: true + +# You can define configurations for models in the `model-paths` directory here. +# Using these configurations, you can enable or disable models, change how they +# are materialized, and more! +models: + +transient: false + airbyte_utils: + +materialized: table + generated: + airbyte_ctes: + +tags: airbyte_internal_cte + +materialized: ephemeral + airbyte_incremental: + +tags: incremental_tables + +materialized: incremental + +on_schema_change: sync_all_columns + +incremental_strategy: merge + +file_format: delta + airbyte_tables: + +tags: normalized_tables + +materialized: table + +file_format: delta + airbyte_views: + +tags: airbyte_internal_views + +materialized: view + +dispatch: + - macro_namespace: dbt_utils + search_order: ["airbyte_utils", "dbt_utils"] diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql new file mode 100644 index 0000000000000..7d18cde26c798 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql @@ -0,0 +1,109 @@ + + create or replace table test_normalization.`dedup_exchange_rate_scd` + + + using delta + + + + + + as + +-- depends_on: ref('dedup_exchange_rate_stg') +with + +input_data as ( + select * + from _airbyte_test_normalization.`dedup_exchange_rate_stg` + -- dedup_exchange_rate from test_normalization._airbyte_raw_dedup_exchange_rate +), + +scd_data as ( + -- SQL model to build a Type 2 Slowly Changing Dimension (SCD) table for each record identified by their primary key + select + md5(cast(coalesce(cast(id as + string +), '') || '-' || coalesce(cast(currency as + string +), '') || '-' || coalesce(cast(NZD as + string +), '') as + string +)) as _airbyte_unique_key, + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + date as _airbyte_start_at, + lag(date) over ( + partition by id, currency, cast(NZD as + string +) + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) as _airbyte_end_at, + case when row_number() over ( + partition by id, currency, cast(NZD as + string +) + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) = 1 then 1 else 0 end as _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + _airbyte_dedup_exchange_rate_hashid + from input_data +), +dedup_data as ( + select + -- we need to ensure de-duplicated rows for merge/update 
queries + -- additionally, we generate a unique key for the scd table + row_number() over ( + partition by + _airbyte_unique_key, + _airbyte_start_at, + _airbyte_emitted_at + order by _airbyte_active_row desc, _airbyte_ab_id + ) as _airbyte_row_num, + md5(cast(coalesce(cast(_airbyte_unique_key as + string +), '') || '-' || coalesce(cast(_airbyte_start_at as + string +), '') || '-' || coalesce(cast(_airbyte_emitted_at as + string +), '') as + string +)) as _airbyte_unique_key_scd, + scd_data.* + from scd_data +) +select + _airbyte_unique_key, + _airbyte_unique_key_scd, + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + _airbyte_start_at, + _airbyte_end_at, + _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_dedup_exchange_rate_hashid +from dedup_data where _airbyte_row_num = 1 \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql new file mode 100644 index 0000000000000..f372290d40bc3 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql @@ -0,0 +1,34 @@ + + create or replace table test_normalization.`dedup_exchange_rate` + + + using delta + + + + + + as + +-- Final base SQL model +-- depends_on: test_normalization.`dedup_exchange_rate_scd` +select + _airbyte_unique_key, + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_dedup_exchange_rate_hashid +from test_normalization.`dedup_exchange_rate_scd` +-- dedup_exchange_rate from test_normalization._airbyte_raw_dedup_exchange_rate +where 1 = 1 +and _airbyte_active_row = 1 diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_tables/test_normalization/exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_tables/test_normalization/exchange_rate.sql new file mode 100644 index 0000000000000..dde9e833b1067 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_tables/test_normalization/exchange_rate.sql @@ -0,0 +1,125 @@ + + create or replace table test_normalization.`exchange_rate` + + + using delta + + + + + + as + +with __dbt__cte__exchange_rate_ab1 as ( + +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization._airbyte_raw_exchange_rate +select + get_json_object(_airbyte_data, '$.id') as id, + get_json_object(_airbyte_data, '$.currency') as currency, + get_json_object(_airbyte_data, '$.date') as date, + get_json_object(_airbyte_data, '$.timestamp_col') as timestamp_col, + 
get_json_object(_airbyte_data, '$.HKD@spéçiàl & characters') as HKD_special___characters, + get_json_object(_airbyte_data, '$.HKD_special___characters') as HKD_special___characters_1, + get_json_object(_airbyte_data, '$.NZD') as NZD, + get_json_object(_airbyte_data, '$.USD') as USD, + get_json_object(_airbyte_data, '$.column`_\'with"_quotes') as column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from test_normalization._airbyte_raw_exchange_rate as table_alias +-- exchange_rate +where 1 = 1 +), __dbt__cte__exchange_rate_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__exchange_rate_ab1 +select + cast(id as + BIGINT +) as id, + cast(currency as + string +) as currency, + cast(nullif(date, '') as + date +) as date, + cast(nullif(timestamp_col, '') as + timestamp +) as timestamp_col, + cast(HKD_special___characters as + float +) as HKD_special___characters, + cast(HKD_special___characters_1 as + string +) as HKD_special___characters_1, + cast(NZD as + float +) as NZD, + cast(USD as + float +) as USD, + cast(column___with__quotes as + string +) as column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__exchange_rate_ab1 +-- exchange_rate +where 1 = 1 +), __dbt__cte__exchange_rate_ab3 as ( + +-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__exchange_rate_ab2 +select + md5(cast(coalesce(cast(id as + string +), '') || '-' || coalesce(cast(currency as + string +), '') || '-' || coalesce(cast(date as + string +), '') || '-' || coalesce(cast(timestamp_col as + string +), '') || '-' || coalesce(cast(HKD_special___characters as + string +), '') || '-' || coalesce(cast(HKD_special___characters_1 as + string +), '') || '-' || coalesce(cast(NZD as + string +), '') || '-' || coalesce(cast(USD as + string +), '') || '-' || coalesce(cast(column___with__quotes as + string +), '') as + string +)) as _airbyte_exchange_rate_hashid, + tmp.* +from __dbt__cte__exchange_rate_ab2 tmp +-- exchange_rate +where 1 = 1 +)-- Final base SQL model +-- depends_on: __dbt__cte__exchange_rate_ab3 +select + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_exchange_rate_hashid +from __dbt__cte__exchange_rate_ab3 +-- exchange_rate from test_normalization._airbyte_raw_exchange_rate +where 1 = 1 \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql new file mode 100644 index 0000000000000..1892c3652bd0e --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql @@ -0,0 +1,91 @@ +create or replace view _airbyte_test_normalization.`dedup_exchange_rate_stg` + + as + +with __dbt__cte__dedup_exchange_rate_ab1 as ( + +-- SQL model to parse JSON blob
stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization._airbyte_raw_dedup_exchange_rate +select + get_json_object(_airbyte_data, '$.id') as id, + get_json_object(_airbyte_data, '$.currency') as currency, + get_json_object(_airbyte_data, '$.date') as date, + get_json_object(_airbyte_data, '$.timestamp_col') as timestamp_col, + get_json_object(_airbyte_data, '$.HKD@spéçiàl & characters') as HKD_special___characters, + get_json_object(_airbyte_data, '$.HKD_special___characters') as HKD_special___characters_1, + get_json_object(_airbyte_data, '$.NZD') as NZD, + get_json_object(_airbyte_data, '$.USD') as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from test_normalization._airbyte_raw_dedup_exchange_rate as table_alias +-- dedup_exchange_rate +where 1 = 1 + +), __dbt__cte__dedup_exchange_rate_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__dedup_exchange_rate_ab1 +select + cast(id as + BIGINT +) as id, + cast(currency as + string +) as currency, + cast(nullif(date, '') as + date +) as date, + cast(nullif(timestamp_col, '') as + timestamp +) as timestamp_col, + cast(HKD_special___characters as + float +) as HKD_special___characters, + cast(HKD_special___characters_1 as + string +) as HKD_special___characters_1, + cast(NZD as + float +) as NZD, + cast(USD as + float +) as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__dedup_exchange_rate_ab1 +-- dedup_exchange_rate +where 1 = 1 + +)-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__dedup_exchange_rate_ab2 +select + md5(cast(coalesce(cast(id as + string +), '') || '-' || coalesce(cast(currency as + string +), '') || '-' || coalesce(cast(date as + string +), '') || '-' || coalesce(cast(timestamp_col as + string +), '') || '-' || coalesce(cast(HKD_special___characters as + string +), '') || '-' || coalesce(cast(HKD_special___characters_1 as + string +), '') || '-' || coalesce(cast(NZD as + string +), '') || '-' || coalesce(cast(USD as + string +), '') as + string +)) as _airbyte_dedup_exchange_rate_hashid, + tmp.* +from __dbt__cte__dedup_exchange_rate_ab2 tmp +-- dedup_exchange_rate +where 1 = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/multiple_column_names_conflicts_stg.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/multiple_column_names_conflicts_stg.sql new file mode 100644 index 0000000000000..771ea4d0e4ad9 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/first_output/airbyte_views/test_normalization/multiple_column_names_conflicts_stg.sql @@ -0,0 +1,85 @@ +create or replace view _airbyte_test_normalization.`multiple_column_names_conflicts_stg` + + as + +with __dbt__cte__multiple_column_names_conflicts_ab1 as ( + +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization._airbyte_raw_multiple_column_names_conflicts +select + get_json_object(_airbyte_data,
'$.id') as id, + get_json_object(_airbyte_data, '$.User Id') as User_Id, + get_json_object(_airbyte_data, '$.user_id') as user_id_1, + get_json_object(_airbyte_data, '$.User id') as User_id_2, + get_json_object(_airbyte_data, '$.user id') as user_id_3, + get_json_object(_airbyte_data, '$.User@Id') as User_Id_4, + get_json_object(_airbyte_data, '$.UserId') as UserId, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from test_normalization._airbyte_raw_multiple_column_names_conflicts as table_alias +-- multiple_column_names_conflicts +where 1 = 1 + +), __dbt__cte__multiple_column_names_conflicts_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__multiple_column_names_conflicts_ab1 +select + cast(id as + BIGINT +) as id, + cast(User_Id as + string +) as User_Id, + cast(user_id_1 as + float +) as user_id_1, + cast(User_id_2 as + float +) as User_id_2, + cast(user_id_3 as + float +) as user_id_3, + cast(User_Id_4 as + string +) as User_Id_4, + cast(UserId as + float +) as UserId, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__multiple_column_names_conflicts_ab1 +-- multiple_column_names_conflicts +where 1 = 1 + +)-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__multiple_column_names_conflicts_ab2 +select + md5(cast(coalesce(cast(id as + string +), '') || '-' || coalesce(cast(User_Id as + string +), '') || '-' || coalesce(cast(user_id_1 as + string +), '') || '-' || coalesce(cast(User_id_2 as + string +), '') || '-' || coalesce(cast(user_id_3 as + string +), '') || '-' || coalesce(cast(User_Id_4 as + string +), '') || '-' || coalesce(cast(UserId as + string +), '') as + string +)) as _airbyte_multiple_column_names_conflicts_hashid, + tmp.* +from __dbt__cte__multiple_column_names_conflicts_ab2 tmp +-- multiple_column_names_conflicts +where 1 = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql new file mode 100644 index 0000000000000..e9c00e6398a8d --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql @@ -0,0 +1,24 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} +select + {{ json_extract_scalar('_airbyte_data', ['id'], ['id']) }} as id, + {{ json_extract_scalar('_airbyte_data', ['currency'], ['currency']) }} as currency, + {{ json_extract_scalar('_airbyte_data', ['date'], ['date']) }} as date, + {{ json_extract_scalar('_airbyte_data', ['timestamp_col'], ['timestamp_col']) }} as timestamp_col, + {{ json_extract_scalar('_airbyte_data', ['HKD@spéçiàl & characters'], ['HKD@spéçiàl & characters']) }} as HKD_special___characters, + {{
json_extract_scalar('_airbyte_data', ['HKD_special___characters'], ['HKD_special___characters']) }} as HKD_special___characters_1, + {{ json_extract_scalar('_airbyte_data', ['NZD'], ['NZD']) }} as NZD, + {{ json_extract_scalar('_airbyte_data', ['USD'], ['USD']) }} as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} as table_alias +-- dedup_exchange_rate +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql new file mode 100644 index 0000000000000..b6fe4404ddaad --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql @@ -0,0 +1,24 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: {{ ref('dedup_exchange_rate_ab1') }} +select + cast(id as {{ dbt_utils.type_bigint() }}) as id, + cast(currency as {{ dbt_utils.type_string() }}) as currency, + cast({{ empty_string_to_null('date') }} as {{ type_date() }}) as date, + cast({{ empty_string_to_null('timestamp_col') }} as {{ type_timestamp_with_timezone() }}) as timestamp_col, + cast(HKD_special___characters as {{ dbt_utils.type_float() }}) as HKD_special___characters, + cast(HKD_special___characters_1 as {{ dbt_utils.type_string() }}) as HKD_special___characters_1, + cast(NZD as {{ dbt_utils.type_float() }}) as NZD, + cast(USD as {{ dbt_utils.type_float() }}) as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ ref('dedup_exchange_rate_ab1') }} +-- dedup_exchange_rate +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql new file mode 100644 index 0000000000000..d55bd46b740c2 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql @@ -0,0 +1,130 @@ +{{ config( + unique_key = "_airbyte_unique_key_scd", + schema = "test_normalization", + post_hook = ["drop view _airbyte_test_normalization.dedup_exchange_rate_stg"], + tags = [ "top-level" ] +) }} +-- depends_on: ref('dedup_exchange_rate_stg') +with +{% if is_incremental() %} +new_data as ( + -- retrieve incremental "new" data + select + * + from {{ ref('dedup_exchange_rate_stg') }} + -- dedup_exchange_rate from {{ source('test_normalization', 
'_airbyte_raw_dedup_exchange_rate') }} + where 1 = 1 + {{ incremental_clause('_airbyte_emitted_at') }} +), +new_data_ids as ( + -- build a subset of _airbyte_unique_key from rows that are new + select distinct + {{ dbt_utils.surrogate_key([ + 'id', + 'currency', + 'NZD', + ]) }} as _airbyte_unique_key + from new_data +), +empty_new_data as ( + -- build an empty table to only keep the table's column types + select * from new_data where 1 = 0 +), +previous_active_scd_data as ( + -- retrieve "incomplete old" data that needs to be updated with an end date because of new changes + select + {{ star_intersect(ref('dedup_exchange_rate_stg'), this, from_alias='inc_data', intersect_alias='this_data') }} + from {{ this }} as this_data + -- make a join with new_data using primary key to filter active data that need to be updated only + join new_data_ids on this_data._airbyte_unique_key = new_data_ids._airbyte_unique_key + -- force left join to NULL values (we just need to transfer column types only for the star_intersect macro on schema changes) + left join empty_new_data as inc_data on this_data._airbyte_ab_id = inc_data._airbyte_ab_id + where _airbyte_active_row = 1 +), +input_data as ( + select {{ dbt_utils.star(ref('dedup_exchange_rate_stg')) }} from new_data + union all + select {{ dbt_utils.star(ref('dedup_exchange_rate_stg')) }} from previous_active_scd_data +), +{% else %} +input_data as ( + select * + from {{ ref('dedup_exchange_rate_stg') }} + -- dedup_exchange_rate from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} +), +{% endif %} +scd_data as ( + -- SQL model to build a Type 2 Slowly Changing Dimension (SCD) table for each record identified by their primary key + select + {{ dbt_utils.surrogate_key([ + 'id', + 'currency', + 'NZD', + ]) }} as _airbyte_unique_key, + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + date as _airbyte_start_at, + lag(date) over ( + partition by id, currency, cast(NZD as {{ dbt_utils.type_string() }}) + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) as _airbyte_end_at, + case when row_number() over ( + partition by id, currency, cast(NZD as {{ dbt_utils.type_string() }}) + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) = 1 then 1 else 0 end as _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + _airbyte_dedup_exchange_rate_hashid + from input_data +), +dedup_data as ( + select + -- we need to ensure de-duplicated rows for merge/update queries + -- additionally, we generate a unique key for the scd table + row_number() over ( + partition by + _airbyte_unique_key, + _airbyte_start_at, + _airbyte_emitted_at + order by _airbyte_active_row desc, _airbyte_ab_id + ) as _airbyte_row_num, + {{ dbt_utils.surrogate_key([ + '_airbyte_unique_key', + '_airbyte_start_at', + '_airbyte_emitted_at' + ]) }} as _airbyte_unique_key_scd, + scd_data.* + from scd_data +) +select + _airbyte_unique_key, + _airbyte_unique_key_scd, + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + _airbyte_start_at, + _airbyte_end_at, + _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_dedup_exchange_rate_hashid +from dedup_data where _airbyte_row_num = 1 + diff --git 
a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql new file mode 100644 index 0000000000000..c04b6bec8879e --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql @@ -0,0 +1,27 @@ +{{ config( + unique_key = "_airbyte_unique_key", + schema = "test_normalization", + tags = [ "top-level" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('dedup_exchange_rate_scd') }} +select + _airbyte_unique_key, + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_dedup_exchange_rate_hashid +from {{ ref('dedup_exchange_rate_scd') }} +-- dedup_exchange_rate from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} +where 1 = 1 +and _airbyte_active_row = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_tables/test_normalization/exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_tables/test_normalization/exchange_rate.sql new file mode 100644 index 0000000000000..1e53f86818650 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_tables/test_normalization/exchange_rate.sql @@ -0,0 +1,25 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "test_normalization", + tags = [ "top-level" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('exchange_rate_ab3') }} +select + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_exchange_rate_hashid +from {{ ref('exchange_rate_ab3') }} +-- exchange_rate from {{ source('test_normalization', '_airbyte_raw_exchange_rate') }} +where 1 = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql new file mode 100644 index 0000000000000..53ce2f42d55b0 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql @@ -0,0 +1,24 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to build a hash 
column based on the values of this record +-- depends_on: {{ ref('dedup_exchange_rate_ab2') }} +select + {{ dbt_utils.surrogate_key([ + 'id', + 'currency', + 'date', + 'timestamp_col', + 'HKD_special___characters', + 'HKD_special___characters_1', + 'NZD', + 'USD', + ]) }} as _airbyte_dedup_exchange_rate_hashid, + tmp.* +from {{ ref('dedup_exchange_rate_ab2') }} tmp +-- dedup_exchange_rate +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/sources.yml b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/sources.yml new file mode 100644 index 0000000000000..97bf0d05cbd40 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/models/generated/sources.yml @@ -0,0 +1,15 @@ +version: 2 +sources: +- name: test_normalization + quoting: + database: true + schema: false + identifier: false + tables: + - name: _airbyte_raw_1_prefix_startwith_number + - name: _airbyte_raw_dedup_cdc_excluded + - name: _airbyte_raw_dedup_exchange_rate + - name: _airbyte_raw_exchange_rate + - name: _airbyte_raw_multiple_column_names_conflicts + - name: _airbyte_raw_pos_dedup_cdcx + - name: _airbyte_raw_renamed_dedup_cdc_excluded diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql new file mode 100644 index 0000000000000..72508b74a2e98 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab1.sql @@ -0,0 +1,24 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} +select + {{ json_extract_scalar('_airbyte_data', ['id'], ['id']) }} as id, + {{ json_extract_scalar('_airbyte_data', ['currency'], ['currency']) }} as currency, + {{ json_extract_scalar('_airbyte_data', ['new_column'], ['new_column']) }} as new_column, + {{ json_extract_scalar('_airbyte_data', ['date'], ['date']) }} as date, + {{ json_extract_scalar('_airbyte_data', ['timestamp_col'], ['timestamp_col']) }} as timestamp_col, + {{ json_extract_scalar('_airbyte_data', ['HKD@spéçiàl & characters'], ['HKD@spéçiàl & characters']) }} as HKD_special___characters, + {{ json_extract_scalar('_airbyte_data', ['NZD'], ['NZD']) }} as NZD, + {{ json_extract_scalar('_airbyte_data', ['USD'], ['USD']) }} as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} as table_alias +-- dedup_exchange_rate +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git
a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql new file mode 100644 index 0000000000000..5ed92210108af --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_ctes/test_normalization/dedup_exchange_rate_ab2.sql @@ -0,0 +1,24 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: {{ ref('dedup_exchange_rate_ab1') }} +select + cast(id as {{ dbt_utils.type_float() }}) as id, + cast(currency as {{ dbt_utils.type_string() }}) as currency, + cast(new_column as {{ dbt_utils.type_float() }}) as new_column, + cast({{ empty_string_to_null('date') }} as {{ type_date() }}) as date, + cast({{ empty_string_to_null('timestamp_col') }} as {{ type_timestamp_with_timezone() }}) as timestamp_col, + cast(HKD_special___characters as {{ dbt_utils.type_float() }}) as HKD_special___characters, + cast(NZD as {{ dbt_utils.type_float() }}) as NZD, + cast(USD as {{ dbt_utils.type_bigint() }}) as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at +from {{ ref('dedup_exchange_rate_ab1') }} +-- dedup_exchange_rate +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql new file mode 100644 index 0000000000000..42e2990788d14 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql @@ -0,0 +1,130 @@ +{{ config( + unique_key = "_airbyte_unique_key_scd", + schema = "test_normalization", + post_hook = ["drop view _airbyte_test_normalization.dedup_exchange_rate_stg"], + tags = [ "top-level" ] +) }} +-- depends_on: ref('dedup_exchange_rate_stg') +with +{% if is_incremental() %} +new_data as ( + -- retrieve incremental "new" data + select + * + from {{ ref('dedup_exchange_rate_stg') }} + -- dedup_exchange_rate from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} + where 1 = 1 + {{ incremental_clause('_airbyte_emitted_at') }} +), +new_data_ids as ( + -- build a subset of _airbyte_unique_key from rows that are new + select distinct + {{ dbt_utils.surrogate_key([ + 'id', + 'currency', + 'NZD', + ]) }} as _airbyte_unique_key + from new_data +), +empty_new_data as ( + -- build an empty table to only keep the table's column types + select * from new_data where 1 = 0 +), +previous_active_scd_data as ( + -- retrieve "incomplete old" data that needs to be updated with an 
end date because of new changes + select + {{ star_intersect(ref('dedup_exchange_rate_stg'), this, from_alias='inc_data', intersect_alias='this_data') }} + from {{ this }} as this_data + -- make a join with new_data using primary key to filter active data that need to be updated only + join new_data_ids on this_data._airbyte_unique_key = new_data_ids._airbyte_unique_key + -- force left join to NULL values (we just need to transfer column types only for the star_intersect macro on schema changes) + left join empty_new_data as inc_data on this_data._airbyte_ab_id = inc_data._airbyte_ab_id + where _airbyte_active_row = 1 +), +input_data as ( + select {{ dbt_utils.star(ref('dedup_exchange_rate_stg')) }} from new_data + union all + select {{ dbt_utils.star(ref('dedup_exchange_rate_stg')) }} from previous_active_scd_data +), +{% else %} +input_data as ( + select * + from {{ ref('dedup_exchange_rate_stg') }} + -- dedup_exchange_rate from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} +), +{% endif %} +scd_data as ( + -- SQL model to build a Type 2 Slowly Changing Dimension (SCD) table for each record identified by their primary key + select + {{ dbt_utils.surrogate_key([ + 'id', + 'currency', + 'NZD', + ]) }} as _airbyte_unique_key, + id, + currency, + new_column, + date, + timestamp_col, + HKD_special___characters, + NZD, + USD, + date as _airbyte_start_at, + lag(date) over ( + partition by cast(id as {{ dbt_utils.type_string() }}), currency, cast(NZD as {{ dbt_utils.type_string() }}) + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) as _airbyte_end_at, + case when row_number() over ( + partition by cast(id as {{ dbt_utils.type_string() }}), currency, cast(NZD as {{ dbt_utils.type_string() }}) + order by + date is null asc, + date desc, + _airbyte_emitted_at desc + ) = 1 then 1 else 0 end as _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + _airbyte_dedup_exchange_rate_hashid + from input_data +), +dedup_data as ( + select + -- we need to ensure de-duplicated rows for merge/update queries + -- additionally, we generate a unique key for the scd table + row_number() over ( + partition by + _airbyte_unique_key, + _airbyte_start_at, + _airbyte_emitted_at + order by _airbyte_active_row desc, _airbyte_ab_id + ) as _airbyte_row_num, + {{ dbt_utils.surrogate_key([ + '_airbyte_unique_key', + '_airbyte_start_at', + '_airbyte_emitted_at' + ]) }} as _airbyte_unique_key_scd, + scd_data.* + from scd_data +) +select + _airbyte_unique_key, + _airbyte_unique_key_scd, + id, + currency, + new_column, + date, + timestamp_col, + HKD_special___characters, + NZD, + USD, + _airbyte_start_at, + _airbyte_end_at, + _airbyte_active_row, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_dedup_exchange_rate_hashid +from dedup_data where _airbyte_row_num = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql new file mode 100644 index 0000000000000..1d70a92d24b40 --- /dev/null +++ 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_incremental/test_normalization/dedup_exchange_rate.sql @@ -0,0 +1,27 @@ +{{ config( + unique_key = "_airbyte_unique_key", + schema = "test_normalization", + tags = [ "top-level" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('dedup_exchange_rate_scd') }} +select + _airbyte_unique_key, + id, + currency, + new_column, + date, + timestamp_col, + HKD_special___characters, + NZD, + USD, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_dedup_exchange_rate_hashid +from {{ ref('dedup_exchange_rate_scd') }} +-- dedup_exchange_rate from {{ source('test_normalization', '_airbyte_raw_dedup_exchange_rate') }} +where 1 = 1 +and _airbyte_active_row = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_tables/test_normalization/exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_tables/test_normalization/exchange_rate.sql new file mode 100644 index 0000000000000..3270f3243c334 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_tables/test_normalization/exchange_rate.sql @@ -0,0 +1,25 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "test_normalization", + tags = [ "top-level" ] +) }} +-- Final base SQL model +-- depends_on: {{ ref('exchange_rate_ab3') }} +select + id, + currency, + new_column, + date, + timestamp_col, + HKD_special___characters, + NZD, + USD, + column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + {{ current_timestamp() }} as _airbyte_normalized_at, + _airbyte_exchange_rate_hashid +from {{ ref('exchange_rate_ab3') }} +-- exchange_rate from {{ source('test_normalization', '_airbyte_raw_exchange_rate') }} +where 1 = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql new file mode 100644 index 0000000000000..1630302c61386 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql @@ -0,0 +1,24 @@ +{{ config( + unique_key = '_airbyte_ab_id', + schema = "_airbyte_test_normalization", + tags = [ "top-level-intermediate" ] +) }} +-- SQL model to build a hash column based on the values of this record +-- depends_on: {{ ref('dedup_exchange_rate_ab2') }} +select + {{ dbt_utils.surrogate_key([ + 'id', + 'currency', + 'new_column', + 'date', + 'timestamp_col', + 'HKD_special___characters', + 'NZD', + 'USD', + ]) }} as _airbyte_dedup_exchange_rate_hashid, + tmp.* +from {{ ref('dedup_exchange_rate_ab2') }} tmp +-- dedup_exchange_rate +where 1 = 1 +{{ incremental_clause('_airbyte_emitted_at') }} + diff --git 
a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/sources.yml b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/sources.yml new file mode 100644 index 0000000000000..dd538a80131ae --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/modified_models/generated/sources.yml @@ -0,0 +1,11 @@ +version: 2 +sources: +- name: test_normalization + quoting: + database: true + schema: false + identifier: false + tables: + - name: _airbyte_raw_dedup_exchange_rate + - name: _airbyte_raw_exchange_rate + - name: _airbyte_raw_renamed_dedup_cdc_excluded diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql new file mode 100644 index 0000000000000..dc3aad7922dbf --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/scd/test_normalization/dedup_exchange_rate_scd.sql @@ -0,0 +1,17 @@ + + + + + merge into test_normalization.`dedup_exchange_rate_scd` as DBT_INTERNAL_DEST + using `dedup_exchange_rate_scd__dbt_tmp` as DBT_INTERNAL_SOURCE + + + + on DBT_INTERNAL_SOURCE._airbyte_unique_key_scd = DBT_INTERNAL_DEST._airbyte_unique_key_scd + + + + when matched then update set + * + + when not matched then insert * diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql new file mode 100644 index 0000000000000..d67613d286f9d --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_incremental/test_normalization/dedup_exchange_rate.sql @@ -0,0 +1,17 @@ + + + + + merge into test_normalization.`dedup_exchange_rate` as DBT_INTERNAL_DEST + using `dedup_exchange_rate__dbt_tmp` as DBT_INTERNAL_SOURCE + + + + on DBT_INTERNAL_SOURCE._airbyte_unique_key = DBT_INTERNAL_DEST._airbyte_unique_key + + + + when matched then update set + * + + when not matched then insert * diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_tables/test_normalization/exchange_rate.sql b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_tables/test_normalization/exchange_rate.sql new file mode 100644 index 0000000000000..dde9e833b1067 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_tables/test_normalization/exchange_rate.sql @@ 
-0,0 +1,125 @@ + + create or replace table test_normalization.`exchange_rate` + + + using delta + + + + + + as + +with __dbt__cte__exchange_rate_ab1 as ( + +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization._airbyte_raw_exchange_rate +select + get_json_object(_airbyte_data, '$.id') as id, + get_json_object(_airbyte_data, '$.currency') as currency, + get_json_object(_airbyte_data, '$.date') as date, + get_json_object(_airbyte_data, '$.timestamp_col') as timestamp_col, + get_json_object(_airbyte_data, '$.HKD@spéçiΓ€l & characters') as HKD_special___characters, + get_json_object(_airbyte_data, '$.HKD_special___characters') as HKD_special___characters_1, + get_json_object(_airbyte_data, '$.NZD') as NZD, + get_json_object(_airbyte_data, '$.USD') as USD, + get_json_object(_airbyte_data, '$.column`_\'with"_quotes') as column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from test_normalization._airbyte_raw_exchange_rate as table_alias +-- exchange_rate +where 1 = 1 +), __dbt__cte__exchange_rate_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__exchange_rate_ab1 +select + cast(id as + BIGINT +) as id, + cast(currency as + string +) as currency, + cast(nullif(date, '') as + date +) as date, + cast(nullif(timestamp_col, '') as + timestamp +) as timestamp_col, + cast(HKD_special___characters as + float +) as HKD_special___characters, + cast(HKD_special___characters_1 as + string +) as HKD_special___characters_1, + cast(NZD as + float +) as NZD, + cast(USD as + float +) as USD, + cast(column___with__quotes as + string +) as column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__exchange_rate_ab1 +-- exchange_rate +where 1 = 1 +), __dbt__cte__exchange_rate_ab3 as ( + +-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__exchange_rate_ab2 +select + md5(cast(coalesce(cast(id as + string +), '') || '-' || coalesce(cast(currency as + string +), '') || '-' || coalesce(cast(date as + string +), '') || '-' || coalesce(cast(timestamp_col as + string +), '') || '-' || coalesce(cast(HKD_special___characters as + string +), '') || '-' || coalesce(cast(HKD_special___characters_1 as + string +), '') || '-' || coalesce(cast(NZD as + string +), '') || '-' || coalesce(cast(USD as + string +), '') || '-' || coalesce(cast(column___with__quotes as + string +), '') as + string +)) as _airbyte_exchange_rate_hashid, + tmp.* +from __dbt__cte__exchange_rate_ab2 tmp +-- exchange_rate +where 1 = 1 +)-- Final base SQL model +-- depends_on: __dbt__cte__exchange_rate_ab3 +select + id, + currency, + date, + timestamp_col, + HKD_special___characters, + HKD_special___characters_1, + NZD, + USD, + column___with__quotes, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at, + _airbyte_exchange_rate_hashid +from __dbt__cte__exchange_rate_ab3 +-- exchange_rate from test_normalization._airbyte_raw_exchange_rate +where 1 = 1 \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql 
b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql new file mode 100644 index 0000000000000..1892c3652bd0e --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/integration_tests/normalization_test_output/databricks/test_simple_streams/second_output/airbyte_views/test_normalization/dedup_exchange_rate_stg.sql @@ -0,0 +1,91 @@ +create or replace view _airbyte_test_normalization.`dedup_exchange_rate_stg` + + as + +with __dbt__cte__dedup_exchange_rate_ab1 as ( + +-- SQL model to parse JSON blob stored in a single column and extract into separated field columns as described by the JSON Schema +-- depends_on: test_normalization._airbyte_raw_dedup_exchange_rate +select + get_json_object(_airbyte_data, '$.id') as id, + get_json_object(_airbyte_data, '$.currency') as currency, + get_json_object(_airbyte_data, '$.date') as date, + get_json_object(_airbyte_data, '$.timestamp_col') as timestamp_col, + get_json_object(_airbyte_data, '$.HKD@spéçiΓ€l & characters') as HKD_special___characters, + get_json_object(_airbyte_data, '$.HKD_special___characters') as HKD_special___characters_1, + get_json_object(_airbyte_data, '$.NZD') as NZD, + get_json_object(_airbyte_data, '$.USD') as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from test_normalization._airbyte_raw_dedup_exchange_rate as table_alias +-- dedup_exchange_rate +where 1 = 1 + +), __dbt__cte__dedup_exchange_rate_ab2 as ( + +-- SQL model to cast each column to its adequate SQL type converted from the JSON schema type +-- depends_on: __dbt__cte__dedup_exchange_rate_ab1 +select + cast(id as + BIGINT +) as id, + cast(currency as + string +) as currency, + cast(nullif(date, '') as + date +) as date, + cast(nullif(timestamp_col, '') as + timestamp +) as timestamp_col, + cast(HKD_special___characters as + float +) as HKD_special___characters, + cast(HKD_special___characters_1 as + string +) as HKD_special___characters_1, + cast(NZD as + float +) as NZD, + cast(USD as + float +) as USD, + _airbyte_ab_id, + _airbyte_emitted_at, + + CURRENT_TIMESTAMP + as _airbyte_normalized_at +from __dbt__cte__dedup_exchange_rate_ab1 +-- dedup_exchange_rate +where 1 = 1 + +)-- SQL model to build a hash column based on the values of this record +-- depends_on: __dbt__cte__dedup_exchange_rate_ab2 +select + md5(cast(coalesce(cast(id as + string +), '') || '-' || coalesce(cast(currency as + string +), '') || '-' || coalesce(cast(date as + string +), '') || '-' || coalesce(cast(timestamp_col as + string +), '') || '-' || coalesce(cast(HKD_special___characters as + string +), '') || '-' || coalesce(cast(HKD_special___characters_1 as + string +), '') || '-' || coalesce(cast(NZD as + string +), '') || '-' || coalesce(cast(USD as + string +), '') as + string +)) as _airbyte_dedup_exchange_rate_hashid, + tmp.* +from __dbt__cte__dedup_exchange_rate_ab2 tmp +-- dedup_exchange_rate +where 1 = 1 + diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_nested_streams/data_input/replace_identifiers.json b/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_nested_streams/data_input/replace_identifiers.json index e15f5b7dd7f94..8ce50d0dd24a8 100644 --- a/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_nested_streams/data_input/replace_identifiers.json +++ 
b/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_nested_streams/data_input/replace_identifiers.json @@ -79,5 +79,8 @@ { "non_nested_stream_without_namespace_resulting_into_long_names": "non_nested_stream_wit__lting_into_long_names" } + ], + "databricks": [ + { "HKD_special___characters": "HKD_special___characters_1", "'\"HKD@spéçiΓ€l & characters\"'": "HKD_special___characters" } ] } diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_simple_streams/data_input/replace_identifiers.json b/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_simple_streams/data_input/replace_identifiers.json index ddb47f1fbbcb1..8aae8928c7149 100644 --- a/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_simple_streams/data_input/replace_identifiers.json +++ b/airbyte-integrations/bases/base-normalization/integration_tests/resources/test_simple_streams/data_input/replace_identifiers.json @@ -6,6 +6,9 @@ "\\\"column`_'with\\\"\\\"_quotes\\\" is not null": "column___with__quotes is not null" } ], + "databricks": [ + { "HKD_special___characters": "HKD_special___characters_1", "'\"HKD@spéçiΓ€l & characters\"'": "HKD_special___characters" } + ], "oracle": [ { "HKD_special___characters": "HKD_special___characters_1" }, { "'\"HKD@spéçiΓ€l & characters\"'": "HKD_special___characters" }, diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/test_ephemeral.py b/airbyte-integrations/bases/base-normalization/integration_tests/test_ephemeral.py index 9e86a5771e339..cfc9145be4f9b 100644 --- a/airbyte-integrations/bases/base-normalization/integration_tests/test_ephemeral.py +++ b/airbyte-integrations/bases/base-normalization/integration_tests/test_ephemeral.py @@ -148,6 +148,8 @@ def setup_test_dir(integration_type: str) -> str: copy_tree("../dbt-project-template-oracle", test_root_dir) elif integration_type == DestinationType.SNOWFLAKE.value: copy_tree("../dbt-project-template-snowflake", test_root_dir) + elif integration_type == DestinationType.DATABRICKS.value: + copy_tree("../dbt-project-template-databricks", test_root_dir) return test_root_dir diff --git a/airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py b/airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py index 0c72fddf76a7a..d23c1dedaaa43 100644 --- a/airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py +++ b/airbyte-integrations/bases/base-normalization/integration_tests/test_normalization.py @@ -141,7 +141,7 @@ def run_schema_change_normalization(destination_type: DestinationType, test_reso pytest.skip(f"{destination_type} does not support schema change in incremental yet (requires dbt 0.21.0+)") if destination_type.value in [DestinationType.SNOWFLAKE.value, DestinationType.CLICKHOUSE.value]: pytest.skip(f"{destination_type} is disabled as it doesnt support schema change in incremental yet (column type changes)") - if destination_type.value in [DestinationType.MSSQL.value, DestinationType.SNOWFLAKE.value]: + if destination_type.value in [DestinationType.MSSQL.value, DestinationType.SNOWFLAKE.value, DestinationType.DATABRICKS.value]: # TODO: create/fix github issue in corresponding dbt-adapter repository to handle schema changes (outside airbyte's control) pytest.skip(f"{destination_type} is disabled as it doesnt fully support schema change in incremental yet") @@ -204,6 +204,9 @@ def 
setup_test_dir(destination_type: DestinationType, test_resource_name: str) - elif destination_type.value == DestinationType.SNOWFLAKE.value: copy_tree("../dbt-project-template-snowflake", test_root_dir) dbt_project_yaml = "../dbt-project-template-snowflake/dbt_project.yml" + elif destination_type.value == DestinationType.DATABRICKS.value: + copy_tree("../dbt-project-template-databricks", test_root_dir) + dbt_project_yaml = "../dbt-project-template-databricks/dbt_project.yml" elif destination_type.value == DestinationType.REDSHIFT.value: copy_tree("../dbt-project-template-redshift", test_root_dir) dbt_project_yaml = "../dbt-project-template-redshift/dbt_project.yml" diff --git a/airbyte-integrations/bases/base-normalization/normalization/destination_type.py b/airbyte-integrations/bases/base-normalization/normalization/destination_type.py index 12da9b2bd0446..624262328cd67 100644 --- a/airbyte-integrations/bases/base-normalization/normalization/destination_type.py +++ b/airbyte-integrations/bases/base-normalization/normalization/destination_type.py @@ -15,6 +15,7 @@ class DestinationType(Enum): POSTGRES = "postgres" REDSHIFT = "redshift" SNOWFLAKE = "snowflake" + DATABRICKS = "databricks" @classmethod def from_string(cls, string_value: str) -> "DestinationType": diff --git a/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py b/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py index 29db8e3ff7e39..e06f073416bd5 100644 --- a/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py +++ b/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/destination_name_transformer.py @@ -27,6 +27,8 @@ DestinationType.MSSQL.value: 64, # https://stackoverflow.com/questions/68358686/what-is-the-maximum-length-of-a-column-in-clickhouse-can-it-be-modified DestinationType.CLICKHOUSE.value: 63, + # https://www.stitchdata.com/docs/destinations/databricks-delta/reference#:~:text=Must%20be%20less,Databricks%20Delta%20Lake%20(AWS). (According that stitch is correct) + DestinationType.DATABRICKS.value: 122, } # DBT also needs to generate suffix to table names, so we need to make sure it has enough characters to do so... 
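For reference on the 122-character cap registered above for Databricks: once an identifier exceeds a destination's limit, the name transformer has to shorten it while keeping names distinguishable (the `___` middle-truncation visible in the expected `*_names.json` fixtures). A minimal sketch of that idea, using a hypothetical helper rather than the transformer's real method names:

```python
# Illustrative sketch only; the real logic lives in DestinationNameTransformer.
DESTINATION_IDENTIFIER_LIMITS = {"databricks": 122, "mssql": 64, "clickhouse": 63}

def truncate_identifier(name: str, destination: str, default_limit: int = 127) -> str:
    """Middle-truncate a name so it fits the destination's identifier limit."""
    limit = DESTINATION_IDENTIFIER_LIMITS.get(destination, default_limit)
    if len(name) <= limit:
        return name
    head = name[: limit // 2 - 2]           # keep the start of the name
    tail = name[-(limit - len(head) - 3):]  # and the end, joined by "___"
    return f"{head}___{tail}"

print(len(truncate_identifier("a" * 200, "databricks")))  # -> 122
```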
@@ -164,11 +166,15 @@ def __normalize_identifier_name( if truncate: result = self.truncate_identifier_name(input_name=result, conflict=conflict, conflict_level=conflict_level) if self.needs_quotes(result): - if self.destination_type.value != DestinationType.MYSQL.value: + if ( + self.destination_type.value != DestinationType.MYSQL.value + and self.destination_type.value != DestinationType.DATABRICKS.value + ): result = result.replace('"', '""') else: result = result.replace("`", "_") - result = result.replace("'", "\\'") + if self.destination_type.value != DestinationType.DATABRICKS.value: + result = result.replace("'", "\\'") result = self.__normalize_identifier_case(result, is_quoted=True) result = self.apply_quote(result) if not in_jinja: @@ -200,6 +206,8 @@ def __normalize_naming_conventions(self, input_name: str, is_column: bool = Fals doesnt_start_with_alphaunderscore = match("[^A-Za-z_]", result[0]) is not None if is_column and doesnt_start_with_alphaunderscore: result = f"_{result}" + elif self.destination_type.value == DestinationType.DATABRICKS.value: + result = transform_standard_naming(result) return result def __normalize_identifier_case(self, input_name: str, is_quoted: bool = False) -> str: @@ -228,6 +236,8 @@ def __normalize_identifier_case(self, input_name: str, is_quoted: bool = False) result = input_name.upper() elif self.destination_type.value == DestinationType.CLICKHOUSE.value: pass + elif self.destination_type.value == DestinationType.DATABRICKS.value: + pass else: raise KeyError(f"Unknown destination type {self.destination_type}") return result @@ -266,6 +276,8 @@ def normalize_column_identifier_case_for_lookup(self, input_name: str, is_quoted result = input_name.upper() elif self.destination_type.value == DestinationType.CLICKHOUSE.value: pass + elif self.destination_type.value == DestinationType.DATABRICKS.value: + result = input_name.lower() else: raise KeyError(f"Unknown destination type {self.destination_type}") return result diff --git a/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/reserved_keywords.py b/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/reserved_keywords.py index 43bc67b26e582..53cf8033b7e49 100644 --- a/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/reserved_keywords.py +++ b/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/reserved_keywords.py @@ -2535,6 +2535,263 @@ "REGR_SYY", } +# https://deepkb.com/CO_000013/en/kb/IMPORT-fbfa59f0-2bf1-31fe-bb7b-0f9efe9932c6/spark-sql-keywords +DATABRICKS = { + "ALL", + "ALTER", + "ANALYZE", + "AND", + "ANTI", + "ANY", + "ARCHIVE", + "ARRAY", + "AS", + "ASC", + "AT", + "AUTHORIZATION", + "BETWEEN", + "BOTH", + "BUCKET", + "BUCKETS", + "BY", + "CACHE", + "CASCADE", + "CASE", + "CAST", + "CHANGE", + "CHECK", + "CLEAR", + "CLUSTER", + "CLUSTERED", + "CODEGEN", + "COLLATE", + "COLLECTION", + "COLUMN", + "COLUMNS", + "COMMENT", + "COMMIT", + "COMPACT", + "COMPACTIONS", + "COMPUTE", + "CONCATENATE", + "CONSTRAINT", + "COST", + "CREATE", + "CROSS", + "CUBE", + "CURRENT", + "CURRENT_DATE", + "CURRENT_TIME", + "CURRENT_TIMESTAMP", + "CURRENT_USER", + "DATA", + "DATABASE", + "DATABASES", + "DAY", + "DBPROPERTIES", + "DEFINED", + "DELETE", + "DELIMITED", + "DESC", + "DESCRIBE", + "DFS", + "DIRECTORIES", + "DIRECTORY", + "DISTINCT", + "DISTRIBUTE", + "DIV", + "DROP", + "ELSE", + "END", + "ESCAPE", + "ESCAPED", + "EXCEPT", + "EXCHANGE", + "EXISTS", + "EXPLAIN", + "EXPORT", + "EXTENDED", + 
"EXTERNAL", + "EXTRACT", + "FALSE", + "FETCH", + "FIELDS", + "FILTER", + "FILEFORMAT", + "FIRST", + "FIRST_VALUE", + "FOLLOWING", + "FOR", + "FOREIGN", + "FORMAT", + "FORMATTED", + "FROM", + "FULL", + "FUNCTION", + "FUNCTIONS", + "GLOBAL", + "GRANT", + "GROUP", + "GROUPING", + "HAVING", + "HOUR", + "IF", + "IGNORE", + "IMPORT", + "IN", + "INDEX", + "INDEXES", + "INNER", + "INPATH", + "INPUTFORMAT", + "INSERT", + "INTERSECT", + "INTERVAL", + "INTO", + "IS", + "ITEMS", + "JOIN", + "KEYS", + "LAST", + "LAST_VALUE", + "LATERAL", + "LAZY", + "LEADING", + "LEFT", + "LIKE", + "LIMIT", + "LINES", + "LIST", + "LOAD", + "LOCAL", + "LOCATION", + "LOCK", + "LOCKS", + "LOGICAL", + "MACRO", + "MAP", + "MATCHED", + "MERGE", + "MINUS", + "MINUTE", + "MONTH", + "MSCK", + "NAMESPACE", + "NAMESPACES", + "NATURAL", + "NO", + "NOT", + "NULL", + "NULLS", + "OF", + "ON", + "ONLY", + "OPTION", + "OPTIONS", + "OR", + "ORDER", + "OUT", + "OUTER", + "OUTPUTFORMAT", + "OVER", + "OVERLAPS", + "OVERLAY", + "OVERWRITE", + "OWNER", + "PARTITION", + "PARTITIONED", + "PARTITIONS", + "PERCENT", + "PIVOT", + "PLACING", + "POSITION", + "PRECEDING", + "PRIMARY", + "PRINCIPALS", + "PROPERTIES", + "PURGE", + "QUERY", + "RANGE", + "RECORDREADER", + "RECORDWRITER", + "RECOVER", + "REDUCE", + "REFERENCES", + "REFRESH", + "RENAME", + "REPAIR", + "REPLACE", + "RESET", + "RESPECT", + "RESTRICT", + "REVOKE", + "RIGHT", + "RLIKE", + "ROLE", + "ROLES", + "ROLLBACK", + "ROLLUP", + "ROW", + "ROWS", + "SCHEMA", + "SECOND", + "SELECT", + "SEMI", + "SEPARATED", + "SERDE", + "SERDEPROPERTIES", + "SESSION_USER", + "SET", + "SETS", + "SHOW", + "SKEWED", + "SOME", + "SORT", + "SORTED", + "START", + "STATISTICS", + "STORED", + "STRATIFY", + "STRUCT", + "SUBSTR", + "SUBSTRING", + "TABLE", + "TABLES", + "TABLESAMPLE", + "TBLPROPERTIES", + "TEMPORARY", + "TERMINATED", + "THEN", + "TO", + "TOUCH", + "TRAILING", + "TRANSACTION", + "TRANSACTIONS", + "TRANSFORM", + "TRIM", + "TRUE", + "TRUNCATE", + "UNARCHIVE", + "UNBOUNDED", + "UNCACHE", + "UNION", + "UNIQUE", + "UNKNOWN", + "UNLOCK", + "UNSET", + "UPDATE", + "USE", + "USER", + "USING", + "VALUES", + "VIEW", + "WHEN", + "WHERE", + "WINDOW", + "WITH", + "YEAR", +} + # In ClickHouse, keywords are not reserved. 
# Ref: https://clickhouse.com/docs/en/sql-reference/syntax/#syntax-keywords CLICKHOUSE: Set[str] = set() @@ -2548,6 +2805,7 @@ DestinationType.ORACLE.value: ORACLE, DestinationType.MSSQL.value: MSSQL, DestinationType.CLICKHOUSE.value: CLICKHOUSE, + DestinationType.DATABRICKS.value: DATABRICKS, } diff --git a/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py b/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py index 544b030dbedb1..c2b142a9752d7 100644 --- a/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py +++ b/airbyte-integrations/bases/base-normalization/normalization/transform_catalog/stream_processor.py @@ -44,7 +44,8 @@ class PartitionScheme(Enum): """ ACTIVE_ROW = "active_row" # partition by _airbyte_active_row - UNIQUE_KEY = "unique_key" # partition by _airbyte_emitted_at, sorted by _airbyte_unique_key + # partition by _airbyte_emitted_at, sorted by _airbyte_unique_key + UNIQUE_KEY = "unique_key" NOTHING = "nothing" # no partitions DEFAULT = "" # partition by _airbyte_emitted_at @@ -1264,6 +1265,75 @@ def get_model_materialization_mode(self, is_intermediate: bool, column_count: in else: return TableMaterializationType.TABLE + def partition_config_bigquery(self, partition_by: PartitionScheme) -> Dict: + config = {} + # see https://docs.getdbt.com/reference/resource-configs/bigquery-configs + if partition_by == PartitionScheme.UNIQUE_KEY: + config["cluster_by"] = f'["{self.airbyte_unique_key}","{self.airbyte_emitted_at}"]' + elif partition_by == PartitionScheme.ACTIVE_ROW: + config["cluster_by"] = f'["{self.airbyte_unique_key}_scd","{self.airbyte_emitted_at}"]' + else: + config["cluster_by"] = f'"{self.airbyte_emitted_at}"' + if partition_by == PartitionScheme.ACTIVE_ROW: + config["partition_by"] = ( + '{"field": "_airbyte_active_row", "data_type": "int64", ' '"range": {"start": 0, "end": 1, "interval": 1}}' + ) + elif partition_by == PartitionScheme.NOTHING: + pass + else: + config["partition_by"] = '{"field": "' + self.airbyte_emitted_at + '", "data_type": "timestamp", "granularity": "day"}' + return config + + def partition_config_postgres(self, partition_by: PartitionScheme) -> Dict: + # see https://docs.getdbt.com/reference/resource-configs/postgres-configs + config = {} + if partition_by == PartitionScheme.ACTIVE_ROW: + config["indexes"] = ( + "[{'columns':['_airbyte_active_row','" + + self.airbyte_unique_key + + "_scd','" + + self.airbyte_emitted_at + + "'],'type': 'btree'}]" + ) + elif partition_by == PartitionScheme.UNIQUE_KEY: + config["indexes"] = "[{'columns':['" + self.airbyte_unique_key + "'],'unique':True}]" + else: + config["indexes"] = "[{'columns':['" + self.airbyte_emitted_at + "'],'type':'btree'}]" + return config + + def partition_config_redshift(self, partition_by: PartitionScheme) -> Dict: + # see https://docs.getdbt.com/reference/resource-configs/redshift-configs + config = {} + if partition_by == PartitionScheme.ACTIVE_ROW: + config["sort"] = f'["_airbyte_active_row", "{self.airbyte_unique_key}_scd", "{self.airbyte_emitted_at}"]' + elif partition_by == PartitionScheme.UNIQUE_KEY: + config["sort"] = f'["{self.airbyte_unique_key}", "{self.airbyte_emitted_at}"]' + elif partition_by == PartitionScheme.NOTHING: + pass + else: + config["sort"] = f'"{self.airbyte_emitted_at}"' + return config + + def partition_config_snowflake(self, partition_by: PartitionScheme) -> Dict: + # see 
https://docs.getdbt.com/reference/resource-configs/snowflake-configs + config = {} + if partition_by == PartitionScheme.ACTIVE_ROW: + config["cluster_by"] = f'["_AIRBYTE_ACTIVE_ROW", "{self.airbyte_unique_key.upper()}_SCD", "{self.airbyte_emitted_at.upper()}"]' + elif partition_by == PartitionScheme.UNIQUE_KEY: + config["cluster_by"] = f'["{self.airbyte_unique_key.upper()}", "{self.airbyte_emitted_at.upper()}"]' + elif partition_by == PartitionScheme.NOTHING: + pass + else: + config["clustered_by"] = f'["{self.airbyte_emitted_at.upper()}"]' + return config + + partition_configurers = { + DestinationType.BIGQUERY: partition_config_bigquery, + DestinationType.POSTGRES: partition_config_postgres, + DestinationType.SNOWFLAKE: partition_config_snowflake, + DestinationType.REDSHIFT: partition_config_redshift, + } + def get_model_partition_config(self, partition_by: PartitionScheme, unique_key: str) -> Dict: """ Defines partition, clustering and unique key parameters for each destination. @@ -1274,59 +1344,12 @@ def get_model_partition_config(self, partition_by: PartitionScheme, unique_key: But in certain models, such as SCD tables for example, we also need to retrieve older data to update their type 2 SCD end_dates, thus a different partitioning scheme is used to optimize that use case. """ - config = {} - if self.destination_type == DestinationType.BIGQUERY: - # see https://docs.getdbt.com/reference/resource-configs/bigquery-configs - if partition_by == PartitionScheme.UNIQUE_KEY: - config["cluster_by"] = f'["{self.airbyte_unique_key}","{self.airbyte_emitted_at}"]' - elif partition_by == PartitionScheme.ACTIVE_ROW: - config["cluster_by"] = f'["{self.airbyte_unique_key}_scd","{self.airbyte_emitted_at}"]' - else: - config["cluster_by"] = f'"{self.airbyte_emitted_at}"' - if partition_by == PartitionScheme.ACTIVE_ROW: - config["partition_by"] = ( - '{"field": "_airbyte_active_row", "data_type": "int64", ' '"range": {"start": 0, "end": 1, "interval": 1}}' - ) - elif partition_by == PartitionScheme.NOTHING: - pass - else: - config["partition_by"] = '{"field": "' + self.airbyte_emitted_at + '", "data_type": "timestamp", "granularity": "day"}' - elif self.destination_type == DestinationType.POSTGRES: - # see https://docs.getdbt.com/reference/resource-configs/postgres-configs - if partition_by == PartitionScheme.ACTIVE_ROW: - config["indexes"] = ( - "[{'columns':['_airbyte_active_row','" - + self.airbyte_unique_key - + "_scd','" - + self.airbyte_emitted_at - + "'],'type': 'btree'}]" - ) - elif partition_by == PartitionScheme.UNIQUE_KEY: - config["indexes"] = "[{'columns':['" + self.airbyte_unique_key + "'],'unique':True}]" - else: - config["indexes"] = "[{'columns':['" + self.airbyte_emitted_at + "'],'type':'btree'}]" - elif self.destination_type == DestinationType.REDSHIFT: - # see https://docs.getdbt.com/reference/resource-configs/redshift-configs - if partition_by == PartitionScheme.ACTIVE_ROW: - config["sort"] = f'["_airbyte_active_row", "{self.airbyte_unique_key}_scd", "{self.airbyte_emitted_at}"]' - elif partition_by == PartitionScheme.UNIQUE_KEY: - config["sort"] = f'["{self.airbyte_unique_key}", "{self.airbyte_emitted_at}"]' - elif partition_by == PartitionScheme.NOTHING: - pass - else: - config["sort"] = f'"{self.airbyte_emitted_at}"' - elif self.destination_type == DestinationType.SNOWFLAKE: - # see https://docs.getdbt.com/reference/resource-configs/snowflake-configs - if partition_by == PartitionScheme.ACTIVE_ROW: - config[ - "cluster_by" - ] = f'["_AIRBYTE_ACTIVE_ROW", 
"{self.airbyte_unique_key.upper()}_SCD", "{self.airbyte_emitted_at.upper()}"]' - elif partition_by == PartitionScheme.UNIQUE_KEY: - config["cluster_by"] = f'["{self.airbyte_unique_key.upper()}", "{self.airbyte_emitted_at.upper()}"]' - elif partition_by == PartitionScheme.NOTHING: - pass - else: - config["cluster_by"] = f'["{self.airbyte_emitted_at.upper()}"]' + config = ( + self.destination_type in self.partition_configurers.keys() + and self.partition_configurers[self.destination_type](self, partition_by) + or {} + ) + if unique_key: config["unique_key"] = f'"{unique_key}"' elif not self.parent: diff --git a/airbyte-integrations/bases/base-normalization/normalization/transform_config/transform.py b/airbyte-integrations/bases/base-normalization/normalization/transform_config/transform.py index 42e3838b8d7c3..429ed0f93a7a0 100644 --- a/airbyte-integrations/bases/base-normalization/normalization/transform_config/transform.py +++ b/airbyte-integrations/bases/base-normalization/normalization/transform_config/transform.py @@ -57,6 +57,7 @@ def transform(self, integration_type: DestinationType, config: Dict[str, Any]): DestinationType.ORACLE.value: self.transform_oracle, DestinationType.MSSQL.value: self.transform_mssql, DestinationType.CLICKHOUSE.value: self.transform_clickhouse, + DestinationType.DATABRICKS.value: self.transform_databricks, }[integration_type.value](config) # merge pre-populated base_profile with destination-specific configuration. @@ -305,6 +306,22 @@ def transform_clickhouse(config: Dict[str, Any]): dbt_config["port"] = config["tcp-port"] return dbt_config + @staticmethod + def transform_databricks(config: Dict[str, Any]): + print("transform_databricks") + # https://docs.getdbt.com/reference/warehouse-profiles/databricks-profile + dbt_config = { + "type": "databricks", + "host": config["databricks_server_hostname"], + "http_path": config["databricks_http_path"], + "token": config["databricks_personal_access_token"], + "threads": 8, + "connect_retries": 5, + "connect_timeout": 210, + "schema": config["database_schema"], + } + return dbt_config + @staticmethod def read_json_config(input_path: str): with open(input_path, "r") as file: diff --git a/airbyte-integrations/bases/base-normalization/unit_tests/resources/long_name_truncate_collisions_catalog_expected_databricks_names.json b/airbyte-integrations/bases/base-normalization/unit_tests/resources/long_name_truncate_collisions_catalog_expected_databricks_names.json new file mode 100644 index 0000000000000..bd9334521efb9 --- /dev/null +++ b/airbyte-integrations/bases/base-normalization/unit_tests/resources/long_name_truncate_collisions_catalog_expected_databricks_names.json @@ -0,0 +1,32 @@ +{ + "_airbyte_another.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine": { + "file": "_airbyte_another_postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine", + "schema": "_airbyte_another", + "table": "postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine" + }, + "_airbyte_schema_test.postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine.postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine": { + "file": "postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine", + "schema": "_airbyte_schema_test", + "table": 
"postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine" + }, + "_airbyte_schema_test.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine": { + "file": "_airbyte_schema_test_postgres_has_a_64_characters___to_table_names_but_other_destinations_are_fine_c41", + "schema": "_airbyte_schema_test", + "table": "postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine" + }, + "another.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine": { + "file": "another_postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine", + "schema": "another", + "table": "postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine" + }, + "schema_test.postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine.postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine": { + "file": "postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine", + "schema": "schema_test", + "table": "postgres_has_a_64_characters_and_not_more_limit_to_table_names_but_other_destinations_are_fine" + }, + "schema_test.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine.postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine": { + "file": "schema_test_postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine", + "schema": "schema_test", + "table": "postgres_has_a_64_characters_limit_to_table_names_but_other_destinations_are_fine" + } +} \ No newline at end of file diff --git a/airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py b/airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py index 496e1f4211300..8e992748273b6 100644 --- a/airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py +++ b/airbyte-integrations/bases/base-normalization/unit_tests/test_destination_name_transformer.py @@ -37,6 +37,7 @@ def before_tests(request): ("Hello World", "Redshift", True), ("Hello World", "MySQL", True), ("Hello World", "MSSQL", True), + ("Hello World", "Databricks", True), # Reserved Word for BigQuery and MySQL only ("Groups", "Postgres", False), ("Groups", "BigQuery", True), @@ -44,6 +45,7 @@ def before_tests(request): ("Groups", "Redshift", False), ("Groups", "MySQL", True), ("Groups", "MSSQL", False), + ("Groups", "Databricks", False), # Doesnt start with alpha or underscore ("100x200", "Postgres", True), ("100x200", "BigQuery", False), @@ -51,6 +53,7 @@ def before_tests(request): ("100x200", "Redshift", True), ("100x200", "MySQL", True), ("100x200", "MSSQL", True), + ("100x200", "Databricks", True), # Contains non alpha numeric ("post.wall", "Postgres", True), ("post.wall", "BigQuery", False), @@ -58,6 +61,7 @@ def before_tests(request): ("post.wall", "Redshift", True), ("post.wall", "MySQL", True), ("post.wall", "MSSQL", True), + ("post.wall", "Databricks", True), ], ) def test_needs_quote(input_str: str, destination_type: str, expected: bool): @@ -108,6 +112,7 @@ def test_transform_standard_naming(input_str: str, expected: str): ("Identifier Name", "Redshift", "{{ adapter.quote('identifier name') }}", 
"adapter.quote('identifier name')"), ("Identifier Name", "MySQL", "{{ adapter.quote('Identifier Name') }}", "adapter.quote('Identifier Name')"), ("Identifier Name", "MSSQL", "{{ adapter.quote('Identifier Name') }}", "adapter.quote('Identifier Name')"), + ("Identifier Name", "Databricks", "Identifier_Name", "'Identifier_Name'"), # Reserved Word for BigQuery and MySQL only ("Groups", "Postgres", "groups", "'groups'"), ("Groups", "BigQuery", "{{ adapter.quote('Groups') }}", "adapter.quote('Groups')"), @@ -115,6 +120,7 @@ def test_transform_standard_naming(input_str: str, expected: str): ("Groups", "Redshift", "groups", "'groups'"), ("Groups", "MySQL", "{{ adapter.quote('Groups') }}", "adapter.quote('Groups')"), ("Groups", "MSSQL", "groups", "'groups'"), + ("Groups", "Databricks", "Groups", "'Groups'"), ], ) def test_normalize_column_name(input_str: str, destination_type: str, expected: str, expected_in_jinja: str): diff --git a/airbyte-integrations/bases/base-normalization/unit_tests/test_table_name_registry.py b/airbyte-integrations/bases/base-normalization/unit_tests/test_table_name_registry.py index 27fbcb20235d3..dce029dd79814 100644 --- a/airbyte-integrations/bases/base-normalization/unit_tests/test_table_name_registry.py +++ b/airbyte-integrations/bases/base-normalization/unit_tests/test_table_name_registry.py @@ -61,7 +61,6 @@ def test_resolve_names(destination_type: DestinationType, catalog_file: str): """ integration_type = destination_type.value tables_registry = TableNameRegistry(destination_type) - catalog = read_json(f"resources/{catalog_file}.json") # process top level diff --git a/airbyte-integrations/bases/base-normalization/unit_tests/test_transform_config.py b/airbyte-integrations/bases/base-normalization/unit_tests/test_transform_config.py index 92c3c29504355..93d9e11971ca7 100644 --- a/airbyte-integrations/bases/base-normalization/unit_tests/test_transform_config.py +++ b/airbyte-integrations/bases/base-normalization/unit_tests/test_transform_config.py @@ -413,6 +413,28 @@ def test_transform_clickhouse(self): assert expected == actual assert extract_schema(actual) == "default" + def test_transform_databricks(self): + input = { + "databricks_server_hostname": "airbyte.io", + "database_schema": "default", + "databricks_http_path": "/sql/1.0/endpoints/aeb42c", + "databricks_personal_access_token": "aebd43", + } + actual = TransformConfig().transform_databricks(input) + expected = { + "type": "databricks", + "host": "airbyte.io", + "http_path": "/sql/1.0/endpoints/aeb42c", + "token": "aebd43", + "threads": 8, + "connect_retries": 5, + "connect_timeout": 210, + "schema": "default", + } + + assert expected == actual + assert extract_schema(actual) == "default" + # test that the full config is produced. this overlaps slightly with the transform_postgres test. 
def test_transform(self): input = { diff --git a/airbyte-integrations/connectors/destination-databricks/build.gradle b/airbyte-integrations/connectors/destination-databricks/build.gradle index e5bca02f4cd13..6bfab5879a7b8 100644 --- a/airbyte-integrations/connectors/destination-databricks/build.gradle +++ b/airbyte-integrations/connectors/destination-databricks/build.gradle @@ -29,7 +29,6 @@ dependencies { implementation project(':airbyte-config:config-models') implementation project(':airbyte-protocol:protocol-models') implementation project(':airbyte-integrations:bases:base-java') - implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) implementation project(':airbyte-integrations:connectors:destination-jdbc') implementation project(':airbyte-integrations:connectors:destination-s3') implementation group: 'com.databricks', name: 'databricks-jdbc', version: '2.6.25' @@ -43,4 +42,8 @@ dependencies { integrationTestJavaImplementation project(':airbyte-integrations:bases:standard-destination-test') integrationTestJavaImplementation project(':airbyte-integrations:connectors:destination-databricks') + integrationTestJavaImplementation 'org.apache.commons:commons-lang3:3.11' + + implementation files(project(':airbyte-integrations:bases:base-java').airbyteDocker.outputs) + integrationTestJavaImplementation files(project(':airbyte-integrations:bases:base-normalization').airbyteDocker.outputs) } diff --git a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksDestinationConfig.java b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksDestinationConfig.java index f84024acfb49e..8c3c41c038c40 100644 --- a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksDestinationConfig.java +++ b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksDestinationConfig.java @@ -5,10 +5,9 @@ package io.airbyte.integrations.destination.databricks; import com.fasterxml.jackson.databind.JsonNode; -import com.fasterxml.jackson.databind.ObjectMapper; import com.google.common.base.Preconditions; import io.airbyte.integrations.destination.s3.S3DestinationConfig; -import io.airbyte.integrations.destination.s3.parquet.S3ParquetFormatConfig; +import io.airbyte.integrations.destination.s3.credential.S3AccessKeyCredentialConfig; /** * Currently only S3 is supported. So the data source config is always {@link S3DestinationConfig}. 
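For context on the `transform_databricks` mapping added earlier in this patch: the returned dictionary is what dbt expects for a Databricks output in `profiles.yml`. A rough sketch of the resulting profile, using the fixture values from `test_transform_databricks` (the `normalize` profile and `prod` target names are assumptions here, since the merged base profile supplies them):

```python
import yaml  # assumes PyYAML is available in the normalization environment

# Values produced by TransformConfig().transform_databricks(...) for the test fixture.
databricks_output = {
    "type": "databricks",
    "host": "airbyte.io",
    "http_path": "/sql/1.0/endpoints/aeb42c",
    "token": "aebd43",
    "threads": 8,
    "connect_retries": 5,
    "connect_timeout": 210,
    "schema": "default",
}

# Hypothetical profile/target names; the real ones come from the merged base profile.
profiles = {"normalize": {"target": "prod", "outputs": {"prod": databricks_output}}}
print(yaml.safe_dump(profiles, sort_keys=False))
```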
@@ -66,7 +65,6 @@ public static S3DestinationConfig getDataSource(final JsonNode dataSource) { .withAccessKeyCredential( dataSource.get("s3_access_key_id").asText(), dataSource.get("s3_secret_access_key").asText()) - .withFormatConfig(new S3ParquetFormatConfig(new ObjectMapper().createObjectNode())) .get(); } diff --git a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksSqlOperations.java b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksSqlOperations.java index 432b4b543d6ef..196323a5a4aac 100644 --- a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksSqlOperations.java +++ b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksSqlOperations.java @@ -8,6 +8,8 @@ import io.airbyte.integrations.base.JavaBaseConstants; import io.airbyte.integrations.destination.jdbc.JdbcSqlOperations; import io.airbyte.protocol.models.AirbyteRecordMessage; + +import java.sql.SQLException; import java.util.List; public class DatabricksSqlOperations extends JdbcSqlOperations { @@ -33,9 +35,20 @@ public String createTableQuery(final JdbcDatabase database, final String schemaN JavaBaseConstants.COLUMN_NAME_EMITTED_AT); } + @Override + public String copyTableQuery(final JdbcDatabase database, final String schemaName, final String srcTableName, final String dstTableName) { + return String.format("COPY INTO %s.%s FROM (SELECT * FROM %s.%s)", schemaName, dstTableName, schemaName, srcTableName); + } + + @Override + public void dropTableIfExists(final JdbcDatabase database, final String schemaName, final String tableName) throws SQLException { + database.execute(String.format("DROP TABLE IF EXISTS %s.%s;", schemaName, tableName)); + } + + @Override public void createSchemaIfNotExists(final JdbcDatabase database, final String schemaName) throws Exception { - database.execute(String.format("create database if not exists %s;", schemaName)); + database.execute(String.format("CREATE DATABASE IF NOT EXISTS %s;", schemaName)); } @Override diff --git a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopier.java b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopier.java index cc0b3a6196377..b6524c83739c9 100644 --- a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopier.java +++ b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopier.java @@ -5,113 +5,96 @@ package io.airbyte.integrations.destination.databricks; import com.amazonaws.services.s3.AmazonS3; -import com.fasterxml.jackson.databind.ObjectMapper; import io.airbyte.db.jdbc.JdbcDatabase; +import io.airbyte.integrations.base.JavaBaseConstants; import io.airbyte.integrations.destination.ExtendedNameTransformer; import io.airbyte.integrations.destination.jdbc.SqlOperations; -import io.airbyte.integrations.destination.jdbc.copy.StreamCopier; +import io.airbyte.integrations.destination.jdbc.copy.s3.S3CopyConfig; +import io.airbyte.integrations.destination.jdbc.copy.s3.S3StreamCopier; import 
io.airbyte.integrations.destination.s3.S3DestinationConfig; -import io.airbyte.integrations.destination.s3.parquet.S3ParquetFormatConfig; -import io.airbyte.integrations.destination.s3.parquet.S3ParquetWriter; +import io.airbyte.integrations.destination.s3.util.S3OutputPathHelper; import io.airbyte.integrations.destination.s3.writer.S3WriterFactory; -import io.airbyte.protocol.models.AirbyteRecordMessage; import io.airbyte.protocol.models.ConfiguredAirbyteStream; import io.airbyte.protocol.models.DestinationSyncMode; + +import java.sql.SQLException; import java.sql.Timestamp; -import java.util.UUID; +import java.util.HashSet; +import java.util.Set; import org.slf4j.Logger; import org.slf4j.LoggerFactory; /** - * This implementation is similar to {@link StreamCopier}. The difference is that this - * implementation creates Parquet staging files, instead of CSV ones. + * This implementation extends {@link S3StreamCopier}. It bypasses some steps, + * because databricks is able to load multiple staging files at once *

*

* It does the following operations: *
- *   • 1. Parquet writer writes data stream into staging parquet file in
- *        s3://bucket-name/bucket-path/staging-folder.
- *   • 2. Create a tmp delta table based on the staging parquet file.
- *   • 3. Create the destination delta table based on the tmp delta table schema in
- *        s3://bucket/stream-name.
- *   • 4. Copy the staging parquet file into the destination delta table.
- *   • 5. Delete the tmp delta table, and the staging parquet file.
+ *   • 1. {@link S3StreamCopier} writes CSV files in
+ *        s3://bucket-name/bucket-path/schema-name/stream-name.
+ *   • 2. Create a destination table with location
+ *        s3://bucket-name/bucket-path/delta_tables/schema-name/stream-name.
+ *   • 3. Copy the staging CSV files into the destination delta table.
+ *   • 4. Let the {@link S3StreamCopier} handle deleting the staging files.
+ *
*/ -public class DatabricksStreamCopier implements StreamCopier { +public class DatabricksStreamCopier extends S3StreamCopier { private static final Logger LOGGER = LoggerFactory.getLogger(DatabricksStreamCopier.class); - private static final ObjectMapper MAPPER = new ObjectMapper(); private final String schemaName; private final String streamName; private final DestinationSyncMode destinationSyncMode; - private final AmazonS3 s3Client; private final S3DestinationConfig s3Config; - private final boolean purgeStagingData; private final JdbcDatabase database; private final DatabricksSqlOperations sqlOperations; private final String tmpTableName; private final String destTableName; - private final S3ParquetWriter parquetWriter; - private final String tmpTableLocation; + private String tmpTableLocation; private final String destTableLocation; - private final String stagingFolder; + private final Set filenames = new HashSet<>(); public DatabricksStreamCopier(final String stagingFolder, - final String schema, - final ConfiguredAirbyteStream configuredStream, - final AmazonS3 s3Client, - final JdbcDatabase database, - final DatabricksDestinationConfig databricksConfig, - final ExtendedNameTransformer nameTransformer, - final SqlOperations sqlOperations, - final S3WriterFactory writerFactory, - final Timestamp uploadTime) + final String schema, + final ConfiguredAirbyteStream configuredStream, + final AmazonS3 s3Client, + final JdbcDatabase database, + final DatabricksDestinationConfig databricksConfig, + final ExtendedNameTransformer nameTransformer, + final SqlOperations sqlOperations, + final S3WriterFactory writerFactory, + final Timestamp uploadTime, + final S3CopyConfig copyConfig) throws Exception { + super(stagingFolder, schema, s3Client, database, copyConfig, nameTransformer, sqlOperations, configuredStream, + uploadTime, 10); this.schemaName = schema; this.streamName = configuredStream.getStream().getName(); this.destinationSyncMode = configuredStream.getDestinationSyncMode(); - this.s3Client = s3Client; this.s3Config = databricksConfig.getS3DestinationConfig(); - this.purgeStagingData = databricksConfig.isPurgeStagingData(); this.database = database; this.sqlOperations = (DatabricksSqlOperations) sqlOperations; this.tmpTableName = nameTransformer.getTmpTableName(streamName); - this.destTableName = nameTransformer.getIdentifier(streamName); - this.stagingFolder = stagingFolder; - - final S3DestinationConfig stagingS3Config = getStagingS3DestinationConfig(s3Config, stagingFolder); - this.parquetWriter = (S3ParquetWriter) writerFactory.create(stagingS3Config, s3Client, configuredStream, uploadTime); - - this.tmpTableLocation = String.format("s3://%s/%s", - s3Config.getBucketName(), parquetWriter.getOutputPrefix()); - this.destTableLocation = String.format("s3://%s/%s/%s/%s", - s3Config.getBucketName(), s3Config.getBucketPath(), databricksConfig.getDatabaseSchema(), streamName); - + this.destTableName = nameTransformer.getRawTableName(streamName); + this.tmpTableLocation = getFullS3Path(s3Config.getBucketName(), S3OutputPathHelper.getOutputPrefix(s3Config.getBucketPath(), configuredStream.getStream())); + this.destTableLocation = String.format("s3://%s/%s/%s/%s/%s", + s3Config.getBucketName(), s3Config.getBucketPath(), "delta_tables", schemaName, streamName); LOGGER.info("[Stream {}] Database schema: {}", streamName, schemaName); - LOGGER.info("[Stream {}] Parquet schema: {}", streamName, parquetWriter.getSchema()); - LOGGER.info("[Stream {}] Tmp table {} location: {}", streamName, 
tmpTableName, tmpTableLocation); LOGGER.info("[Stream {}] Data table {} location: {}", streamName, destTableName, destTableLocation); - - parquetWriter.initialize(); } @Override public String prepareStagingFile() { - return String.join("/", s3Config.getBucketPath(), stagingFolder); - } + final String file = super.prepareStagingFile(); + + this.filenames.add(file); - @Override - public void write(final UUID id, final AirbyteRecordMessage recordMessage, final String fileName) throws Exception { - parquetWriter.write(id, recordMessage); - } + LOGGER.info("[Stream {}] File {} location: {}", streamName, tmpTableName, + getFullS3Path(s3Config.getBucketName(), file)); - @Override - public void closeStagingUploader(final boolean hasFailed) throws Exception { - parquetWriter.close(hasFailed); + return file; } @Override @@ -122,20 +105,19 @@ public void createDestinationSchema() throws Exception { @Override public void createTemporaryTable() throws Exception { - LOGGER.info("[Stream {}] Creating tmp table {} from staging file: {}", streamName, tmpTableName, tmpTableLocation); - - sqlOperations.dropTableIfExists(database, schemaName, tmpTableName); - final String createTmpTable = String.format("CREATE TABLE %s.%s USING parquet LOCATION '%s';", schemaName, tmpTableName, tmpTableLocation); - LOGGER.info(createTmpTable); - database.execute(createTmpTable); + // The dest table is created directly based on the staging file. So no separate + // copying step is needed. } @Override public void copyStagingFileToTemporaryTable() { - // The tmp table is created directly based on the staging file. So no separate copying step is - // needed. + // The dest table is created directly based on the staging file. So no separate + // copying step is needed. } + /** + * Creates a destination raw table with the specified location + */ @Override public String createDestinationTable() throws Exception { LOGGER.info("[Stream {}] Creating destination table if it does not exist: {}", streamName, destTableName); @@ -146,70 +128,59 @@ public String createDestinationTable() throws Exception { : "CREATE TABLE IF NOT EXISTS"; final String createTable = String.format( - "%s %s.%s " + - "USING delta " + - "LOCATION '%s' " + + "%s %s.%s (%s STRING, %s STRING, %s TIMESTAMP) " + "COMMENT 'Created from stream %s' " + - "TBLPROPERTIES ('airbyte.destinationSyncMode' = '%s', %s) " + - // create the table based on the schema of the tmp table - "AS SELECT * FROM %s.%s LIMIT 0", + "TBLPROPERTIES ('airbyte.destinationSyncMode' = '%s', %s) ", createStatement, schemaName, destTableName, - destTableLocation, + JavaBaseConstants.COLUMN_NAME_AB_ID, + JavaBaseConstants.COLUMN_NAME_DATA, + JavaBaseConstants.COLUMN_NAME_EMITTED_AT, streamName, destinationSyncMode.value(), - String.join(", ", DatabricksConstants.DEFAULT_TBL_PROPERTIES), - schemaName, tmpTableName); + String.join(", ", DatabricksConstants.DEFAULT_TBL_PROPERTIES)); LOGGER.info(createTable); database.execute(createTable); return destTableName; } + /** + * Overrides the global generateMergeStatement in order to + * copy the staging data directly into the dest table + */ @Override public String generateMergeStatement(final String destTableName) { + if (filenames.size() == 0) { + LOGGER.info("[Stream: {}] No data to be written", streamName, destTableName); + // Need to by pass merge if empty stream + return "SELECT 0"; + } final String copyData = String.format( "COPY INTO %s.%s " + - "FROM '%s' " + - "FILEFORMAT = PARQUET " + - "PATTERN = '%s'", + "FROM (SELECT _c0 as %s, _c1 as %s, 
_c2::TIMESTAMP as %s FROM '%s') " + + "FILEFORMAT = CSV " + + "FILES = (%s) " + + "FORMAT_OPTIONS('quote' = '\"', 'escape' = '\"', 'enforceSchema' = 'false', 'multiLine' = 'true', 'header' = 'false', 'unescapedQuoteHandling' = 'STOP_AT_CLOSING_QUOTE')", schemaName, destTableName, - tmpTableLocation, - parquetWriter.getOutputFilename()); + JavaBaseConstants.COLUMN_NAME_AB_ID, + JavaBaseConstants.COLUMN_NAME_DATA, + JavaBaseConstants.COLUMN_NAME_EMITTED_AT, + this.tmpTableLocation, + String.join(",", filenames.stream().map(elem -> { + final String[] uriParts = elem.split("/"); + return String.format("'%s'", uriParts[uriParts.length - 1]); + }).toList())); LOGGER.info(copyData); return copyData; } - @Override - public void removeFileAndDropTmpTable() throws Exception { - if (purgeStagingData) { - LOGGER.info("[Stream {}] Deleting tmp table: {}", streamName, tmpTableName); - sqlOperations.dropTableIfExists(database, schemaName, tmpTableName); - - LOGGER.info("[Stream {}] Deleting staging file: {}", streamName, parquetWriter.getOutputFilePath()); - s3Client.deleteObject(s3Config.getBucketName(), parquetWriter.getOutputFilePath()); - } - } - - @Override - public void closeNonCurrentStagingFileWriters() throws Exception { - parquetWriter.close(false); - } - - @Override - public String getCurrentFile() { - return ""; + public void copyS3CsvFileIntoTable(JdbcDatabase database, + String s3FileLocation, + String schema, + String tableName, + S3DestinationConfig s3Config) throws SQLException { + // Needed to implement it for S3StreamCopier + // Everything is handled in generateMergeStatement, since we just need to do one big copy with all the files } - - /** - * The staging data location is s3:////. This method - * creates an {@link S3DestinationConfig} whose bucket path is /. 
- */ - static S3DestinationConfig getStagingS3DestinationConfig(final S3DestinationConfig config, final String stagingFolder) { - return S3DestinationConfig.create(config) - .withBucketPath(String.join("/", config.getBucketPath(), stagingFolder)) - .withFormatConfig(new S3ParquetFormatConfig(MAPPER.createObjectNode())) - .get(); - } - } diff --git a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopierFactory.java b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopierFactory.java index 684c1aeef56e6..994563934a7ba 100644 --- a/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopierFactory.java +++ b/airbyte-integrations/connectors/destination-databricks/src/main/java/io/airbyte/integrations/destination/databricks/DatabricksStreamCopierFactory.java @@ -10,6 +10,7 @@ import io.airbyte.integrations.destination.jdbc.SqlOperations; import io.airbyte.integrations.destination.jdbc.copy.StreamCopier; import io.airbyte.integrations.destination.jdbc.copy.StreamCopierFactory; +import io.airbyte.integrations.destination.jdbc.copy.s3.S3CopyConfig; import io.airbyte.integrations.destination.s3.writer.ProductionWriterFactory; import io.airbyte.integrations.destination.s3.writer.S3WriterFactory; import io.airbyte.protocol.models.AirbyteStream; @@ -30,11 +31,12 @@ public StreamCopier create(final String configuredSchema, final AirbyteStream stream = configuredStream.getStream(); final String schema = StreamCopierFactory.getSchema(stream.getNamespace(), configuredSchema, nameTransformer); final AmazonS3 s3Client = databricksConfig.getS3DestinationConfig().getS3Client(); + final S3CopyConfig s3Config = new S3CopyConfig(databricksConfig.isPurgeStagingData(), databricksConfig.getS3DestinationConfig()); final S3WriterFactory writerFactory = new ProductionWriterFactory(); final Timestamp uploadTimestamp = new Timestamp(System.currentTimeMillis()); return new DatabricksStreamCopier(stagingFolder, schema, configuredStream, s3Client, database, - databricksConfig, nameTransformer, sqlOperations, writerFactory, uploadTimestamp); + databricksConfig, nameTransformer, sqlOperations, writerFactory, uploadTimestamp, s3Config); } catch (final Exception e) { throw new RuntimeException(e); } diff --git a/airbyte-integrations/connectors/destination-databricks/src/main/resources/spec.json b/airbyte-integrations/connectors/destination-databricks/src/main/resources/spec.json index c7fc3a2593937..3f14d4512b052 100644 --- a/airbyte-integrations/connectors/destination-databricks/src/main/resources/spec.json +++ b/airbyte-integrations/connectors/destination-databricks/src/main/resources/spec.json @@ -1,9 +1,9 @@ { "documentationUrl": "https://docs.airbyte.io/integrations/destinations/databricks", "supportsIncremental": true, - "supportsNormalization": false, - "supportsDBT": false, - "supported_destination_sync_modes": ["overwrite", "append"], + "supportsNormalization": true, + "supportsDBT": true, + "supported_destination_sync_modes": ["overwrite", "append", "append_dedup"], "connectionSpecification": { "$schema": "http://json-schema.org/draft-07/schema#", "title": "Databricks Lakehouse Destination Spec", diff --git a/airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvWriter.java 
b/airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvWriter.java index cce2da71e33f1..68fed3ae8f79a 100644 --- a/airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvWriter.java +++ b/airbyte-integrations/connectors/destination-s3/src/main/java/io/airbyte/integrations/destination/s3/csv/S3CsvWriter.java @@ -36,6 +36,7 @@ public class S3CsvWriter extends BaseS3Writer implements DestinationFileWriter { private final CSVPrinter csvPrinter; private final String objectKey; private final String gcsFileLocation; + private final String outputFilename; private S3CsvWriter(final S3DestinationConfig config, final AmazonS3 s3Client, @@ -52,7 +53,7 @@ private S3CsvWriter(final S3DestinationConfig config, this.csvSheetGenerator = csvSheetGenerator; final String fileSuffix = "_" + UUID.randomUUID(); - final String outputFilename = BaseS3Writer.getOutputFilename(uploadTimestamp, fileSuffix, S3Format.CSV); + this.outputFilename = BaseS3Writer.getOutputFilename(uploadTimestamp, fileSuffix, S3Format.CSV); this.objectKey = String.join("/", outputPrefix, outputFilename); LOGGER.info("Full S3 path for stream '{}': s3://{}/{}", stream.getName(), config.getBucketName(), @@ -72,6 +73,11 @@ private S3CsvWriter(final S3DestinationConfig config, this.csvPrinter = new CSVPrinter(new PrintWriter(outputStream, true, StandardCharsets.UTF_8), csvSettings); } + + public String getOutputFilename() { + return outputFilename; + } + public static class Builder { private final S3DestinationConfig config; diff --git a/airbyte-webapp/src/components/EntityTable/utils.tsx b/airbyte-webapp/src/components/EntityTable/utils.tsx index bc0fb11611c99..fb6509d78b349 100644 --- a/airbyte-webapp/src/components/EntityTable/utils.tsx +++ b/airbyte-webapp/src/components/EntityTable/utils.tsx @@ -18,8 +18,8 @@ export function getEntityTableData< const connectType = type === "source" ? 
"destination" : "source"; const mappedEntities = entities.map((entityItem) => { - const entitySoDId = entityItem[`${type}Id` as keyof SoD] as unknown as string; - const entitySoDName = entityItem[`${type}Name` as keyof SoD] as unknown as string; + const entitySoDId = (entityItem[`${type}Id` as keyof SoD] as unknown) as string; + const entitySoDName = (entityItem[`${type}Name` as keyof SoD] as unknown) as string; const entityConnections = connections.filter( (connectionItem) => connectionItem[`${type}Id` as "sourceId" | "destinationId"] === entitySoDId ); diff --git a/airbyte-webapp/src/config/ConfigServiceProvider.tsx b/airbyte-webapp/src/config/ConfigServiceProvider.tsx index f1bb41d974623..450a94cf272fa 100644 --- a/airbyte-webapp/src/config/ConfigServiceProvider.tsx +++ b/airbyte-webapp/src/config/ConfigServiceProvider.tsx @@ -19,7 +19,7 @@ export function useConfig(): T { throw new Error("useConfig must be used within a ConfigProvider"); } - return useMemo(() => configService.config as unknown as T, [configService.config]); + return useMemo(() => (configService.config as unknown) as T, [configService.config]); } const ConfigServiceInner: React.FC<{ diff --git a/airbyte-webapp/src/core/domain/catalog/fieldUtil.ts b/airbyte-webapp/src/core/domain/catalog/fieldUtil.ts index 1cc35828884a1..7807093e4c645 100644 --- a/airbyte-webapp/src/core/domain/catalog/fieldUtil.ts +++ b/airbyte-webapp/src/core/domain/catalog/fieldUtil.ts @@ -60,7 +60,8 @@ function getDestinationNamespace(opt: NamespaceOptions | NamespaceOptionsCustomF case NamespaceDefinitionType.destination: return destinationSetting; case NamespaceDefinitionType.customformat: - default: // Default is never hit, but typescript prefers it declared + default: + // Default is never hit, but typescript prefers it declared if (!opt.sourceNamespace?.trim()) { return destinationSetting; } diff --git a/airbyte-webapp/src/core/jsonSchema/types.ts b/airbyte-webapp/src/core/jsonSchema/types.ts index b2bbedba53284..639564bdc3348 100644 --- a/airbyte-webapp/src/core/jsonSchema/types.ts +++ b/airbyte-webapp/src/core/jsonSchema/types.ts @@ -23,6 +23,7 @@ export type AirbyteJSONSchema = { : JSONSchema7[Property] extends JSONSchema7Definition | JSONSchema7Definition[] ? 
AirbyteJSONSchemaDefinition | AirbyteJSONSchemaDefinition[] : JSONSchema7[Property]; -} & AirbyteJSONSchemaProps; +} & + AirbyteJSONSchemaProps; export type AirbyteJSONSchemaDefinition = AirbyteJSONSchema | boolean; diff --git a/airbyte-webapp/src/core/request/AirbyteClient.ts b/airbyte-webapp/src/core/request/AirbyteClient.ts index edc01276b56ec..41d6f98aa9a97 100644 --- a/airbyte-webapp/src/core/request/AirbyteClient.ts +++ b/airbyte-webapp/src/core/request/AirbyteClient.ts @@ -456,8 +456,7 @@ export interface FieldTransform { updateFieldSchema?: FieldSchemaUpdate; } -export type StreamTransformTransformType = - typeof StreamTransformTransformType[keyof typeof StreamTransformTransformType]; +export type StreamTransformTransformType = typeof StreamTransformTransformType[keyof typeof StreamTransformTransformType]; // eslint-disable-next-line @typescript-eslint/no-redeclare export const StreamTransformTransformType = { diff --git a/airbyte-webapp/src/hooks/services/Analytics/useAnalyticsService.tsx b/airbyte-webapp/src/hooks/services/Analytics/useAnalyticsService.tsx index 4268c96f439a9..51660e05d622c 100644 --- a/airbyte-webapp/src/hooks/services/Analytics/useAnalyticsService.tsx +++ b/airbyte-webapp/src/hooks/services/Analytics/useAnalyticsService.tsx @@ -26,10 +26,10 @@ const AnalyticsServiceProvider = ({ }) => { const [analyticsContext, { set, setAll, remove }] = useMap(initialContext); - const analyticsService: AnalyticsService = useMemo( - () => new AnalyticsService(analyticsContext, version), - [version, analyticsContext] - ); + const analyticsService: AnalyticsService = useMemo(() => new AnalyticsService(analyticsContext, version), [ + version, + analyticsContext, + ]); const handleAddContextProps = (props: AnalyticsContext) => { Object.entries(props).forEach((value) => set(...value)); diff --git a/airbyte-webapp/src/hooks/services/ConfirmationModal/ConfirmationModalService.tsx b/airbyte-webapp/src/hooks/services/ConfirmationModal/ConfirmationModalService.tsx index 52b3a50a9b016..94aed9d127e61 100644 --- a/airbyte-webapp/src/hooks/services/ConfirmationModal/ConfirmationModalService.tsx +++ b/airbyte-webapp/src/hooks/services/ConfirmationModal/ConfirmationModalService.tsx @@ -9,7 +9,9 @@ import { ConfirmationModalOptions, ConfirmationModalServiceApi, ConfirmationModa const ConfirmationModalServiceContext = React.createContext(undefined); -export const useConfirmationModalService: (confirmationModal?: ConfirmationModalOptions) => { +export const useConfirmationModalService: ( + confirmationModal?: ConfirmationModalOptions +) => { openConfirmationModal: (confirmationModal: ConfirmationModalOptions) => void; closeConfirmationModal: () => void; } = (confirmationModal) => { diff --git a/airbyte-webapp/src/hooks/services/Feature/FeatureService.test.tsx b/airbyte-webapp/src/hooks/services/Feature/FeatureService.test.tsx index 87aa7a6fba199..d4b6b6f5068cd 100644 --- a/airbyte-webapp/src/hooks/services/Feature/FeatureService.test.tsx +++ b/airbyte-webapp/src/hooks/services/Feature/FeatureService.test.tsx @@ -17,9 +17,9 @@ const wrapper: React.FC = ({ children }) => ( {children} diff --git a/airbyte-webapp/src/hooks/services/Feature/FeatureService.tsx b/airbyte-webapp/src/hooks/services/Feature/FeatureService.tsx index cbf53962af248..2e5fff426f50e 100644 --- a/airbyte-webapp/src/hooks/services/Feature/FeatureService.tsx +++ b/airbyte-webapp/src/hooks/services/Feature/FeatureService.tsx @@ -23,10 +23,10 @@ export const FeatureService = ({ children }: { children: React.ReactNode }) => { }; }, 
[]); - const features = useMemo( - () => [...instanceWideFeatures, ...additionFeatures], - [instanceWideFeatures, additionFeatures] - ); + const features = useMemo(() => [...instanceWideFeatures, ...additionFeatures], [ + instanceWideFeatures, + additionFeatures, + ]); const featureService = useMemo( () => ({ diff --git a/airbyte-webapp/src/hooks/services/useConnectionHook.tsx b/airbyte-webapp/src/hooks/services/useConnectionHook.tsx index d26f656bd49e9..cd4f5e2a1977a 100644 --- a/airbyte-webapp/src/hooks/services/useConnectionHook.tsx +++ b/airbyte-webapp/src/hooks/services/useConnectionHook.tsx @@ -57,10 +57,10 @@ export interface ListConnection { function useWebConnectionService() { const config = useConfig(); const middlewares = useDefaultRequestMiddlewares(); - return useInitService( - () => new WebBackendConnectionService(config.apiUrl, middlewares), - [config.apiUrl, middlewares] - ); + return useInitService(() => new WebBackendConnectionService(config.apiUrl, middlewares), [ + config.apiUrl, + middlewares, + ]); } function useConnectionService() { diff --git a/airbyte-webapp/src/hooks/services/useConnector.tsx b/airbyte-webapp/src/hooks/services/useConnector.tsx index 2f67754028d7c..02a39779ccee3 100644 --- a/airbyte-webapp/src/hooks/services/useConnector.tsx +++ b/airbyte-webapp/src/hooks/services/useConnector.tsx @@ -35,10 +35,9 @@ const useConnector = (): ConnectorService => { const newSourceDefinitions = useMemo(() => sourceDefinitions.filter(Connector.hasNewerVersion), [sourceDefinitions]); - const newDestinationDefinitions = useMemo( - () => destinationDefinitions.filter(Connector.hasNewerVersion), - [destinationDefinitions] - ); + const newDestinationDefinitions = useMemo(() => destinationDefinitions.filter(Connector.hasNewerVersion), [ + destinationDefinitions, + ]); const updateAllSourceVersions = async () => { await Promise.all( diff --git a/airbyte-webapp/src/hooks/services/useConnectorAuth.tsx b/airbyte-webapp/src/hooks/services/useConnectorAuth.tsx index 239dcef4516b8..5649c3a615def 100644 --- a/airbyte-webapp/src/hooks/services/useConnectorAuth.tsx +++ b/airbyte-webapp/src/hooks/services/useConnectorAuth.tsx @@ -51,14 +51,14 @@ export function useConnectorAuth(): { // TODO: move to separate initFacade and use refs instead const requestAuthMiddleware = useDefaultRequestMiddlewares(); - const sourceAuthService = useMemo( - () => new SourceAuthService(apiUrl, requestAuthMiddleware), - [apiUrl, requestAuthMiddleware] - ); - const destinationAuthService = useMemo( - () => new DestinationAuthService(apiUrl, requestAuthMiddleware), - [apiUrl, requestAuthMiddleware] - ); + const sourceAuthService = useMemo(() => new SourceAuthService(apiUrl, requestAuthMiddleware), [ + apiUrl, + requestAuthMiddleware, + ]); + const destinationAuthService = useMemo(() => new DestinationAuthService(apiUrl, requestAuthMiddleware), [ + apiUrl, + requestAuthMiddleware, + ]); return { getConsentUrl: async ( diff --git a/airbyte-webapp/src/hooks/useTypesafeReducer.ts b/airbyte-webapp/src/hooks/useTypesafeReducer.ts index 9cd34532496d5..9cbc630d28248 100644 --- a/airbyte-webapp/src/hooks/useTypesafeReducer.ts +++ b/airbyte-webapp/src/hooks/useTypesafeReducer.ts @@ -13,7 +13,7 @@ function useTypesafeReducer(initialState) - .handleAction(actions.authInited, (state): AuthServiceState => { - return { - ...state, - inited: true, - }; - }) - .handleAction(actions.loggedIn, (state, action): AuthServiceState => { - return { - ...state, - currentUser: action.payload.user, - emailVerified: 
action.payload.emailVerified, - inited: true, - loading: false, - loggedOut: false, - }; - }) - .handleAction(actions.emailVerified, (state, action): AuthServiceState => { - return { - ...state, - emailVerified: action.payload, - }; - }) - .handleAction(actions.loggedOut, (state): AuthServiceState => { - return { - ...state, - currentUser: null, - emailVerified: false, - loggedOut: true, - }; - }); + .handleAction( + actions.authInited, + (state): AuthServiceState => { + return { + ...state, + inited: true, + }; + } + ) + .handleAction( + actions.loggedIn, + (state, action): AuthServiceState => { + return { + ...state, + currentUser: action.payload.user, + emailVerified: action.payload.emailVerified, + inited: true, + loading: false, + loggedOut: false, + }; + } + ) + .handleAction( + actions.emailVerified, + (state, action): AuthServiceState => { + return { + ...state, + emailVerified: action.payload, + }; + } + ) + .handleAction( + actions.loggedOut, + (state): AuthServiceState => { + return { + ...state, + currentUser: null, + emailVerified: false, + loggedOut: true, + }; + } + ); diff --git a/airbyte-webapp/src/packages/cloud/views/credits/CreditsPage/components/UsagePerConnectionTable.tsx b/airbyte-webapp/src/packages/cloud/views/credits/CreditsPage/components/UsagePerConnectionTable.tsx index 6948103aa0f32..8040295bd704c 100644 --- a/airbyte-webapp/src/packages/cloud/views/credits/CreditsPage/components/UsagePerConnectionTable.tsx +++ b/airbyte-webapp/src/packages/cloud/views/credits/CreditsPage/components/UsagePerConnectionTable.tsx @@ -100,10 +100,10 @@ const UsagePerConnectionTable: React.FC = ({ credi [sortBy, sortOrder] ); - const sortingData = React.useMemo( - () => creditConsumptionWithPercent.sort(sortData), - [sortData, creditConsumptionWithPercent] - ); + const sortingData = React.useMemo(() => creditConsumptionWithPercent.sort(sortData), [ + sortData, + creditConsumptionWithPercent, + ]); const columns = React.useMemo( () => [ diff --git a/airbyte-webapp/src/pages/ConnectionPage/pages/ConnectionItemPage/components/ReplicationView.tsx b/airbyte-webapp/src/pages/ConnectionPage/pages/ConnectionItemPage/components/ReplicationView.tsx index 6eb75031cc3f7..4f04c0cd0663c 100644 --- a/airbyte-webapp/src/pages/ConnectionPage/pages/ConnectionItemPage/components/ReplicationView.tsx +++ b/airbyte-webapp/src/pages/ConnectionPage/pages/ConnectionItemPage/components/ReplicationView.tsx @@ -62,10 +62,10 @@ export const ReplicationView: React.FC = ({ onAfterSaveSch const { connection: initialConnection, refreshConnectionCatalog } = useConnectionLoad(connectionId); - const [{ value: connectionWithRefreshCatalog, loading: isRefreshingCatalog }, refreshCatalog] = useAsyncFn( - refreshConnectionCatalog, - [connectionId] - ); + const [ + { value: connectionWithRefreshCatalog, loading: isRefreshingCatalog }, + refreshCatalog, + ] = useAsyncFn(refreshConnectionCatalog, [connectionId]); const connection = useMemo(() => { if (activeUpdatingSchemaMode && connectionWithRefreshCatalog) { diff --git a/airbyte-webapp/src/pages/ConnectionPage/pages/CreationFormPage/CreationFormPage.tsx b/airbyte-webapp/src/pages/ConnectionPage/pages/CreationFormPage/CreationFormPage.tsx index 31c3100da8b00..d22cace1a6420 100644 --- a/airbyte-webapp/src/pages/ConnectionPage/pages/CreationFormPage/CreationFormPage.tsx +++ b/airbyte-webapp/src/pages/ConnectionPage/pages/CreationFormPage/CreationFormPage.tsx @@ -213,13 +213,11 @@ export const CreationFormPage: React.FC = () => { }, ]; - const titleId: string = ( - { - 
[EntityStepsTypes.CONNECTION]: "connection.newConnectionTitle", - [EntityStepsTypes.DESTINATION]: "destinations.newDestinationTitle", - [EntityStepsTypes.SOURCE]: "sources.newSourceTitle", - } as Record - )[type]; + const titleId: string = ({ + [EntityStepsTypes.CONNECTION]: "connection.newConnectionTitle", + [EntityStepsTypes.DESTINATION]: "destinations.newDestinationTitle", + [EntityStepsTypes.SOURCE]: "sources.newSourceTitle", + } as Record)[type]; return ( <> diff --git a/airbyte-webapp/src/pages/OnboardingPage/components/DestinationStep.tsx b/airbyte-webapp/src/pages/OnboardingPage/components/DestinationStep.tsx index c20f3c29160f9..f2777795b3845 100644 --- a/airbyte-webapp/src/pages/OnboardingPage/components/DestinationStep.tsx +++ b/airbyte-webapp/src/pages/OnboardingPage/components/DestinationStep.tsx @@ -18,8 +18,9 @@ interface Props { const DestinationStep: React.FC = ({ onNextStep, onSuccess }) => { const [destinationDefinitionId, setDestinationDefinitionId] = useState(null); const { setDocumentationUrl, setDocumentationPanelOpen } = useDocumentationPanelContext(); - const { data: destinationDefinitionSpecification, isLoading } = - useGetDestinationDefinitionSpecificationAsync(destinationDefinitionId); + const { data: destinationDefinitionSpecification, isLoading } = useGetDestinationDefinitionSpecificationAsync( + destinationDefinitionId + ); const { destinationDefinitions } = useDestinationDefinitionList(); const [successRequest, setSuccessRequest] = useState(false); const [error, setError] = useState<{ diff --git a/airbyte-webapp/src/pages/OnboardingPage/components/SourceStep.tsx b/airbyte-webapp/src/pages/OnboardingPage/components/SourceStep.tsx index d8afc4252e04e..fcb9ac2c0a378 100644 --- a/airbyte-webapp/src/pages/OnboardingPage/components/SourceStep.tsx +++ b/airbyte-webapp/src/pages/OnboardingPage/components/SourceStep.tsx @@ -33,8 +33,9 @@ const SourceStep: React.FC = ({ onNextStep, onSuccess }) => { const getSourceDefinitionById = (id: string) => sourceDefinitions.find((item) => item.sourceDefinitionId === id); - const { data: sourceDefinitionSpecification, isLoading } = - useGetSourceDefinitionSpecificationAsync(sourceDefinitionId); + const { data: sourceDefinitionSpecification, isLoading } = useGetSourceDefinitionSpecificationAsync( + sourceDefinitionId + ); useEffect(() => { return () => { diff --git a/airbyte-webapp/src/services/connector/DestinationDefinitionService.ts b/airbyte-webapp/src/services/connector/DestinationDefinitionService.ts index 27cfb708d107d..d5612f59102a2 100644 --- a/airbyte-webapp/src/services/connector/DestinationDefinitionService.ts +++ b/airbyte-webapp/src/services/connector/DestinationDefinitionService.ts @@ -21,10 +21,10 @@ function useGetDestinationDefinitionService(): DestinationDefinitionService { const requestAuthMiddleware = useDefaultRequestMiddlewares(); - return useInitService( - () => new DestinationDefinitionService(apiUrl, requestAuthMiddleware), - [apiUrl, requestAuthMiddleware] - ); + return useInitService(() => new DestinationDefinitionService(apiUrl, requestAuthMiddleware), [ + apiUrl, + requestAuthMiddleware, + ]); } export interface DestinationDefinitionReadWithLatestTag extends DestinationDefinitionRead { diff --git a/airbyte-webapp/src/services/connector/DestinationDefinitionSpecificationService.tsx b/airbyte-webapp/src/services/connector/DestinationDefinitionSpecificationService.tsx index 83a27001fa037..dd68788b4616e 100644 --- a/airbyte-webapp/src/services/connector/DestinationDefinitionSpecificationService.tsx +++ 
b/airbyte-webapp/src/services/connector/DestinationDefinitionSpecificationService.tsx @@ -20,10 +20,10 @@ function useGetService() { const { apiUrl } = useConfig(); const requestAuthMiddleware = useDefaultRequestMiddlewares(); - return useInitService( - () => new DestinationDefinitionSpecificationService(apiUrl, requestAuthMiddleware), - [apiUrl, requestAuthMiddleware] - ); + return useInitService(() => new DestinationDefinitionSpecificationService(apiUrl, requestAuthMiddleware), [ + apiUrl, + requestAuthMiddleware, + ]); } export const useGetDestinationDefinitionSpecification = (id: string): DestinationDefinitionSpecificationRead => { diff --git a/airbyte-webapp/src/services/connector/SourceDefinitionService.ts b/airbyte-webapp/src/services/connector/SourceDefinitionService.ts index ec0f1c6f72ca8..afff2a0dd6319 100644 --- a/airbyte-webapp/src/services/connector/SourceDefinitionService.ts +++ b/airbyte-webapp/src/services/connector/SourceDefinitionService.ts @@ -21,10 +21,10 @@ function useGetSourceDefinitionService(): SourceDefinitionService { const requestAuthMiddleware = useDefaultRequestMiddlewares(); - return useInitService( - () => new SourceDefinitionService(apiUrl, requestAuthMiddleware), - [apiUrl, requestAuthMiddleware] - ); + return useInitService(() => new SourceDefinitionService(apiUrl, requestAuthMiddleware), [ + apiUrl, + requestAuthMiddleware, + ]); } export interface SourceDefinitionReadWithLatestTag extends SourceDefinitionRead { diff --git a/airbyte-webapp/src/services/connector/SourceDefinitionSpecificationService.tsx b/airbyte-webapp/src/services/connector/SourceDefinitionSpecificationService.tsx index ea31fe2e0f67a..2f93b98a926b6 100644 --- a/airbyte-webapp/src/services/connector/SourceDefinitionSpecificationService.tsx +++ b/airbyte-webapp/src/services/connector/SourceDefinitionSpecificationService.tsx @@ -19,10 +19,10 @@ function useGetService(): SourceDefinitionSpecificationService { const { apiUrl } = useConfig(); const requestAuthMiddleware = useDefaultRequestMiddlewares(); - return useInitService( - () => new SourceDefinitionSpecificationService(apiUrl, requestAuthMiddleware), - [apiUrl, requestAuthMiddleware] - ); + return useInitService(() => new SourceDefinitionSpecificationService(apiUrl, requestAuthMiddleware), [ + apiUrl, + requestAuthMiddleware, + ]); } export const useGetSourceDefinitionSpecification = (id: string) => { diff --git a/airbyte-webapp/src/views/Connection/CatalogTree/CatalogSection.tsx b/airbyte-webapp/src/views/Connection/CatalogTree/CatalogSection.tsx index 80f87e85eec16..a48af7fb2bce1 100644 --- a/airbyte-webapp/src/views/Connection/CatalogTree/CatalogSection.tsx +++ b/airbyte-webapp/src/views/Connection/CatalogTree/CatalogSection.tsx @@ -52,10 +52,9 @@ const CatalogSectionInner: React.FC = ({ [updateStream, streamNode] ); - const onSelectSyncMode = useCallback( - (data: DropDownRow.IDataItem) => updateStreamWithConfig(data.value), - [updateStreamWithConfig] - ); + const onSelectSyncMode = useCallback((data: DropDownRow.IDataItem) => updateStreamWithConfig(data.value), [ + updateStreamWithConfig, + ]); const onSelectStream = useCallback( () => @@ -80,15 +79,13 @@ const CatalogSectionInner: React.FC = ({ [config?.primaryKey, updateStreamWithConfig] ); - const onCursorSelect = useCallback( - (cursorField: string[]) => updateStreamWithConfig({ cursorField }), - [updateStreamWithConfig] - ); + const onCursorSelect = useCallback((cursorField: string[]) => updateStreamWithConfig({ cursorField }), [ + updateStreamWithConfig, + ]); - const 
onPkUpdate = useCallback( - (newPrimaryKey: string[][]) => updateStreamWithConfig({ primaryKey: newPrimaryKey }), - [updateStreamWithConfig] - ); + const onPkUpdate = useCallback((newPrimaryKey: string[][]) => updateStreamWithConfig({ primaryKey: newPrimaryKey }), [ + updateStreamWithConfig, + ]); const pkRequired = config?.destinationSyncMode === DestinationSyncMode.append_dedup; const cursorRequired = config?.syncMode === SyncMode.incremental; @@ -120,10 +117,9 @@ const CatalogSectionInner: React.FC = ({ const flattenedFields = useMemo(() => flatten(fields), [fields]); - const primitiveFields = useMemo( - () => flattenedFields.filter(SyncSchemaFieldObject.isPrimitive), - [flattenedFields] - ); + const primitiveFields = useMemo(() => flattenedFields.filter(SyncSchemaFieldObject.isPrimitive), [ + flattenedFields, + ]); const configErrors = getIn(errors, `schema.streams[${streamNode.id}].config`); const hasError = configErrors && Object.keys(configErrors).length > 0; diff --git a/airbyte-webapp/src/views/Connection/CatalogTree/components/BulkHeader.tsx b/airbyte-webapp/src/views/Connection/CatalogTree/components/BulkHeader.tsx index 8d4612019bda6..bd61094777f29 100644 --- a/airbyte-webapp/src/views/Connection/CatalogTree/components/BulkHeader.tsx +++ b/airbyte-webapp/src/views/Connection/CatalogTree/components/BulkHeader.tsx @@ -73,10 +73,9 @@ export const BulkHeader: React.FC = ({ destinationSupportedSync [selectedBatchNodes, destinationSupportedSyncModes] ); - const primitiveFields: SyncSchemaField[] = useMemo( - () => calculateSharedFields(selectedBatchNodes), - [selectedBatchNodes] - ); + const primitiveFields: SyncSchemaField[] = useMemo(() => calculateSharedFields(selectedBatchNodes), [ + selectedBatchNodes, + ]); if (!isActive) { return null; diff --git a/airbyte-webapp/src/views/Connection/ConnectionForm/ConnectionForm.tsx b/airbyte-webapp/src/views/Connection/ConnectionForm/ConnectionForm.tsx index 0c11955e35992..baa361763704c 100644 --- a/airbyte-webapp/src/views/Connection/ConnectionForm/ConnectionForm.tsx +++ b/airbyte-webapp/src/views/Connection/ConnectionForm/ConnectionForm.tsx @@ -170,9 +170,9 @@ const ConnectionForm: React.FC = ({ const onFormSubmit = useCallback( async (values: FormikConnectionFormValues, formikHelpers: FormikHelpers) => { - const formValues: ConnectionFormValues = connectionValidationSchema.cast(values, { + const formValues: ConnectionFormValues = (connectionValidationSchema.cast(values, { context: { isRequest: true }, - }) as unknown as ConnectionFormValues; + }) as unknown) as ConnectionFormValues; formValues.operations = mapFormPropsToOperation(values, connection.operations, workspace.workspaceId); diff --git a/airbyte-webapp/src/views/Connection/ConnectionForm/calculateInitialCatalog.test.ts b/airbyte-webapp/src/views/Connection/ConnectionForm/calculateInitialCatalog.test.ts index 604749ec5c6e4..789c1db323d46 100644 --- a/airbyte-webapp/src/views/Connection/ConnectionForm/calculateInitialCatalog.test.ts +++ b/airbyte-webapp/src/views/Connection/ConnectionForm/calculateInitialCatalog.test.ts @@ -28,9 +28,9 @@ describe("calculateInitialCatalog", () => { const { id, ...restProps } = mockSyncSchemaStream; const values = calculateInitialCatalog( - { + ({ streams: [restProps], - } as unknown as SyncSchema, + } as unknown) as SyncSchema, [], false ); @@ -44,7 +44,7 @@ describe("calculateInitialCatalog", () => { const { config, stream } = mockSyncSchemaStream; const values = calculateInitialCatalog( - { + ({ streams: [ { id: "1", @@ -55,7 +55,7 @@ 
describe("calculateInitialCatalog", () => { config, }, ], - } as unknown as SyncSchema, + } as unknown) as SyncSchema, [], false ); diff --git a/airbyte-webapp/src/views/Connection/ConnectionForm/components/SyncCatalogField.tsx b/airbyte-webapp/src/views/Connection/ConnectionForm/components/SyncCatalogField.tsx index ca70f384fe3e4..e5ba60b761c39 100644 --- a/airbyte-webapp/src/views/Connection/ConnectionForm/components/SyncCatalogField.tsx +++ b/airbyte-webapp/src/views/Connection/ConnectionForm/components/SyncCatalogField.tsx @@ -191,10 +191,9 @@ const SyncCatalogField: React.FC = ({ [streams, onChangeSchema] ); - const sortedSchema = useMemo( - () => streams.sort(naturalComparatorBy((syncStream) => syncStream.stream?.name ?? "")), - [streams] - ); + const sortedSchema = useMemo(() => streams.sort(naturalComparatorBy((syncStream) => syncStream.stream?.name ?? "")), [ + streams, + ]); const filteredStreams = useMemo(() => { const filters: Array<(s: SyncSchemaStream) => boolean> = [ diff --git a/airbyte-webapp/src/views/Connection/ConnectionForm/formConfig.tsx b/airbyte-webapp/src/views/Connection/ConnectionForm/formConfig.tsx index 2852f086004ee..1c2fd4dc267b1 100644 --- a/airbyte-webapp/src/views/Connection/ConnectionForm/formConfig.tsx +++ b/airbyte-webapp/src/views/Connection/ConnectionForm/formConfig.tsx @@ -207,8 +207,8 @@ const getInitialNormalization = ( operations?: Array, isEditMode?: boolean ): NormalizationType => { - const initialNormalization = - operations?.find(isNormalizationTransformation)?.operatorConfiguration?.normalization?.option; + const initialNormalization = operations?.find(isNormalizationTransformation)?.operatorConfiguration?.normalization + ?.option; return initialNormalization ? NormalizationType[initialNormalization] diff --git a/airbyte-webapp/src/views/Connector/ServiceForm/components/Controls/ConnectorServiceTypeControl.tsx b/airbyte-webapp/src/views/Connector/ServiceForm/components/Controls/ConnectorServiceTypeControl.tsx index 9ba80a5edf6dc..50ce71ba9206c 100644 --- a/airbyte-webapp/src/views/Connector/ServiceForm/components/Controls/ConnectorServiceTypeControl.tsx +++ b/airbyte-webapp/src/views/Connector/ServiceForm/components/Controls/ConnectorServiceTypeControl.tsx @@ -222,10 +222,10 @@ const ConnectorServiceTypeControl: React.FC = [analytics, formType, formatMessage] ); - const selectedService = React.useMemo( - () => availableServices.find((s) => Connector.id(s) === field.value), - [field.value, availableServices] - ); + const selectedService = React.useMemo(() => availableServices.find((s) => Connector.id(s) === field.value), [ + field.value, + availableServices, + ]); const handleSelect = useCallback( (item: DropDownRow.IDataItem | null) => { diff --git a/airbyte-webapp/src/views/Connector/ServiceForm/components/Sections/auth/useOauthFlowAdapter.tsx b/airbyte-webapp/src/views/Connector/ServiceForm/components/Sections/auth/useOauthFlowAdapter.tsx index 5758c37310f23..a39fc47d4eb81 100644 --- a/airbyte-webapp/src/views/Connector/ServiceForm/components/Sections/auth/useOauthFlowAdapter.tsx +++ b/airbyte-webapp/src/views/Connector/ServiceForm/components/Sections/auth/useOauthFlowAdapter.tsx @@ -11,7 +11,9 @@ import { useServiceForm } from "../../../serviceFormContext"; import { ServiceFormValues } from "../../../types"; import { makeConnectionConfigurationPath, serverProvidedOauthPaths } from "../../../utils"; -function useFormikOauthAdapter(connector: ConnectorDefinitionSpecification): { +function useFormikOauthAdapter( + connector: 
ConnectorDefinitionSpecification +): { loading: boolean; done?: boolean; run: () => Promise; @@ -47,11 +49,9 @@ function useFormikOauthAdapter(connector: ConnectorDefinitionSpecification): { done, run: async () => { const oauthInputProperties = - ( - connector?.advancedAuth?.oauthConfigSpecification?.oauthUserInputFromConnectorConfigSpecification as { - properties: Array<{ path_in_connector_config: string[] }>; - } - )?.properties ?? {}; + (connector?.advancedAuth?.oauthConfigSpecification?.oauthUserInputFromConnectorConfigSpecification as { + properties: Array<{ path_in_connector_config: string[] }>; + })?.properties ?? {}; if (!isEmpty(oauthInputProperties)) { const oauthInputFields = diff --git a/airbyte-webapp/src/views/Connector/ServiceForm/serviceFormContext.tsx b/airbyte-webapp/src/views/Connector/ServiceForm/serviceFormContext.tsx index 3937e364c6a3e..0d4f8a2f7555a 100644 --- a/airbyte-webapp/src/views/Connector/ServiceForm/serviceFormContext.tsx +++ b/airbyte-webapp/src/views/Connector/ServiceForm/serviceFormContext.tsx @@ -59,10 +59,10 @@ const ServiceFormContextProvider: React.FC<{ const { hasFeature } = useFeatureService(); const { serviceType } = values; - const selectedService = useMemo( - () => availableServices.find((s) => Connector.id(s) === serviceType), - [availableServices, serviceType] - ); + const selectedService = useMemo(() => availableServices.find((s) => Connector.id(s) === serviceType), [ + availableServices, + serviceType, + ]); const isAuthFlowSelected = useMemo( () => diff --git a/airbyte-workers/src/main/java/io/airbyte/workers/normalization/DefaultNormalizationRunner.java b/airbyte-workers/src/main/java/io/airbyte/workers/normalization/DefaultNormalizationRunner.java index 9850255c19a3e..d7df29e4a7c4a 100644 --- a/airbyte-workers/src/main/java/io/airbyte/workers/normalization/DefaultNormalizationRunner.java +++ b/airbyte-workers/src/main/java/io/airbyte/workers/normalization/DefaultNormalizationRunner.java @@ -51,7 +51,8 @@ public enum DestinationType { POSTGRES, REDSHIFT, SNOWFLAKE, - CLICKHOUSE + CLICKHOUSE, + DATABRICKS } public DefaultNormalizationRunner(final WorkerConfigs workerConfigs, diff --git a/airbyte-workers/src/main/java/io/airbyte/workers/normalization/NormalizationRunnerFactory.java b/airbyte-workers/src/main/java/io/airbyte/workers/normalization/NormalizationRunnerFactory.java index 33d1f0f4c4b7c..e990420213a38 100644 --- a/airbyte-workers/src/main/java/io/airbyte/workers/normalization/NormalizationRunnerFactory.java +++ b/airbyte-workers/src/main/java/io/airbyte/workers/normalization/NormalizationRunnerFactory.java @@ -24,6 +24,7 @@ public class NormalizationRunnerFactory { ImmutablePair.of(BASE_NORMALIZATION_IMAGE_NAME, DefaultNormalizationRunner.DestinationType.BIGQUERY)) .put("airbyte/destination-clickhouse", ImmutablePair.of("airbyte/normalization-clickhouse", DestinationType.CLICKHOUSE)) .put("airbyte/destination-clickhouse-strict-encrypt", ImmutablePair.of("airbyte/normalization-clickhouse", DestinationType.CLICKHOUSE)) + .put("airbyte/destination-databricks", ImmutablePair.of("airbyte/normalization-databricks", DestinationType.DATABRICKS)) .put("airbyte/destination-mssql", ImmutablePair.of("airbyte/normalization-mssql", DestinationType.MSSQL)) .put("airbyte/destination-mssql-strict-encrypt", ImmutablePair.of("airbyte/normalization-mssql", DestinationType.MSSQL)) .put("airbyte/destination-mysql", ImmutablePair.of("airbyte/normalization-mysql", DestinationType.MYSQL))
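Note on the final hunk above: NormalizationRunnerFactory keeps a map from a destination connector image name to a pair of (normalization image name, DestinationType), and this patch registers airbyte/destination-databricks against airbyte/normalization-databricks together with the new DATABRICKS enum value. The standalone Java sketch below is illustrative only (hypothetical class and method names, not the real factory API); it shows how such a registry can resolve the normalization image for a given destination image, assuming the version tag is stripped before the lookup.

import java.util.Map;
import java.util.Optional;

public final class NormalizationRegistrySketch {

  // Hypothetical stand-in for DefaultNormalizationRunner.DestinationType.
  enum DestinationType { CLICKHOUSE, DATABRICKS, MSSQL, MYSQL }

  // Hypothetical stand-in for the (normalization image, destination type) pairs kept by the factory.
  record NormalizationTarget(String normalizationImage, DestinationType type) {}

  // Subset of the mapping built in NormalizationRunnerFactory, including the entry added by this patch.
  private static final Map<String, NormalizationTarget> REGISTRY = Map.of(
      "airbyte/destination-clickhouse", new NormalizationTarget("airbyte/normalization-clickhouse", DestinationType.CLICKHOUSE),
      "airbyte/destination-databricks", new NormalizationTarget("airbyte/normalization-databricks", DestinationType.DATABRICKS),
      "airbyte/destination-mssql", new NormalizationTarget("airbyte/normalization-mssql", DestinationType.MSSQL),
      "airbyte/destination-mysql", new NormalizationTarget("airbyte/normalization-mysql", DestinationType.MYSQL));

  // Resolves the normalization target for a destination image such as "airbyte/destination-databricks:0.2.6".
  static Optional<NormalizationTarget> resolve(final String destinationImage) {
    final String imageWithoutTag = destinationImage.split(":")[0];
    return Optional.ofNullable(REGISTRY.get(imageWithoutTag));
  }

  public static void main(final String[] args) {
    resolve("airbyte/destination-databricks:0.2.6")
        .ifPresent(target -> System.out.println(target.normalizationImage() + " -> " + target.type()));
  }
}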