Contributing to elasticsearch

Elasticsearch is an open source project and we love to receive contributions from our community — you! There are many ways to contribute, from writing tutorials or blog posts, improving the documentation, submitting bug reports and feature requests or writing code which can be incorporated into Elasticsearch itself.

Bug reports

If you think you have found a bug in Elasticsearch, first make sure that you are testing against the latest version of Elasticsearch - your issue may already have been fixed. If not, search our issues list on GitHub in case a similar issue has already been opened.

It is very helpful if you can prepare a reproduction of the bug. In other words, provide a small test case which we can run to confirm your bug. It makes it easier to find the problem and to fix it. Test cases should be provided as curl commands which we can copy and paste into a terminal to run them locally, for example:

# delete the index
curl -XDELETE localhost:9200/test

# insert a document
curl -XPUT localhost:9200/test/_doc/1 -H 'Content-Type: application/json' -d '{
 "title": "test document"
}'

# this should return XXXX but instead returns YYY
curl ....

Provide as much information as you can. You may think that the problem lies with your query, when actually it depends on how your data is indexed. The easier it is for us to recreate your problem, the faster it is likely to be fixed.

Feature requests

If you find yourself wishing for a feature that doesn't exist in Elasticsearch, you are probably not alone. There are bound to be others out there with similar needs. Many of the features that Elasticsearch has today have been added because our users saw the need. Open an issue on our issues list on GitHub which describes the feature you would like to see, why you need it, and how it should work.

Contributing code and documentation changes

If you have a bugfix or new feature that you would like to contribute to Elasticsearch, please find or open an issue about it first. Talk about what you would like to do. It may be that somebody is already working on it, or that there are particular issues that you should know about before implementing the change.

We enjoy working with contributors to get their code accepted. There are many approaches to fixing a problem and it is important to find the best approach before writing too much code.

Note that it is unlikely the project will merge refactors for the sake of refactoring. These types of pull requests have a high cost to maintainers in reviewing and testing with little to no tangible benefit. This especially includes changes generated by tools. For example, converting all generic interface instances to use the diamond operator.

The process for contributing to any of the Elastic repositories is similar. Details for individual projects can be found below.

Fork and clone the repository

You will need to fork the main Elasticsearch code or documentation repository and clone it to your local machine. See the GitHub help page for details.

Further instructions for specific projects are given below.

Submitting your changes

Once your changes and tests are ready to submit for review:

  1. Test your changes

    Run the test suite to make sure that nothing is broken. See the TESTING file for help running tests.

  2. Sign the Contributor License Agreement

    Please make sure you have signed our Contributor License Agreement. We are not asking you to assign copyright to us, but to give us the right to distribute your code without restriction. We ask this of all contributors in order to assure our users of the origin and continuing existence of the code. You only need to sign the CLA once.

  3. Rebase your changes

    Update your local repository with the most recent code from the main Elasticsearch repository, and rebase your branch on top of the latest master branch. We prefer your initial changes to be squashed into a single commit. Later, if we ask you to make changes, add them as separate commits. This makes them easier to review. As a final step before merging we will either ask you to squash all commits yourself or we'll do it for you.

  4. Submit a pull request

    Push your local changes to your forked copy of the repository and submit a pull request. In the pull request, choose a title which sums up the changes that you have made, and in the body provide more details about what your changes do. Also mention the number of the issue where discussion has taken place, e.g. "Closes #123".

Then sit back and wait. There will probably be discussion about the pull request and, if any changes are needed, we would love to work with you to get your pull request merged into Elasticsearch.

Please adhere to the general guideline that you should never force push to a publicly shared branch. Once you have opened your pull request, you should consider your branch publicly shared. Instead of force pushing you can just add incremental commits; this is generally easier on your reviewers. If you need to pick up changes from master, you can merge master into your branch. A reviewer might ask you to rebase a long-running pull request, in which case force pushing is okay for that request. Note that you should not squash your commits at the end of the review process either; they can be squashed when the pull request is integrated via GitHub.

Contributing to the Elasticsearch codebase

Repository: https://github.com/elastic/elasticsearch

JDK 13 is required to build Elasticsearch. You must have a JDK 13 installation with the environment variable JAVA_HOME referencing the path to Java home for your JDK 13 installation. By default, tests use the same runtime as JAVA_HOME. However, since Elasticsearch supports JDK 11, the build supports compiling with JDK 13 and testing on a JDK 11 runtime; to do this, set RUNTIME_JAVA_HOME pointing to the Java home of a JDK 11 installation. Note that this mechanism is not limited to JDK 11; it can be used to test against other JDKs as well.

Note: JAVA8_HOME, JAVA9_HOME, JAVA10_HOME, JAVA11_HOME, and JAVA12_HOME must also be available so that the tests can pass.

Warning: do not use sdkman to install Java distributions that do not include a proper jrunscript.

Elasticsearch uses the Gradle wrapper for its build. You can execute Gradle using the wrapper via the gradlew script on Unix systems or gradlew.bat script on Windows in the root of the repository. The examples below show the usage on Unix.

We support development in the Eclipse and IntelliJ IDEs. For Eclipse, the minimum version that we support is 4.13. For IntelliJ, the minimum version that we support is IntelliJ 2017.2.

Docker is required for building some Elasticsearch artifacts and executing certain test suites. You can run Elasticsearch without building all the artifacts with:

./gradlew :run

You can access Elasticsearch with:

curl -u elastic:password localhost:9200

Configuring IDEs And Running Tests

Eclipse users can automatically configure their IDE: run ./gradlew eclipse, then choose File->Import->Gradle->Existing Gradle Project. Additionally, you will want to ensure that Eclipse is using 2048m of heap by modifying eclipse.ini accordingly, to avoid GC overhead and OOM errors.

IntelliJ users can automatically configure their IDE: run ./gradlew idea, then choose File->New Project From Existing Sources. Point to the root of the source directory, select Import project from external model->Gradle, and enable Use auto-import. In order to run tests directly from IDEA 2017.2 and above, it is required to disable the IDEA run launcher in order to avoid idea_rt.jar causing "jar hell". This can be achieved by adding the -Didea.no.launcher=true JVM option. Alternatively, idea.no.launcher=true can be set in the idea.properties file, which can be accessed under Help > Edit Custom Properties (this will require a restart of IDEA). For IDEA 2017.3 and above, in addition to the JVM option, you will need to go to Run->Edit Configurations...->Defaults->JUnit and verify that the Shorten command line setting is set to user-local default: none. You may also need to remove ant-javafx.jar from your classpath if that is reported as a source of jar hell.

To run an instance of Elasticsearch from the source code, run ./gradlew run

The Elasticsearch codebase makes heavy use of Java asserts and the test runner requires that assertions be enabled within the JVM. This can be accomplished by passing the flag -ea to the JVM on startup.

For IntelliJ, go to Run->Edit Configurations...->Defaults->JUnit->VM options and input -ea.

For Eclipse, go to Preferences->Java->Installed JREs and add -ea to VM Arguments.
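If you want to confirm that a given run configuration really has assertions enabled, a small check like the following can help. This is only a sketch: the class name AssertionsCheck is hypothetical and is not part of the Elasticsearch codebase. It relies on the fact that the JVM skips assert statements entirely unless -ea is passed.

```java
/**
 * Hypothetical helper (not part of Elasticsearch) that reports whether
 * the JVM was started with assertions enabled, e.g. via the -ea flag.
 */
final class AssertionsCheck {

    private AssertionsCheck() {}

    static boolean assertionsEnabled() {
        boolean enabled = false;
        // The assignment below only executes when assertions are enabled,
        // because the JVM skips assert statements entirely otherwise.
        assert enabled = true;
        return enabled;
    }

    public static void main(String[] args) {
        if (assertionsEnabled()) {
            System.out.println("assertions are enabled");
        } else {
            System.out.println("assertions are disabled; pass -ea to the JVM");
        }
    }
}
```

Running this class from the same IDE run configuration that you use for tests shows whether the -ea setting has taken effect.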

Some tests related to locale testing also require the flag -Djava.locale.providers to be set. Set the VM options/VM arguments for IntelliJ or Eclipse as described above to use -Djava.locale.providers=SPI,COMPAT.

Java Language Formatting Guidelines

Java files in the Elasticsearch codebase are formatted with the Eclipse JDT formatter, using the Spotless Gradle plugin. This plugin is configured on a project-by-project basis, via build.gradle in the root of the repository. The formatting check can be run explicitly with:

./gradlew spotlessJavaCheck

The code can be formatted with:

./gradlew spotlessApply

These tasks can also be run for specific subprojects, e.g.

./gradlew server:spotlessJavaCheck

Please follow these formatting guidelines:

  • Java indent is 4 spaces
  • Line width is 140 characters
  • Lines of code surrounded by // tag::NAME and // end::NAME comments are included in the documentation and should only be 76 characters wide not counting leading indentation. Such regions of code are not formatted automatically as it is not possible to change the line length rule of the formatter for part of a file. Please format such sections sympathetically with the rest of the code, while keeping lines to maximum length of 76 characters.
  • Wildcard imports (import foo.bar.baz.*) are forbidden and will cause the build to fail. You can configure your IDE so that it never generates them:
    • Eclipse: Preferences->Java->Code Style->Organize Imports. There are two boxes labeled "Number of (static )? imports needed for .*". Set their values to 99999 or some other absurdly high value.
    • IntelliJ: Preferences/Settings->Editor->Code Style->Java->Imports. There are two configuration options: Class count to use import with '*' and Names count to use static import with '*'. Set their values to 99999 or some other absurdly high value.
  • If absolutely necessary, you can disable formatting for regions of code with the // tag::NAME and // end::NAME directives, but note that these are intended for use in documentation, so please make it clear what you have done, and only do this where the benefit clearly outweighs the decrease in consistency.
  • Note that Javadoc and block comments i.e. /* ... */ are not formatted, but line comments i.e. // ... are.
  • There is an implicit rule that negative boolean expressions should use the form foo == false instead of !foo for better readability of the code. While this isn't strictly enforced, it might get called out in PR reviews as something to change.
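The guidelines above can be illustrated with a short sketch. The class FormattingExample and the tag name filter-non-empty are hypothetical and exist only for this illustration. The example uses explicit imports rather than wildcards, a 4-space indent, the foo == false form for negation, and a tagged region whose lines stay under 76 characters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/**
 * Hypothetical class, used only to illustrate the formatting guidelines.
 */
final class FormattingExample {

    private FormattingExample() {}

    // tag::filter-non-empty
    static List<String> filterNonEmpty(List<String> values) {
        List<String> result = new ArrayList<>();
        for (String value : values) {
            // Preferred form: "== false" rather than "!value.isEmpty()"
            if (value.isEmpty() == false) {
                result.add(value);
            }
        }
        return result;
    }
    // end::filter-non-empty

    public static void main(String[] args) {
        // Prints [a, b]
        System.out.println(filterNonEmpty(Arrays.asList("a", "", "b")));
    }
}
```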

Editor / IDE Support

Eclipse IDEs can import the file .eclipseformat.xml directly.

IntelliJ IDEs can import the same settings file, and / or use the Eclipse Code Formatter plugin.

You can also tell Spotless to format a specific file from the command line.

Formatting failures

Sometimes Spotless will report a "misbehaving rule which can't make up its mind" and will recommend enabling the paddedCell() setting. If you enable this setting and run the format check again, Spotless will write files to $PROJECT/build/spotless-diagnose-java/ to aid diagnosis. It writes different copies of the formatted files, so that you can see how they differ and infer what the problem is.

The paddedCell() option is disabled for normal operation so that any misbehaviour is detected, and not just suppressed. You can enable the option from the command line by running Gradle with -Dspotless.paddedcell.

Javadoc

Good Javadoc can help with navigating and understanding code. Elasticsearch has some guidelines around when to write Javadoc and when not to, but note that we don't want to be overly prescriptive. The intent of these guidelines is to be helpful, not to turn writing code into a chore.

The short version

  1. Always add Javadoc to new code.
  2. Add Javadoc to existing code if you can.
  3. Document the "why", not the "how", unless that's important to the "why".
  4. Don't document anything trivial or obvious (e.g. getters and setters). In other words, the Javadoc should add some value.

The long version

  1. If you add a new Java package, please also add package-level Javadoc that explains what the package is for. This can just be a reference to a more foundational / parent package if appropriate. An example would be a package hierarchy for a new feature or plugin - the package docs could explain the purpose of the feature, any caveats, and possibly some examples of configuration and usage.
  2. New classes and interfaces must have class-level Javadoc that describes their purpose. There are a lot of classes in the Elasticsearch repository, and it's easier to navigate when you can quickly find out what the purpose of a class is. This doesn't apply to inner classes or interfaces, unless you expect them to be explicitly used outside their parent class.
  3. New public methods must have Javadoc, because they form part of the contract between the class and its consumers. Similarly, new abstract methods must have Javadoc because they are part of the contract between a class and its subclasses. It's important that contributors know why they need to implement a method, and the Javadoc should make this clear. You don't need to document a method if it's overriding an abstract method (either from an abstract superclass or an interface), unless your implementation is doing something "unexpected" e.g. deviating from the intent of the original method.
  4. Following on from the above point, please add docs to existing public methods if you are editing them, or to abstract methods if you can.
  5. Non-public, non-abstract methods don't require Javadoc, but if you feel that adding some would make it easier for other developers to understand the code, or why it's written in a particular way, then please do so.
  6. Properties don't need to have Javadoc, but please add some if there's something useful to say.
  7. Javadoc should not go into low-level implementation details unless this is critical to understanding the code e.g. documenting the subtleties of the implementation of a private method. The point here is that implementations will change over time, and the Javadoc is less likely to become out-of-date if it only talks about the purpose of the code, not what it does.
  8. Examples in Javadoc can be very useful, so feel free to add some if you can reasonably do so i.e. if it takes a whole page of code to set up an example, then Javadoc probably isn't the right place for it. Longer or more elaborate examples are probably better suited to the package docs.
  9. Test methods are a good place to add Javadoc, because you can use it to succinctly describe e.g. preconditions, actions and expectations of the test, more easily than just using the test name alone. Please consider documenting your tests in this way.
  10. Sometimes you shouldn't add Javadoc:
    1. Where it adds no value, for example where a method's implementation is trivial such as with getters and setters, or a method just delegates to another object.
    2. However, you should still add Javadoc if there are caveats around calling a method that are not immediately obvious from reading the method's implementation in isolation.
    3. You can omit Javadoc for simple classes, e.g. where they are a simple container for some data. However, please consider whether a reader might still benefit from some additional background, for example about why the class exists at all.
  11. Not all comments need to be Javadoc. Sometimes it will make more sense to add comments in a method's body, for example due to important implementation decisions or "gotchas". As a general guide, if some information forms part of the contract between a method and its callers, then it should go in the Javadoc, otherwise you might consider using regular comments in the code. Remember as well that Elasticsearch has extensive user documentation, and it is not the role of Javadoc to replace that.
  12. Please still try to make class, method or variable names as descriptive and concise as possible, as opposed to relying solely on Javadoc to describe something.
  13. Use @link and @see to add references, either to related resources in the codebase or to relevant external resources.
  14. If you need help writing Javadoc, just ask!

Finally, use your judgement! Base your decisions on what will help other developers - including yourself, when you come back to some code 3 months in the future, having forgotten how it works.
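As a rough illustration of these guidelines, here is a sketch of a small documented class. RetryBudget is hypothetical and does not exist in the Elasticsearch codebase. The class-level Javadoc states the purpose, the public method documents the "why" behind its contract, and the trivial getter is left undocumented.

```java
/**
 * Tracks how many retries remain for a single operation.
 * <p>
 * Hypothetical class, used here only to illustrate the Javadoc
 * guidelines. It exists so that callers share one definition of
 * "retries left" instead of each keeping its own counter.
 */
final class RetryBudget {

    private int remaining;

    /**
     * Creates a budget that allows up to {@code maxRetries} retries.
     *
     * @param maxRetries the number of retries allowed; zero means
     *                   "never retry"
     * @throws IllegalArgumentException if {@code maxRetries} is negative
     */
    public RetryBudget(int maxRetries) {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must not be negative");
        }
        this.remaining = maxRetries;
    }

    /**
     * Consumes one retry, if any remain.
     * <p>
     * This returns {@code false} rather than throwing when the budget is
     * exhausted, because running out of retries is an expected outcome
     * that the caller has to handle, not a programming error.
     *
     * @return {@code true} if a retry was available and has been consumed
     * @see #remaining()
     */
    public boolean tryConsume() {
        if (remaining == 0) {
            return false;
        }
        remaining--;
        return true;
    }

    // No Javadoc: a trivial getter, where a comment would add no value.
    public int remaining() {
        return remaining;
    }

    public static void main(String[] args) {
        RetryBudget budget = new RetryBudget(1);
        System.out.println(budget.tryConsume()); // true
        System.out.println(budget.tryConsume()); // false
    }
}
```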

License Headers

We require license headers on all Java files. With the exception of the top-level x-pack directory, all contributed code should have the following license header unless instructed otherwise:

/*
 * Licensed to Elasticsearch under one or more contributor
 * license agreements. See the NOTICE file distributed with
 * this work for additional information regarding copyright
 * ownership. Elasticsearch licenses this file to you under
 * the Apache License, Version 2.0 (the "License"); you may
 * not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 *    http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing,
 * software distributed under the License is distributed on an
 * "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
 * KIND, either express or implied.  See the License for the
 * specific language governing permissions and limitations
 * under the License.
 */

The top-level x-pack directory contains code covered by the Elastic license. Community contributions to this code are welcome, and should have the following license header unless instructed otherwise:

/*
 * Copyright Elasticsearch B.V. and/or licensed to Elasticsearch B.V. under one
 * or more contributor license agreements. Licensed under the Elastic License;
 * you may not use this file except in compliance with the Elastic License.
 */

It is important that the only code covered by the Elastic license is contained within the top-level x-pack directory. The build will fail its pre-commit checks if contributed code does not have the appropriate license headers.
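The rule for choosing a header can be summed up in a few lines of code. The sketch below is illustrative only: LicenseHeaderRule and its members are hypothetical names, and this is not how the build's pre-commit check is actually implemented. It assumes the path passed in is relative to the repository root.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

/**
 * Hypothetical helper showing which license header a file needs.
 * This is not the build's actual pre-commit check.
 */
final class LicenseHeaderRule {

    /** The two kinds of license header described in this section. */
    enum License {
        APACHE_2,
        ELASTIC
    }

    private LicenseHeaderRule() {}

    /**
     * Returns the license whose header a file should carry.
     *
     * @param pathFromRepoRoot the file's path, relative to the repository root
     * @throws IllegalArgumentException if the path is absolute
     */
    static License expectedLicense(Path pathFromRepoRoot) {
        if (pathFromRepoRoot.isAbsolute()) {
            throw new IllegalArgumentException("expected a path relative to the repository root");
        }
        // Only the top-level directory decides which header applies.
        Path topLevel = pathFromRepoRoot.normalize().getName(0);
        if (topLevel.toString().equals("x-pack")) {
            return License.ELASTIC;
        }
        return License.APACHE_2;
    }

    public static void main(String[] args) {
        // Prints ELASTIC, then APACHE_2
        System.out.println(expectedLicense(Paths.get("x-pack/plugin/Foo.java")));
        System.out.println(expectedLicense(Paths.get("server/src/main/java/Foo.java")));
    }
}
```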

You may find it helpful to configure your IDE to automatically insert the appropriate license header depending on the part of the project to which you are contributing.

IntelliJ: Copyright & Scope Profiles

To have IntelliJ insert the correct license, it is necessary to create two copyright profiles. These could be called apache2 and commercial, for example. They can be created in Preferences/Settings->Editor->Copyright->Copyright Profiles. To associate these profiles with their respective directories, two "Scopes" will need to be created. These can be created in Preferences/Settings->Appearance & Behavior->Scopes. When creating scopes, be sure to choose the shared scope type. Create a scope, apache2, with the associated pattern of !file[group:x-pack]:*/. This pattern will exclude all the files contained in the x-pack directory. The other scope, commercial, will have the inverse pattern of file[group:x-pack]:*/. The two scopes, together, should account for all the files in the project. To associate the scopes with their copyright profiles, go into Preferences/Settings->Editor->Copyright and use the + to add the associations apache2/apache2 and commercial/commercial.

Configuring these options in IntelliJ can be quite buggy, so do not be alarmed if you have to open/close the settings window and/or restart IntelliJ to see your changes take effect.

Creating A Distribution

Run all build commands from within the root directory:

cd elasticsearch/

To build a darwin-tar distribution, run this command:

./gradlew -p distribution/archives/darwin-tar assemble --parallel

You will find the distribution under: ./distribution/archives/darwin-tar/build/distributions/

To create all build artifacts (e.g., plugins and Javadocs) as well as distributions in all formats, run this command:

./gradlew assemble --parallel

The package distributions (Debian and RPM) can be found under: ./distribution/packages/(deb|rpm|oss-deb|oss-rpm)/build/distributions/

The archive distributions (tar and zip) can be found under: ./distribution/archives/(darwin-tar|linux-tar|windows-zip|oss-darwin-tar|oss-linux-tar|oss-windows-zip)/build/distributions/

Running The Full Test Suite

Before submitting your changes, run the test suite to make sure that nothing is broken, with:

./gradlew check

If your changes affect only the documentation, run:

./gradlew -p docs check

For more information about testing code examples in the documentation, see https://github.com/elastic/elasticsearch/blob/master/docs/README.asciidoc

Project layout

This repository is split into many top level directories. The most important ones are:

docs

Documentation for the project.

distribution

Builds our tar and zip archives and our rpm and deb packages.

libs

Libraries used to build other parts of the project. These are meant to be internal rather than general purpose. We have no plans to semver their APIs or accept feature requests for them. We publish them to Maven Central because they are dependencies of our plugin test framework, high-level REST client, and JDBC driver, but they really aren't general purpose enough to belong in Maven Central. We're still working out what to do here.

modules

Features that are shipped with Elasticsearch by default but are not built in to the server. We typically separate features from the server because they require permissions that we don't believe all of Elasticsearch should have or because they depend on libraries that we don't believe all of Elasticsearch should depend on.

For example, reindex requires the connect permission so it can perform reindex-from-remote, but we don't believe that all of Elasticsearch should have the "connect" permission. For another example, Painless is implemented using antlr4 and asm and we don't believe that all of Elasticsearch should have access to them.

plugins

Officially supported plugins to Elasticsearch. We decide that a feature should be a plugin rather than shipped as a module because we feel that it is only important to a subset of users, especially if it requires extra dependencies.

The canonical example of this is the ICU analysis plugin. It is important for folks who want the fairly language neutral ICU analyzer but the library to implement the analyzer is 11MB so we don't ship it with Elasticsearch by default.

Another example is the discovery-gce plugin. It is vital to folks running in GCP but useless otherwise and it depends on a dozen extra jars.

qa

Honestly this is kind of in flux and we're not 100% sure where we'll end up. Right now the directory contains

  • Tests that require multiple modules or plugins to work
  • Tests that form a cluster made up of multiple versions of Elasticsearch like full cluster restart, rolling restarts, and mixed version tests
  • Tests that test the Elasticsearch clients in "interesting" places like the wildfly project.
  • Tests that test Elasticsearch in funny configurations like with ingest disabled
  • Tests that need to do strange things like install plugins that throw uncaught Throwables or add a shutdown hook

But we're not convinced that all of these things belong in the qa directory. We're fairly sure that tests that require multiple modules or plugins to work should just pick a "home" plugin. We're fairly sure that the multi-version tests do belong in qa. Beyond that, we're not sure. If you want to add a new qa project, open a PR and be ready to discuss options.

server

The server component of Elasticsearch that contains all of the modules and plugins. Right now things like the high level rest client depend on the server but we'd like to fix that in the future.

test

Our test framework and test fixtures. We use the test framework for testing the server, the plugins, and modules, and pretty much everything else. We publish the test framework so folks who develop Elasticsearch plugins can use it to test the plugins. The test fixtures are external processes that we start before running specific tests that rely on them.

For example, we have an hdfs test that uses mini-hdfs to test our repository-hdfs plugin.

x-pack

Commercially licensed code that integrates with the rest of Elasticsearch. The docs subdirectory functions just like the top level docs subdirectory and the qa subdirectory functions just like the top level qa subdirectory. The plugin subdirectory contains the x-pack module which runs inside the Elasticsearch process.

Gradle Build

We use Gradle to build Elasticsearch because it is flexible enough to not only build and package Elasticsearch, but also orchestrate all of the ways that we have to test Elasticsearch.

Configurations

Gradle organizes dependencies and build artifacts into "configurations" and allows you to use these configurations arbitrarily. Here are some of the most common configurations in our build and how we use them:

`compile`
Code that is on the classpath at both compile and runtime.
`runtime`
Code that is not on the classpath at compile time but is on the classpath at runtime. We mostly use this configuration to make sure that we do not accidentally compile against dependencies of our dependencies, also known as "transitive" dependencies.
`compileOnly`
Code that is on the classpath at compile time but that should not be shipped with the project because it is "provided" by the runtime somehow. Elasticsearch plugins use this configuration to include dependencies that are bundled with Elasticsearch's server.
`bundle`
Only available in projects with the shadow plugin. Dependencies with this configuration are bundled into the jar produced by the build. Since IDEs do not understand this configuration, we rig them to treat dependencies in this configuration as `compile` dependencies.
`testCompile`
Code that is on the classpath for compiling tests that are part of this project but not production code. The canonical example of this is `junit`.

Contributing as part of a class

In general Elasticsearch is happy to accept contributions that were created as part of a class, but we strongly advise against making the contribution itself part of the class. So if you have code you wrote for a class, feel free to submit it.

Please, please, please do not assign contributing to Elasticsearch as part of a class. If you really want to assign writing code for Elasticsearch as an assignment then the code contributions should be made to your private clone and opening PRs against the primary Elasticsearch clone must be optional, fully voluntary, not for a grade, and without any deadlines.

Because:

  • While the code review process is likely very educational, it can take wildly varying amounts of time depending on who is available, where the change is, and how deep the change is. There is no way to predict how long it will take unless we rush.
  • We do not rush reviews without a very, very good reason. Class deadlines aren't a good enough reason for us to rush reviews.
  • We deeply discourage opening a PR that you don't intend to see through the entire code review process, because it wastes our time.
  • We don't have the capacity to absorb an entire class full of new contributors, especially when they are unlikely to become long time contributors.

Finally, we require that you run ./gradlew check before submitting a non-documentation contribution. This is mentioned above, but it is worth repeating in this section because it has come up in this context.