Automatically generate datanode.conf.example #19141

Open
wants to merge 9 commits into base: master

Conversation

@todvora (Contributor) commented Apr 23, 2024

Generate the datanode.conf.example file with all possible configuration options, and generate CSV documentation for the same options.

/nocl

Description

This PR adds two Maven tasks that generate the datanode.conf.example and datanode-conf-docs.csv files. The conf.example will be included in our dist packages. The CSV may be used for documentation purposes.

The datanode assembly now includes this generated file instead of the manually created one.
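For illustration, a minimal sketch of the kind of generation such a task could perform: walk the configuration beans reflectively and emit each documented field as a commented default. The @Documentation annotation, bean, and field names below are assumptions for this example, not the PR's actual API.

import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.reflect.Field;
import java.util.List;

// Sketch only: print every documented field as a "name = default" line,
// commented out when a default exists. The annotation and the example bean
// are hypothetical stand-ins for the PR's real classes.
public class ConfigExampleGeneratorSketch {

    @Retention(RetentionPolicy.RUNTIME)
    @interface Documentation {
        String value();
    }

    // Hypothetical bean with one documented field.
    static class ExampleConfiguration {
        @Documentation("HTTP port on which the embedded opensearch listens")
        int opensearch_http_port = 9200;
    }

    public static void main(String[] args) throws Exception {
        for (Object bean : List.of(new ExampleConfiguration())) {
            for (Field field : bean.getClass().getDeclaredFields()) {
                Documentation doc = field.getAnnotation(Documentation.class);
                if (doc == null) {
                    continue;
                }
                field.setAccessible(true);
                Object defaultValue = field.get(bean);
                System.out.println("# " + doc.value());
                // Settings with a default are written as commented-out examples.
                String prefix = (defaultValue == null) ? "" : "#";
                System.out.println(prefix + field.getName() + " = "
                        + (defaultValue == null ? "" : defaultValue));
                System.out.println();
            }
        }
    }
}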

Motivation and Context

Automate and enforce documentation of configuration options.

How Has This Been Tested?

Manually

Screenshots (if appropriate):

#####################################
# GRAYLOG DATANODE CONFIGURATION FILE
#####################################
#
# This is the Graylog DataNode configuration file. The file has to use ISO 8859-1/Latin-1 character encoding.
# Characters that cannot be directly represented in this encoding can be written using Unicode escapes
# as defined in https://docs.oracle.com/javase/specs/jls/se8/html/jls-3.html#jls-3.3, using the \u prefix.
# For example, \u002c.
#
# * Entries are generally expected to be a single line, in one of the following forms:
#
# propertyName=propertyValue
# propertyName:propertyValue
#
# * White space that appears between the property name and property value is ignored,
#   so the following are equivalent:
#
# name=Stephen
# name = Stephen
#
# * White space at the beginning of the line is also ignored.
#
# * Lines that start with the comment characters ! or # are ignored. Blank lines are also ignored.
#
# * The property value is generally terminated by the end of the line. White space following the
#   property value is not ignored, and is treated as part of the property value.
#
# * A property value can span several lines if each line is terminated by a backslash (‘\’) character.
#   For example:
#
# targetCities=\
#         Detroit,\
#         Chicago,\
#         Los Angeles
#
#   This is equivalent to targetCities=Detroit,Chicago,Los Angeles (white space at the beginning of lines is ignored).
#
# * The characters newline, carriage return, and tab can be inserted with characters \n, \r, and \t, respectively.
#
# * The backslash character must be escaped as a double backslash. For example:
#
# path=c:\\docs\\doc1
#


# You MUST set a secret to secure/pepper the stored user passwords here. Use at least 64 characters.
# Generate one by using for example: pwgen -N 1 -s 96
# ATTENTION: This value must be the same on all Graylog and Datanode nodes in the cluster.
# Changing this value after installation will render all user sessions and encrypted values
# in the database invalid. (e.g. encrypted access tokens)
password_secret = 

# Do not perform any preflight checks when starting Datanode.
#skip_preflight_checks = false

# How many milliseconds the Datanode should wait for the termination of all tasks during shutdown.
#shutdown_timeout = 30000

# Directory where Datanode will search for an opensearch distribution.
#opensearch_location = dist

# Data directory of the embedded opensearch. Contains the indices of the opensearch.
# May be pointed to an existing opensearch directory during an in-place migration to Datanode.
#opensearch_data_location = datanode/data

# Logs directory of the embedded opensearch
#opensearch_logs_location = datanode/logs

# Configuration directory of the embedded opensearch. This is the directory where the opensearch
# process will store its configuration files. Caution, each start of the Datanode will regenerate
# the complete content of the directory!
#opensearch_config_location = datanode/config

# Source directory of the additional configuration files for the Datanode. Additional certificates can be provided here.
#config_location = 

# How many log entries of the opensearch process the Datanode should hold in memory and make accessible via API calls.
#process_logs_buffer_size = 500

# Unique name of this Datanode instance. Use this if your node name should be different from the hostname
# that's found by programmatically looking it up.
#node_name = 

# Comma separated list of opensearch nodes that are eligible as manager nodes.
#initial_cluster_manager_nodes = 

# Opensearch heap memory. Initial and maximum heap must be identical for OpenSearch, otherwise the boot fails.
# So it's only one config option.
#opensearch_heap = 1g

# HTTP port on which the embedded opensearch listens
#opensearch_http_port = 9200

# Transport port on which the embedded opensearch listens
#opensearch_transport_port = 9300

# Provides a list of the addresses of the master-eligible nodes in the cluster.
#opensearch_discovery_seed_hosts = []

# Binds an OpenSearch node to an address. Use 0.0.0.0 to include all available network interfaces,
# or specify an IP address assigned to a specific interface.
#opensearch_network_host = 

# Relative path (to config_location) to a keystore used for opensearch transport layer TLS
#transport_certificate = 

# Password for a keystore defined in transport_certificate
#transport_certificate_password = 

# Relative path (to config_location) to a keystore used for opensearch REST layer TLS
#http_certificate = 

# Password for a keystore defined in http_certificate
#http_certificate_password = 

# Communication between Graylog and OpenSearch is secured by JWT.
# This configuration defines interval between token regenerations.
#indexer_jwt_auth_token_caching_duration = 60 seconds

# Communication between Graylog and OpenSearch is secured by JWT.
# This configuration defines validity interval of JWT tokens.
#indexer_jwt_auth_token_expiration_duration = 180 seconds

# The auto-generated node ID will be stored in this file and read after restarts. It is a good idea
# to use an absolute file path here if you are starting Graylog DataNode from init scripts or similar.
#node_id_file = data/node-id

# HTTP bind address. The network interface used by the Graylog DataNode to bind all services.
#bind_address = 0.0.0.0

# HTTP port. The port where the DataNode REST API is listening
#datanode_http_port = 8999

# Name of the cluster that the embedded opensearch will form. Should be the same for all Datanodes in one cluster.
#clustername = datanode-cluster

# This configuration should be used if you want to connect to this Graylog DataNode's REST API
# and it is available on another network interface than $http_bind_address,
# for example if the machine has multiple network interfaces or is behind a NAT gateway.
#http_publish_uri = 

# Enable GZIP support for HTTP interface. This compresses API responses and therefore helps to reduce
# overall round trip times.
#http_enable_gzip = true

# The maximum size of the HTTP request headers in bytes
#http_max_header_size = 8192

# The size of the thread pool used exclusively for serving the HTTP interface.
#http_thread_pool_size = 64

# Cache size for searchable snapshots
#node_search_cache_size = 10gb

# Filesystem path where searchable snapshots should be stored
#path_repo = 

# This setting limits the number of clauses a Lucene BooleanQuery can have.
#opensearch_indices_query_bool_max_clause_count = 32768

# The list of the opensearch node’s roles.
#node_roles = [cluster_manager, data, ingest, remote_cluster_client, search]

# Configures verbosity of embedded opensearch logs.
# Possible values are OFF, FATAL, ERROR, WARN, INFO, DEBUG, and TRACE; the default is INFO
#opensearch_logger_org_opensearch = 

# Configures opensearch audit log storage type. See https://opensearch.org/docs/2.13/security/audit-logs/storage-types/
#opensearch_plugins_security_audit_type = 

# Increase this value according to the maximum connections your MongoDB server can handle from a single client
# if you encounter MongoDB connection problems.
#mongodb_max_connections = 1000

# MongoDB connection string. See https://docs.mongodb.com/manual/reference/connection-string/ for details
#mongodb_uri = mongodb://localhost/graylog

# Maximum number of attempts to connect to MongoDB on boot for the version probe.
# Default 0 means retry indefinitely until a connection can be established
#mongodb_version_probe_attempts = 0

# Allowed TLS protocols for system-wide TLS-enabled servers (e.g. message inputs, HTTP interface).
# Setting this to an empty value leaves it up to the system libraries and the JDK in use to choose a default.
#enabled_tls_protocols = 

# S3 repository access key for searchable snapshots
#s3_client_default_access_key = 

# S3 repository secret key for searchable snapshots
#s3_client_default_secret_key = 

# S3 repository protocol for searchable snapshots
#s3_client_default_protocol = http

# S3 repository endpoint for searchable snapshots
#s3_client_default_endpoint = 

# S3 repository region for searchable snapshots
#s3_client_default_region = us-east-2

# S3 repository path-style access for searchable snapshots
#s3_client_default_path_style_access = true
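Side note: the header above describes standard java.util.Properties syntax. As a quick, hypothetical illustration (the file path is made up), such a file can be read like this; Properties.load(InputStream) assumes ISO 8859-1 and resolves \u escapes, matching the encoding rules described in the header.

import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

public class LoadDatanodeConfSketch {
    public static void main(String[] args) throws IOException {
        Properties props = new Properties();
        // load(InputStream) reads ISO 8859-1 and resolves \uXXXX escapes.
        try (InputStream in = new FileInputStream("/etc/graylog/datanode/datanode.conf")) {
            props.load(in);
        }
        // Commented-out keys are simply absent, so defaults apply here.
        int httpPort = Integer.parseInt(props.getProperty("datanode_http_port", "8999"));
        System.out.println("Datanode REST API port: " + httpPort);
    }
}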

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Refactoring (non-breaking change)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have added tests to cover my changes.

@todvora todvora marked this pull request as draft April 25, 2024 11:40
@todvora todvora marked this pull request as ready for review April 25, 2024 12:39

Review comment on the following snippet:

.map(f -> toConfigurationEntry(f, configurationBean));
}

private static List<Object> getDatanodeConfigurationBeans() {
Contributor:

Maybe we should already move that out of the generator to be able to move the generator to a server package more easily?

Contributor Author (@todvora):

Do you have any simple idea how/where to get this?


Another review comment, on this snippet:

public class ConfigFileDocsPrinter implements DocsPrinter {

public static final String DATANODE_CONFIG_HEADER = """
Contributor:

Same here for the header. Maybe move it out of the printer, and possibly read it from a resource to avoid the huge string?

Contributor Author (@todvora):

There is only a tiny difference between the graylog.conf and datanode.conf headers. Maybe it would be enough to provide a string headline and use the whole header as a template?

Member:

Sounds good.
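A minimal sketch of the idea discussed in this thread, assuming the shared header lives in a classpath resource with a ${headline} placeholder; the resource name and placeholder are hypothetical, not the PR's actual implementation.

import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class HeaderTemplateSketch {
    // Hypothetical resource containing the shared header with a "${headline}" placeholder.
    private static final String TEMPLATE_RESOURCE = "/conf-header-template.txt";

    static String renderHeader(String headline) throws IOException {
        try (InputStream in = HeaderTemplateSketch.class.getResourceAsStream(TEMPLATE_RESOURCE)) {
            if (in == null) {
                throw new IOException("Missing resource: " + TEMPLATE_RESOURCE);
            }
            String template = new String(in.readAllBytes(), StandardCharsets.UTF_8);
            // Only the headline differs between graylog.conf and datanode.conf.
            return template.replace("${headline}", headline);
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println(renderHeader("GRAYLOG DATANODE CONFIGURATION FILE"));
    }
}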

@todvora commented May 13, 2024

@bernd what do you think? Is this a valid and usable approach? Can we merge it? Thanks!

@bernd (Member) commented May 13, 2024

> @bernd what do you think? Is this a valid and usable approach? Can we merge it? Thanks!

@todvora, I like the general approach! 👍 We should improve a few details before we ship the generated config, though.

  • Our server config uses a space character around the =. I think that makes it more readable. The generated config is not doing that yet. (mongodb_uri=mongodb://localhost/graylog vs mongodb_uri = mongodb://localhost/graylog)
  • The generator currently doesn't support ordering. In the server config, we put important settings that users must set at the top of the file. That makes them easy to spot.
  • In the server config we comment the setting with its default value without a space between the comment character and the setting name. That makes the setting easier to spot.
  • The generator currently doesn't support sections to create groups of related settings.

@todvora commented May 14, 2024

> Our server config uses a space character around the =. I think that makes it more readable

Easy to implement, done.

> The generator currently doesn't support ordering. In the server config, we put important settings that users must set at the top of the file. That makes them easy to spot.

That happens already. Properties that are mandatory and don't have a default value, i.e. those that users need to fill in, are ordered at the top of the configuration file. Otherwise the order follows the order of the properties in the Java config beans. Reordering in Java leads to reordering in the config file. That seems like a natural way to handle this.
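A minimal sketch of that ordering rule (the entry type and fields here are hypothetical, not the PR's actual model): entries without a default sort to the top, everything else keeps its declaration order.

import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class OrderingSketch {
    // Hypothetical value object for one generated entry.
    record ConfigurationEntry(String name, String defaultValue, int declarationIndex) {
        boolean needsUserInput() {
            return defaultValue == null || defaultValue.isEmpty();
        }
    }

    static List<ConfigurationEntry> ordered(List<ConfigurationEntry> entries) {
        List<ConfigurationEntry> sorted = new ArrayList<>(entries);
        // Mandatory entries without a default first, then original declaration order.
        sorted.sort(Comparator.comparing((ConfigurationEntry e) -> !e.needsUserInput())
                .thenComparingInt(ConfigurationEntry::declarationIndex));
        return sorted;
    }

    public static void main(String[] args) {
        List<ConfigurationEntry> entries = List.of(
                new ConfigurationEntry("opensearch_http_port", "9200", 0),
                new ConfigurationEntry("password_secret", null, 1));
        ordered(entries).forEach(e -> System.out.println(e.name()));
        // Prints password_secret first, since it has no default and must be set by the user.
    }
}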

> In the server config we comment the setting with its default value without a space between the comment character and the setting name. That makes the setting easier to spot.

Fixed

> The generator currently doesn't support sections to create groups of related settings.

Indeed. IDK how to add this in the current situation without significant overhead in the Java config files. From my POV, the sections should correspond to Java configuration classes: one class == one section. Then we can add a heading and group description to the class itself. But for now, at least in the datanode, we have (almost) everything crammed into one configuration file. Given this configuration part:

#### OpenSearch JWT token usage
#
# communication between Graylog and OpenSearch is secured by JWT. These are the defaults used for the token usage
# adjust them, if you have special needs.
#
# indexer_jwt_auth_token_caching_duration = 60s
# indexer_jwt_auth_token_expiration_duration = 180s

I see a JwtTokenConfiguration.java class with its own header "OpenSearch JWT token usage" and with "Communication between Graylog and OpenSearch is secured by JWT. These are the defaults used for the token usage. Adjust them, if you have special needs." as its description.

This would be rather simple to implement but would require significant changes in the configuration classes and all of their usages. Is that something we'd consider? I personally think it would offer benefits for the code base as well. Now we drag the whole configuration everywhere or, even worse from the maintenance perspective, use named injects, which make any refactoring a lot harder and more error-prone.

My suggestion: for now, go without sections. Split the datanode configuration into more specific config beans, add section documentation to the beans, and see if that works well for us.
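A minimal sketch of the one-class-per-section idea described above; the @DocumentationSection annotation, class, and fields are hypothetical illustrations, not existing project code.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

public class SectionPerClassSketch {

    // Hypothetical class-level annotation carrying the section heading and description.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.TYPE)
    @interface DocumentationSection {
        String heading();
        String description();
    }

    @DocumentationSection(
            heading = "OpenSearch JWT token usage",
            description = "Communication between Graylog and OpenSearch is secured by JWT. "
                    + "These are the defaults used for the token usage. Adjust them if you have special needs.")
    static class JwtTokenConfiguration {
        // Per-property documentation would still come from the existing field annotations.
        String indexerJwtAuthTokenCachingDuration = "60s";
        String indexerJwtAuthTokenExpirationDuration = "180s";
    }

    public static void main(String[] args) {
        DocumentationSection section = JwtTokenConfiguration.class.getAnnotation(DocumentationSection.class);
        // The generator could emit the section header straight from the class annotation.
        System.out.println("#### " + section.heading());
        System.out.println("# " + section.description());
    }
}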

@todvora commented May 15, 2024

Updated the generated configuration in the PR description.

@bernd commented May 16, 2024

> > Our server config uses a space character around the =. I think that makes it more readable

> Easy to implement, done.

👍 Thanks!

> > The generator currently doesn't support ordering. In the server config, we put important settings that users must set at the top of the file. That makes them easy to spot.

> That happens already. Properties that are mandatory and don't have a default value, i.e. those that users need to fill in, are ordered at the top of the configuration file. Otherwise the order follows the order of the properties in the Java config beans. Reordering in Java leads to reordering in the config file. That seems like a natural way to handle this.

Ah cool. Sorry, I missed that. I think my eye caught the node_id_file setting, which is currently at the top of the server conf file and is now somewhere further down. Old habits. 😄

> > In the server config we comment the setting with its default value without a space between the comment character and the setting name. That makes the setting easier to spot.

> Fixed

👍 Thanks!

> > The generator currently doesn't support sections to create groups of related settings.
>
> Indeed. IDK how to add this in the current situation without significant overhead in the Java config files. From my POV, the sections should correspond to Java configuration classes: one class == one section. Then we can add a heading and group description to the class itself. But for now, at least in the datanode, we have (almost) everything crammed into one configuration file. Given this configuration part:
>
> #### OpenSearch JWT token usage
> #
> # communication between Graylog and OpenSearch is secured by JWT. These are the defaults used for the token usage
> # adjust them, if you have special needs.
> #
> # indexer_jwt_auth_token_caching_duration = 60s
> # indexer_jwt_auth_token_expiration_duration = 180s
>
> I see a JwtTokenConfiguration.java class with its own header "OpenSearch JWT token usage" and with "Communication between Graylog and OpenSearch is secured by JWT. These are the defaults used for the token usage. Adjust them, if you have special needs." as its description.
>
> This would be rather simple to implement but would require significant changes in the configuration classes and all of their usages. Is that something we'd consider? I personally think it would offer benefits for the code base as well. Now we drag the whole configuration everywhere or, even worse from the maintenance perspective, use named injects, which make any refactoring a lot harder and more error-prone.
>
> My suggestion: for now, go without sections. Split the datanode configuration into more specific config beans, add section documentation to the beans, and see if that works well for us.

I think we could add an optional "section" value to the @Documentation annotation. It's not perfect because it's repetitive, but it would be an easy way to emulate the structure of the server config file.
It's okay to generate the file without sections for the Data Node config. But before we auto-generate the server config, we should add a way to use sections.
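A minimal sketch of that idea; the annotation shape and field names below are assumptions for illustration, not the project's actual @Documentation definition.

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class SectionedDocumentationSketch {

    // Hypothetical: the per-property documentation plus an optional section the
    // generator can use to group related settings under one heading.
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.FIELD)
    @interface Documentation {
        String value();
        String section() default "";
    }

    static class ExampleConfiguration {
        @Documentation(value = "S3 repository access key for searchable snapshots",
                section = "Searchable snapshots")
        String s3ClientDefaultAccessKey;

        @Documentation(value = "S3 repository region for searchable snapshots",
                section = "Searchable snapshots")
        String s3ClientDefaultRegion = "us-east-2";
    }

    public static void main(String[] args) {
        // Group documented fields by their section, preserving declaration order.
        Map<String, List<String>> bySection = new LinkedHashMap<>();
        for (Field field : ExampleConfiguration.class.getDeclaredFields()) {
            Documentation doc = field.getAnnotation(Documentation.class);
            if (doc != null) {
                bySection.computeIfAbsent(doc.section(), k -> new ArrayList<>()).add(field.getName());
            }
        }
        bySection.forEach((section, names) -> System.out.println(section + ": " + names));
    }
}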
