Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

CSHARP-4440: Incorporate MongoDB.Labs.Search library #989

Merged
merged 1 commit into from Jan 19, 2023

Conversation

BorisDog
Copy link
Contributor

No description provided.

@JamesKovacs JamesKovacs requested review from DmitryLukyanov and removed request for JamesKovacs December 13, 2022 17:33
Copy link
Contributor

@dgolub dgolub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Left a couple of minor suggestions. Otherwise, the changes look great.

src/MongoDB.Driver/Search/SearchRange.cs Outdated Show resolved Hide resolved
src/MongoDB.Driver/Search/SearchDefinitions.cs Outdated Show resolved Hide resolved
{ _range.IsMaxInclusive ? "lte" : "lt", () => ToBsonValue(_range.Max.Value), _range.Max != null },
};

private static BsonValue ToBsonValue(TField value) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is probably enforceable at compile time using the new generic math stuff that they added in the latest C# version, but I'm assuming that we can't use that for backward compatibility reasons?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah backward compatibility, I wish we could use that.

/// <param name="path">Indexed geo type field or fields to search.</param>
/// <param name="score">The score modifier.</param>
/// <returns>A geo within search definition.</returns>
public SearchDefinition<TDocument> GeoWithin<TCoordinates, TField>(
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ideally we would not need GeoWithin(GeoJsonGeometry...) overloads as we got GeoWithinGeometry class (and GeoWithin<> can expose implicit cast from GeoJsonGeometry operator). But for some reason implicit casting fails here.

@rstam @dgolub any creative ideas are welcome!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC generic type inference fails.

Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is my first pass at reviewing.

Overall this looks great.

I do think we need to be more careful about naming things in a way that doesn't risk name collisions now or in the future.

For example, you named a class SearchCountOptions to avoid colliding with the existing CountOptions class. This is good.

We should use the same naming convention for all our search related classes, both for consistent naming and to avoid future possible name collisions.

So for example, rename FuzzyOptions to SearchFuzzyOptions and HighlightOptions to SearchHighlightOptions and similarly for all our new classes.

I realize that all these new classes are in a .Search sub-namespace and thus are technically unambiguous in the face of name collisions, but only when prefixing them with full or partial namespaces. Let's just avoid potential collisions in the first place and adopt a consistent naming convention, even if it does make the type names slightly longer. Shorter is not always better when it creates ambiguity (either real ambiguity, as in a compile time error, or conceptual ambiguity such QueryDefinition vs SearchQueryDefinition, where the first doesn't sound like it has anything to do with Search and sounds more like it might apply to Find).

Once we've worked through these renaming suggestions I would like to review again in more detail.

Good work overall!

source,
PipelineStageDefinitionBuilder.Search(searchDefinition, highlight, indexName, count, returnStoredSource));
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need a SearchMeta here also?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch, added.

/// <summary>
/// The order in which to search for tokens in an autocomplete search definition.
/// </summary>
public enum AutocompleteTokenOrder
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to SearchAutocompleteTokenOrder

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Fluent interface for compound search definitions.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public abstract class CompoundFluent<TDocument>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to CompoundSearchDefinitionFluent

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this an abstract class? Seems like this and the Impl class could just as well be a single class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.
I think the original intention was to split the interface and implementation for better maintainability. The reason for this not being an proper interface is the implicit conversion operator to SearchDefinition, for supporting expressions like Search(Builders.Search.Compound(...)).
But I agree that this can be single class, merged.

/// <summary>
/// Options for fuzzy search.
/// </summary>
public sealed class FuzzyOptions
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to SearchFuzzyOptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

/// Options for highlighting.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public sealed class HighlightOptions<TDocument>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to SearchHighlightOptions

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

get { return __filter; }
}
/// <summary>Gets a <see cref="PathDefinition{TDocument}"/>.</summary>
public static PathDefinitionBuilder<TDocument> Path { get; } = new PathDefinitionBuilder<TDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename property to SearchPath

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

get { return __index; }
}
/// <summary>Gets a <see cref="ScoreDefinitionBuilder{TDocument}"/>.</summary>
public static ScoreDefinitionBuilder<TDocument> Score { get; } = new ScoreDefinitionBuilder<TDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename property to SearchScore

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

get { return __projection; }
}
/// <summary>Gets a <see cref="ScoreFunctionBuilder{TDocument}"/>.</summary>
public static ScoreFunctionBuilder<TDocument> ScoreFunction { get; } = new ScoreFunctionBuilder<TDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename property to SearchScoreFunction

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

get { return __update; }
}
/// <summary> Gets a <see cref="SpanDefinitionBuilder{TDocument}"/>.</summary>
public static SpanDefinitionBuilder<TDocument> Span { get; } = new SpanDefinitionBuilder<TDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename property to SearchSpan

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

string indexName = null,
SearchCountOptions count = null,
bool returnStoredSource = false)
{
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just wondering whether we have the best ordering of parameters here.

For example, if indexName is the most commonly used optional parameter, maybe it should be the first optional parameter.

I don't actually know what parameters are more likely to be used than others. I just want to make sure we get the optimal parameter ordering right the first time, because we can't change it later.

If we do change the order let's be sure to change it the same way everywhere.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question, I am not sure how to approach this.
My guess is that current ordering is a good guess, highlight would be used more often than indexName, etc. Also except for indexName this is the ordering in $search documentation.

Copy link
Contributor Author

@BorisDog BorisDog left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not a big fun of longer names, especially as Search namespace makes this a non issue. In our case actual names are less important, as mostly Builders are expected to be used. Also most of the classes are internal.
I don't think this change will have noticeable impact on our API, so made this change.

/// Fluent interface for compound search definitions.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public abstract class CompoundFluent<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.
I think the original intention was to split the interface and implementation for better maintainability. The reason for this not being an proper interface is the implicit conversion operator to SearchDefinition, for supporting expressions like Search(Builders.Search.Compound(...)).
But I agree that this can be single class, merged.

/// <summary>
/// Options for fuzzy search.
/// </summary>
public sealed class FuzzyOptions
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Renamed.

/// search within.
/// </summary>
/// <typeparam name="TCoordinates">The type of the coordinates.</typeparam>
public abstract class GeoWithin<TCoordinates> where TCoordinates : GeoJsonCoordinates
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can just try to be as close as possible Atlas Search parameters namings.
Maybe GeoWithinGeoJson, or GeoWithinQuery?

/// Options for highlighting.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public sealed class HighlightOptions<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Base class for search paths.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public abstract class PathDefinition<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
// This constructor should always be called from a fluent interface that
// ensures that the parameters are not null and copies the lists, so there is
// no need to do any of that here.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, rephrased.

/// Base class for span clauses.
/// </summary>
/// <typeparam name="TDocument"></typeparam>
public abstract class SpanDefinition<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// A builder for a span clause.
/// </summary>
/// <typeparam name="TDocument">The type of the document.</typeparam>
public sealed class SpanDefinitionBuilder<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

Term(query, new ExpressionFieldDefinition<TDocument>(path));
}

internal sealed class FirstSpanDefinition<TDocument> : SpanDefinition<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

new UnaryScoreFunction<TDocument>("log1p", operand);
}

internal sealed class PathScoreFunction<TDocument> : ScoreFunction<TDocument>
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam December 24, 2022 00:09
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Apologies that there are many comments.

Overall this looks great and these are just small comments to fine tune and polish the result.

private static ProjectionDefinitionBuilder<TDocument> __projection = new ProjectionDefinitionBuilder<TDocument>();
private static SortDefinitionBuilder<TDocument> __sort = new SortDefinitionBuilder<TDocument>();
private static UpdateDefinitionBuilder<TDocument> __update = new UpdateDefinitionBuilder<TDocument>();
/// <summary>Gets a <see cref="FilterDefinitionBuilder{TDocument}"/>.</summary>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think the extra nesting is helpful.

@@ -923,6 +924,53 @@ public static IMongoQueryable<TSource> Sample<TSource>(this IMongoQueryable<TSou
Expression.Constant(count)));
}

/// <summary>
/// Appends a $search stage to the LINQ pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add period at end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// Flag that specifies whether to perform a full document lookup on the backend database
/// or return only stored source fields directly from Atlas Search.
/// </param>
/// <returns>The fluent aggregate interface.</returns>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    /// <returns>The queryable with a new stage appended.</returns>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, done.

}

/// <summary>
/// Appends a $searchMeta stage to the LINQ pipeline
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

add period at end

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// <param name="searchDefinition">The search definition.</param>
/// <param name="indexName">The index name.</param>
/// <param name="count">The count options.</param>
/// <returns>The fluent aggregate interface.</returns>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

    /// <returns>The queryable with a new stage appended.</returns>

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
public class MongoQueryableTests
{
private readonly IMongoQueryable<Person> _queryable;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I realize you probably copied this technique from some other tests, but using a constructor to setup "part" of the test method state is an antipattern.

Better to use a helper method that can be called from each test method. Often not every test method needs the same setup, but even if they all do it's still better to call a helper method.

Suggested refactoring below:

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I will consider this in the next round.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to switch from constructor setup to CreateSubject helper method.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

public void Search()
{
var query = _queryable
.Where(x => x.FirstName == "Alexandra")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the Where?

I don't see how it has anything to do with this test, and doesn't Search always have to be the first stage anyway?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

.ToString();

query.Should().EndWith("Aggregate([{ \"$match\" : { \"fn\" : \"Alexandra\" } }, { \"$search\" : { \"text\" : { \"query\" : \"Alex\", \"path\" : \"fn\" } } }])");
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would refactor this test and others in this file like this:

[Fact]
public void Search()
{
    var subject = CreateSubject();

    var query = subject
        .Search(Builders<Person>.Search.Text("Alex", x => x.FirstName));

    query.ToString().Should().EndWith("Aggregate([{ \"$search\" : { \"text\" : { \"query\" : \"Alex\", \"path\" : \"fn\" } } }])");
}
  • Call a helper method to create the subject rather than relying on the constructor to setup shared state
  • Put a blank line between the Arrange/Act/Assert sections
  • Remove the call to Where
  • Move the call to ToString from the Act section to the Assert section

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Helper method:

    private IMongoQueryable<Person> CreateSubject()
    {
        var client = DriverTestConfiguration.Linq3Client;
        var database = client.GetDatabase(DriverTestConfiguration.DatabaseNamespace.DatabaseName);
        var collection = database.GetCollection<Person>(DriverTestConfiguration.CollectionNamespace.CollectionName);
        return collection.AsQueryable();
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

7.5);

[Fact]
public void Autocomplete()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Standard practice is for each test method to test one thing (or variations of one thing if it's a [Theory].

However, if you want to buck that practice here for expediency I won't object.

@@ -172,6 +173,140 @@ public void UnionWith_should_throw_when_withCollection_is_null()
argumentNullException.ParamName.Should().Be("withCollection");
}

[Fact]
public void Search_should_add_expected_stage()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Put in alphabetical order?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 12, 2023 01:27
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can see the finish line!

/// search within.
/// </summary>
/// <typeparam name="TCoordinates">The type of the coordinates.</typeparam>
public abstract class GeoWithinArea<TCoordinates> where TCoordinates : GeoJsonCoordinates
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Don't forget to rename file to match.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
/// <summary>
/// Base class for objects specifying GeoWithin query
/// search within.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/// Base class for area argument for GeoWithin queries.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

return new(_operatorType.ToCamelCase(), renderedArgs);
}

private protected virtual BsonDocument RenderOperator(IBsonSerializer<TDocument> documentSerializer, IBsonSerializerRegistry serializerRegistry) => new();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's rename this method RenderArguments because that's what it is actually rendering.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

/// One or more documents that Atlas Search uses to extract representative terms for.
/// </param>
/// <returns>A more like this search definition.</returns>
public SearchDefinition<TDocument> MoreLikeThis(IEnumerable<TDocument> like) =>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we think we should remove this overload.

Remember we are committing to a public API.

I may not fully understand how "more like this" is normally used, but my guess is that you hardly ever want to provide an ENTIRE TDocument as the template. Seems to me this is a form of "query by example" were you usually only provide a partial document which specifies what properties are relevant that you want more like.

@JamesKovacs ?

@@ -0,0 +1,139 @@
/* Copyright 2016-present MongoDB Inc.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use 2010-present on all new files.

You could check with James to see if it matters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

new GeoJson2DGeographicCoordinates(-152.446289, 22.065278),
new GeoJson2DGeographicCoordinates(-156.09375, 17.811456),
new GeoJson2DGeographicCoordinates(-161.323242, 22.512557)
})));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing blank line.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.


AssertRendered(subject.Single(x => x.FirstName), new BsonString("fn"));
AssertRendered(subject.Single("FirstName"), new BsonString("fn"));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Missing blank line

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

var result = pipeline.Search(builder.Text("bar", "foo"));
var stages = RenderStages(result, BsonDocumentSerializer.Instance);
stages[0].Should().Be(
BsonDocument.Parse("{ $search: { text: { query: 'foo', path: 'bar' } } }"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Simplify to:

        var stages = RenderStages(result, BsonDocumentSerializer.Instance);
        stages[0].Should().Be("{ $search: { text: { query: 'bar', path: 'foo' } } }");

No need for BsonDocument.Parse

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for all other new tests immediately below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice, I should have noticed this in other tests. Done.

public void Search_should_add_expected_stage()
{
var pipeline = new EmptyPipelineDefinition<BsonDocument>();
var builder = new SearchDefinitionBuilder<BsonDocument>();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line after Arrange section.

Same for other new tests below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

{
var pipeline = new EmptyPipelineDefinition<BsonDocument>();
var builder = new SearchDefinitionBuilder<BsonDocument>();
var result = pipeline.Search(builder.Text("bar", "foo"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add a blank line after Act section.

Same for other new tests below.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 13, 2023 23:44
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like we're only a few issues away from being done.

Besides the requested changes here we have the following design issues to discuss with @JamesKovacs

  • GeoShape parameter order
  • MoreLike with partial documents

double doubleValue => (BsonValue)doubleValue,
DateTime dateTimeValue => (BsonValue)dateTimeValue,
_ => throw new InvalidCastException()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not a fan of redundant casts either. But they can make code easier to read and maintain when the implicit conversions are not immediately obvious, specially when the compiler has to apply two or more implicit conversions.

But do you know off your head what BsonValue type all those different sized integral values convert to?

I had to go look.

The documentation for Range says only Int32, Int64, double and DateTime are valid.

I feel a little explicit clarity would help. Now you're forcing me to go verify that all those implicit conversions are correct.

I'm particularly concerned about uint. Should be treated as an int or a long. What does an implicit conversion to BsonValue do? Who knows? There isn't even an implicit conversion from uint to BsonValue so there must be some intermediate implicit conversion the compiler is applying.

I will test to verify.

double doubleValue => (BsonValue)doubleValue,
DateTime dateTimeValue => (BsonValue)dateTimeValue,
_ => throw new InvalidCastException()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For example If we add operator BsonValue(byte value)

This isn't about how BsonValue wants to implicit convert... it's about what Range needs.

double doubleValue => (BsonValue)doubleValue,
DateTime dateTimeValue => (BsonValue)dateTimeValue,
_ => throw new InvalidCastException()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, I've experimentally confirmed that most of the implicit conversions fo integers do what is needed for Range:

byte => BsonInt32
sbyte => BsonInt32
short => BsonInt32
ushort => BsonInt32
int => BsonInt32
uint => BsonInt64
long => BsonInt64
ulong => BsonInt64

uint is questionable. It probably should be BsonInt32 to match how we serialize uint.

So you probably need at least that one cast if nothing else. If a cast to int feels redundant to you how about:

uint v => (BsonInt32)v,

short v => v,
ushort v => v,
int v => v,
uint v => v,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
uint v => v,
uint v => (BsonInt32)v,

{
{ "type", "string" },
{ "path", _path.Render(documentSerializer, serializerRegistry) },
{ "numBuckets", _numBuckets, _numBuckets != null }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

_numBuckets is Nullable

I'll have to investigate whether this works or not.

{
{ "type", "string" },
{ "path", _path.Render(documentSerializer, serializerRegistry) },
{ "numBuckets", _numBuckets, _numBuckets != null }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok good it works. Though in a bit of an odd way.

There is an implicit conversion from int? to BsonValue which checks that _numBuckets is not null (same as checking HasValue) before accessing the Value property.

The only odd thing is that when int? is null the implict conversion returns a BsonNull.Value which would be invalid for this field, but even though the implicit conversion happened, the numBuckets field is not added to the document because the condition is false.

In other words, when _numBuckets is null this is equivalent to:

{ "numBuckets", BsonNull.Value, false }

We've always used the lambda form before to avoid evaluation of numBuckets when it doesn't have a value, but interesting to learn that it works in a weird way (at least it doesn't throw an exception as I feared it might).

/// <summary>
/// A search count result set.
/// </summary>
public sealed class SearchMetaCountResult
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK. I don't like the idea of having classes that "look" invalid and only happen to work because of tricks the serializer performs.

@JamesKovacs ?

If James is OK with this I will let it go.

var subject = CreateSubject();

var query = subject
.Where(x => x.FirstName == "Alexandra")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove the call to Where?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I thought I did, done.

public SearchDefinition<TDocument> GeoShape<TCoordinates>(
GeoJsonGeometry<TCoordinates> geometry,
SearchPathDefinition<TDocument> path,
GeoShapeRelation relation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do you need to edit your comment? There are multiple polygon parameters.

I see nothing wrong with adding more (possibly optional) parameters at the end.

The following looks fine to me:

GeoShape(point, GeoShapeRelation.Within, polygon, padding, threshold)

public SearchDefinition<TDocument> GeoShape<TCoordinates>(
GeoJsonGeometry<TCoordinates> geometry,
SearchPathDefinition<TDocument> path,
GeoShapeRelation relation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that in our own documentation the order is path, relation, geometry:

https://www.mongodb.com/docs/atlas/atlas-search/geoShape/

public SearchDefinition<TDocument> GeoShape<TCoordinates>(
GeoJsonGeometry<TCoordinates> geometry,
SearchPathDefinition<TDocument> path,
GeoShapeRelation relation,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is the example from our documentation:

{
    "$search": {
       "index": <index name>, // optional, defaults to "default"
       "geoShape": {
          "path": "<field-to-search>",
          "relation": "contains | disjoint | intersects | within",
          "geometry": <GeoJSON-object>,
          "score": <score-options>
       }
}

@BorisDog BorisDog requested a review from rstam January 16, 2023 19:50
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Small changes requested. Let me know what you think of them.


return new("like", likeArray);
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like this approach in general, but I think it is too restrictive. I could see myself creating a smaller POCO to use for query by example calls to MoreLike, so TDocument and BsonDocument are not the only two valid choices for the like values. For example, I might declare a PersonQBE class that is a subset of Person and use that with MoreByLike. If I have to use BsonDocument I could get lots of things wrong.

While it is true that allowing arbitrary TLike classes could be misused, the same is already true by allowing BsonDocument.

Would you consider simplifying this class to:

internal sealed class MoreLikeThisSearchDefinition<TDocument, TLike> : OperatorSearchDefinition<TDocument>
{
    private readonly TLike[] _like;

    public MoreLikeThisSearchDefinition(IEnumerable<TLike> like)
        : base(OperatorType.MoreLikeThis)
    {
        _like = Ensure.IsNotNull(like, nameof(like)).ToArray();
    }

    private protected override BsonDocument RenderArguments(IBsonSerializer<TDocument> documentSerializer, IBsonSerializerRegistry serializerRegistry)
    {
        var likeSerializer = serializerRegistry.GetSerializer<TLike>();
        var likeArray = new BsonArray(_like.Select(e => e.ToBsonDocument(likeSerializer)));
        return new("like", likeArray);
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point and good suggestion, done.

/// </param>
/// <returns>A more like this search definition.</returns>
public SearchDefinition<TDocument> MoreLikeThis(IEnumerable<TDocument> like) =>
new MoreLikeThisSearchDefinition<TDocument, TDocument>(like);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Consider the more flexible:

public SearchDefinition<TDocument> MoreLikeThis<TLike>(IEnumerable<TLike> like) =>
    new MoreLikeThisSearchDefinition<TDocument, TLike>(like);

and

public SearchDefinition<TDocument> MoreLikeThis<TLike>(params TLike[] like) =>
    MoreLikeThis((IEnumerable<TLike>)like);

which would make the BsonDocument overloads unnecessary and allow them to be removed.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

@BorisDog BorisDog requested a review from rstam January 17, 2023 18:43
Copy link
Contributor

@rstam rstam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Minor comments.

Still pending are some questions for @JamesKovacs, specially the one about the order of the parameters for GeoSearch.

var t when t == typeof(BsonDocument) => null,
var t when t == typeof(TDocument) => (IBsonSerializer<TLike>)documentSerializer,
_ => serializerRegistry.GetSerializer<TLike>()
};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a really good way to compute the likeSerializer.

Nice!


export ATLAS_SEARCH_TESTS_ENABLED=true

powershell.exe .\\build.ps1 --target TestAtlasSearch
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a reason you don't want to use the more standard --target=TestAtlasSearch form?

That's what the Cake documentation uses in its examples:

https://cakebuild.net/docs/writing-builds/running-targets#passing-a-target-to-the-script-1

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No reason. Copied this from other Atlas tests scripts. Changed.

/// </param>
/// <returns>A more like this search definition.</returns>
public SearchDefinition<TDocument> MoreLikeThis<TLike>(params TLike[] like) =>
new MoreLikeThisSearchDefinition<TDocument, TLike>(like);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Normally we delegate to a single method for the implementation:

        MoreLikeThis((IEnumerable<TLike>)like);

to avoid code duplication.

In this case the code duplication is small so if you don't want to delegate I won't insist.

I don't think we want to establish a new pattern of duplicating code, even when the duplication is minor.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, changed.

/// Gets or sets the document field which returned a match.
/// </summary>
[BsonElement("path")]
public string Path { get; private set; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These classes that appear to be invalid because they are impossible to instantiate concern me.

The invisible hackery that makes this work (for deserialization ONLY) is basically inscrutable to most users.

@JamesKovacs you want to venture an opinion?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can use the init modifier (by introducing the internal IsExternalInit)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a property is marked init, can a user of our driver create an instance of the class? I know that we could do new SearchHighlight { Path = "foo" }, but could someone write this in their own test?

While you shouldn't mock what you don't own, I could see benefit of being able to create known return values for testing purposes by end users.

Expression<Func<TDocument, object>> path,
int? maxCharsToExamine = null,
int? maxNumPassages = null) :
this(new ExpressionFieldDefinition<TDocument>(path), maxCharsToExamine, maxNumPassages)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested formatting:

    public SearchHighlightOptions(
        Expression<Func<TDocument, object>> path,
        int? maxCharsToExamine = null,
        int? maxNumPassages = null)
            : this(new ExpressionFieldDefinition<TDocument>(path), maxCharsToExamine, maxNumPassages)
    {
    }

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done.

#region static

private static readonly GeoJsonPolygon<GeoJson2DGeographicCoordinates> __testPolygon =
new(new(new(new GeoJson2DGeographicCoordinates[]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sticking to this unreadable hyper-shortened form?

I'll LGTM as-is if you insist but this is not readable!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, this is more readable :) !
The idea is to skip the multi-line boilerplate code, which does not provide any useful into for regular tests maintenance. The only important bit of information are the coordinates themselves.
This way the whole tests class is simpler and more readable.

In the rare case the geo object needs to be edited, it's can be easily done with VS intellisense. Seems that mouse hovering or pressing F12 is a reasonable price to pay.

Copy link
Contributor

@JamesKovacs JamesKovacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See my previous inline comments. To summarize:

  1. GeoShape param order should be path, relation, geometry.
  2. Using init for SearchHighlight and related classes sounds like a good solution as long as users of our driver can instantiate instances of those classes.

@rstam
Copy link
Contributor

rstam commented Jan 17, 2023

If using init on older TFs poses a risk we can just use public and be safe.

Sure we'd rather the class be immutable, but if the user mutates it, it can't hurt us.

@JamesKovacs
Copy link
Contributor

Summary of my init investigations. TL;DR: We can't use init.

The init magic is done by the compiler. You can use it on earlier TFMs using the IsExternalInit trick however you must use a compiler compatible with init. And this is where we run into problems...

I defined a library Lib with an init property. TFM is netstandard2.0 and LangVersion is 9.0. I then created a console app App and referenced Lib. App is netcoreapp3.1 with LangVersion 8.0. I instantiated an instance of the init-only containing class from Lib. When I compiled, I got the following error:

error CS8400: Feature 'init-only setters' is not available in C# 8.0. Please use language version 9.0 or greater.

If you change the LangVersion of App to 9.0, then everything works. The problem is that if we use init in our driver we will require our users to upgrade their toolchain to C# 9 or later. If they’re still using netcoreapp3.1 or earlier, they might not be able to upgrade their toolchain.

The safest option is to make the setters public on SearchHighlight and related so that users can instantiate their own instances for testing purposes.

/// <param name="path">document field which returned a match.</param>
/// <param name="score">Score assigned to this result.</param>
/// <param name="texts">Objects containing the matching text and the surrounding text.</param>
public SearchHighlight(string path, double score, SearchHighlightText[] texts)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Adopted the ctors approach. Thanks @rstam for making this work.

Copy link
Contributor

@JamesKovacs JamesKovacs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@BorisDog BorisDog merged commit 9189a58 into mongodb:master Jan 19, 2023
dnickless pushed a commit to dnickless/mongo-csharp-driver that referenced this pull request Aug 24, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants