Merge #87606 #93158
87606: sql: new heuristic-based completion engine r=rytaft,ZhouXing19 a=knz

This PR extends the server-side completion logic available under SHOW COMPLETIONS.

Fixes #37355.
Intended for use together with #86457.

The statement now returns 5 columns: completion, category, description, start, end.
This change is backward-compatible with the CLI code in previous versions, which did not check the number of columns.

It works roughly as follows:

1. first the input is scanned into a token array.

2. then the token array is translated to a *sketch*: a string with
   one character per input token, plus one extra character marking
   where the completion was requested.

3. then the completion rules are applied in sequence. Each rule
   inspects the sketch (and/or the token sequence) and decides whether
   to do nothing (skip to next rule), or to run some SQL which will
   return a batch of completion candidates.

   Each method operates exclusively on the token sequence, and does
   not require the overall SQL syntax to be valid or complete.

Also, the completion engine executes in a streaming fashion, so that
there is no need to accumulate all the completion candidates in RAM
before returning them to the client. This prevents excessive
memory usage server-side, and offers the client an option to cancel
the search once enough suggestions have been received.

Code organization:

- new package `sql/compengine`: the core completion engine, also
  where sketches are computed.

  Suggested reading: `api.go` in the new package.

- new package `sql/comprules`: the actual completion
  methods/heuristics, where sketches are mapped into SQL queries
  that provide the completion candidates.

  Suggested reading: the test cases under `testdata`.

Some more words about sketches:

For example, `SHOW COMPLETIONS AT OFFSET 10 FOR 'select myc from
sc.mytable;'` produces the sketch `"ii'ii.i;"` where `i` indicates an
identifier-like token and `'` indicates the cursor position.

The purpose of the sketch is to simplify the pattern matching
performed by the completion heuristics: it makes it possible to apply
regular expressions to the token sequence.

For example, a heuristic to complete schema-qualified relations in
the current database, or a database-qualified relation, would use
the regexp `[^.;]i\.(['_]|i')`: an identifier not following a
period or a semicolon, followed by a period, followed by the
completion cursor (with the cursor either directly after the period
or on an identifier after the period).
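
As a quick check of that regexp, the following Go snippet applies it to a few hand-written sketches using the standard `regexp` package. The pattern is copied from the text above; the `matchesQualifiedRelation` helper and the sample sketches are made up for illustration and are not taken from `sql/comprules`.

```go
package main

import (
	"fmt"
	"regexp"
)

// qualifiedRelRe is the pattern quoted in the description above.
var qualifiedRelRe = regexp.MustCompile(`[^.;]i\.(['_]|i')`)

// matchesQualifiedRelation reports whether the sketch looks like a
// completion request on the second part of a two-part name.
func matchesQualifiedRelation(sketch string) bool {
	return qualifiedRelRe.MatchString(sketch)
}

func main() {
	samples := []string{
		"iiii.'",   // cursor directly after the period
		"iiii.i'",  // cursor on an identifier after the period
		"ii'ii.i;", // cursor elsewhere: the example sketch from above
		"iiii.i.'", // the identifier before the cursor follows a period
	}
	for _, s := range samples {
		fmt.Printf("%-12q %v\n", s, matchesQualifiedRelation(s))
	}
}
```

The first two sketches match and the last two do not. Note that the leading `[^.;]` consumes one character, so the pattern needs at least one token before the qualifying identifier.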

Supersedes #45186.



93158: dev: fix bep temporary dir path r=rickystewart a=healthy-pod

Release note: None
Epic: none

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: healthy-pod <ahmad@cockroachlabs.com>
3 people committed Dec 6, 2022
3 parents a62d449 + 28317c5 + 20aa7da commit a0c086b
Showing 64 changed files with 3,869 additions and 323 deletions.
1 change: 1 addition & 0 deletions BUILD.bazel
@@ -119,6 +119,7 @@ exports_files([
# gazelle:exclude pkg/sql/lexbase/keywords.go
# gazelle:exclude pkg/sql/lexbase/tokens.go
# gazelle:exclude pkg/sql/lexbase/reserved_keywords.go
# gazelle:exclude pkg/sql/scanner/token_names_test.go
# gazelle:exclude pkg/sql/schemachanger/scexec/mocks_generated_test.go
# gazelle:exclude pkg/cmd/prereqs/testdata
# gazelle:exclude pkg/testutils/**/testdata/**
15 changes: 14 additions & 1 deletion Makefile
@@ -790,7 +790,8 @@ SQLPARSER_TARGETS = \
pkg/sql/parser/help_messages.go \
pkg/sql/lexbase/tokens.go \
pkg/sql/lexbase/keywords.go \
-pkg/sql/lexbase/reserved_keywords.go
+pkg/sql/lexbase/reserved_keywords.go \
+pkg/sql/scanner/token_names_test.go

PROTOBUF_TARGETS := bin/.go_protobuf_sources bin/.gw_protobuf_sources

@@ -1507,6 +1508,18 @@ pkg/sql/parser/gen/sql.go.tmp: pkg/sql/parser/gen/sql-gen.y bin/.bootstrap
echo "$$ret"; exit 1; \
fi

pkg/sql/scanner/token_names_test.go: pkg/sql/parser/gen/sql.go.tmp
(echo "// Code generated by make. DO NOT EDIT."; \
echo "// GENERATED FILE DO NOT EDIT"; \
echo; \
echo "package scanner"; \
echo; \
echo "var tokenNames = map[int]string{"; \
grep '^const [A-Z][_A-Z0-9]* ' $^ | \
awk '{printf("%d: \"%s\",\n", $$4, $$2)}' && \
echo "}" )> $@.tmp || rm $@.tmp
mv -f $@.tmp $@

# The lex package needs to know about all tokens, because the encode
# functions and lexing predicates need to know about keywords, and
# keywords map to the token constants. Therefore, generate the
2 changes: 1 addition & 1 deletion dev
@@ -8,7 +8,7 @@ fi
set -euo pipefail

# Bump this counter to force rebuilding `dev` on all machines.
-DEV_VERSION=63
+DEV_VERSION=64

THIS_DIR=$(cd "$(dirname "$0")" && pwd)
BINARY_DIR=$THIS_DIR/bin/dev-versions
10 changes: 8 additions & 2 deletions pkg/BUILD.bazel
@@ -345,14 +345,15 @@ ALL_TESTS = [
"//pkg/sql/colflow:colflow_disallowed_imports_test",
"//pkg/sql/colflow:colflow_test",
"//pkg/sql/colmem:colmem_test",
"//pkg/sql/compengine:compengine_test",
"//pkg/sql/comprules:comprules_test",
"//pkg/sql/contention/contentionutils:contentionutils_test",
"//pkg/sql/contention/txnidcache:txnidcache_test",
"//pkg/sql/contention:contention_test",
"//pkg/sql/contentionpb:contentionpb_test",
"//pkg/sql/copy:copy_test",
"//pkg/sql/covering:covering_test",
"//pkg/sql/decodeusername:decodeusername_test",
"//pkg/sql/delegate:delegate_test",
"//pkg/sql/descmetadata:descmetadata_test",
"//pkg/sql/distsql:distsql_test",
"//pkg/sql/doctor:doctor_test",
@@ -1486,6 +1487,10 @@ GO_TARGETS = [
"//pkg/sql/colflow:colflow_test",
"//pkg/sql/colmem:colmem",
"//pkg/sql/colmem:colmem_test",
"//pkg/sql/compengine:compengine",
"//pkg/sql/compengine:compengine_test",
"//pkg/sql/comprules:comprules",
"//pkg/sql/comprules:comprules_test",
"//pkg/sql/consistencychecker:consistencychecker",
"//pkg/sql/contention/contentionutils:contentionutils",
"//pkg/sql/contention/contentionutils:contentionutils_test",
@@ -1501,7 +1506,6 @@ GO_TARGETS = [
"//pkg/sql/decodeusername:decodeusername",
"//pkg/sql/decodeusername:decodeusername_test",
"//pkg/sql/delegate:delegate",
"//pkg/sql/delegate:delegate_test",
"//pkg/sql/descmetadata:descmetadata",
"//pkg/sql/descmetadata:descmetadata_test",
"//pkg/sql/distsql:distsql",
@@ -2694,6 +2698,8 @@ GET_X_DATA_TARGETS = [
"//pkg/sql/colflow:get_x_data",
"//pkg/sql/colflow/colrpc:get_x_data",
"//pkg/sql/colmem:get_x_data",
"//pkg/sql/compengine:get_x_data",
"//pkg/sql/comprules:get_x_data",
"//pkg/sql/consistencychecker:get_x_data",
"//pkg/sql/contention:get_x_data",
"//pkg/sql/contention/contentionutils:get_x_data",
5 changes: 5 additions & 0 deletions pkg/cli/clisqlshell/BUILD.bazel
@@ -66,12 +66,17 @@ go_test(
"//pkg/cli/clicfg",
"//pkg/cli/clisqlclient",
"//pkg/cli/clisqlexec",
"//pkg/security/securityassets",
"//pkg/security/securitytest",
"//pkg/security/username",
"//pkg/server",
"//pkg/sql/lexbase",
"//pkg/sql/scanner",
"//pkg/testutils/serverutils",
"//pkg/util/leaktest",
"//pkg/util/log",
"@com_github_cockroachdb_datadriven//:datadriven",
"@com_github_knz_bubbline//computil",
"@com_github_stretchr_testify//assert",
],
)
161 changes: 149 additions & 12 deletions pkg/cli/clisqlshell/complete.go
@@ -12,14 +12,67 @@ package clisqlshell

import (
"fmt"
"sort"
"strconv"
"strings"
"unicode/utf8"

"github.com/cockroachdb/cockroach/pkg/cli/clierror"
"github.com/knz/bubbline"
"github.com/knz/bubbline/computil"
"github.com/knz/bubbline/editline"
)

// completions is the interface between the shell and the bubbline
// completion infra.
type completions struct {
categories []string
compEntries map[string][]compCandidate
}

var _ bubbline.Completions = (*completions)(nil)

// NumCategories is part of the bubbline.Completions interface.
func (c *completions) NumCategories() int { return len(c.categories) }

// CategoryTitle is part of the bubbline.Completions interface.
func (c *completions) CategoryTitle(cIdx int) string { return c.categories[cIdx] }

// NumEntries is part of the bubbline.Completions interface.
func (c *completions) NumEntries(cIdx int) int { return len(c.compEntries[c.categories[cIdx]]) }

// Entry is part of the bubbline.Completions interface.
func (c *completions) Entry(cIdx, eIdx int) bubbline.Entry {
return &c.compEntries[c.categories[cIdx]][eIdx]
}

// Candidate is part of the bubbline.Completions interface.
func (c *completions) Candidate(e bubbline.Entry) bubbline.Candidate { return e.(*compCandidate) }

// compCandidate represents one completion candidate.
type compCandidate struct {
completion string
desc string
moveRight int
deleteLeft int
}

var _ bubbline.Entry = (*compCandidate)(nil)

// Title is part of the bubbline.Entry interface.
func (c *compCandidate) Title() string { return c.completion }

// Description is part of the bubbline.Entry interface.
func (c *compCandidate) Description() string { return c.desc }

// Replacement is part of the bubbline.Candidate interface.
func (c *compCandidate) Replacement() string { return c.completion }

// MoveRight is part of the bubbline.Candidate interface.
func (c *compCandidate) MoveRight() int { return c.moveRight }

// DeleteLeft is part of the bubbline.Candidate interface.
func (c *compCandidate) DeleteLeft() int { return c.deleteLeft }

// getCompletions implements the editline AutoComplete interface.
func (b *bubblineReader) getCompletions(
v [][]rune, line, col int,
@@ -29,11 +82,11 @@ func (b *bubblineReader) getCompletions(
return "", comps
}

-sql, offset := computil.Flatten(v, line, col)
+sql, boffset := computil.Flatten(v, line, col)

if col > 1 && v[line][col-1] == '?' && v[line][col-2] == '?' {
// This is a syntax check.
-sql = strings.TrimSuffix(sql[:offset], "\n")
+sql = strings.TrimSuffix(sql[:boffset], "\n")
helpText, err := b.sql.serverSideParse(sql)
if helpText != "" {
// We have a completion suggestion. Use that.
@@ -48,30 +101,76 @@ func (b *bubblineReader) getCompletions(
}

// TODO(knz): do not read all the rows - stop after a maximum.
-rows, err := b.sql.runShowCompletions(sql, offset)
+rows, err := b.sql.runShowCompletions(sql, boffset)
if err != nil {
var buf strings.Builder
clierror.OutputError(&buf, err, true /*showSeverity*/, false /*verbose*/)
msg = buf.String()
return msg, comps
}
// TODO(knz): Extend this logic once the advanced completion engine
// is finalized: https://github.com/cockroachdb/cockroach/pull/87606
candidates := make([]string, 0, len(rows))

compByCategory := make(map[string][]compCandidate)
roffset := runeOffset(v, line, col)
for _, row := range rows {
candidates = append(candidates, row[0])
c := compCandidate{completion: row[0]}
category := "completions"
if len(row) >= 5 {
// New-gen server-side completion engine.
category = row[1]
c.desc = row[2]
var err error
i, err := strconv.Atoi(row[3])
if err != nil {
var buf strings.Builder
clierror.OutputError(&buf, err, true /*showSeverity*/, false /*verbose*/)
msg = buf.String()
return msg, comps
}
j, err := strconv.Atoi(row[4])
if err != nil {
var buf strings.Builder
clierror.OutputError(&buf, err, true /*showSeverity*/, false /*verbose*/)
msg = buf.String()
return msg, comps
}
start := byteOffsetToRuneOffset(sql, roffset, boffset /* cursor */, i)
end := byteOffsetToRuneOffset(sql, roffset, boffset /* cursor */, j)
c.moveRight = end - roffset
c.deleteLeft = end - start
} else {
// Previous CockroachDB versions with only keyword completion.
// It does not return the start/end markers so we need to
// provide our own.
//
// TODO(knz): Delete this code when the previous completion code
// is not available any more.
_, start, end := computil.FindWord(v, line, col)
c.moveRight = end - roffset
c.deleteLeft = end - start
}
compByCategory[category] = append(compByCategory[category], c)
}

if len(candidates) > 0 {
_, wstart, wend := computil.FindWord(v, line, col)
comps = editline.SimpleWordsCompletion(candidates, "keywords", col, wstart, wend)
if len(compByCategory) == 0 {
return msg, comps
}

// TODO(knz): select an "interesting" category order,
// as recommended by andrei.
categories := make([]string, 0, len(compByCategory))
for k := range compByCategory {
categories = append(categories, k)
}
sort.Strings(categories)
comps = &completions{
categories: categories,
compEntries: compByCategory,
}
return msg, comps
}

// runeOffset converts the 2D rune cursor to a 1D offset from the
-// start. The result can be used by the byteOffsetToRuneOffset
+// start of the text. The result can be used by the byteOffsetToRuneOffset
// conversion function.
func runeOffset(v [][]rune, line, col int) int {
roffset := 0
@@ -92,3 +191,41 @@ func runeOffset(v [][]rune, line, col int) int {
}
return roffset
}

// byteOffsetToRuneOffset converts a byte offset into the SQL string,
// as produced by the SHOW COMPLETIONS statement, into a rune offset
// suitable for the bubble rune-based addressing. We use the cursor as
// a starting point to avoid scanning the SQL string from the beginning.
func byteOffsetToRuneOffset(sql string, runeCursor, byteCursor int, byteOffset int) int {
byteDistance := byteOffset - byteCursor
switch {
case byteDistance == 0:
return runeCursor

case byteDistance > 0:
// offset to the right of the cursor. Search forward.
result := runeCursor
s := sql[byteCursor:]
for i := range s {
if i >= byteDistance {
break
}
result++
}
return result

default:
// offset to the left of the cursor. Search backward.
result := runeCursor
s := sql[:byteCursor]
for {
if len(s) == 0 || len(s)-byteCursor <= byteDistance {
break
}
_, sz := utf8.DecodeLastRuneInString(s)
s = s[:len(s)-sz]
result--
}
return result
}
}
