Implement CustomRegex detector #950

Merged
merged 9 commits into from Dec 14, 2022

mcastorina
Collaborator

This detector uses webhooks for verification.

@mcastorina mcastorina marked this pull request as ready for review December 5, 2022 22:13
@mcastorina mcastorina requested a review from a team as a code owner December 5, 2022 22:13
@mcastorina mcastorina force-pushed the thog-849-custom-regex-webhook-detector branch from 4b99b42 to 7bd4245 Compare December 8, 2022 17:01
Collaborator

@ahrav ahrav left a comment

This is gonna be a huge win for the product. Nice job!

type customRegex *custom_detectorspb.CustomRegex
// The maximum number of matches from one chunk. This const is used when
// permutating each regex match to protect the scanner from doing too much work
// for poorly defined regexps.
Collaborator

Very useful comment 🙌

func ValidateVerifyEndpoint(endpoint string, unsafe bool) error {
if len(endpoint) == 0 {
return fmt.Errorf("no endpoint")
func NewWebhookCustomRegex(pb *custom_detectorspb.CustomRegex) (*customRegexWebhook, error) {
Collaborator

question: Is this used anywhere? I don't see any references to it. Also, it looks like we are returning an unexported type from an exported function. Is that on purpose, or could we export both?

Collaborator Author

It's not used anywhere yet. The intention is to use it in the engine once the functionality is implemented here.

I intentionally returned an unexported type from the exported function to control initialization. The idea is that if a value of that type exists, its fields must have been validated.

Collaborator Author

I'll add some comments to make that clear for the function.
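
For readers unfamiliar with the pattern discussed in this thread, here is a minimal sketch of an exported constructor returning an unexported type. The names `validatedRegex`, `NewValidatedRegex`, and `Match` are hypothetical and chosen for illustration; they are not the types in this PR.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

// validatedRegex is unexported on purpose: code outside this package cannot
// name the type or build one with a struct literal, so every value it sees
// has passed through NewValidatedRegex. (Hypothetical type, not the PR's.)
type validatedRegex struct {
	name string
	re   *regexp.Regexp
}

// NewValidatedRegex is the only way for other packages to obtain a
// *validatedRegex. It rejects empty names and patterns that do not compile.
func NewValidatedRegex(name, pattern string) (*validatedRegex, error) {
	if name == "" {
		return nil, errors.New("no name")
	}
	re, err := regexp.Compile(pattern)
	if err != nil {
		return nil, fmt.Errorf("invalid pattern %q: %w", pattern, err)
	}
	return &validatedRegex{name: name, re: re}, nil
}

// Match is exported so callers can still use the value they were handed.
func (v *validatedRegex) Match(s string) bool { return v.re.MatchString(s) }

func main() {
	d, err := NewValidatedRegex("demo", `id_[0-9]+`)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(d.Match("id_123"))
}
```

The trade-off is that callers in other packages cannot name the type in their own function signatures or struct fields, which is why some linters flag this pattern.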

// productIndices produces a permutation of indices for each length. Example:
// productIndices(3, 2) -> [[0 0] [1 0] [2 0] [0 1] [1 1] [2 1]]. It returns
// a slice of length no larger than maxTotalMatches.
func productIndices(lengths ...int) [][]int {
Collaborator

suggestion: If we do some precomputations up front we get a nice little speed up.

// Find the max length and total number of permutations.
	maxLength := 0
	t := 1
	for _, l := range lengths {
		t *= l
		if l > maxLength {
			maxLength = l
		}
	}
	result := make([][]int, 0, t)
	for r := range result {
		result[r] = make([]int, 0, maxLength)
	}

	for _, length := range lengths {
		var nextResult [][]int
		for i := 0; i < length; i++ {
			// Append index to all existing results.
			for _, curResult := range result {
				nextResult = append(nextResult, append(curResult, i))
				if len(nextResult) >= maxTotalMatches {
					return nextResult
				}
			}
		}
		result = nextResult
	}
	return result

Using:

func BenchmarkProductIndices(b *testing.B) {
	for i := 0; i < b.N; i++ {
		_ = productIndices(3, 2, 6)
	}
}

[Two screenshots of benchmark output; images not preserved]

With just 2,3,2 as the input:
[Two screenshots of benchmark output; images not preserved]

Collaborator Author

Nice, that's way better!

Collaborator Author

I ended up implementing a different algorithm that's a bit faster and still passes the tests:

» go test -bench . ./pkg/custom_detectors
BenchmarkProductIndices-8         512116              2325 ns/op

» go test -bench . ./pkg/custom_detectors
BenchmarkProductIndices-8        1355827               890.8 ns/op

I think the 126 ns/op you were getting was because it was always returning an empty result (which also explains the 1 alloc).

@mcastorina mcastorina merged commit 861ad05 into main Dec 14, 2022
@mcastorina mcastorina deleted the thog-849-custom-regex-webhook-detector branch December 14, 2022 16:26
javajawa added a commit to mewbotorg/mewbot that referenced this pull request Dec 26, 2022
Update trufflehog from 3.19.0 to 3.21.0.

- Bump github.com/xanzy/go-gitlab from 0.76.0 to 0.77.0 by @​dependabot in trufflesecurity/trufflehog#981
- Bump golang.org/x/crypto from 0.3.0 to 0.4.0 by @​dependabot in trufflesecurity/trufflehog#982
- Add configuration parsing and custom detectors to engine by @​mcastorina in trufflesecurity/trufflehog#968
- Add custom regex detector docs by @​mcastorina in trufflesecurity/trufflehog#983
- Remove custom log leveler by @​mcastorina in trufflesecurity/trufflehog#985
- Bump github.com/xanzy/go-gitlab from 0.74.0 to 0.76.0 by @​dependabot in trufflesecurity/trufflehog#934
- Bump github.com/bill-rich/disk-buffer-reader from v0.1.6 to v0.1.7 by @​bill-rich in trufflesecurity/trufflehog#970
- Bump go.mongodb.org/mongo-driver from 1.11.0 to 1.11.1 by @​dependabot in trufflesecurity/trufflehog#971
- Bump github.com/getsentry/sentry-go from 0.15.0 to 0.16.0 by @​dependabot in trufflesecurity/trufflehog#973
- [bug] - Handle error when scanning s3 bucket. by @​ahrav in trufflesecurity/trufflehog#969
- Bump github.com/go-git/go-git/v5 from 5.4.2 to 5.5.1 by @​dependabot in trufflesecurity/trufflehog#972
- Bump github.com/envoyproxy/protoc-gen-validate from 0.6.13 to 0.9.1 by @​dependabot in trufflesecurity/trufflehog#963
- Add more logging for git sources by @​mcastorina in trufflesecurity/trufflehog#974
- Add s3 object count to trace logs by @​bill-rich in trufflesecurity/trufflehog#975
- Implement CustomRegex detector by @​mcastorina in trufflesecurity/trufflehog#950
- Use Todoist's REST API v2 by @​goncalossilva in trufflesecurity/trufflehog#978
- Allow using a glob for include list. by @​ahrav in trufflesecurity/trufflehog#977