diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 2813a1676..38e2f8d11 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -11,7 +11,7 @@ on: # `schedule` event. By specifying any permission explicitly all others are set # to none. By using the principle of least privilege the damage a compromised # workflow can do (because of an injection or compromised third party tool or -# action) is restricted. Currently the worklow doesn't need any additional +# action) is restricted. Currently, the workflow doesn't need any additional # permission except for pulling the code. Adding labels to issues, commenting # on pull-requests, etc. may need additional permissions: # diff --git a/CHANGELOG.md b/CHANGELOG.md index b5f31bec0..819a9b5df 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -25,7 +25,7 @@ The new word boundary assertions are: * `\<` or `\b{start}`: a Unicode start-of-word boundary (`\W|\A` on the left, `\w` on the right). * `\>` or `\b{end}`: a Unicode end-of-word boundary (`\w` on the left, `\W|\z` -on the right)). +on the right). * `\b{start-half}`: half of a Unicode start-of-word boundary (`\W|\A` on the left). * `\b{end-half}`: half of a Unicode end-of-word boundary (`\W|\z` on the @@ -139,7 +139,7 @@ Bug fixes: * [BUG #934](https://github.com/rust-lang/regex/issues/934): Fix a performance bug where high contention on a single regex led to massive -slow downs. +slow-downs. 1.9.4 (2023-08-26) @@ -382,14 +382,14 @@ New features: Permit many more characters to be escaped, even if they have no significance. More specifically, any ASCII character except for `[0-9A-Za-z<>]` can now be escaped. Also, a new routine, `is_escapeable_character`, has been added to -`regex-syntax` to query whether a character is escapeable or not. +`regex-syntax` to query whether a character is escapable or not. * [FEATURE #547](https://github.com/rust-lang/regex/issues/547): Add `Regex::captures_at`. This fills a hole in the API, but doesn't otherwise introduce any new expressive power. * [FEATURE #595](https://github.com/rust-lang/regex/issues/595): Capture group names are now Unicode-aware. They can now begin with either a `_` or any "alphabetic" codepoint. After the first codepoint, subsequent codepoints -can be any sequence of alpha-numeric codepoints, along with `_`, `.`, `[` and +can be any sequence of alphanumeric codepoints, along with `_`, `.`, `[` and `]`. Note that replacement syntax has not changed. * [FEATURE #810](https://github.com/rust-lang/regex/issues/810): Add `Match::is_empty` and `Match::len` APIs. @@ -433,7 +433,7 @@ Fix a number of issues with printing `Hir` values as regex patterns. * [BUG #610](https://github.com/rust-lang/regex/issues/610): Add explicit example of `foo|bar` in the regex syntax docs. * [BUG #625](https://github.com/rust-lang/regex/issues/625): -Clarify that `SetMatches::len` does not (regretably) refer to the number of +Clarify that `SetMatches::len` does not (regrettably) refer to the number of matches in the set. * [BUG #660](https://github.com/rust-lang/regex/issues/660): Clarify "verbose mode" in regex syntax documentation. @@ -820,7 +820,7 @@ Bug fixes: 1.3.1 (2019-09-04) ================== -This is a maintenance release with no changes in order to try to work-around +This is a maintenance release with no changes in order to try to work around a [docs.rs/Cargo issue](https://github.com/rust-lang/docs.rs/issues/400). @@ -855,7 +855,7 @@ This release does a bit of house cleaning. Namely: Rust project. 
* Teddy has been removed from the `regex` crate, and is now part of the `aho-corasick` crate. - [See `aho-corasick`'s new `packed` sub-module for details](https://docs.rs/aho-corasick/0.7.6/aho_corasick/packed/index.html). + [See `aho-corasick`'s new `packed` submodule for details](https://docs.rs/aho-corasick/0.7.6/aho_corasick/packed/index.html). * The `utf8-ranges` crate has been deprecated, with its functionality moving into the [`utf8` sub-module of `regex-syntax`](https://docs.rs/regex-syntax/0.6.11/regex_syntax/utf8/index.html). @@ -863,7 +863,7 @@ This release does a bit of house cleaning. Namely: little we need inside of `regex-syntax` itself. In general, this is part of an ongoing (long term) effort to make optimizations -in the regex engine easier to reason about. The current code is too convoluted +in the regex engine easier to reason about. The current code is too convoluted, and thus it is very easy to introduce new bugs. This simplification effort is the primary motivation behind re-working the `aho-corasick` crate to not only bundle algorithms like Teddy, but to also provide regex-like match semantics @@ -1065,7 +1065,7 @@ need or want to use these APIs. New features: * [FEATURE #493](https://github.com/rust-lang/regex/pull/493): - Add a few lower level APIs for amortizing allocation and more fine grained + Add a few lower level APIs for amortizing allocation and more fine-grained searching. Bug fixes: @@ -1111,7 +1111,7 @@ of the regex library should be able to migrate to 1.0 by simply bumping the version number. The important changes are as follows: * We adopt Rust 1.20 as the new minimum supported version of Rust for regex. - We also tentativley adopt a policy that permits bumping the minimum supported + We also tentatively adopt a policy that permits bumping the minimum supported version of Rust in minor version releases of regex, but no patch releases. That is, with respect to semver, we do not strictly consider bumping the minimum version of Rust to be a breaking change, but adopt a conservative @@ -1198,7 +1198,7 @@ Bug fixes: 0.2.8 (2018-03-12) ================== -Bug gixes: +Bug fixes: * [BUG #454](https://github.com/rust-lang/regex/pull/454): Fix a bug in the nest limit checker being too aggressive. @@ -1219,7 +1219,7 @@ New features: * Full support for intersection, difference and symmetric difference of character classes. These can be used via the `&&`, `--` and `~~` binary operators within classes. -* A Unicode Level 1 conformat implementation of `\p{..}` character classes. +* A Unicode Level 1 conformant implementation of `\p{..}` character classes. Things like `\p{scx:Hira}`, `\p{age:3.2}` or `\p{Changes_When_Casefolded}` now work. All property name and value aliases are supported, and properties are selected via loose matching. e.g., `\p{Greek}` is the same as @@ -1342,7 +1342,7 @@ Bug fixes: 0.2.1 ===== One major bug with `replace_all` has been fixed along with a couple of other -touchups. +touch-ups. * [BUG #312](https://github.com/rust-lang/regex/issues/312): Fix documentation for `NoExpand` to reference correct lifetime parameter. @@ -1491,7 +1491,7 @@ A number of bugs have been fixed: * Fix bug #277. * [PR #270](https://github.com/rust-lang/regex/pull/270): Fixes bugs #264, #268 and an unreported bug where the DFA cache size could be - drastically under estimated in some cases (leading to high unexpected memory + drastically underestimated in some cases (leading to high unexpected memory usage). 
0.1.73 diff --git a/README.md b/README.md index f1e4c404a..dffb86e35 100644 --- a/README.md +++ b/README.md @@ -171,7 +171,7 @@ assert!(matches.matched(6)); ### Usage: regex internals as a library The [`regex-automata` directory](./regex-automata/) contains a crate that -exposes all of the internal matching engines used by the `regex` crate. The +exposes all the internal matching engines used by the `regex` crate. The idea is that the `regex` crate exposes a simple API for 99% of use cases, but `regex-automata` exposes oodles of customizable behaviors. @@ -192,7 +192,7 @@ recommended for general use. ### Crate features -This crate comes with several features that permit tweaking the trade off +This crate comes with several features that permit tweaking the trade-off between binary size, compilation time and runtime performance. Users of this crate can selectively disable Unicode tables, or choose from a variety of optimizations performed by this crate to disable. @@ -230,7 +230,7 @@ searches are "fast" in practice. While the first interpretation is pretty unambiguous, the second one remains nebulous. While nebulous, it guides this crate's architecture and the sorts of -the trade offs it makes. For example, here are some general architectural +the trade-offs it makes. For example, here are some general architectural statements that follow as a result of the goal to be "fast": * When given the choice between faster regex searches and faster _Rust compile diff --git a/UNICODE.md b/UNICODE.md index 60db0aad1..2b62567f1 100644 --- a/UNICODE.md +++ b/UNICODE.md @@ -207,21 +207,21 @@ Finally, Unicode word boundaries can be disabled, which will cause ASCII word boundaries to be used instead. That is, `\b` is a Unicode word boundary while `(?-u)\b` is an ASCII-only word boundary. This can occasionally be beneficial if performance is important, since the implementation of Unicode word -boundaries is currently sub-optimal on non-ASCII text. +boundaries is currently suboptimal on non-ASCII text. ## RL1.5 Simple Loose Matches [UTS#18 RL1.5](https://unicode.org/reports/tr18/#Simple_Loose_Matches) -The regex crate provides full support for case insensitive matching in +The regex crate provides full support for case-insensitive matching in accordance with RL1.5. That is, it uses the "simple" case folding mapping. The "simple" mapping was chosen because of a key convenient property: every "simple" mapping is a mapping from exactly one code point to exactly one other -code point. This makes case insensitive matching of character classes, for +code point. This makes case-insensitive matching of character classes, for example, straightforward to implement. -When case insensitive mode is enabled (e.g., `(?i)[a]` is equivalent to `a|A`), +When case-insensitive mode is enabled (e.g., `(?i)[a]` is equivalent to `a|A`), then all character classes are case folded as well. @@ -248,7 +248,7 @@ Given Rust's strong ties to UTF-8, the following guarantees are also provided: * All matches are reported on valid UTF-8 code unit boundaries. That is, any match range returned by the public regex API is guaranteed to successfully slice the string that was searched. -* By consequence of the above, it is impossible to match surrogode code points. +* By consequence of the above, it is impossible to match surrogate code points. No support for UTF-16 is provided, so this is never necessary. 
Note that when Unicode mode is disabled, the fundamental atom of matching is diff --git a/record/compile-test/README.md b/record/compile-test/README.md index 7291d5d37..1afd992f1 100644 --- a/record/compile-test/README.md +++ b/record/compile-test/README.md @@ -1,5 +1,5 @@ This directory contains the results of compilation tests. Specifically, -the results are from testing both the from scratch compilation time and +the results are from testing both the from-scratch compilation time and relative binary size increases of various features for both the `regex` and `regex-automata` crates. diff --git a/regex-automata/README.md b/regex-automata/README.md index c12b07012..cb6e86c9f 100644 --- a/regex-automata/README.md +++ b/regex-automata/README.md @@ -66,7 +66,7 @@ Below is an outline of how `unsafe` is used in this crate. * `util::pool::Pool` makes use of `unsafe` to implement a fast path for accessing an element of the pool. The fast path applies to the first thread -that uses the pool. In effect, the fast path is fast because it avoid a mutex +that uses the pool. In effect, the fast path is fast because it avoids a mutex lock. `unsafe` is also used in the no-std version of `Pool` to implement a spin lock for synchronization. * `util::lazy::Lazy` uses `unsafe` to implement a variant of @@ -112,6 +112,6 @@ In the end, I do still somewhat consider this crate an experiment. It is unclear whether the strong boundaries between components will be an impediment to ongoing development or not. De-coupling tends to lead to slower development in my experience, and when you mix in the added cost of not introducing -breaking changes all of the time, things can get quite complicated. But, I +breaking changes all the time, things can get quite complicated. But, I don't think anyone has ever released the internals of a regex engine as a library before. So it will be interesting to see how it plays out! diff --git a/regex-automata/src/dfa/automaton.rs b/regex-automata/src/dfa/automaton.rs index fcfcf2997..189700d83 100644 --- a/regex-automata/src/dfa/automaton.rs +++ b/regex-automata/src/dfa/automaton.rs @@ -2202,7 +2202,7 @@ where /// /// Specifically, this tries to succinctly distinguish the different types of /// states: dead states, quit states, accelerated states, start states and -/// match states. It even accounts for the possible overlappings of different +/// match states. It even accounts for the possible overlapping of different /// state types. pub(crate) fn fmt_state_indicator( f: &mut core::fmt::Formatter<'_>, diff --git a/regex-automata/src/dfa/dense.rs b/regex-automata/src/dfa/dense.rs index fd96bc878..0a384cf72 100644 --- a/regex-automata/src/dfa/dense.rs +++ b/regex-automata/src/dfa/dense.rs @@ -2810,7 +2810,7 @@ impl OwnedDFA { } // Collect all our non-DEAD start states into a convenient set and - // confirm there is no overlap with match states. In the classicl DFA + // confirm there is no overlap with match states. In the classical DFA // construction, start states can be match states. But because of // look-around, we delay all matches by a byte, which prevents start // states from being match states. @@ -3461,7 +3461,7 @@ impl TransitionTable<Vec<u32>> { // Normally, to get a fresh state identifier, we would just // take the index of the next state added to the transition // table. 
However, we actually perform an optimization here - // that premultiplies state IDs by the stride, such that they + // that pre-multiplies state IDs by the stride, such that they // point immediately at the beginning of their transitions in // the transition table. This avoids an extra multiplication // instruction for state lookup at search time. @@ -4509,7 +4509,7 @@ impl<T: AsRef<[u32]>> MatchStates<T> { + (self.pattern_ids().len() * PatternID::SIZE) } - /// Valides that the match state info is itself internally consistent and + /// Validates that the match state info is itself internally consistent and /// consistent with the recorded match state region in the given DFA. fn validate(&self, dfa: &DFA<T>) -> Result<(), DeserializeError> { if self.len() != dfa.special.match_len(dfa.stride()) { @@ -4767,7 +4767,7 @@ impl<'a, T: AsRef<[u32]>> Iterator for StateIter<'a, T> { /// An immutable representation of a single DFA state. /// -/// `'a` correspondings to the lifetime of a DFA's transition table. +/// `'a` corresponds to the lifetime of a DFA's transition table. pub(crate) struct State<'a> { id: StateID, stride2: usize, diff --git a/regex-automata/src/dfa/determinize.rs b/regex-automata/src/dfa/determinize.rs index 19f99f5d6..7a49c2453 100644 --- a/regex-automata/src/dfa/determinize.rs +++ b/regex-automata/src/dfa/determinize.rs @@ -466,7 +466,7 @@ impl<'a> Runner<'a> { ) -> Result<(StateID, bool), BuildError> { // Compute the look-behind assertions that are true in this starting // configuration, and then determine the epsilon closure. While - // computing the epsilon closure, we only follow condiional epsilon + // computing the epsilon closure, we only follow conditional epsilon // transitions that satisfy the look-behind assertions in 'look_have'. let mut builder_matches = self.get_state_builder().into_matches(); util::determinize::set_lookbehind_from_start( diff --git a/regex-automata/src/dfa/mod.rs b/regex-automata/src/dfa/mod.rs index fd58cac23..b8c292c7b 100644 --- a/regex-automata/src/dfa/mod.rs +++ b/regex-automata/src/dfa/mod.rs @@ -271,7 +271,7 @@ memory.) Conversely, compiling the same regex without Unicode support, e.g., `(?-u)\w{50}`, takes under 1 millisecond and about 15KB of memory. For this reason, you should only use Unicode character classes if you absolutely need them! (They are enabled by default though.) -* This module does not support Unicode word boundaries. ASCII word bondaries +* This module does not support Unicode word boundaries. ASCII word boundaries may be used though by disabling Unicode or selectively doing so in the syntax, e.g., `(?-u:\b)`. There is also an option to [heuristically enable Unicode word boundaries](crate::dfa::dense::Config::unicode_word_boundary), diff --git a/regex-automata/src/dfa/onepass.rs b/regex-automata/src/dfa/onepass.rs index e62bbd383..c1a1f5c5b 100644 --- a/regex-automata/src/dfa/onepass.rs +++ b/regex-automata/src/dfa/onepass.rs @@ -926,7 +926,7 @@ impl<'a> InternalBuilder<'a> { /// /// A one-pass DFA can be built from an NFA that is one-pass. An NFA is /// one-pass when there is never any ambiguity about how to continue a search. -/// For example, `a*a` is not one-pass becuase during a search, it's not +/// For example, `a*a` is not one-pass because during a search, it's not /// possible to know whether to continue matching the `a*` or to move on to /// the single `a`. However, `a*b` is one-pass, because for every byte in the /// input, it's always clear when to move on from `a*` to `b`. 
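The one-pass property described in the `onepass.rs` hunk above is directly observable from the public API. Below is a minimal sketch, assuming a recent `regex-automata` with its default features (which include the one-pass DFA); exact signatures may differ by version:

```rust
use regex_automata::dfa::onepass::DFA;

fn main() {
    // `a*a` is ambiguous: on seeing an 'a', the search cannot know whether
    // to keep matching `a*` or to move on to the final `a`. The one-pass
    // construction therefore rejects it.
    assert!(DFA::new(r"a*a").is_err());

    // `a*b` is one-pass: every input byte admits exactly one next step.
    let re = DFA::new(r"a*b").unwrap();
    let mut cache = re.create_cache();
    let mut caps = re.create_captures();
    re.captures(&mut cache, "aaab", &mut caps);
    assert!(caps.is_match());
}
```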
diff --git a/regex-automata/src/dfa/special.rs b/regex-automata/src/dfa/special.rs index a831df5c5..197323116 100644 --- a/regex-automata/src/dfa/special.rs +++ b/regex-automata/src/dfa/special.rs @@ -43,7 +43,7 @@ macro_rules! err { // some other match state, even when searching an empty string.) // // These are not mutually exclusive categories. Namely, the following -// overlappings can occur: +// overlaps can occur: // // * {dead, start} - If a DFA can never lead to a match and it is minimized, // then it will typically compile to something where all starting IDs point @@ -62,7 +62,7 @@ macro_rules! err { // though from the perspective of the DFA, they are equivalent. (Indeed, // minimization special cases them to ensure they don't get merged.) The // purpose of keeping them distinct is to use the quit state as a sentinel to -// distguish between whether a search finished successfully without finding +// distinguish between whether a search finished successfully without finding // anything or whether it gave up before finishing. // // So the main problem we want to solve here is the *fast* detection of whether diff --git a/regex-automata/src/hybrid/dfa.rs b/regex-automata/src/hybrid/dfa.rs index bd9179b19..92956911f 100644 --- a/regex-automata/src/hybrid/dfa.rs +++ b/regex-automata/src/hybrid/dfa.rs @@ -1247,7 +1247,7 @@ impl DFA { /// the unknown transition. Otherwise, trying to use the "unknown" state /// ID will just result in transitioning back to itself, and thus never /// terminating. (This is technically a special exemption to the state ID - /// validity rules, but is permissible since this routine is guarateed to + /// validity rules, but is permissible since this routine is guaranteed to /// never mutate the given `cache`, and thus the identifier is guaranteed /// to remain valid.) /// @@ -1371,7 +1371,7 @@ impl DFA { /// the unknown transition. Otherwise, trying to use the "unknown" state /// ID will just result in transitioning back to itself, and thus never /// terminating. (This is technically a special exemption to the state ID - /// validity rules, but is permissible since this routine is guarateed to + /// validity rules, but is permissible since this routine is guaranteed to /// never mutate the given `cache`, and thus the identifier is guaranteed /// to remain valid.) /// @@ -1857,7 +1857,7 @@ pub struct Cache { bytes_searched: usize, /// The progress of the current search. /// - /// This is only non-`None` when callers utlize the `Cache::search_start`, + /// This is only non-`None` when callers utilize the `Cache::search_start`, /// `Cache::search_update` and `Cache::search_finish` APIs. /// /// The purpose of recording search progress is to be able to make a diff --git a/regex-automata/src/hybrid/id.rs b/regex-automata/src/hybrid/id.rs index 662e3c98f..43d5b5ba0 100644 --- a/regex-automata/src/hybrid/id.rs +++ b/regex-automata/src/hybrid/id.rs @@ -30,7 +30,7 @@ /// setting for start states to be tagged. The reason for this is /// that a DFA search loop is usually written to execute a prefilter once it /// enters a start state. But if there is no prefilter, this handling can be -/// quite diastrous as the DFA may ping-pong between the special handling code +/// quite disastrous as the DFA may ping-pong between the special handling code /// and a possible optimized hot path for handling untagged states. When start /// states aren't specialized, then they are untagged and remain in the hot /// path. 
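The `special.rs` comments above all serve one goal: making "is this state special?" a single comparison on the hot search loop. A simplified sketch of the contiguous-ID-range idea follows; the types and fields here are hypothetical, and the crate's real bookkeeping tracks more sub-ranges (e.g., for accelerated and start states):

```rust
// If all special states are assigned the smallest IDs, then a single
// comparison filters them out of the hot path entirely.
#[derive(Clone, Copy, PartialEq, PartialOrd)]
struct StateID(u32);

struct Special {
    // The largest ID of any special (dead/quit/match/start/accel) state.
    max_special: StateID,
    // The sub-range of IDs assigned to match states.
    min_match: StateID,
    max_match: StateID,
}

impl Special {
    // The only check paid on every transition.
    #[inline]
    fn is_special(&self, id: StateID) -> bool {
        id <= self.max_special
    }

    // Consulted only after `is_special` returns true.
    #[inline]
    fn is_match(&self, id: StateID) -> bool {
        self.min_match <= id && id <= self.max_match
    }
}

fn main() {
    let special = Special {
        max_special: StateID(5),
        min_match: StateID(2),
        max_match: StateID(4),
    };
    assert!(special.is_special(StateID(3)) && special.is_match(StateID(3)));
    assert!(!special.is_special(StateID(9)));
}
```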
diff --git a/regex-automata/src/meta/wrappers.rs b/regex-automata/src/meta/wrappers.rs index 6cb19ba0d..95d0e07b1 100644 --- a/regex-automata/src/meta/wrappers.rs +++ b/regex-automata/src/meta/wrappers.rs @@ -383,7 +383,7 @@ impl OnePassEngine { // that we either have at least one explicit capturing group or // there's a Unicode word boundary somewhere. If we don't have // either of these things, then the lazy DFA will almost certainly - // be useable and be much faster. The only case where it might + // be usable and be much faster. The only case where it might // not is if the lazy DFA isn't utilizing its cache effectively, // but in those cases, the underlying regex is almost certainly // not one-pass or is too big to fit within the current one-pass @@ -886,7 +886,7 @@ impl DFAEngine { // Enabling this is necessary for ensuring we can service any // kind of 'Input' search without error. For the full DFA, this // can be quite costly. But since we have such a small bound - // on the size of the DFA, in practice, any multl-regexes are + // on the size of the DFA, in practice, any multi-regexes are // probably going to blow the limit anyway. .starts_for_each_pattern(true) .byte_classes(info.config().get_byte_classes()) diff --git a/regex-automata/src/nfa/mod.rs b/regex-automata/src/nfa/mod.rs index 0c36f598a..d52f163c7 100644 --- a/regex-automata/src/nfa/mod.rs +++ b/regex-automata/src/nfa/mod.rs @@ -13,7 +13,7 @@ dense representations use more memory, but are faster to traverse. (Sometimes these lines are blurred. For example, an `NFA` might choose to represent a particular state in a dense fashion, and a DFA can be built using a sparse representation via [`sparse::DFA`](crate::dfa::sparse::DFA). -* NFAs have espilon transitions and DFAs don't. In practice, this means that +* NFAs have epsilon transitions and DFAs don't. In practice, this means that handling a single byte in a haystack with an NFA at search time may require visiting multiple NFA states. In a DFA, each byte only requires visiting a single state. Stated differently, NFAs require a variable number of CPU diff --git a/regex-automata/src/nfa/thompson/compiler.rs b/regex-automata/src/nfa/thompson/compiler.rs index 2d2172957..8ccea089d 100644 --- a/regex-automata/src/nfa/thompson/compiler.rs +++ b/regex-automata/src/nfa/thompson/compiler.rs @@ -1039,7 +1039,7 @@ impl Compiler { /// Compile an alternation of the given HIR values. /// /// This is like 'c_alt_iter', but it accepts a slice of HIR values instead - /// of an iterator of compiled NFA subgraphs. The point of accepting a + /// of an iterator of compiled NFA sub-graphs. The point of accepting a /// slice here is that it opens up some optimization opportunities. For /// example, if all of the HIR values are literals, then this routine might /// re-shuffle them to make NFA epsilon closures substantially faster. @@ -1498,7 +1498,7 @@ impl Compiler { /// /// A more comprehensive compression scheme can be accomplished by using /// a range trie to efficiently sort a reverse sequence of UTF-8 byte - /// rqanges, and then use Daciuk's algorithm via `Utf8Compiler`. + /// ranges, and then use Daciuk's algorithm via `Utf8Compiler`. /// /// This is the technique used when "NFA shrinking" is disabled. 
/// @@ -1700,7 +1700,7 @@ pub(crate) struct ThompsonRef { pub(crate) end: StateID, } -/// A UTF-8 compiler based on Daciuk's algorithm for compilining minimal DFAs +/// A UTF-8 compiler based on Daciuk's algorithm for compiling minimal DFAs /// from a lexicographically sorted sequence of strings in linear time. /// /// The trick here is that any Unicode codepoint range can be converted to diff --git a/regex-automata/src/nfa/thompson/error.rs b/regex-automata/src/nfa/thompson/error.rs index 3c2fa8a21..e29006586 100644 --- a/regex-automata/src/nfa/thompson/error.rs +++ b/regex-automata/src/nfa/thompson/error.rs @@ -13,7 +13,7 @@ use crate::util::{ /// method via the `std::error::Error` trait. This error only occurs when using /// convenience routines for building an NFA directly from a pattern string. /// -/// Otherwise, errors typically occur when a limit has been breeched. For +/// Otherwise, errors typically occur when a limit has been breached. For /// example, if the total heap usage of the compiled NFA exceeds the limit /// set by [`Config::nfa_size_limit`](crate::nfa::thompson::Config), then /// building the NFA will fail. diff --git a/regex-automata/src/nfa/thompson/mod.rs b/regex-automata/src/nfa/thompson/mod.rs index cf426736d..dc7effef1 100644 --- a/regex-automata/src/nfa/thompson/mod.rs +++ b/regex-automata/src/nfa/thompson/mod.rs @@ -32,7 +32,7 @@ It is perhaps worth expanding a bit more on what it means to go through the Crucially, the size and amount of work done in this step is proportional to the size of the original string. No optimization or Unicode handling is done at this point. This means that parsing into an `Ast` has very predictable costs. -Moreover, an `Ast` can be roundtripped back to its original pattern string as +Moreover, an `Ast` can be round-tripped back to its original pattern string as written. * Translating an `Ast` into an `Hir` is a process by which the structured representation is simplified down to its most fundamental components. diff --git a/regex-automata/src/nfa/thompson/nfa.rs b/regex-automata/src/nfa/thompson/nfa.rs index 1f57f8ebd..faee161fd 100644 --- a/regex-automata/src/nfa/thompson/nfa.rs +++ b/regex-automata/src/nfa/thompson/nfa.rs @@ -679,7 +679,7 @@ impl NFA { /// use regex_automata::{nfa::thompson::NFA, PatternID}; /// /// let nfa = NFA::new(r"(a)(?P<foo>b)(c)(d)(?P<bar>e)")?; - /// // The first is the implicit group that is always unnammed. The next + /// // The first is the implicit group that is always unnamed. The next /// // 5 groups are the explicit groups found in the concrete syntax above. /// let expected = vec![None, None, Some("foo"), None, None, Some("bar")]; /// let got: Vec<Option<&str>> = @@ -1302,7 +1302,7 @@ impl Inner { // I left the 'Dense' state type in place in case we // want to revisit this, but I suspect the real way // to make forward progress is a more fundamental - // rearchitecting of how data in the NFA is laid out. + // re-architecting of how data in the NFA is laid out. // I think we should consider a single contiguous // allocation instead of all this indirection and // potential heap allocations for every state. But this diff --git a/regex-automata/src/nfa/thompson/pikevm.rs b/regex-automata/src/nfa/thompson/pikevm.rs index 0128c151a..5df367259 100644 --- a/regex-automata/src/nfa/thompson/pikevm.rs +++ b/regex-automata/src/nfa/thompson/pikevm.rs @@ -341,7 +341,7 @@ impl Builder { /// /// The `PikeVM` is generally the most "powerful" regex engine in this crate. 
/// "Powerful" in this context means that it can handle any regular expression -/// that is parseable by `regex-syntax` and any size haystack. Regretably, +/// that is parseable by `regex-syntax` and any size haystack. Regrettably, /// the `PikeVM` is also simultaneously often the _slowest_ regex engine in /// practice. This results in an annoying situation where one generally tries /// to pick any other regex engine (or perhaps none at all) before being diff --git a/regex-automata/src/util/alphabet.rs b/regex-automata/src/util/alphabet.rs index 22b5a7644..bc0de914e 100644 --- a/regex-automata/src/util/alphabet.rs +++ b/regex-automata/src/util/alphabet.rs @@ -83,7 +83,7 @@ enum UnitKind { /// Represents a byte value, or more typically, an equivalence class /// represented as a byte value. U8(u8), - /// Represents the "end of input" sentinel. We regretably use a `u16` + /// Represents the "end of input" sentinel. We regrettably use a `u16` /// here since the maximum sentinel value is `256`. Thankfully, we don't /// actually store a `Unit` anywhere, so this extra space shouldn't be too /// bad. diff --git a/regex-automata/src/util/captures.rs b/regex-automata/src/util/captures.rs index 05db6a993..8e9a6ea97 100644 --- a/regex-automata/src/util/captures.rs +++ b/regex-automata/src/util/captures.rs @@ -700,13 +700,13 @@ impl Captures { /// let replacement = "year=$year, month=$month, day=$day"; /// /// // This matches the first pattern. - /// let hay = "On 14-03-2010, I became a Tenneessee lamb."; + /// let hay = "On 14-03-2010, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let result = caps.interpolate_string(hay, replacement); /// assert_eq!("year=2010, month=03, day=14", result); /// /// // And this matches the second pattern. - /// let hay = "On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = "On 2010-03-14, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let result = caps.interpolate_string(hay, replacement); /// assert_eq!("year=2010, month=03, day=14", result); @@ -748,14 +748,14 @@ impl Captures { /// let replacement = "year=$year, month=$month, day=$day"; /// /// // This matches the first pattern. - /// let hay = "On 14-03-2010, I became a Tenneessee lamb."; + /// let hay = "On 14-03-2010, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let mut dst = String::new(); /// caps.interpolate_string_into(hay, replacement, &mut dst); /// assert_eq!("year=2010, month=03, day=14", dst); /// /// // And this matches the second pattern. - /// let hay = "On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = "On 2010-03-14, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let mut dst = String::new(); /// caps.interpolate_string_into(hay, replacement, &mut dst); @@ -808,13 +808,13 @@ impl Captures { /// let replacement = b"year=$year, month=$month, day=$day"; /// /// // This matches the first pattern. - /// let hay = b"On 14-03-2010, I became a Tenneessee lamb."; + /// let hay = b"On 14-03-2010, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let result = caps.interpolate_bytes(hay, replacement); /// assert_eq!(&b"year=2010, month=03, day=14"[..], result); /// /// // And this matches the second pattern. 
- /// let hay = b"On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = b"On 2010-03-14, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let result = caps.interpolate_bytes(hay, replacement); /// assert_eq!(&b"year=2010, month=03, day=14"[..], result); @@ -856,14 +856,14 @@ impl Captures { /// let replacement = b"year=$year, month=$month, day=$day"; /// /// // This matches the first pattern. - /// let hay = b"On 14-03-2010, I became a Tenneessee lamb."; + /// let hay = b"On 14-03-2010, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let mut dst = vec![]; /// caps.interpolate_bytes_into(hay, replacement, &mut dst); /// assert_eq!(&b"year=2010, month=03, day=14"[..], dst); /// /// // And this matches the second pattern. - /// let hay = b"On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = b"On 2010-03-14, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// let mut dst = vec![]; /// caps.interpolate_bytes_into(hay, replacement, &mut dst); @@ -918,7 +918,7 @@ impl Captures { /// let mut cache = re.create_cache(); /// let mut caps = re.create_captures(); /// - /// let hay = "On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = "On 2010-03-14, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// assert!(caps.is_match()); /// let (full, [year, month, day]) = caps.extract(hay); @@ -974,7 +974,7 @@ impl Captures { /// let mut cache = re.create_cache(); /// let mut caps = re.create_captures(); /// - /// let hay = b"On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = b"On 2010-03-14, I became a Tennessee lamb."; /// re.captures(&mut cache, hay, &mut caps); /// assert!(caps.is_match()); /// let (full, [year, month, day]) = caps.extract_bytes(hay); @@ -1751,7 +1751,7 @@ impl GroupInfo { /// use regex_automata::{nfa::thompson::NFA, PatternID}; /// /// let nfa = NFA::new(r"(a)(?Pb)(c)(d)(?Pe)")?; - /// // The first is the implicit group that is always unnammed. The next + /// // The first is the implicit group that is always unnamed. The next /// // 5 groups are the explicit groups found in the concrete syntax above. /// let expected = vec![None, None, Some("foo"), None, None, Some("bar")]; /// let got: Vec> = diff --git a/regex-automata/src/util/determinize/mod.rs b/regex-automata/src/util/determinize/mod.rs index ba32991d0..eab0107c9 100644 --- a/regex-automata/src/util/determinize/mod.rs +++ b/regex-automata/src/util/determinize/mod.rs @@ -269,7 +269,7 @@ pub(crate) fn next( // guarantee here, but it's subtle. In particular, a Thompson // NFA guarantees that each pattern has exactly one match // state. Moreover, since we're iterating over the NFA state - // IDs in a set, we are guarateed not to have any duplicative + // IDs in a set, we are guaranteed not to have any duplicative // match states. Thus, it is impossible to add the same pattern // ID more than once. // diff --git a/regex-automata/src/util/determinize/state.rs b/regex-automata/src/util/determinize/state.rs index effa6f44d..6d4f1cd37 100644 --- a/regex-automata/src/util/determinize/state.rs +++ b/regex-automata/src/util/determinize/state.rs @@ -772,7 +772,7 @@ fn write_varu32(data: &mut Vec, mut n: u32) { /// /// https://developers.google.com/protocol-buffers/docs/encoding#varints fn read_varu32(data: &[u8]) -> (u32, usize) { - // N.B. We can assume correctness here since we know that all varuints are + // N.B. 
We can assume correctness here since we know that all varu32 values are // written with write_varu32. Hence, the 'as' uses and unchecked arithmetic // is all okay. let mut n: u32 = 0; diff --git a/regex-automata/src/util/lazy.rs b/regex-automata/src/util/lazy.rs index 0d0b4fb2a..c5903381e 100644 --- a/regex-automata/src/util/lazy.rs +++ b/regex-automata/src/util/lazy.rs @@ -122,7 +122,7 @@ mod lazy { create: F, // This indicates to the compiler that this type can drop T. It's not // totally clear how the absence of this marker could lead to trouble, - // but putting here doesn't have any downsides so we hedge until somone + // but putting it here doesn't have any downsides so we hedge until someone // from the Unsafe Working Group can tell us definitively that we // don't need it. // diff --git a/regex-automata/src/util/look.rs b/regex-automata/src/util/look.rs index 73e51c0f6..9e11ef4bc 100644 --- a/regex-automata/src/util/look.rs +++ b/regex-automata/src/util/look.rs @@ -734,8 +734,8 @@ impl LookMatcher { haystack: &[u8], at: usize, ) -> bool { - // This used to luse LookSet::iter with Look::matches on each element, - // but that proved to be quite diastrous for perf. The manual "if + // This used to use LookSet::iter with Look::matches on each element, + // but that proved to be quite disastrous for perf. The manual "if // the set has this assertion, check it" turns out to be quite a bit // faster. if set.contains(Look::Start) { @@ -1060,7 +1060,7 @@ impl LookMatcher { // try and detect this in is_word_char::{fwd,rev}, but it's not clear // if it's worth it. \B is, after all, rarely used. Even worse, // is_word_char::{fwd,rev} could do its own UTF-8 decoding, and so this - // will wind up doing UTF-8 decoding twice. Owch. We could fix this + // will wind up doing UTF-8 decoding twice. Ouch. We could fix this // with more code complexity, but it just doesn't feel worth it for \B. // // And in particular, we do *not* have to do this with \b, because \b diff --git a/regex-automata/src/util/pool.rs b/regex-automata/src/util/pool.rs index d90d4ecff..698844ecc 100644 --- a/regex-automata/src/util/pool.rs +++ b/regex-automata/src/util/pool.rs @@ -947,7 +947,7 @@ mod inner { } } - /// A spin-lock based mutex. Yes, I have read spinlocks cosnidered + /// A spin-lock based mutex. Yes, I have read spinlocks considered /// harmful[1], and if there's a reasonable alternative choice, I'll /// happily take it. /// diff --git a/regex-automata/src/util/prefilter/aho_corasick.rs b/regex-automata/src/util/prefilter/aho_corasick.rs index 50cce827e..31d5572f8 100644 --- a/regex-automata/src/util/prefilter/aho_corasick.rs +++ b/regex-automata/src/util/prefilter/aho_corasick.rs @@ -39,7 +39,7 @@ impl AhoCorasick { }; // This is kind of just an arbitrary number, but basically, if we // have a small enough set of literals, then we try to use the VERY - // memory hungry DFA. Otherwise, we whimp out and use an NFA. The + // memory hungry DFA. Otherwise, we wimp out and use an NFA. The // upshot is that the NFA is quite lean and decently fast. Faster // than a naive Aho-Corasick NFA anyway. 
let ac_kind = if needles.len() <= 500 { diff --git a/regex-automata/src/util/search.rs b/regex-automata/src/util/search.rs index 39aec522b..90f83f8e4 100644 --- a/regex-automata/src/util/search.rs +++ b/regex-automata/src/util/search.rs @@ -14,7 +14,7 @@ use crate::util::{escape::DebugByte, primitives::PatternID, utf8}; /// /// It turns out that regex searches have a few parameters, and in most cases, /// those parameters have defaults that work in the vast majority of cases. -/// This `Input` type exists to make that common case seamnless while also +/// This `Input` type exists to make that common case seamless while also /// providing an avenue for changing the parameters of a search. In particular, /// this type enables doing so without a combinatorial explosion of different /// methods and/or superfluous parameters in the common cases. @@ -1341,7 +1341,7 @@ impl core::fmt::Display for PatternSetInsertError { write!( f, "failed to insert pattern ID {} into pattern set \ - with insufficiet capacity of {}", + with insufficient capacity of {}", self.attempted.as_usize(), self.capacity, ) diff --git a/regex-automata/src/util/sparse_set.rs b/regex-automata/src/util/sparse_set.rs index cbaa0b6f4..9d9930a4a 100644 --- a/regex-automata/src/util/sparse_set.rs +++ b/regex-automata/src/util/sparse_set.rs @@ -19,7 +19,7 @@ use alloc::{vec, vec::Vec}; use crate::util::primitives::StateID; -/// A pairse of sparse sets. +/// A pair of sparse sets. /// /// This is useful when one needs to compute NFA epsilon closures from a /// previous set of states derived from an epsilon closure. One set can be the @@ -85,7 +85,7 @@ impl SparseSets { /// /// The data structure is based on: https://research.swtch.com/sparse /// Note though that we don't actually use uninitialized memory. We generally -/// reuse sparse sets, so the initial allocation cost is bareable. However, its +/// reuse sparse sets, so the initial allocation cost is bearable. However, its /// other properties listed above are extremely useful. #[derive(Clone)] pub(crate) struct SparseSet { @@ -129,7 +129,7 @@ impl SparseSet { pub(crate) fn resize(&mut self, new_capacity: usize) { assert!( new_capacity <= StateID::LIMIT, - "sparse set capacity cannot excced {:?}", + "sparse set capacity cannot exceed {:?}", StateID::LIMIT ); self.clear(); diff --git a/regex-automata/src/util/start.rs b/regex-automata/src/util/start.rs index 27153780e..a26bb1a89 100644 --- a/regex-automata/src/util/start.rs +++ b/regex-automata/src/util/start.rs @@ -12,7 +12,7 @@ use crate::util::{ /// /// A DFA has a single starting state in the typical textbook description. That /// is, it corresponds to the set of all starting states for the NFA that built -/// it, along with their espsilon closures. In this crate, however, DFAs have +/// it, along with their epsilon closures. In this crate, however, DFAs have /// many possible start states due to a few factors: /// /// * DFAs support the ability to run either anchored or unanchored searches. diff --git a/regex-automata/src/util/syntax.rs b/regex-automata/src/util/syntax.rs index 78e3cf9a1..3be07bc80 100644 --- a/regex-automata/src/util/syntax.rs +++ b/regex-automata/src/util/syntax.rs @@ -280,7 +280,7 @@ impl Config { /// Enable verbose mode in the regular expression. /// - /// When enabled, verbose mode permits insigificant whitespace in many + /// When enabled, verbose mode permits insignificant whitespace in many /// places in the regular expression, as well as comments. 
Comments are /// started using `#` and continue until the end of the line. /// diff --git a/regex-automata/tests/gen/README.md b/regex-automata/tests/gen/README.md index 59439a11f..4b7ac1bc9 100644 --- a/regex-automata/tests/gen/README.md +++ b/regex-automata/tests/gen/README.md @@ -58,7 +58,7 @@ to test that serialization works for all of them. Arguably we should increase test coverage here, but this is a start. Note that in particular, this does not need to test that serialization and -deserialization correctly roundtrips on its own. Indeed, the normal regex test +deserialization correctly round-trips on its own. Indeed, the normal regex test suite has a test that does a serialization round trip for every test supported by DFAs. So that has very good coverage. What we're interested in testing here is our compatibility promise: do DFAs generated with an older revision of the diff --git a/regex-capi/README.md b/regex-capi/README.md index af5997977..5bdb403b8 100644 --- a/regex-capi/README.md +++ b/regex-capi/README.md @@ -17,7 +17,7 @@ API documentation: https://docs.rs/regex Examples -------- -There are readable examples in the `ctest` and `examples` sub-directories. +There are readable examples in the `ctest` and `examples` subdirectories. Assuming you have [Rust and Cargo installed](https://www.rust-lang.org/downloads.html) @@ -34,7 +34,7 @@ $ LD_LIBRARY_PATH=../target/release ./iter Performance ----------- It's fast. Its core matching engine is a lazy DFA, which is what GNU grep -and RE2 use. Like GNU grep, this regex engine can detect multi byte literals +and RE2 use. Like GNU grep, this regex engine can detect multibyte literals in the regex and will use fast literal string searching to quickly skip through the input to find possible match locations. @@ -51,7 +51,7 @@ All regular expressions must be valid UTF-8. The text encoding of haystacks is more complicated. To a first approximation, haystacks should be UTF-8. In fact, UTF-8 (and, one -supposes, ASCII) is the only well defined text encoding supported by this +supposes, ASCII) is the only well-defined text encoding supported by this library. It is impossible to match UTF-16, UTF-32 or any other encoding without first transcoding it to UTF-8. diff --git a/regex-cli/README.md b/regex-cli/README.md index 376d89091..e6f773cf3 100644 --- a/regex-cli/README.md +++ b/regex-cli/README.md @@ -182,7 +182,7 @@ flags we use here though: the size of the DFA as much as possible. In some cases it can make a big difference, but not all. Minimization can also be extremely expensive, but given that this is an offline process and presumably done rarely, it's usually -a good trade off to make. +a good trade-off to make. * `--shrink` uses heuristics to make the size of the NFA smaller in some cases. This doesn't impact the size of the DFA, but it can make determinization (the process of converting an NFA into a DFA) faster at the cost of making NFA @@ -258,7 +258,7 @@ favor of (potentially much) smaller DFAs. One can also generate a "dense" DFA to get faster searches but larger DFAs. * Above, we generated a "dfa," but one can also generate a "regex." The difference is that a DFA can only find the end of a match (or start of a match -if the DFA is reversed), where as a regex will generate two DFAs: one for +if the DFA is reversed), whereas a regex will generate two DFAs: one for finding the end of a match and then another for finding the start. 
One can generate two DFAs manually and stitch them together in the code, but generating a `regex` will take care of this for you. diff --git a/regex-cli/cmd/compile_test.rs b/regex-cli/cmd/compile_test.rs index 1a7f039e5..bd28a7dab 100644 --- a/regex-cli/cmd/compile_test.rs +++ b/regex-cli/cmd/compile_test.rs @@ -25,7 +25,7 @@ const REGEX_LITE_COMBOS: &[&[&str]] = &[&["std", "string"]]; const REGEX_AUTOMATA_COMBOS: &[&[&str]] = &[ &["std", "syntax", "perf", "unicode", "meta", "nfa", "dfa", "hybrid"], - // Try out some barebones combinations of individual regex engines. + // Try out some bare-bones combinations of individual regex engines. &["std", "syntax", "nfa-pikevm"], &["std", "syntax", "nfa-backtrack"], &["std", "syntax", "hybrid"], diff --git a/regex-cli/cmd/generate/serialize/dfa.rs b/regex-cli/cmd/generate/serialize/dfa.rs index 1c7b409b3..600d12111 100644 --- a/regex-cli/cmd/generate/serialize/dfa.rs +++ b/regex-cli/cmd/generate/serialize/dfa.rs @@ -1,6 +1,6 @@ // The code in this module honestly sucks. I did at one point try and make it a // little more composable, particularly with respect to the stuff that writes -// the Rust code, but it became an unintelligble mess. Instead, I squashed +// the Rust code, but it became an unintelligible mess. Instead, I squashed // it down into four functions: dense DFAs, dense regexes, sparse DFAs and // sparse regexes. And each of those functions handles the 'regex-automata', // 'once-cell' and 'lazy-static' variants. So that's 12 different variants. diff --git a/regex-lite/src/hir/mod.rs b/regex-lite/src/hir/mod.rs index 6e5348a5b..9236aab3f 100644 --- a/regex-lite/src/hir/mod.rs +++ b/regex-lite/src/hir/mod.rs @@ -31,11 +31,11 @@ pub fn escape(pattern: &str) -> String { /// classes. /// /// In order to determine whether a character may be escaped at all, the -/// [`is_escapeable_character`] routine should be used. The difference between -/// `is_meta_character` and `is_escapeable_character` is that the latter will +/// [`is_escapable_character`] routine should be used. The difference between +/// `is_meta_character` and `is_escapable_character` is that the latter will /// return true for some characters that are _not_ meta characters. For /// example, `%` and `\%` both match a literal `%` in all contexts. In other -/// words, `is_escapeable_character` includes "superfluous" escapes. +/// words, `is_escapable_character` includes "superfluous" escapes. /// /// Note that the set of characters for which this function returns `true` or /// `false` is fixed and won't change in a semver compatible release. (In this @@ -54,7 +54,7 @@ fn is_meta_character(c: char) -> bool { /// /// This returns true in all cases that `is_meta_character` returns true, but /// also returns true in some cases where `is_meta_character` returns false. -/// For example, `%` is not a meta character, but it is escapeable. That is, +/// For example, `%` is not a meta character, but it is escapable. That is, /// `%` and `\%` both match a literal `%` in all contexts. /// /// The purpose of this routine is to provide knowledge about what characters @@ -63,31 +63,31 @@ fn is_meta_character(c: char) -> bool { /// though there is no actual _need_ to do so. /// /// This will return false for some characters. For example, `e` is not -/// escapeable. 
Therefore, `\e` will either result in a parse error (which is /// true today), or it could backwards compatibly evolve into a new construct /// with its own meaning. Indeed, that is the purpose of banning _some_ /// superfluous escapes: it provides a way to evolve the syntax in a compatible /// manner. -fn is_escapeable_character(c: char) -> bool { - // Certainly escapeable if it's a meta character. +fn is_escapable_character(c: char) -> bool { + // Certainly escapable if it's a meta character. if is_meta_character(c) { return true; } - // Any character that isn't ASCII is definitely not escapeable. There's + // Any character that isn't ASCII is definitely not escapable. There's // no real need to allow things like \☃ right? if !c.is_ascii() { return false; } - // Otherwise, we basically say that everything is escapeable unless it's a + // Otherwise, we basically say that everything is escapable unless it's a // letter or digit. Things like \3 are either octal (when enabled) or an // error, and we should keep it that way. Otherwise, letters are reserved // for adding new syntax in a backwards compatible way. match c { '0'..='9' | 'A'..='Z' | 'a'..='z' => false, - // While not currently supported, we keep these as not escapeable to + // While not currently supported, we keep these as not escapable to // give us some flexibility with respect to supporting the \< and // \> word boundary assertions in the future. By rejecting them as - // escapeable, \< and \> will result in a parse error. Thus, we can + // escapable, \< and \> will result in a parse error. Thus, we can // turn them into something else in the future without it being a // backwards incompatible change. '<' | '>' => false, @@ -120,7 +120,7 @@ impl Default for Config { /// /// These can be set via explicit configuration in code, or change dynamically /// during parsing via inline flags. For example, `foo(?i:bar)baz` will match -`foo` and `baz` case sensitiviely and `bar` case insensitively (assuming a +`foo` and `baz` case sensitively and `bar` case insensitively (assuming a default configuration). #[derive(Clone, Copy, Debug, Default)] pub(crate) struct Flags { diff --git a/regex-lite/src/hir/parse.rs b/regex-lite/src/hir/parse.rs index ca93b8838..6d4009d8d 100644 --- a/regex-lite/src/hir/parse.rs +++ b/regex-lite/src/hir/parse.rs @@ -380,7 +380,7 @@ impl<'a> Parser<'a> { let hir = self.parse_inner()?; // While we also check nesting during parsing, that only checks the // number of recursive parse calls. It does not necessarily cover - // all possible recursive nestings of the Hir itself. For example, + // all possible recursive nesting of the Hir itself. For example, // repetition operators don't require recursive parse calls. So one // can stack them arbitrarily without overflowing the stack in the // *parser*. But then if one recurses over the resulting Hir, a stack @@ -490,7 +490,7 @@ impl<'a> Parser<'a> { // Handle all of the one letter sequences inline. self.bump(); - if hir::is_meta_character(ch) || hir::is_escapeable_character(ch) { + if hir::is_meta_character(ch) || hir::is_escapable_character(ch) { return Ok(self.hir_char(ch)); } let special = |ch| Ok(self.hir_char(ch)); @@ -1153,7 +1153,7 @@ impl<'a> Parser<'a> { // that we should return an error instead since the repeated colons // give away the intent to write a POSIX class. But what if the user // typed `[[:lower]]` instead? How can we tell that was intended to be - // a POSXI class and not just a normal nested class? 
+ // a POSIX class and not just a normal nested class? // // Reasonable people can probably disagree over this, but for better // or worse, we implement semantics that never fails at the expense of diff --git a/regex-lite/src/lib.rs b/regex-lite/src/lib.rs index 9b394a480..02405cb23 100644 --- a/regex-lite/src/lib.rs +++ b/regex-lite/src/lib.rs @@ -12,7 +12,7 @@ over performance and functionality. As a result, regex searches in this crate are typically substantially slower than what is provided by the `regex` crate. Moreover, this crate only has the most basic level of Unicode support: it matches codepoint by codepoint but otherwise doesn't support Unicode case -insensivity or things like `\p{Letter}`. In exchange, this crate contributes +insensitivity or things like `\p{Letter}`. In exchange, this crate contributes far less to binary size and compiles much more quickly. If you just want API documentation, then skip to the [`Regex`] type. Otherwise, @@ -831,7 +831,7 @@ case `O(m * n)` time. Thus, iteration of all matches in a haystack has worst case `O(m * n^2)`. A good example of a pattern that exhibits this is `(?:A+){1000}|` or even `.*[^A-Z]|[A-Z]`. -In general, unstrusted haystacks are easier to stomach than untrusted patterns. +In general, untrusted haystacks are easier to stomach than untrusted patterns. Untrusted patterns give a lot more control to the caller to impact the performance of a search. Therefore, permitting untrusted patterns means that your only line of defense is to put a limit on how big `m` (and perhaps also diff --git a/regex-lite/src/pikevm.rs b/regex-lite/src/pikevm.rs index 97b450555..be35d9794 100644 --- a/regex-lite/src/pikevm.rs +++ b/regex-lite/src/pikevm.rs @@ -780,7 +780,7 @@ enum FollowEpsilon { /// /// The data structure is based on: https://research.swtch.com/sparse /// Note though that we don't actually use uninitialized memory. We generally -/// reuse sparse sets, so the initial allocation cost is bareable. However, its +/// reuse sparse sets, so the initial allocation cost is bearable. However, its /// other properties listed above are extremely useful. #[derive(Clone)] struct SparseSet { @@ -822,7 +822,7 @@ impl SparseSet { fn resize(&mut self, new_capacity: usize) { assert!( new_capacity <= u32::MAX.as_usize(), - "sparse set capacity cannot excced {:?}", + "sparse set capacity cannot exceed {:?}", u32::MAX, ); self.clear(); diff --git a/regex-lite/src/string.rs b/regex-lite/src/string.rs index 4e4de9068..7753c4655 100644 --- a/regex-lite/src/string.rs +++ b/regex-lite/src/string.rs @@ -948,7 +948,7 @@ impl Regex { /// Returns the end byte offset of the first match in the haystack given. /// /// This method may have the same performance characteristics as - /// `is_match`. Behaviorlly, it doesn't just report whether it match + /// `is_match`. Behaviorally, it doesn't just report whether a match /// occurs, but also the end offset for a match. In particular, the offset /// returned *may be shorter* than the proper end of the leftmost-first /// match that you would find via [`Regex::find`]. 
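The `shortest_match` doc fix above is worth a concrete illustration. A small sketch, assuming `regex-lite`'s `shortest_match` and `find` behave like their `regex` crate counterparts:

```rust
use regex_lite::Regex;

fn main() {
    let re = Regex::new(r"a+").unwrap();
    // The reported end offset may be earlier than the leftmost-first match
    // end, since the search may stop as soon as some match is known to exist.
    assert_eq!(re.shortest_match("aaa"), Some(1));
    // `find` reports the full leftmost-first match instead.
    assert_eq!(re.find("aaa").map(|m| m.end()), Some(3));
}
```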
@@ -1744,7 +1744,7 @@ impl<'h> Captures<'h> { /// use regex_lite::Regex; /// /// let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap(); - /// let hay = "On 2010-03-14, I became a Tenneessee lamb."; + /// let hay = "On 2010-03-14, I became a Tennessee lamb."; /// let Some((full, [year, month, day])) = /// re.captures(hay).map(|caps| caps.extract()) else { return }; /// assert_eq!("2010-03-14", full); @@ -1852,7 +1852,7 @@ impl<'h> Captures<'h> { /// let re = Regex::new( /// r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})", /// ).unwrap(); - /// let hay = "On 14-03-2010, I became a Tenneessee lamb."; + /// let hay = "On 14-03-2010, I became a Tennessee lamb."; /// let caps = re.captures(hay).unwrap(); /// /// let mut dst = String::new(); @@ -2609,7 +2609,7 @@ impl<'t> Replacer for NoExpand<'t> { /// no `$` anywhere, then interpolation definitely does not need to be done. In /// that case, the given string is returned as a borrowed `Cow`. /// -/// This is meant to be used to implement the `Replacer::no_expandsion` method +/// This is meant to be used to implement the `Replacer::no_expansion` method /// in its various trait impls. fn no_expansion<T: AsRef<str>>(t: &T) -> Option<Cow<'_, str>> { let s = t.as_ref(); @@ -2839,7 +2839,7 @@ impl RegexBuilder { /// This configures verbose mode for the entire pattern. /// - /// When enabled, whitespace will treated as insignifcant in the pattern + /// When enabled, whitespace will be treated as insignificant in the pattern /// and `#` can be used to start a comment until the next new line. /// /// Normally, in most places in a pattern, whitespace is treated literally. 
diff --git a/regex-syntax/src/hir/mod.rs b/regex-syntax/src/hir/mod.rs
index ce38ead7b..29f90819f 100644
--- a/regex-syntax/src/hir/mod.rs
+++ b/regex-syntax/src/hir/mod.rs
@@ -249,7 +249,7 @@ impl Hir {
 /// just get back the original `expr` since it's precisely equivalent.
 ///
 /// Smart constructors enable maintaining invariants about the HIR data type
-/// while also simulanteously keeping the representation as simple as possible.
+/// while also simultaneously keeping the representation as simple as possible.
 impl Hir {
     /// Returns an empty HIR expression.
     ///
@@ -706,7 +706,7 @@ pub enum HirKind {
     /// The empty regular expression, which matches everything, including the
     /// empty string.
     Empty,
-    /// A literalstring that matches exactly these bytes.
+    /// A literal string that matches exactly these bytes.
     Literal(Literal),
     /// A single character class that matches any of the characters in the
     /// class. A class can either consist of Unicode scalar values as
@@ -803,7 +803,7 @@ impl core::fmt::Debug for Literal {
 /// sequence of non-overlapping non-adjacent ranges of characters.
 ///
 /// There are no guarantees about which class variant is used. Generally
-/// speaking, the Unicode variat is used whenever a class needs to contain
+/// speaking, the Unicode variant is used whenever a class needs to contain
 /// non-ASCII Unicode scalar values. But the Unicode variant can be used even
 /// when Unicode mode is disabled. For example, at the time of writing, the
 /// regex `(?-u:a|\xc2\xa0)` will compile down to HIR for the Unicode class
diff --git a/regex-syntax/src/hir/translate.rs b/regex-syntax/src/hir/translate.rs
index 313a1e9e8..4f2b0b4a0 100644
--- a/regex-syntax/src/hir/translate.rs
+++ b/regex-syntax/src/hir/translate.rs
@@ -30,7 +30,7 @@ impl Default for TranslatorBuilder {
 }
 
 impl TranslatorBuilder {
-    /// Create a new translator builder with a default c onfiguration.
+    /// Create a new translator builder with a default configuration.
     pub fn new() -> TranslatorBuilder {
         TranslatorBuilder {
             utf8: true,
diff --git a/regex-syntax/src/lib.rs b/regex-syntax/src/lib.rs
index 20f25db71..f95f820ce 100644
--- a/regex-syntax/src/lib.rs
+++ b/regex-syntax/src/lib.rs
@@ -269,7 +269,7 @@ pub fn is_meta_character(c: char) -> bool {
 ///
 /// This returns true in all cases that `is_meta_character` returns true, but
 /// also returns true in some cases where `is_meta_character` returns false.
-/// For example, `%` is not a meta character, but it is escapeable. That is,
+/// For example, `%` is not a meta character, but it is escapable. That is,
 /// `%` and `\%` both match a literal `%` in all contexts.
 ///
 /// The purpose of this routine is to provide knowledge about what characters
@@ -278,7 +278,7 @@ pub fn is_meta_character(c: char) -> bool {
 /// though there is no actual _need_ to do so.
 ///
 /// This will return false for some characters. For example, `e` is not
-/// escapeable. Therefore, `\e` will either result in a parse error (which is
+/// escapable. Therefore, `\e` will either result in a parse error (which is
 /// true today), or it could backwards compatibly evolve into a new construct
 /// with its own meaning. Indeed, that is the purpose of banning _some_
 /// superfluous escapes: it provides a way to evolve the syntax in a compatible
@@ -301,30 +301,30 @@ pub fn is_meta_character(c: char) -> bool {
 /// assert!(!is_escapeable_character('e'));
 /// ```
 pub fn is_escapeable_character(c: char) -> bool {
-    // Certainly escapeable if it's a meta character.
+    // Certainly escapable if it's a meta character.
     if is_meta_character(c) {
         return true;
     }
-    // Any character that isn't ASCII is definitely not escapeable. There's
+    // Any character that isn't ASCII is definitely not escapable. There's
    // no real need to allow things like \☃ right?
     if !c.is_ascii() {
         return false;
     }
-    // Otherwise, we basically say that everything is escapeable unless it's a
+    // Otherwise, we basically say that everything is escapable unless it's a
     // letter or digit. Things like \3 are either octal (when enabled) or an
     // error, and we should keep it that way. Otherwise, letters are reserved
     // for adding new syntax in a backwards compatible way.
     match c {
         '0'..='9' | 'A'..='Z' | 'a'..='z' => false,
-        // While not currently supported, we keep these as not escapeable to
+        // While not currently supported, we keep these as not escapable to
         // give us some flexibility with respect to supporting the \< and
         // \> word boundary assertions in the future. By rejecting them as
-        // escapeable, \< and \> will result in a parse error. Thus, we can
+        // escapable, \< and \> will result in a parse error. Thus, we can
         // turn them into something else in the future without it being a
         // backwards incompatible change.
         //
         // OK, now we support \< and \>, and we need to retain them as *not*
-        // escapeable here since the escape sequence is significant.
+        // escapable here since the escape sequence is significant.
         '<' | '>' => false,
         _ => true,
     }
 }
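The `regex-syntax/src/lib.rs` hunks above only reword docs and comments; the `is_escapeable_character` identifier itself keeps the old spelling. Its contract is easy to check directly, and the assertions below mirror the doc examples visible in the hunk:

```rust
use regex_syntax::is_escapeable_character;

fn main() {
    // `%` has no special meaning, but escaping it is still permitted.
    assert!(is_escapeable_character('%'));
    // Letters and digits stay unescapable so that new escapes can be
    // added to the syntax later without breaking changes.
    assert!(!is_escapeable_character('e'));
    // `<` and `>` are significant: they form the \< and \> assertions.
    assert!(!is_escapeable_character('<'));
}
```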
diff --git a/regex-test/lib.rs b/regex-test/lib.rs
index 2b630666e..fd78cae67 100644
--- a/regex-test/lib.rs
+++ b/regex-test/lib.rs
@@ -69,7 +69,7 @@ into a TOML file (which is not allowed). There is generally no other reason to
 enable `unescape`.
 * `unicode` - When enabled, the regex pattern should be compiled with its
 corresponding Unicode mode enabled. For example, `[^a]` matches any UTF-8
-encoding of any codepoint other than `a`. Case insensitivty should be Unicode
+encoding of any codepoint other than `a`. Case insensitivity should be Unicode
 aware. Unicode classes like `\pL` are available. The Perl classes `\w`, `\s`
 and `\d` should be Unicode aware. And so on. This is an optional field and is
 enabled by default.
@@ -335,7 +335,7 @@ impl RegexTest {
     /// Returns true if regex matching should have Unicode mode enabled.
     ///
     /// For example, `[^a]` matches any UTF-8 encoding of any codepoint other
-    /// than `a`. Case insensitivty should be Unicode aware. Unicode classes
+    /// than `a`. Case insensitivity should be Unicode aware. Unicode classes
     /// like `\pL` are available. The Perl classes `\w`, `\s` and `\d` should
     /// be Unicode aware. And so on.
     ///
@@ -1204,7 +1204,7 @@ pub struct Captures {
     /// the overall match.
     ///
     /// This should either have length 1 (when not capturing group offsets are
-    /// included in the tes tresult) or it should have length equal to the
+    /// included in the test result) or it should have length equal to the
     /// number of capturing groups in the regex pattern.
     groups: Vec<Option<Span>>,
 }
diff --git a/src/builders.rs b/src/builders.rs
index c111a96c0..3bb08de8b 100644
--- a/src/builders.rs
+++ b/src/builders.rs
@@ -552,7 +552,7 @@ pub(crate) mod string {
         /// This configures verbose mode for the entire pattern.
         ///
-        /// When enabled, whitespace will treated as insignifcant in the
+        /// When enabled, whitespace will be treated as insignificant in the
         /// pattern and `#` can be used to start a comment until the next new
         /// line.
         ///
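The four `src/builders.rs` hunks here and below repeat one sentence about verbose mode, so the same wording fix applies to each. For reference, what verbose mode actually permits, sketched with the inline `(?x)` flag rather than the builder method:

```rust
use regex::Regex;

fn main() {
    // With (?x), whitespace in the pattern is insignificant and `#`
    // starts a comment that runs until the end of the line.
    let re = Regex::new(
        r"(?x)
        (?<year>[0-9]{4})  # the year
        -
        (?<month>[0-9]{2}) # the month
        -
        (?<day>[0-9]{2})   # the day
        ",
    )
    .unwrap();
    assert!(re.is_match("On 2010-03-14, I became a Tennessee lamb."));
}
```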
@@ -1133,7 +1133,7 @@ pub(crate) mod string {
         /// This configures verbose mode for all of the patterns.
         ///
-        /// When enabled, whitespace will treated as insignifcant in the
+        /// When enabled, whitespace will be treated as insignificant in the
         /// pattern and `#` can be used to start a comment until the next new
         /// line.
         ///
@@ -1725,7 +1725,7 @@ pub(crate) mod bytes {
         /// This configures verbose mode for the entire pattern.
         ///
-        /// When enabled, whitespace will treated as insignifcant in the
+        /// When enabled, whitespace will be treated as insignificant in the
         /// pattern and `#` can be used to start a comment until the next new
         /// line.
         ///
@@ -2317,7 +2317,7 @@ pub(crate) mod bytes {
         /// This configures verbose mode for all of the patterns.
         ///
-        /// When enabled, whitespace will treated as insignifcant in the
+        /// When enabled, whitespace will be treated as insignificant in the
         /// pattern and `#` can be used to start a comment until the next new
         /// line.
         ///
diff --git a/src/lib.rs b/src/lib.rs
index 6dbd3c202..213c71d99 100644
--- a/src/lib.rs
+++ b/src/lib.rs
@@ -1120,7 +1120,7 @@ case `O(m * n)` time. Thus, iteration of all matches in a haystack has worst
 case `O(m * n^2)`. A good example of a pattern that exhibits this is
 `(?:A+){1000}|` or even `.*[^A-Z]|[A-Z]`.
 
-In general, unstrusted haystacks are easier to stomach than untrusted patterns.
+In general, untrusted haystacks are easier to stomach than untrusted patterns.
 Untrusted patterns give a lot more control to the caller to impact the
 performance of a search. In many cases, a regex search will actually execute
 in average case `O(n)` time (i.e., not dependent on the size of the regex), but
@@ -1274,7 +1274,7 @@ It is somewhat unusual for a regex engine to have dependencies, as most regex
 libraries are self contained units with no dependencies other than a
 particular environment's standard library. Indeed, for other similarly
 optimized regex engines, most or all of the code in the dependencies of this crate would
-normally just be unseparable or coupled parts of the crate itself. But since
+normally just be inseparable or coupled parts of the crate itself. But since
 Rust and its tooling ecosystem make the use of dependencies so easy, it made
 sense to spend some effort de-coupling parts of this crate and making them
 independently useful.
diff --git a/src/regex/bytes.rs b/src/regex/bytes.rs
index 19f5701af..b6ea75a7a 100644
--- a/src/regex/bytes.rs
+++ b/src/regex/bytes.rs
@@ -965,7 +965,7 @@ impl Regex {
     /// Returns the end byte offset of the first match in the haystack given.
     ///
     /// This method may have the same performance characteristics as
-    /// `is_match`. Behaviorlly, it doesn't just report whether it match
+    /// `is_match`. Behaviorally, it doesn't just report whether a match
     /// occurs, but also the end offset for a match. In particular, the offset
     /// returned *may be shorter* than the proper end of the leftmost-first
     /// match that you would find via [`Regex::find`].
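The `src/lib.rs` hunk above sits in the section on untrusted input, which argues that untrusted patterns are riskier than untrusted haystacks and that the main defense is bounding the pattern size `m`. A hedged sketch of one such bound via `RegexBuilder::size_limit`; the particular cap, and the expectation that this pattern exceeds it, are assumptions:

```rust
use regex::RegexBuilder;

fn main() {
    // Cap the compiled size of the regex. An adversarial pattern that
    // blows past the cap fails at build time instead of eating memory.
    let result = RegexBuilder::new(r"(?:A+){1000}")
        .size_limit(1 << 10) // deliberately tiny: 1 KiB
        .build();
    assert!(result.is_err());
}
```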
@@ -1716,7 +1716,7 @@ impl<'h> Captures<'h> {
     /// use regex::bytes::Regex;
     ///
     /// let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap();
-    /// let hay = b"On 2010-03-14, I became a Tenneessee lamb.";
+    /// let hay = b"On 2010-03-14, I became a Tennessee lamb.";
     /// let Some((full, [year, month, day])) =
     ///     re.captures(hay).map(|caps| caps.extract()) else { return };
     /// assert_eq!(b"2010-03-14", full);
@@ -1825,7 +1825,7 @@ impl<'h> Captures<'h> {
     /// let re = Regex::new(
     ///     r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
     /// ).unwrap();
-    /// let hay = b"On 14-03-2010, I became a Tenneessee lamb.";
+    /// let hay = b"On 14-03-2010, I became a Tennessee lamb.";
     /// let caps = re.captures(hay).unwrap();
     ///
     /// let mut dst = vec![];
@@ -2589,7 +2589,7 @@ impl<'s> Replacer for NoExpand<'s> {
 /// no `$` anywhere, then interpolation definitely does not need to be done. In
 /// that case, the given string is returned as a borrowed `Cow`.
 ///
-/// This is meant to be used to implement the `Replacer::no_expandsion` method
+/// This is meant to be used to implement the `Replacer::no_expansion` method
 /// in its various trait impls.
 fn no_expansion<T: AsRef<[u8]>>(replacement: &T) -> Option<Cow<'_, [u8]>> {
     let replacement = replacement.as_ref();
diff --git a/src/regex/string.rs b/src/regex/string.rs
index 880d6082a..db777a991 100644
--- a/src/regex/string.rs
+++ b/src/regex/string.rs
@@ -952,7 +952,7 @@ impl Regex {
     /// Returns the end byte offset of the first match in the haystack given.
     ///
     /// This method may have the same performance characteristics as
-    /// `is_match`. Behaviorlly, it doesn't just report whether it match
+    /// `is_match`. Behaviorally, it doesn't just report whether a match
     /// occurs, but also the end offset for a match. In particular, the offset
     /// returned *may be shorter* than the proper end of the leftmost-first
     /// match that you would find via [`Regex::find`].
@@ -1721,7 +1721,7 @@ impl<'h> Captures<'h> {
     /// use regex::Regex;
     ///
     /// let re = Regex::new(r"([0-9]{4})-([0-9]{2})-([0-9]{2})").unwrap();
-    /// let hay = "On 2010-03-14, I became a Tenneessee lamb.";
+    /// let hay = "On 2010-03-14, I became a Tennessee lamb.";
     /// let Some((full, [year, month, day])) =
     ///     re.captures(hay).map(|caps| caps.extract()) else { return };
     /// assert_eq!("2010-03-14", full);
@@ -1830,7 +1830,7 @@ impl<'h> Captures<'h> {
     /// let re = Regex::new(
     ///     r"(?<day>[0-9]{2})-(?<month>[0-9]{2})-(?<year>[0-9]{4})",
     /// ).unwrap();
-    /// let hay = "On 14-03-2010, I became a Tenneessee lamb.";
+    /// let hay = "On 14-03-2010, I became a Tennessee lamb.";
     /// let caps = re.captures(hay).unwrap();
     ///
     /// let mut dst = String::new();
@@ -2571,7 +2571,7 @@ impl<'s> Replacer for NoExpand<'s> {
 /// no `$` anywhere, then interpolation definitely does not need to be done. In
 /// that case, the given string is returned as a borrowed `Cow`.
 ///
-/// This is meant to be used to implement the `Replacer::no_expandsion` method
+/// This is meant to be used to implement the [`Replacer::no_expansion`] method
 /// in its various trait impls.
 fn no_expansion<T: AsRef<str>>(replacement: &T) -> Option<Cow<'_, str>> {
     let replacement = replacement.as_ref();
diff --git a/src/regexset/bytes.rs b/src/regexset/bytes.rs
index 1220a1466..db067438a 100644
--- a/src/regexset/bytes.rs
+++ b/src/regexset/bytes.rs
@@ -563,7 +563,7 @@ impl SetMatches {
     ///     assert_eq!(matches, vec![0, 1, 3]);
     /// ```
     ///
-    /// Note that `SetMatches` also implemnets the `IntoIterator` trait, so
+    /// Note that `SetMatches` also implements the `IntoIterator` trait, so
     /// this method is not always needed. For example:
     ///
     /// ```
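The `src/regexset/bytes.rs` hunk above and the `src/regexset/string.rs` hunk below fix the same sentence about `SetMatches` implementing `IntoIterator`. In brief, that impl lets callers collect the indices of matching patterns directly; this mirrors the crate's own doc example:

```rust
use regex::RegexSet;

fn main() {
    let set = RegexSet::new([r"\w+", r"\d+", r"\pL+", r"foo"]).unwrap();
    // `SetMatches: IntoIterator` yields the index of each pattern that
    // matched somewhere in the haystack.
    let matched: Vec<usize> = set.matches("foobar").into_iter().collect();
    assert_eq!(matched, vec![0, 2, 3]);
}
```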
diff --git a/src/regexset/string.rs b/src/regexset/string.rs
index 2a3e7b802..6528aacbb 100644
--- a/src/regexset/string.rs
+++ b/src/regexset/string.rs
@@ -559,7 +559,7 @@ impl SetMatches {
     ///     assert_eq!(matches, vec![0, 1, 3]);
     /// ```
     ///
-    /// Note that `SetMatches` also implemnets the `IntoIterator` trait, so
+    /// Note that `SetMatches` also implements the `IntoIterator` trait, so
     /// this method is not always needed. For example:
     ///
     /// ```
diff --git a/testdata/README.md b/testdata/README.md
index c3bc1acb5..dcac6719f 100644
--- a/testdata/README.md
+++ b/testdata/README.md
@@ -9,7 +9,7 @@ The basic idea here is that we have many different regex engines but generally
 one set of tests. We want to be able to run those tests (or most of them) on
 every engine. Prior to `regex 1.9`, we used to do this with a hodge podge soup
 of macros and a different test executable for each engine. It overall took a
-longer time to compile, was harder to maintain and it made the test definitions
+longer time to compile, was harder to maintain, and it made the test definitions
 themselves less clear.
 
 In `regex 1.9`, when we moved over to `regex-automata`, the situation got a lot
diff --git a/testdata/fowler/dat/nullsubexpr.dat b/testdata/fowler/dat/nullsubexpr.dat
index a94430649..eb3e721d3 100644
--- a/testdata/fowler/dat/nullsubexpr.dat
+++ b/testdata/fowler/dat/nullsubexpr.dat
@@ -45,7 +45,7 @@ E SAME ababab (0,0)(0,0)
 #E ((z)+|a)* zabcde (0,2)(1,2)
 E ((z)+|a)* zabcde (0,2)(1,2)(0,1) Rust
 
-#{E a+? aaaaaa (0,1) no *? +? mimimal match ops
+#{E a+? aaaaaa (0,1) no *? +? minimal match ops
 #E (a) aaa (0,1)(0,1)
 #E (a*?) aaa (0,0)(0,0)
 #E (a)*? aaa (0,0)
diff --git a/tests/fuzz/mod.rs b/tests/fuzz/mod.rs
index 88c196ae6..fb366cf07 100644
--- a/tests/fuzz/mod.rs
+++ b/tests/fuzz/mod.rs
@@ -71,7 +71,7 @@ fn fail_branch_prevents_match() {
 // Basically, the NFA compiler works in two phases. The first phase builds
 // a more complicated-but-simpler-to-construct sequence of NFA states that
 // includes unconditional epsilon transitions. As part of converting this
-// sequence to the "final" NFA, we remove those unconditional espilon
+// sequence to the "final" NFA, we remove those unconditional epsilon
 // transition. The code responsible for doing this follows every chain of
 // these transitions and remaps the state IDs. The way we were doing this
 // before resulted in re-following every subsequent part of the chain for each