Parser Documentation

A parser includes a pattern with associated metadata that Logria uses to parse and aggregate log messages.

Storage

Parsers are stored as JSON in $LOGRIA_USER_HOME/$LOGRIA_ROOT/parsers and do not have file extensions. A parser is defined like so:

{
    "pattern": " - ",
    "pattern_type": "Split",
    "example": "2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message",
    "order": [
        "Timestamp",
        "Method",
        "Level",
        "Message"
    ],
    "aggregation_methods": {
        "Timestamp": {
            "DateTime": "[year]-[month]-[day] [hour]:[minute]:[second],[subsecond]"
        },
        "Method": "Count",
        "Level": "Count",
        "Message": "Sum"
    }
}

If $LOGRIA_USER_HOME/$LOGRIA_ROOT/parsers does not exist, Logria will create it.
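The directory handling described above can be sketched as follows. This is a minimal illustration, not Logria's actual code: it assumes the values of `LOGRIA_USER_HOME` and `LOGRIA_ROOT` have already been read into strings, and the function names `parsers_dir` and `ensure_parsers_dir` are hypothetical.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Joins the two path components with the `parsers` directory name.
/// Hypothetical helper: Logria's real path logic may differ.
fn parsers_dir(user_home: &str, root: &str) -> PathBuf {
    Path::new(user_home).join(root).join("parsers")
}

/// Creates the parsers directory (and any missing parents) if needed.
fn ensure_parsers_dir(user_home: &str, root: &str) -> io::Result<PathBuf> {
    let dir = parsers_dir(user_home, root);
    // `create_dir_all` succeeds without changes when the directory exists.
    fs::create_dir_all(&dir)?;
    Ok(dir)
}
```

Because parser files have no extension, every regular file found in the returned directory can be treated as a candidate parser.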

Anatomy

All parsers have the following keys:

  • pattern
    • The pattern to apply
  • pattern_type
    • The method used to apply the pattern, either Regex or Split, detailed in Types of Parsers
  • name
    • The name of the parser
    • Displayed to the user when selecting parsers
  • example
    • An example message to match with the parser
    • Displayed to the user when selecting which part of a message to render
  • order
    • The names of the message parts, in the order they occur in the message
  • aggregation_methods
    • Maps each name in order to one of Mean, Sum, Count, Mode, Date, Time, DateTime, or None
    • See Aggregation Methods below for details

Types of Parsers

There are two types of parsers: Regex and Split.

Regex Parser

A regex parser uses a regular expression with capture groups to match parts of a log message and looks like this:

{
    "pattern": "([^ ]*) ([^ ]*) ([^ ]*) \\[([^]]*)\\] \"([^\"]*)\" ([^ ]*) ([^ ]*)",
    "pattern_type": "Regex",
    "example": "127.0.0.1 user-identifier user-name [10/Oct/2000:13:55:36 -0700] \"GET /apache_pb.gif HTTP/1.0\" 200 2326",
    "order": [
        "Remote Host",
        "User ID",
        "Username",
        "Date",
        "Request",
        "Status",
        "Size"
    ],
    "aggregation_methods": {
        "Remote Host": "Count",
        "User ID": "Count",
        "Username": "Count",
        "Date": "Count",
        "Request": "Count",
        "Status": "Count",
        "Size": "Count"
    }
}

Split Parser

A split parser uses str::split to split a message on a delimiter:

{
    "pattern": " - ",
    "pattern_type": "Split",
    "example": "2005-03-19 15:10:26,773 - simple_example - CRITICAL - critical message",
    "order": [
        "Timestamp",
        "Method",
        "Level",
        "Message"
    ],
    "aggregation_methods": {
        "Timestamp": {
            "DateTime": "[year]-[month]-[day] [hour]:[minute]:[second],[subsecond]"
        },
        "Method": "Count",
        "Level": "Count",
        "Message": "Sum"
    }
}
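How the delimiter and the order list work together can be sketched with `str::split`. The helper name `split_message` is hypothetical, and dropping unmatched parts or names (a consequence of `zip`) is a property of this sketch, not necessarily of Logria.

```rust
/// Splits `message` on `pattern` and pairs each part with the name at the
/// same position in `order`. Surplus parts or names are dropped by `zip`.
fn split_message<'a>(
    message: &'a str,
    pattern: &str,
    order: &[&'a str],
) -> Vec<(&'a str, &'a str)> {
    order.iter().copied().zip(message.split(pattern)).collect()
}
```

Applied to the example message above with the pattern `" - "`, this pairs `Timestamp` with `2005-03-19 15:10:26,773`, `Method` with `simple_example`, `Level` with `CRITICAL`, and `Message` with `critical message`. The hyphens inside the date are not split because the delimiter includes the surrounding spaces.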

Aggregation Methods

The aggregation_methods key stores a HashMap<String, AggregationMethod> that maps the name of each parsed message part to the method used to aggregate it. Since HashMaps are unordered, a list called order must also be present. Each string in this list matches a key name in aggregation_methods.
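The relationship between the unordered map and the order list can be sketched like this. The enum below is a hypothetical mirror of the method names in this document, with the date variants carrying their format string as in the JSON; Logria's real `AggregationMethod` type may be defined differently, and `methods_in_order` is an illustrative name.

```rust
use std::collections::HashMap;

/// Hypothetical mirror of the documented method names.
#[allow(dead_code)]
#[derive(Debug, Clone, PartialEq)]
enum AggregationMethod {
    Mean,
    Sum,
    Count,
    Mode,
    Date(String),     // carries the format description
    Time(String),     // carries the format description
    DateTime(String), // carries the format description
    None,
}

/// Walks `order` and looks up each name, so the result is in a stable order
/// even though the map is not. Names missing from the map are skipped.
fn methods_in_order<'a>(
    order: &[&'a str],
    methods: &'a HashMap<String, AggregationMethod>,
) -> Vec<(&'a str, &'a AggregationMethod)> {
    order
        .iter()
        .filter_map(|name| methods.get(*name).map(|m| (*name, m)))
        .collect()
}
```

Iterating the map directly would yield the fields in an arbitrary order, which is why the lookup is driven by the list.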

Included Methods

Methods currently include Mean, Sum, Count, Mode, Date, Time, DateTime, and None. These all have different behaviors.

Mean and Sum

Both of these methods search for the first occurrence of a number in the parsed message.

Mean will display the mean, the count, and the sum of the parsed floats:

Message
    Mean: 51.32
    Count: 5,113
    Total: 262,417

Sum will only display the sum:

Message
    Total: 262,417

Float parsing logic is defined and tested in aggregators.rs. Some examples include:

extract_number("653.12 this is a test");  // 653.12
extract_number("4.123 this is a test 123.4");  // 4.123
extract_number("this is a 123.123. test");  // None, invalid
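One way to get the behavior shown in these examples is sketched below. This is not the code from aggregators.rs: `first_number` and `Mean` are hypothetical names, and skipping messages that contain no valid number is an assumption of the sketch.

```rust
/// Returns the first number in `message`, or `None` when the first
/// numeric-looking run is not a valid float (for example `123.123.`).
fn first_number(message: &str) -> Option<f64> {
    // Locate the first ASCII digit.
    let start = message.find(|c: char| c.is_ascii_digit())?;
    let rest = &message[start..];
    // Take the maximal run of digits and dots that starts there.
    let end = rest
        .find(|c: char| !(c.is_ascii_digit() || c == '.'))
        .unwrap_or(rest.len());
    rest[..end].parse::<f64>().ok()
}

/// Running totals from which the mean, count, and sum can be displayed.
#[derive(Default)]
struct Mean {
    count: u64,
    total: f64,
}

impl Mean {
    /// Adds the first number in `message`; messages without one are skipped
    /// (an assumption of this sketch).
    fn update(&mut self, message: &str) {
        if let Some(n) = first_number(message) {
            self.count += 1;
            self.total += n;
        }
    }

    /// Returns `None` until at least one number has been seen.
    fn mean(&self) -> Option<f64> {
        if self.count == 0 {
            None
        } else {
            Some(self.total / self.count as f64)
        }
    }
}
```

A Sum aggregator needs only the `total` field of the same structure.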

Count and Mode

Count and Mode use a data structure similar to Python's collections.Counter to keep track of messages. Each message is hashed, so identical messages increment the same counter. Count displays the top 5 results by default; this can be adjusted using the :agg command.

When activated, Count displays the number of occurrences of each message as well as its share of the total number of messages counted:

Level
    INFO: 2,794 (55%)
    WARNING: 1,433 (28%)
    ERROR: 886 (17%)

Mode is a special case of Count where the top n is fixed at 1.
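A counter of this kind can be sketched with a `HashMap`. The `Counter` type and its methods are hypothetical names, and rounding the percentage to the nearest whole number is an assumption chosen to resemble the output above.

```rust
use std::collections::HashMap;

/// Counts identical messages, similar to Python's `collections.Counter`.
#[derive(Default)]
struct Counter {
    counts: HashMap<String, u64>,
    total: u64,
}

impl Counter {
    /// Increments the count for `message` and the overall total.
    fn update(&mut self, message: &str) {
        *self.counts.entry(message.to_string()).or_insert(0) += 1;
        self.total += 1;
    }

    /// Returns up to `n` rows of (message, count, percent), most frequent first.
    fn top(&self, n: usize) -> Vec<(String, u64, u64)> {
        let mut rows: Vec<(&String, &u64)> = self.counts.iter().collect();
        // Descending by count, then by message so ties are deterministic.
        rows.sort_by(|a, b| b.1.cmp(a.1).then_with(|| a.0.cmp(b.0)));
        rows.into_iter()
            .take(n)
            .map(|(msg, &count)| {
                let percent = (count as f64 / self.total as f64 * 100.0).round() as u64;
                (msg.clone(), count, percent)
            })
            .collect()
    }

    /// Mode is the same query with `n` fixed at 1.
    fn mode(&self) -> Option<(String, u64, u64)> {
        self.top(1).into_iter().next()
    }
}
```

With this shape, changing how many results are displayed only changes the argument passed to `top`; the stored counts are unaffected.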

Date, Time, and DateTime

The Date, Time, and DateTime methods each require a format description, written in the syntax outlined in the time book or time docs.

Date sets the time of every message to midnight, and Time sets the date of every message to min.

When activated, these methods display the rate at which messages are received, the total number of messages, and the earliest and latest timestamps.

Timestamp
    Rate: 196 per second
    Count: 5,113
    Earliest: 2021-11-15 22:28:42.21
    Latest: 2021-11-15 22:29:08.389

None

None disables parsing for that field. It displays like this when activated:

Timestamp
    Disabled

Example Aggregation Data

Given an order and aggregation_methods like this:

"order": [
    "Timestamp",
    "Method",
    "Level",
    "Message"
],
"aggregation_methods": {
    "Timestamp": {
        "DateTime": "[year]-[month]-[day] [hour]:[minute]:[second],[subsecond]"
    },
    "Method": "Count",
    "Level": "Count",
    "Message": "Mean"
}

The resultant aggregation data will render like so:

Timestamp
    Rate: 196 per second
    Count: 5,113
    Earliest: 2021-11-15 22:28:42.21
    Latest: 2021-11-15 22:29:08.389
Method
    __main__.<module>: 2,215 (43%)
    __main__.first: 1,433 (28%)
    __main__.second: 886 (17%)
    __main__.third: 579 (11%)
Process
    MainProcess: 5,113 (100%)
Level
    INFO: 2,794 (55%)
    WARNING: 1,433 (28%)
    ERROR: 886 (17%)
Message
    Mean: 51.32
    Count: 5,113
    Total: 262,417
┌────────────────────────────────────────────────────────────────────────────────────────────────┐
│Parsing with Color + Hyphen Separated, aggregation mode                                         │
└────────────────────────────────────────────────────────────────────────────────────────────────┘

Activating Parsers

When invoked, Logria lists the parsers defined in the parsers directory, and the user selects one by the index shown next to its filename:

  0: Common Log Format
  1: Hyphen Separated
  2: Color + Hyphen Separated

Once a parser has been selected, the user chooses which part of the matched log message to display when streaming:

  0: 2020-02-04 19:06:52,852
  1: __main__.<module>
  2: MainProcess
  3: INFO
  4: I am a log! 91

This text is generated by the example key in the parser's JSON.