Skip to content

Commit

Permalink
Add string/binary type color to ByteStream (#12897)
Browse files Browse the repository at this point in the history
# Description

This PR allows byte streams to optionally be colored as being
specifically binary or string data, which guarantees that they'll be
converted to `Binary` or `String` appropriately on `into_value()`,
making them compatible with `Type` guarantees. This makes them
significantly more broadly usable for command input and output.

There is still an `Unknown` type for byte streams coming from external
commands, which uses the same behavior as we previously did where it's a
string if it's UTF-8.

A small number of commands were updated to take advantage of this, just
to prove the point. I will be adding more after this merges.

# User-Facing Changes
- New types in `describe`: `string (stream)`, `binary (stream)`
- These commands now return a stream if their input was a stream:
  - `into binary`
  - `into string`
  - `bytes collect`
  - `str join`
  - `first` (binary)
  - `last` (binary)
  - `take` (binary)
  - `skip` (binary)
- Streams that are explicitly binary colored will print as a streaming
hexdump
  - example:
    ```nushell
    1.. | each { into binary } | bytes collect
    ```

# Tests + Formatting
I've added some tests to cover it at a basic level, and it doesn't break
anything existing, but I do think more would be nice. Some of those will
come when I modify more commands to stream.

# After Submitting
There are a few things I'm not quite satisfied with:

- **String trimming behavior.** We automatically trim newlines from
streams from external commands, but I don't think we should do this with
internal commands. If I call a command that happens to turn my string
into a stream, I don't want the newline to suddenly disappear. I changed
this to specifically do it only on `Child` and `File`, but I don't know
if this is quite right, and maybe we should bring back the old flag for
`trim_end_newline`
- **Known binary always resulting in a hexdump.** It would be nice to
have a `print --raw`, so that we can put binary data on stdout
explicitly if we want to. This PR doesn't change how external commands
work though - they still dump straight to stdout.

Otherwise, here's the normal checklist:

- [ ] release notes
- [ ] docs update for plugin protocol changes (added `type` field)

---------

Co-authored-by: Ian Manske <ian.manske@pm.me>
  • Loading branch information
devyn and IanManske committed May 20, 2024
1 parent baeba19 commit c61075e
Show file tree
Hide file tree
Showing 42 changed files with 1,102 additions and 411 deletions.
4 changes: 2 additions & 2 deletions crates/nu-cli/src/util.rs
Original file line number Diff line number Diff line change
Expand Up @@ -276,8 +276,8 @@ fn evaluate_source(
eval_block::<WithoutDebug>(engine_state, stack, &block, input)
}?;

let status = if let PipelineData::ByteStream(stream, ..) = pipeline {
stream.print(false)?
let status = if let PipelineData::ByteStream(..) = pipeline {
pipeline.print(engine_state, stack, false, false)?
} else {
if let Some(hook) = engine_state.get_config().hooks.display_output.clone() {
let pipeline = eval_hook(
Expand Down
6 changes: 4 additions & 2 deletions crates/nu-cmd-lang/src/core_commands/describe.rs
Original file line number Diff line number Diff line change
Expand Up @@ -163,6 +163,8 @@ fn run(

let description = match input {
PipelineData::ByteStream(stream, ..) => {
let type_ = stream.type_().describe();

let description = if options.detailed {
let origin = match stream.source() {
ByteStreamSource::Read(_) => "unknown",
Expand All @@ -172,14 +174,14 @@ fn run(

Value::record(
record! {
"type" => Value::string("byte stream", head),
"type" => Value::string(type_, head),
"origin" => Value::string(origin, head),
"metadata" => metadata_to_value(metadata, head),
},
head,
)
} else {
Value::string("byte stream", head)
Value::string(type_, head)
};

if !options.no_collect {
Expand Down
58 changes: 23 additions & 35 deletions crates/nu-command/src/bytes/collect.rs
Original file line number Diff line number Diff line change
@@ -1,3 +1,4 @@
use itertools::Itertools;
use nu_engine::command_prelude::*;

#[derive(Clone, Copy)]
Expand Down Expand Up @@ -35,46 +36,33 @@ impl Command for BytesCollect {
input: PipelineData,
) -> Result<PipelineData, ShellError> {
let separator: Option<Vec<u8>> = call.opt(engine_state, stack, 0)?;

let span = call.head;

// input should be a list of binary data.
let mut output_binary = vec![];
for value in input {
match value {
Value::Binary { mut val, .. } => {
output_binary.append(&mut val);
// manually concat
// TODO: make use of std::slice::Join when it's available in stable.
if let Some(sep) = &separator {
let mut work_sep = sep.clone();
output_binary.append(&mut work_sep)
}
}
// Explicitly propagate errors instead of dropping them.
Value::Error { error, .. } => return Err(*error),
other => {
return Err(ShellError::OnlySupportsThisInputType {
let metadata = input.metadata();
let iter = Itertools::intersperse(
input.into_iter_strict(span)?.map(move |value| {
// Everything is wrapped in Some in case there's a separator, so we can flatten
Some(match value {
// Explicitly propagate errors instead of dropping them.
Value::Error { error, .. } => Err(*error),
Value::Binary { val, .. } => Ok(val),
other => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "binary".into(),
wrong_type: other.get_type().to_string(),
dst_span: call.head,
dst_span: span,
src_span: other.span(),
});
}
}
}
}),
})
}),
Ok(separator).transpose(),
)
.flatten();

let output = ByteStream::from_result_iter(iter, span, None, ByteStreamType::Binary);

match separator {
None => Ok(Value::binary(output_binary, call.head).into_pipeline_data()),
Some(sep) => {
if output_binary.is_empty() {
Ok(Value::binary(output_binary, call.head).into_pipeline_data())
} else {
// have push one extra separator in previous step, pop them out.
for _ in sep {
let _ = output_binary.pop();
}
Ok(Value::binary(output_binary, call.head).into_pipeline_data())
}
}
}
Ok(PipelineData::ByteStream(output, metadata))
}

fn examples(&self) -> Vec<Example> {
Expand Down
11 changes: 7 additions & 4 deletions crates/nu-command/src/conversions/into/binary.rs
Original file line number Diff line number Diff line change
Expand Up @@ -127,15 +127,18 @@ fn into_binary(
let cell_paths = call.rest(engine_state, stack, 0)?;
let cell_paths = (!cell_paths.is_empty()).then_some(cell_paths);

if let PipelineData::ByteStream(stream, ..) = input {
// TODO: in the future, we may want this to stream out, converting each to bytes
Ok(Value::binary(stream.into_bytes()?, head).into_pipeline_data())
if let PipelineData::ByteStream(stream, metadata) = input {
// Just set the type - that should be good enough
Ok(PipelineData::ByteStream(
stream.with_type(ByteStreamType::Binary),
metadata,
))
} else {
let args = Arguments {
cell_paths,
compact: call.has_flag(engine_state, stack, "compact")?,
};
operate(action, args, input, call.head, engine_state.ctrlc.clone())
operate(action, args, input, head, engine_state.ctrlc.clone())
}
}

Expand Down
2 changes: 1 addition & 1 deletion crates/nu-command/src/conversions/into/cell_path.rs
Original file line number Diff line number Diff line change
Expand Up @@ -103,7 +103,7 @@ fn into_cell_path(call: &Call, input: PipelineData) -> Result<PipelineData, Shel
}
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, int".into(),
wrong_type: "byte stream".into(),
wrong_type: stream.type_().describe().into(),
dst_span: head,
src_span: stream.span(),
}),
Expand Down
20 changes: 17 additions & 3 deletions crates/nu-command/src/conversions/into/string.rs
Original file line number Diff line number Diff line change
Expand Up @@ -156,9 +156,23 @@ fn string_helper(
let cell_paths = call.rest(engine_state, stack, 0)?;
let cell_paths = (!cell_paths.is_empty()).then_some(cell_paths);

if let PipelineData::ByteStream(stream, ..) = input {
// TODO: in the future, we may want this to stream out, converting each to bytes
Ok(Value::string(stream.into_string()?, head).into_pipeline_data())
if let PipelineData::ByteStream(stream, metadata) = input {
// Just set the type - that should be good enough. There is no guarantee that the data
// within a string stream is actually valid UTF-8. But refuse to do it if it was already set
// to binary
if stream.type_() != ByteStreamType::Binary {
Ok(PipelineData::ByteStream(
stream.with_type(ByteStreamType::String),
metadata,
))
} else {
Err(ShellError::CantConvert {
to_type: "string".into(),
from_type: "binary".into(),
span: stream.span(),
help: Some("try using the `decode` command".into()),
})
}
} else {
let config = engine_state.get_config().clone();
let args = Arguments {
Expand Down
2 changes: 1 addition & 1 deletion crates/nu-command/src/filters/drop/column.rs
Original file line number Diff line number Diff line change
Expand Up @@ -135,7 +135,7 @@ fn drop_cols(
PipelineData::Empty => Ok(PipelineData::Empty),
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "table or record".into(),
wrong_type: "byte stream".into(),
wrong_type: stream.type_().describe().into(),
dst_span: head,
src_span: stream.span(),
}),
Expand Down
43 changes: 37 additions & 6 deletions crates/nu-command/src/filters/first.rs
Original file line number Diff line number Diff line change
Expand Up @@ -170,12 +170,43 @@ fn first_helper(
))
}
}
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "byte stream".into(),
dst_span: head,
src_span: stream.span(),
}),
PipelineData::ByteStream(stream, metadata) => {
if stream.type_() == ByteStreamType::Binary {
let span = stream.span();
if let Some(mut reader) = stream.reader() {
use std::io::Read;
if return_single_element {
// Take a single byte
let mut byte = [0u8];
if reader.read(&mut byte).err_span(span)? > 0 {
Ok(Value::int(byte[0] as i64, head).into_pipeline_data())
} else {
Err(ShellError::AccessEmptyContent { span: head })
}
} else {
// Just take 'rows' bytes off the stream, mimicking the binary behavior
Ok(PipelineData::ByteStream(
ByteStream::read(
reader.take(rows as u64),
head,
None,
ByteStreamType::Binary,
),
metadata,
))
}
} else {
Ok(PipelineData::Empty)
}
} else {
Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: stream.type_().describe().into(),
dst_span: head,
src_span: stream.span(),
})
}
}
PipelineData::Empty => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "null".into(),
Expand Down
4 changes: 2 additions & 2 deletions crates/nu-command/src/filters/insert.rs
Original file line number Diff line number Diff line change
Expand Up @@ -261,8 +261,8 @@ fn insert(
type_name: "empty pipeline".to_string(),
span: head,
}),
PipelineData::ByteStream(..) => Err(ShellError::IncompatiblePathAccess {
type_name: "byte stream".to_string(),
PipelineData::ByteStream(stream, ..) => Err(ShellError::IncompatiblePathAccess {
type_name: stream.type_().describe().into(),
span: head,
}),
}
Expand Down
2 changes: 1 addition & 1 deletion crates/nu-command/src/filters/items.rs
Original file line number Diff line number Diff line change
Expand Up @@ -86,7 +86,7 @@ impl Command for Items {
}),
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "record".into(),
wrong_type: "byte stream".into(),
wrong_type: stream.type_().describe().into(),
dst_span: call.head,
src_span: stream.span(),
}),
Expand Down
48 changes: 42 additions & 6 deletions crates/nu-command/src/filters/last.rs
Original file line number Diff line number Diff line change
Expand Up @@ -160,12 +160,48 @@ impl Command for Last {
}),
}
}
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "byte stream".into(),
dst_span: head,
src_span: stream.span(),
}),
PipelineData::ByteStream(stream, ..) => {
if stream.type_() == ByteStreamType::Binary {
let span = stream.span();
if let Some(mut reader) = stream.reader() {
use std::io::Read;
// Have to be a bit tricky here, but just consume into a VecDeque that we
// shrink to fit each time
const TAKE: u64 = 8192;
let mut buf = VecDeque::with_capacity(rows + TAKE as usize);
loop {
let taken = std::io::copy(&mut (&mut reader).take(TAKE), &mut buf)
.err_span(span)?;
if buf.len() > rows {
buf.drain(..(buf.len() - rows));
}
if taken < TAKE {
// This must be EOF.
if return_single_element {
if !buf.is_empty() {
return Ok(
Value::int(buf[0] as i64, head).into_pipeline_data()
);
} else {
return Err(ShellError::AccessEmptyContent { span: head });
}
} else {
return Ok(Value::binary(buf, head).into_pipeline_data());
}
}
}
} else {
Ok(PipelineData::Empty)
}
} else {
Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: stream.type_().describe().into(),
dst_span: head,
src_span: stream.span(),
})
}
}
PipelineData::Empty => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "null".into(),
Expand Down
36 changes: 30 additions & 6 deletions crates/nu-command/src/filters/skip/skip_.rs
Original file line number Diff line number Diff line change
Expand Up @@ -12,6 +12,7 @@ impl Command for Skip {
Signature::build(self.name())
.input_output_types(vec![
(Type::table(), Type::table()),
(Type::Binary, Type::Binary),
(
Type::List(Box::new(Type::Any)),
Type::List(Box::new(Type::Any)),
Expand Down Expand Up @@ -51,6 +52,11 @@ impl Command for Skip {
"editions" => Value::test_int(2021),
})])),
},
Example {
description: "Skip 2 bytes of a binary value",
example: "0x[01 23 45 67] | skip 2",
result: Some(Value::test_binary(vec![0x45, 0x67])),
},
]
}
fn run(
Expand Down Expand Up @@ -87,12 +93,30 @@ impl Command for Skip {
let ctrlc = engine_state.ctrlc.clone();
let input_span = input.span().unwrap_or(call.head);
match input {
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "byte stream".into(),
dst_span: call.head,
src_span: stream.span(),
}),
PipelineData::ByteStream(stream, metadata) => {
if stream.type_() == ByteStreamType::Binary {
let span = stream.span();
if let Some(mut reader) = stream.reader() {
use std::io::Read;
// Copy the number of skipped bytes into the sink before proceeding
std::io::copy(&mut (&mut reader).take(n as u64), &mut std::io::sink())
.err_span(span)?;
Ok(PipelineData::ByteStream(
ByteStream::read(reader, call.head, None, ByteStreamType::Binary),
metadata,
))
} else {
Ok(PipelineData::Empty)
}
} else {
Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: stream.type_().describe().into(),
dst_span: call.head,
src_span: stream.span(),
})
}
}
PipelineData::Value(Value::Binary { val, .. }, metadata) => {
let bytes = val.into_iter().skip(n).collect::<Vec<_>>();
Ok(Value::binary(bytes, input_span).into_pipeline_data_with_metadata(metadata))
Expand Down
32 changes: 26 additions & 6 deletions crates/nu-command/src/filters/take/take_.rs
Original file line number Diff line number Diff line change
Expand Up @@ -78,12 +78,32 @@ impl Command for Take {
stream.modify(|iter| iter.take(rows_desired)),
metadata,
)),
PipelineData::ByteStream(stream, ..) => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "byte stream".into(),
dst_span: head,
src_span: stream.span(),
}),
PipelineData::ByteStream(stream, metadata) => {
if stream.type_() == ByteStreamType::Binary {
if let Some(reader) = stream.reader() {
use std::io::Read;
// Just take 'rows' bytes off the stream, mimicking the binary behavior
Ok(PipelineData::ByteStream(
ByteStream::read(
reader.take(rows_desired as u64),
head,
None,
ByteStreamType::Binary,
),
metadata,
))
} else {
Ok(PipelineData::Empty)
}
} else {
Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: stream.type_().describe().into(),
dst_span: head,
src_span: stream.span(),
})
}
}
PipelineData::Empty => Err(ShellError::OnlySupportsThisInputType {
exp_input_type: "list, binary or range".into(),
wrong_type: "null".into(),
Expand Down

0 comments on commit c61075e

Please sign in to comment.