Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tee-Object -Process #20123

Open
dkaszews opened this issue Aug 16, 2023 · 40 comments
Open

Tee-Object -Process #20123

dkaszews opened this issue Aug 16, 2023 · 40 comments
Assignees
Labels
Issue-Enhancement the issue is more of a feature request than a bug WG-Engine core PowerShell engine, interpreter, and runtime WG-NeedsReview Needs a review by the labeled Working Group

Comments

@dkaszews
Copy link
Contributor

dkaszews commented Aug 16, 2023

Summary of the new feature / enhancement

Tee-Object could use a new parameter Process, accepting a scriptblock which will be invoked for every item going through it. Example usecases:

Proposed technical implementation details (optional)

Opens:

  1. Is there some magic we use to split and connect pipes, so that it correctly calls Begin, Process, End? Without it, we may either accidentally call the cmdlet for each object separately, causing each of them to clobber the last. This means we cannot simply implement it as ForEach-Item { $_ | $Process; $_ }:
    > 0..4 | ConvertTo-Json
    [
    	0,
    	1,
    	2,
    	3,
    	4
    ]
    > 0..4 | %{ ConvertTo-Json }
    0
    1
    2
    3
    4
  2. Similarly, if we first bulk all of the piped items and only later push them into the secondary pipe, this will break streaming, as with implementation with: ForEach-Object -Begin { $pipe = [List[object]]:new() } -Process { $pipe.Add($_); $_ } -End { $pipe | $Process }. It will be especially visible with usecase 1.: Tee-Object -Process { Write-Host } will only start outputting once the main pipeline has finished.
  3. How should we treat implicit or explicit output from the secondary pipe? Should it affect the primary pipe, should it be written to host, or should it be ignored? For ForEach-Object, the first is true, as can be seen with 0..4 | %{ $_; $_ } | Measure-Object outputting 10.
  4. If we can split the pipeline properly, do we have to worry about one part of it mutating the pipe for the other? In other words, what should be the order of execution of Process? Secondary pipe then primary, other way around, concurrent, undefined?
    Get-ChildItem | Tee-Object -Process { $_.Length = 123 } | ConvertTo-Json > files.json
    # Should the resulting JSON contain original Length, or all 123, as if with `ForEach-Object`?
    Get-ChildItem | Tee-Object -Process { ConvertTo-Json > files.json } | %{ $_.Length = 123 }
    # What if we switch main and secondary pipes around?
  5. Is the syntax I have been showing with scriptblock omitting $_ and pretending all the items are piped from the left even legal or reasonable? I don't see how to express some of the intentions I have shown without it, unless we make $_ in Tee-Object -Process { $_ | Export-Csv files.csv } mean the entire pipeline. Otherwise, maybe some new symbol like $% would make sense, as % is commonly already associated with item processing as default alias for ForEach-Object.
@dkaszews dkaszews added Issue-Enhancement the issue is more of a feature request than a bug Needs-Triage The issue is new and needs to be triaged by a work group. labels Aug 16, 2023
@jhoneill
Copy link

jhoneill commented Aug 17, 2023

Is the syntax I have been showing with scriptblock omitting $_ and pretending all the items are piped from the left even legal or reasonable? I don't see how to express some of the intentions I have shown without it, unless we make $_ in Tee-Object -Process { $_ | Export-Csv files.csv }

YES
{ $_ | Export-Csv files.csv } and { $_} | Export-Csv files.csv are very different. The first writes each object to the file, and overwrites it with the next object. The second writes all the objects to a file.

The more I've pushed this around, the more I think you need to have a second pipeline. "Tee" needs to push objects so that export-csv in this sees them as a sequence of input objects from the pipeline. And nothing can leak from pipeline that export-csv is in, back to the main one

I've recycled some stuff I know from Proxy functions to create and run a steppable pipeline as a "Tee" or "Fork". There is probably a more elegant way of preventing the that pipeline sending data into the following command. but this is just a p.o.c :-)

function Invoke-Fork {
    param( 
        [Parameter(Position=0,Mandatory=$true)]  
        [scriptblock]$forkBlock  ,
        [Parameter(ValueFromPipeline=$true)]
        $InputObect 
    )
    begin     { try {
                    $steppablePipeline = $forkBlock.GetSteppablePipeline($myInvocation.CommandOrigin)
                    $null = $steppablePipeline.Begin($PSCmdlet) } 
        catch { throw }
    }
    process   { try {$null  =  $steppablePipeline.Process($_) ;   Write-Output $_  }  catch { throw}}
    end       { try {$null  =  $steppablePipeline.End() } catch { throw }}
    clean     { if  ($null -ne $steppablePipeline) { $null = $steppablePipeline.Clean() }  }
}

So this works

dir ~ | Invoke-Fork {Export-csv -Path $env:TEMP\stuff.csv  } | ft Name,Length

and the block can be a multi-command pipeline

dir ~ | Invoke-Fork {Select-Object -Property  name,LastWriteTime  |  Export-csv -Path $env:TEMP\stuff.csv  } | ft Name,Length

My addition of | out-null works

dir ~ | Invoke-Fork {Select-Object -Property  name,LastWriteTime  } | ft Name,Length

Write-host works - what gets printed when is harder to follow.

dir ~ | Invoke-Fork {Measure-Object -SUM LENGTH | % SUM | Write-Host } | ft Name,Length

Roughly what you're looking to do ?

@dkaszews
Copy link
Contributor Author

@jhoneill Will have to play around it, but this looks amazing 🤩

@mklement0
Copy link
Contributor

mklement0 commented Aug 18, 2023

A steppable pipeline looks promising, but note that Invoke-Fork doesn't prevent "pollution" of the pipeline with output from the script block at all, given that $steppablePipeline.Process($_) itself doesn't produce output and writes directly to the pipeline; in other words: $null = ... has no effect.

PS> 1..3 | Invoke-Fork { % { -$_ } }
-1
1
-2
2
-3
3

Assuming it is possible to capture the output from a steppable pipeline (which should then be a separate one), the logic could be as follows:

  • Silence the output by default (this would work with a save-to-file command such as Export-Csv) ...
  • ... unless -Variable or -FilePath are also specified, in which case capture the output in the specified variable or save it to the specified file.
  • Update: In combination with the new -Host switch (see Tee-Object should have a -Console parameter #19827), print directly to the host (display).

@dkaszews
Copy link
Contributor Author

@mklement0 How is Measure-Command implemented? Because I noticed that one does silence all output unless you explicitly pipe the measured command into Out-Default.

@mklement0
Copy link
Contributor

mklement0 commented Aug 18, 2023

Good question, @dkaszews.

It uses the internal .InvokeWithPipe() method directly on the input ScriptBlock (-Expression argument), and uses a null pipe as the output (new Pipe { NullPipe = true }):

Expression.InvokeWithPipe(
useLocalScope: false,
errorHandlingBehavior: ScriptBlock.ErrorHandlingBehavior.WriteToCurrentErrorPipe,
dollarUnder: InputObject, // $_
input: Array.Empty<object>(), // $input
scriptThis: AutomationNull.Value,
outputPipe: new Pipe { NullPipe = true },
invocationInfo: null);

Note that the script block is invoked for every input object, with $_ bound to that object, as the following example demonstrates:

1..3 | Measure-Command { $_ | Out-Host; pause }

@dkaszews
Copy link
Contributor Author

dkaszews commented Aug 18, 2023

@mklement0 So while the PowerShell PoC is not able to replicate it, it should be relatively easy to do in the core C#?

The whole idea behind this proposal was to replicate some of the stuff you can do in Linux with tee and process substitution. I wonder whether a better idea would be to port mkfifo and allow to do this kind of plumbing directly in PowerShell.

Edit: sorry, I missed the last part. So the Measure-Command does not support streaming objects, which is the issue I mentioned in open 1., might be a dead end then.

@mklement0
Copy link
Contributor

@dkaszews, if I understand correctly, what is needed is a steppable pipeline that is separate from the pipeline in which the enclosing command runs, is invoked input object by input object from the enclosing command's .ProcessRecord() method, and whose output can either be suppressed (by default) or collected (-Variable) or redirected to a file (-FilePath) separately.

I'm not familiar enough with the SDK to know if this possible, but perhaps @SeeminglyScience can weigh in.

@jhoneill
Copy link

That's odd. I posted a version of "invoke fork" with a kludged on addition of "| out-Null" (I referred to that here #20133 (comment) ) but it seems to have gone in editing the post. it was
$forkblock = [scriptblock]::create( $forkblock.tostring() + ' | out-null' )

I think there may be an important difference with Measure-Command because that isn't starting a pipe into which objects can be pushed.

@mklement0
Copy link
Contributor

mklement0 commented Aug 18, 2023

Good point - Out-Null can be integrated into the pipeline in the steppable script block, and the -FilePath and -Variable cases can be handled that way too - see the proof of concept below, which builds on the previous one (Invoke-WithFork).

I suspect there is a more elegant way of creating the steppable script block with the added pipeline segments than basing it on the string representation of the input script block, which could have side effects.

Note that the -Variable case sets the output variable via Set-Variable -Scope 1, which only works if the function isn't part of a script module (a moot point in a cmdlet implementation).

Thus, you can do the following:

$objects = [pscustomobject] @{ Num = 1 }, [pscustomobject] @{ Num = 2 }

# Neither -FilePath nor -Variable specified: script-block output, if any, is discarded.
$objects | Tee-WithProcess { Export-Csv out.csv } | 
  Out-Host

'---'

# Combination with the -Host switch: print the output to the host (display).
$objects | Tee-WithProcess { Format-List } -Host  | 
  Out-Host

'---'

# -FilePath: Output (via Out-File) to the specified file
# UPDATE: The p.o.c now treats a script block that doesn't contain a *single command*
#        or single pipeline that *starts with a command*  implicitly as if it were enclosed 
#        in `ForEach-Object`,  so no explicit use of the latter is needed anymore.
$objects | Tee-WithProcess { -$_.Num  } -FilePath out.txt | 
  Out-Host

'---'

# -Variable: Output is collected in the specified variable ($out)
$objects | Tee-WithProcess { -$_.Num  } -Variable out |
  Out-Host

Tee-WithProcess source code:

function Tee-WithProcess {
  param( 
    [Parameter(Position = 0, Mandatory)]  
    [scriptblock] $Process,
    [Parameter(ValueFromPipeline)]
    $InputObject,
    [string] $Variable,
    [string] $FilePath,
    [Alias('Host')] # Note: In a function implementation $Host cannot be used.
    [switch] $ToHost
  )
  begin {
    try {
      # Rule out mutually exclusive switches.
      if ((0, 1)[[bool] $Variable] + (0, 1)[[bool] $FilePath] + $ToHost.IsPresent -gt 1) { throw "-Variable, -FilePath, and -Host are mutually exclusive." }
      # Determine if the content of the script block is a single that statement comprising
      # either a single *command* or a single pipeline that *starts with* a command
      # which is assumed to *directly receive all pipeline input* (e.g, `Export-Csv`).
      # If not, assume it represents code to be run *for each input object*, 
      # with $_ reflecting each, and therefore wrap it in a ForEach-Object call.
      $isCommandPipeline =  ($Process.Ast.EndBlock.Statements.Count -eq 1 -and $Process.Ast.EndBlock.Statements[0].PipelineElements[0] -is [System.Management.Automation.Language.CommandAst])
      # NOTE: The implementation the block for the steppable pipeline via 
      #       recreation of the script block from its *string representation* in the single-command case
      #       is SUBOPTIMAL, but was chosen for simplicity in this proof-of-concept.
      $steppableScriptBlock = 
      [scriptblock]::Create($(
        if ($ToHost) {
          $isCommandPipeline ? 
            [scriptblock]::Create("$Process | Out-Host")
            : 
            { ForEach-Object -Process $Process | Out-Host }
        } elseif ($FilePath) {
          $isCommandPipeline ? 
            [scriptblock]::Create("$Process | Out-File -LiteralPath `"$FilePath`"")
            : 
            { ForEach-Object -Process $Process | Out-File -LiteralPath $FilePath }
        } elseif ($Variable) {
          $isCommandPipeline ? 
            [scriptblock]::Create("$Process | Write-Output -NoEnumerate -OutVariable aux | ForEach-Object -Process {} -End { Set-Variable -Scope 1 `"$Variable`" -Value `$aux }")
            :
            { ForEach-Object -Process $Process | Write-Output -NoEnumerate -OutVariable aux | ForEach-Object -Process {} -End { Set-Variable -Scope 1 $Variable -Value $aux } }
        } else {
          $isCommandPipeline ? 
            [scriptblock]::Create("$Process | Out-Null")
            : 
            { ForEach-Object -Process $Process | Out-Null }
        }
      ))
      $steppablePipeline = $steppableScriptBlock.GetSteppablePipeline($myInvocation.CommandOrigin)
      $steppablePipeline.Begin($PSCmdlet) 
    } 
    catch { $PSCmdlet.ThrowTerminatingError($_) }
  }
  process { try { $steppablePipeline.Process($InputObject) } catch { $PSCmdlet.ThrowTerminatingError($_) }; $InputObject }
  end     { try { $steppablePipeline.End() } catch { $PSCmdlet.ThrowTerminatingError($_) } }
  clean   { if ($null -ne $steppablePipeline) { $steppablePipeline.Clean() } }
}

@dkaszews
Copy link
Contributor Author

Any reason why you need [Parameter(ValueFromPipeline)] $InputObect? The typo clearly shows it's not used anywhere. Or is it just needed to enable piping into the function, even though we access those values via $_?

@mklement0
Copy link
Contributor

mklement0 commented Aug 18, 2023

Thanks for catching that, @dkaszews. You do need a ValueFromPipeline pipeline-binding parameter, and while its name doesn't matter if you use $_ in the process block instead, for clarity it's better to use the declared parameter there. I've fixed the code accordingly.

@dkaszews
Copy link
Contributor Author

I played around trying to rebind the output pipe to either null, file or variable without converting the scriptblock to string and back, but no dice. { $Process | Out-Null } throws error in GetSteppablePipeline about it being empty (?), while { %{ $Process | Out-Null } } causes the $Process to never be invoked, probably one of ForEach-Object does not pass the $_ along.

Maybe it is solveable with a smarter wrapper, maybe it is not possible without delving into engine internals. I poked around engine/lang/scriptblock.cs, it looks like there might be some way to sever or rebind the output pipe, but somebody more knowledgeable should chime in.

For the testing, I used the following to distinguish the host and pipe output, it nicely shows whether the $Process is invoked and if so, whether it leaks.

1..3 |
  Tee-WithProcess { %{ Write-Host 'Invoked'; 'Leaked' } } |
  ConvertTo-Json -Compress |
  Should -Be '[1,2,3]'

@jhoneill
Copy link

I played around trying to rebind the output pipe to either null, file or variable without converting the scriptblock to string and back, but no dice. { $Process | Out-Null } ....

That's not going to work . At the very least it would to be { & $Process | Out-Null } to run the script block in the variable but that only seems to run as an END block ...

The code I thought I originally posted

function Invoke-Fork {
    param( 
        [Parameter(Position=0,Mandatory=$true)]  
        $ForkBlock,
        [Parameter(ValueFromPipeline=$true)]
        $InputObect 
    )
    begin   {try   {
                    $ForkBlock = [scriptblock]::Create( $forkBlock.ToString() + " | Out-Null" )
                    $steppablePipeline = $ForkBlock.GetSteppablePipeline($myInvocation.CommandOrigin)
                    $steppablePipeline.Begin($PSCmdlet) } 
             catch {throw }
    }
    process {try   {$steppablePipeline.Process($_) ; Write-Output $_} catch {throw}}
    end     {try   {$steppablePipeline.End() } catch { throw }}
    clean   {if    ($steppablePipeline) { $null = $steppablePipeline.Clean() }  }
}

Passes your test but with the convert to/from string

#553 [——€] PS7 ~> 1,2,3 | Invoke-Fork { %{ Write-Host 'Invoked'; 'Leaked' } } |   ConvertTo-Json -Compress  
Invoked
Invoked
Invoked
[1,2,3]

I changed the begin block

                    #$ForkBlock = [scriptblock]::Create( $forkBlock.ToString() + " | Out-Null" )
                    $F2 = { Foreach-object -Process { & $ForkBlock | Out-Null}}
                    $steppablePipeline = $F2.GetSteppablePipeline($myInvocation.CommandOrigin)

That works without the round-trip to string and back.

@iRon7
Copy link

iRon7 commented Aug 19, 2023

I wonder if it makes sence to accept a string type besides a ScriptBlock type for the Process parameter which would represent a (enumerated) member of the current object similar to some other cmdlets:

... | Tee-Object -Process  Name  -FilePath out.txt | ...

@mklement0
Copy link
Contributor

mklement0 commented Aug 19, 2023

{ ForEach-Object -Process { & $ForkBlock | Out-Null} } (better: { ForEach-Object -Process $Process | Out-Null }) is tempting, but it wouldn't work with Tee -Process { Set-Content foo.txt } for instance, i.e. with single command invocations as opposed to an input script block that itself happens to use ForEach-Object (e.g., % { Write-Host 'Invoked'; 'Leaked' }).


@iRon7, I like the idea of simplifying with a string, but the question is what the string should represent.

I just updated the proof-of-concept with the future -Host switch, which should also work with the -Process parameter, so that you can do, for instance:

# "Tee" the Format-List output to the host.
$date = Get-Date | Tee-WithProcess { Format-List } -Host

You could argue that being able to pass Format-List as a string is the more natural candidate for string use than the name of a property on the input, though it would only work with parameter-less invocations.

# Possibly allow specifying a parameter-less command name as a *string*.
$date = Get-Date | Tee-WithProcess Format-List -Host

While a simple way to access a property (or several) on the input objects is appealing too, it would call for a new -Property parameter that is mutually exclusive with -Process, and that may be trying to pack too much into the command.

Sticking with just -Process, being able to specify a property name is the bigger convenience, given that -Process Name then saves you from having to pass -Process { ForEach-Object { $_.Name } } - but I wonder if it's counterintuitive.

Taking a step back: Conceivably, there could be two closely related, mutually exclusive parameters (names negotiable):

  • -ProcessCommand: the current -Process proposal; i.e. its script block is treated as being a segment in the implied pipeline with the input objects as the first segment; a string argument could then refer to a command name that doesn't require arguments.
$objects | Tee-WithProcess -ProcessCommand { Export-Csv out.csv }
$objects | Tee-WithProcess -ProcessCommand Format-List -Host # short for: -ProcessCommand { Format-List }
  • -ProcessEach: its script block is processed for each input object, i.e. it is implicitly the -Process argument of a ForeEach-Object call; a string argument could then refer to a property.
$objects | Tee-WithProcess -ProcessEach { "Processing $($_.Num)..."  } -Host
$objects | Tee-WithProcess -ProcessEach Num -Variable numbers  # short for: -ProcessEach { $_.Num }

@iRon7
Copy link

iRon7 commented Aug 20, 2023

@mklement0,

I was just thinking the same and even possibly also supporting multiple properties, this would than result in a scalar

$objects | Tee-WithProcess -ProcessEach Num -Variable numbers

or an object in case an array is supplied as argument, e.g.:

$objects | Tee-WithProcess -ProcessEach Num, Name -Host
$objects | Tee-WithProcess -ProcessEach ,Name -Variable # using the unary comma for an object with a single property

You might even think of calculated properties here, but as you said, it may might pack too much into the command

@dkaszews
Copy link
Contributor Author

To test some of the approaches, I have added begin and end to the test payload in order to distinguish cases where the action is invoked for all items together (correct) and separately (incorrect, may overwrite previous result):

> $ProcessSimple = { %{ Write-Host "Tee-Process process: $_"; "Tee-Process return: $_" } }
> $ProcessAdvanced = {
  begin { Write-Host 'Tee-Process begin' }
  process { Write-Host "Tee-Process process: $_"; "Tee-Process return: $_" }
  end { Write-Host 'Tee-Process end' }
}
> $ProcessContent = { Set-Content 'process.log' }

I then created different versions of the Tee-WithProcess, differing in line $steppableScriptBlock = ...:

  1. = $Process - create pipeline directly from parameter:

    1. $ProcessSimple - leaks:

      > 1..3 | Tee-WithProcess $ProcessSimple | ConvertTo-Json -Compress
      Tee-Process process: 1
      Tee-Process process: 2
      Tee-Process process: 3
      [1,"Tee-Process return: 1",2,"Tee-Process return: 2",3,"Tee-Process return: 3"]
    2. $ProcessAdvanced - error I don't quite understand:

      > 1..3 | Tee-WithProcess $ProcessAdvanced | ConvertTo-Json -Compress
      Tee-WithProcess: The variable '$steppablePipeline' cannot be retrieved because it has not been set.
      Tee-WithProcess: Exception calling "GetSteppablePipeline" with "1" argument(s): "The script block cannot be conv
      erted because it contains more than one clause. Expressions or control structures are not permitted. Verify that
       the script block contains exactly one pipeline or command."
    3. $ProcessContent - works:

      > 1..3 | Tee-WithProcess $ProcessContent | ConvertTo-Json -Compress; Get-Content 'process.log'
      [1,2,3]
      1
      2
      3
  2. @mklement0's = [scriptblock]::create("$Process | Out-Null"):

    1. $ProcessSimple - works:

      > 1..3 | Tee-WithProcess $ProcessSimple | ConvertTo-Json -Compress
      Tee-Process process: 1
      Tee-Process process: 2
      Tee-Process process: 3
      [1,2,3]
    2. $ProcessAdvanced - syntax error due to sticking | Out-Null in invalid place:

      > 1..3 | Tee-WithProcess $ProcessAdvanced | ConvertTo-Json -Compress
      Tee-WithProcess: The variable '$steppablePipeline' cannot be retrieved because it has not been set.
      Tee-WithProcess: Exception calling "Create" with "1" argument(s): "At line:5 char:2
      +  | Out-Null
      +  ~
      unexpected token '|', expected 'begin', 'process', 'end', 'clean', or 'dynamicparam'."
    3. $ProcessContent - works:

      > 1..3 | Tee-WithProcess $ProcessContent | ConvertTo-Json -Compress; Get-Content 'process.log'
      [1,2,3]
      1
      2
      3
  3. @jhoneill's = { %{ & $Process | Out-Null } }:

    1. $ProcessSimple - works:

      > 1..3 | Tee-WithProcess $ProcessSimple | ConvertTo-Json -Compress
      Tee-Process process: 1
      Tee-Process process: 2
      Tee-Process process: 3
      [1,2,3]
    2. $ProcessAdvanced - fails to forward pipe items, invokes action separately for each item:

      > 1..3 | Tee-WithProcess $ProcessAdvanced | ConvertTo-Json -Compress
      Tee-Process begin
      Tee-Process process:
      Tee-Process end
      Tee-Process begin
      Tee-Process process:
      Tee-Process end
      Tee-Process begin
      Tee-Process process:
      Tee-Process end
      [1,2,3]
    3. $ProcessContent - fails to forward pipe items:

      > 1..3 | Tee-WithProcess $ProcessContent | ConvertTo-Json -Compress; Get-Content 'process.log'
      
      cmdlet Set-Content at command pipeline position 1
      Supply values for the following parameters:
      Value[0]:
  4. @mklement0's improvement on @jhoneill with = { ForEach-Object -Process $Process | Out-Null }

    1. $ProcessSimple - works:

      > 1..3 | Tee-WithProcess $ProcessSimple | ConvertTo-Json -Compress
      Tee-Process process: 1
      Tee-Process process: 2
      Tee-Process process: 3
      [1,2,3]
    2. $ProcessAdvanced - error I don't quite understand:

      > 1..3 | Tee-WithProcess $ProcessAdvanced | ConvertTo-Json -Compress
      Tee-WithProcess: The script block cannot be invoked because it contains more than one clause. The Invoke() metho
      d can only be used on script blocks that contain a single clause.
    3. $ProcessContent - fails to forward pipe items:

      > 1..3 | Tee-WithProcess $ProcessContent | ConvertTo-Json -Compress; Get-Content 'process.log'
      
      cmdlet Set-Content at command pipeline position 1
      Supply values for the following parameters:
      Value[0]:

@dkaszews
Copy link
Contributor Author

dkaszews commented Aug 20, 2023

I did couple more experiments to try to replicate the errors shown above with the following scriptblocks, not wrapped in ForEach-Object:

$ProcessSingle = { Write-Host 'test' }
$ProcessMulti = { Write-Host $_; $_ }

With version 1. and 2. I got the following two results:

> 1..3 | Tee-WithProcess $ProcessSingle | ConvertTo-Json -Compress
Tee-WithProcess: The input object cannot be bound to any parameters for the command either because the command d
oes not take pipeline input or the input and its properties do not match any of the parameters that take pipelin
e input.

> 1..3 | Tee-WithProcess $ProcessMulti | ConvertTo-Json -Compress
Tee-WithProcess: The variable '$steppablePipeline' cannot be retrieved because it has not been set.
Tee-WithProcess: Exception calling "GetSteppablePipeline" with "1" argument(s): "Only a script block that contai
ns exactly one pipeline or command can be converted. Expressions or control structures are not permitted. Verify
 that the script block contains exactly one pipeline or command."

The first one is relatively clear - raw scriptblock does not accept pipeline input by default, which is why it errors on Process(). The second one is a bit weirder and seems to match some of the errors in earlier experiments.

With version 3. and 4. which wrap the $Process in ForEach-Object themselves, the results are as expected:

> 1..3 | Tee-WithProcess $ProcessSingle | ConvertTo-Json -Compress
test
test
test
[1,2,3]

> 1..3 | Tee-WithProcess $ProcessMulti | ConvertTo-Json -Compress
1
2
3
[1,2,3]

@dkaszews
Copy link
Contributor Author

I did a test regarding open 4., and unfortunately, as expected, there is nothing to stop the side-process from mutating the objects passing through, unless we deep copy them which would likely kill performance. Man, do I wish every language had C++'s const&.

> 1..3 | %{ @{ x = $_ } } | Tee-WithProcess { %{ $_['x'] = 0 } } | ConvertTo-Json -Compress
[{"x":0},{"x":0},{"x":0}]

> 1..3 | %{ @{ x = $_ } } | ConvertTo-Json -Compress
[{"x":1},{"x":2},{"x":3}]

The question now is, if we have to trust the $Process to not mutate the objects passing through, why not trust it to not pollute the output? Simply document it saying that any output is passed down the pipeline and that the user should Out-Null it if needed. It's similar how a function using List<T>.Remove(T) may accidentally return an extra bool if you don't explicitly ignore its output.

@jhoneill
Copy link

I wonder if it makes sence to accept a string type besides a ScriptBlock type for the Process parameter which would represent a (enumerated) member of the current object similar to some other cmdlets:

... | Tee-Object -Process  Name  -FilePath out.txt | ...

That's the equivalent of | Tee-Object -FilePath out.txt | foreach-object Name ... I'm wary of "being everything to everyone"

Is -Variable any different from -outVariable or -PipelineVariable common parameters.

To test some of the approaches, I have added begin and end to the test payload
Woah. Stop. You can't have that as a payload for a -Process block in a ForEach-object - I've seen that can't be created as a steppable pipeline.

If you want that you'll need to do it with Tee-Object -begin... -process ... -end...

, if we have to trust the $Process to not mutate the objects passing through, why not trust it to not pollute the output?

Depending on whether Process runs before or after the object is passed on to success pipeline - things later in the success pipeline could mutate the object (or some other external variable) before process runs. There's general common sense about being very clear what changes are made to a dataset when two sets of code work on it concurrently because of possible advantages and traps to fall into making changes

I think, though, if it didn't catch "stray" object and people had to put their own Out-Null in, the first change that would be requested would be to make Out-Null automatic.

@mklement0
Copy link
Contributor

@iRon7, especially with a -ProcessEach parameter (more on that later) I'd keep it simple and only allow one property name, analogous to how you can only use one in explicit ForEach-Object calls (e.g., 'foo', 'bar!' | ForEach-Object Length)

If you want multiple properties with Select-Object functionality, -ProcessCommand { Select-Object Name, Num } doesn't strike me as a hardship.

(As an aside: , Name as a command argument would cause a syntax error; would require (, 'Name')).


@dkaszews, regarding not being to prevent mutation of the teed objects:

  • Not doing so - if unwanted - would have to be documented as the user's responsibility.
  • Conversely, however, it can be conceived of as a feature: allow transformation of the teed objects before passing them on.

E.g.:

# Equivalent of:
#   $objects | ForEach-Object  { $_.Num++; $_ } | Export-Csv out.csv 
$objects | Tee-Object { $_.Num++ } | Export-Csv out.csv

I'll elaborate on the -ProcessEach vs. -ProcessCommand parameters further with an idea to get away with one parameter, -Process, but regarding your experiments:

With the -Process parameter as in my current proof-of-concept, whatever is in the supplied script block becomes a pipeline segment, with the input objects as the implied first segment, so that only what syntactically works there can work.
In other words: only a command call can be inside the -Process script block

E.g., $objects | Tee-WithProcess { Export-Csv out.csv } is equivalent to $objects | Export-Csv out.csv, and $objects | Tee-WitProcess { $_.Num } is equivalent to $objects | $_.Num, which fails, because expressions cannot be used in non-initial pipeline segments.

Thus, your $ProcessSimple and $ProcessContent examples are the same basic use case (passing a command: % (ForEach-Object) / Set-Content), which works, and $ProcessAdvanced doesn't (it lacks a script-block enclosure and invocation with either . or &)

Switching the implementation of calling the -Process script block to { ForEach-Object -Process $Process | Out-Null } - which in isolation is not an option, because it prevents use of commands, but would be appropriate and necessary for -ProcessEach - would make the case of passing an expression work - assuming it operates on $_ (e.g., -Process { $_.Num }) - that's why $ProcessContent then doesn't work: Set-Content is then called for each input object and receives no input.

@mklement0
Copy link
Contributor

mklement0 commented Aug 20, 2023

Based on the above it occurred to me that there's no need for distinct -ProcessCommand and -ProcessEach parameters and that it can be inferred from the -Process argument's content (via AST analysis) whether it represents a single command (or a single pipeline that starts with a command) that can act as its own / subsequent pipeline segment(s) and therefore is invoked once, or an expression (or multiple commands) that expects to be invoked for each input object, and must therefore be invoked implicitly via ForEach-Object.

I've updated the proof-of-concept accordingly, so that the following now works:

# Single-command case
$objects | Tee-WithProcess { Export-Csv out.csv }

# Expression case with implicit ForEach-Object
$result = $objects | Tee-WithProcess { 'Processing: ' + $_.Num } -Host

In the rare event that a single command is passed that should be called for each object, simply wrap it in ForEach-Object explicitly (e.g. 1..3 | Tee-WithProcess { % { Write-Host -ForegroundColor ($_ % 2 ? 'Red' : 'White') $_ } })

However, with the return to a single -Process parameter the question how to interpret a string in lieu of a script block returns:

  • One option: If the string is the name of an existing command (e.g, Format-List), treat it as such; otherwise (e.g., Num, treat it as a property name and therefore like { $_.Num }.
  • Another option: given the simplicity of { $_.Num } (no more need for ForEach-Object), make do without the string option.

@dkaszews
Copy link
Contributor Author

@jhoneill

Woah. Stop. You can't have that as a payload for a -Process block in a ForEach-object - I've seen that can't be created as a steppable pipeline.
If you want that you'll need to do it with Tee-Object -begin... -process ... -end...

Thanks, I was confused because 1..3 | & $ProcessAdvanced | ConvertTo-Json -Compress worked fine, I confused & call operator with % ForEach-Object alias.

I rewrote it to:

$ProcessForeach = {
    ForEach-Object `
        -begin { Write-Host 'Tee-Process begin' } `
        -process { Write-Host "Tee-Process process: $_"; "Tee-Process return: $_" } `
        -end { Write-Host 'Tee-Process end' }
}

and now it works correctly with version 2. (strings), but fails with 3. and 4. due to extra begin/end calls:

> 1..3 | Tee-WithProcess -Version 1 $ProcessForeach | ConvertTo-Json -Compress
Tee-Process begin
Tee-Process process: 1
Tee-Process process: 2
Tee-Process process: 3
Tee-Process end
["Tee-Process return: 1",1,"Tee-Process return: 2",2,"Tee-Process return: 3",3]

> 1..3 | Tee-WithProcess -Version 2 $ProcessForeach | ConvertTo-Json -Compress
Tee-Process begin
Tee-Process process: 1
Tee-Process process: 2
Tee-Process process: 3
Tee-Process end
[1,2,3]

> 1..3 | Tee-WithProcess -Version 3 $ProcessForeach | ConvertTo-Json -Compress
Tee-Process begin
Tee-Process process: 1
Tee-Process end
Tee-Process begin
Tee-Process process: 2
Tee-Process end
Tee-Process begin
Tee-Process process: 3
Tee-Process end
[1,2,3]

> 1..3 | Tee-WithProcess -Version 4 $ProcessForeach | ConvertTo-Json -Compress
Tee-Process begin
Tee-Process process: 1
Tee-Process end
Tee-Process begin
Tee-Process process: 2
Tee-Process end
Tee-Process begin
Tee-Process process: 3
Tee-Process end
[1,2,3]

Depending on whether Process runs before or after the object is passed on to success pipeline - things later in the success pipeline could mutate the object (or some other external variable) before process runs.

I tried swapping around $SteppablePipeline.Process($InputObject) and Write-Output $InputObject around, it doesn't seem to affect it. I think it's because Write-Object is buffered until the process block is fully finished, as this article suggests.

I think, though, if it didn't catch "stray" object and people had to put their own Out-Null in, the first change that would be requested would be to make Out-Null automatic.

Sure, it's just that I don't a way to implement without @mklement0's string hackery, which I am afraid will have some odd side effects, and definitely performance hit of having to reparse the scriptblock.

In my ideal world, there would be a method SteppablePipeline.SetOutPipe which you could call with a scriptblock { Out-Null }, { Set-Content } or { Set-Variable } depending on which Tee-Object parameters.

@mklement0

regarding not being to prevent mutation of the teed objects:
Not doing so - if unwanted - would have to be documented as the user's responsibility.
Conversely, however, it can be conceived of as a feature: allow transformation of the teed objects before passing them on.

Sure, it has to be documented, but same is for ForEach-Object. On the other hand, I would consider using it as primary purpose abuse of notation, same as you could achieve 1..3 | ForEach-Object { Write-Host "Processing: $_ } with 1..3 | Group-Object { Write-Host "Processing: $_ } | Out-Null.

@iRon7

I agree with @mklement0, I think strings would be confusing, especially since without any of -FilePath, -Variable or -Host (up for grabs #19827), it would be discarded by the implicit Out-Null.

@dkaszews
Copy link
Contributor Author

dkaszews commented Aug 20, 2023

Whoa, did not know that scriptblock had Ast member. I may have found something interesting inside it (some output omitted for brevity):

> $ProcessAdvanced.Ast
BeginBlock         : begin { Write-Host 'Tee-Process begin' }
ProcessBlock       : process { Write-Host "Tee-Process process: $_"; "Tee-Process return: $_" }
EndBlock           : end { Write-Host 'Tee-Process end' }

> $ProcessForeach.Ast
BeginBlock         :
ProcessBlock       :
EndBlock           : ForEach-Object `
                             -begin { Write-Host 'Tee-Process begin' } `
                             -process { Write-Host "Tee-Process process: $_"; "Tee-Process return: $_" } `
                             -end { Write-Host 'Tee-Process end' }

We can distinguish advanced and simple scriptblocks, because the latter only have EndBlock (which I find surprising, because I thought that would be ProcessBlock. That means we can easily transform the former into the latter if needed.

But also it seems like we should be able to add | Out-Null at AST level instead of using strings:

> $Noisy = { %{ Write-Host "Processing: $_"; Write-Output "Returing: $_" } }
> $Quiet = { %{ Write-Host "Processing: $_"; Write-Output "Returing: $_" } | Out-Null }

> 1..3 | Tee-WithProcess -Version 1 $Noisy | ConvertTo-Json -Compress
Processing: 1
Processing: 2
Processing: 3
[1,"Returing: 1",2,"Returing: 2",3,"Returing: 3"]

> 1..3 | Tee-WithProcess -Version 1 $Quiet | ConvertTo-Json -Compress
Processing: 1
Processing: 2
Processing: 3
[1,2,3]

> $Noisy.Ast.EndBlock.Statements[0].PipelineElements.Extent.Text
%{ Write-Host "Processing: $_"; Write-Output "Returing: $_" }

> $Quiet.Ast.EndBlock.Statements[0].PipelineElements.Extent.Text
%{ Write-Host "Processing: $_"; Write-Output "Returing: $_" }
Out-Null

This shows that simply appending Out-Null to Statements[0].PipelineElements is exactly what I talked about in:

In my ideal world, there would be a method SteppablePipeline.SetOutPipe which you could call with a scriptblock { Out-Null }, { Set-Content } or { Set-Variable } depending on which Tee-Object parameters.

While everything under Ast is a ReadOnlyCollection, there should be some API used by parser to create AST that can be used to create the latter from the former.

@mklement0
Copy link
Contributor

mklement0 commented Aug 20, 2023

@dkaszews, indeed the recreate-the-script-block-from-its-string-representation problem needs solving, and it is now half solved: in the case where ForEach-Object is implicitly applied in the updated p.o.c, the original script block is used as-is.

Good pointers re AST; it's definitely possible to create a new AST that builds on an existing one, and then call .GetScriptBlock() on it, but I think that context may be lost in the process, and it may in effect not be better (though probably faster) than the recreation-from-string approach - but I don't know.

@mklement0
Copy link
Contributor

mklement0 commented Aug 20, 2023

If you want to pass an advanced script block to the updated p.o.c (which I imagine will rarely be necessary in practice), this is how you would do it (& would work too, though there's no good reason to use a child scope):

1..3 | Tee-WithProcess { . { begin { 'begin' } process { 10 * $_ } end { 'end' } } } -Host

Output (mix of host and success output):

begin
10
1
20
2
30
3
end

@jhoneill
Copy link

@dkaszews This getting harder to follow, but

I tried swapping around $SteppablePipeline.Process($InputObject) and Write-Output $InputObject around, it doesn't seem to affect it. I think it's because Write-Object is buffered until the process block is fully finished, as this article suggests.

The bit about Write-object does seem to be wrong.

function a {
# Sends output in all 3 blocks 
    begin   {Write-host " A begins"  ; 0    }
    process {Write-host " A process" ; 1,2,3}
    end     {Write-host " A end"     ; 4    }
}

function b { param ([Parameter(ValueFromPipeline)]$P)
    begin   {Write-host " b begins '$p'" }
    process {Write-host " b process '$p'"; write-output $p  #try $p or write-output $$P
             #if ($p -eq 2 ) {throw}   # test what happens to the others if a fatal error happens in the middle.  
    }
    end     {Write-host " b end $P"}
}

function c { param ([Parameter(ValueFromPipeline)]$P)
    begin   {Write-host " c begins '$p'" }
    process  {Write-host " c process '$p'" ; $P}
    end      {Write-host " c end $P"; }
    }

a | b | c 

Whether B uses Write-output $p or just $p C's process block sees each item before, B's process block sees the next one

 A begins
 b begins ''
 b process '0'
 c begins ''
 c process '0'
0
 A process
 b process '1'
 c process '1'
1
 b process '2'
 c process '2'
2
 b process '3'
 c process '3'
3
 A end
 b process '4'
 c process '4'
4
 b end 4
 c end 4

Note that A sending output in its begin block changes the order of begin and first process. Otherwise it is

 A begins
 b begins ''
 c begins ''
 A process
 b process '1'
 c process '1'
1
 b process '2'
 c process '2'
2

We can distinguish advanced and simple scriptblocks, because the latter only have EndBlock (which I find surprising, because I thought that would be ProcessBlock. That means we can easily transform the former into the latter if needed.

As a side thing functions which don't have begin / process /end run as an end block so

function d { param ([Parameter(ValueFromPipeline)]$P)  $p  }

> 1,2,3 | d 
3

Filters seem to have fallen out of use but a filter which doesn't specify begin / process /end run as a process block

> filter e { param ([Parameter(ValueFromPipeline)]$P)     $p  }  
1,2,3 | e 
1
2
3

@TobiasPSP TobiasPSP added the WG-Cmdlets general cmdlet issues label Sep 11, 2023
@SteveL-MSFT SteveL-MSFT added WG-Engine core PowerShell engine, interpreter, and runtime WG-NeedsReview Needs a review by the labeled Working Group labels Oct 4, 2023
@SteveL-MSFT
Copy link
Member

@PowerShell/wg-powershell-cmdlets discussed this and agree on the utility of adding this capability, however, we have concerns about how this will work in practice with live .NET objects. We defer to the @PowerShell/wg-powershell-engine group to discuss.

@SeeminglyScience
Copy link
Collaborator

We discussed this in the Engine WG but the more I look through the thread the more unsure I am about what the actual ask is.

It seems like it's either:

  1. A new command that essentially wraps the SteppablePipeline API
  2. Sugar for ForEach-Object { $_.DoSomethingWithDollarUnderHere(); $_ }

Are either of these correct? If not, can someone distill the discussion into a proposal we can discuss?

@SeeminglyScience SeeminglyScience added Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept and removed WG-Cmdlets general cmdlet issues labels Nov 28, 2023
@mklement0
Copy link
Contributor

From my perspective, the proposal is as follows - I hope @dkaszews and @iRon7 agree:

  • Add a [scriptblock]-typed -Process parameter to Tee-Object.

  • Used in isolation, it allows you to quietly process the teed objects, such as by calling Export-Csv (success output, if any, should be discarded).

  • Used in combination with a teeing target (-Host, -Variable, -FilePath) it allows you to pre-process the teed objects: that is, it is the output from the script block that is sent to the target; e.g. -Process { Format-Table -GroupBy Foo } -Host

  • As for how to treat the -Process script block: The proposal is to infer from the AST of the script block whether it is:

    • a single command (pipeline) to which all teed objects can therefore be piped directly (via a steppable pipeline); e.g. -Process { Export-Csv out.csv }
    • otherwise, it is assumed to be per-input-object processing code - akin to the -ScriptBlock argument for ForEach-Object - that operates on $_, e.g. -Process { $_ + 1 }

Examples and a proof-of-concept are in this comment.

The open questions are:

  • Conceptual: Is the inference of the intended treatment of the -Process script block (too) obscure? Hopefully it'll "just work" from the user's perspective.

  • Technical:

    • Is there a way to use the given -Process script block as-is, without having to re-create it via its string representation / AST (which is what the proof-of-concept does), where context may be lost?
    • If not, is this an acceptable limitation (that will rarely be noticed in practice) that simply needs documenting?

@SeeminglyScience SeeminglyScience removed the Waiting on Author The PR was reviewed and requires changes or comments from the author before being accept label Nov 28, 2023
@iRon7
Copy link

iRon7 commented Nov 29, 2023

@SeeminglyScience,

a single command (pipeline) to which all teed objects can therefore be piped directly (via a steppable pipeline); e.g. -Process { Export-Csv out.csv }

Just a thought: why not multiple?
By dividing each pipeline to a hashtable item with a key equal to expanded scriptblock: "<scriptblock>".
So that I can do things along with: -Process { Export-Csv E:\Test_csv\$($_.Appname).CSV }

@mklement0
Copy link
Contributor

@iRon7, I assume you mean supporting multiple script blocks, i.e. to type -Process as [scriptblock[]], right?

That sounds like a nice enhancement, though we'd have to make it clear how that behaves in combination with a teeing target: if I understand you correctly, we'd then get:

1..3 | Tee-Object -Process { Write-Output }, { $_ * 10 } -Host | Out-Null
1
10
2
20
3
30

@iRon7
Copy link

iRon7 commented Nov 29, 2023

@mklement0,

I assume you mean supporting multiple script blocks, i.e. to type -Process

No, what I meant is instead of a single pipeline, for each main process step check if the concerned pipeline is already started,
If that is the case process it: $Pipeline["$ProcessScriptBlock"].Process($_)
and if not, open a new pipeline:

$Pipeline["$ProcessScriptBlock"] = $ProcessScriptBlock.GetSteppablePipeline()
$Pipeline["$ProcessScriptBlock"].Begin()

And at the end close all the $ProcessScriptBlock pipelines

Basically as in these examples:

@iRon7
Copy link

iRon7 commented Nov 29, 2023

Prototype of what I had in mind:

function TeeObject {
    param([ScriptBlock]$Process)
    begin {
        $Pipeline = @{}
    }
    process {
        $Key = $ExecutionContext.InvokeCommand.ExpandString($Process.ToString())
        if (!$Pipeline.Contains($Key)) {
            $Pipeline[$Key] = $Process.GetSteppablePipeline()
            $Pipeline[$Key].Begin($True)
        }
        $Pipeline[$Key].Process($_)
        $_	# pass on the current item
    }
    end {
        $Pipeline.get_Values().End()
    }
}

$Csv = @'
name,  account,  Appname, count, users, List
name1, account1,     123,     1,     2, abc
name1, account2,     123,     2,     3, z
name1, account3,     123,     3,     6, xc
name1, account4,     123,     5,     7, df
name1, account5,     123,     5,     8, rg
name1, account6,     123,     7,     0, sfg
name2, account1,     456,     1,     1, dfg
name3, account1,     789,     3,     7, rt
name3, account2,     789,     7,     1, ert
'@

$Csv | ConvertFrom-Csv | TeeObject -Process { Export-Csv .\$($_.Appname).Csv } | Out-Null

@mklement0
Copy link
Contributor

@iRon7, while I can see the appeal of such functionality in principle, I don't think the implementation you're proposing is the way to go (and I'm not sure there is an alternative way that fits into this proposal, which doesn't preclude thinking about it separately, however):

  • The behavior is obscure and may situationally not be desired.
  • More importantly, given that string interpolation evaluates arbitrary subexpressions ($(...)), their repeated evaluation may be undesired, not just for performance reasons, but because those expressions may have side effects, potentially even destructive ones if repeated.

@SeeminglyScience SeeminglyScience closed this as not planned Won't fix, can't repro, duplicate, stale Nov 30, 2023
@microsoft-github-policy-service microsoft-github-policy-service bot removed the Needs-Triage The issue is new and needs to be triaged by a work group. label Nov 30, 2023
@SeeminglyScience
Copy link
Collaborator

(misclick fyi)

@rhubarb-geek-nz
Copy link

Tee-Object for two? Knife and fork? Why limit yourself to two?

As @iRon7 says

Just a thought: why not multiple?

Invoke-PipelineBroadcast runs multiple parallel script blocks and feeds the input pipeline records to each.

github
PSGallery
Example usage

@mklement0
Copy link
Contributor

@rhubarb-geek-nz, that's an interesting cmdlet, but it serves a different use case: sending a single pipeline's input to multiple, independent script blocks for output-less processing.

Here, we are talking about extension to Tee-Object, whose core mandate is to pass its pipeline input through, while allowing one additional target to also pass the input to.
(#20123 (comment) toyed with the idea of supporting multiple, -Process script blocks, but it was based on a misreading of @iRon7's intent, and I'm not convinced it is necessary.)

@rhubarb-geek-nz
Copy link

rhubarb-geek-nz commented May 16, 2024

@rhubarb-geek-nz, that's an interesting cmdlet, but it serves a different use case: sending a single pipeline's input to multiple, independent script blocks for output-less processing.

I have added a PassThru argument so the input pipeline can optionally go directly to the output pipeline in addition to all the script blocks. The script blocks can also do their own pipelines and outputs. Perhaps I should have called it Tee-Party

@rhubarb-geek-nz
Copy link

Tee-Party it is.

PS> Get-Alias -Definition Invoke-PipelineBroadcast

CommandType     Name                                               Version    Source
-----------     ----                                               -------    ------
Alias           Tee-Party                                          1.0.0      rhubarb-geek-nz.TeeParty

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Issue-Enhancement the issue is more of a feature request than a bug WG-Engine core PowerShell engine, interpreter, and runtime WG-NeedsReview Needs a review by the labeled Working Group
Projects
None yet
Development

No branches or pull requests

9 participants