I want return an empty array in a function, but it return a null value. #21547

Closed
5 tasks done
abgox opened this issue Apr 28, 2024 · 13 comments
Labels
Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a Resolution-Answered The question is answered.

Comments


abgox commented Apr 28, 2024

Prerequisites

Steps to reproduce

  • I want to return an empty array from a function, but it doesn't, which has an unintended effect.
  • I also tried the [array] cast, which didn't work either.
  • Since I explicitly specified the return type, I expected the function not to return any other type.

  • Here, the Compare-Object error occurs because the function does not return the expected array.
[screenshot: test]
  • Here I used -is to check whether the result was an array, and it returned false.
[screenshots: test2, test3]

Expected behavior

- It should return an empty array.

Actual behavior

- It returns null.

Error details

No response

Environment data

Name                           Value
----                           -----
PSVersion                      7.4.2
PSEdition                      Core
GitCommitId                    7.4.2
OS                             Microsoft Windows 10.0.26100
Platform                       Win32NT
PSCompatibleVersions           {1.0, 2.0, 3.0, 4.0…}
PSRemotingProtocolVersion      2.3
SerializationVersion           1.1.0.1
WSManStackVersion              3.0

Visuals

@abgox abgox added the Needs-Triage The issue is new and needs to be triaged by a work group. label Apr 28, 2024

rhubarb-geek-nz commented Apr 28, 2024

function test-function {
        return New-Object System.Collections.ArrayList
}

$arr = test-function

$arr.GetType()

gives

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     ArrayList                                System.Object

Author

abgox commented Apr 28, 2024

  • Thanks, that solves the problem.
  • But why is @() a problem?


rhubarb-geek-nz commented Apr 28, 2024

  • But why is @() a problem?

You have my sympathy, PowerShell can be infuriating with lists of zero or one items where the list can magically evaporate.

My best explanation is PowerShell always tries to simplify, often turning lists of one item into just one item. This can be problematic in that you can't just write code dealing with lists, you have to test for the cases of none, one and some. Also in key places classic PowerShell differs from the behaviour of PowerShell core. Hence parameters like -AsArray, -NoEnumerate etc trying to undo what PowerShell insists on doing.

Contributor

mklement0 commented Apr 28, 2024

There is no magic, only an initially perhaps surprising behavior that is fundamental to PowerShell:

In the PowerShell pipeline - which is invariably involved when producing output from a command (function, script, cmdlet, script block) - (most) enumerables (such as arrays, ArrayLists, ...) are auto-enumerated; that is, an enumerable's elements are sent one by one to the success output stream.

In other words: Unless you take extra steps (see below), the original enumerable is - predictably - lost, and in the success output stream you cannot tell the difference between outputting a single-element enumerable and the one element it contains, given that in both cases it is only the latter that is sent to the output stream.

An empty enumerable - such as in your case - sends "nothing" to the success output stream (pipeline), which is technically the [System.Management.Automation.Internal.AutomationNull]::Value singleton, which in effect behaves like $null in expression contexts and argument-based parameter binding (such as your case).

The success output stream is an open-ended stream of objects that itself has no notion of an array or a similar data structure: the objects in it can - and often are - processed one by one, as they are received, in which case the question of how to collect them for later processing doesn't arise.

Collecting stream output of necessity comes into play when you assign to a variable (e.g., $arr = test-function), or make command output participate in a larger expression (e.g., 'foo' + (test-function)), including use of $(...) and @(...) (except with array literals). Collecting a single object in the stream causes it to be collected as-is. It is only if there are two or more objects in the stream that a list-like data type is needed for collection, in which case PowerShell invariably creates an [object[]]-typed array. For the reasons explained above, this array is unrelated to any originating enumerable type, which never participated as itself in the pipeline.
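
To make the collection rules above concrete, here is a minimal sketch. The three helper functions are hypothetical names made up for illustration; each one outputs an array literal with zero, one, or two elements, and the result of a plain variable assignment is then inspected:

```powershell
# Hypothetical helpers: each emits an array with 0, 1, or 2 elements.
function Get-None { @() }
function Get-One  { @(42) }
function Get-Two  { @(1, 2) }

$none = Get-None
$one  = Get-One
$two  = Get-Two

$null -eq $none        # True: nothing was emitted, so there was nothing to collect
$one.GetType().Name    # Int32: the single element itself, not an array
$two.GetType().Name    # Object[]: a newly created array, not the original one
```

In none of the three cases does the array created inside the function reach the caller as itself.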

To send an instance of an enumerable type itself, as a whole to the success output stream, you must prevent auto-enumeration:

  • New-Object itself uses this technique, which is why @rhubarb-geek-nz's workaround is effective; New-Object's behavior is unusual among cmdlets (see below), but necessary in order to preserve the constructed instance as-is.
    • It is worth noting that the alternative, v5+ expression syntax for calling .NET type constructors does not exhibit this unusual behavior; that is, while New-Object object[] 0 sends the resulting array as itself to the success output stream, the otherwise equivalent expression [object[]]::new(0) is subject to auto-enumeration
  • The conceptually clearest expression of the intent to suppress auto-enumeration is to use Write-Output -NoEnumerate,
    but an often-seen shortcut is to use the unary form of , the array-constructor operator to create a transient helper array that wraps the output enumerable in a single-element array whose auto-enumeration then sends the enumerable itself to the success output stream.

In other words: The following techniques all work to output an empty array as a whole from your function:

# Using New-Object
function test-function { New-Object object[] 0 }

# Using Write-Output -NoEnumerate
function test-function { Write-Output -NoEnumerate @() }

# Using a transitory single-element helper array wrapper
function test-function { , @() }
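
As a quick check of the three techniques, the following sketch gives each variant its own (hypothetical) function name so that they can be compared side by side; for each one, the result should be reported as an array with a count of 0:

```powershell
# Hypothetical function names, one per technique discussed above.
function Get-EmptyViaNewObject   { New-Object object[] 0 }
function Get-EmptyViaWriteOutput { Write-Output -NoEnumerate @() }
function Get-EmptyViaComma       { , @() }

$results = @{}
foreach ($name in 'Get-EmptyViaNewObject', 'Get-EmptyViaWriteOutput', 'Get-EmptyViaComma') {
    # Invoke the function by name and capture its output in a variable.
    $result = & $name
    $results[$name] = $result
    '{0}: is array = {1}, count = {2}' -f $name, ($result -is [array]), $result.Count
}
```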

It is worth noting that auto-enumeration is a core PowerShell feature that you should generally not deviate from, especially in public-facing functions / cmdlets / scripts.

On a higher level of abstraction, one of PowerShell's core strengths is its consistency, of which consistent behavior in output streams / in the pipeline is one aspect.

To put it in concrete terms: Users justifiably expect commands to output objects one by one rather than outputting list-like containers as a whole, especially given that the latter behavior will not behave as expected in the pipeline; e.g.:

# Expected, auto-enumerating streaming behavior (element-by-element streaming).
# Where-Object's script block is invoked once for each element.
# -> 2, 3 
& { @(1, 2, 3) } | Where-Object { $_ -ge 2 }

# Unusual, array-as-a-whole output behavior.
# !! -> @(1, 2, 3) 
# !! Where-Object only receives *one* input object, which is the *array* as a whole, in which
# !! case -ge acts as an array filter that returns subarray @(2, 3), which Where-Object interprets
# !! as $true, and therefore *passes the input object (array) through*.
& { Write-Output -NoEnumerate @(1, 2, 3) } | Where-Object { $_ -ge 2 }

So as not to confound user expectations, deviation from this behavior should make the target command require user opt-in, such as via the -NoEnumerate and -AsArray switches some built-in cmdlets (now) offer.

The legacy PowerShell edition, Windows PowerShell, neglects to exhibit this pattern in a few cases (i.e. outputs arrays-as-a-whole by default or invariably), which have since been corrected in PowerShell 7.

A prominent example is ConvertFrom-Json, which only in PowerShell 7 exhibits the expected behavior - see #3424 for the backstory.

Note that while PowerShell 7's built-in cmdlets now work consistently, from what I can tell, third-party code and even modules that ship with Windows may still exhibit the unexpected behavior; e.g., Get-WinUserLanguageList

If you encounter such a command and want to force enumeration, simply enclose it in (...) (which collects all output in memory first) or pipe to Write-Output (which preserves the streaming behavior).
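
A small sketch of both workarounds, using a hypothetical function, Get-Whole, that stands in for a command that outputs its array as a whole:

```powershell
# Hypothetical stand-in for a command that emits an array as a single object.
function Get-Whole { , @(1, 2, 3) }

# Unmodified: Where-Object sees one object (the whole array) and passes it through.
$direct = Get-Whole | Where-Object { $_ -ge 2 }

# (...) collects the output first; piping the result then enumerates it.
$viaParens = (Get-Whole) | Where-Object { $_ -ge 2 }

# Write-Output enumerates its input while preserving streaming.
$viaWriteOutput = Get-Whole | Write-Output | Where-Object { $_ -ge 2 }

"direct: $direct"                   # all three elements survive
"via parens: $viaParens"            # only 2 and 3 remain
"via Write-Output: $viaWriteOutput" # only 2 and 3 remain
```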

@SeeminglyScience SeeminglyScience added Issue-Question ideally support can be provided via other mechanisms, but sometimes folks do open an issue to get a Resolution-Answered The question is answered. and removed Needs-Triage The issue is new and needs to be triaged by a work group. labels Apr 28, 2024

rhubarb-geek-nz commented Apr 28, 2024

Another trick to avoid PowerShell's delisting of single elements is to capture the OutVariable itself which will contain all the elements of the output pipeline, so you can see it is the assignment doing collection.

$date = Get-Date -OutVariable datevar
$date.GetType()
$datevar.GetType()
$datevar[0].GetType()

gives

IsPublic IsSerial Name                                     BaseType
-------- -------- ----                                     --------
True     True     DateTime                                 System.ValueType
True     True     ArrayList                                System.Object
True     True     DateTime                                 System.ValueType

@mklement0
Contributor

@rhubarb-geek-nz, while that is technically true, I consider this asymmetry between direct variable assignment and -OutVariable a bug, not a feature, as discussed many years ago in the following issue (nowadays, I would use slightly different framing and language, but the gist of the issue still applies):

Consider the following pitfall:

$null = Get-Item -OutVariable v $PROFILE

# !! ->  "The property 'LastWriteTime' cannot be found on this object. Verify that the property exists and can be set."
$v.LastWriteTime = [datetime]::now

Clearly, the intent of Get-Item $PROFILE is to retrieve a single object; yet, $v is now an ArrayList instance, so that $v.LastWriteTime applies member-access enumeration, which is unsupported for setting properties.


rhubarb-geek-nz commented Apr 29, 2024

As a general rule, avoid assignment in PowerShell when you are dealing with multiple items. It is problematic managing code paths which sometimes return a single item and sometimes a list of items. Compare with an SQL query: a result set can return zero items, one item or multiple items with no drama, whereas PowerShell can give you a null, a single item or a collection.

It is all water under the bridge, but my recommendation remains, avoid assignment operator when dealing with multiple items where the count may be 0, 1 or many. Use the output either in a pipeline or the OutVariable for consistent results.

@rhubarb-geek-nz

Clearly, the intent of Get-Item $PROFILE is to retrieve a single object

However

PS> get-command get-item -syntax

Get-Item [-Path] <string[]> [-Filter <string>] [-Include <string[]>] [-Exclude <string[]>] [-Force] [-Credential <pscredential>] [-Stream <string[]>] [<CommonParameters>]

Get-Item -LiteralPath <string[]> [-Filter <string>] [-Include <string[]>] [-Exclude <string[]>] [-Force] [-Credential <pscredential>] [-Stream <string[]>] [<CommonParameters>]

Your argument binds to the -Path parameter, which can both take an array and expand wildcards:

PS> $FOO='*.ps1'
PS> get-item $FOO

Mode                 LastWriteTime         Length Name
----                 -------------         ------ ----
-a---          29/04/2024    00:20            156 array.ps1
-a---          28/04/2024    20:31            120 empty.ps1
-a---          28/04/2024    21:57             91 error.ps1
-a---          28/04/2024    21:55            101 outvar.ps1

So your clearly is clearly not quite as clear as you suggest.

Contributor

mklement0 commented Apr 29, 2024

The "clearly" applied to the specific command, where a literal, single path was provided as the input.

The point is that any cmdlet is free to situationally "return" - i.e., emit to the success output stream - zero, one, or more output objects.

Earlier we've discussed the stream collection behavior that applies in direct variable assignment, notably that a single output object is collected as-is.

The point of my previous comment was:

  • There is NO good reason for $v = ... to collect the output objects differently than ... -OutVariable v.

  • This difference can lead to bugs / unexpected behavior that may be hard to understand.

Also, note that your framing wasn't correct:

avoid PowerShell's delisting of single elements is to capture the OutVariable itself

-OutVariable has no impact on auto-enumeration, which happens regardless, unless explicitly suppressed.
It's simply that the -OutVariable feature unconditionally creates an ArrayList for the collected output, irrespective of the number of output objects. (Case in point: if you use New-Object System.Collections.ArrayList in combination with -OutVariable, you get a nested single-element ArrayList instance, whose first and only element contains the empty instance created by New-Object).

In concrete terms:

  • With zero output objects, direct variable assignment stores [System.Management.Automation.Internal.AutomationNull]::Value (the "enumerable null", which behaves like $null in an expression context, and like an empty enumerable in the pipeline), whereas -OutVariable creates an empty ArrayList instance.
  • With one output object, direct variable assignment collects that object as-is, whereas -OutVariable creates a single-element ArrayList.
  • With two or more output objects, direct variable assignment creates a - fixed size - [object[]] array, whereas -OutVariable creates a (multi-element, resizable) ArrayList instance.
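
The three cases above can be observed side by side with a sketch like the following. Get-Number is a hypothetical advanced function (it needs [CmdletBinding()] so that it supports the -OutVariable common parameter) that emits the requested number of integers:

```powershell
# Hypothetical advanced function that emits $Count integers (none if $Count is 0).
function Get-Number {
    [CmdletBinding()]
    param([int] $Count)
    if ($Count -gt 0) { 1..$Count }
}

$d0 = Get-Number 0 -OutVariable o0
$d1 = Get-Number 1 -OutVariable o1
$d2 = Get-Number 2 -OutVariable o2

# Direct assignment: "nothing", a scalar, an [object[]] array.
$null -eq $d0          # True
$d1.GetType().Name     # Int32
$d2.GetType().Name     # Object[]

# -OutVariable: always an ArrayList, regardless of the number of output objects.
$o0.GetType().Name     # ArrayList
$o1.GetType().Name     # ArrayList
$o2.GetType().Name     # ArrayList
```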

While you may choose to rely on this awkward inconsistency (which the documentation only hints at, without spelling out the ramifications) in order to always get an array-like result, I personally recommend avoiding it, both for the awkwardness of then having to suppress the success output ($null = ... -OutVariable) and the confusing discrepancy.

The short of it:

  • In order to emit enumerables as a whole from a PowerShell command, auto-enumeration must be suppressed (as an aside: in the Cmdlet.WriteObject() SDK function, the logic is reversed), using the techniques previously discussed.

  • If you want to ensure that at most one object is captured in a variable, pipe to Select-Object -First 1 or - if you don't mind collecting all output first - use (...)[0] (assuming Set-StrictMode is at most at -Version 2).

  • If you want to ensure that output is always captured in an array, use @(...), the array-subexpression operator ($v = @(...)), or (with subtly different behavior) [array] $v = ...

  • However, thanks to PowerShell's unified handling of scalars and lists, provided via intrinsic members for scalars and member-access enumeration for enumerables, it is often not necessary to force creation of an array or, conversely, to explicitly enumerate the elements of an array for member access (read-only property access and method access).

  • And, yes, if you don't actually need to collect a(n intermediate) command's output, processing it in a streaming fashion in a pipeline is the best approach.
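
A minimal sketch of the caller-side options from the list above, assuming no Set-StrictMode is in effect. Get-Sample is a hypothetical function that emits the requested number of integers:

```powershell
# Hypothetical function that emits $Count integers (none if $Count is 0).
function Get-Sample { param([int] $Count) if ($Count -gt 0) { 1..$Count } }

# @(...) always yields an array, whatever the number of output objects.
$zero = @(Get-Sample 0)
$one  = @(Get-Sample 1)
$many = @(Get-Sample 3)
$zero.Count, $one.Count, $many.Count    # 0, 1, 3

# Select-Object -First 1 captures at most one object.
$first = Get-Sample 3 | Select-Object -First 1    # 1

# Unified handling: a scalar offers .Count and indexing like a list would ...
$scalar = Get-Sample 1
$scalar.Count    # 1
$scalar[0]       # 1

# ... and member-access enumeration calls a method on each element of an array.
$upper = @('a', 'b').ToUpper()    # 'A', 'B'
```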


rhubarb-geek-nz commented Apr 29, 2024

A common pattern that I have, which is why I have so much frustration with PowerShell's Schroedinger's OO model, is that being able to round-trip JSON data is of vital importance. If the original JSON was an array it needs to stay an array even if it only has one contained object. Likewise, if an object contains a property that was an array of one object, that needs to stay an array after our processing. Yes, ConvertFrom-JSON now has -NoEnumerate, and that adds to the complexity when writing scripts that have to work on both Desktop and Core. In order to do that we have to have test cases where every array anywhere within an object can have zero, one or some items, so we know we are using the right flags at each processing step and work with all combinations of data.

@mklement0
Contributor

ConvertFrom-JSON now has -NoEnumerate, and that adds to the complexity when writing scripts that have to work on both Desktop and Core.

That is unfortunate, but an unavoidable consequence of things getting improved / fixed in PS Core.

An array stored in a property value should never pose any problem, however; e.g., the following round-trips properly, in both editions:

[pscustomobject] @{ ArrayProp = @(1) } | ConvertTo-Json | ConvertFrom-Json

Also note that a simple way to avoid auto-enumeration is to pass an enumerable as an argument to ConvertTo-Json:

ConvertTo-Json @(1) -Compress # -> '[1]'
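
For the top-level single-element array case, a round trip might look like the following sketch. It assumes PowerShell 7 or later, where ConvertFrom-Json enumerates arrays by default and offers -NoEnumerate; it does not apply as-is to Windows PowerShell:

```powershell
# Assumes PowerShell 7+ (ConvertFrom-Json -NoEnumerate is not available in Windows PowerShell).
$json = ConvertTo-Json @(1) -Compress              # '[1]'

# Default: the single-element array is enumerated, so only the element is captured.
$enumerated = ConvertFrom-Json $json

# -NoEnumerate: the array is output as a whole and captured as an array.
$asArray = ConvertFrom-Json $json -NoEnumerate

ConvertTo-Json $enumerated -Compress    # 1
ConvertTo-Json $asArray -Compress       # [1]
```

Passing the value as an argument to ConvertTo-Json, rather than piping it, keeps the array from being enumerated on the way back to JSON.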

Contributor

This issue has been marked as answered and has not had any activity for 1 day. It has been closed for housekeeping purposes.
