Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ResultPager doesn't merge data correctly for all API's #912

Open
fideloper opened this issue Jul 23, 2020 · 9 comments
Open

ResultPager doesn't merge data correctly for all API's #912

fideloper opened this issue Jul 23, 2020 · 9 comments

Comments

@fideloper
Copy link

fideloper commented Jul 23, 2020

Hello!

The Problem (Description)

This is technically a duplicate of #597 from 2017, but I wanted to add in details and present some possible solutions.

We're using this library on ChipperCI, some customers noted that we aren't correctly listing all of the repositories our GitHub App is installed into when they have a large number of repositories.

Digging in, we realized only the last page of results were being returned back when using the ResultPager::fetchAll() method on results that had more than one page.

This only happens on certain API calls (I've only seen it on the Apps API so far - listed below).

The Issue

This is due to the use of array_merge() to merge resulting arrays back:

https://github.com/KnpLabs/php-github-api/blob/2.x/lib/Github/ResultPager.php#L92-L96:

while ($this->hasNext()) {
    $next = $this->fetchNext();

    if ($isSearch) {
        $result = array_merge($result, $next['items']);
    } else {
        $result = array_merge($result, $next); // <-- Specifically here
    }
}

Here's an example call into the App's api that results in the issue:

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate($userOAuthToken, Client::AUTH_HTTP_TOKEN);
$resultFetcher = new ResultPager($github);
$installations = $resultFetcher->fetchAll($github->currentUser(), 'installations');

A result object from one API call is (I set it to 1 per page to test this out, but it happens when there are more than 100 results for any API with results like this) - converted from JSON to PHP:

// API call number one, 1 per page, when there are 2 results total:
$result = [
    'total_count' => 2,
    'installations' => [
        ['id' => 12345 /* and so on */],
    ]
]

// API call number two, again using 1 per page, when there are 2 results total:
$next = [
    'total_count' => 2,
    'installations' => [
        ['id' => 678910 /* and so on */],
    ]
]

If you array_merge() these, PHP will over-write the the first array with the second, as their associative keys match!

$finalResult = array_merge($result, $next);

// since the arrays have matching keys, the $next array over-writes everything in the $result array
// and we effectively only get the last page of results for any set of results over the 100 results per page 
// limit set by GitHub's API
$finalResult === $next; // true
array_diff($finalResult, $next); // yields an empty result [], they are the same

Possible Solutions

A first idea I had:

The solution may be API class extending AbstractAPI to know if there is a key to be used there.

So far I can only find examples from the App's api, and from searching the source of that page, only these 3 endpoints appear to use total_count (and the other key for repositories / installations) in the returned results:

  1. https://docs.github.com/en/rest/reference/apps#list-repositories-accessible-to-the-user-access-token
    • e.g. GET /user/installations/{installation_id}/repositories
  2. https://docs.github.com/en/rest/reference/apps#list-repositories-accessible-to-the-app-installation
    • e.g. GET /installation/repositories
  3. https://docs.github.com/en/rest/reference/apps#list-app-installations-accessible-to-the-user-access-token
    • e.g. GET /user/installations

However, I've checked some GitHub issues here that hint that this may be an issue elsewhere (and actually, this bug is open too: #597).

The keys in the $result / $next array are repositories or installations.

So perhaps something like this:

// File: AbstractAPI

protected $resultKey;

public function getResultKey()
{
    return $this->resultKey;
}

And then in ResultPager:

while ($this->hasNext()) {
    $next = $this->fetchNext();

    if ($isSearch) {
        $result = array_merge($result, $next['items']);
    } elseif (! $this->getResultKey()) {
        $result = array_merge($result, $next[$this->getResultKey()]);
    } else {
        $result = array_merge($result, $next); 
    }
}

This would likely need to be used not just for certain API's (e.g. the App's api) but the specific endpoints that need it 😅


Hopefully these extra details can help pin-point the issue! In the meantime, I'll do a similar work-around as the others in this (duplicate but less detailed) issue.

Let me know if I can clarify anything!

@fideloper
Copy link
Author

@GrahamCampbell This could effect #907 as well. Let me know what you think!

@GrahamCampbell
Copy link
Contributor

I think we should report this as a bug to GitHub actually. It looks like the developer who implemented the installations API was not aware of their conventions.

@rtek
Copy link

rtek commented Aug 3, 2020

Inline hack that I used to get fetchAll working for the listRepositories command.

Mileage may vary.

$pager = new class($client) extends ResultPager {
    protected function callApi(ApiInterface $api, $method, array $parameters)
    {
        return parent::callApi($api, $method, $parameters)['repositories'];
    }
    public function fetchNext()
    {
        return parent::fetchNext()['repositories'];
    }
};

@fideloper
Copy link
Author

Right on. I did the following since I have one call that needs repositories and one that needs installations:

<?php

namespace App\Api\GitHub;


use Github\Api\ApiInterface;
use Github\Api\Search;
use Github\ResultPager as BaseResultPager;

class ResultPager extends BaseResultPager
{
    protected $resultKey;

    /**
     * @param $key
     * @return $this
     */
    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    /**
     * {@inheritdoc}
     */
    public function fetchAll(ApiInterface $api, $method, array $parameters = [])
    {
        $isSearch = $api instanceof Search;

        // get the perPage from the api
        $perPage = $api->getPerPage();

        // set parameters per_page to GitHub max to minimize number of requests
        $api->setPerPage(100);

        try {
            $result = $this->callApi($api, $method, $parameters);
            $this->postFetch();

            if ($this->resultKey) {
                $result = $result[$this->resultKey];
            }

            if ($isSearch) {
                $result = isset($result['items']) ? $result['items'] : $result;
            }

            while ($this->hasNext()) {
                $next = $this->fetchNext();

                if ($isSearch) {
                    $result = array_merge($result, $next['items']);
                } elseif ($this->resultKey) {
                    $result = array_merge($result, $next[$this->resultKey]);
                } else {
                    $result = array_merge($result, $next);
                }
            }
        } finally {
            // restore the perPage
            $api->setPerPage($perPage);
        }

        return $result;
    }
}

Example call to this result fetcher:

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate($oAuthToken, Client::AUTH_HTTP_TOKEN);
$resultFetcher = new GitHubResultPager($github);

$resultFetcher
    ->setResultKey('installations')
    ->fetchAll($github->currentUser(), 'installations');

@fideloper
Copy link
Author

@GrahamCampbell I opened a ticket with GitHub but haven't heard a thing back :D

@fideloper
Copy link
Author

From GitHub:

Hi there Chris,

Thanks for writing in to ask about this - and I'm sorry for the time we've kept you waiting for a response.

Returning those results in a nested array with a total_count is the intended behaviour for these endpoints. This isn't the only endpoint to behave this way, you'll notice that our Search API also returns a nested array of results.

We've used this format on some endpoints where it's particuarly useful to obtain a total count. It would be a breaking change to roll this out to more endpoints which does mean things can be a bit inconsistent.

I can understand how this could cause trouble with libraries that attempt to treat all REST API responses the same but it won't be possible to change the format of these responses. I hope this answers your question and helps move the discussion forward in the issue you linked.

Best regards,
Steve

@vovayatsyuk
Copy link
Contributor

vovayatsyuk commented May 6, 2021

Just updated this library and found that the previous solution posted by @fideloper doesn't work anymore.

Here is an updated class for version 3.2.0 in case somebody will need it too:

<?php

namespace App\Api\Github;

use Github\Api\AbstractApi;
use Github\ResultPager as BaseResultPager;

class ResultPager extends BaseResultPager
{
    private $resultKey;

    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    public function fetch(AbstractApi $api, string $method, array $parameters = []): array
    {
        return $this->prepareResult(parent::fetch($api, $method, $parameters));
    }

    protected function get(string $key): array
    {
        return $this->prepareResult(parent::get($key));
    }

    private function prepareResult(array $result): array
    {
        if ($this->resultKey && $result) {
            $result['items'] = $result[$this->resultKey] ?? [];
        }

        return $result;
    }
}

Usage:

$repositories = (new ResultPager($client))
    ->setResultKey('repositories')
    ->fetchAll($client->apps(), 'listRepositories');

@Necmttn
Copy link
Contributor

Necmttn commented Jun 3, 2021

Just updated this library and found that the previous solution posted by @fideloper doesn't work anymore.

Here is an updated class for version 3.2.0 in case somebody will need it too:

<?php

namespace App\Api\Github;

use Github\Api\AbstractApi;
use Github\ResultPager as BaseResultPager;

class ResultPager extends BaseResultPager
{
    private $resultKey;

    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    public function fetch(AbstractApi $api, string $method, array $parameters = []): array
    {
        return $this->prepareResult(parent::fetch($api, $method, $parameters));
    }

    protected function get(string $key): array
    {
        return $this->prepareResult(parent::get($key));
    }

    private function prepareResult(array $result): array
    {
        if ($this->resultKey && $result) {
            $result['items'] = $result[$this->resultKey] ?? [];
        }

        return $result;
    }
}

Usage:

$repositories = (new ResultPager($client))
    ->setResultKey('repositories')
    ->fetchAll($client->apps(), 'listRepositories');

Thank you! it works like a charm.

@fideloper
Copy link
Author

fideloper commented Mar 8, 2022

This is still an "issue" as of tag 3.5.1

The Problem

The API results (when I get a list of installations and repositories) are:

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate('<user oauth token>', AuthMethod::ACCESS_TOKEN);
$resultFetcher = new ResultPager($github);

$installations = $resultFetcher->fetchAll($github->currentUser(), 'installations');

// RESULTS LOOK LIKE THIS:
//[
//    'total_count' => 100,
//    'installations' => [...],
//]


// Get GitHub App token for JWT auth
$token = AppsFacade::newInstallationToken($installation['id']);

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate($token['token'], AuthMethod::JWT);
$resultFetcher = new ResultPager($github);

$response = $resultFetcher->fetchAll($github->apps(), 'listRepositories');

// RESULTS LOOK LIKE THIS:
//[
//    'total_count' => 100,
//    'repositories' => [...],
//]

➡️ The way this iterates over results only returns the last page!

Essentially code return iterator_to_array($this->fetchAllLazy($api, $method, $parameters)); yields those results. Each iteration yielding the results over-write the previous ones, as its return an array with keys total_count and repositories each iteration.

It doesn't seem to merge results!

Latest Hacky Solution

So, I've extended the class once again, here it is!

🔴 This iteration assumes the method setResultKey() is always used

<?php

namespace App\Api\GitHub;

use Generator;
use Github\Api\AbstractApi;
use Github\ResultPager as BasePager;

class ResultPager extends BasePager
{
    private $resultKey;

    /**
     * @param $key
     * @return $this
     */
    public function setResultKey($key)
    {
        $this->resultKey = $key;
        return $this;
    }

    /**
     * {@inheritdoc}
     */
    public function fetchAllLazy(AbstractApi $api, string $method, array $parameters = []): Generator
    {
        $result = $this->fetch($api, $method, $parameters);

        foreach ($result[$this->resultKey] as $key => $item) {
            if (is_string($key)) {
                yield $key => $item;
            } else {
                yield $item;
            }
        }

        while ($this->hasNext()) {
            $result = $this->fetchNext();

            foreach ($result[$this->resultKey] as $key => $item) {
                if (is_string($key)) {
                    yield $key => $item;
                } else {
                    yield $item;
                }
            }
        }
    }
}

Second Solution

Another Laravel-centric solution that does not involve over-riding the ResultPager class is to use a LazyCollection.

use Illuminate\Support\LazyCollection;

$github = new Client(new Builder, 'machine-man-preview');
$github->authenticate('<oauth token here>', AuthMethod::ACCESS_TOKEN);
$resultFetcher = new ResultPager($github);

$allInstallations = LazyCollection::make(function () use ($github, $resultFetcher) {
    yield $resultFetcher->fetch($github->currentUser(), 'installations');

    while ($resultFetcher->hasNext()) {
        yield $resultFetcher->fetchNext();
    }
})->reduce(function ($results, $result) {
    $results['installations'] = array_merge($results['installations'] ?? [], $result['installations']);

    return $results;
}, []);

foreach($allInstallations['installations'] as $installation) {...}

h/t to @tlaverdure 🍻

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants