integrate electricity maps API for emissions factors #1234

Open
5 tasks done
ccasher opened this issue Sep 7, 2023 · 7 comments
@ccasher (Collaborator) commented Sep 7, 2023

Using the following API documentation, add a configurable option for users to provide an EM token, and choose to use those emissions factors for CCF estimations.

Tech Tasks

  • Create new env variable for EM token
  • Add logic that CCF will use EM emissions factor if token is set
  • Update CSP regions to map against EM zones
  • Set up API calls to EM based on the date of the usage data to retrieve historic EM data
  • Update documentation on microsite (methodology, configurations, and config options)
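
The first two tasks could be sketched roughly as follows. This is a minimal illustration, not the CCF implementation: the variable name `ELECTRICITY_MAPS_TOKEN`, the `EmissionsFactorSource` type, and the function name are all assumptions made for this example.

```typescript
// Hypothetical names throughout; none of these are taken from the CCF codebase.
type Env = Record<string, string | undefined>

interface EmissionsFactorSource {
  kind: 'electricity-maps' | 'default'
  token?: string
}

// Use Electricity Maps factors only when a non-blank token is configured;
// otherwise fall back to the built-in (default) emissions factors.
function selectEmissionsFactorSource(env: Env): EmissionsFactorSource {
  const token = env['ELECTRICITY_MAPS_TOKEN']?.trim()
  return token ? { kind: 'electricity-maps', token } : { kind: 'default' }
}
```

Passing the environment in as a parameter (rather than reading a global) keeps the decision easy to test.
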
@ccasher added the enhancement label on Sep 7, 2023
@madsnedergaard

Thanks for implementing this, it's super exciting to see! 🤩

The current way our API portal works, the API URL can be different depending on how the user has signed up and to which "product" on our API Portal.

I'll happily make a PR for adding a new variable (base_url) to the code below and updating the documentation, if you don't mind?

https://github.com/cloud-carbon-footprint/cloud-carbon-footprint/blob/b7f0ff1134af15605af7eea1ef37759fafef3e1e/packages/common/src/EmissionsFactors.ts#L81C19-L81C19
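
A configurable base URL as proposed here might look something like the sketch below. The function name, the parameter names, and the `/carbon-intensity/past` path with `zone` and `datetime` query parameters are assumptions for illustration; the actual path should be checked against the Electricity Maps API documentation for the product in use.

```typescript
// Hypothetical helper: builds a request URL from a user-supplied base URL.
// The path and query parameter names below are assumptions, not confirmed API details.
function buildPastIntensityUrl(baseUrl: string, zone: string, datetime: Date): string {
  // Drop trailing slashes so that both "…/v3" and "…/v3/" work as base URLs.
  const base = baseUrl.replace(/\/+$/, '')
  const query =
    `zone=${encodeURIComponent(zone)}` +
    `&datetime=${encodeURIComponent(datetime.toISOString())}`
  return `${base}/carbon-intensity/past?${query}`
}
```
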

@camcash17 (Collaborator)

Hi @madsnedergaard, a PR for this change would be extremely welcome! Thank you.

Also if you wouldn't mind, we'd appreciate if you could take a look at the cloud region to EM zone mappings and call out any inconsistencies:

Also, you might notice we chose to use the Past Carbon Intensity History endpoint. If you think there was a better option to retrieve the intensity data as the application maps over usage rows, please let us know!

Finally, I was curious about any API limitations we might face. As currently written, the maximum number of EM API requests for a given day is the number of unique zones present in the cloud usage data. Each additional timestamp in the time range requested from CCF adds up to one request per zone, so the total grows as zones × timestamps. I stress tested this by requesting data for 1 month, and it seemed to hold up. However, one potential feature addition for CCF is handling hourly usage data, which would multiply the number of API calls for a given day by 24.

Interested to hear your thoughts!

@ccasher (Collaborator, Author) commented Oct 5, 2023

Hi @madsnedergaard - just wanted to follow up on this!

@madsnedergaard

> Hi @madsnedergaard, a PR for this change would be extremely welcome! Thank you.

Sorry for the lack of response, I've been away on vacation - will start on a PR this week :)

I've set up CCF for our own GCP cloud project and we have billing data in BigQuery since May, so I'll use this for testing while developing.

> Also if you wouldn't mind, we'd appreciate if you could take a look at the cloud region to EM zone mappings and call out any inconsistencies:

The mappings for all three look correct to me. The only thing missing is Hong Kong, which we do have data for but which is currently null for all three providers. I'll change that in the PR too.

> Also, you might notice we chose to use the Past Carbon Intensity History endpoint. If you think there was a better option to retrieve the intensity data as the application maps over usage rows, please let us know!

Hmm, I think the best solution would be to use the past-range endpoint which queries up to 240h (10 days) for each request. But this also means more logic is required to split the queries into chunks.
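
The chunking logic mentioned here could be sketched as below. This is only an illustration under stated assumptions: the 240-hour limit is taken from the comment above, while `TimeWindow`, `splitIntoWindows`, and the half-open window convention are hypothetical choices for this example.

```typescript
interface TimeWindow {
  start: Date
  end: Date
}

const MAX_WINDOW_HOURS = 240 // limit quoted in the comment above (10 days)
const MS_PER_HOUR = 60 * 60 * 1000

// Split [start, end) into consecutive windows of at most `maxHours` hours each.
function splitIntoWindows(
  start: Date,
  end: Date,
  maxHours: number = MAX_WINDOW_HOURS,
): TimeWindow[] {
  if (maxHours <= 0) throw new Error('maxHours must be positive')
  const windows: TimeWindow[] = []
  const step = maxHours * MS_PER_HOUR
  const stop = end.getTime()
  let cursor = start.getTime()
  while (cursor < stop) {
    // The last window is shortened so it never runs past `end`.
    const next = Math.min(cursor + step, stop)
    windows.push({ start: new Date(cursor), end: new Date(next) })
    cursor = next
  }
  return windows
}
```

For a 31-day range (744 hours), this yields four windows: three of 240 hours and one of 24 hours.
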

> Finally, I was curious about any API limitations we might face. As currently written, the maximum number of EM API requests for a given day is the number of unique zones present in the cloud usage data. Each additional timestamp in the time range requested from CCF adds up to one request per zone, so the total grows as zones × timestamps. I stress tested this by requesting data for 1 month, and it seemed to hold up. However, one potential feature addition for CCF is handling hourly usage data, which would multiply the number of API calls for a given day by 24.

Hmm, that sounds like it could become a problem, especially with hourly data (e.g. roughly 43,800 requests for 1 year with 5 zones), so if you decide to go in that direction, we should probably get in touch and discuss developing a new endpoint specifically for this project :)
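
To make the scaling concrete, here is a small sketch of the zones × timestamps estimate from this exchange, next to a rough estimate for a ranged approach. Both function names are hypothetical, and the ranged estimate assumes one request can cover up to 240 hours per zone, as described for the past-range endpoint earlier in this thread.

```typescript
// One request per zone per timestamp (daily data: 1 per day, hourly data: 24 per day).
function perTimestampRequests(zones: number, days: number, hourly: boolean): number {
  return zones * days * (hourly ? 24 : 1)
}

// One request per zone per window of up to `windowHours` hours (assumed limit: 240).
function rangedRequests(zones: number, days: number, windowHours: number = 240): number {
  return zones * Math.ceil((days * 24) / windowHours)
}
```

With 5 zones over 365 days, `perTimestampRequests(5, 365, true)` gives 43,800 and `perTimestampRequests(5, 365, false)` gives 1,825, while `rangedRequests(5, 365)` gives 185.
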

@madsnedergaard

Hmm, while working on the PR I found a few problems with the current implementation:

  1. Only using the CO2 intensity of "midnight" for the full day

Since the billing data is daily, the datetime being queried is set to the first hour of the day (e.g. 2023-05-07T00:00:00.000Z). This means that the emission factor used for the day is taken from the very first hour, which usually has quite a different emission factor from hours later in the day. Ideally it should use an average over the day, although that is not currently available from the API.

  2. Creating a lot of queries takes a long time

While testing this with our own data, it took approximately 10 minutes to fetch emissions for half a year of data. This is not really going to scale well, especially if many people use the integration.
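
For the first problem, a daily factor could in principle be derived from hourly values on the client side. The sketch below takes a plain arithmetic mean, which is only one possible definition of a "daily average" (a usage-weighted mean would need hourly usage data); the function name and the input shape are assumptions for this example.

```typescript
// Average a day's hourly carbon intensity values (e.g. in gCO2eq/kWh).
// Assumes the caller has already collected the hourly values for one zone and day.
function dailyAverageIntensity(hourlyValues: number[]): number {
  if (hourlyValues.length === 0) {
    throw new Error('at least one hourly value is required')
  }
  const sum = hourlyValues.reduce((acc, value) => acc + value, 0)
  return sum / hourlyValues.length
}
```

Note that this does nothing for the second problem: it still requires hourly data for every zone and day, so it would only be practical combined with ranged or aggregated requests.
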


To handle both of these problems, I am wondering if we could adapt the integration to fetch data from our recently launched public data portal instead: https://www.electricitymaps.com/data-portal
It provides data averaged over the day, and it would be significantly faster and simpler to fetch just one file per zone than to call the API repeatedly :)
The data is currently available as CSV files, but we can look into providing JSON files as well for easier ingestion. What do you think?

@camcash17 (Collaborator)

Hi @madsnedergaard, thanks for getting back to this!

Also, thanks for getting a PR put together. I was not able to see any zones in China or a few regions around the Arabian peninsula. Also, using the past-range endpoint sounds like it could be a good route. Maybe we could find a time to connect if we try to update the logic or support hourly emissions.

Do you have any idea when the average daily factor might be available through the API? Also, could you please elaborate on the approach you are suggesting with a JSON file from the data portal? This sounds like it could be a large file, depending on how much usage is requested from CCF.

Lastly, CCF typically runs into big data scalability issues once a request covers more than 1 month at a time (depending on your organization's cloud usage). For this reason, we usually suggest backfilling data as a background job (which will likely take some time, but is meant to be done only once), and then triggering the CCF request on a daily basis thereafter.

@madsnedergaard

Hey again, I have started working on adding aggregated data to the API to make it simpler to get this data without having to call the API a lot of times or using the CSVs from the Data Portal.

Will create a PR once it has been released to production :)
