
More detailed documentation around the limitations for Azure and scaling the solution #1331

Closed
MartinDawson opened this issue Apr 16, 2024 · 3 comments


@MartinDawson (Contributor) commented Apr 16, 2024

Hi,

After using this for a while at our company for AWS and Azure emissions, I think it would be useful to list the issues we hit with the current architecture and how we solved them. A LOT of changes are needed to bring this code up to a scalable solution, and the performance recommendations here (https://www.cloudcarbonfootprint.org/docs/performance-considerations) don't really go far enough:

  • The current architecture for Azure, as described in the docs, is to call an API and write to a cache. This isn't scalable when you want large amounts of emission data: subscription x resourceGroup x resource x serviceName x regionName x timestamp adds up to billions of rows for any organization with a meaningful amount of data.

Using an API for this, even with caching, is not going to work; it's simply too much data to transfer. We solved this for Azure by having Microsoft export all of our company's cost data as CSVs daily into Azure Blob Storage containers. We then set up an ETL process with Azure Data Lake to process this and fire a daily Azure Function which seeds our own TimescaleDB database (a rough sketch of that seeding step follows the list below).

We also have a Node.js script that fetches the initial CSV data and seeds our database on startup with many months of history.

This is the only viable approach if you want resource-level granularity, i.e. the kind of granularity Azure Carbon Optimization provides.

  • The current frontend code is not optimized. It does client-side filtering everywhere, and the filter code is tightly coupled to every other frontend component (and very complex). This doesn't scale to massive amounts of data, so server-side filtering and pagination are needed, especially for resource-level data, which can contain 100k+ resources (a sketch of what that could look like is further below).
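Since the first point above only describes our pipeline at a high level, here is a minimal sketch of the daily seeding step, assuming a timer-triggered Azure Function (v4 Node.js programming model), the `@azure/storage-blob`, `csv-parse`, and `pg` packages, and hypothetical container, blob, table, and column names. A real export's schema differs, and a production loader would batch inserts (e.g. via `COPY`) rather than insert row by row.

```typescript
// Hypothetical daily seeding function: reads the latest Azure cost export CSV
// from Blob Storage and inserts rows into a TimescaleDB table.
// Container, blob, table, and column names below are illustrative only.
import { app, InvocationContext, Timer } from '@azure/functions';
import { BlobServiceClient } from '@azure/storage-blob';
import { parse } from 'csv-parse/sync';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.TIMESCALE_URL });

app.timer('seedDailyUsage', {
  schedule: '0 0 3 * * *', // once a day, after the export has landed
  handler: async (_timer: Timer, context: InvocationContext): Promise<void> => {
    const blobService = BlobServiceClient.fromConnectionString(
      process.env.COST_EXPORT_STORAGE_CONNECTION!
    );
    const container = blobService.getContainerClient('cost-exports');

    // Assume the export is partitioned by date, e.g. cost-exports/2024-04-16.csv
    const blobName = `${new Date().toISOString().slice(0, 10)}.csv`;
    const buffer = await container.getBlobClient(blobName).downloadToBuffer();

    // Column names depend on the export schema version; these are examples.
    const rows: Record<string, string>[] = parse(buffer.toString('utf8'), {
      columns: true,
      skip_empty_lines: true,
    });

    // Insert raw usage rows; emissions would be computed downstream from
    // these quantities (e.g. with CCF's coefficients), omitted here.
    const client = await pool.connect();
    try {
      await client.query('BEGIN');
      for (const row of rows) {
        await client.query(
          `INSERT INTO usage_rows
             (usage_date, subscription_id, resource_group, resource_id,
              service_name, region, quantity, cost)
           VALUES ($1, $2, $3, $4, $5, $6, $7, $8)
           ON CONFLICT DO NOTHING`,
          [
            row['Date'],
            row['SubscriptionId'],
            row['ResourceGroup'],
            row['ResourceId'],
            row['MeterCategory'],
            row['ResourceLocation'],
            Number(row['Quantity']),
            Number(row['CostInBillingCurrency']),
          ]
        );
      }
      await client.query('COMMIT');
      context.log(`Seeded ${rows.length} rows from ${blobName}`);
    } catch (err) {
      await client.query('ROLLBACK');
      throw err;
    } finally {
      client.release();
    }
  },
});
```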

This obviously requires a lot of changes across the Thoughtworks code (the azure, cli, app, and api packages): CSV support, an API layer, an ETL process, a Node.js seeding script, modifications to the frontend, Azure Functions, and so on.
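To make the second point above concrete, here is a minimal sketch of what such an API layer with server-side filtering and keyset pagination could look like, assuming Express, `pg`, and a hypothetical `emissions` table; the actual CCF API routes and column names are different, and this is only an illustration of the shape of the query.

```typescript
// Hypothetical endpoint: filter and paginate on the server instead of
// shipping every row to the client. Names are illustrative only.
import express from 'express';
import { Pool } from 'pg';

const pool = new Pool({ connectionString: process.env.TIMESCALE_URL });
const api = express();

api.get('/api/emissions', async (req, res) => {
  const { subscriptionId, serviceName, cursor, limit = '100' } = req.query;

  const conditions: string[] = [];
  const params: unknown[] = [];
  if (subscriptionId) {
    params.push(subscriptionId);
    conditions.push(`subscription_id = $${params.length}`);
  }
  if (serviceName) {
    params.push(serviceName);
    conditions.push(`service_name = $${params.length}`);
  }
  if (cursor) {
    // Keyset pagination: the cursor is the usage_date of the last row seen,
    // which stays fast on large tables where OFFSET would not.
    params.push(cursor);
    conditions.push(`usage_date < $${params.length}`);
  }
  params.push(Math.min(Number(limit), 1000));

  const where = conditions.length ? `WHERE ${conditions.join(' AND ')}` : '';
  const { rows } = await pool.query(
    `SELECT usage_date, subscription_id, resource_group, resource_id,
            service_name, region, co2e_kg
       FROM emissions
       ${where}
       ORDER BY usage_date DESC
       LIMIT $${params.length}`,
    params
  );

  res.json({
    rows,
    nextCursor: rows.length ? rows[rows.length - 1].usage_date : null,
  });
});

api.listen(4000);
```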

Wanted to post this for any other devs who are thinking of using this repository at scale: it's a good base, but it will take a lot of changes, many months of effort, and a lot of code.

@CodeMartian

Hi @MartinDawson! Thanks for all of the feedback. These limitations definitely raise real challenges for scaling up, especially in an Azure environment. CCF is being worked on with these enhancements in mind. To address the concerns you described, there are two open issues:

Both of these should help improve CCF specifically around the limitations you're describing. Please feel free to add any ideas you may have to those issues, or, if you'd like, to contribute!

We really do appreciate your feedback! We also rely on and encourage our community members to open PRs for any general enhancements or fixes made on their own CCF instances/forks.

@MartinDawson (Contributor, Author)

Thanks for the great repo. I should have said this in the original comment: it's a very good starting point, and it's amazing that it's open source.

We looked at the Cost Details API; IIRC it still isn't sufficient for large volumes of data.

Unfortunately, companies are going through big cuts right now, including my previous one that used this repo, so there's less appetite from management to spend time contributing :/

@4upz (Member) commented May 16, 2024

@MartinDawson All very good points, and thanks for sharing.

We totally understand if your organization doesn't have a budget for contributing. Even mentioning your workaround of using Azure Blob Storage and CSV exports is extremely helpful! This is an alternative we're considering building alongside the Cost Details upgrade mentioned in #1175. It may end up being the recommended route for enterprise users with a similar scale of data, ideally with a built-in method for kicking off or regularly automating the exports.

Thanks again for your feedback; these types of issues help move work up our priority list by showing its impact directly 😄. I'll close this issue since we already have actionable tasks for most of these suggestions, but don't hesitate to leave additional feedback or questions on the relevant issues if any new ones come up.

@4upz closed this as completed May 16, 2024