Allow fetching of historical data for CL-SIC #1692

Merged
merged 3 commits into from Nov 29, 2018

Conversation

systemcatch
Collaborator

  • Handle old format xls files correctly.
  • Read only the correct sheet from xls file.
  • Add geothermal category.
  • Avoid error if no generation in hour for fuel type.
  • Add new plants.

Bruno, until they start updating the site again you can use target_datetime to get files after 6th November.

Output example

(EM-env) chris@ThinkPad:~/electricitymap$ python parsers/CL_SIC.py 
fetch_production(target_datetime=2016-01-01)
[{'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 2, 0, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1888.2, 'wind': 966342.8999999998, 'biomass': 234.69999999999993, 'unknown': 0.7, 'hydro': 17361032.322899994, 'oil': 14.9, 'gas': 321.7}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 {'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 1, 1, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1903.4, 'wind': 1585315.8243000007, 'biomass': 244.70000000000002, 'unknown': 0.7, 'hydro': 14333647.57380001, 'oil': 15.1, 'gas': 285.5}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 {'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 1, 2, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1904.4, 'wind': 1185011.3194, 'biomass': 234.70000000000002, 'unknown': 0.7, 'hydro': 13837277.098700004, 'oil': 15.2, 'gas': 268.8}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 {'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 1, 3, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1896.2, 'wind': 729369.0264000002, 'biomass': 240.90000000000003, 'unknown': 0.7, 'hydro': 12849447.199200004, 'oil': 13.7, 'gas': 271.0}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 ...........snipped

ref #1683 #1257

try:
    plant_vals = thermal_df.loc[plant].to_dict()
except KeyError:
    # plant is missing from df
Member

@corradio corradio Nov 28, 2018

maybe log a warning here?

Collaborator Author

Might be better to loop over the plants in the df rather than the dict mapping here.

Member

Indeed. Can you add a TODO and a log here (just in case)?

Collaborator Author

Hopefully my changes are ok, all unmapped plants are being sent to the logger.
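
A minimal sketch of the approach discussed in this thread: iterate over the plants present in the data rather than over the mapping, count unmapped plants as 'unknown', and send each one to the logger. The plant names, the plain-dict input and the function name `sum_by_fuel` are hypothetical; the actual parser works on a pandas DataFrame and its THERMAL_PLANTS mapping is much larger.

```python
import logging
from collections import defaultdict

# Hypothetical mapping for illustration only.
THERMAL_PLANTS = {"Plant A": "coal", "Plant B": "gas"}


def sum_by_fuel(plant_generation, logger):
    """Sum generation per fuel type, logging plants missing from the mapping.

    plant_generation maps plant name -> generation value for one hour.
    """
    totals = defaultdict(float)
    for plant, value in plant_generation.items():
        plant_type = THERMAL_PLANTS.get(plant)
        if plant_type is None:
            # Unmapped plant: count it as 'unknown' and leave a trace in the log.
            logger.warning("CL-SIC: unmapped plant %r counted as unknown", plant)
            plant_type = "unknown"
        totals[plant_type] += value
    return dict(totals)


result = sum_by_fuel({"Plant A": 10.0, "Plant B": 5.0, "Plant C": 1.5},
                     logging.getLogger(__name__))
```

Looping over the data means a newly added plant shows up in the log instead of being silently skipped, which makes it easier to extend the mapping later.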



-def data_processer(df, date, logger):
+def data_processer(df, date, old_format, logger):
Member

for clarity you could use is_old_format

req = s.get(document_url)
soup = BeautifulSoup(req.text, 'html.parser')

# Find the latest file.
Contributor

In the event that their website is no longer maintained and the html tag is never updated, one way to find the latest xls file is to request the Excel URL directly, e.g. "coordinador.cl/wp-content/uploads/estadisticas/operdiar/18/OP181127.xls", starting from date = now and going back until a request returns status 200.

Contributor

That's more of a nice-to-have than a must-have, but I'm afraid SIC is no longer maintaining this webpage in favor of a newer one, so I'd bet the html tag will stay old.

Collaborator Author

Yes, right now you can use target_datetime to get files after 6th November. We can change this if the website stops updating permanently.
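
A sketch of the fallback suggested in this thread: build the xls URL for each day, starting from a given date and walking backwards, and stop at the first one that answers with status 200. The URL pattern is inferred from the single example quoted above and the https scheme is an assumption. `xls_url`, `find_latest_xls`, `get_status` and `max_days` are hypothetical names, and `get_status` is injected so the sketch does not depend on a particular HTTP library.

```python
from datetime import date, timedelta

# Assumed pattern, inferred from the one example URL quoted in the comment above.
URL_TEMPLATE = ("https://coordinador.cl/wp-content/uploads/estadisticas/"
                "operdiar/{yy}/OP{yymmdd}.xls")


def xls_url(day):
    """Build the daily report URL for a given date."""
    return URL_TEMPLATE.format(yy=day.strftime("%y"),
                               yymmdd=day.strftime("%y%m%d"))


def find_latest_xls(get_status, start, max_days=30):
    """Walk back from `start` and return (day, url) of the first existing file.

    get_status is any callable mapping a URL to an HTTP status code.
    Returns None if no file is found within max_days.
    """
    for offset in range(max_days):
        day = start - timedelta(days=offset)
        url = xls_url(day)
        if get_status(url) == 200:
            return day, url
    return None
```

With `requests`, `get_status` could be `lambda url: session.head(url).status_code`; whether the server answers HEAD requests is not known from this thread, so `session.get` may be needed instead.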

Contributor

@brunolajoie brunolajoie left a comment

There seems to be an error when trying to fetch historical data:
python3 test_parser.py CL-SIC --target_datetime '2018-11-20'
returns

Traceback (most recent call last):
  File "test_parser.py", line 83, in <module>
    print(test_parser())
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
    return self.main(*args, **kwargs)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
    rv = self.invoke(ctx)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
    return callback(*args, **kwargs)
  File "test_parser.py", line 49, in test_parser
    res = parser(*args, target_datetime=target_datetime)
  File "/Users/brunolajoie/code/electricitymap/contrib/parsers/CL_SIC.py", line 439, in fetch_production
    gxd = get_xls_data(target_datetime = target_datetime, session = None)
  File "/Users/brunolajoie/code/electricitymap/contrib/parsers/CL_SIC.py", line 251, in get_xls_data
    lookup_date = target_datetime.format('YYMMDD')
AttributeError: 'datetime.datetime' object has no attribute 'format'

    date_no_tz = arrow.get(date_str, "YYMMDD")
    date = date_no_tz.replace(tzinfo='Chile/Continental')
else:
    lookup_date = target_datetime.format('YYMMDD')
Contributor

Suggested change
-lookup_date = target_datetime.format('YYMMDD')
+target_datetime = arrow.get(target_datetime)
+lookup_date = target_datetime.format('YYMMDD')

That should make the historical fetching option compatible with test_parser.py and our backend.
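
The traceback above arises because test_parser.py passes a plain datetime.datetime, which has no arrow-style format method; the suggested change wraps it with arrow.get first. The same normalization can be sketched with the standard library only, assuming the input is either a datetime or an ISO 8601 date string. `to_lookup_date` is a hypothetical helper name, and `%y%m%d` is the strftime counterpart of arrow's 'YYMMDD' token.

```python
from datetime import datetime


def to_lookup_date(target_datetime):
    """Return the YYMMDD string for a datetime or an ISO 8601 date string."""
    if isinstance(target_datetime, str):
        # Accept strings such as '2018-11-20' as passed on the command line.
        target_datetime = datetime.fromisoformat(target_datetime)
    return target_datetime.strftime("%y%m%d")


lookup_date = to_lookup_date(datetime(2018, 11, 20))
```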

plant_vals = thermal_df.loc[plant].to_dict()
-plant_type = thermal_plants[plant]
+plant_type = THERMAL_PLANTS.get(plant, 'unknown')
Member

@corradio corradio Nov 29, 2018

Sorry to be annoying, but shouldn't we log the case where a plant is not in THERMAL_PLANTS, in order to add it in the future?

Collaborator Author

L305 should catch that.

Member

ha! didn't see it. 👍

@systemcatch
Collaborator Author

Just an FYI guys, arrow has some weird results when parsing certain strings. Good to know for any target_datetime implementation.

arrow-py/arrow#519
arrow-py/arrow#91

@brunolajoie brunolajoie merged commit 04e821c into electricitymaps:master Nov 29, 2018
@systemcatch systemcatch deleted the cl-sic-past branch March 4, 2019 10:57