Allow fetching of historical data for CL-SIC #1692
- Handle old format xls files correctly.
- Read only the correct sheet from xls file.
- Add geothermal category.
- Avoid error if no generation in hour for fuel type.
- Add new plants.
parsers/CL_SIC.py
Outdated
try:
    plant_vals = thermal_df.loc[plant].to_dict()
except KeyError:
    # plant is missing from df
maybe log a warning here?
Might be better to loop over the plants in the df rather than the dict mapping here.
Indeed. Can you add a TODO and a log here (just in case)?
Hopefully my changes are OK; all unmapped plants are now sent to the logger.
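The suggestion in this thread (iterate over the plants present in the data rather than over the mapping, and log anything unmapped) could look roughly like the sketch below. It is not the code from the PR: plain dicts stand in for the pandas DataFrame, and the plant names, the `sum_by_fuel` helper and the logger name are hypothetical.

```python
import logging

# Hypothetical stand-in for the parser's plant-to-fuel mapping.
THERMAL_PLANTS = {"Plant A": "coal", "Plant B": "gas"}


def sum_by_fuel(hourly_by_plant, logger):
    """Sum one hour of generation per fuel type, logging unmapped plants."""
    totals = {}
    # Iterate over the plants found in the data, not over the mapping,
    # so a plant missing from THERMAL_PLANTS is noticed instead of skipped.
    for plant, value in hourly_by_plant.items():
        fuel = THERMAL_PLANTS.get(plant)
        if fuel is None:
            logger.warning("CL-SIC: unmapped thermal plant %r, counted as unknown", plant)
            fuel = "unknown"
        totals[fuel] = totals.get(fuel, 0.0) + value
    return totals


logger = logging.getLogger("CL_SIC_example")
result = sum_by_fuel({"Plant A": 10.0, "Plant B": 5.5, "Plant C": 0.7}, logger)
```

Here "Plant C" is absent from the mapping, so it is reported through the logger and its generation is added to the `unknown` total rather than raising a `KeyError`.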
parsers/CL_SIC.py
Outdated
- def data_processer(df, date, logger):
+ def data_processer(df, date, old_format, logger):
For clarity, you could name the parameter is_old_format.
req = s.get(document_url)
soup = BeautifulSoup(req.text, 'html.parser')

# Find the latest file.
In the event that their website is no longer maintained and the HTML tag is never updated, one way to find the latest xls file is to request the Excel URL directly, e.g. "coordinador.cl/wp-content/uploads/estadisticas/operdiar/18/OP181127.xls", starting from date = now and stepping back until a request returns status 200.
That's more a nice-to-have than a must-have, but I'm afraid SIC is no longer maintaining this webpage in favor of a newer one, so I'd bet the HTML tag will stay old.
Yes, right now you can use target_datetime to get files after 6th November; we can change this if the website stops updating permanently.
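The fallback proposed in this thread, probing the file URL for each date going backwards from today until one responds, might be sketched as follows. This is an illustration only, not part of the PR: the `https://` scheme, the helper names `xls_url` and `find_latest_xls`, the `max_days` limit and the injected `url_exists` callable are all assumptions. The folder and file naming (`18/OP181127.xls`) is taken from the example URL in the comment above.

```python
from datetime import date, timedelta

# Assumed base URL; the scheme is a guess, the path comes from the review comment.
BASE_URL = "https://coordinador.cl/wp-content/uploads/estadisticas/operdiar"


def xls_url(day):
    """Build the daily report URL, e.g. .../18/OP181127.xls for 2018-11-27."""
    return "{}/{}/OP{}.xls".format(BASE_URL, day.strftime("%y"), day.strftime("%y%m%d"))


def find_latest_xls(start, url_exists, max_days=60):
    """Walk back from `start` one day at a time until a URL is reported to exist.

    `url_exists` is any callable taking a URL and returning True/False, so the
    HTTP layer can be swapped out (or stubbed, as in the demo below).
    """
    for offset in range(max_days):
        day = start - timedelta(days=offset)
        url = xls_url(day)
        if url_exists(url):
            return day, url
    return None


# Offline demo: pretend only the 2018-11-27 file has been published.
available = {xls_url(date(2018, 11, 27))}
found = find_latest_xls(date(2018, 11, 30), available.__contains__)
```

In a real parser, `url_exists` could wrap a request made with the existing session and check for a 200 status code; passing it in as a callable keeps the date-walking logic testable without network access.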
There seems to be an error when trying to fetch historical data:
python3 test_parser.py CL-SIC --target_datetime '2018-11-20'
returns
Traceback (most recent call last):
File "test_parser.py", line 83, in <module>
print(test_parser())
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 722, in __call__
return self.main(*args, **kwargs)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 697, in main
rv = self.invoke(ctx)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 895, in invoke
return ctx.invoke(self.callback, **ctx.params)
File "/usr/local/lib/python3.7/site-packages/click/core.py", line 535, in invoke
return callback(*args, **kwargs)
File "test_parser.py", line 49, in test_parser
res = parser(*args, target_datetime=target_datetime)
File "/Users/brunolajoie/code/electricitymap/contrib/parsers/CL_SIC.py", line 439, in fetch_production
gxd = get_xls_data(target_datetime = target_datetime, session = None)
File "/Users/brunolajoie/code/electricitymap/contrib/parsers/CL_SIC.py", line 251, in get_xls_data
lookup_date = target_datetime.format('YYMMDD')
AttributeError: 'datetime.datetime' object has no attribute 'format'
    date_no_tz = arrow.get(date_str, "YYMMDD")
    date = date_no_tz.replace(tzinfo='Chile/Continental')
else:
    lookup_date = target_datetime.format('YYMMDD')
- lookup_date = target_datetime.format('YYMMDD')
+ target_datetime = arrow.get(target_datetime)
+ lookup_date = target_datetime.format('YYMMDD')
That should make the historical fetching option compatible with test_parser.py and our backend.
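The traceback above comes from calling `.format('YYMMDD')`, an arrow-style method, on a plain `datetime.datetime`, which has no such attribute. The suggestion fixes this by wrapping the value with `arrow.get` first. As a standard-library-only alternative (not what the PR does, and shown only to illustrate the mismatch), the same YYMMDD string can be produced with `strftime`; the `lookup_date_for` helper name is hypothetical.

```python
from datetime import datetime


def lookup_date_for(target_datetime):
    """Return the YYMMDD string used in the xls file names."""
    # strftime exists on plain datetime objects, unlike arrow's .format().
    return target_datetime.strftime("%y%m%d")


target = datetime(2018, 11, 20)
has_arrow_style_format = hasattr(target, "format")  # plain datetimes lack .format
lookup_date = lookup_date_for(target)
```

Whether to convert with `arrow.get` or to format with `strftime` is a design choice; the conversion keeps the rest of the parser working with arrow objects throughout.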
  plant_vals = thermal_df.loc[plant].to_dict()
- plant_type = thermal_plants[plant]
+ plant_type = THERMAL_PLANTS.get(plant, 'unknown')
Sorry to be annoying, but shouldn't we log the case where a plant is not in THERMAL_PLANTS, in order to add it in the future?
L305 should catch that.
ha! didn't see it. 👍
Just an FYI guys, arrow has some weird results when parsing certain strings. Good to know for any target_datetime implementation.
Bruno, until they start updating the site again you can use target_datetime to get files after 6th November.
Output example
(EM-env) chris@ThinkPad:~/electricitymap$ python parsers/CL_SIC.py
fetch_production(target_datetime=2016-01-01)
[{'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 2, 0, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1888.2, 'wind': 966342.8999999998, 'biomass': 234.69999999999993, 'unknown': 0.7, 'hydro': 17361032.322899994, 'oil': 14.9, 'gas': 321.7}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 {'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 1, 1, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1903.4, 'wind': 1585315.8243000007, 'biomass': 244.70000000000002, 'unknown': 0.7, 'hydro': 14333647.57380001, 'oil': 15.1, 'gas': 285.5}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 {'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 1, 2, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1904.4, 'wind': 1185011.3194, 'biomass': 234.70000000000002, 'unknown': 0.7, 'hydro': 13837277.098700004, 'oil': 15.2, 'gas': 268.8}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 {'zoneKey': 'CL-SIC', 'datetime': datetime.datetime(2016, 1, 1, 3, 0, tzinfo=tzfile('/usr/share/zoneinfo/Chile/Continental')), 'production': {'solar': 0.0, 'coal': 1896.2, 'wind': 729369.0264000002, 'biomass': 240.90000000000003, 'unknown': 0.7, 'hydro': 12849447.199200004, 'oil': 13.7, 'gas': 271.0}, 'storage': {}, 'source': 'sic.coordinador.cl'},
 ...........snipped
ref #1683 #1257