GSoC 2016 Application Olga Vorokh: Image processing and source detection in Gammapy
Application template.
Organization: OpenAstronomy
Sub-org: Astropy
Name: Olga Vorokh
Email: vorokho@gmail.com
Telephone: +375 29 138 65 57
Time zone: UTC +3
Source control username(s): GitHub - @OlgaVorokh
CV: https://drive.google.com/file/d/0B9CvVTgUiigZbFpZdzZFa21BZ28/view?usp=sharing
Blog(s): http://alcyonegammapy.blogspot.com.by/
GSoC Blog RSS feed: http://alcyonegammapy.blogspot.com/feeds/posts/default?alt=rss
PR links:
University: Belarusian State University
Major: Computer science
Current Year: Third year
Expected Graduation date: June 31, 2017
Degree: BSc
My name is Olga Vorokh and I am a third-year computer science student at the Faculty of Applied Mathematics and Informatics of the Belarusian State University. For my undergraduate coursework in the Department of Discrete Mathematics and Algorithmics I am applying neural networks to game solving. Convolutional neural networks are a very active topic today and I wanted to learn more about them. I have two years of experience coding in Python and working with git/GitHub.
My programming journey began at the age of 14 with C++ and competitive programming tasks. I reached the highest stage of high school programming competitions in my country, the republican olympiad. After I enrolled at the Belarusian State University, this experience led me to one of the strongest ACM teams in the university, and we earned a third-degree diploma at the ACM ICPC semifinals (NEERC) two years ago. In recognition of this, in January of this year I was awarded the badge of the Presidential Fund for Social Support of Gifted Students.
In addition to BSU I am studying at the Yandex School of Data Analysis, which offers additional courses taught by researchers and senior programmers from Yandex (the largest search engine in Russia, 4th largest search engine worldwide). I took a two-semester course on machine learning, a one-semester course on NLP and a course on image processing. The courses covered the history, details and implementation of machine learning algorithms and included practical homework. At the school we focus on writing our own code for data analysis using Python, sklearn, pandas, numpy, scipy, cv2 and matplotlib. Since then Python has become my favorite programming language and the one I am most proficient in.
Several assignments from the past year include:
- implementing an SVM from scratch using a convex optimization library (cvxopt),
- analyzing different ways to build a feature set for a factorization machine (libFM) on a movie recommendation system dataset,
- building an ensemble of classifiers for a Kaggle competition,
- extracting features (SIFT descriptors) from a set of images, clustering them, building histograms of the descriptors and using them for a simple yet powerful image search engine.
I have also tried my skills in a Kaggle competition, in which I took part while studying at the Yandex School of Data Analysis. The competition was internal, which is why I am giving a URL with the details of that challenge.
I am fond of singing and regularly take part in concerts. I am a member of my faculty's on-stage performance group, and our group has held a series of musical events for children from a children's home.
Astronomy is my hobby. For the past several years I have been attending weekly lectures and astronomy news sessions at my local planetarium, playing with Stellarium at home and even visiting the "Belarusian astronomy festival", a local conference on astronomy. I am therefore very excited about combining data analysis and astronomy, and that is why I have started contributing to Gammapy!
The work I have done so far is described at the very end of this application: I have submitted three pull requests.
Gammapy is a Python package for professional gamma-ray astronomers to analyse data from space-based gamma-ray telescopes (like Fermi-LAT) as well as ground-based gamma-ray telescopes (such as HESS or CTA). It is a very young project, so there is a lot to do. The project I will be working on covers gammapy.image and gammapy.detect (which uses gammapy.image as a building block), plus small modules and improvements such as source extraction or cluster detection tools. Tests and documentation will be a significant part of my project: I will document both new and existing code and increase test coverage.
I will start with gammapy.image, because the work on gammapy.detect will use gammapy.image methods as building blocks.
gammapy.image contains classes and methods for image analysis of gamma-ray data. It also includes general image processing routines. The package contains a large number of functions, but many of them are poorly documented, outdated and have very low test coverage. For instance, many of them lack usage examples and a full description.
The image module is a building block for my other tasks, which is why the first and main goal of my work is to review the existing functions and make their API more consistent: most of the standalone functions should be attached to SkyMap (the data container class for map-based gamma-ray data, which combines raw 2D data arrays with the FITS file format) or SkyMapCollection.
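As a rough illustration of this refactoring, the sketch below exposes a standalone image function as a method on a container class. `SkyMapSketch` and `threshold` are hypothetical names invented for this example; they are not the actual Gammapy API.

```python
import numpy as np


def threshold(data, value):
    """Standalone style: return a copy of `data` with pixels below `value` set to zero."""
    data = np.asarray(data, dtype=float)
    return np.where(data >= value, data, 0.0)


class SkyMapSketch:
    """Hypothetical stand-in for a map container (a name plus a 2D data array)."""

    def __init__(self, name, data):
        self.name = name
        self.data = np.asarray(data, dtype=float)

    def threshold(self, value):
        """Method style: same operation, but it returns a new map object."""
        return SkyMapSketch(self.name, threshold(self.data, value))


counts = SkyMapSketch("counts", [[0, 3], [7, 1]])
clipped = counts.threshold(2)
```

With the method form, the result stays a map object, so any metadata carried by the container can travel with the data instead of being passed around separately.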
Another important class is SpectralCube, which also holds gamma-ray data. Most of the image processing functions could be useful for cubes as well, so I also want to adapt the existing image functions to work on cubes.
The second goal is improving gammapy.detect. The gammapy.detect submodule includes low-level functions to compute significance and test statistics maps, as well as some prototypes of high-level source detection methods. In this submodule I plan to research and complete the iterative source detector, a source detection method that iteratively computes significance maps on multiple scales and adds the sources detected in iteration i to the background model for iteration i+1. To finish the work on this module I want to improve test coverage and fully document the algorithm.
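The idea of the iterative detector can be sketched as follows. This is a simplified illustration under my own assumptions, not the existing Gammapy prototype: significance is approximated as (N - B) / sqrt(B) in a square box, the background is assumed to be strictly positive, and the "source model" is simply the excess inside the box. The function names are hypothetical.

```python
import numpy as np
from scipy.ndimage import uniform_filter


def significance_map(counts, background, size):
    """Box-sum significance (N - B) / sqrt(B) for a square box of `size` pixels."""
    n = uniform_filter(counts, size=size, mode="constant") * size ** 2
    b = uniform_filter(background, size=size, mode="constant") * size ** 2
    return (n - b) / np.sqrt(b)


def iterative_detect(counts, background, sizes=(3, 5), threshold=5.0, max_iter=10):
    """Detect one source per iteration and absorb it into the background model."""
    counts = np.asarray(counts, dtype=float)
    background = np.array(background, dtype=float)  # private copy, updated below
    sources = []
    for _ in range(max_iter):
        # Find the most significant pixel over all scales.
        best_sig, best_idx, best_size = -np.inf, None, None
        for size in sizes:
            sig = significance_map(counts, background, size)
            idx = np.unravel_index(np.argmax(sig), sig.shape)
            if sig[idx] > best_sig:
                best_sig, best_idx, best_size = sig[idx], idx, size
        if best_sig < threshold:
            break
        # Box around the peak; the excess inside it is the crude source model.
        half = best_size // 2
        y, x = best_idx
        box = (slice(max(y - half, 0), y + half + 1),
               slice(max(x - half, 0), x + half + 1))
        excess = np.clip(counts[box] - background[box], 0, None)
        dy, dx = np.unravel_index(np.argmax(excess), excess.shape)
        sources.append({"y": int(box[0].start + dy), "x": int(box[1].start + dx),
                        "significance": float(best_sig), "size": best_size})
        # Iteration i + 1 treats the sources found in iteration i as background.
        background[box] += excess
    return sources


# Toy data: flat background plus two point-like sources.
background = np.full((50, 50), 10.0)
counts = background.copy()
counts[20, 30] += 200
counts[40, 10] += 100
sources = iterative_detect(counts, background)
```

On this toy image the brighter source is found first; once it is absorbed into the background, the fainter one becomes the most significant peak in the next iteration.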
As part of my work on gammapy.detect I need to rewrite gammapy.detect.CWT, the Continuous Wavelet Transform. There are good and fast open-source libraries for the underlying mathematical operations, so instead of writing the transform from scratch I want to write a wrapper around one of them, and then document and test it.
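One possible shape for such a wrapper is sketched below. It is only an assumption about how the rewrite could look: the transform is computed by convolving the image with a Mexican-hat kernel at each scale using `scipy.signal.fftconvolve`, and the function names `mexican_hat_kernel` and `cwt2d` are hypothetical.

```python
import numpy as np
from scipy.signal import fftconvolve


def mexican_hat_kernel(sigma):
    """Truncated 2D Mexican-hat kernel, shifted so that it sums to zero."""
    half = int(4 * sigma)
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    u = (x ** 2 + y ** 2) / (2.0 * sigma ** 2)
    kernel = (1.0 - u) * np.exp(-u)
    return kernel - kernel.mean()


def cwt2d(image, scales):
    """Return one map of wavelet coefficients per scale (scales in pixels)."""
    image = np.asarray(image, dtype=float)
    return np.array([fftconvolve(image, mexican_hat_kernel(s), mode="same")
                     for s in scales])


# Toy data: a single point source in an otherwise empty image.
image = np.zeros((33, 33))
image[16, 16] = 1.0
coeffs = cwt2d(image, scales=[1.0, 2.0])
```

Because the kernel sums to zero, a flat background gives coefficients close to zero away from the image edges, while a compact source produces a positive peak at its position.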
To conclude my work on the gammapy.image package I want to expose its functionality as command-line tools in gammapy.scripts (the sub-package for image and detect command-line tools). Command-line tools are important because they make the functionality accessible to people who don't know Python and make common tasks quicker to carry out.
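A command-line tool of this kind could be structured roughly as below. The tool name `image-smooth` and its options are made up for this sketch and do not correspond to an existing Gammapy script; the sketch only parses the arguments and reports what it would do.

```python
import argparse


def build_parser():
    """Parser for a hypothetical `image-smooth` tool (name and options are made up)."""
    parser = argparse.ArgumentParser(
        prog="image-smooth",
        description="Smooth an image stored in a FITS file.",
    )
    parser.add_argument("infile", help="input FITS file")
    parser.add_argument("outfile", help="output FITS file")
    parser.add_argument("--radius", type=float, default=0.1,
                        help="smoothing radius in degrees")
    parser.add_argument("--overwrite", action="store_true",
                        help="overwrite the output file if it exists")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real tool would call the corresponding library function here;
    # this sketch only reports what it would do.
    print("Would smooth {} with radius {} deg and write {}".format(
        args.infile, args.radius, args.outfile))
    return args
```

Keeping the parser in its own function makes the tool easy to test: `main(["in.fits", "out.fits", "--radius", "0.25"])` can be called directly from a test without starting a subprocess.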
I would also like to add simple tools for automatic source extraction, i.e. tools that make a source catalog with estimated source position, extension and flux. According to my work schedule, I am planning to keep two weeks in reserve at the end of GSoC. If there is time, I will implement and test one of the cluster detection methods that have been used in gamma-ray astronomy (see these papers [1] [2] [3]).
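The automatic source extraction mentioned above could start from something like the following sketch. It assumes that an excess map and a significance map are already available, treats every connected region above a threshold as one source, and reports positions and extensions in pixels and flux in excess counts. The function name `extract_sources` is hypothetical.

```python
import numpy as np
from scipy.ndimage import label


def extract_sources(excess, significance, threshold=5.0):
    """Build a simple catalog from connected regions above a significance threshold."""
    excess = np.asarray(excess, dtype=float)
    labels, n_sources = label(np.asarray(significance) > threshold)
    catalog = []
    for i in range(1, n_sources + 1):
        ys, xs = np.nonzero(labels == i)
        weights = np.clip(excess[ys, xs], 0, None)
        flux = weights.sum()
        if flux <= 0:
            continue  # no positive excess to weight by
        # Excess-weighted centroid and RMS radius of the region.
        y0 = np.average(ys, weights=weights)
        x0 = np.average(xs, weights=weights)
        var = np.average((ys - y0) ** 2 + (xs - x0) ** 2, weights=weights)
        catalog.append({"y": float(y0), "x": float(x0),
                        "extension": float(np.sqrt(var)), "flux": float(flux)})
    return catalog


# Toy data: one 3 x 3 pixel source; the excess map doubles as significance here.
excess = np.zeros((20, 20))
excess[9:12, 11:14] = 2.0
catalog = extract_sources(excess, significance=excess, threshold=1.0)
```

A real tool would additionally convert pixel positions to sky coordinates and excess counts to physical flux units, which this sketch leaves out.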
The work on gammapy.scripts is scheduled between the work on gammapy.image and the work on gammapy.detect.
I am going to spend much of my time implementing and debugging tests and writing documentation, because this is also a considerable part of my work. Before the application deadline I made pull requests that use py.test and Sphinx in order to become familiar with these tools.
In the following timeline I am not listing time for tests and docs separately. I will work on code, tests and docs continuously throughout GSoC, as they are an important part of this project.
- Week 1-2 (May 23) Go through all the existing functions of gammapy.image and make their API more consistent: most of the standalone functions should be attached to SkyMap / SkyMapCollection.
- Week 3-5 (June 6) Make the existing image functions work for spectral cubes. Expose commonly used functionality of gammapy.image as command-line tools. Complete mid-term evaluations.
- Week 6-7 (June 27) Start working on gammapy.detect. Add simple tools for automatic source extraction, i.e. tools that make a source catalog with estimated source position, extension and flux.
- Week 8-9 (July 11) Finish the iterative source detector.
- Week 10-11 (July 25) Rewrite gammapy.detect.CWT, the Continuous Wavelet Transform (try to make a wrapper around an existing open-source library).
- Week 12-13 (August 8) These two weeks will serve as a buffer period in case earlier steps of the proposal require more time, unforeseen difficulties arise, etc. If everything runs smoothly and the buffer period is unnecessary, I will work on adding an event cluster detection method.
Before the application deadline I made three pull requests to Gammapy related to my GSoC project.
The first one fixed a bug in gammapy.detect image_ts and significantly improved the documentation of the main gammapy.detect page, using Sphinx.
Link: https://github.com/gammapy/gammapy/pull/456
The second one added unit tests for the console utility, using py.test.
Link: https://github.com/gammapy/gammapy/pull/456
The third one added reference data files for the tests above.
Link: https://github.com/gammapy/gammapy-extra/pull/31
These pull requests are similar to what I am going to do during the coding period, and through them I have gained an understanding of the core Gammapy classes and of common tools such as Sphinx and py.test.