
GSoC 2016 Application Olga Vorokh: Image processing and source detection in Gammapy

OlgaVorokh edited this page Mar 27, 2016 · 1 revision

Application template.

Sub-organization information

Organization: OpenAstronomy

Sub-org: Astropy

Student Information

Name: Olga Vorokh

Email: vorokho@gmail.com

Telephone: +375 29 138 65 57

Time zone: UTC +3

Source control username(s): GitHub - @OlgaVorokh

CV: https://drive.google.com/file/d/0B9CvVTgUiigZbFpZdzZFa21BZ28/view?usp=sharing

Blog(s): http://alcyonegammapy.blogspot.com.by/

GSoC Blog RSS feed: http://alcyonegammapy.blogspot.com/feeds/posts/default?alt=rss

PR links:

  1. https://github.com/gammapy/gammapy/pull/475

  2. https://github.com/gammapy/gammapy/pull/456

  3. https://github.com/gammapy/gammapy-extra/pull/31

University Information

University: Belarusian State University

Major: Computer science

Current Year: Third year

Expected Graduation date: June 2017

Degree: BSc

Bio

My name is Olga Vorokh and I am a third-year computer science student at the Faculty of Applied Mathematics and Informatics of the Belarusian State University. For my undergraduate coursework in the Department of Discrete Mathematics and Algorithmics, I work on applying neural networks to game solving. Convolutional neural networks are a highly topical subject today, and I wanted to learn more about them. I have two years of experience coding in Python and working with git/GitHub.

My programming journey began when I was 14, with C++ and competitive programming tasks. I reached the highest stage of high school programming competitions in my country, the republican olympiad. After I enrolled at the Belarusian State University, this experience led me to one of the strongest ACM teams in the university, and we earned a third-degree diploma at the ACM ICPC Semifinals (NEERC) two years ago. In recognition of this, in January of this year I was awarded the badge of the Presidential Fund for Social Support of Gifted Students.

In addition to BSU, I am studying at the Yandex School of Data Analysis, which offers additional courses taught by researchers and senior programmers from Yandex (the largest search engine in Russia and the 4th largest search engine worldwide). I took a two-semester course on machine learning, a one-semester course on NLP and a course on image processing. The courses covered the history, details and implementation of machine learning algorithms and included practical homework. At the school we focus on writing our own code for data analysis using python, sklearn, pandas, numpy, scipy, cv2 and matplotlib. Since then Python has become my favorite and most proficient programming language.

Several assignments from the past year include:

  • implementing my own SVM using a convex optimization library (cvxopt),
  • analyzing different ways to build a feature set for a factorization machine (libFM) on a movie recommendation system dataset,
  • building an ensemble of classifiers for a Kaggle competition,
  • extracting features (SIFT) from a set of images, clustering them, building histograms of the SIFT features and using them for a simple yet powerful image search engine.

I have also tested my skills in Kaggle competitions: I took part in one while studying at the Yandex School of Data Analysis. That competition was internal, which is why I can only point to the description of the challenge.

I am fond of singing and regularly take part in concerts. I am a member of my faculty's on-stage performance group, which has held a series of musical events for children from a children's home.

Astronomy is my hobby. For the past several years I have been attending weekly lectures and astronomy news sessions at my local planetarium, playing with Stellarium at home and even visiting the “Belarusian Astronomy Festival”, a local conference on astronomy. So, as you can see, I am very excited about combining data analysis and astronomy. That’s why I have started contributing to Gammapy!

You can read more about the work I have done so far at the very end of this application: I have submitted three pull requests.

Project Proposal Information

Proposal Title: "Implement image processing and source detection methods in Gammapy"

Proposal Abstract

Gammapy is a Python package for professional gamma-ray astronomers to analyse data from space-based gamma-ray telescopes (like Fermi-LAT) as well as ground-based gamma-ray telescopes (such as HESS or CTA). It is a very young project, so there is a lot to do. My project covers gammapy.image and gammapy.detect (which uses gammapy.image as a building block), plus smaller modules and improvements such as source extraction or cluster detection tools. Working on tests and documentation will be a significant part of the project: documenting both new and existing code and increasing test coverage.

Proposal Detailed Description

Gammapy.image

We will start with gammapy.image, because the work on gammapy.detect will use gammapy.image methods as building blocks.

Gammapy.image contains classes and methods for image analysis of gamma-ray data, as well as general image processing routines. The package contains a large number of methods, but they are poorly documented, outdated and have very low test coverage. For instance, many of them have no usage examples and lack a full description.

Since the image module is a building block for my other tasks, the first and main target of my work is to review the existing functions and change their API to be more consistent: most of the standalone functions should be attached to SkyMap (the data container class for map-based gamma-ray data, which combines raw 2D data arrays with the FITS file format) or SkyMapCollection.

Another important image class is SpectralCube, which also holds gamma-ray data. Most image processing functions could be useful for cubes as well, which is why I also want to adapt the existing image functions to cubes.
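One possible way to reuse 2D image functions for cubes is to apply them slice by slice along the energy axis. The sketch below uses plain NumPy arrays rather than the actual SpectralCube class; the helper `apply_to_cube`, the example function `threshold` and the axis order `(n_energy, ny, nx)` are assumptions made for illustration only.

```python
import numpy as np


def apply_to_cube(image_func, cube, **kwargs):
    """Apply a 2D image function to every energy slice of a 3D cube.

    `cube` is assumed to have shape (n_energy, ny, nx).
    """
    cube = np.asarray(cube)
    if cube.ndim != 3:
        raise ValueError("expected a 3D array with shape (n_energy, ny, nx)")
    # Process each 2D plane independently and stack the results back into a cube
    return np.stack([image_func(plane, **kwargs) for plane in cube])


def threshold(image, value):
    """Example 2D function: set pixels below `value` to zero."""
    return np.where(image >= value, image, 0)


cube = np.arange(24, dtype=float).reshape(2, 3, 4)
result = apply_to_cube(threshold, cube, value=10)
```

A slice-by-slice helper like this keeps the 2D functions unchanged, at the cost of ignoring any correlation between neighbouring energy bins.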

Gammapy.detect

The second goal is to improve gammapy.detect. The gammapy.detect submodule includes low-level functions to compute significance and test statistic maps, as well as some high-level source detection method prototypes. In this submodule I plan to research and complete the iterative source detector, a source detection method that iteratively computes significance maps using multiple scales, adding the sources detected in iteration i to the background model for iteration i+1. To finish the work on this module I want to improve test coverage and fully document the algorithm.
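The iterative idea described above can be sketched with NumPy alone. This is not the existing Gammapy prototype: the box-shaped correlation, the simple significance formula (N_on - N_bkg) / sqrt(N_bkg), the Gaussian source model and all function names are assumptions chosen to keep the example short.

```python
import numpy as np


def box_sum(image, radius):
    """Sum pixel values in a (2 * radius + 1)^2 box around every pixel (zero padded)."""
    ny, nx = image.shape
    padded = np.pad(image, radius, mode="constant")
    out = np.zeros((ny, nx), dtype=float)
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out += padded[dy:dy + ny, dx:dx + nx]
    return out


def significance_map(counts, background, radius):
    """Simple (N_on - N_bkg) / sqrt(N_bkg) significance at one scale."""
    n_on = box_sum(counts, radius)
    n_bkg = box_sum(background, radius)
    return (n_on - n_bkg) / np.sqrt(n_bkg)


def gaussian_blob(shape, y0, x0, sigma, total):
    """Symmetric Gaussian source image whose pixels sum to `total`."""
    y, x = np.indices(shape)
    blob = np.exp(-0.5 * ((y - y0) ** 2 + (x - x0) ** 2) / sigma ** 2)
    return total * blob / blob.sum()


def iterative_detect(counts, background, radii=(1, 2, 4), threshold=5.0, max_iter=10):
    """Detect sources one at a time, adding each to the background model."""
    model = background.astype(float).copy()
    sources = []
    for _ in range(max_iter):
        # Find the most significant pixel over all scales
        best = None
        for radius in radii:
            sig = significance_map(counts, model, radius)
            y, x = np.unravel_index(np.argmax(sig), sig.shape)
            if best is None or sig[y, x] > best[0]:
                best = (sig[y, x], y, x, radius)
        peak, y, x, radius = best
        if peak < threshold:
            break
        # Add the detected source to the model used in the next iteration
        excess = box_sum(counts - model, radius)[y, x]
        model += gaussian_blob(counts.shape, y, x, sigma=radius, total=excess)
        sources.append({"y": int(y), "x": int(x), "radius": radius,
                        "significance": float(peak)})
    return sources


# Noise-free toy data: flat background plus one Gaussian source
background = np.full((40, 60), 10.0)
counts = background + gaussian_blob(background.shape, 20, 30, sigma=2.0, total=500.0)
sources = iterative_detect(counts, background)
```

Because the crude Gaussian model rarely matches the true source shape exactly, later iterations may pick up residuals near an already detected source; a real detector would need a proper fit and a stopping criterion tuned on realistic data.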

As part of my work on gammapy.detect I need to rewrite gammapy.detect.CWT (Continuous Wavelet Transform). Good and fast open-source libraries for the underlying mathematical operations already exist, so I want to write a wrapper around one of them, with documentation and tests, instead of writing the transform from scratch.
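To illustrate what such a wrapper could delegate to, here is a minimal sketch of a 2D continuous wavelet transform with a Mexican hat kernel, where the convolution is done by NumPy's FFT routines. The proposal does not name a library, so the use of `numpy.fft`, the function names, the kernel size rule and the periodic boundary handling are all assumptions.

```python
import numpy as np


def mexican_hat_kernel(scale):
    """2D Mexican hat kernel (unnormalised), forced to zero mean."""
    size = int(8 * scale) | 1  # odd kernel size, about 4 scales on each side
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    r2 = (x ** 2 + y ** 2) / scale ** 2
    kernel = (2 - r2) * np.exp(-0.5 * r2)
    # Zero mean, so that a flat background gives zero coefficients
    return kernel - kernel.mean()


def cwt2d(image, scales):
    """Return wavelet coefficients with shape (n_scales, ny, nx).

    Uses circular (periodic) FFT convolution; each kernel must fit in the image.
    """
    ny, nx = image.shape
    image_ft = np.fft.rfft2(image)
    coeffs = np.empty((len(scales), ny, nx))
    for i, scale in enumerate(scales):
        kernel = mexican_hat_kernel(scale)
        ky, kx = kernel.shape
        padded = np.zeros((ny, nx))
        padded[:ky, :kx] = kernel
        # Shift the kernel centre to pixel (0, 0) so the output is not offset
        padded = np.roll(padded, (-(ky // 2), -(kx // 2)), axis=(0, 1))
        coeffs[i] = np.fft.irfft2(image_ft * np.fft.rfft2(padded), s=(ny, nx))
    return coeffs


# A single bright pixel produces coefficients that peak at the same position
point = np.zeros((32, 32))
point[16, 16] = 1.0
coeffs = cwt2d(point, scales=[1.0, 2.0])
```

Delegating the convolution to an FFT library keeps the wrapper small, so most of the remaining work is in boundary handling, normalisation and tests.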

Gammapy.scripts

To conclude my work on the gammapy.image package, I want to expose its functionality as command-line tools in gammapy.scripts (the sub-package for image and detect command-line tools). Command-line tools are important because they make the functionality accessible to people who don't know Python and quicker to use for common tasks.
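A command-line tool of this kind is typically a thin wrapper around a library function. The sketch below uses the standard library's `argparse`; the tool name `image-significance` and all of its arguments are hypothetical and do not correspond to an existing Gammapy script.

```python
import argparse


def build_parser():
    """Argument parser for a hypothetical significance-image tool."""
    parser = argparse.ArgumentParser(
        prog="image-significance",  # hypothetical tool name
        description="Compute a significance image from counts and background images.",
    )
    parser.add_argument("counts", help="input counts image (FITS file)")
    parser.add_argument("background", help="input background image (FITS file)")
    parser.add_argument("outfile", help="output significance image (FITS file)")
    parser.add_argument("--scale", type=float, default=0.1,
                        help="correlation radius in degrees")
    parser.add_argument("--overwrite", action="store_true",
                        help="overwrite an existing output file")
    return parser


def main(argv=None):
    args = build_parser().parse_args(argv)
    # A real tool would read the input images, call the library routine and
    # write the result; this sketch only reports what it was asked to do.
    print("counts={} background={} -> {} (scale={} deg)".format(
        args.counts, args.background, args.outfile, args.scale))
    return 0
```

Keeping the parser in its own function makes the tool easy to test without touching any files, for example by calling `build_parser().parse_args([...])` in a py.test test.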

I would also like to add simple tools for automatic source extraction, i.e. making a source catalog with estimated source position, extension and flux. According to my work schedule, I am planning to keep two weeks in reserve at the end of GSoC. If there is time, I will implement and test one of the cluster detection methods that have been used in gamma-ray astronomy (see these papers [1] [2] [3]).
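For the source extraction part, one simple way to estimate position, extension and flux is to use flux-weighted image moments of the pixels above a threshold. The sketch below assumes a background-subtracted excess image that contains a single source; the function name `measure_source` and the returned keys are illustrative, not an existing Gammapy API.

```python
import numpy as np


def measure_source(excess, threshold):
    """Estimate position, extension and flux of one source from an excess image.

    Returns None if no pixel is above `threshold`.
    """
    mask = excess > threshold
    if not mask.any():
        return None
    weights = np.where(mask, excess, 0.0)
    flux = weights.sum()
    y, x = np.indices(excess.shape)
    # Flux-weighted centroid in pixel coordinates
    y0 = (weights * y).sum() / flux
    x0 = (weights * x).sum() / flux
    # Flux-weighted RMS size per axis, assuming a symmetric source
    variance = (weights * ((y - y0) ** 2 + (x - x0) ** 2)).sum() / (2 * flux)
    return {"x": x0, "y": y0, "extension": np.sqrt(variance), "flux": flux}


# Toy excess image: one Gaussian source with sigma = 2 pixels
yy, xx = np.indices((40, 40))
excess = 100.0 * np.exp(-0.5 * ((yy - 12) ** 2 + (xx - 20) ** 2) / 2.0 ** 2)
row = measure_source(excess, threshold=1.0)
```

Because pixels below the threshold are ignored, the extension and flux are slightly underestimated; a catalog tool would also need to separate multiple sources before measuring each one.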

The work on gammapy.scripts is scheduled between the work on gammapy.image and the work on gammapy.detect.

The importance of tests and documentation

I am going to spend most of my time implementing and debugging tests and writing documentation, because this is a considerable part of my work. Before the application deadline I made pull requests using py.test and sphinx to become familiar with these tools.

Timeline

In the following timeline, I'm not listing time for tests and docs separately. I will work on code, tests and docs continuously throughout GSoC, as they are an important part of this project.

  • Week 1-2 (May 23) Go through all the existing functions of gammapy.image and change their API to be more consistent: most of the standalone functions should be attached to SkyMap/SkyMapCollection.

  • Week 3-5 (June 6) Make the existing image functions work for spectral cubes. Expose commonly used functionality of gammapy.image as command-line tools. Complete mid-term evaluations.

  • Week 6-7 (June 27) Start working on gammapy.detect. Add simple tools for automatic source extraction, i.e. making a source catalog with estimated source position, extension and flux.

  • Week 8-9 (July 11) Finish the Iterative Source Detector.

  • Week 10-11 (July 25) Rewrite gammapy.detect.CWT (Continuous Wavelet Transform), trying to make it a wrapper around an existing open-source library.

  • Week 12-13 (August 8) These two weeks will serve as a buffer period in case earlier steps of the proposal require more time, unforeseen difficulties arise, etc. If everything runs smoothly and the buffer period is unnecessary, I will work on adding an event cluster detection method.

Previous contributions to Gammapy

Before the application deadline I made three pull requests to Gammapy related to my GSoC project.

The first one fixed a bug in gammapy.detect image_ts and significantly improved the documentation of the main gammapy.detect page, using sphinx.

Link: https://github.com/gammapy/gammapy/pull/456

The second one added a unit test for the console utility, using py.test.

Link: https://github.com/gammapy/gammapy/pull/475

I have also added reference data files for the tests above.

Link: https://github.com/gammapy/gammapy-extra/pull/31

These pull requests are similar to what I am going to do during the coding period, and through them I have gained an understanding of the core gammapy classes and of common tools such as sphinx and py.test.
