Objectives Outline

5A. Explain why open science tools encourage responsible open science (e.g., using the FAIR and CARE principles)

5B. Identify Open Science communities and initiatives - within and across disciplines - and join a community of practice (CoP) of interest to you

5C. Provide examples of how open science is practiced in a research team

5D. Identify types of Open Science tools along with their purpose

5E. Match appropriate open science tools to specific objectives within the research workflow

5F. Describe 3-5 open science tools and how to use them in projects (e.g., for communication, sharing of results, and collaboration).

Lessons Outline

Lesson 1: Introduction to Open Science tools

  • Definition: What do we mean by “Open Science tools”?
  • What’s the difference between ‘open’ and ‘closed’ tools? Why use Open Science tools?
  • How do Open Science tools fit into the research lifecycle?
  • How do Open Science tools address responsible practices?

Lesson 2: Open Science tools across the research lifecycle

  • Open Science tools for protocols
  • Open Science tools for data
    • Tools for Data Management Plans
    • Sharing data with your (research) team
    • Data repositories
  • Open Science tools for code
    • Collaborative development tools
    • Code repositories
  • Open Science tools for results
  • Open Science tools for authoring
    • Collaborative writing tools
    • Reference management tools
    • Publishing Open Science and Open Access

Lesson 3: Open Science tools for reproducibility

  • What is reproducibility?
  • Computational notebooks
    • Jupyter
    • R Markdown
    • Quarto

Lesson 4: Practicing open science in a team

  • Team Open Science Practices
    • Build Team and Align Trust, Expectations, and Conduct
    • Work As Collaboratively, Transparently, and Openly as Possible
    • Establish Team Tasks and Responsibilities
    • Review Ethical Concerns
    • Establish Team Communications
  • Resources and Team Guidelines Checklist
    • Establish Common Team Resources
    • Use Reminders and Milestones to Manage and Track Data and Digital Objects
    • Improve Practices Through Use and Feedback
  • Team Results Preservation Checklist
    • Plan to Preserve and Share for the Long-term
    • Preserve the Research/Project Components as open and FAIR as possible
    • Manage a Project Registry (or Directory) for the Outputs

Lesson 5: Open Science communities

  • Why engage with Open Science communities?
  • What is a Community of Practice (CoP)?
    • Communities list
  • How to engage with Open Science communities
    • Pathways for contribution
    • Pathways for collaboration
    • Pathways for engagement
      • Case Study: FORRT
  • How to build and lead a community
    • Guidelines for building communities
    • Mountain of engagement

Lesson 1: Introduction to Open Science tools. (What are Open Science tools? Why use Open Science tools? How do Open Science tools fit into the research lifecycle?)

This lesson is the first of OpenCore Module 5: Open Science Tools and Resources. This Module provides a collection of tools that are available to increase the visibility and discoverability of your project. It complements the previous OpenCore Modules (Ethos of Open Science, Open Data, Open Software, and Open Results) by enhancing the practical implementation of the Open Science concepts explained previously. While earlier modules focused on the concepts, advantages, and disadvantages of responsible Open Science practices, this module will focus more on the practical applications of responsible Open Science practices. We focus on a few key tools, and highlight how they fit across the research lifecycle.

In this first lesson, you will be introduced to the _What_ and the _Why_ of Open Science tools. First, we provide a definition of Open Science tools. Second, we discuss the differences between ‘open’ and ‘closed’ tools and highlight the advantages of using open tools. Third, we elaborate on the research lifecycle, and show how Open Science tools fit into a researcher’s project workflow.

What do we mean by “Open Science tools”?

We use the word “tools” to cover any type of resource or instrument that can be used to support your research. In this sense, tools can be a collection of useful resources that you might consult during your research, software that you could use to create and manage your data, or even human infrastructure, such as a community network that you could join to get more guidance and support on specific matters.

In this context, Open Science tools are any tools that enable and facilitate openness in research, and support responsible Open Science practices. It is important to note that Open Science tools are very often open source and/or free, but not necessarily.

What’s the difference between ‘open’ tools and ‘closed’ tools? Why use Open Science tools?

One can intuitively grasp the difference between open and closed tools by thinking of openness in terms of exchange with the environment. Bear in mind that this is not a black-and-white separation, but rather a spectrum of options.

When re-using resources such as text, visuals, audio, or video, it is important to pay attention to the license, which sets out the possibilities and conditions for re-use. If no license is indicated, the material cannot legally be re-used. As indicated in 🔗 Module 1 Ethos of Open Science, Lesson 5🔗, Creative Commons licenses are among the most common sets of open licenses applied to written content of any kind; they allow re-use, require attribution, and span a spectrum of openness from least to most open (with CC0 being equivalent to the public domain).

Software can be proprietary (“closed”) or open source. It is called open source when the original source code is made freely available and may be redistributed and modified. Generally, software has a separate set of licenses designed specifically for code projects that covers both the open distribution of the code itself as well as executable versions of the program which non-programmers can run. More information and details on open software can be found in the 🔗Open Software Module🔗.

Human infrastructure refers to a network of relationships between stakeholders interested in the conduct and outcomes of responsible Open Science (more on those stakeholders can be found in 🔗Module 1, Lesson 3🔗). Communities – or groups of people who share a geographical location, affiliation, common interest, or practice – play a key role in the human infrastructure aspect of open science. Like everything else, communities can vary in their degree of openness. A community can take the form of a mailing list, conference, meet-up, or messaging app as a way to stay in touch. In that case, being open would imply that anyone could join the community and be welcomed to speak, decisions would be made transparently, and communications would be largely public. On the other hand, a closed community implies that membership is restricted by invitation and/or a fee, resources and communications are not public, and decision processes are not necessarily transparent. More ideas on how to increase participation of stakeholders and how to build and lead inclusive communities can be found in 🔗Module 1, Lesson 3🔗 and 🔗this module, Lesson 4🔗.

Activity/exercise

Now let’s practice by looking at some typical case studies and solutions, reflecting on the benefits and obstacles of open and closed tools.

Case study #1: Closed vs open resources

Case study #2: Closed vs open software

You are a researcher who has been using MATLAB, a proprietary platform, to analyze data and create models. You are starting a new job at a different institution. Unfortunately, the new workplace does not have a MATLAB license, so you cannot access your own code and data, which are stored in proprietary file formats, and you cannot continue your routine analysis workflow. What are your options now?

  • You can purchase an individual license for this proprietary software, or persuade the institute to purchase a group or campus-wide license.
  • You could consider using open source alternatives for programming and numerical computing, such as GNU Octave, Sage, or even the Python programming language and its scientific packages. This would not only save you money now, but also provide continuity of the tool if you move again to a different institution.

Case study #3: Closed vs open communities

  • Example:

Open science tools provide numerous benefits, many of which have been discussed in the previous modules. For example, they can help you collaborate openly and share easily; organize and manage your work; track how your work is treated and shared; and follow leading responsible Open Science practices.

Open Science practices enable easier access to existing tools and resources, promoting collaboration between professionals with similar interests and research objects. For example, someone in Asia wanting to study Central African rainforest species could consult an online species database made available by other scientists, despite the physical distance. Still, many factors lead to inequality in access to scientific resources, from institutional barriers to paywalled content.

There are efficient and coordinated ways to share resources. One of them is using 📖version control📖, a system that keeps track of any changes made to one or more files over time; it also serves as a backup for your work. You might have already used something like this – for example, if you have ever used Google Docs. It stores a version of your work as you type, and you can invite other users to work collaboratively in the same document, keeping a record of all changes made by all users.

One broadly used tool for version control is Git. It enables version control either online or on the user’s machine [see https://git-scm.com/]. Related services include GitHub, GitLab, and Bitbucket. Information is stored in online repositories where people can clone, edit, and review each other’s content.
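
To make this concrete, here is a minimal sketch of recording versions from Python using the GitPython package (an assumption of this example; the same steps are usually run on the command line with `git init`, `git add`, and `git commit`):

```python
from pathlib import Path

from git import Repo  # GitPython; assumes `pip install GitPython` and Git installed

# Initialize a new local repository for an analysis project
repo = Repo.init("my-analysis")

# Create a file and record its first version
script = Path(repo.working_tree_dir) / "analysis.py"
script.write_text("print('hello, open science')\n")

repo.index.add([str(script)])
repo.index.commit("Add first version of analysis script")

# Every later change becomes a new commit, giving a complete, recoverable history
for commit in repo.iter_commits():
    print(commit.hexsha[:7], commit.message.strip())
```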

Another way to share your work is by using standardized 📖workflows📖. A standardized workflow is typically a sequence of steps commonly used for a given purpose, such as accessing and manipulating genomic data. A good open science practice, then, is to share those workflows in platforms such as https://galaxyproject.org/ – which allows any user to replay those steps right there for free, quickly and easily. That and other similar services enable you to show a step-by-step overview of what other researchers did, build on their work, and share your new ideas.

Including 📖metadata📖, the data that describes your data, can significantly enhance the findability of your research object. Some examples of metadata are the keywords associated with a publication, the time range and instrument name of a given observational data set, and the ORCID number for a given person. Metadata is a tool that search interfaces use to more quickly find a resource. In fact, Google uses a metadata language called ‘Schema.org’ to build its search algorithm (see https://schema.org/ for more information).

Many research fields have their own metadata standards (e.g. SPASE for space physics: https://spase-group.org/data/), but remember that each website you use has something similar behind that magnifying glass button. Taking the extra time to include some basic descriptors for your research object can make your contribution to your research field much more findable. The same way finding someone else’s work on the Internet might help you, making your own work more discoverable is a great contribution to Open Science!
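
As a minimal illustration, the sketch below builds a Schema.org-style “Dataset” record in Python; the field values (title, DOI, ORCID) are hypothetical placeholders, and your repository or discipline may require additional fields:

```python
import json

# A minimal Schema.org "Dataset" description; repositories and search engines
# read this kind of JSON-LD metadata to make datasets easier to find.
dataset_metadata = {
    "@context": "https://schema.org/",
    "@type": "Dataset",
    "name": "Central African rainforest species observations",  # hypothetical title
    "description": "Observational records of rainforest species, 2018-2022.",
    "keywords": ["rainforest", "biodiversity", "species occurrence"],
    "identifier": "https://doi.org/10.xxxx/example",  # placeholder DOI
    "license": "https://creativecommons.org/licenses/by/4.0/",
    "creator": {"@type": "Person", "@id": "https://orcid.org/0000-0000-0000-0000"},
}

print(json.dumps(dataset_metadata, indent=2))
```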

Next, we’ll highlight how open science tools and resources fit in the research lifecycle.

How do Open Science tools fit into the research lifecycle?

The complex nature of research in the modern scientific community – involving multiple stages, steps, contributors, and stakeholders in the process – benefits from certain frameworks and definitions to structure, organize, and somewhat standardize the research process for the sake of responsible and reproducible practices.

The 🔗Open Results🔗 module introduced you to the definitions and nine stages of the research lifecycle and workflow. Let’s define these terms again.

Research framework

Research workflow

Research lifecycle

There is quite some theory behind the models for research frameworks, lifecycles, and workflows (REF), including linear, circular, multi-loop, and multi-step flows. For clarity, and to pragmatically map Open Science tools onto the research lifecycle, we will consider a concise six-stage spiraling model of the research workflow, covering discovery, analysis, and writing as well as publication, outreach, and assessment (see Fig.).

![Six-stage model of the research workflow: discovery, analysis, writing, publication, outreach, and assessment](images/image1.png)

Ref: https://figshare.com/articles/presentation/Of_Shapes_and_Style_visualising_innovations_in_scholarly_communication/3468641

Most steps of the research workflow are supported by online applications (Kramer and Bosman, 2016). These digital (Open Science) tools have actually influenced the way in which we perform and share research, opening it up to a global audience.

Open Science tools can be used for:

  • Discovery: Tools for finding content to use in your research
  • Analysis: Tools to process your research output, e.g. tools for data analysis and visualization
  • Writing: Tools to produce content, such as Data Management Plans, presentations, and pre-prints
  • Publications: Tools to use for sharing and/or archiving research
  • Outreach: Tools to promote your research

The usage of such tools by researchers across different disciplines has been surveyed and reviewed in several efforts (Kramer and Bosman, 2016; Bezuidenhout and Havemann, 2021). Numerous digital tools have been mapped onto the “discovery, analysis and writing, publication, outreach, and assessment” stages of the research lifecycle (see Fig). As we saw in the previous section, all tools have varying degrees of openness. By purposefully choosing tools at each stage to increase transparency, findability, and reproducibility, you can construct and define your research workflow in alignment with responsible Open Science practices. As discussed in Module 1, Ethos of Open Science, open should not be a thoughtless default or an afterthought, but should be built into the design and inception of the research project. Your choice of Open Science tools can be individual, but most often it will benefit from group discussions within your research team, institution, and communities of practice.

![Digital tools mapped onto the stages of the research lifecycle](images/image2.png)

Note: the concepts of workflow and lifecycle are widely used and applied to parts of the research process, e.g. data. The data workflow and data lifecycle are discussed in depth in 🔗Lesson X of the Module Open Data🔗.

How do Open Science tools address responsible practices?

The 🔗Open Data and Open Results🔗 Modules introduced the concept of the FAIR principles and discussed how applying them according to best practices can increase the visibility and uptake of your research.

Let’s refresh the terms:

  • FAIR Data Principles - Findable, Accessible, Interoperable, & Reusable. Wilkinson et al. (2016) provided the FAIR Guiding Principles for scientific data management and stewardship; Hong et al. (2022) established FAIR principles for research software.
  • CARE Principles - Collective Benefit, Authority to Control, Responsibility, & Ethics. Carroll et al. (2020) established the CARE Principles for Indigenous Data Governance, complementing the FAIR data principles.

![The FAIR and CARE principles](images/image3.png)

Best practices to implement these principles include describing data using metadata standards and controlled vocabularies, assigning licenses, and uploading data to repositories that allow for creation of “📖persistent identifiers📖”. Examples of useful Open Science tools include:

  • Data Management Plan (DMP) tool, which allows you to create and share your data management plans to meet funder requirements and as a best practice for managing your data (link to website, to Lessons)
  • Data Repositories, which assign persistent identifiers to your data (example or link)
  • Tools for integrating research management with the DMPTool and repositories (example or link)
  • Communities - national and international, discipline-specific, or open science-centered - can be of incredible value in curating resources and building communities of practice for researchers and other stakeholders in adopting FAIR principles. Examples include the FAIR Data Forum https://fairdataforum.org/ and the Research Data Alliance (RDA) https://www.rd-alliance.org/

Working within the ethos of the FAIR and CARE principles can help to ensure that research is accessible, inclusive, ethical, and responsible. More about FAIR principles and practical steps to make your data FAIR can be found here: https://www.go-fair.org/fair-principles/

Self-Assessment: Questions for reflection:

  1. Assessment of your (open science) tools and resources

Most probably you are already using some tools and resources, even if you are new to open science practices. Here we invite you to make a preliminary review of them:

  • Think of all the tools and resources you use and rely on in your study/research/work - resources (content with text/media), software, and communities. Consider all stages of your research: discovery, analysis, writing, publication, outreach, and assessment.
  • Tools have varying degrees of openness, dictated by various factors. Imagine (or draw) a scale from 0 to 10, where 0 stands for completely closed and 10 for completely open.
  • For each of your tools (across the categories of resources, software, and communities), place it on the scale at a number that reflects its degree of openness.
  • How many tools fall toward the lower part of the scale (0 to 4)? Take a moment to reflect on whether these tools are in line with your actual preferences, goals, and needs in the long run.
  • Perform a quick search using a search engine or this open dataset of Open Science tools (https://kumu.io/access2perspectives/dost#dataset) for more open alternatives (e.g. free, open source) and jot them down for your information.

In the next lessons we will introduce you to various tools, some of which you may not have heard of yet. Stay tuned!

  1. Suggestion: Add an exercise/ question for reflection in FAIR/CARE

Lesson 2: Open Science Tools across the Research Lifecycle

In the first lesson, we briefly defined Open Science tools, distinguished open from closed tools, and highlighted the advantages of Open Science tools. We also gave a brief introduction to the Research Lifecycle, and discussed how open tools fit in this workflow. In this second lesson, we’ll highlight a few key tools for each aspect of the research lifecycle.

In this lesson, we’ll focus on the following elements of the project workflow rather than on distinct research stages, because many tools support more than one stage. We will cover tools specifically for protocols, data, code, results, and authoring. We’ll only highlight a few tools; more tools and resources are currently available than we could possibly list (see Figure below).

![The 2021 data and AI tools landscape](images/image4.jpg)

Ref: http://46eybw2v1nh52oe80d3bi91u-wpengine.netdna-ssl.com/wp-content/uploads/2021/12/Data-and-AI-Landscape-2021-v3-small.jpg

Open Science tools for protocols

In recent decades, we have seen an avalanche of tools for managing research projects and laboratories, addressing the ever-increasing need for speed, innovation, and transparency. Such tools are developed to support collaboration, ensure data integrity, automate processes, create workflows, and increase productivity.

Some research groups have been adapting commonly used project management tools for their own team needs, such as Trello, a cloud-based online tool. Such software facilitates sharing materials within the group and managing projects and tasks, while allowing space for some customization.

Platforms and tools that are finely tuned to meet researchers' needs (and frustrations) have appeared as well, often founded by scientists - for scientists. To give you a few examples, let’s turn to experimental science. A commonly used term and research output is the 📖protocol📖.

A protocol can be defined as “a predefined written procedural method in the design and implementation of experiments. Protocols are written whenever it is desirable to standardize a laboratory method to ensure successful replication of results by others in the same laboratory or by other laboratories.” (According to the University of Delaware (USA) Research Guide for Biological Sciences)

In a broader sense, protocol also comprises documented computational workflows, operational procedures with step-by-step instructions, or even safety checklists.

Protocols.io (https://www.protocols.io/) is an online and secure platform for scientists affiliated with academia, industry and non-profit organizations and agencies. It allows them to create, manage, exchange, improve, and share research methods and protocols across different disciplines. This resource is useful for improving collaboration and recordkeeping, increasing team productivity, and even facilitating teaching, especially in the life sciences. In its free version, protocols.io supports publicly shared protocols, while paid plans enable private sharing, e.g. for industry.

Some tools are specifically designed for open science - open by design from the very beginning - and aim to support all stages of the research lifecycle while allowing integration with other open science tools.

The most prominent of these is the Open Science Framework (OSF), developed by the Center for Open Science (link). OSF is a free and open source project management tool that supports researchers throughout their entire project lifecycle through open, centralized workflows. It captures different aspects and products of the research lifecycle, including developing a research idea, designing a study, storing and analyzing collected data, and writing and publishing reports or papers.

OSF is designed to be a collaborative platform where users can share research objects from several phases of a project. It supports a broad and diverse audience, including researchers who might not otherwise have been able to access so many resources due to historic socioeconomic disadvantages. OSF also integrates other tools in its own platform:

“While there are many features built into the OSF, the platform also allows third-party add-ons or integrations that strengthen the functionality and collaborative nature of the OSF. These add-ons fall into two categories: citation management integrations and storage integrations. Mendeley and Zotero can be integrated to support citation management, while Amazon S3, Box, Dataverse, Dropbox, figshare, GitHub, and oneCloud can be integrated to support storage. The OSF provides unlimited storage for projects, but individual files are limited to 5 gigabytes (GB) each.”

Note: OSF also offers study preregistration, which can be a powerful Open Science practice.
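
As an illustration of how such platforms expose their content programmatically, the sketch below queries the public OSF API (v2) for public projects whose titles contain a keyword; the keyword and the fields printed are just examples, and the endpoint and filter syntax should be checked against the current OSF developer documentation:

```python
import requests

# Query the public OSF API (JSON:API, v2) for public projects matching a keyword
resp = requests.get(
    "https://api.osf.io/v2/nodes/",
    params={"filter[title]": "reproducibility"},  # example search term
    timeout=30,
)
resp.raise_for_status()

for node in resp.json()["data"]:
    print(node["id"], "-", node["attributes"]["title"])
```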

Open Science tools for data

“Research data means any information, facts or observations that have been collected, recorded or used during the research process for the purpose of substantiating research findings. Research data may exist in digital, analogue or combined forms and such data may be numerical, descriptive or visual, raw or processed, analyzed or unanalyzed, experimental, observational or machine generated. Examples of research data include: documents, spreadsheets, audio and video recordings, transcripts, databases, images, field notebooks, diaries, process journals, artworks, compositions, laboratory notebooks, algorithms, scripts, survey responses and questionnaires.” Ref: https://policy.unimelb.edu.au/MPF1242#section-5

Data is the one type of research object that is universal. Sharing your datasets publicly gives other researchers (and you!) direct access to the data for further study.

Tools for Data Management Plans

Every major research foundation and federal government agency now requires scientists to file a data management plan (DMP) along with their proposed research plan. Data, like the research as a whole and its other elements (code, publications), has its own lifecycle and workflow, which needs to be covered in the plan. DMPs are a critical aspect of Open Science, and they help keep other researchers informed and on track throughout the data management lifecycle. Successful DMPs typically use clear terminology about FAIR and CARE and state how those principles will be, and are being, applied.

The data management lifecycle is typically circular. Research data are valuable and reusable long after the project's financial support ends. Data reuse can extend beyond our own lifetimes. Therefore, when designing a project or supporting an existing corpus of data, we need to remain cognizant of what happens to the data after our own research interaction ends.

There are a few Open Science resources available to get you started and to keep you on track. The DMPTool https://dmptool.org/ in the US helps researchers by providing a template that lists each funder’s requirements for specific directorate requests for proposals (RFPs). The DMPTool also publishes other open DMPs from funded projects, which can be used to improve your own DMP. The Research Data Management Organizer (RDMO) enables German institutions as well as researchers to plan and carry out their management of research data. ARGOS is used to plan Research Data Management activities of European and nationally funded projects (e.g. Horizon Europe, CHIST-ERA, the Portuguese Foundation for Science and Technology - FCT). ARGOS produces and publishes FAIR and machine-actionable DMPs that contain links to other outputs, e.g. publications-data-software, and minimizes the effort to create DMPs from scratch by introducing automation into the writing process. OpenAIRE provides a guide on how to create a DMP.

Sharing data with your (research) team

Data repositories

Data repositories originally appeared in different research disciplines around the needs of research communities and dataset types, such as the _Protein Data Bank_ (PDB) https://www.rcsb.org/ for 3D structures of proteins and nucleic acids, or GenBank, the NIH genetic sequence database containing annotated, publicly available nucleic acid sequences. Another example is a public repository of microscopy bio-image datasets from published studies, The Image Data Resource (IDR) (ref). The _Electron Microscopy Public Image Archive_ (EMPIAR) https://www.ebi.ac.uk/empiar/ is a public resource for raw cryo-EM images. OpenNeuro https://openneuro.org/ is an open platform for validating and sharing brain imaging data. These tools enable easy access, search, and analysis of these annotated datasets.

As noted in Lesson 1, open science tools such as data repositories should follow the guidelines for FAIR data, mainly the attribution of persistent identifiers (e.g. DOIs), metadata annotation, and machine-readability.

Data repositories that embrace the FAIR principles and work across borders and disciplines include Zenodo (https://zenodo.org/), funded by the European OpenAIRE project and hosted by CERN. It is probably one of the best known and most widely used, as it has an easy interface, supports community curation, and allows depositing diverse types of research outputs - from datasets and reports to publications, software, and multimedia content.

The main drawback of this choice is that Zenodo is relatively light on documentation and metadata; a dataset stored on this site is not as easily findable or visible to the community as data stored in a domain-specific repository (e.g. EarthData: https://www.earthdata.nasa.gov/, BCO-DMO for marine ecosystem research data, or the Environmental Data Initiative for environmental or ecological data), or a cross-domain repository (e.g. DataOne: https://www.dataone.org/).

Noted exceptions to this rule include communities hosted on Zenodo that curate their materials to enhance findability (e.g. Open Science Community Saudi Arabia (OSCSA): https://zenodo.org/communities/1231231664/?page=1&size=20, Turing Way community: https://zenodo.org/communities/the-turing-way/?page=1&size=20). More on the role and power of communities will be covered in Lesson X (communities).
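
Many of these repositories also offer APIs so that deposits can be scripted and integrated into a workflow. Below is a minimal, hedged sketch of creating a draft deposit on Zenodo via its REST API; it assumes you have created a personal access token in your Zenodo account, the metadata values are placeholders, and the full workflow (file upload, publication) is described in Zenodo's developer documentation:

```python
import requests

ZENODO_TOKEN = "YOUR-ZENODO-ACCESS-TOKEN"  # placeholder; generate one in your Zenodo account

# Create an empty draft deposition
r = requests.post(
    "https://zenodo.org/api/deposit/depositions",
    params={"access_token": ZENODO_TOKEN},
    json={},
    timeout=30,
)
r.raise_for_status()
deposition = r.json()

# Attach minimal metadata (title, type, description, creators) to the draft
metadata = {
    "metadata": {
        "title": "Example dataset for an open science workflow",  # placeholder
        "upload_type": "dataset",
        "description": "Draft deposit created via the Zenodo REST API.",
        "creators": [{"name": "Doe, Jane"}],  # placeholder author
    }
}
r = requests.put(
    f"https://zenodo.org/api/deposit/depositions/{deposition['id']}",
    params={"access_token": ZENODO_TOKEN},
    json=metadata,
    timeout=30,
)
r.raise_for_status()
print("Draft deposition created:", deposition["id"])
```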

Another example of a non-profit data repository is Dataverse https://dataverse.org/, hosted by Harvard University. The Dataverse Project is an open source online application to share, preserve, cite, explore, and analyze research data, available to researchers of all disciplines worldwide for free.

The Dryad Digital Repository https://datadryad.org/ is a curated online resource that makes research data discoverable, freely reusable, and citable. Unlike previously mentioned tools, it operates on a membership scheme for organizations such as research institutions and publishers.

DataCite https://datacite.org/ is another global non-profit organization that provides DOIs for research data and other research outputs, on a membership basis.

Data services and resources for supporting research require robust infrastructure, which relies on collaboration. One example of an initiative around data-service infrastructure is the EUDAT Collaborative Data Infrastructure (EUDAT CDI) https://www.eudat.eu/, sustained by a network of more than 20 European research organizations.

Private companies also host and maintain online tools for sharing research data and files. Figshare https://figshare.com/ is one example of a free and open access service; it provides a DOI for all types of files and has recently developed a restricted publishing model to accommodate intellectual property (IP) rights requirements. It allows sharing outputs only within a customized Figshare group (which could be your research team) or with users in a specific IP range. Additional advances include integration with code repositories such as GitHub, GitLab, and Bitbucket.

GitHub https://github.com/, owned by Microsoft, is often the default data repository for coders. It allows collaborative work, version control, and project management, and is widely used by researchers for uploading datasets, files, and notes, and for hosting simple static webpages to showcase their achievements. GitHub does not give you a DOI, but it allows you to state a license for re-use and how to cite your work.

Many more research data repositories can be found in the publicly open Registry of Research Data Repositories https://www.re3data.org/. The OpenAIRE-hosted search engine https://explore.openaire.eu/search/find/dataproviders provides a powerful search function over data and repositories, with country, type, thematic, and other filters, and enables downloading of the data.

Caution: The number of datasets, repositories, and different policies can be overwhelming. When in doubt about which repository is right for you, consult the librarians, data managers, and/or data stewards in your institution, or check within your discipline-specific or other community of practice.

Open Science tools for code

If your project involves coding, such as custom analysis code, you can share it or collaborate using tools such as Jupyter Notebooks. These notebooks can be shared with a variety of permissions on JupyterLab, Google Colab, and similar websites. For a more permanent solution, you can use containerized environments to share the entire analysis environment, which includes the installed software packages, the data used, all custom analysis and plotting routines, and even the publication draft. A few examples of containerized environment services are DeepNote and Binder (DeepNote: https://deepnote.com/, Binder: https://mybinder.org/).
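
A notebook is itself just a structured file that can be created and shared programmatically; the short sketch below uses the nbformat package (an assumption of this example) to build a minimal shareable notebook, with an illustrative file name and cell contents:

```python
import nbformat
from nbformat.v4 import new_code_cell, new_markdown_cell, new_notebook

# Build a minimal notebook: one explanatory cell and one analysis cell
nb = new_notebook()
nb.cells = [
    new_markdown_cell("# Analysis of the shared dataset"),
    new_code_cell("import pandas as pd\ndf = pd.read_csv('data.csv')\ndf.describe()"),
]

# Write it to disk; the .ipynb file can then be shared, versioned, or run on Binder
nbformat.write(nb, "analysis.ipynb")
```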

Collaborative development tools

Code repositories

  • GitHub
  • GitLab
  • BitBucket
  • SourceForge

Open Science tools for results

  • Visual tools for graphs, dataviz, sharing

Open Science tools for authoring

Collaborative writing tools

One of the most common activities in research is the creation and editing of documents, such as meeting notes, conference abstracts, manuscripts, checklists, etc.

Collaborative editing has become really easy with online tools like Google Docs, Bit.ai, and others, thanks to their easy interfaces and version history. However, these tools are proprietary, and so not fully open.

Open-source, web-based collaborative editing tools include Etherpad https://etherpad.org/, HackMD https://hackmd.io/, and HedgeDoc https://hedgedoc.org/ (formerly known as CodiMD). These editors use Markdown, a lightweight markup language for creating formatted text for the web. Its simple syntax allows more users to get engaged and to focus on content, including graphics, tables, and lists. Moreover, Markdown is useful when creating documentation on GitHub, the commonly used data and code repository and collaboration space discussed in the previous sections.

The LaTeX/TeX markup language has a steeper learning curve, but offers much more nuanced features for scientific and technical documentation, such as the formatting of books, articles, and mathematical formulas. Overleaf https://overleaf.com/ is a collaborative online tool built around LaTeX, widely used in the research community to share and edit LaTeX files.
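
As a small taste of the markup, here is a minimal LaTeX fragment of the kind you might edit collaboratively on Overleaf, with a mathematical formula typeset inline (the content is purely illustrative):

```latex
% A minimal article with an inline formula (content is illustrative)
\documentclass{article}
\begin{document}

\section{Method}
The sample mean of the $n$ observations is
$\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$,
which we report alongside the standard deviation.

\end{document}
```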

Reference management tools

At the Discovery and Publication stages of the research lifecycle, reference management tools are particularly useful for searching for publications and for collecting, organizing, annotating, citing, and sharing them. Such tools should facilitate your research workflow through easy addition/import of references, bibliography construction, and adaptation to the various citation styles requested by different journals and publishers.

EndNote is a citation manager owned by Clarivate Analytics. However, it is proprietary software and not free for researchers (a closed tool), so it is beyond our scope here.

Mendeley https://www.mendeley.com/, now owned by the publisher Elsevier, is free software with very similar functionality.

Zotero https://www.zotero.org/ is an open-source online tool hosted by an independent organization.

Both Zotero and Mendeley allow easy addition of publications from the browser or via file upload, and offer compatibility with major editing tools (like Microsoft Word, OpenOffice, and LaTeX, but not fully with Markdown-based online tools). An important feature of reference management tools is groups and shared collections of articles (libraries), which provide capabilities for social networking and communication among researchers (a community of practice).

Publishing Open Science and Open Access

📖Open Access📖 is a set of principles and practices that make research publications freely available to anyone. Here we will focus on open access implementations both in peer-reviewed journal publications and in preprints uploaded to repositories.

When the data, workflows, or any results of your investigation are ready to be shared as publications, they can be uploaded to certain open websites. Many scientific journals and websites require payment for accessing materials, but a growing number now offer open access publications where the author is charged an additional fee (e.g. AGU publications: https://www.agu.org/Publish-with-AGU/Publish/Open-Access).

We discourage publishing in a journal that is not open access because it prevents researchers from marginalized groups from participating in knowledge sharing. In the case of open science platforms, one can usually share research objects for free (e.g. Zenodo: https://zenodo.org/ and FigShare: https://figshare.com/). Example research objects include executable notebooks, software packages, pre-prints, figures, presentations, and datasets.

Journals usually provide peer review for submitted manuscripts, and after acceptance and publication there are a few options to ensure open access to the article. It is important to carefully choose journals with suitable open access publishing models.

Here we list the different types of Open Access (OA) publishing models, how to find out which type of Open Access model a journal uses, and where the publishing costs sit.

  • **Closed Access/Subscription Journal:** This is the traditional publication model, where the reader (or their institution’s library) pays a subscription fee for a year’s access to the journal contents. The subscription can be physical and/or digital; many journals have reduced their print copies, some are digital only, and some offer both. Subscription can also be pay-per-article instead of a subscription to the complete journal contents.
  • **Gold OA:** This form of Open Access requires an Article Processing Charge (APC), which may be paid by the author(s) or a funding body. The final published version of record is immediately freely available and accessible in the journal from the publisher. The article is freely accessible under a Creative Commons license.
  • **Green OA:** The journal’s publisher sets an embargo period, such as 6, 12, or 24 months, after which a version of the manuscript is freely available in a repository. No charges are paid.
  • **Delayed Open Access:** In subscription journals, the publisher provides free access to online articles once a set embargo period expires.
  • **Hybrid:** In subscription journals, author(s) have the option to make their article Open Access, but the open access publication fee is significantly higher than for Gold OA journals; other articles remain toll access (behind a paywall).
  • **Gratis OA:** Publisher(s) optionally offer articles free to read at no charge to the author. This form of OA may be temporary and may be done for promotional purposes.
  • **Libre OA:** Publisher(s) offer articles free to read, with permission to re-use and share under Creative Commons licenses.
  • **Diamond OA:** The journal/publisher charges the author(s) no fee or Article Processing Charge (APC) to publish, and readers are free to access and read the articles. Publishers charging no fee are normally funded by external sources such as learned societies, funding associations, government grants, or academic institutions.

Caution: There are also **predatory journals and publishers** who advertise open access but are not part of responsible open science.

  • Open access doesn't guarantee journal quality
  • Open access doesn't imply that author(s) can pay to publish without any editorial and/or scientific review.
  • Open access does not always require payment from author(s).

Please see COPE discussion document on Predatory Publishing and refer to leading indexing databases such as Clarivate Journal master list, Scopus Journal search, DOAJ, Sherpa Romeo.

Directory of Open Access Books provides access to scholarly peer reviewed open access books.

Many journals with the Closed Access/Subscription model give you permission to post manuscripts on repositories, even before submission to the journal. Such manuscripts, which have not yet undergone peer review, are called 📖preprints📖. Journals usually state their policies regarding preprints on their websites.

Speaking of open science tools, the Sherpa Romeo platform https://v2.sherpa.ac.uk/romeo/ is a valuable online resource that aggregates publisher open access policies from around the world and provides summaries of publisher copyright and open access archiving policies in one place.

arXiv is one of the oldest preprint repositories (running since 1991), used by physicists and mathematicians. Nowadays there are numerous preprint repositories serving many disciplines and communities. A non-exhaustive list includes the servers ChemRxiv – a preprint repository for papers in chemistry, bioRxiv – for preprints of research in biology and the life sciences, medRxiv – in the health sciences, PsyArXiv – in psychology, SocArXiv – in the social sciences, and engrXiv – in engineering.

Local open access knowledge and dissemination are maintained and enhanced by community servers like AfricArXiv, a community-led digital archive for African research, and – the most recent – Jxiv, a Japan-specific preprint repository.

Many of the smaller country- and discipline-specific “Rxivs” are run by volunteers around the world, with the servers hosted online by the non-profit Center for Open Science. The substantial costs raise the question of the sustainability of maintaining a repository, and some repositories, like IndiaRxiv, closed down but were later able to relaunch.

The preprint concept and infrastructure allow researchers to disseminate their results months to years ahead of the final traditional journal publication. This definitely accelerates the progress of science, which is crucial during societal challenges such as the COVID-19 pandemic. However, the lack of peer review reduces the impact of the publication in terms of its rigor and credibility.

Here we cover some of the key tools that use the community/crowd to evaluate and curate preprints by providing transparent feedback and peer review.

  • F1000Research https://f1000research.com/ was the first open research publishing platform, allowing rapid publication of research articles and other outputs with transparent peer review and without editorial bias.
  • PREreview https://prereview.org/ is a platform encouraging early career researchers to provide peer review to preprints, with a mission to increase equity and transparency in scholarly communications.
  • ASAPbio https://asapbio.org/ stands for Accelerating Science and Publication in biology. It is a scientist-driven initiative promoting crowd-sourced peer review of preprints in the life sciences.
  • PubPeer https://pubpeer.com/ is an online platform for post-publication peer review – an “online journal club”, as the founders describe it.
  • Sciety https://sciety.org/ is an online platform for public evaluation of preprints, and allows self-organization of peer review groups.

Case study: SciPost https://scipost.org/ is a scientific publication portal managed by the SciPost Foundation, in the hands of the academic community and run by scientists. It is 100% online and offers global, open access, free research publications. As of 2022, it hosts around 10 journals in the disciplines of physics, chemistry, astronomy, and some others. Submissions can be made directly or via a preprint on the well-established preprint repository arXiv. Peer review is provided by professional scientists (with a PhD and beyond) – anyone can register and serve – and the reviews and author responses are published as well. Unlike most publishing houses, it is entirely not-for-profit, charging no subscription fees to its readers and no publication fees to its authors. The business model is based on sponsorship from research institutions and foundations, and all agreements and subsidy amounts are openly shared on the website. Does it seem too idealistic?

Question for reflection:

  • What are the limiting factors to developing and maintaining Open Science tools?
  • What are the advantages and disadvantages of working with Open Science tools?
  • What are the next 3 simple steps you could take to increase the openness of the research tools in your practice?
  • What is the future of scholarly communications that embraces responsible Open Science practices? Check the Ethos Module, if necessary.
  • How should the publication workflow look in order to provide robust, rapid, and transparent communication of research results – to peers, the wider scientific community, the public, and policymakers?

Lesson 3: Open Science tools for reproducibility

SEE CONTENT OF THIS LESSON AT https://tyson-swetnam.github.io/TOPS-OC5-tools/lesson3.html

This is the third lesson of the OpenCore Open Science Tools and Resources Module. In this lesson, we take a deep dive into a few available tools for (computational) reproducibility. First, we define reproducibility. Then, …

What is reproducibility?

**Reproducibility** – the 2019 National Academies report defined reproducibility and replicability as:

  • Reproducibility means computational reproducibility—obtaining consistent computational results using the same input data, computational steps, methods, code, and conditions of analysis
  • Replicability means obtaining consistent results across studies aimed at answering the same scientific question, each of which has obtained its own data.

In practice, reproducibility is taken one step further. The goal is not only to reproduce the same result using the same steps, such as re-executing a notebook in a containerized environment, but also to allow a given user to copy the environment and build upon the new technology and result by editing the environment to apply it to a similar problem (e.g., a shareable, copyable executable paper). This small additional step gives others the ability to directly build upon previous work and get more science out of the same amount of funding.
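
A tiny illustration of the basic idea in Python: fixing the random seed and recording the conditions of analysis means that re-running the same code on the same inputs yields the same numbers (the packages and versions printed are simply whatever is installed on the machine):

```python
import platform
import random
import sys

# Same seed + same code + same inputs -> the same "random" numbers on every run
random.seed(42)
sample = [round(random.random(), 6) for _ in range(3)]
print("sample:", sample)

# Record the conditions of analysis alongside the result so others can match them
print("python", sys.version.split()[0], "on", platform.platform())
```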

Computational notebooks

Jupyter

R Markdown

Quarto

Note:

As you might have noticed, many Open Science tools require intermediate to advanced skills in data and information literacy and coding, especially for coding-intensive research projects. One of the best ways to learn these skills is by engaging with the respective communities, which often provide training and mentoring.

Self Assessment Questions: Reproducibility

Scenario 1: You stumble upon a research paper published a few years ago which used LANDSAT data and techniques similar to a project idea you want to apply for another area of interest. When you read the methods section of the paper, you find they published their derived data set in an international data repository (Dryad), but their algorithm code to generate the processed data from LANDSAT Real-Time (raw) data are not provided, only the description of the technique which they used is given in their Methods section and the mathematical equations for calculating their new index are in the Supplementary Materials.

Question S1-1: From the hypothetical Scenario above, when there is access to the raw data, results data, and some written methods are provided, does the research paper meet the definition of being “reproducible”?

Answer S1-1: No. The paper does not provide the necessary level of detail for a different team to obtain exactly the same results, because the processing code is not available. The paper may support some aspects of “replicability”, but only if someone is able to write their own code from the provided methods; with the same raw data product, they could then test their code and compare their results to the published results. This would not be easy, and in practice may be prohibitively difficult.

Lesson 4: Practicing Open Science in a team

This lesson is focused on how you can practice Open Science in a team. First, we go through team open science practices, where we empower you to develop and use open science practices for your lab or research team. Second, we provide a resources and team guidelines checklist, where we help you ensure your team is working openly across all members and has access to common resources and guidelines that support collaboration, transparency, and openness. Third, we provide a preservation checklist for the research outputs and results generated by your team, to help you ensure that all outputs (e.g., for a mid-term report or project completion) are fully documented, preserved for the long-term, and made accessible to your team.

Team Open Science Practices

Develop and use Open Science practices for your lab or research team. Use this checklist to improve your team’s data and software management practices supporting Open Science. Codify them in your team’s Code of Conduct.

Note: The checklist is generalized and will need to be adjusted based on your institution, lab, research team, and/or funder requirements.

Build Team and Align Trust, Expectations, and Conduct

  • **Co-build the team composition.** It is not just skill sets and needed disciplinary expertise, but the attributes and qualities of members that can make a team successful, such as the proportion of women, bridge-builders, record-keepers, and leaders.
  • Give the team time to converge and align on an agreed goal and periodically revisit that as things may change (adaptive management [need a link to this])
  • Ensure team members do not discriminate against others in the course of their work
  • Ensure team members comply with the team practices and guidelines for conducting research, managing digital objects (e.g., data, software), authorship and publications, preservation of digital objects, and communication.
  • Ensure team members adhere to the appropriate community, national, and international standards for reporting the results of their scientific activities including respecting the intellectual property rights of others consistent with the European Code of Conduct for Research Integrity (2011) downloadable from: http://www.esf.org/coordinating-research/mo-fora/research-integrity.html.

Work As Collaboratively, Transparently, and Openly as Possible

  • **Work collaboratively:** Make it an initial priority to establish trust and good communication between team members, along with clear roles and responsibilities.
    • Establish a common purpose with the leadership and members of the team.
    • Co-design and co-own the project goals.
    • Establish a realistic understanding of the progress to be made and estimated timelines.
    • Create bridges between members from different disciplines.
  • Work transparently (as possible): Share status, information, digital objects using the common project resources
    • Team meeting notes, progress updates, presentations, recordings, shared folders, data/software.
  • Work openly (as possible): Provide a way for all team members to participate and be included in the various aspects of the project work
    • Openness builds on transparency by providing the understanding needed to use and contribute to the work of another team member. This is an excellent way to support early career researchers and members from other disciplines in the objectives of the project.
    • Teams that are working openly have access to all the project research products, the training and support to understand and use the research products, and an expectation to contribute based on their roles and project protocols.

Establish Team Tasks and Responsibilities

Team tasks emerge from shared common goals and the pathway to achieving them. Each project might require different tasks, and team members should work together to define their responsibilities.

  • Ensure digital output management tasks have responsible team members
    • Develop the Data and Digital Output Management Plan (e.g, DMP or DDOMP)
    • Communicate tasks and responsibilities
    • Management of data and/or software
    • Quality check of the data and/or software
    • Management of archives and preservation for the project (and long-term preservation)
  • Review tasks and assignments periodically. Especially when:
    • Improvements need to be made.
    • Team members change
    • To ensure there is a backup person - no single point of failure

Review Ethical Concerns

Consider what ethical concerns apply based on the nature of the research and the data. Ensure use of your Institutional Review Board (IRB) or local ethics committee. Areas to consider:

Establish Team Communications

Establish shared communication practices that facilitate the creation of continuities within a group/team

  • Establish a regular set of contact points and times for meetings and discussions. For example, recurring meetings for leadership and work package tasks.
  • Use password-protected modes of file sharing and note taking, such as Google Drive.
  • If the group is multilingual, conduct meetings using both discussion and text to ease translation efforts.
  • Allow sufficient time for continuities to develop. Good team approaches take time to build, and may need refreshment as new members join and others leave.
  • Ensure the team has ample time to develop personal relationships, preferably in person, to establish team cohesion, trust, and long-term collaboration. For example, for projects that last more than one year, conduct a yearly in-person workshop. For international teams, these workshops should alternate locations between countries.

Resources and Team Guidelines Checklist

Ensure your team has access to common resources and guidelines that support collaboration, transparency, and openness. [Ensure the team is working openly across all members.]

Establish Common Team Resources

☐ **Before or near the start of the project, make decisions on what resources the team will use to:**

    ☐ **Communicate and disseminate information.** e.g., Slack channel, email

    ☐ **Develop and manage documents during the project.** e.g., Google Drive

    ☐ **Store datasets during the project, considering size and access/controls.** e.g., [Open Science Framework](https://osf.io) (OSF), [GitHub.com](https://github.com), institutional repository

    ☐ **Preserve datasets, images, and associated digital objects (except for software, workflow, and training/workshop materials).** e.g., FAIR-aligned repository

    ☐ **Develop software, scripts, and/or workflows.** e.g., GitHub: establish a team repository

    ☐ **Preserve software, scripts, and/or workflows.** e.g., Zenodo: establish a community

    ☐ **Preserve conference, training, or workshop materials.** e.g., Zenodo: establish a community

☐ **Develop digital object management tracking tools (such as a spreadsheet or database) for datasets, software, conference presentations, posters, preprints, and publications.** e.g., Sheets in Google Drive.

☐ **Once determined, provide each team member with a “summary list” of the team resources.** Ensure each team member has access and is provided with any needed overview/training. See [PARSEC example](https://doi.org/10.5281/zenodo.4909852), section “PARSEC Team Resources”.

Use Reminders and Milestones to Manage and Track Data and Digital Objects

☐ **Once for each team member: Automatically connect your peer-reviewed papers and registered digital research objects to the digital research ecosystem.**

    ☐ **Activate the automatic updates of your ORCID profile.** Your ORCID iD identifies you uniquely and provides a hub to connect your scholarly work in one place. To complete the necessary actions, see [this page](https://support.orcid.org/hc/en-us/articles/360006896394-Auto-updates-time-saving-and-trust-building) for instructions for both **Crossref** (English-language scholarly publications) and **DataCite** (primarily datasets and software, as well as other objects). For more information on establishing your ORCID iD and profile, review the Ethos Module.

☐ **Twice a month for team members: Ensure datasets and software are tracked.** This supports efficiency, especially when working with many digital objects.

    ☐ **Review the datasets and other digital material you are exploring.** If you find them to be relevant, track them in the team resource defined above [add bookmark/item id]. Include descriptive information.

    ☐ **Store datasets created by the team in the team resource defined above [add bookmark/item id]**, and track them along with the other datasets/digital objects you are exploring.

    ☐ **Develop software in the team resource defined above [add bookmark/item id].** Ensure good version control.


    Slides/Video - clarify primary/secondary dataset definitions (1B checklist -- include the link)


**☐   Monthly for team members: Ensure all materials and presentations are preserved**


    **☐   Include** **posters, oral presentations, training, workshops, and any other disseminated materials. **Provide information on the event such as the name of the conference and session, dates, website links, funder acknowledgement. Track this in the defined team resource. **[add bookmark/item id]**


**☐   Every three months for each team member (individual action): Ensure your digital profile reflects your current work. **


    **☐   Review your ORCID profile, and any other online profile (e.g., LinkedIn, Scopus, Researcher ID) and ensure that it is current and complete. Link all profiles to your ORCID account.  [link to Digital Presence.] **


    **☐   Ensure your CV is current, available digitally, and linked to your ORCID account**
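
To speed up this quarterly review, the following sketch lists the works currently visible on a public ORCID record. It is a minimal example that assumes ORCID's v3.0 public API (`https://pub.orcid.org/v3.0/<ORCID iD>/works`) and the JSON layout described in ORCID's documentation; the example iD is a placeholder, so replace it with your own and verify the field names before relying on the output.

```python
import requests

ORCID_ID = "0000-0002-1825-0097"  # placeholder iD -- replace with your own

def list_public_works(orcid_id):
    """List titles of works on a public ORCID record (assumes the v3.0 public API layout)."""
    url = f"https://pub.orcid.org/v3.0/{orcid_id}/works"
    response = requests.get(url, headers={"Accept": "application/json"}, timeout=30)
    response.raise_for_status()
    titles = []
    # Each "group" gathers one or more "work-summary" entries describing the same work.
    for group in response.json().get("group", []):
        for summary in group.get("work-summary", []):
            title = summary.get("title") or {}
            titles.append(title.get("title", {}).get("value", "(untitled)"))
    return titles

if __name__ == "__main__":
    for work in list_public_works(ORCID_ID):
        print(work)
```

Comparing this list against your CV and the team's tracking spreadsheet is a quick way to spot outputs that have not yet been connected to your profile.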

Improve Practices Through Use and Feedback

**☐ Establish periodic team meetings to review the effectiveness of resources and team guidelines.** Review with the team their experiences and challenges in using the resources and guidelines, and adjust as necessary to better support the team's Open Science objectives.

Team Results Preservation Checklist

Ensure all research/project team outputs (for a mid-term report or at project completion) are fully documented, preserved for the long term, and made accessible to the team. “As open as possible, as closed as necessary.”

Plan to Preserve and Share for the Long-term

  • **Determine what needs to be preserved.** Research/project components should include: project description, README files, datasets, software, physical samples, posters, oral presentations, workshop reports, training materials, and any other digital materials.
  • Determine which components should remain open just to the team, and which should be made openly accessible to others.
    • Reference your data management plan for what is required for the project, or the lab.
    • Reference your community best practices.
    • Reference country, funder, publisher, and institutional requirements for further consideration.
    • Comply with licenses (e.g., for data created by others).
    • Comply with any data request agreements (e.g., for sensitive data).
    • For data created for the project, or for derived data products, ensure that the full set of data is preserved. Note that most publishers only require the data that supports the publication to be available in a trusted repository. The full set of data can still be cited, with a description in the availability statement of which data were used. This approach allows all the data to be preserved together and improves interoperability and reuse. [link spiral 3]
  • Determine where to preserve the research/project outputs. Consult the team’s Resources Summary Checklist that was created in 2B [add link to the bullet point] (see the PARSEC example, section “PARSEC Team Resources”). If your team has not yet determined a preservation repository for the project components, see the “Resources and Lab/Team Guidelines Checklist” [add link and item number]. Ensure all the links and persistent identifiers are included in the project registry [bookmark to below]; a quick automated check that recorded DOIs still resolve is sketched after this list.
    • Reminder: ensure the repository selected has the necessary protections (access/controls) for the project components.
    • Ensure the repository selected is community-accepted and trusted. [Link to repository selection document.]
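
Recorded links and DOIs can silently go stale, so it is worth checking them periodically. The sketch below is a minimal check that a list of DOIs resolves via https://doi.org; note that some landing pages reject automated requests, so a failure here is a prompt to check manually rather than proof that the identifier is broken.

```python
import requests

def check_doi(doi):
    """Return True if the DOI appears to resolve via the doi.org proxy."""
    url = f"https://doi.org/{doi}"
    try:
        # Follow redirects to the landing page; some sites reject HEAD, so fall back to GET.
        response = requests.head(url, allow_redirects=True, timeout=30)
        if response.status_code == 405:
            response = requests.get(url, allow_redirects=True, timeout=30, stream=True)
        return response.status_code < 400
    except requests.RequestException:
        return False

if __name__ == "__main__":
    # Example DOI taken from the PARSEC resource cited earlier in this checklist.
    for doi in ["10.5281/zenodo.4909852"]:
        status = "resolves" if check_doi(doi) else "needs manual checking"
        print(f"{doi}: {status}")
```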

Preserve the Research/Project Components as open and FAIR as possible

[Link to the FAIR principles]

**☐ Datasets:** Your data may require cleaning, reorganization, or documentation to make it understandable. If there is a version that you routinely use for sharing within your group, this is likely the version you will archive. It is important that a data file can be read by a computer program without error, i.e., that it does not require human interpretation or proprietary software; a minimal check of this kind is sketched below. Reference for information [add link].

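As a quick illustration of the "machine-readable without error" requirement, the sketch below parses a CSV file with Python's standard library and reports its basic structure; the filename is a placeholder for one of your own data files.

```python
import csv
from pathlib import Path

def check_csv_readability(path):
    """Parse a CSV file and report basic structure, flagging rows that don't match the header."""
    path = Path(path)
    with path.open(newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader, None)
        if not header:
            print(f"{path}: file is empty")
            return
        n_rows = 0
        ragged_rows = 0
        for row in reader:
            n_rows += 1
            if len(row) != len(header):
                ragged_rows += 1
        print(f"{path}: {n_rows} data rows, {len(header)} columns ({', '.join(header)})")
        if ragged_rows:
            print(f"Warning: {ragged_rows} rows do not match the header width")

if __name__ == "__main__":
    check_csv_readability("my_dataset.csv")  # placeholder filename
```

If the file only opens correctly in proprietary software, or needs a human to interpret merged cells and color coding, consider exporting a plain, well-documented version for archiving.
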
**☐ Software, code, scripts, algorithms:** Your software may require documentation and reorganization to make it understandable. Ask a teammate to review it for understandability and future use. It is important that you document any relevant configuration information for using your software. Reference for information [add link].

**☐ Images and associated digital objects:** Consult the team’s Resources Summary Checklist for the preservation locations. Review repository guidance for depositing these objects to ensure they are well-documented and in the best possible format for preservation.

**☐ Conference, training, workshop reports and materials:** Consult the team’s Resources Summary Checklist for the preservation locations. Review repository guidance for depositing these objects to ensure they are well-documented and in the best possible format for preservation.

Manage a Project Registry (or Directory) for the Outputs

It is common for different types of outputs to be preserved in different places to optimize discovery and reuse. An up-to-date Project Registry provides a quick overview of all the outputs.

**☐ Create and update a Project Registry** (a spreadsheet or other type of list) in conjunction with preserving outputs as described above. This can be one registry for the entire project that is kept up to date, or a new registry for each milestone.

**☐** Include in each registry entry a description of the object, the preferred citation, the persistent identifier (e.g., DOI), and any other useful information supporting the project. For outputs that do not have a persistent identifier, provide a URL and description.

**☐** Preserve the Project Registry as a project component. Many funders require in their yearly reports a list of both peer-reviewed publications and all project outputs. The Project Registry can be provided to the funder during the reporting process, or used as a tracking tool to assist with completing the report; a sketch of generating such a list from a registry spreadsheet follows.
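
As an illustration, the sketch below turns a registry kept as a CSV file into a simple list suitable for pasting into a report. The column names (`title`, `object_type`, `identifier`, `storage_location`) are hypothetical and match the tracker suggested earlier; adapt them to whatever your team actually records.

```python
import csv

def registry_report(path="project_registry.csv"):
    """Print one report line per registry entry: title, object type, and identifier or URL."""
    with open(path, newline="", encoding="utf-8") as f:
        for entry in csv.DictReader(f):
            identifier = entry.get("identifier") or entry.get("storage_location") or "no identifier recorded"
            title = entry.get("title") or "(untitled)"
            object_type = entry.get("object_type") or "unknown type"
            print(f"- {title} [{object_type}]: {identifier}")

if __name__ == "__main__":
    registry_report()
```

Keeping the registry in a plain, structured format like this makes it easy to reuse the same file for funder reports, team reviews, and the project's own preservation record.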

Lesson 5: **Open Science communities.** (Where can you find sustainable support and help for identifying and using Open Science tools and resources?)

This lesson is the fifth of OpenCore Module 5: Open Science Tools and Resources. It provides a curated list of communities supporting the dissemination of open principles and practices in research and beyond. The lesson complements the previous modules of the OpenCore Course by pointing to supportive environments that foster the gradual integration of the Open Science concepts explained in them.

The transition to open science requires a profound cultural change in academia and research, and communities are at the heart of a comprehensive change strategy. It is therefore often extremely helpful to gain the support of communities and initiatives that help you implement, contextualize, and sustain your open science work. These communities and initiatives are reservoirs of knowledge that can help sustain your open science project in the long run. In this lesson, you will be introduced to a number of communities that you can participate in and engage with to enhance your open science project experience.

Fostering a culture of open scholarship practices through communities can bring unique benefits to learners, practitioners, and trainers. Even if different communities have different missions and scope, all are working towards integrating open scholarship principles into research and education and positively contributing to the advancement of research transparency, reproducibility, rigor, and ethics.

Why engage with Open Science Communities?

  • Communities offer a low barrier to entry to improved research and pedagogical practices. Because pedagogical communities welcome scholars at all levels, including early career researchers, they are an accessible space for anyone wishing to learn and practice open scholarship. By cutting across career stages, these communities become essential to instilling the new and improved values and norms of open scholarship.
  • Communities facilitate the co-creation of open scholarship training materials which are crucial in facilitating the integration of open scholarship into research projects.
  • Communities also offer a much-needed environment in which scholars can share individual experiences, identify common hurdles, and iteratively build the knowledge needed to address the unique challenges arising from members’ needs.
  • Through peer-to-peer exchanges, communities help create a culture of open scholarship, benefiting those within the community, and those that interact with it.

What is a community of practice?

📖Communities of practice📖 are social learning spaces where individuals come together to learn new skills, exchange knowledge and experiences, and then apply what they have learned in their day-to-day work outside the community (ref).

Well-designed and managed communities of practice can support behavioral changes in individuals by connecting them and providing a safe environment where members can exchange ideas and best practices. They can also empower members with the freedom to set and accomplish goals that they are unable to attain on their own. (ref)

Communities of practice list

A list of communities sorted by country is available here: https://docs.google.com/spreadsheets/d/1cge1elDTEnTSxV_sa-HKmaIrHKV8rQMhkq_XhA9fjsQ/edit#gid=1101212462

The list below groups global communities of practice by focus area; some local communities are also plotted on a map (work in progress) at https://rpubs.com/batool/cops, built from the repository at https://github.com/Open-Science-Community-Saudi-Arabia/CoP.

**Software Communities:** PyData, SPEC, rOpenSci, pyOpenSci, PyHC, Research Software Engineering, NumFOCUS, Masakhane (a grassroots NLP community for Africa, by Africans), SisonkeBiotik (lowering barriers in participatory research for machine learning and health across Africa), Bioinformatics Hub of Kenya Initiative, Open Hardware Community

**Data Communities:** OpenAIRE, SPDF, CCMC, RDA, Centre for HelioAnalytics

**Gender Inclusive Communities:** R-Ladies, PyLadies, Julia Gender Inclusive Community, Women of Color Code, Women Who Code

**Research-based Communities:** UKRN (and other national networks), PSA, SIPS, CREP, IGDORE, NowhereLab, RIOT, ReplicationWiki, ABRIR

**Pedagogical & Education Communities:** FORRT, ReproducibiliTea, ProjectTIER, SIOS, OpenMOOC, Open Education Group, Open Education Network, NASA HEAT, The Carpentries, Swedish Youth Astronomical Society (official emails can be sent to: kansli@astronomiskungdom.se)

**Community of communities:** CSCCE, OLS, Reproducibility Networks, Deep Learning Indaba (collective African ML community), Deep Learning IndabaX chapters (in different countries across Africa)

How to engage with Open Science communities

There are various ways to start engaging with a community. Most community websites explain where new members can join the community platform and get involved. If a newsletter is available, you can subscribe to it to learn more about the activities taking place within the community. Communities may also have a presence on platforms such as Twitter, Facebook, and LinkedIn, where they announce upcoming initiatives. Community co-working platforms are excellent places to learn more and to interact with current members. Some communities also offer onboarding calls as a more formal way of joining.

For individuals who prefer written interactions and discussions, GitHub Discussions, Discourse, Stack Overflow, and Slack workspaces are excellent places to start. Such written platforms tend to hold a lot of past knowledge and interactions, giving newcomers an idea of the discussions that take place within a particular community. While these are all excellent places to ask questions, be mindful that most communities are volunteer-run and spread across time zones, so it may sometimes take longer than usual to receive a response. Always try to be kind, patient, and appreciative.

Pathways for contribution

It is no surprise that newcomers in a community often go on to become future contributors if they find the right pathway. These pathways are explained using personas in the Contributor Pathways subchapter of The Turing Way book. This subchapter defines the different phases of community membership, as follows:

  1. Discovery - How an individual first hears about the project, group, or community.
  2. First Contact - How they first engage with it; their initial interaction.
  3. Participation - How they first participate or contribute.
  4. Sustained Participation - How their contribution or involvement can continue.
  5. Networked Participation - How they may network within the community.
  6. Leadership - How they may take on additional responsibility in the project, or begin to lead.

Top Tip: Many communities and open source projects participate in Google Summer of Code and Outreachy. Many contributors had their first contact with open science through these programs and went on to become core contributors in leadership positions.

![alt_text](images/image5.png)

Pathways for collaboration

There are different ways to collaborate with a community or open project. Contributions span many pathways, including sharing resources, reviewing and updating other contributions, fixing typos, improving documentation, mentoring other contributors, or helping to localise the project and its resources into different languages to support the needs of multiple locales. Many communities have a low barrier to entry and don't require expertise in open science or its digital tools.

The image below shows some pathways for collaboration in The Turing Way, an open-source, community-led guide to reproducible, ethical, and inclusive data science. Other communities of practice have similar pathways that allow you to interact with their members even with little know-how in open science.

![alt_text](images/image6.png)

Pathways for engagement

Communities of Practice are designed to offer plural and creative ways for members to engage. Perhaps the easiest way to interact with a community is to introduce yourself on the community’s platform. Another low-stakes form of engagement is to share a relevant resource with the community in the appropriate channels. Asking a question, or raising a point of discussion on the community platform, is not only welcomed but potentially instructive and beneficial to other members and the community.

Frequently, communities provide opportunities to give feedback (positive or negative, anonymous or not), which can be very useful to community managers and organizers. Communities of Practice often hold regular meetings, and some also hold seminars featuring pertinent content; attending these is another way to engage with Open Science communities.

Some communities offer ways for members to submit resources they know of to a shared database so that others can find them, enriching the community. Reading and learning from a Community of Practice’s own resources and approach is certainly one of the best ways to engage with it. Members can also engage by spreading the word or taking part in ambassadorship programs, which aim to (briefly) train members on the main issues a community is trying to tackle or improve.

As open communities tend to produce resources themselves, and most do so in one language (at least at first), translation efforts are fairly commonplace. These are extremely advantageous to those who would otherwise be disenfranchised, and they help foster an inclusive and accessible community atmosphere. Another mutually beneficial pathway is to contribute to a community’s existing projects and resources, which often require continual review and updating of their substantive content.

Folks with technical skills can volunteer their expertise to maintain and improve the community's internal documentation, resources, modus operandi, databases, code of conduct, and website. Some communities offer mentored contributions to a community-supervised project (for example, in the context of STEM and data science), while others offer other types of mentorship, such as helping supervise Bachelor's/undergraduate or Master's/graduate theses.

Research- and education-oriented communities of practice often tackle projects collaboratively, and members can take part in the process of science-making. Members can join these projects, contribute to them, and be acknowledged for their efforts and work. Some communities extend this open-collaboration ethos further and allow their members to propose new ideas for research and educational projects.

Case study: FORRT

FORRT stands for the Framework for Open and Reproducible Research Training. It is an interdisciplinary Community of Practice of almost 500 early-career scholars aiming to integrate open scholarship principles into higher education and to advance research transparency, reproducibility, rigor, and ethics through pedagogical reform and meta-scientific research.
Anyone interested in engaging with FORRT can visit its website (forrt.org) and find an explanation of the initiative’s mission, its projects, its open educational resources, and its publications.

Interested individuals can find ways to get involved in several places, with specific attention to FORRT’s Slack community. Once a member joins, they are given access to three main channels: #-welcome-and-introductions, where anyone can introduce themselves and be welcomed by community members; #-general-interest-posts-and-announcements, where anyone can share resources, links, and projects, ask questions, publicize other relevant communities of practice, start discussions, etc.; and #-forrt-community, where organizers post about onboarding, projects, people, etc.

After joining the Slack, a bot sends users a DM with onboarding information and the ‘Getting Started with FORRT’ document, containing important links, a code of conduct, a description of FORRT’s collaborative projects (their teams, leads, and Slack channels), how FORRT is structured organizationally, and a description of FORRT’s contributorship model and guidelines. Members can submit resources to a database of curated open science resources, give (anonymous) feedback, subscribe to mentorship programs, and learn how to contribute to FORRT’s research and educational projects (including inclusion, reviewing, and translation efforts, e.g., the Reversals, Glossary, and Summary projects). Lastly, members can propose research and educational projects in the #team-ideas channel.

How to build and lead a community

As individuals, we look for opportunities to apply our knowledge to address problems. The most recent example of this is how the research community reacted to the pandemic by organizing an unexpectedly large number of hackathons, data modeling initiatives, task forces, and working groups. While joining existing communities can provide rich learning experiences, at times we may realize the need to build a new community. Such communities might come into existence when we discover there is no community for our interests close to our geographical region, when we meet like-minded individuals in nearby time zones, or when we learn how other communities are developing in their local regions.

A key aspect in building community is to design and build projects that empower others to collaborate within inclusive spaces. Openness shouldn’t be a thoughtless default, but something that is consciously designed into what you and your team are doing, while carefully thinking about the ethics and implications at every step.

Guidelines for building communities

In this section, we have assembled suggestions from the Turing Way, which are derived from the experiences of community and technical specialists to assist researchers in addressing this challenge, particularly when launching a community or a team-oriented project.

![alt_text](images/image7.png)

  • Choose a Communication Platform
  • Provide a Project Summary File
  • Select a Code of Conduct
  • Provide Contribution Guidelines and Interaction Pathways
  • Create a Basic Management/Leadership Structure
  • Provide Contact Details Wherever Useful
  • Identify Failed Approaches, and Stop Them
  • Have Documentation and Dissemination Plans for Your Project

You can find more details about these guidelines within the Turing Way and contribute to refining them further.

![alt_text](images/image8.png)

Mountain of engagement

The “mountain of engagement” describes the levels at which members typically engage with a community, from first learning about it through to leading it:

  1. Leading: A high-touch relationship; we maintain relationships and co-branded events and trainings with alumni and allies to increase the impact, prestige, and reach of both parties’ work.
  2. Collaborating: A high-touch relationship; we offer professional development through our own events in return for co-creation, localization, and spread.
  3. Participating: A high-touch relationship; we offer community management and professional development through our own trainings and events in return for soliciting ideas and learning through use.
  4. Endorsing: A low-touch relationship; we share information with people who gain social capital by spreading it and networking with others who share common interests.
  5. Learning: A low-touch relationship; we gift resources like open curricula and get back aggregate data (like downloads, registrations, and views) showing that people use our resources and pay attention to us.

![alt_text](images/image9.png)

This model describes four modes of member engagement that can occur within a community: CONVEY/CONSUME, CONTRIBUTE, COLLABORATE, and CO-CREATE. (Need to expand)

![alt_text](images/image10.png)

Learning and practicing Open Science skills with communities

The range of skills needed for Open Science is vast (see https://libereurope.eu/article/open-science-skills-diagram/).

Many of them, as you saw in Modules XYZ and Lessons XYZ of this Module, are digital, data, and information skills.

  • Data skills - The Carpentries teaches foundational coding and data science skills to researchers worldwide.
  • Conceptualization, mentoring, community building, ethos, inclusivity - Open Life Science
  • Open hardware - OHM
  • FORRT?
  • Scientific collaboration and community engagement - CSCCE
  • TOPS (this course) community - future is now

Self-assessment: Questions for reflections

  • Are you a part of any Open Science community?
  • What value do you take from it? What do you bring to it? Does the balance seem right?
  • What are the next three simple steps you could take to change or improve this?