Difference between revisions of "2019 Workshop:Software Engineering for Heliophysics"

Latest revision as of 07:36, 18 June 2019


Location: Monday, 13:30-15:30, Zia/Eldorado

Conveners: Michael Hirsch, Matthew Zettergren, Guy Grubbs

Results

Peak attendance was about 35, with people coming and going. Students made up roughly 20% of attendees; we wondered why more students did not attend. At both this session and the Python library session that followed, the group generally seemed supportive of a third session in 2020 devoted to Python performance development.

Slides that are usable offline are highly useful. All of this session's talks provided them.

A possible 2020 CEDAR software session lineup would therefore include:

  • general software engineering best practices (Git, build systems, cloud, Docker, HPC)
  • Python performance development (parallel, big data, PEP8, type hinting, setuptools-agnostic, Python 3 transition, Numba, PyCUDA)
  • Python libraries (this has been going for a few years)
  • hackathon for Python (specific list of issues to tackle)
  • data science sessions (Sunday + two like 2019?)
  • participatory data science tutorial (like 2019 InGeo)

Description

This session targets those developing software and doing analysis, modeling or data collection every day (students and early-mid career) as well as more senior scientists interested in the best trends and techniques from industry as applied to heliophysics. We will discuss intermediate to advanced geospace software engineering at a level accessible and useful to all.

Scope covers coding languages commonly used in heliophysics, including C++, Fortran, Julia, Python and R.

Use cases we address include:

  • use scripting languages to analyze large data sets quickly
  • reduce the time model developers spend tutoring new users in building / modifying / using the model
  • reduce debugging effort by adding automated self-tests
  • ensure code will be usable on most current computing platforms and easily adaptable to future systems

We intend that most participants will be able to apply industry best-practices to their own work the same day or week.
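
As an illustration of the "automated self-tests" use case above, here is a minimal sketch of a small science function with tests that a runner such as pytest can discover. The function name and the choice of the dipole L-shell relation L = 1 / cos^2(invariant latitude) are assumptions made for this example; they were not part of the workshop material.

```python
import math


def l_shell_from_invariant_latitude(lat_deg: float) -> float:
    """Dipole L-shell for an invariant latitude in degrees: L = 1 / cos^2(lat)."""
    if not -90.0 < lat_deg < 90.0:
        raise ValueError("invariant latitude must be strictly between -90 and 90 degrees")
    return 1.0 / math.cos(math.radians(lat_deg)) ** 2


def test_equator():
    # At the equator cos(0) = 1, so L = 1
    assert math.isclose(l_shell_from_invariant_latitude(0.0), 1.0)


def test_sixty_degrees():
    # cos(60 deg) = 0.5, so L = 1 / 0.25 = 4
    assert math.isclose(l_shell_from_invariant_latitude(60.0), 4.0)


def test_rejects_pole():
    # The relation diverges at the pole, so the function should refuse it
    try:
        l_shell_from_invariant_latitude(90.0)
    except ValueError:
        return
    raise AssertionError("expected ValueError at the pole")
```

Saving this in a file whose name starts with `test_` and running `pytest` in that directory would execute all three checks. The same checks can then run automatically on every code change under continuous integration.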

Tutorials

We also welcome additional "lightning talks" -- reach out to Michael, Matt or Guy by email or in person!

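  • Intro: What aspects of software engineering are most important to the heliophysics community (Hirsch 13:30-13:35) [https://cedarweb.vsp.ucar.edu/wiki/images/3/35/0-Intro.pdf]
  • Version Control: working effectively in large and small teams with Git and GitHub (A. Ridley / Hirsch 13:35-13:45) [https://cedarweb.vsp.ucar.edu/wiki/images/9/92/1-version_control.pdf]
  • Continuous Test & Integration: catching and tracking problems before you know they exist (A. Ridley / Hirsch 13:45-14:00) [http://cedarweb.vsp.ucar.edu/wiki/images/5/50/Software_ridley.pdf] [https://cedarweb.vsp.ucar.edu/wiki/images/b/b4/2-continuous_integration.pdf]
  • Software / Data connections: Making heliophysics data accessible via a lightweight standard: HAPI (R. Weigel, 14:00-14:15) [https://cedarweb.vsp.ucar.edu/wiki/images/4/47/2018_HAPI_CEDAR.pdf]
  • OpenMPI physics model: Fortran 2018 design patterns and Python integration (Zettergren 14:15-14:30) [https://cedarweb.vsp.ucar.edu/wiki/images/a/a7/GEMINI_Software_Engineeringv2.pdf]
  • Mobile / Web app development: crowd-sourced science - Aurorasaurus (E. MacDonald 14:30-14:45)
  • Scientific software workflow: The science software stack--going from ideation to simulation to publication (M. Young 14:45-15:05) [http://cedarweb.vsp.ucar.edu/wiki/images/d/d4/CEDAR2019-Python_IDL.pdf] [http://cedarweb.vsp.ucar.edu/wiki/images/6/67/CEDAR2019-full_stack.pdf] [https://cedarweb.vsp.ucar.edu/wiki/images/6/60/6-workflow.pdf]
  • Parallel Python: introductory Python examples (asyncio, ProcessPool, ThreadPool) (Hirsch 15:05-15:15) [http://cedarweb.vsp.ucar.edu/wiki/images/3/3f/98-Concurrent_Python.pdf]
  • Room discussion: what topics did we miss? What should we do more of? Should we have side sessions this year? (15:15-15:30)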
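
For readers who want a feel for the "Parallel Python" topic listed above, the following is a minimal sketch using only the standard library. It is not taken from the talk slides. The task functions `load_file` and `fetch` are hypothetical stand-ins for I/O-bound work such as reading data files or querying a server.

```python
import asyncio
import time
from concurrent.futures import ThreadPoolExecutor
from typing import List


def load_file(index: int) -> int:
    """Hypothetical blocking task, e.g. reading one data file."""
    time.sleep(0.01)  # simulate waiting on disk or network
    return index * index


def run_threaded(n: int) -> List[int]:
    """Run the blocking tasks in a small thread pool; results keep input order."""
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(load_file, range(n)))


async def fetch(index: int) -> int:
    """Hypothetical non-blocking task written as a coroutine."""
    await asyncio.sleep(0.01)  # yield to the event loop while "waiting"
    return index * index


async def run_async(n: int) -> List[int]:
    """Run all coroutines concurrently on one thread; results keep input order."""
    return await asyncio.gather(*(fetch(i) for i in range(n)))


if __name__ == "__main__":
    print(run_threaded(5))
    print(asyncio.run(run_async(5)))
```

`concurrent.futures.ProcessPoolExecutor` offers the same `map` interface and is usually the better fit for CPU-bound work, since separate processes are not limited by the interpreter lock. It is left out of the sketch because it needs picklable top-level functions and a `__main__` guard on some platforms.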

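Other sessions of interest

  • Python for Space Science (Monday 16:00-18:00)
  • Hackathon (Monday 18:30-evening)
  • Integrated Geoscience Observatory (A. Bhatt) (Wednesday 19:00-20:30)
  • Geospace Data Science (Thursday morning & afternoon)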

Priority topics

We didn't get to cover all of these, but they are important both for heliophysics research and more generally.

  • Modern / efficient coding practices (for any language)
    • What should be object-oriented vs. functionalized
    • use of linters / type checkers (Python: flake8, mypy; C++: clang-tidy)
    • deciding what language(s) are best for a project and the development team
  • Language transitions / interfaces
    • proprietary software ↔ open world (e.g. IDL → GDL → Python)
    • using open software in a closed / proprietary environment
    • Python for Matlab / IDL users
  • Distributing code more easily via:
    • code sharing sites (GitHub)
    • Build systems (Meson)
    • Package managers (Julia, Python)
    • proprietary (IDL) or less common languages
  • Version control (Git):
    • sharing and developing code effectively across diverse teams
    • sharing versioned big data files as part of a program/library
  • Build systems (CMake, Meson, Pip)
    • make it easy for users to get prereqs and build your code on any computer
    • Deploying a complex application anywhere, from Raspberry Pi to Windows/Mac laptop to CentOS HPC
  • Continuous test / integration (Travis-CI)
    • automatically run tests "in the cloud" on Linux, Mac, Windows for each code change
    • examples with Meson and CMake + compiled languages (C++, Fortran)
    • Seamless integration and testing of scripted languages (Matlab, Python) with compiled (C++, Fortran)
    • how to create tests in your preferred language
  • Asynchronous architectures: parallel and concurrent processing
    • Examples in Python and Fortran:
      • MSIS
      • IRI
      • Gemini (https://github.com/gemini3d/GEMINI/)
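
To make the "linters / type checkers" item in the list above concrete, here is a small sketch of a type-annotated function that tools such as mypy and flake8 can check without running the code. The function and the file name mentioned afterwards are hypothetical, chosen only for illustration.

```python
from typing import Sequence


def mean_altitude_km(altitudes_km: Sequence[float]) -> float:
    """Return the arithmetic mean of a non-empty sequence of altitudes in km."""
    if not altitudes_km:
        raise ValueError("need at least one altitude")
    return sum(altitudes_km) / len(altitudes_km)


if __name__ == "__main__":
    print(mean_altitude_km([100.0, 200.0, 300.0]))
    # A call such as mean_altitude_km(300.0) passes a bare float where a
    # sequence is expected; a static type checker can flag that mistake
    # before the program is ever run.
```

Assuming the code is saved as `altitude.py`, running `mypy altitude.py` checks the annotations and `flake8 altitude.py` checks style and common errors. Both are quick enough to run on every commit.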


Special technology requests

Tables so attendees can use laptops; WiFi.

Justification

ST #5: Fuse the Knowledge Base across Disciplines


  1. Good software engineering practices expedite reliable, repeatable science results and encourage diverse outside collaborator participation. Repeatable, traceable science analyses are better trusted and solidify community and stakeholder confidence in published results.
  2. Software sharing / collaboration sites like GitHub have matured and are widely used by the heliophysics community. We address minor changes in practice that reduce time to science closure.
  3. Progress is readily measured by semi-automated metrics such as:
    • quantity and diversity (institutional, geographical, participant) of code contributions and issues opened
    • an increase in the use of continuous integration facilities such as Travis-CI
    • increased use of public data sharing such as Zenodo

ST #6: Manage, Mine, Manipulate Geoscience Data and Models


  1. Increased scientific computing efficiencies are essential for computer-aided discovery in the growing petabytes of data collected. Extracting value from the decades of diverse existing data sources can be greatly aided by more efficient software engineering practices.
  2. Effective use of software toolchains can be a significant force multiplier in avoiding mistakes and repeated or manual work.
  3. Progress might be measured by mining papers for citations and keywords, such as links to the software repos used. Those repos can themselves be mined for use of continuous integration tools, build system type and specific software libraries.
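
One way the paper-mining metric above might be prototyped is sketched below. It pulls repository links out of plain text with a regular expression. The list of hosting sites, the pattern and the sample sentence are assumptions for illustration, and `example-org/example-tool` is a made-up repository.

```python
import re
from typing import List

# Hosting sites to look for; this list is an assumption and easily extended.
REPO_PATTERN = re.compile(
    r"https?://(github\.com|gitlab\.com|bitbucket\.org)/([\w.-]+)/([\w.-]+)"
)


def find_repos(text: str) -> List[str]:
    """Return the sorted, de-duplicated 'host/owner/name' repos linked in text."""
    repos = set()
    for host, owner, name in REPO_PATTERN.findall(text):
        name = name.rstrip(".")  # drop a sentence-ending period
        if name.endswith(".git"):
            name = name[: -len(".git")]  # normalize clone URLs
        repos.add(f"{host}/{owner}/{name}")
    return sorted(repos)


if __name__ == "__main__":
    sample = (
        "The model is at https://github.com/gemini3d/GEMINI/ and the plots "
        "were made with https://github.com/example-org/example-tool."
    )
    for repo in find_repos(sample):
        print(repo)
```

A fuller pipeline would also need to extract text from PDFs and query each repository for CI configuration and build files. This sketch leaves both out.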

Workshop Categories

  • Altitudes: all
  • Latitudes: global

Format of the Workshop

Tutorials (2 hours)

Estimated / Actual attendance

50 estimated / 35 steady actual, plus several more coming and going

Workshop Summary

This is where the final summary workshop report will be.

Presentation Resources

Upload presentation and link to it here. We will also try to archive talks in Zenodo.
