2019 Workshop:Python for Space Science
Snakes on a Spaceship and the Goblet of Plots
Location, Date/Time and Duration
Altitudes: IT - Latitudes: global - Other:
Format of the Workshop
Requested Specific Days
Monday, first afternoon session. There are independent plans to host a Hackathon starting late Monday afternoon. Covering existing Python packages beforehand would provide attendees a nice foundation to start contributing to open source science. Please avoid conflict with software development session (Hirsch/Zettergren/Grubbs)
Special technology requests
Internet access, tables for attendees to use (not just chairs)
CEDAR justification:
Strategic thrust #5: Fuse the Knowledge Base across Disciplines
Strategic thrust #6: Manage, Mine, and Manipulate Geoscience Data and Models
1) How the questions will be addressed: The challenge of performing system science across and within disciplines is addressed by teaching the community about the existence and use of enabling open source science software.
2) What resources exist, are planned, or are needed: Science Python software that helps the community achieve these goals already exists (pysat, davitpy, spacepy, madrigal, etc.).
3) How progress should be measured: Participation rates in open source science Python software. Publications that use community tools can also be tracked.
'Snakes on a Spaceship' is focused on introducing the Python language, associated tools, and science software packages developed for the CEDAR and GEM community. This year, we have a focus on presenting new and/or interesting plots from across the field.
The pursuit of system science requires integrating measurements from multiple platforms into a coherent system for analysis. The variety of instrument types and data formats makes this a challenge. Typically these challenges are solved separately by different research teams, leading to duplicated effort. The study of the magnetosphere and the ionosphere as a system would be enhanced if solutions to these problems were made broadly available to the community. Community-developed software has found acceptance in astronomy (astropy) and solar science (sunpy). ‘Snakes on a Spaceship’ is dedicated to fostering the same collaborative and open development practices within CEDAR and GEM.
Please bring your computer, since there will be several tutorials that you will be able to work through with the speaker.
0) Introduction (5 min)
1) Ashton Reimer -- Resen: Towards a Reproducible Software Environment using Python and Docker (15 min)
2) Chih-Ting Hsu -- Comparison of TIE-GCM and C/NOFS ion velocity using PysatMagVect (15 min)
3) Marina Schmidt -- pydarn (15 min)
4) Leslie Lamarche -- Visualization of Diverse Geospace Data in Python (15 min)
5) Asher Pembroke -- Kamodo Analysis Suite (15 min)
6) Discussion (40 min)
Resen: Towards a Reproducible Software Environment using Python and Docker
InGeO is an EarthCube project supported by the NSF Cyberinfrastructure for Sustained Scientific Innovation program. It has two goals: to provide tools that facilitate collaboration and reproducibility, and to help educate the geospace community on best practices. Resen provides a portable environment for reproducing analyses. It targets common obstacles: installation is hard, accessing data can be difficult, and more. Resen uses a cross-platform containerized environment with software pre-installed, so work done on Linux can be reproduced just as easily on Windows or Mac. You can save a completed analysis and share it with colleagues.
There are other tools that address the installation and reproducibility problems, e.g., Anaconda and the Scientific Linux operating system. None of them does everything we want, though, so Resen was developed.
Resen is a command line tool for creating, importing and exporting software environments. It provides a simplified and abstracted interface. A beta version is available. It has numpy, matplotlib, scipy, pandas, and more installed. It also has community tools such as apexpy, davitpy, madrigalweb, spacepy, etc. To use this you need to:
1) Install docker
2) Use pip to install Resen
3) Type resen from the command line
Within Resen you:
1) Create a bucket. This creates a Jupyter interface with notebooks or a command line. It saves your work as you go, as long as you have the bucket open.
2) Export and import buckets (these commands are still being developed).
Other packages can also be installed within a bucket. This can effectively encapsulate the work for a paper, making an environment accessible even after those versions of the code are no longer supported.
On Wednesday at 19:00 at 109 N Guadalupe St there will be a guided tutorial. Feedback, questions, suggestions can be made at firstname.lastname@example.org.
Comparison of TIE-GCM and C/NOFS ion velocity using PysatMagVect
The motivation was to improve the understanding of the variability of the dayside, low-latitude, and global-scale ionospheric electrodynamics. COSMIC and C/NOFS are assimilated into TIE-GCM by assimilative mapping, using ensemble methods (DART, the Data Assimilation Research Testbed).
Complications: TIE-GCM covers 400-700 km depending on solar conditions, while C/NOFS measures between 400 and 860 km. PysatMagVect is used to map the C/NOFS observations along the magnetic field lines down to 120 km. This location is then converted into geographic coordinates and assimilated using the ensemble adjustment Kalman filter module in DART. The mapping involves three steps:
1) Compute scalars for mapping
2) Compute the direction of geomagnetic field lines
3) Determine the velocity at the desired location
The vertical drift matches the best, but work still needs to be done with the meridional and zonal drift.
pydarn
SuperDARN looks at both hemispheres with a network of high-frequency radars. We can do a lot with SuperDARN data. There are four data products provided to users. Most scientists use the FitACF files, which allow you to examine phenomena like TIDs, ULF waves, and more. These are typically examined through RTI plots, but fan plots are useful for seeing the spatial extent. SuperDARN also provides map files, which allow you to investigate things like substorms as you examine the spatial extent and characteristics of plasma convection.
Previously a lot of data visualization was done with davitpy, but it was difficult to install. Last year, we decided to have a fresh start with pydarn. Marina began development a few months ago. The goal is to institute best practices and limit the scope to keep the package maintainable. Additionally, it will include unit tests, integration tests, performance benchmarks, and coverage checks.
Flexibility and extendability are also important. One should be able to easily add and remove features and build off of other packages. Documentation is all-important, even for simple code.
pyDARN will provide a data visualization library for SuperDARN. SuperDARN files are not easy to read in, because we have a custom binary data format. Reading is done through the DarnRead class. This class reads fitacf, rawacf, map, grid, et cetera through their own functions. These are clearly named (e.g., .read_fitacf()).
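As a rough illustration of the "one reader class, one clearly named method per file type" design described above, here is a minimal sketch using only the Python standard library. Everything in it is an assumption made for illustration: `ToyDarnReader`, `make_toy_file`, and the two-field record layout are hypothetical and are not the pyDARN API or the SuperDARN file format.

```python
import io
import struct

# Toy record layout invented for this sketch (NOT the SuperDARN format):
# a little-endian int32 beam number followed by a float32 velocity.
_RECORD = struct.Struct("<if")


class ToyDarnReader:
    """Hypothetical reader exposing one clearly named method per file type."""

    def __init__(self, stream):
        # Read the whole binary stream once; each read_* method decodes it.
        self._raw = stream.read()

    def _unpack(self, label):
        if len(self._raw) % _RECORD.size:
            raise ValueError("truncated record in %s data" % label)
        return [
            {"type": label, "beam": beam, "velocity": velocity}
            for beam, velocity in _RECORD.iter_unpack(self._raw)
        ]

    def read_fitacf(self):
        """Return records as dictionaries, tagged as fitacf."""
        return self._unpack("fitacf")

    def read_rawacf(self):
        """Same toy layout here; a real file type would have its own fields."""
        return self._unpack("rawacf")


def make_toy_file(rows):
    """Build an in-memory binary 'file' from (beam, velocity) pairs."""
    return io.BytesIO(b"".join(_RECORD.pack(beam, vel) for beam, vel in rows))


records = ToyDarnReader(make_toy_file([(3, 250.0), (7, -120.5)])).read_fitacf()
```

The caller never deals with the byte layout; it picks the method that matches the file type and gets back plain dictionaries.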
Plotting for range-time plots is currently supported through pydarn.RTP (range-time-parameter). It only plots the data and lets you declare the figure, axis, and formats outside of the plotting routine.
For data visualization, you can make your colormaps more colorblind friendly by having yellow at one end of the scale. Almost any other color (but blue) can go on the other end of the scale.
The current timeline has the development of fan plots and convection plots this year. Then the IO will be moved into a separate library to avoid scope creep. Finally, a SuperDARN FAC data product will be added to RST and then a visualization product will be created.
Visualization of Diverse Geospace Data in Python
- Mangopy is a Python package for using data from the Midlatitude All-sky-imager Network for Geophysical Observations (MANGO). It fetches data from the FTP server, reads the composite HDF5 files, and retrieves data arrays. It is Python 2/3 compatible and available for download.
A mosaic can consider all or only certain sites and provide a composite image of all the data. A single image can be produced as well. In the future, movie creation will be supported, keograms will be included, difference images will be available, and performance will be improved.
- MIVIT: Multi-Instrument Visualization Toolkit. The goal of MIVIT is to make it easier to compare diverse geospatial data sets. Comparing linear plots is easy enough, but comparing data with a spatial extent is not.
You provide a data set and a plot method to create a data visualization object. You can add as many of these as you want. Then, you can include all of these in the same figure.
The DataSet object is time-consuming to write for many different instruments. Contributions to this are very welcome.
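The composition pattern described above (a data set plus a plot method makes a data visualization object, and any number of those share one figure) can be sketched in plain Python. This is only a sketch under stated assumptions: the class names `DataVisualization` and `SharedFigure`, the `DataSet` fields, and the stand-in plot methods are hypothetical and are not MIVIT's actual interface. The plot methods return text descriptions instead of drawing, and all values are made up.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class DataSet:
    """Measured values and where they were taken (hypothetical layout)."""
    name: str
    latitude: Sequence[float]
    longitude: Sequence[float]
    values: Sequence[float]


def scatter(dataset: DataSet) -> str:
    """Stand-in plot method: describes the layer instead of drawing it."""
    return f"scatter of {dataset.name}: {len(dataset.values)} points"


def contour(dataset: DataSet) -> str:
    """Second stand-in plot method, so two instruments can differ."""
    return f"contour of {dataset.name}: {len(dataset.values)} points"


@dataclass
class DataVisualization:
    """Pairs one data set with the method used to draw it."""
    dataset: DataSet
    plot_method: Callable[[DataSet], str]

    def draw(self) -> str:
        return self.plot_method(self.dataset)


@dataclass
class SharedFigure:
    """Collects any number of layers and renders them together."""
    layers: List[DataVisualization] = field(default_factory=list)

    def add(self, layer: DataVisualization) -> "SharedFigure":
        self.layers.append(layer)
        return self

    def render(self) -> List[str]:
        return [layer.draw() for layer in self.layers]


# Made-up example values for two instruments.
radar = DataSet("radar", [65.0, 66.0], [-147.0, -146.0], [310.0, 295.0])
imager = DataSet("imager", [40.0, 41.0, 42.0], [-105.0, -104.0, -103.0], [0.2, 0.5, 0.9])

figure = SharedFigure()
figure.add(DataVisualization(radar, scatter)).add(DataVisualization(imager, contour))
rendered = figure.render()
```

In this arrangement the per-instrument effort sits in building each `DataSet`, while the plot methods and the figure stay generic.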
This is a very new project that is being actively developed. Planned improvements include: fixing labeling and color bar options, expanding the available helper scripts, incorporating common models, introducing 3D visualization options, adding data gridding and interpolation, and optimizing the performance.
Kamodo Analysis Suite
SSMS enables, supports, and performs research for the space weather community. It gathers data and models from the solar, heliospheric, magnetospheric, thermospheric, and ionospheric communities. The endeavor to support a wide variety of users, models, and data sources means that a new tool needed to be developed to:
- quickly integrate new models and data
- provide API support for scientists and educators who don’t code
- provide a model-agnostic API
- provide a format-agnostic API
- provide transparent, permissive metadata
- perform automatic unit conversion
- support coordinate transformations
- be compatible with the helio-python ecosystem
- provide instant visualization
The Kamodo architecture is designed to serve modelers (C/C++/Fortran), data scientists (Python), and physicists (LaTeX). It uses pysat, spacepy, sunpy, plasmapy, and sympy as needed to serve each of these groups.
Scientists work with models and data through Kamodo objects, which map symbols to interpolating functions or mathematical expressions. It converts each expression to a highly optimized python function capable of operating on large arrays.
Any python function can be decorated with the @kamodofy decorator, adding metadata to a function.
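To make the idea of a metadata-adding decorator concrete, here is a minimal standard-library sketch. It is not Kamodo's implementation: the decorator name `with_metadata`, the `meta` attribute, the `units` and `citation` parameters, and the plain dictionary standing in for an object that maps symbols to functions are all assumptions chosen for illustration.

```python
import functools


def with_metadata(units="", citation=""):
    """Hypothetical stand-in for a metadata decorator (not Kamodo's code)."""
    def decorator(func):
        @functools.wraps(func)  # keep the original name and docstring
        def wrapper(*args, **kwargs):
            return func(*args, **kwargs)
        # Attach the metadata so tools can inspect it without calling func.
        wrapper.meta = {"units": units, "citation": citation}
        return wrapper
    return decorator


@with_metadata(units="kg/m^3", citation="illustrative only")
def rho(x):
    """Toy density profile with made-up behavior: square each input."""
    return [xi ** 2 for xi in x]


# A plain dict standing in for an object that maps symbols to functions.
registry = {"rho": rho}

values = registry["rho"]([1, 2, 3])
units = registry["rho"].meta["units"]
```

The function still behaves exactly as before when called; the decorator only adds information that a plotting or unit-conversion layer could read later.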
A visualization API allows users to make plots without doing any real programming.
A model or data source is considered ‘kamodofied’ when all scientifically relevant variables are exposed as Kamodo objects. This requires:
1) It must be accessible from python
2) It must provide an interpolating function for each variable (models)
3) Interpolating functions should supply default values as arguments, indicating the valid domain for their inputs
4) Variable names should follow Kamodo’s standards
5) Several more...
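Requirements 2 and 3 above can be illustrated with a small sketch: an interpolating function whose default argument is the grid it was built on, so the valid domain can be discovered by inspecting the signature. This is an assumption-laden example, not Kamodo code; the function name `density`, the altitude grid, and the density values are all made up.

```python
import inspect
from bisect import bisect_right

# Made-up model grid and values, for illustration only.
ALTITUDES_KM = (100.0, 200.0, 300.0, 400.0)
DENSITY = (1.0, 4.0, 9.0, 16.0)


def density(alt_km=ALTITUDES_KM):
    """Linearly interpolate DENSITY at the requested altitudes.

    The default argument is the model grid itself, which advertises the
    valid input domain to anyone inspecting the function.
    """
    result = []
    for alt in alt_km:
        if not ALTITUDES_KM[0] <= alt <= ALTITUDES_KM[-1]:
            raise ValueError("altitude %s km is outside the model domain" % alt)
        # Index of the upper grid point of the bracketing interval.
        i = min(bisect_right(ALTITUDES_KM, alt), len(ALTITUDES_KM) - 1)
        x0, x1 = ALTITUDES_KM[i - 1], ALTITUDES_KM[i]
        y0, y1 = DENSITY[i - 1], DENSITY[i]
        result.append(y0 + (y1 - y0) * (alt - x0) / (x1 - x0))
    return result


# Recover the valid domain from the default argument, without calling the model.
default_grid = inspect.signature(density).parameters["alt_km"].default
domain = (min(default_grid), max(default_grid))

on_grid = density()            # evaluates on the default grid
between = density([250.0])     # halfway between 200 km and 300 km
```

Calling the function with no arguments gives a sensible default evaluation, which is also what would let a plotting layer draw something without the user supplying inputs.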
The NASA software release process is underway to make this open source.
Discussion
Jeff K.: We need a tool to identify conjunctions between satellites and ground-based facilities.
Marina S.: Would the community appreciate a best practices guide?
Angeline B.: Yes
Liam K.: Yes, but educating the community (primarily students, because there are people who refuse to be educated) by providing first steps would be very useful. There’s a tool called ‘good enough practices for scientific computing’ that would be useful. A software carpentry session would be useful, but only if students would attend.
Marina S.: There was a lot of talk at the student day about many things to do with code, but not how to start doing these things through code. Another point is that if there is a Python session next year or during the student tutorial, it would be useful to have, basically, an interactive class.
Liam/Angeline/Marina: Having a tutorial session (in the student day) would be the best place to teach best practices.
Liam K.: Heliopython.org has a list of python packages that exist. They are also trying to combine everything in heliophysics into a common package like astropy (a lofty goal).
XA: There is an ISR school, could we put together a remote coding school?
Marina S: Putting together a youtube tutorial or doing a twitch or webcast could work.
XA: Even a tutorial of various tools in the community would be useful.
Russell S: Going to a scipy conference is a great way to learn. They also provide webcasts of all of the sessions and save them on YouTube. Perhaps even on the day before the student day?
Angeline: Shakes head no.
Liam K.: You can get a lot done in an hour.
Jeff K.: You can have short little sessions tool by tool.
Liam K.: Logistically it would be hard
Russell: Perhaps it’s time to propose parallel sessions
Ashton R.: It seems there are two problems: the fundamental education problem, and how to build the community better. Perhaps our role here is to solve the second problem and the first problem can be done by providing resources. Teaching people to code is very difficult
Marina: You can give them a head start, though, teach them what they don’t know that they don’t know.
Liam K.: I don’t think it’s necessarily teaching them to code, but showing them how to apply some basic workflows in a scientific context. For example, how to use github, how to set up a workflow, how to work in a small team. This would perhaps be more useful than “how to program in python”
Jeff K.: Students, what do you really want?
XB: You learn to program on your own. How can you learn to program in an hour?
Liam K.: Should we assume you can learn to program on your own? That could be a valid assumption, after all we do assume people know calculus when they come in.
XB: Should you perhaps require programming as a prerequisite?
Micheal: Schools are getting better at teaching programming in elementary school, so this will improve with time.
XA: It isn’t necessarily that they don’t know how to code, but that they don’t use what we use
Russell S.: And teaching comments, and unit testing, and so on
XA: A typical graduate program has 0-1 courses available that deal with programming
Russell S: Agrees, at UTD
Marina S.: How many people don’t know how to use git? (many hands)
Many: This would be useful
Marina S: Version control is very useful
Russell S.: There’s agreement that we should promote version control, and good software practices to the students. We can also point to youtube videos and so on for more resources.
Marina S.: Now you’re putting expectations on us!
Russell S.: I don’t want to teach GitHub when I basically muddle along
Liam K.: That’s better than most! Teaching this, even anecdotally will help the community and point them in the right direction
Marina S.: Agrees, this would be valuable and could be tutorialized
Zach: We don’t need to and don’t have time to teach all best practices, but it would be the most useful for everyone, especially with someone who knows what they’re doing in CS (like Marina).
Russell: What do we need?
Ashton: More wrappers for IRI
Jeff: There’s a lot out there
Ashton: the heliopython website is a big help
Angeline: There’s also a wiki page on the CEDAR wiki where CEDAR code can be listed
Micheal: Learning what versions of different things people use could also be useful, or making a basic list of utilities people commonly use (provide a place for newbies to start)
Liam: Where would that go?
Micheal: Another page on the CEDAR wiki
Liam: How do we come to a consensus as to what that is?
Ashton: Start a discussion on the wiki?
XB: As a student, I think that would be very useful. Often the biggest barrier is that things won’t install. Having students being able
57% men/43% women through the entire session.
74% men/26% women through the discussion session.