2021 Workshop:Data Science in CEDAR

Data Science in CEDAR: CEDAR Data Science as a guide for the Heliophysics Decadal Survey

Location, Date/Time and Duration

1.5-2 hours


Conveners

Ryan McGranaghan
Asti Bhatt
Dogacan Ozturk

Workshop Categories

Altitudes: IT - Latitudes: global - Inst/Model: modeling - Other: Cross-cutting (observations and models)

Format of the Workshop

See the working agenda here: https://drive.google.com/file/d/1KFbdYX_qOvuJfpwLGKpelQPtxdmR8NMX/view?usp=sharing

Estimated attendance


Requested Specific Days

Given the importance and timeliness of the proposed topic, and the momentum from the past several CEDAR workshops, we hope that a short plenary talk will be allotted to the session conveners to address data science in the context of CEDAR science for the entire community. This plenary will sustain and amplify the momentum generated by several previous workshops with a data science focus that the conveners have planned or been central contributors to.

Special technology requests


Characterizing the geospace environment requires measurements from several regions within geospace. Fortunately, data to advance the scientific understanding of the geospace environment are growing across the four V’s of ‘big data’: 1) Volume; 2) Variety; 3) Veracity (i.e., uncertainty); and 4) Velocity. This growth represents both a challenge, to utilize these data efficiently and comprehensively, and an opportunity for new discovery by embracing new technologies and analysis capabilities that scale well to the geospace environment. These developments have revolutionized the creation of new scientific insights from data through the union of statistics, computer science, applied mathematics, and visualization, i.e., data science.

Specifically in 2021, we will highlight the theme of information representation, defined broadly to include all components of the data lifecycle:

   - Data collection: use of data science to more intelligently collect data
   - Data management: use of data science to more intelligently structure data (e.g., linking data and knowledge graphs)
   - Data analysis: use of data science to more intelligently relate input to output (e.g., machine learning)
   - Data communication: use of data science to more intelligently visualize and relate data. 

There has now been a series of devoted CEDAR Data Science sessions, dating back to 2017, that have continuously supported our community in unifying data and domain sciences and have evolved to meet new demands and challenges. The progress our community has made sets the stage for a new session that will not only continue to share the latest progress, but will also solidify the CEDAR community as a guiding example as we outline the next decade of Heliophysics.

Therefore, the proposed workshop is a timely effort to sustain and amplify the momentum from what is now a long legacy of advancing CEDAR science through data science, including the following selected workshops that the conveners have planned or been central contributors to:

   - Next Generation System Science (2017) (http://cedarweb.vsp.ucar.edu/wiki/index.php/2017_Workshop:Next_generation_systems_science)
   - Digital Geospace (2017) (http://cedarweb.vsp.ucar.edu/wiki/index.php/2017_Workshop:Digital_Geospace)
   - Grand Challenge: Multi-scale I-T System Dynamics (started in 2018 with multiple sessions - see, specifically, my introduction to our GC from the data perspective: recording - https://www.youtube.com/watch?v=eyia4zPSsh4)
   - Next Generation CEDAR Science (2018) (http://cedarweb.vsp.ucar.edu/wiki/index.php/2018_Workshop:Next_generation_CEDAR_science)
   - The challenge, opportunity, and art of data science for geospace (2019) (http://cedarweb.vsp.ucar.edu/wiki/index.php/2019_Workshop:Geospace_Data_Science)
   - Data Science in CEDAR: Progress, Capacity-Building, and Traversing Disciplines (2020) (http://cedarweb.vsp.ucar.edu/wiki/index.php/2020_Workshop:Data_Science_in_CEDAR#Structure_and_Plans_to_Innovate_Virtual_Interactions)

This session will respond to several thrusts of the Decadal Survey:

   - Determine the origins of the Sun’s activity and predict the variations of the space environment;
   - Enable effective space weather and climatology capabilities; and
   - Establish a space weather research program to effectively transition research to operations;

and the CEDAR Strategic Plan:

   - Strategic Thrust 6: Manage, Mine, and Manipulate Geoscience Data and Models, and
   - Strategic Thrust 1: Encourage and Undertake a Systems Perspective to Geospace;

which collectively emphasize a need to embrace data science.

Additionally, the National Science Foundation announced new investments toward its 10 ‘big ideas’, two of which together encourage radically interdisciplinary work and data science across the scientific landscape:

   - Harnessing the Data Revolution (https://www.nsf.gov/news/special_reports/big_ideas/harnessing.jsp) and
   - Growing Convergence Research (https://www.nsf.gov/news/special_reports/big_ideas/convergent.jsp)

The members of the CEDAR community are making valuable strides to embrace and create a structure for data science and NSF big ideas. Therefore, this session will extend the conversation around increasing capability to address data challenges and opportunities and growing convergence in the CEDAR community.


Our specific objectives will be to:

   - Build on a multi-year foundation of CEDAR Data Science advances, establishing CEDAR as a leader among the scientific domains in unifying data and domain science;
   - Promote interaction and collaboration between the CEDAR community and related disciplines (e.g., Earth Science);
   - Improve agility and capability within CEDAR science through embracing newer technologies and sound digital data scholarship;
   - Grow methodology transfer to enhance CEDAR science; and
   - Create materials to help guide the Heliophysics Decadal Survey.

This year, our session will aim to produce a draft document for submission to the Heliophysics Decadal Survey panel.

Outcomes: Progress toward these objectives will prepare us to contribute to the Heliophysics Decadal Survey (outlining the future of our broader science domain). Generative discussions will increase our community’s competitiveness for the NSF big ideas and ultimately will advance the New Frontier of CEDAR research [McGranaghan et al., 2017] that this series of “Data Science in CEDAR” workshops has helped create. Additional outcomes will include:

   - Identify powerful use cases to advance data science capabilities within CEDAR;
   - Sustain and amplify earlier data science efforts for CEDAR science applications; 
   - Encourage and facilitate the adoption of data science in the CEDAR community; and
   - Curate a community and the corresponding capacities for a more structured foundation for data science in CEDAR science.

Workshop Summary

This is where the final summary workshop report will be.

Presentation Resources

Upload presentation and link to it here. Links to other resources.
