The Carpentries is preparing to retire this handbook at the end of 2023. Existing content that has not already been replicated elsewhere will be relocated. See the relevant issue on the source repository for more details.

Chapter 3 Deciding what to teach

As discussed in an earlier chapter, the first step in designing a curriculum according to backward design principles is to identify the practical skills that you aim to teach. This step is absolutely critical to defining the scope of your curriculum and to avoid scope creep, both during initial writing of the lesson materials and later community-driven development.

3.1 Target audience

To identify the skills you aim to teach, it is first essential to define your target audience, as different audiences will have different needs, as well as different starting skill sets. For example, Data Carpentry’s Genomics workshop curriculum assumes that learners have some background in the biological sciences and will understand biological terminology and abbreviations used in those lessons, but does not assume any prior experience with the tools taught in the lesson, including The Unix Shell and Amazon Web Services. These assumptions set the stage for lesson development, by placing boundaries around what will and won’t need to be explained in the lesson.

Defining your target audience is also essential to reducing the impact of expert awareness gaps. As an experienced researcher in your field, there are likely many steps in the data management and analysis process that you do without consciously thinking about. Without explicitly evaluating your target audience and understanding their actual background and skill level, you are in danger of skipping over intermediate steps that they need to know in order to succeed in their research.

You probably already have some sense of your target audience. To help refine this sense, ask yourself the following questions. Write down your answers and see if you can clearly articulate who is and isn’t included in your audience. Share these thoughts with your colleagues and see if they agree.

3.1.1 Audience definition questions

  • What is the expected educational level of your audience? – Do you expect most learners to be undergraduate students, graduate students, or to have completed graduate school? If you are targeting graduate students, do you expect learners to be new graduate students, masters degree holders, or doctoral candidates?

  • What type of exposure do your audience members have to the technologies you plan to teach? – Think about the typical course work that someone in your field has completed when they are at your targeted educational level. Have they had classes where they needed to use R, Python, or some other programming language for their homework? Does your department require any courses on data organization or management? Do students in your field ever interact with a remote computing system? Don’t worry if the answer to all of these questions is “no”, most university departments don’t build computational training into their undergraduate or graduate programs - which is why The Carpentries exists! Talk with others in your field, especially colleagues at different institutions and in different countries. Having an accurate picture of your target audience’s actual exposure to these skills will help you plan a realistic curriculum.

  • What types of tools do they already use? – Related to the previous question, it is useful to understand the toolkit that your target audience is already comfortable with. Do they commonly use spreadsheet software like Microsoft Excel, Numbers, or Google Sheets? Are they most comfortable working in rich text editors like Microsoft Word or Google Docs? Do they use any web-based GUIs or databases? Having this information will help you appropriately target your content by making useful analogies and tying new knowledge to existing knowledge. It is also very important to understand what tools established researchers in the field are using. No matter how enthusiastic a new doctoral student might be about using Python, if everyone else in their lab (including their advisor) uses MATLAB, they’re unlikely to be successful in convincing the entire lab to change their workflows.

  • What are the pain points they are currently experiencing? – The Carpentries trainings are designed to meet learners where they are and help them improve their workflows in a way that is immediately useful for them. We avoid idealism in favor of realism. Yes, it would be excellent if use of version control was standard across the research community, but if the learners at your workshop don’t see the immediate benefit of version control, they are unlikely to implement it. Talk with students and other new researchers in your field. What are the computational tasks they spend hours upon hours doing, only to have to redo when they get their reviews back from the publisher? What repetitive tasks do they do by hand and find mistakes in weeks or months later? People love to share stories like this and you can learn a lot about what others in your field are struggling with by collecting these stories. These are the skills you should be targeting in your lesson.

  • What types of data does your target audience work with? What are the commonalities in the datasets your target audience will encounter? (types of variable, size, standard data formats, etc.) - If you’re designing a domain-specific curriculum, you’ll need to consider the range of data types that members of your domain community work with. For example, researchers in the social sciences work with a wide range of data types, but survey data is common in this research community. Data Carpentry elected to develop lessons around closed-ended survey data, with the hopes of expanding this lesson set to include analysis of free-response text in the future. Similarly, a genomics researcher may work with data sets that span multiple species or multiple individuals or populations within a species. The Data Carpentry Genomics curriculum development team chose to focus on intra-species data sets. Researchers in the field who work with different data types will still be able to benefit from the lessons, but choosing a common data type will ensure your lesson is maximally useful for a broad component of your domain community. It is important to make this decision early on, as trying to include every type of data that researchers in your field work with will result in sprawling, ungainly lessons that aren’t useful for anyone. Choose one thing and do it well!

3.1.2 Learner profiles

After thinking through the audience definition questions above, and discussing these questions with colleagues in other institutions, you should have a fairly clear understanding of your target audience. You should now know who you expect to show up to your workshops, what knowledge and expectations they will bring with them, and what their motivations are. Keeping this information front-and-center throughout the lesson development process is incredibly important, as it is all too easy to forget your target learner and go down tangents in your lesson that don’t serve this set of learner’s needs. To make your target audience more concrete, we recommend creating a set of learner profiles. A learner profile describes a fictional learner at your workshop and includes the person’s general background, the problems they face, and how the course will help them. Software Carpentry has example learner profiles that will be useful in developing learner profiles for your own course. We recommend creating 2-4 learner profiles that describe different segments of your target audience. These profiles can then be consulted at future stages in the curriculum development process. For example, when developing an exercise, you can look at your learner profiles and ask “Is this exercise useful for my target learners?”.

3.2 Skills list

Congratulations! You now have a solid understanding of the users of your lesson materials and can concretely define the background knowledge and goals of your learners. These goals are a combination of a) reducing or removing pain points that your learners can self-identify and b) achieving next-level competencies that learners may not realize are possible, but which will be practically useful to them in their research. You can think of these two components as the starting and ending points for your lesson. The background knowledge and skills that your learners bring to the workshop define the starting point, while your learners’ goals define the end point. With these start and end points, you can now define the list of skills that you will need to teach at your workshop.

3.3 Example using a Software Carpentry Learner Profile

The following example illustrates how a learner profile can be used to define a list of concrete skills.

Example of Learner Profile

Fan Fullerene is a graduate student in chemistry who is working as a lab technician to help cover his family’s living costs. His only programming experience is a general first-year introduction to computational science using Python.

Fan’s supervisor is studying the production of fullerenes (also known as “buckyballs”). Each set of experiments involves testing a sample at 20 different temperatures and 15 different pressures. Using a machine borrowed from a collaborating lab, Fan can run all temperature and pressure combinations in one job, but must upload a parameter file to the machine to do this. The temperatures and pressures to be used vary from sample to sample, so Fan now has two dozen different parameter files, each containing 300 lines of control information that he fervently hopes is correct.

The machine sends these files to Fan once the experiment is completed. Fan analyzes them by opening Excel, copying and pasting the data into a spreadsheet, then creating a chart using the chart wizard. He then saves the chart as a PNG file on the group’s web site, along with the original data file.

Fan and his wife have had two children arrive while in graduate school, and his research progress is behind that of his peers. He is very nervous about finishing his PhD and suffers from undiagnosed depression.

Software Carpentry will teach Fan how to write programs to generate parameter files and analyze experimental results, and how to track the provenance of the data he is working with so that scientists can trace backward from the final charts to the raw data they represent. It will also teach him how to use version control systems to manage changes to his code.

Fan’s background knowledge and skills include:

  • maybe some vague recolection of basic syntax and terminology from his Python course
  • use of Excel to process tabular data and create graphics
  • interaction with a web GUI to upload data to his lab’s website

Fan’s goals include:

  • creating parameter files automatically
  • managing his many parameter files efficiently
  • automating analysis for multiple runs of his experiment
  • quickly and easily creating a specific graphic from his data
  • uploading data and results to his lab’s website

An (incomplete) list of target skills that can be extracted from this information includes:

  • writing for loops (to create parameter lists)
  • writing data to a file (e.g. parameter lists)
  • writing reusable scripts
  • creating a specific type of graphic programmatically
  • customizing graphics to label them appropriately for a particular data set
  • creating a version controlled repository for storing his parameter and output files
  • pushing output files to his lab’s website programatically

It is important at this stage to be sure you are defining skills that your learners will acquire, not topics that you will teach. You may be tempted to say, for example, that learners will learn about the grammar of graphics – which places the emphasis on a topic that learners will learn about - or that they will learn how to use the R package ggplot2 – which emphasizes the tool that learners will be exposed to. Neither of these ways of stating the learning goal focuses on the abilities that learners will develop that will help them in their work. Since you, as the lesson developer, will be using this skills list to create exercises and lesson content, it is essential to describe specific competencies and abilities that the content will help learners to develop. For example, this learning goal would be better phrased as “create plots that can be quickly and reproducibly modified and customized.” If your target audience has specific plotting needs (like creating time-series plots, or visualising very large datasets), this learning goal would be phrased to incorporate those specific needs.

Spend as much time as you need to on defining this skills list. Don’t rush it! This list will drive the rest of the curriculum development process. It’s ok if your initial list needs to be modified later, but it’s well worth the time to make this list as complete and concrete as possible now as it will save time in the remaining steps.