Skip to contents
library(dplyr)
library(qSIP2)
packageVersion("qSIP2")
#> [1] '0.17.8'

Source Material and Metadata

In a SIP experiment, the “source material” are the subjects that you are running experiments with (e.g. a culture tube or a plant root). For qSIP the source material would then be each DNA extraction that is loaded into its own column for isopycnic centrifugation. The “source data” is the highest level of metadata with a row corresponding to each original experimental or source material object. Because each source is fractionated into samples, it will have a one-to-many relationship with the “sample data” (see vignette("sample_data")).

There are a few required columns for valid source data including a unique ID (the source_mat_id), some measure of quantitative abundance for the source material (either total DNA or qPCR copies), and an isotope and substrate designation (the isotope and isotopolog, respectively). Ideally, the substrate should be a standardized compound ID (e.g. PubChem 6137 for L-methionine), but for qSIP2 is can just be descriptive text like “methionine”. In addition to the required columns, the source data can contain as many other ancillary columns as necessary to describe the experiment. These additional columns might contain important experiment-specific metadata that you will use to group and subset your source material in the qSIP workflow. But, they can also be further details that you might not need for qSIP, but it may make sense to just keep everything included if they’re already in your .txt. or excel file.

An example source dataframe is included in the qSIP2 package called example_source_df. It includes a 13C glucose addition study with two different moisture treatments (“normal” and “drought”) in quadruplicate, with one only in triplicate. Each experiment contains both unlabeled 12C and labeled 13C source material, but you may have an experiment where different 13C treatments share the same 12C source material. For example, a split experiment where you have one 12C data set that is split into many experimental conditions each with a different isotopolog.

The first few rows of example_source_df
source total_copies_per_g total_dna Isotope Moisture isotopolog
S149 34838665 74.46539 12C Normal glucose
S150 53528072 109.01522 12C Normal glucose
S151 95774992 182.16852 12C Normal glucose
S152 9126192 23.68963 12C Normal glucose
S161 41744046 67.62552 12C Drought glucose
S162 49402713 94.21217 12C Drought glucose

qSIP2 Source Data Object

Once the dataframe is ready with at least the three required columns (source_mat_id, isotope and isotopolog), the next step is to convert it to a qsip_source_data object. This is one of the main qSIP2 objects to hold and validate the data.

source_object <- qsip_source_data(example_source_df,
  isotope = "Isotope",
  isotopolog = "isotopolog",
  source_mat_id = "source"
)

class(source_object)
#> [1] "qsip_source_data" "S7_object"

While these three columns are required for the EAF workflow, there are additional columns required for the growth workflow (timepoint, total_abundance and volume). These can remain empty/unassigned for this vignette, and will be detailed in a forthcoming growth workflow vignette.

Note, your column names in the dataframe don’t have to specifically be the required column names, so no need to edit your original table headers if they don’t match. For example, if you’re isotopolog column is titled “substrate”, it isn’t necessary to rename your column. If your column names are already standardized names, then there is no need to assign while creating the object. For example, the isotopolog column is already title “isotopolog”, so if it is omitted from the object creation then the column will still be identified and used.

# this will still work even though the isotopolog parameter is not assigned
qsip_source_data(example_source_df,
  isotope = "Isotope",
  source_mat_id = "source"
)

Structure of qsip_source_data

While this object is not meant to be inspected or worked with outside of qSIP2 functions, a quick glimpse() can show the structure of it.

glimpse(source_object)
#> <qsip_source_data>
#>  @ data           : tibble [15 × 6] (S3: tbl_df/tbl/data.frame)
#>  $ isotope           : chr [1:15] "12C" "12C" "12C" "12C" ...
#>  $ isotopolog        : chr [1:15] "glucose" "glucose" "glucose" "glucose" ...
#>  $ source_mat_id     : chr [1:15] "S149" "S150" "S151" "S152" ...
#>  $ total_copies_per_g: num [1:15] 34838665 53528072 95774992 9126192 41744046 ...
#>  $ total_dna         : num [1:15] 74.5 109 182.2 23.7 67.6 ...
#>  $ Moisture          : chr [1:15] "Normal" "Normal" "Normal" "Normal" ...
#>  @ isotope        : chr "Isotope"
#>  @ isotopolog     : chr "isotopolog"
#>  @ source_mat_id  : chr "source"
#>  @ timepoint      : chr "NULL"
#>  @ total_abundance: chr "NULL"
#>  @ volume         : chr "NULL"

The original dataframe is contained in the @data slot, however, some column names have been modified to the standard names, while keeping a record of the original names in the corresponding slots.

Column name differences
Original Names qSIP Names Original Name Slot
source source_mat_id @source_mat_id
Isotope isotope @isotope
substrate isotopolog @isotopolog

To get the dataframe back out of the qsip_source_data object you can use the get_dataframe() method with original_headers set to TRUE or FALSE, depending on your needs. But, note that the columns may be in a different order than the dataframe you started with.

get_dataframe(source_object, original_headers = T)
Isotope isotopolog source total_copies_per_g total_dna Moisture
12C glucose S149 34838665 74.46539 Normal
12C glucose S150 53528072 109.01522 Normal
12C glucose S151 95774992 182.16852 Normal
12C glucose S152 9126192 23.68963 Normal
12C glucose S161 41744046 67.62552 Drought
12C glucose S162 49402713 94.21217 Drought
12C glucose S163 47777726 87.82524 Drought
12C glucose S164 48734282 75.97274 Drought
13C glucose S178 62964478 73.89526 Normal
13C glucose S179 49475460 68.65182 Normal
13C glucose S180 51720787 81.36874 Normal
13C glucose S200 59426155 71.19377 Drought
13C glucose S201 56379702 73.78225 Drought
13C glucose S202 42562198 108.11436 Drought
13C glucose S203 49914369 80.48608 Drought

Validation of qsip_source_data

While constructing a qsip_source_data object there are a few validation checks that are performed. For now, the only checks are that the source_mat_id is unique for each row, and that the isotope field is an appropriate value. This doesn’t just mean it is a value that makes sense, but also that it is one of the isotopes that qSIP2 knows how to calculate atom fraction values from. This is currently limited to 12C/13C, 14N/15N and 16O/18O. There are some “non-isotopic” names allowed as well for source material that might be unfractionated. These additional options are “bulk”, “unfractionated”, “T0”, “time0”, “Time0”, and are added as exceptions in the validate_isotopes() helper function.


# artificially doubling the rows will give an error from duplicate source_mat_ids
example_source_df |>
  rbind(example_source_df) |>
  qsip_source_data(
    isotope = "Isotope",
    isotopolog = "isotopolog",
    source_mat_id = "source"
  )
#> Error: some source_mat_ids are duplicated

One benefit of the validation steps being embedded in the object itself is that these validations are automatically run when the object is modified. This makes it impossible to modify the data later to an invalid object, e.g. changing an isotope to an invalid choice.

source_object@data$isotope <- "13G"
#> invalid isotope found: 13G
#> Error: Please fix the isotope names and try again

MISIP

While qSIP standards are part of the MISIP1 standards, the qSIP2 package is a little less stringent. This means your valid qSIP2 object might not be valid for a MISIP submission. At the source data level this is primarily through the difference between how the isotope data is coded, plus the addition of another isotopolog_label column.

qSIP2 has functions to convert between these two types. add_isotoplog_label() makes a MISIP version of the source data, and remove_isotopolog_label() converts it back to a qSIP2 compatible version. Two things are changed when running add_isotoplog_label() - 1) the isotopolog_label column is added and is populated with either “isotopically labeled” or “natural abundance” for heavy and light isotopes, respectively, and 2) the isotope column gets modified to be only the heavy isotope (e.g. all “12C” entries become “13C”).

Note, these functions are run on the source dataframe rather than on the qsip_source_data object.

Adding the isotopolog_label column

example_source_df |>
  add_isotopolog_label(isotope = "Isotope")
A ’MISIPified version of example_source_df
source total_copies_per_g total_dna isotope isotopolog_label Moisture isotopolog
S151 95774992 182.16852 13C natural abundance Normal glucose
S178 62964478 73.89526 13C isotopically labeled Normal glucose
S200 59426155 71.19377 13C isotopically labeled Drought glucose
S201 56379702 73.78225 13C isotopically labeled Drought glucose
S150 53528072 109.01522 13C natural abundance Normal glucose
S180 51720787 81.36874 13C isotopically labeled Normal glucose

Now, the Isotope column has been renamed to isotope to satisfy MISIP standards, and all values have been replaced with the heavy isotope.

Count of isotope types in example_source_df_MISIP
isotope n
13C 15

And the designation for whether the source material was the “light” or “heavy” version of the isotope has now been transferred to the isotopolog_label column.

Count of isotopolog_label types in example_source_df_MISIP
isotopolog_label n
isotopically labeled 7
natural abundance 8

Removing the isotopolog_label column

This change can be reverted with the remove_isotopolog_label() function.

example_source_df |>
  add_isotopolog_label(isotope = "Isotope") |>
  remove_isotopolog_label()
example_source_df_MISIP converted back
source total_copies_per_g total_dna isotope Moisture isotopolog
S149 34838665 74.46539 12C Normal glucose
S150 53528072 109.01522 12C Normal glucose
S151 95774992 182.16852 12C Normal glucose
S152 9126192 23.68963 12C Normal glucose
S161 41744046 67.62552 12C Drought glucose
S162 49402713 94.21217 12C Drought glucose
S163 47777726 87.82524 12C Drought glucose
S164 48734282 75.97274 12C Drought glucose
S178 62964478 73.89526 13C Normal glucose
S179 49475460 68.65182 13C Normal glucose
S180 51720787 81.36874 13C Normal glucose
S200 59426155 71.19377 13C Drought glucose
S201 56379702 73.78225 13C Drought glucose
S202 42562198 108.11436 13C Drought glucose
S203 49914369 80.48608 13C Drought glucose

Note, the original is not exactly preserved as the original Isotope column has the MISIP standard isotope name retained.