library(dplyr)
library(qSIP2)
packageVersion("qSIP2")
#> [1] '0.17.8'
Source Material and Metadata
In a SIP experiment, the “source material” is the set of subjects that
you are running experiments with (e.g. a culture tube or a plant root).
For qSIP, the source material is then each DNA extraction that is loaded
into its own column for isopycnic centrifugation. The “source data” is
the highest level of metadata, with one row for each original
experimental or source material object. Because each source is
fractionated into samples, it has a one-to-many relationship with the
“sample data” (see vignette("sample_data")).
There are a few required columns for valid source data, including a
unique ID (the source_mat_id), some measure of quantitative abundance
for the source material (either total DNA or qPCR copies), and an
isotope and substrate designation (the isotope and isotopolog,
respectively). Ideally, the substrate should be a standardized compound
ID (e.g. PubChem 6137 for L-methionine), but for qSIP2 it can just be
descriptive text like “methionine”. In addition to the required columns,
the source data can contain as many other ancillary columns as necessary
to describe the experiment. These additional columns might contain
important experiment-specific metadata that you will use to group and
subset your source material in the qSIP workflow. They can also hold
further details that you do not need for qSIP, since it may make sense
to keep everything together if it is already in your .txt or Excel file.
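As a minimal sketch of the layout described above, the following builds a small source table in base R and checks it for the ID, isotope and isotopolog columns. The object name my_sources, the IDs and all values are hypothetical and chosen only for illustration; the column names follow the standard names used in this vignette.

```r
# Hypothetical source table: IDs and values are invented for illustration only
my_sources <- data.frame(
  source_mat_id = c("tube_1", "tube_2", "tube_3", "tube_4"),    # unique ID per source
  isotope       = c("12C", "12C", "13C", "13C"),                # light or heavy isotope
  isotopolog    = "glucose",                                    # substrate (recycled to all rows)
  total_dna     = c(70.1, 82.4, 65.9, 91.3),                    # quantitative abundance measure
  moisture      = c("normal", "drought", "normal", "drought")   # ancillary metadata column
)

# Quick sanity checks before handing the table to qSIP2
required <- c("source_mat_id", "isotope", "isotopolog")
has_required <- all(required %in% colnames(my_sources))
ids_unique <- anyDuplicated(my_sources$source_mat_id) == 0

has_required
ids_unique
```

Ancillary columns such as moisture are simply carried along and can be used later for grouping and subsetting.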
An example source dataframe called example_source_df is included in the
qSIP2 package. It includes a 13C glucose addition study with two
moisture treatments (“normal” and “drought”) in quadruplicate, with one
group only in triplicate. Each experiment contains both unlabeled 12C
and labeled 13C source material, but you may have an experiment where
different 13C treatments share the same 12C source material. For
example, a split experiment might have one 12C data set that is shared
by many experimental conditions, each with a different isotopolog.
source | total_copies_per_g | total_dna | Isotope | Moisture | isotopolog |
---|---|---|---|---|---|
S149 | 34838665 | 74.46539 | 12C | Normal | glucose |
S150 | 53528072 | 109.01522 | 12C | Normal | glucose |
S151 | 95774992 | 182.16852 | 12C | Normal | glucose |
S152 | 9126192 | 23.68963 | 12C | Normal | glucose |
S161 | 41744046 | 67.62552 | 12C | Drought | glucose |
S162 | 49402713 | 94.21217 | 12C | Drought | glucose |
qSIP2 Source Data Object
Once the dataframe is ready with at least the three required columns
(source_mat_id, isotope and isotopolog), the next step is to convert it
to a qsip_source_data object. This is one of the main qSIP2 objects to
hold and validate the data.
source_object <- qsip_source_data(example_source_df,
  isotope = "Isotope",
  isotopolog = "isotopolog",
  source_mat_id = "source"
)
class(source_object)
#> [1] "qsip_source_data" "S7_object"
While these three columns are required for the EAF workflow, there are
additional columns required for the growth workflow (timepoint,
total_abundance and volume). These can remain empty/unassigned for this
vignette, and will be detailed in a forthcoming growth workflow
vignette.
Note that the column names in your dataframe don’t have to match the
required column names, so there is no need to edit your original table
headers. For example, if your isotopolog column is titled “substrate”,
you don’t need to rename it; just assign it when creating the object. If
your column names already match the standardized names, there is no need
to assign them at all. For example, the isotopolog column here is
already titled “isotopolog”, so if it is omitted from the object
creation the column will still be identified and used.
# this will still work even though the isotopolog parameter is not assigned
qsip_source_data(example_source_df,
  isotope = "Isotope",
  source_mat_id = "source"
)
Structure of qsip_source_data
While this object is not meant to be inspected or worked with outside of
qSIP2 functions, a quick glimpse() can show its structure.
glimpse(source_object)
#> <qsip_source_data>
#> @ data : tibble [15 × 6] (S3: tbl_df/tbl/data.frame)
#> $ isotope : chr [1:15] "12C" "12C" "12C" "12C" ...
#> $ isotopolog : chr [1:15] "glucose" "glucose" "glucose" "glucose" ...
#> $ source_mat_id : chr [1:15] "S149" "S150" "S151" "S152" ...
#> $ total_copies_per_g: num [1:15] 34838665 53528072 95774992 9126192 41744046 ...
#> $ total_dna : num [1:15] 74.5 109 182.2 23.7 67.6 ...
#> $ Moisture : chr [1:15] "Normal" "Normal" "Normal" "Normal" ...
#> @ isotope : chr "Isotope"
#> @ isotopolog : chr "isotopolog"
#> @ source_mat_id : chr "source"
#> @ timepoint : chr "NULL"
#> @ total_abundance: chr "NULL"
#> @ volume : chr "NULL"
The original dataframe is contained in the @data slot. However, some
column names have been changed to the standard names, and a record of
the original names is kept in the corresponding slots.
Original Names | qSIP Names | Original Name Slot |
---|---|---|
source | source_mat_id | @source_mat_id |
Isotope | isotope | @isotope |
isotopolog | isotopolog | @isotopolog |
To get the dataframe back out of the qsip_source_data object you can use
the get_dataframe() method with original_headers set to TRUE or FALSE,
depending on your needs. Note that the columns may be in a different
order than in the dataframe you started with.
get_dataframe(source_object, original_headers = TRUE)
Isotope | isotopolog | source | total_copies_per_g | total_dna | Moisture |
---|---|---|---|---|---|
12C | glucose | S149 | 34838665 | 74.46539 | Normal |
12C | glucose | S150 | 53528072 | 109.01522 | Normal |
12C | glucose | S151 | 95774992 | 182.16852 | Normal |
12C | glucose | S152 | 9126192 | 23.68963 | Normal |
12C | glucose | S161 | 41744046 | 67.62552 | Drought |
12C | glucose | S162 | 49402713 | 94.21217 | Drought |
12C | glucose | S163 | 47777726 | 87.82524 | Drought |
12C | glucose | S164 | 48734282 | 75.97274 | Drought |
13C | glucose | S178 | 62964478 | 73.89526 | Normal |
13C | glucose | S179 | 49475460 | 68.65182 | Normal |
13C | glucose | S180 | 51720787 | 81.36874 | Normal |
13C | glucose | S200 | 59426155 | 71.19377 | Drought |
13C | glucose | S201 | 56379702 | 73.78225 | Drought |
13C | glucose | S202 | 42562198 | 108.11436 | Drought |
13C | glucose | S203 | 49914369 | 80.48608 | Drought |
Validation of qsip_source_data
While constructing a qsip_source_data object, a few validation checks
are performed. For now, the only checks are that the source_mat_id is
unique for each row, and that the isotope field is an appropriate value.
This doesn’t just mean it is a value that makes sense, but also that it
is one of the isotopes that qSIP2 knows how to calculate atom fraction
values from. This is currently limited to 12C/13C, 14N/15N and 16O/18O.
Some “non-isotopic” names are allowed as well for source material that
might be unfractionated. These additional options are “bulk”,
“unfractionated”, “T0”, “time0” and “Time0”, and are added as exceptions
in the validate_isotopes() helper function.
# artificially doubling the rows will give an error from duplicate source_mat_ids
example_source_df |>
  rbind(example_source_df) |>
  qsip_source_data(
    isotope = "Isotope",
    isotopolog = "isotopolog",
    source_mat_id = "source"
  )
#> Error: some source_mat_ids are duplicated
One benefit of embedding the validation steps in the object itself is that they are run automatically whenever the object is modified. This makes it impossible to later modify the data into an invalid object, e.g. by changing an isotope to an invalid choice.
source_object@data$isotope <- "13G"
#> invalid isotope found: 13G
#> Error: Please fix the isotope names and try again
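The two checks described above can also be approximated on a plain dataframe before building the object, which can help when cleaning a large metadata file. The following is a base R sketch only: the helper check_source_table() is hypothetical and is not part of qSIP2, and it does not reproduce the package's own validation code. The allowed values are the ones listed in the text above.

```r
# Allowed values as listed in this vignette (not read from the package itself)
known_isotopes <- c("12C", "13C", "14N", "15N", "16O", "18O")
non_isotopic   <- c("bulk", "unfractionated", "T0", "time0", "Time0")

# Hypothetical helper: returns a character vector of problems (empty if none)
check_source_table <- function(df, id_col, isotope_col) {
  problems <- character(0)

  # Check 1: every source ID must appear only once
  ids <- df[[id_col]]
  if (anyDuplicated(ids) > 0) {
    dup_ids <- unique(ids[duplicated(ids)])
    problems <- c(problems, paste("duplicated IDs:", paste(dup_ids, collapse = ", ")))
  }

  # Check 2: every isotope value must be in the allowed set
  bad <- setdiff(unique(df[[isotope_col]]), c(known_isotopes, non_isotopic))
  if (length(bad) > 0) {
    problems <- c(problems, paste("unrecognized isotopes:", paste(bad, collapse = ", ")))
  }

  problems
}

# Toy table with one duplicated ID and one invalid isotope name
toy <- data.frame(
  source  = c("S1", "S2", "S2"),
  Isotope = c("12C", "13C", "13G")
)
problems <- check_source_table(toy, id_col = "source", isotope_col = "Isotope")
problems
```

Returning all problems at once, rather than stopping at the first one, is a design choice of this sketch that makes it easier to fix a metadata file in a single pass.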
MISIP
While qSIP standards are part of the MISIP standards, the qSIP2 package
is a little less stringent. This means your valid qSIP2 object might not
be valid for a MISIP submission. At the source data level, the
differences are primarily in how the isotope data is coded, plus the
addition of another isotopolog_label column.
qSIP2 has functions to convert between these two types.
add_isotopolog_label() makes a MISIP version of the source data, and
remove_isotopolog_label() converts it back to a qSIP2 compatible
version. Two things change when running add_isotopolog_label(): 1) the
isotopolog_label column is added and populated with either “isotopically
labeled” or “natural abundance” for heavy and light isotopes,
respectively, and 2) the isotope column is modified to contain only the
heavy isotope (e.g. all “12C” entries become “13C”).
qsip_source_data
object.
Adding the isotopolog_label column
example_source_df |>
  add_isotopolog_label(isotope = "Isotope")
source | total_copies_per_g | total_dna | isotope | isotopolog_label | Moisture | isotopolog |
---|---|---|---|---|---|---|
S151 | 95774992 | 182.16852 | 13C | natural abundance | Normal | glucose |
S178 | 62964478 | 73.89526 | 13C | isotopically labeled | Normal | glucose |
S200 | 59426155 | 71.19377 | 13C | isotopically labeled | Drought | glucose |
S201 | 56379702 | 73.78225 | 13C | isotopically labeled | Drought | glucose |
S150 | 53528072 | 109.01522 | 13C | natural abundance | Normal | glucose |
S180 | 51720787 | 81.36874 | 13C | isotopically labeled | Normal | glucose |
Now, the Isotope column has been renamed to isotope to satisfy MISIP
standards, and all values have been replaced with the heavy isotope.
isotope | n |
---|---|
13C | 15 |
And the designation for whether the source material was the “light” or
“heavy” version of the isotope has now been transferred to the
isotopolog_label column.
isotopolog_label | n |
---|---|
isotopically labeled | 7 |
natural abundance | 8 |
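To make the two-step recoding concrete, here is a base R sketch that applies the same idea to a small toy table. It is only an illustration of the transformation described above, not the implementation of add_isotopolog_label(); the lookup vectors heavy_of and light_isotopes and the toy data are assumptions made for this example.

```r
# Map every supported isotope to its heavy counterpart (illustrative lookup)
heavy_of <- c(
  "12C" = "13C", "13C" = "13C",
  "14N" = "15N", "15N" = "15N",
  "16O" = "18O", "18O" = "18O"
)
light_isotopes <- c("12C", "14N", "16O")

# Toy table using a few IDs in the style of the example data
toy <- data.frame(
  source  = c("S149", "S150", "S178", "S179"),
  Isotope = c("12C", "12C", "13C", "13C")
)

# Step 1: record light vs heavy in a new label column (before overwriting anything)
toy$isotopolog_label <- ifelse(
  toy$Isotope %in% light_isotopes,
  "natural abundance",
  "isotopically labeled"
)

# Step 2: store only the heavy isotope under the standardized column name
toy$isotope <- unname(heavy_of[toy$Isotope])
toy$Isotope <- NULL

# Tabulate the recoded columns
table(toy$isotope)
table(toy$isotopolog_label)
```

The label column is computed first because the light/heavy distinction is lost as soon as the isotope values are overwritten.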
Removing the isotopolog_label column
This change can be reverted with the remove_isotopolog_label() function.
example_source_df |>
  add_isotopolog_label(isotope = "Isotope") |>
  remove_isotopolog_label()
source | total_copies_per_g | total_dna | isotope | Moisture | isotopolog |
---|---|---|---|---|---|
S149 | 34838665 | 74.46539 | 12C | Normal | glucose |
S150 | 53528072 | 109.01522 | 12C | Normal | glucose |
S151 | 95774992 | 182.16852 | 12C | Normal | glucose |
S152 | 9126192 | 23.68963 | 12C | Normal | glucose |
S161 | 41744046 | 67.62552 | 12C | Drought | glucose |
S162 | 49402713 | 94.21217 | 12C | Drought | glucose |
S163 | 47777726 | 87.82524 | 12C | Drought | glucose |
S164 | 48734282 | 75.97274 | 12C | Drought | glucose |
S178 | 62964478 | 73.89526 | 13C | Normal | glucose |
S179 | 49475460 | 68.65182 | 13C | Normal | glucose |
S180 | 51720787 | 81.36874 | 13C | Normal | glucose |
S200 | 59426155 | 71.19377 | 13C | Drought | glucose |
S201 | 56379702 | 73.78225 | 13C | Drought | glucose |
S202 | 42562198 | 108.11436 | 13C | Drought | glucose |
S203 | 49914369 | 80.48608 | 13C | Drought | glucose |
Note that the original dataframe is not exactly restored: the column
originally named Isotope keeps the MISIP standard name isotope.