$$ \newcommand{\problemdivider}{\begin{center}\large \bf\ldots\ldots\ldots\ldots\ldots\ldots\end{center}} \newcommand{\subproblemdivider}{\begin{center}\large \bf\ldots\ldots\end{center}} \newcommand{\pdiv}{\problemdivider} \newcommand{\spdiv}{\subproblemdivider} \newcommand{\ba}{\begin{align*}} \newcommand{\ea}{\end{align*}} \newcommand{\rt}{\right} \newcommand{\lt}{\left} \newcommand{\bp}{\begin{problem}} \newcommand{\ep}{\end{problem}} \newcommand{\bsp}{\begin{subproblem}} \newcommand{\esp}{\end{subproblem}} \newcommand{\bssp}{\begin{subsubproblem}} \newcommand{\essp}{\end{subsubproblem}} \newcommand{\atag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\arabic{section}.\alph{subsection}.\alph{equation}}} \newcommand{\btag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\arabic{section}.\alph{equation}}} \newcommand{\ctag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\arabic{equation}}} \newcommand{\dtag}[1]{\addtocounter{equation}{1}\label{#1}\tag{\Alph{chapter}.\arabic{section}.\arabic{equation}}} \newcommand{\unts}[1]{\ \text{#1}} \newcommand{\textop}[1]{\operatorname{#1}} \newcommand{\textopl}[1]{\operatornamewithlimits{#1}} \newcommand{\prt}{\partial} \newcommand{\pderi}[3]{\frac{\prt^{#3}#1}{\prt #2^{#3}}} \newcommand{\deri}[3]{\frac{d^{#3}#1}{d #2^{#3}}} \newcommand{\del}{\vec\nabla} \newcommand{\exval}[1]{\langle #1\rangle} \newcommand{\bra}[1]{\langle #1|} \newcommand{\ket}[1]{|#1\rangle} \newcommand{\ham}{\mathcal{H}} \newcommand{\arr}{\mathfrak{r}} \newcommand{\conv}{\mathop{\scalebox{2}{\raisebox{-0.2ex}{$\ast$}}}} \newcommand{\bsm}{\lt(\begin{smallmatrix}} \newcommand{\esm}{\end{smallmatrix}\rt)} \newcommand{\bpm}{\begin{pmatrix}} \newcommand{\epm}{\end{pmatrix}} \newcommand{\bdet}{\lt|\begin{smallmatrix}} \newcommand{\edet}{\end{smallmatrix}\rt|} \newcommand{\bs}[1]{\boldsymbol{#1}} \newcommand{\uvec}[1]{\bs{\hat{#1}}} \newcommand{\qed}{\hfill$\Box$} $$
Tags:
  • genomics
  • Working with the cloud

    export GCS_OAUTH_TOKEN=$(gcloud auth application-default print-access-token)

    Interval notes

    • samtools view -L <bed> does not use the index, but rather iterates over all reads start-to-finish. You need to explicitly specify samtools view -ML.

    • Samtools (and GATK) interval subsetting includes all reads that overlap the interval. This is sometimes not desired behavior, since it makes it difficult to create scatter intervals that are disjoint with respect to reads they contain.

      As such, piping the output of samtools view to a simple helper tool is necessary.

    Making new BAM records

    bam_set1 behaves almost as expected, except:

    1. If copying from an existing record (read_in), l_qname already accounts for the null padding, but this is not accounted for in bam_set1. It thus must be subtracted off:
    bam_set1(
      &read_out,
      read_in.core.l_qname - read_in.core.l_extranul - 1,
      bam_get_qname(&read_in),
      ...
      bam_get_l_aux(&read_in)
    );
    
    1. The easiest way to copy tags from another record is by memcpying its entire aux block. Note that the l_data field of the output record must be incremented by the aux length; passing l_aux to bam_set1 will not automatically do this.
    memcpy(bam_get_aux(&read_out), bam_get_aux(&read_in), bam_get_l_aux(&read_in));
    read_w.l_data += bam_get_l_aux(&read_in);