<TeXmacs|1.0.6>

<style|generic>

<\body>
  <section|Lecture 3>

  What did we learn last time?

  <\itemize-dot>
    <item>matrices: create with <verbatim|matrix()>.

    <item>Address entries in an array: <verbatim|a[2,2], a[1:2,1:2],
    a[,c(2,4)], a[a\<gtr\>8]>

    <item>Apply a function to every row or every column of an array:

    <\itemize-dot>
      <item><verbatim|apply( a, 1, mean )> - calculates the mean of each row.

      <item><verbatim|apply( a, 2, mean )> - calculates the mean of each
      column.
    </itemize-dot>

    All this is very useful for bootstrapping!

    <item>Functions: <verbatim|function(x) { x+1}>

    <item>Read data from file: <verbatim|read.table()>
  </itemize-dot>
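
  As a rough sketch of why <verbatim|apply()> helps with bootstrapping, the
  following session draws 1000 resamples of a small made-up sample, stores
  one resample per column, and takes the mean of every column. The data in
  <verbatim|obs> and the names <verbatim|boot> and <verbatim|boot.means> are
  assumptions for illustration only.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      obs=c(4.1,5.3,6.0,5.5,4.8) # hypothetical observations
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      boot=matrix(sample(obs,5*1000,replace=TRUE),5,1000) # 1 resample per
      column
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      boot.means=apply(boot,2,mean) # mean of each resample
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      length(boot.means) # 1000
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sd(boot.means) # rough standard error of the mean
    </input>
  </session>>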

  <subsection|factors>

  So far we have seen three types of vectors: numbers, strings, and booleans.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=1:4
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=c("hello","there")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=(1:4)\<gtr\>2
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Now we'll learn about factors. Factors are somewhere between a string and a
  boolean. We use them for values, such as strings, that take only a few
  distinct values (categories).

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=sample(c("summer","winter"),20,rep=T)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  This is a vector of strings. Since it contains only two distinct values, we
  can convert it to a factor:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=factor(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[1]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      as.numeric(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      levels(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      levels(a)=c("s","w")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

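  Factors have several advantages. The main one is that R stores each entry
  as an integer code (this is what <verbatim|as.numeric(a)> shows above), so
  checking whether two entries are equal only requires comparing numbers,
  which can be much faster than comparing strings. Another advantage comes up
  later, when we learn about ANOVA.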
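
  As a small sketch of what happens behind the scenes, the following session
  builds a factor from a made-up vector (the name <verbatim|season> is an
  assumption, not part of the lecture data) and looks at its integer codes,
  its levels, and the counts per level.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      season=factor(c("summer","winter","winter","summer","winter"))
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      as.integer(season) # codes 1 2 2 1 2, pointing into levels(season)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      levels(season) # "summer" "winter", sorted alphabetically by default
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      levels(season)[as.integer(season)] # recovers the original strings
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      table(season) # counts per level: summer 2, winter 3
    </input>
  </session>>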

  <subsection|read.table>

  We want to read the following table: (switch to excel)

  \;

  We saved it as a file of comma-separated values (CSV).

  Let us first see how we can list files in a directory:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      list.files()
    </input>
  </session>>

  Very easy. How can we see in which directory we are?

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      getwd()
    </input>
  </session>>

  And how do we change it?

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      setwd("~/R-course-2006/lecture3")
    </input>
  </session>>

  The names are a bit hard to remember: <verbatim|setwd> stands for set
  working directory, and <verbatim|getwd> for get working directory.

  Now we can read the file:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=read.table("example3.csv",head=T,row.names=1,as.is=3, sep=",")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Normally (in R versions before 4.0.0), <verbatim|read.table()> converts a
  string column into categorical data, i.e. a factor. To prevent this, pass
  the <verbatim|as.is> argument with the columns that should be read as plain
  strings. Since R 4.0.0, string columns stay strings unless you ask for
  <verbatim|stringsAsFactors=TRUE>.

  If the values in a column are essentially all different, store it as
  strings. If some values repeat and you will compare between values, use
  categorical data (i.e. no <verbatim|as.is>).
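
  The effect of <verbatim|as.is> can be sketched with a small made-up file.
  Everything here is an assumption for illustration: the temporary file, its
  columns, and the names <verbatim|tmp>, <verbatim|csv> and <verbatim|d>; it
  is not the lecture's <verbatim|example3.csv>.
  <verbatim|stringsAsFactors=TRUE> is set explicitly so that the result does
  not depend on the R version, and the column is given by name to avoid
  counting column positions.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      tmp=tempfile(fileext=".csv") # hypothetical file, not example3.csv
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      csv=c("name,height,season,email","ann,170,summer,ann@example.org")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      csv=c(csv,"bob,182,winter,bob@example.org","eve,165,summer,eve@example.org")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      writeLines(csv,tmp) # write the made-up table to disk
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      d=read.table(tmp, header=TRUE, row.names=1, sep=",",
      stringsAsFactors=TRUE, as.is="email")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sapply(d,class) # height: integer, season: factor, email: character
    </input>
  </session>>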

  There are many ways to access the data in the table:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a$height
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a$hei
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a$hai
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

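  The dollar sign selects a column by name. As you can see, the name can be
  abbreviated, as long as the abbreviation is the beginning of exactly one
  column name (which is why <verbatim|a$hai> finds nothing).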

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[,1]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[1:2,]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[,"email"]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[,"ema"]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  You cannot abbreviate names when you use <verbatim|[]>.
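
  The difference between the two ways of naming a column can be sketched on a
  made-up data frame (the name <verbatim|people> and its contents are
  assumptions for illustration). Double brackets, which also select a single
  column, match names exactly unless asked otherwise.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      people=data.frame(height=c(170,182,165),
      email=c("ann@example.org","bob@example.org","eve@example.org"))
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      people$hei # partial match, same as people$height
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      people$hai # NULL, no column name starts with "hai"
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      people[["hei"]] # NULL, double brackets match exactly by default
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      people[["hei",exact=FALSE]] # partial matching requested explicitly
    </input>
  </session>>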

  We want to sort the table by height. How do we do that?

  Here the function <verbatim|order()> helps. It returns the positions of the
  elements, listed in increasing order of their values.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      order(a$height)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  So <verbatim|a> will be sorted by height if we write first the 5th row,
  then the 3rd, then the 6th, then the 1st, and so on. We can do that like
  this:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      i=order(a$height)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      i
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[i,]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[ order( a$height ) , ]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>
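
  The same idea extends to sorting in decreasing order or by more than one
  column. The following is a minimal sketch on a made-up table; the name
  <verbatim|tab> and its contents are assumptions, not the lecture data.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      tab=data.frame(height=c(170,182,165,158), season=c("s","w","s","w"),
      row.names=c("ann","bob","eve","dan")) # hypothetical table
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      order(tab$height) # 4 3 1 2
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      tab[order(tab$height,decreasing=TRUE),] # tallest first
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      tab[order(tab$season,tab$height),] # by season, ties broken by height
    </input>
  </session>>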

  <subsection|names>

  In R, almost everything can have names:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=c(2,3,4)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=c(x=2,y=3,z=4)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      names(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=1:4
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      names(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      names(a)=c("a","b","c","d")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      names(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Now we see that each entry has a name.

  \;

  This is useful when functions return values, because it tells us what each
  value is:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=rnorm(100)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      summary(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  <verbatim|summary> is a function, and the values that it returns have
  names:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sum.a=summary(a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      names(sum.a)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sum.a["Mean"]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sum.a["Max."]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Matrices and data.frames have names for the rows and columns:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=matrix(1:6,2,3)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      rownames(a)=c("first row","second row")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      colnames(a)=c("a","b","c")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[,"a"]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a[,"b"]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Row names are especially important when we want to combine data from
  different sources: for example, when we have results of one experiment for
  some genes and further results from a second experiment. The row names then
  allow us to quickly connect the results for the same genes.
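
  One way this could look is sketched below with two made-up one-column
  matrices. The gene names, the values, and the names <verbatim|exp1>,
  <verbatim|exp2>, <verbatim|common> and <verbatim|both> are all assumptions
  for illustration. The rows are listed in a different order in the two
  matrices, and each contains a gene that the other lacks.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      exp1=matrix(c(1.2,0.4,2.5),3,1,
      dimnames=list(c("geneA","geneB","geneC"),"exp1"))
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      exp2=matrix(c(0.9,3.1,0.7),3,1,
      dimnames=list(c("geneC","geneA","geneD"),"exp2"))
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      common=intersect(rownames(exp1),rownames(exp2)) # genes in both
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      common # "geneA" "geneC"
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      both=cbind(exp1[common,,drop=FALSE],exp2[common,,drop=FALSE])
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      both # one row per shared gene, one column per experiment
    </input>
  </session>>

  Indexing both matrices with the same vector of row names lines the rows up
  by gene, regardless of the order in which they were stored.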

  <subsection|Functions>

  Last time we talked a bit about functions. Now some more.

  Let us define a simple function:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f=function(x) x+1
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f(4)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f(9.3)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      x=17
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f(2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      x
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  So, a function is very easy to define: we just write
  <verbatim|function(x)> followed by an expression.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      square= function(z) z^2
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      square(3)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      square(1:5)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      square
    </input>
  </session>>

  We can see that functions can simply be stored in variables, like numbers
  or strings.

  Functions can also take several arguments:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      mult = function(x,y) x*y
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      mult(2,3)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      mult(1:4,2:5)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  We can also apply a function to each element of a vector:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=1:10
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f=function(x) x+1
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sapply(a,f)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  <verbatim|sapply()> applies the function to each element of the vector and
  returns the results.

  If the function itself returns a vector (of the same length each time), we
  get a matrix with one column per element:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      g=function(x) c(x,x+1)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      g(2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      g(5)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sapply(1:4,g)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Sometimes we need more than one expression for the calculation. In that
  case we can enclose the calculations in <verbatim|{}>. The value of the
  function is then the value of the last expression.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      die.roll.6=function(){ x=sample(1:6,1); x==6 }
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      die.roll.6()
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Assignments inside a function do not affect variables outside it:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      x=100
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      die.roll.6()
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      x
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  To see what is happening in a function, we can add print statements:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      die.roll.6=function(){ x=sample(1:6,1); print(x); x==6 }
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      die.roll.6()
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  R also has conditional statements and loops.

  <subsection|Conditionals>

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f = function(x) { if( x \<gtr\> 3) 4 else 5}
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f(2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f(6)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>
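
  <verbatim|if> looks at a single condition, so <verbatim|f> above is meant
  to be called with one number at a time. As a sketch of one way to handle a
  whole vector, the function <verbatim|ifelse()> makes the choice element by
  element. The name <verbatim|f2> is an assumption for illustration.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f2=function(x) ifelse(x \<gtr\> 3, 4, 5) # hypothetical vector version
      of f
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f2(2) # 5
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      f2(1:6) # 5 5 5 4 4 4
    </input>
  </session>>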

  <subsection|Loops>

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      for(i in 1:10) print(i)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=1
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      for(i in 1:10) a = a * i
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    \;
  </session>>

  <verbatim|for> has the following structure: you give a variable that
  iterates over a vector or list of things. The expression after the
  parentheses is evaluated once for each value of <verbatim|i>.

  \;

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>
  </session>>
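
  The loop above multiplies the numbers from 1 to 10, so <verbatim|a> ends up
  as the factorial of 10. As a sketch, the same loop can be wrapped in a
  function and checked against the built-in <verbatim|prod()>. The name
  <verbatim|fact> is an assumption for illustration;
  <verbatim|seq_len(n)> is used instead of <verbatim|1:n> so that the loop
  body is skipped when <verbatim|n> is 0.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      fact=function(n) { r=1; for(k in seq_len(n)) r = r * k; r } #
      hypothetical helper
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      fact(10) # 3628800
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      prod(1:10) # same value without an explicit loop
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sapply(1:5,fact) # 1 2 6 24 120
    </input>
  </session>>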

  \;

  \;

  \;
</body>

<\initial>
  <\collection>
    <associate|sfactor|5>
  </collection>
</initial>

<\references>
  <\collection>
    <associate|auto-1|<tuple|1|1>>
    <associate|auto-2|<tuple|1.1|1>>
    <associate|auto-3|<tuple|1.2|2>>
    <associate|auto-4|<tuple|1.3|5>>
    <associate|auto-5|<tuple|1.4|6>>
    <associate|auto-6|<tuple|1.5|7>>
    <associate|auto-7|<tuple|1.6|9>>
    <associate|auto-8|<tuple|1.7|9>>
    <associate|toc-1|<tuple|<uninit>|?>>
    <associate|toc-2|<tuple|<uninit>|?>>
    <associate|toc-3|<tuple|<uninit>|?>>
    <associate|toc-4|<tuple|<uninit>|?>>
    <associate|toc-5|<tuple|<uninit>|?>>
    <associate|toc-6|<tuple|<uninit>|?>>
    <associate|toc-7|<tuple|<uninit>|?>>
    <associate|toc-8|<tuple|<uninit>|?>>
  </collection>
</references>

<\auxiliary>
  <\collection>
    <\associate|toc>
      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|Lecture
      3> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-1><vspace|0.5fn>

      <with|par-left|<quote|1.5fn>|factors
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-2>>

      <with|par-left|<quote|1.5fn>|read.table
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-3>>

      <with|par-left|<quote|1.5fn>|names <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-4>>

      <with|par-left|<quote|1.5fn>|What is the difference between matrices
      and data.frames? <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-5>>

      <with|par-left|<quote|1.5fn>|Functions
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-6>>

      <with|par-left|<quote|1.5fn>|Conditionals
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-7>>

      <with|par-left|<quote|1.5fn>|Loops <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-8>>
    </associate>
  </collection>
</auxiliary>
