<TeXmacs|1.0.6>

<style|generic>

<\body>
  \;

  \;

  <section|Lecture 8 - access to databases>

  R makes it easy to connect to databases of different types. Accessing a
  database is good because it gives you a central storage, and you can read
  just the parts of the data that you need.

  Currently the interface to databases in R is somewhat under flux.
  Eventually, it might be possible to just have one uniform interface to all
  databases.

  Today we will see how to access 3 types of databases: The Oracle database
  (Bjoern), the PostgreSQL (Johan), and Access/Excel (Windows)

  Basically, accessing a database involves reading and writing tables to and
  from the database.

  To access a database, we need the following parts:

  <\itemize-dot>
    <item>connecting to a database - this will give us a variable that stores
    the ``connection'', with which we access everything.

    <item>Seeing what tables are in the database

    <item>reading a table, or part of a table through a query

    <item>writing a table

    <item>deleting a table

    <item>disconnecting
  </itemize-dot>

  <subsection|Interfacing with an Oracle database>

  <subsubsection|Connecting to the database>

  This consists of 2 steps:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      library(ROracle)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      con=dbConnect(dbDriver("Oracle"),user="lachmann",password="alexandria",dbname="TEST_DB")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Now we are connected to the database. All our commands will use
  <verbatim|con>.

  Let us see what we have in the database:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListTables(con)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Let us start with simple things:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      data(ToothGrowth)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      ls()
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      ToothGrowth
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  the data() statement (which has nothing to do with databases or Oracle)
  loads sample data that is provided by some libraries. This is convenient
  for examples.

  We can save a table to the database:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbWriteTable(con,"hamster",ToothGrowth)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbReadTable(con,"hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbReadTable(con,"hamster",row.names="ROW_NAMES")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  So, we can load and save.\ 

  And, we can remove tables:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbRemoveTable(con,"hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListTables(con)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  (I don't know why we get the above error message)

  This is already fairly convenient - it gives a convenient way to exchange
  data.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  <subsection|Simple queries>

  One of the strength of interfacing with a database is that we can not just
  read a table, but format the data as a new table. For example:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbGetQuery(con,"SELECT * from hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbGetQuery(con,"select len,supp from hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbGetQuery(con,"select len,supp,dose from hamster where dose \<gtr\>
      1.0")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  or, something more complicated:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      class(con)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      ?dbWriteTable
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbGetQuery(con,"select len from hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      b=data.frame(1:60,len=a$LEN,lensq=a$LEN^2); b=b[sample(1:60),]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      b
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbWriteTable(con,"hamsq",b)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbGetQuery(con,"select a.dose,b.lensq,a.len,b.len from hamster a, hamsq
      b where a.len=b.len")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbRemoveTable(con,"hamsq")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      ?dbWriteTable
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  <subsubsection|>

  <subsection|Adding data to a table>

  dbWriteTable has a flag: append, which allows us to add data to a table

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbReadTable(con,"hamster",row.names="ROW_NAMES")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a2=a[1:3,]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a2
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      rownames(a2)=100:102
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a2
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbWriteTable(con,"hamster",a2,append=T)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbReadTable(con,"hamster",row.names="ROW_NAMES")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  <subsection|Going slowly through a query>

  If the database is very big, we might not want to load it all at once into
  memory. We might want to compute things slowly over it.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      x=rnorm(10000);y=1:10000
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=data.frame(x=x,y=y)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbWriteTable(con,"bigtab",a)
    </input>

    <\output>
      [1] TRUE
    </output>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Now assume we want to calculate the sum of all x:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbSendQuery(con,"select * from bigtab")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sumx=0
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      while(!dbHasCompleted(quer) ) {\ 

      \ \ dat=fetch(quer,n=1)

      \ \ if( !dbHasCompleted(quer) ){

      \ \ sumx = sumx+dat$X[1]

      \ }

      }
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      fetch(quer,n=3,row.names="ROW_NAMES")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbClearResult(quer)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sumx
    </input>
  </session>>

  You see that going over a table line by line is slow. We can do in in
  batches of 100:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbSendQuery(con,"select X from bigtab")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sumx=0
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      while(!dbHasCompleted(quer) ) {\ 

      \ \ dat=fetch(quer,n=100)

      \ \ if( !dbHasCompleted(quer) ){

      \ \ sumx = sumx+sum(dat$X)

      \ }

      }
    </input>

    <\input|<with|color|red|+ + + + + <with|color|black|>>>
      \;
    </input>

    <\input|<with|color|red|\<gtr\> \<gtr\> <with|color|black|>>>
      dbClearResult(quer)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      sumx
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  The command <verbatim|dbClearResult()> is very important. Without it, we
  can not submit another query. Let us see what can go wrong:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbSendQuery(con,"select X from bigtab")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=3
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbSendQuery(con,"select X from bigtab")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  So, we would like to submit another query, but we can not, and we don't
  have quer any more to call dbClearresult with it.

  There is the following convenient command:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListResults(con)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbListResults(con)[[1]]
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbClearResult(quer)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbSendQuery(con,"select X from bigtab")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Now all went well.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbClearResult(quer)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListTables(con)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbRemoveTable(con,"hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbRemoveTable(con,"bigtab")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListTables(con)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  That was nice! Now let us do it all again in PostgreSQL:

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      library(RdbiPgSQL)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      con2=dbConnect( PgSQL(), host="biodb02", dbname="testdb",user="guest")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListTables(con2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbWriteTable(con2,name="hamster",data=ToothGrowth)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbListTables(con2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a=dbReadTable(con2,"hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      a
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbRemoveTable(con2,"hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  Sadly, there is no way to delete a table....

  \;

  <subsubsection|Queries>

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbGetQuery(con2,"select * from hamster where dose \<gtr\>= 1.0")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  We can seperate the query from the fetching of the result, but there is no
  way to slowly go over the results.

  <with|prog-language|r|prog-session|default|<\session>
    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      class(con2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      quer=dbSendQuery(con2,"select * from hamster")
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbGetResult(quer)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbClearResult(quer)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      dbDisconnect(con2)
    </input>

    <\input|<with|color|red|\<gtr\> <with|color|black|>>>
      \;
    </input>
  </session>>

  \;
</body>

<\initial>
  <\collection>
    <associate|language|english>
    <associate|page-bot|1in>
    <associate|page-even|1in>
    <associate|page-odd|1in>
    <associate|page-right|1in>
    <associate|page-top|1in>
    <associate|page-type|letter>
    <associate|par-width|6.5in>
  </collection>
</initial>

<\references>
  <\collection>
    <associate|auto-1|<tuple|1|?>>
    <associate|auto-2|<tuple|1.1|?>>
    <associate|auto-3|<tuple|1.1.1|?>>
    <associate|auto-4|<tuple|1.2|?>>
    <associate|auto-5|<tuple|1.2.1|?>>
    <associate|auto-6|<tuple|1.3|?>>
    <associate|auto-7|<tuple|1.4|?>>
    <associate|auto-8|<tuple|1.4.1|?>>
    <associate|toc-1|<tuple|<uninit>|?>>
    <associate|toc-2|<tuple|<uninit>|?>>
    <associate|toc-3|<tuple|<uninit>|?>>
    <associate|toc-4|<tuple|<uninit>|?>>
    <associate|toc-5|<tuple|<uninit>|?>>
    <associate|toc-6|<tuple|<uninit>|?>>
    <associate|toc-7|<tuple|<uninit>|?>>
    <associate|toc-8|<tuple|<uninit>|?>>
  </collection>
</references>

<\auxiliary>
  <\collection>
    <\associate|toc>
      <with|par-left|<quote|1.5fn>|Permutation tests
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-1>>

      <vspace*|1fn><with|font-series|<quote|bold>|math-font-series|<quote|bold>|Lecture
      8 - access to databases> <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-2><vspace|0.5fn>

      <with|par-left|<quote|1.5fn>|Interfacing with an Oracle database
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-3>>

      <with|par-left|<quote|3fn>|Connecting to the database
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-4>>

      <with|par-left|<quote|1.5fn>|Simple queries
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-5>>

      <with|par-left|<quote|3fn>| <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-6>>

      <with|par-left|<quote|1.5fn>|Adding data to a table
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-7>>

      <with|par-left|<quote|1.5fn>|Going slowly through a query
      <datoms|<macro|x|<repeat|<arg|x>|<with|font-series|medium|<with|font-size|1|<space|0.2fn>.<space|0.2fn>>>>>|<htab|5mm>>
      <no-break><pageref|auto-8>>
    </associate>
  </collection>
</auxiliary>
