Welcome to data science lab!!

The lab is meant for scientific computing (computational science), and especially collaboration in data science (http://science.sciencemag.org/content/323/5915/721). It is also meant for workshops and small classes (eight terminals plus one for instructor).
As of 2020, these are some of the fastest, latest and greatest computers (short of parallel computing) with 2K monitors for productive scientific computing ( http://journals.plos.org/plosbiology/article?id=10.1371/journal.pbio.1001745), which in general means producing reproducible research by writing computer code and avoiding pointing-and-clicking.


Notably, the lab is open-source: it uses free community-developed software (https://www.linuxfoundation.org, https://www.fsf.org).

location/access

We're in the back on the first floor of 321 Cooper St. You'll need the key--ask any of the team members for help accessing the lab.

team

The lab is a community effort of the following team:
  • lab director / faculty lead: Adam Okulicz-Kozaryn adam.okulicz.kozaryn@gmail.com
  • student volunteers:
  • howto use it? (interface)

    Left-click on the "start" button at bottom-left or right-click anywhere on desktop and pick the program (note! can just hit on keyboard first underlined letter to run the program) . Or just hit F12 and type name of program to run: available software is listed below. To toggle between windows press Alt-Tab. To mimic two monitors, there are two desktops. Desktop 1 (default): hit F1, and desktop 2: hit F2.

    The main computer (named "father") with terminal at the head of the table is connected to the TV. You can also connect your laptop to TV using extra HDMI cable. And there is a remote to turn the TV on.

    teaching in the lab

    All displays can be shown and controlled on the main display (at the head of the table): run as root /home/lab/displays.sh and don't forget to kill it after instruction with /home/lab/displaysKill.sh

    software

    generic
  • run (same as F12)
  • terminal (bash shell: run there linux tools such as sed, awk, and pipe etc)
  • help (this page)
  • data science
  • python-spyder
  • rstudio
  • qgis
  • stata
  • utilities
  • file manager (to browse files like on win or mac)
  • firefox
  • latex-gummi
  • meld
  • text editor-leafpad
  • ooffice
  • pdf-editor
  • monitors

    Note that the 4 terminals in the back have wide monitors and are best suited for data science: can have easily code and interpreter side by side (say dofile editor and stata). And don't forget about two virtual screens using F1 and F2.

    saving/disks

    Write temporary work in /tmp/, it may be deleted after your session; write permanent work into /mnt/commons. Best create there a directory under your name so others know it's you; It will stay, and will be backed up and is accessible for everyone for easy collaboration, hence 'commons'. However, everyone has write access there and it is based on common courtesy that no one alters your work; if you'd like private space where only you can write, email our team member and we will give you a user name and password and assign private space on hard drive.

    printing

    Printer is located to the left of the lab, next to the restroom. You can print from oofice, pdf-viewer or data scinece software.

    data science library

    To rent books ask lab director, Adam.
    O'Reilly books
  • 2013data_science_salary_surv
  • Beautiful_Code
  • building_machine_learning_with_py
  • Code_Simplicity
  • ColumbiaUniversityAppliedDataScience
  • Data_Analysis_with_Open_Source_Tools
  • Data_Mining_in_Social_Science
  • data_science_for_business
  • data_scienceFromScratchJoelGrus
  • data-science-manual-skiena
  • data_source-handbook
  • doing_data_science
  • flowingdata
  • functionalProgrammingPY
  • going-pro-in-data-science
  • hadoopWithPython
  • introduction_to_Python_for_Econometrics,_Statistics_and_Data_Analysis
  • intro_to_mach_Learn
  • mining_the_soc_web_2ed
  • Natural_Language_Processing_with_Python
  • Practical_Machine_Learning_Anomaly_Detection
  • principles_of_big_data
  • programmingCollectiveIntelligenceAug2007
  • Programming_Python_Fourth_Edition
  • Python-Data-Science-Handbook-Jake-VanderPlas
  • python-for-data-analysis-pandas-wes-mckinney
  • python_for_econometrics
  • The_Art_Of_Readable_Code
  • think_bayes
  • think_complexity
  • thinkingWithData
  • think_python
  • think_stats
  • Ed Tufte books, to be added
    Others, to be added

    remote access and file transfer

    You can access the lab remotely via ssh from RU network, email team member for user name and passwd and go to:
    userName@father.us.to
    Likewise, you can transfer your files remotely via scp: https://www.linux.com/learn/intro-to-linux/2017/2/how-securely-transfer-files-between-servers-scp. If you're using mac or windows, just use scp client, if you don't have one just google "scp windows client" or "scp mac client". Or just use google drive, dropbox.com or a similar service. To connect your flash drive/portable hard drive, ask team member for assistance.

    jupyter

    You can run Python, R, or Stata in Jupyter. Browse to to http://father.us.to:9999.
    [If off-campus, install VPN and connect VPN first: https://wiki.td.rutgers.edu/wiki/Category:VPN]
    Then enter Jupyter password provided, then in top-right click 'new' and start say Stata (unless I have already started a notebook for you), then type in your code, say
    sysuse auto
    scatter mpg price
    reg mpg price
    and hit Run button at the top :)

    Note that you can also upload files from the initial Jupter screen (just make sure you hit upload button).

    rstudio

    rstudio is at http://father.us.to:8787/.
    [If off-campus, install VPN and connect VPN first: https://wiki.td.rutgers.edu/wiki/Category:VPN]