56:824:719 sec11 directed (independent) study [aka "data (science) project"] [current syllabus
https://theaok.github.io/dirStu]
Make no mistake, this is not a walk in the park: the bar is high. To get an A, the paper has to be "publishable" at the end of the semester, or at least "publishable" after one set of easily doable revisions as per my final comments.
PhD students are graded differently; the bar is higher: the work needs to be research-sophisticated!
Fall 2018; Thu 2.30-5.30 321 Cooper, computer lab in the back of the first floor [note: you can also use the lab outside of class time--just stop by my office and ask me for the key; this semester I am here especially on Wed and Thu]
note: we may have additional labs
instructor
Adam Okulicz-Kozaryn adam.okulicz.kozaryn@gmail.com
office: 321 Cooper St, room 302; office hours: TBA, and by appointment
this semester always at school on Wed and Thu; usually whole day; stop by
assistant
Shourjya Deb shourjyadb@gmail.com [not here until late Oct]
office: 321 Cooper St (3rd floor); office hours Wed 3-4, and by appointment
prerequisites
You need to be comfortable using a computer. Knowledge of Stata and
data-management/computer science is helpful but not necessary. We will cover the basics.
course description
Essentially, this is a simplified and applied version of the first half
of my data management
class (https://sites.google.com/site/adamokuliczkozaryn/datman)
plus (especially in the second half of the class) working on a
publishable paper (hence, the name: "directed/independent study").
We will focus on doing utilitarian things with data only (no fancy stuff
from the data management class). We will focus way more on interpretation
of results and brainstorming rather than technicalities.
parts from data management course description
Recently, there has been more focus on (and even more need for)
computer science in social science. The reverse is true to some degree,
too: computer scientists are doing social science these days, e.g. http://arxiv.org/pdf/1409.8578.pdf
Unfortunately, the main focus in social science is still on theory and data analysis, while data management is
overlooked. Yet, data management is not only a fundamental part of social science research but also
the part that takes most of the time. This class aims at filling this gap. It is an applied class with
a hands-on approach. You will see many exercises and tutorials.
This is an applied research class that aims to teach computer tools for social science research that
can automate the process and increase academic productivity. Much
of this class is writing computer programs; if you do not like
programming, this class is not for you... But you may not yet know
whether you like it, and you may start liking it in this class: it
has often happened before!
We will use Stata only.
The class covers the principles and practical techniques of data
cleaning, data organization, quality control, and automation of
research tasks. Topics covered include data types, labeling, recoding, data documentation, merging
datasets, reshaping, and basic programming structures such as macros and loops.
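As a taste of the macros and loops mentioned above, here is a minimal do-file sketch. It assumes the auto example dataset that ships with Stata (not class data); the variable list and the new variable name mpgcat are made up for illustration.

```stata
* Sketch only: uses Stata's shipped example data, not class data.
sysuse auto, clear

* A local macro holds a list of variable names.
local vars price mpg weight

* Loop over the list; `v' expands to the current variable name.
foreach v of local vars {
    quietly summarize `v'
    display "`v': mean = " %9.2f r(mean)
}

* Recode a continuous variable into a labeled three-category variable.
* mpgcat is a hypothetical name chosen for this example.
recode mpg (min/19 = 1 "low") (20/29 = 2 "medium") (30/max = 3 "high"), generate(mpgcat)

* Quality control: every car should have landed in some category.
assert !missing(mpgcat)
tabulate mpgcat
```

The point of the loop is that adding a variable to the list is a one-word change, rather than another copy-pasted block of commands.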
learning objectives/outcomes
learn the basics of computer programming
learn the practice of data management (there will be some theory, too)
learn how to conduct reproducible research
learn how to automate research by programming
demonstrate mastery of the material by writing code for a project/paper using learned techniques; you may cowrite code (up to 2 people), but then the project should be 2 times better than a single-authored paper
required textbooks and materials
There are no required textbooks. All required materials (code, readings) will be provided.
recommended course materials
Most of the class is based on: Mitchell
"Data Management Using Stata: A
Practical Handbook" http://www.Stata.com/bookstore/dmus.html
A similar book, but with a focus on organization, is: Long
"The Workflow of Data Analysis Using Stata" http://www.Stata.com/bookstore/wdaus.html
Programming, specifically, is covered in Chris Baum "An
Introduction to Stata Programming" https://www.stata.com/bookstore/introduction-stata-programming/
If you are a beginner you may use:
"A Gentle Introduction to Stata, 3rd Edition"
https://www.stata.com/bookstore/gentle-introduction-to-stata/
Also for beginners: "Statistics with STATA: Version 12" by L
Hamilton; good but overpriced https://www.stata.com/bookstore/statistics-with-stata/
There is actually no need to buy books; there are many excellent free on-line resources:
UCLA is the best website: https://stats.idre.ucla.edu/stata/
UCLA for data management: https://stats.idre.ucla.edu/stata/seminars/stata-data-management/
and many more links here: http://www.Stata.com/links/resources1.html
MORE RESOURCES (not all necessarily recommended but some may be useful):
some website listing resources (skip, you can read at home)
software
We will use Stata version 15 (Intercooled/IC or higher: SE or MP).
Some lucky people can download it for free
at https://software.rutgers.edu;
but, I think, you have to be an RU employee
Free on apps: http://apps.rutgers.edu (not apps.camden.rutgers.edu) (somewhat clunky; good
for computing enthusiasts)
See general instructions at https://oirt.rutgers.edu/software/remotexserver/
QGIS is also at apps.rutgers. First make sure you have it
enabled: http://netid.rutgers.edu, on the left, click "service activation", and activate "apps cloud service".
Then connect to apps.rutgers. Go
to https://apps.rutgers.edu. To
copy files install http://winscp.net, run it
and connect to: Host name: "apps.rutgers.edu"; User name: "your
Rutgers NetID"; Password: "your Rutgers password"
You can buy your own Stata 15 IC/Intercooled perpetual license for $200 https://www.stata.com/order/new/edu/gradplans/campus-gradplan
NOTE: there is not much difference between versions... if you have
Stata >=12 you are fine; if you have Stata 8, it is quite old... I
update every second version; new version is out every 1-2 years...
Stata is cross-platform: linux, mac, windows
auxiliary software
GIT
when submitting ps in git just have one file that you will be
updating for each ps!! [can just call it ps.do]
set up a repo on github.com
or bitbucket.org or something else
that way we will be more productive: I will be able to give you
many more comments and suggestions and offer more help writing code
it is also much more fun! and this is how code must be written if
you are serious about it; to get started just go to one of the above
websites
let me do a quick demo in github
howto get started with GIT (very quick HOWTO on basic use of github.com):
sign up or login at github.com
may see on your right 'New repository' button: click it; or may
need to go to 'repositories', click 'new'; then pick some name for
your repository, keep
selected 'public', important!: must check 'Initialize this repository
with a README', and click button 'Create repository'
now can simply hit button 'Upload files' and choose your dofile,
important: add some meaningful commit message, say: 'first try on
importing and exporting data, submitted as ps1' and hit 'Commit
changes' button
then hit 'Settings' towards the top, and then on the left select
'Collaborators' tab and add me "theaok" and hit 'Add collaborator'; that's it!
then I will download it, edit, and upload back
then you can click my commit message and see the so called
diff--the difference between your version and my version
then you can
download this latest version first, edit it, and upload it back when
done--don't forget about a meaningful commit message--can keep on
uploading newer versions as many times as you like
note: when you click the file, you can then click 'History' and
see how the file evolved over time :)
below are general references on how to get started using it fully,
probably the first two are most useful
http://www.sitepoint.com/git-for-beginners/
http://rogerdudler.github.io/git-guide/
http://stackoverflow.com/questions/315911/git-for-beginners-the-definitive-practical-guide
https://backlogtool.com/git-guide/en/intro/intro1_1.html
more about GIT
Tech Talk: Linus Torvalds on git http://www.youtube.com/watch?v=4XpnKHJAok8
a guide to git on windows http://nathanj.github.com/gitguide/tour.html
An introduction to git from The Chronicle of Higher Ed http://chronicle.com/blogs/profhacker/a-gentle-introduction-to-version-control/23064
a general paper about workflow (incl latex, git, emacs) http://www.kieranhealy.org/files/misc/workflow-apps.pdf
requirements
Strictly speaking, this is advice rather than a requirement, but in
practice it is really a requirement, as it is virtually impossible to
succeed otherwise! Ask many questions, and ask often. This is a
software class. It is different from all other classes! You will get
stuck often, and whenever stuck, email me, as opposed to pulling
your hair out! And stop by my office, too.
There are 6 problem sets (ps), each due the week after being
posted (unless indicated otherwise; some ps will be due in 2 weeks). You will be asked to write some computer code that
applies something that we covered in class to your data. You may
work in groups (<=2), but indicate who you worked with,
and the more people in the group, the better/longer the code should be.
Students will write an empirical paper/report/etc. on any topic using one or more of the
techniques covered in this course. A typical paper will be 5 to 20
double-spaced pages. I will give you comments and help with the
paper, and it is a good opportunity to produce a paper. I will
also grade the code that you wrote to produce the results in your paper.
You will submit not only the paper, but also the code that produced the results in
the paper; in fact, you can just submit the code.
Ideally, the paper should be submitted to a professional journal for publication.
grading
problem sets 60% (6ps x 10%)
empirical paper (code, too; incl. presentation (cool code and some
cool output, esp. graphs; 2 x 5%)) 40%
min | max | grade |
90.0 | 100.0 | A |
85.0 | 89.9 | B+ |
80.0 | 84.9 | B |
75.0 | 79.9 | C+ |
70.0 | 74.9 | C |
0 | 69.9 | F |
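To make the arithmetic concrete, here is a small sketch of how a final grade could be computed from the weights and the table above. All scores are hypothetical, each is assumed to be on a 0-100 scale, and the letter is assigned by the lower bound of each band (an assumption about how scores between, say, 89.9 and 90.0 are treated).

```stata
* Hypothetical scores on a 0-100 scale (assumptions, for illustration only).
local ps_scores 90 85 100 70 95 80
local paper 88

* Weighted total: the paper counts 40%, each of the 6 problem sets 10%.
local total = 0.40 * `paper'
foreach s of local ps_scores {
    local total = `total' + 0.10 * `s'
}
display "weighted total: " %5.1f `total'

* Map the total to a letter using the lower bound of each band.
local grade "F"
if `total' >= 70 local grade "C"
if `total' >= 75 local grade "C+"
if `total' >= 80 local grade "B"
if `total' >= 85 local grade "B+"
if `total' >= 90 local grade "A"
display "letter grade: `grade'"
```

With these made-up scores the weighted total should come to roughly 87.2, which falls in the B+ band.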
calendar
warning! don't get behind: learning curve may be steep
tentative: the most up-to-date calendar is always on the website:
the url is at the top of this document
(university calendar: http://scheduling.rutgers.edu/calendar.shtml)
calendar is continuously updated: see timestamps on slides; best save
or print them at the beginning of the class (I will not print
for you); almost all
changes will be minor; I will tell you if there is any bigger
change; the further ahead the class, the less up to date it is
when printing handouts you can print multiple slides per sheet (I
like 6) http://kb2.adobe.com/cps/332/332720.html#main_Print multiple
[*] means bonus (extra/not required)
sep13 introduction
Overview of the class material and policies. We will fire up Stata
and have a look at Stata's text editor. And go over step-by-step GIT above
https://theaok.github.io/dirStu#gitSta
ps0.pdf
over next week: very important! and if you haven't used Stata,
familiarize yourself with it: see above links, especially ucla
website; the learning curve may be steep soon!
intro.pdf intro.do
replication.pdf
if time: let's discuss your research interests and data for this
class (ps0)
[*] Data revolution! Interesting articles from the Economist: http://www.economist.com/node/15557443 and
everything in one pdf file: https://www.emc.com/collateral/analyst-reports/ar-the-economist-data-data-everywhere.pdf
[*] "The end of theory" from the wired magazine http://www.wired.com/science/discoveries/magazine/16-07/pb_theory
data management
sep20 data reading/saving (formats/conversion) and manipulation vid
We will talk about different basic data formats, conversion
between them, and how they can be imported/exported to/from Stata
ps1.pdf and ps2.pdf
readAndManipulate.pdf readAndManipulate.do
https://stats.idre.ucla.edu/stata/seminars/notes/stata-class-notesentering-data/
https://stats.idre.ucla.edu/stata/seminars/notes/stata-class-notesexploring-data
https://stats.idre.ucla.edu/stata/seminars/notes/stata-class-notesmodifying-data
[*] if time: flip the class: we will flip the last half an hour or so
[*] Mitchell ch1-5,7
[*] This is for next week really, but it would help if you start
looking at these asap, will also help with understanding better ps2: https://www.stata.com/manuals/u22.pdf
and
https://www.princeton.edu/~otorres/Merge101.pdf
[!] NOTE: around this time we need to get your project going:
you need to have your own data and be reasonably comfortable with it so that
you can be productive with it and we can work remotely on it;
typically, we'll need to meet a few times around this time! there
will be assignments due and we will not slow down!
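For the sep20 material on data formats and conversion, a round trip between formats can be sketched as follows. This is only an illustration: it assumes the auto example dataset that ships with Stata, and the file names are made up and get written to the current working directory.

```stata
* Sketch only: auto is Stata's shipped example data; file names are made up.
sysuse auto, clear

* Stata's native format.
save "auto_copy.dta", replace

* Comma-separated text, readable by almost any program.
export delimited using "auto_copy.csv", replace

* Excel, with variable names in the first row.
export excel using "auto_copy.xlsx", firstrow(variables) replace

* Read each format back in and check that all 74 cars survived the trip.
import delimited using "auto_copy.csv", clear
assert _N == 74

import excel using "auto_copy.xlsx", firstrow clear
assert _N == 74

use "auto_copy.dta", clear
assert _N == 74
```

The assert lines are cheap quality control: the do-file stops with an error if a conversion silently drops or duplicates observations.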
sep27 combining data vid
This class covers the key command for this class: merge.
ps3.pdf
start with a look at git repos
start with keep/drop (s36) from last class
mergeAppendReshape.pdf mergeAppendReshape.do
merge conceptual setup:
https://www.stata.com/manuals/u22.pdf
and
https://www.princeton.edu/~otorres/Merge101.pdf
merge practice: (make sure you run examples that start on p12!):
https://www.stata.com/manuals14/dmerge.pdf
a quick overview of what we have done so far and doing today http://dss.princeton.edu/training/DataPrep101.pdf
reshape (also see "help reshape", as usual...) https://stats.idre.ucla.edu/stata/modules/reshaping-data-wide-to-long/ https://stats.idre.ucla.edu/stata/modules/reshaping-data-long-to-wide/
[*] Mitchell ALL
flip the class
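For the sep27 material, here is a minimal sketch of merge and reshape on tiny hand-typed datasets. All values, variable names, and the tempfile name are invented for illustration; the class materials (mergeAppendReshape.do) may do this differently.

```stata
* --- merge: combine two datasets that share the key variable id ---
clear
input id str10 name
1 "Ann"
2 "Bob"
3 "Cat"
end
tempfile people
save `people'

clear
input id income
1 50
2 60
4 70
end

* One-to-one merge on id; Stata adds _merge to show where each row came from.
merge 1:1 id using `people'
tabulate _merge

* ids 1 and 2 match; id 4 is master only; id 3 is using only.
assert _N == 4
count if _merge == 3
assert r(N) == 2

* --- reshape: one row per person (wide) to one row per person-year (long) ---
clear
input id inc2016 inc2017
1 50 55
2 60 66
end

reshape long inc, i(id) j(year)
assert _N == 4
list

* And back to wide.
reshape wide inc, i(id) j(year)
assert _N == 2
```

Always look at _merge after merging: unmatched observations are usually the first sign that the key variable is not what you thought it was.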
oct4 Slow down, make sure everybody got merge and read/manipulation basics, discussion, brainstorm: have a look at what you have so far in github.
oct11 continue with merge: digest, practice; and: organization,
documentation vid
merge again: Start with a look at your repos, eg Rachel
We already covered key topics, so we
make sure today that everything is crystal clear. Also, labeling data, variables, and values. Keeping your data
organized.
if needed: revisit the code from past
organize.pdf organize.do
[*] https://stats.idre.ucla.edu/stata/modules/labeling-data/
[*] Scott Long "The Workflow of Data Analysis Using Stata" ALL
[*] Mitchell ch5
wrap up what we did so far, review, revisit, maybe do http://dss.princeton.edu/training/DataPrep101.pdf; new module next week!
flip the class: flip the last 45min or so; maybe: exercises from
last class; tutorials from links from last class
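For the oct11 topic of labeling data, variables, and values, a minimal sketch might look like the following. It assumes Stata's shipped auto dataset; the label text, the value-label name origin_demo, and the note are made up for illustration.

```stata
* Sketch only: uses Stata's shipped example data.
sysuse auto, clear

* Label the dataset and a variable.
label data "1978 automobile data (class demo copy)"
label variable mpg "Mileage in miles per gallon"

* Define a value label once, then attach it to a variable.
* origin_demo is a hypothetical label name.
label define origin_demo 0 "Domestic" 1 "Foreign"
label values foreign origin_demo

* Attach a free-text note to a variable for documentation.
notes mpg: relabeled for class demo

* Inspect the result.
describe mpg foreign
tabulate foreign
notes
```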
oct18 graphics and exporting results vid
Graphics is critical in understanding data, and understanding data is critical in data management.
Results from Stata are data too, and we need to manage them too!
ps4.pdf
graphics.pdf graphics.do
exp.do and see at home http://dss.princeton.edu/training/Outreg2.pdf
AND a great resource for regressions: just follow these examples in
your dofile! https://stats.idre.ucla.edu/stata/webbooks/reg/
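For the oct18 topic, here is a minimal sketch of saving a graph to a file and putting regression results side by side. It uses only built-in commands (estimates store and estimates table) rather than the user-written outreg2 covered in the handout above; the dataset is Stata's shipped auto data, and the file name and model names m1 and m2 are made up. Exporting to png assumes a Stata installation with graphics support.

```stata
* Sketch only: uses Stata's shipped example data.
sysuse auto, clear

* Scatter plot with a fitted line, saved to a file that a paper can include.
twoway (scatter price mpg) (lfit price mpg), title("Price vs. mileage")
graph export "price_mpg.png", replace

* Two nested regressions, stored and shown side by side.
regress price mpg
estimates store m1

regress price mpg weight
estimates store m2

estimates table m1 m2, b(%9.2f) se(%9.2f) stats(N r2)
```

Because the graph and the table are produced by code, rerunning the do-file after any data fix regenerates both without manual copy-pasting.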
oct25 presentations: go over your graph code, practice graphics; and if time: some data management theory
we will pick up with exporting results from the previous class
please make sure you added some graphs to github or jupyter; I
will add graphs there too, and we will spend most of the class
discussing graphs for each of you
in class: let's try for your data: hist, perc; tab, plot sort; gr matrix;
scatter, mlab(UA); bar charts
theory.pdf try to do as much as we can
and need: possibly just get to key slides based on scott long in
first part as for theory for soc sci; and quick look at IT theory
from the box in plos one article
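The in-class graph commands listed for oct25 could be tried along these lines. This is a sketch on Stata's shipped auto dataset rather than your own data; the variable make stands in for whatever label variable (UA in the list above) your data has.

```stata
* Sketch only: uses Stata's shipped example data.
sysuse auto, clear

* Histogram with percentages on the y axis.
histogram mpg, percent

* Frequency table with a text bar plot, sorted by frequency.
tabulate rep78, plot sort

* Scatterplot matrix of several variables at once.
graph matrix price mpg weight

* Scatter plot with each point labeled by a variable.
scatter price mpg, mlabel(make)

* Bar chart of a mean by group.
graph bar (mean) price, over(foreign)
```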
part2: working on paper: directed study/project part
Now, having covered Stata, we will focus on producing the research,
work more one-on-one, and spend class time on discussions and
brainstorming; note: we will spend most of each class discussing your research; be
prepared and have something new each class; you may also want to have a brief
presentation of what you have accomplished since last week
nov1 theory; discuss final project; discuss your research vid
ps5.pdf
presentations of ps4: graphs
we will focus on research questions and hypotheses for your
projects; and data, variables and execution of testing of your
hypotheses (also see ps5)
nov8
ideally bring and present a draft of your ps5
also, while not covering new stata material, data and stata
questions and discussions are also a part of this second part of the class
nov15
ps6.pdf
do quick theory from last class
final_project.pdf (esp sec: inline response and activism v science)
ps5 presentations
continue discussing your projects: again please do take into
account discussions from last week and improve your papers
accordingly; also, as always, in this second part of the course, be
prepared to present and discuss improvements in your papers AND new
ideas/directions; also, bring any questions you may have
nov20 Tues!(Thanksgiving change of schedule)
pick up with final_project.pdf; maybe esp: lit rev
discuss our comments from last week and your responses to them
(ideally, you may bring and present a draft of your inline response
to them)
continue discussing your projects: again please do take into
account discussions from last week and improve your papers
accordingly; also, as always, in this second part of the course, be
prepared to present and discuss improvements in your papers AND new
ideas/directions; also, bring any questions you may have
nov29 presentations of ps6, and final project discussions
make sure you record all comments (verbal and written) and copy-paste (verbatim!) into
the next assignment and respond to them
if time revisit final_project.pdf
dec6: last class!
final project presentations 15min max; this is really
important: I will give you a bunch of comments by email and
verbally, and so may others: please save these comments and respond
to all of them inline at the beginning of your final project!
final project discussions: focus on Rachel and Sarah
wrap.pdf
final project due on wed dec12 at 10pm
when submitting final project pdf, don't forget about the stata
code!
final_project.pdf
just to be safe, delete the data you have posted online, you never know: someone may be picky about it
rules
do not share or link to class videos!
These videocasts and podcasts are the exclusive copyrighted property of Rutgers University and the Professor teaching the course. Rutgers University and the Professor grant you a license only to replay them for your own personal use during the course. Sharing them with others (including other students), reproducing, distributing, or posting any part of them elsewhere -- including but not limited to any internet site -- will be treated as a copyright violation and an offense against the honesty provisions of the Code of Student Conduct. Furthermore, for Law Students, this will be reported by the Law School to the licensing authorities in any jurisdiction in which you may apply to the bar.
attendance
Attendance is recommended. Be advised that you are
responsible for any material covered in the class, whether or not it was in the readings or
lecture notes. You are also responsible for any announcements made in class. For most
students, attendance is simply essential to learning the material. If you do need to miss a
class, be sure to consult with a fellow student to learn what transpired.
incompletes: Generally speaking, the material in this course is best learned as a single unit. I
will grant incompletes only in cases where a substantial change in life circumstances occurs that
is beyond the control of the student, and only with appropriate
documentation.
study groups. You are encouraged to form a regular study group. Many students over the years
have found the study groups to be very helpful. Study groups are permitted and encouraged to
work on the problem sets together. However, each individual student should write up his or her
own answer to hand in, based on his or her own understanding of the material. Do not hand in a
copy of another person’s problem set, even a member of your own group. Writing up your own
answer helps you to internalize the group discussions and is a crucial step in the learning process.
Academic Integrity. I am very serious about this. Make no
mistake--I may appear accommodating and informal--but I am extremely
strict about academic integrity. Violations of academic integrity include cheating on tests or handing in
assignments that do not reflect your own work and/or the work of a study group in which you
actively participated. Handing in your own work that was performed not
for this class (e.g. other class, any other project) is cheating,
too. I have a policy of zero tolerance for cheating. Violations will be referred
to the appropriate university authorities.
For more information see http://fas.camden.rutgers.edu/student-experience/academic-integrity-policy
Accommodating Students with Disabilities.
Any student with a disability affecting performance in the class
should contact the disability office ASAP: http://learn.camden.rutgers.edu/disability/disabilities.html