56:219:521 DATA VISUALIZATION (dat sci)
56:824:728 DATA VISUALIZATION (soc sci/pub pol)
56:834:653 DATA VISUALIZATION (soc sci/pub pol)
https://theaok.github.io/vis most current syllabus (class materials updated continuously)
rucvis@googlegroups.com listserv (everyone in class gets these
emails, use often!) [if you didn't get welcome email, email me!: critical you're on the list!]
Spring 2025; Thu 6.00-8.50pm BSB-134
- instructor: Adam Okulicz-Kozaryn adam.okulicz.kozaryn@gmail.com
- office: 321 Cooper St, 1st fl lab in the back; office hours: Thu 1-2, and by appointment
- or just stop by: this semester I am in most of Tue and Thu in the afternoons
prerequisites
No prerequisites, but ability to learn programming is necessary. You need to be comfortable using a computer. Knowledge of Python and/or computer science/programming/scripting is helpful but not necessary. We will cover the basics.
social science/humanities students:
This class is mostly coding/programming/scripting.
If you do not like programming, this class is not for you. But you may not yet know
whether you like it and you may start liking it in this class: it often happened before!
Warning for people new to coding: don't fall behind!
course description
It is an
interdisciplinary applied data science class focused on visualization, an
integral part of data science. We will also cover online visualization
from within Python (glue lang). Visualization is perhaps the most rewarding part
of data science as it produces insight, "aha moments." It is also perhaps the only part of
data science that involves art: designing graphics.
Some data management will also be covered as
necessary to process data for visualization. We will mostly use Pandas
and Matplotlib (and others building on it).
Course is relevant for natural and social
science, and quantitative/digital humanities.
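As a taste of the Pandas and Matplotlib workflow mentioned above, here is a minimal sketch. The data values and column names are made up for illustration; in class you will use your own data. It does a bit of data management (a year-over-year change) and then draws a basic line plot through the pandas plotting interface, which builds on Matplotlib.

```python
import pandas as pd

# Made-up toy data (hypothetical values, for illustration only)
df = pd.DataFrame({
    "year": [2020, 2021, 2022, 2023],
    "value": [1.0, 1.5, 1.2, 2.0],
})

# A bit of data management before plotting: year-over-year change
df["change"] = df["value"].diff()

# DataFrame.plot uses Matplotlib under the hood and returns a Matplotlib Axes,
# so the usual Matplotlib customization calls work on the result
ax = df.plot(x="year", y="value", marker="o", legend=False)
ax.set_xlabel("year")
ax.set_ylabel("value")
ax.set_title("Toy example: value over time")
```

In a Colab notebook the figure should show up below the cell; further styling can be done on `ax` with regular Matplotlib calls.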
learning objectives/outcomes
data visualization/story telling using graphics (most of the class)
about data (sources, best practices, tips and tricks): this class
is all about data (you will use the data you chose
that will serve you well beyond this class!!)
the basics of computer programming (Python)
The key is the mastery of "data story-telling": 1) what the data are
telling, 2) what I want to say, and 3) what the audience needs to know
required textbooks and materials
No required textbooks. All required materials (code, readings) will be provided.
recommended course materials
galleries [Py]
general
comprehensive https://www.python-graph-gallery.com/
matplotlib (if you want to really customize it, most
powerful/versatile way to do graphs but sometimes complicated code)
https://matplotlib.org/stable/gallery/index.html
and see notebook sec 'basics / setup with matplotlib'
pandas (very easy syntax, we use pandas for data management anyway; full
documentation, rather dry and boring)
https://pandas.pydata.org/docs/user_guide/visualization.html
basic plot function http://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.plot.html
can also try pandas' rplot, especially trellis (note: these docs are for an old pandas version, 0.14.1):
http://pandas.pydata.org/pandas-docs/version/0.14.1/rplot.html
and
http://pandasplotting.blogspot.com
others
seaborn (easy, fast, pretty!) https://seaborn.pydata.org/examples/index.html
plotly (interactive: pan, hover, pop up) https://plotly.com/python
galleries [concept/theory]
https://chart.guide
https://datavizproject.com
https://www.smashingmagazine.com/2023/01/guide-getting-data-visualization-right/
A useful intro read; I like the breakdown: Comparison, Composition,
Distribution, and Relationship; also by level of measurement. And see the
bunch of useful links at the bottom. Quite comprehensive/exhaustive,
almost like python-graph-gallery.com, but in general you don't really need all of
that: say you get 90% of functionality with 10% of charts. In
general, there's much fanciness/novelty seeking, resulting in a
proliferation of vis with little value added and an increasing
probability of getting overwhelmed.
more like a blog but lots of great stuff https://flowingdata.com
online books/tutorials (traditional, lengthy, overly detailed, but
great if you want a textbook/full elaboration)
looks great https://realpython.com/tutorials/data-viz/
creator of Pandas, up to date https://wesmckinney.com/book/, incl notebooks: https://github.com/wesm/pydata-book
maybe especially ch3 and ch4 https://jakevdp.github.io/PythonDataScienceHandbook/
see dat sci, gis, etc a-gallery-of-interesting-jupyter-notebooks
software
[right before the break so can troubleshoot during the break]
python
We will use Python 3.x (>=3.10).
It is free for Linux, Chromebook, Mac, and Win; say can get
anaconda or from python.org:
https://www.anaconda.com/products/distribution
or
https://www.python.org/downloads
and can run it in RStudio or Stata-like environment with http://spyder-ide.org
BUT no need to download or install any software: we will run
Python online in a web browser in the cloud, so-called "Colab" (2
sections down). But first let's get GitHub running.
GitHub
We will use GitHub to store the Python code in form of a notebook, we
will edit the notebook in colab (next sec).
sign up or login at github.com
(depending on OS and browser) on top left hit "New" or "Create
Repository", or top right under plus "+" select "New repository"
pick some repository name, say "vis"; keep 'Public' selected; important!: under "Initialize this repository
with" check "Add a README file"; and hit "Create repository" at the bottom
then hit "Settings" towards the middle-top right; on the left select
"Collaborators" tab and hit "Add people" : "theaok", and hit "Add theaok to this repository"
workflow: my comments, diffs, inline response [let's go over this next week again]
i will run it in my Colab, edit, and upload back
diff and response to my comments: actually cleaner and better in
colab: File-Revision history; or clunky in GitHub:
can click my commit message and see the so called
diff--the difference between your version and my version: important!
do make sure to fix it up for next ps, you may even have inline
response to my comments in your next ps (especially if sth complex
or if you disagree)
don't forget a meaningful commit message; you can keep on
uploading newer versions as many times as you like
note: when you click the file, you can then click 'History' and
see how the file evolved over time :)
a thought about file naming: ps1.ipynb, ps2.ipynb, etc, or
sections in one file; or just one file that you keep updating throughout with new stuff as we go!
colab
You can just run Py notebook in Colab and save subsequent versions in
Github that will keep track of changes [stick with this for the ps]
go
to https://github.com/theaok/vis/blob/main/all.ipynb
and hit 'open in colab'
OR go
to https://colab.research.google.com
and on popup pick GitHub, search for:
https://github.com/theaok/vis/blob/main/all.ipynb
(it should find it; click "all.ipynb" and it should load into colab; then
follow the instructions at the top of the file, ie save it in your
GitHub etc)
and best class vis:
https://github.com/theaok/vis/blob/main/bestStudentVis.ipynb
https://github.com/ewattudo/vis1
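A shortcut for opening a GitHub notebook in Colab: the Colab links used elsewhere in this syllabus follow the pattern colab.research.google.com/github/ followed by the same user/repo/blob/branch/file path as on GitHub. The sketch below builds such a link; the function name `github_to_colab` is made up for illustration.

```python
from urllib.parse import urlparse

COLAB_PREFIX = "https://colab.research.google.com/github"

def github_to_colab(url: str) -> str:
    """Turn a github.com notebook URL into the matching Colab URL."""
    parts = urlparse(url)
    # parts.path is expected to look like /<user>/<repo>/blob/<branch>/<file>
    if parts.netloc != "github.com" or "/blob/" not in parts.path:
        raise ValueError(f"not a GitHub file URL: {url}")
    return COLAB_PREFIX + parts.path

print(github_to_colab("https://github.com/theaok/vis/blob/main/all.ipynb"))
# → https://colab.research.google.com/github/theaok/vis/blob/main/all.ipynb
```

Pasting the printed link into the browser should open the notebook in Colab directly; you still need to save a copy to your own GitHub as described above.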
data
The class is a bit like an independent study: you will carry out some
research (by doing visualizations).
You need your own data for this class ASAP, the more data and the more
complex, the better. Software will need to load the data straight up
from online! Some data are easily downloadable from online
eg https://gss.norc.org/get-the-data/stata,
but many are not. Then you have to put data online yourself [just go
over Git<25mb]:
https://theaok.github.io/generic/howToPutDataOnline.html
icpsr: biggest repository of survey data; check out also var search
google is great for data search; and it has data search, too
Google Cloud/BigQuery has data, too
kdnuggets listing of sources, a lot!; kdnuggets is great in general for data science
another kdnuggets listing; maybe actually better start here, easier to wrap your head around
kaggle
NOAA
NASA
datasets on GitHub
pew
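If you put your data in your own GitHub repository, software needs the "raw" file address rather than the regular GitHub page. A minimal sketch of the conversion follows; the repository `your-username/vis` and file `data.csv` are hypothetical placeholders, and the function name `github_to_raw` is made up for illustration.

```python
from urllib.parse import urlparse

def github_to_raw(url: str) -> str:
    """Convert a github.com file page URL to its raw.githubusercontent.com address."""
    parts = urlparse(url)
    pieces = parts.path.strip("/").split("/")
    # Expected pieces: [user, repo, "blob", branch, path...]
    if parts.netloc != "github.com" or len(pieces) < 5 or pieces[2] != "blob":
        raise ValueError(f"not a GitHub file URL: {url}")
    user, repo, _, branch, *path = pieces
    return "https://raw.githubusercontent.com/" + "/".join([user, repo, branch, *path])

# Hypothetical repository and file name, for illustration only
raw_url = github_to_raw("https://github.com/your-username/vis/blob/main/data.csv")
print(raw_url)
# → https://raw.githubusercontent.com/your-username/vis/main/data.csv

# In a notebook, the next step would be to load it straight from online:
#   import pandas as pd
#   df = pd.read_csv(raw_url)
```

The `pd.read_csv` call is left commented out because the address above does not point to a real file; replace it with your own.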
advice/requirements and grading
2 keys to success: start early AND ask many questions, often (and study groups: get a couple of people on zoom, screenshare notebooks, etc). This is a
software class. It is different from typical soc sci classes! You will get
stuck often, and whenever stuck, email the listserv, ask me, ask your
classmates, as opposed to pulling
your hair out! And stop by my office, too. Googling (and built-in Gemini) solves most
problems, but for many things it's better to talk to me and your
classmates; it's also more social/human: if you talk to a computer all the
time, it's not healthy.
There are several problem sets (ps), each due the following week or,
typically, 2 weeks after being
posted (as indicated in the ps). You will be asked to write some computer code that
applies something we covered in class to your data. You may
work in groups (<=2), but say who you worked with;
the more people in the group, the better/longer the code must be.
Final project (ps5) is like final paper (doing some useful empirical
quantitative research), except that I only grade code, in fact you can submit
code only.
100% (5ps x 20%) problem sets [just Py notebook]; you may cowrite code (up to 2 people) but then
the project should be 2 times better than a single-authored one
bonus/extra up to 5% engagement, class participation
eg answering/asking questions, helping others, listserv
discussions
bonus/extra up to 5% civic engagement (see bottom of the syllabus)
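To make the grading arithmetic concrete, here is a small sketch. It assumes each ps is scored on a 0-100 scale and that the two bonuses are simply added as percentage points on top, each capped at 5; both are assumptions for illustration, not the official grading formula. The scores in the example are made up.

```python
def course_grade(ps_scores, engagement_bonus=0.0, civic_bonus=0.0):
    """Course grade in percent: five ps at 20% each, plus capped bonuses.

    Assumes each ps score is on a 0-100 scale and each bonus is in
    percentage points (illustrative assumptions, not official policy).
    """
    if len(ps_scores) != 5:
        raise ValueError("expected exactly 5 problem set scores")
    base = sum(ps_scores) * 20 / 100  # 5 ps x 20% each
    bonus = min(engagement_bonus, 5.0) + min(civic_bonus, 5.0)
    return base + bonus

# Made-up scores for illustration
print(course_grade([90, 80, 100, 70, 85], engagement_bonus=3))
# → 88.0
```

Here the five scores average 85, and 3 bonus points for engagement bring the total to 88.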
calendar
[*] = bonus (extra/not required)
ps0.pdf
see some vids, can see screen with good resolution for coding steps:)
intro.pdf
https://github.com/theaok/vis/blob/main/all.ipynb
[*] if time: final_project.pdf: just skim through TOC
[*] Data revolution! economist data data everywhere
data management
ps1.pdf
data.pdf
continue with notebook
revisit what we did so far, esp difficult topics like merge:
run sec 'merge' again
make it interactive: q and a, work on ps1, wrap up dat man
do dive into vis: discuss what folks did so far in their ps in
terms of vis, what worked and what didn't; get going with next week's
class, at least vis tables and mpl setup first main cell
VIS
feb20 dive into vis: notebook vidSp23
ps2.pdf
go over ps1 comments from listserv; and diff: https://colab.research.google.com/github/soymlk94/datavis_sp24/blob/main/Copy_of_ps1.ipynb
a point about merging and in general data management/processing:
we do not have time to be thorough, this is vis
class and we have to move on! again, simplify, have fewer obs,
subset, do only easier part [btw i teach dat man class]
notebook
feb27 vis in notebook
vid
vidSp23
go over magics and themes/styles and mpl setup again
flip the class work on ps2; present https://github.com/erikaguiracocha/Data-Visualization-2025/blob/main/PS2erikaguiracocha.ipynb
notebook
wrap up mpl: revisit key stuff, q and a (mostly done, then really focus on your
projects/vis: presentations/discussions)
vis by others: examples
theory.pdf
pull up some of your vis / flip the class work on ps2
ps3.pdf
present ps2 (just focus on key/best vis, typically 3-5 graphs) 10min sharp (i will cut you off) and 10min discussion
look into your github repo for my comments (and
explore your own progress): can diff in github but clunky, better in colab! File-Revision history;
let's do like 3 examples incl some of your repos
mar20 sp break no class
mar27 ps3 presentations 10min (sharp!) + 10min discussion
vid
ps4.pdf
ps BONUS! [up to 5pts extCre] present someone else's vis, incl
python code (not too difficult!) (email me your ipynb to get ok and
schedule date; and then email ipynb to rucvis@googlegroups.com ahead
of presentation): can do it next week or later; again email me first
about it
your vis projects (and advanced vis)
we do slow down and focus on your vis projects, flip the class,
present work in progress, etc
we will also cover a few more advanced vis topics (bonus/not required):
interactive vis, maps, etc
apr3 advanced mpl and interactive/plotly/d3
vid
vidSp23
clustering and advanced mpl
https://github.com/theaok/vis/blob/main/plotly.ipynb
if time: flip class and work on ps4; revisit theory
apr10 ps4 presentations
vid
vidSp23
ps5.pdf
time: 9, discussions: 9
apr17 maps
vid
let's quickly go over ps5.pdf again
10min Erika extra credit presentation
https://github.com/theaok/vis/blob/main/map.ipynb
let me go over Erika and Sai ps5 draft, plus general comments for
everyone:
https://colab.research.google.com/github/erikaguiracocha/Capstone-Project/blob/main/Ps5erikaguiracocga.ipynb
and https://github.com/SaiAnirudh659/Vis/blob/main/ps5/SaiAnirudh_PS5.ipynb
10min Shirley extra credit presentation
finish last week's class and revisit, q and a
ad http://theaok.github.io/swb
final_project.pdf: just skim through TOC
check out my
working paper and
vis notebook:
this is important! vis in real world! (0) start with theory/lit/idea; (1) always necessary to manipulate the data for the right vis!; (2) takes a bunch of vis to
find the right one; and while trying to find the best way to tell
the story, let the data speak, don't force it!
also see
https://link.springer.com/article/10.1007/s11482-019-09719-y create
var that is ratio in 1st vis 2nd panel; different levels of
measurement for robustness: country, region, state, county;
cross-section and time series
theory.pdf quickly revisit secs
revisit the class material, q and a: wrap.pdf
if time: ols.ipynb
flip the class and work on ps5
time: 9, discussions: 9
see Canvas for your predicted course grade so far
just to be safe, may delete the data you have posted online, you never know: someone may be picky about it
rules
do not share or link to class videos!
These videocasts and podcasts are the exclusive copyrighted property of Rutgers University and the Professor teaching the course. Rutgers University and the Professor grant you a license only to replay them for your own personal use during the course. Sharing them with others (including other students), reproducing, distributing, or posting any part of them elsewhere -- including but not limited to any internet site -- will be treated as a copyright violation and an offense against the honesty provisions of the Code of Student Conduct. Furthermore, for Law Students, this will be reported by the Law School to the licensing authorities in any jurisdiction in which you may apply to the bar.
attendance
Attendance is recommended. Be advised that you are
responsible for any material covered in the class, whether or not it was in the readings or
lecture notes. You are also responsible for any announcements made in class. For most
students, attendance is simply essential to learning the material. If you do need to miss a
class, be sure to consult with a fellow student to learn what transpired.
incompletes: Generally speaking, the material in this course is best learned as a single unit. I
will grant incompletes only in cases where a substantial change in life circumstances occurs that
is beyond the control of the student, and only with appropriate
documentation.
study groups. You are encouraged to form a regular study group. Many students over the years
have found the study groups to be very helpful. Study groups are permitted and encouraged to
work on the problem sets together. However, each individual student should write up his or her
own answer to hand in, based on his or her own understanding of the material. Do not hand in a
copy of another person’s problem set, even a member of your own group. Writing up your own
answer helps you to internalize the group discussions and is a crucial step in the learning process.
academic integrity. I am very serious about this. Make no
mistake--I may appear accommodating and informal--but I am extremely
strict about academic integrity. Violations of academic integrity include cheating on tests or handing in
assignments that do not reflect your own work and/or the work of a study group in which you
actively participated. Handing in your own work that was performed not
for this class (e.g. other class, any other project) is cheating,
too. I have a policy of zero tolerance for cheating. Violations will be referred
to the appropriate university authorities.
For more information see http://fas.camden.rutgers.edu/student-experience/academic-integrity-policy
accommodating students with disabilities.
Any student with a disability affecting performance in the class
should contact the disability office ASAP: https://success.camden.rutgers.edu/success-services/disability-services/