fa25: one class for both 56:219:523 data science and
56:834:608/56:824:725 public admin/affairs
warning: if you can't learn simple coding in Python, this class is
not for you
56:219:523 GEOGRAPHIC INFORMATION SYSTEMS FOR DATA SCIENCE
https://theaok.github.io/gisPy most current syllabus (class materials
edited continuously)
rugispy@googlegroups.com listserv (everyone in class gets these
emails, use often!) [email me if you didn't get the welcome email or can't
email the listserv (I may need to add your alternative email! I only added
the email from the roster)]
Fa 2025; Thu 6-8.50pm, ATG-101
- instructor: Adam Okulicz-Kozaryn adam.okulicz.kozaryn@gmail.com
- office: 321 Cooper St, lab in the back of the 1st fl; office
hours: Tue, Thu 1-2, and by appointment; or just stop by: this semester I am in most of Tue and Thu
assistant:
Wei Chen wc642@camden.rutgers.edu
office: 321 Cooper St, Room 202 (2nd Floor); office hours: Thu 2-3, and by appointment
prerequisites
You need to be comfortable using a computer, and able to write simple
code in Python or willing to learn it. Most of the class material is
simple coding in Python. Knowing Python is not a
prerequisite; you can learn it in class.
course description
Introductory + applied: produce maps, put interesting info on them.
GIS is useful in all fields that have any geographic/location info.
course objectives
- manipulate data: import/export, subset, merge/join, dissolve, produce centroids, etc
- visualize data: make maps
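The data-manipulation objectives above can be sketched in a few lines of geopandas. This is only an illustration, not the class notebook: the four square "tracts", the county labels, and the population numbers are all made up, and the names `tracts`, `pop`, `county_a`, and `counties` are hypothetical.

```python
import geopandas as gpd
import pandas as pd
from shapely.geometry import box

# Four made-up unit-square "tracts" in two made-up counties (A and B).
tracts = gpd.GeoDataFrame(
    {
        "tract": ["t1", "t2", "t3", "t4"],
        "county": ["A", "A", "B", "B"],
    },
    geometry=[box(0, 0, 1, 1), box(1, 0, 2, 1), box(0, 1, 1, 2), box(1, 1, 2, 2)],
)

# A plain (non-spatial) table to attach; the population numbers are invented.
pop = pd.DataFrame({"tract": ["t1", "t2", "t3", "t4"], "pop": [10, 20, 30, 40]})

# merge/join: attribute join on the shared key column
tracts = tracts.merge(pop, on="tract")

# subset: keep only the tracts in county A
county_a = tracts[tracts["county"] == "A"]

# dissolve: collapse tracts into counties, summing population
counties = tracts[["county", "pop", "geometry"]].dissolve(by="county", aggfunc="sum")

# centroids: one point per dissolved county
counties["centroid"] = counties.geometry.centroid

print(counties[["pop", "centroid"]])
```

With matplotlib installed, `counties.plot(column="pop", legend=True)` would then draw a first simple map of the dissolved counties.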
required books
none
software
We will run Python online in a web browser, in the cloud, using
so-called "Colab" (2 sections down). But first let's get GitHub running.
GitHub
We will use GitHub to store the Python code in the form of a notebook, and
we will edit (and run) the notebook in Colab (next section).
sign up or login at github.com
(depending on os, browser) on top left hit "New" or "Create
Repository" or top right under plus "+" select "New repository"
pick some repository name, say "gis"; keep 'Public' selected;
important!: under "Initialize this repository with" check "Add a README file";
and hit "Create repository" at the bottom
then hit "Settings" towards the middle-top right; on the left select
"Collaborators" tab and hit "Add people" : "theaok", and hit "Add theaok to this repository"
workflow: my comments, diffs, inline response [let's go over this again next week]
I will run it in my Colab, edit, and upload back
diff and response to my comments: cleaner and better in
Colab: File > Revision history; or clunkier in GitHub:
click my commit message to see the so-called
diff: the difference between your version and my version. important!
do make sure to fix it up for the next ps; you may even include inline
responses to my comments in your next ps (especially if sth is complex
or if you disagree)
don't forget a meaningful commit message; you can keep
uploading newer versions as many times as you like
note: when you click the file, you can then click 'History' and
see how the file evolved over time :)
file naming: ps1.ipynb, ps2.ipynb, etc, or
ps1, ps2, etc sections in one file; or just one file that you keep updating throughout with new stuff as we go
[*] bonus/extra: general references on how to get started using Git fully,
http://www.sitepoint.com/git-for-beginners/
http://rogerdudler.github.io/git-guide/
Colab
Just run the Py notebook in Colab and save subsequent versions in
GitHub, which will keep track of changes [stick with this for the ps]
go to https://github.com/theaok/gisPy/blob/main/map.ipynb
and hit 'open in colab'
OR go to https://colab.research.google.com
and on the popup pick GitHub, search for:
https://github.com/theaok/gisPy/blob/main/map.ipynb
(it should find it and load it into Colab; then
follow the instructions at the top of the file, ie save it in your
GitHub etc)
data
The class is a bit like an independent study: you will carry out some
research (by making maps).
You need your own data for this class ASAP: the more data and the more
complex, the better. The software will need to load the data straight
from the web! Some data are easily downloadable online,
eg https://gss.norc.org/get-the-data/stata,
but many are not. Then you have to put data online yourself [just go
over Git<25mb]:
https://theaok.github.io/generic/howToPutDataOnline.html
google is great for data search; and it has a dedicated dataset search, too
google cloud/big query has data, too
kdnuggets listing of sources, a lot!; kdnuggets is great in general for data science
another kdnuggets listing; maybe better to start here, easier to wrap your head around
kaggle
NOAA
NASA
datasets on GitHub
datahub
humandata
academictorrents
pew
data.gov
fun/inspiration
https://github.com/OluPaul22/gis/blob/main/PS_2.ipynb
https://github.com/theaok/gisPy/blob/main/examples.ipynb
https://github.com/theaok/gisPy/blob/main/fa23_Dhairya890_ps3.ipynb
https://colab.research.google.com/github/BhavyaniD/GIS/blob/main/GIS_PS1.ipynb
pub
pol/adm maps wiki, search for say 'shooting' or 'atlantic' [on-campus or vpn]
https://sites.google.com/davidsouthgate.com/poncegis/maps
advice/requirements and grading
2 keys to success: start early AND ask many questions often (and form study groups: get a couple of people on zoom, screenshare notebooks, etc). This is a
software class, different from non-software classes. You will get
stuck often, and whenever stuck, email the listserv, ask me, ask your
classmates, as opposed to pulling
your hair out! And stop by my office, too. Google (eg stackoverflow)
and AI (eg Colab's Gemini) solve most
problems, but for many things it's better to talk to me and your
classmates; it's also more social/human: talking to a computer all the
time is not healthy.
Problem sets (ps): You will write computer code that
applies something we covered in class to your data. You may
work in groups (<=3), but say who you worked with;
the more people in the group, the better/longer the code must be.
grading (strict and harsh!) [incompletes only if documented emergency (eg hospitalization)]
- 5x20 problem sets (incl presentation/s) (can work in groups
of <=3, but then needs to be this many times better!) [just Python code (notebook)]
- up to 10pts or 10% of final grade for engagement! [extCre] up to 5: class participation (answering/asking questions, helping others, etc) and listserv discussions, and up to 5: civic engagement (see bottom of the syllabus)
academic calendar
tentative; the most up-to-date version is always online. I work on class
materials continuously and they'll be changing slightly
print several slides on one sheet, say 6,
or just annotate the electronic pdf
dive into GIS: geopandas (sister of pandas): your best gis lib in py
sep5 intro: fire up Python, load gis data, and produce our first map
sep5vid
sep12 data: join/merge
sep12vid
sep19 pretty maps
sep19vid
- finish up with merge/join from ipynb: census data
- pretty.pdf how to produce a pretty map
- final_project.pdf: just skim through TOC
- [*] early/volunteer student presentations of maps from ps1
- flip the class: (Q and A; I walk around and sit with you; otherwise I'd be looking at your githubs, and then approach you with ideas)
sep26 ps1 presentations
sep26vid
5min sharp: i will cut you off! + 10min discussion
oct3 more advanced thematic mapping: geopandas bells and whistles and geoprocessing
vid
oct10 wrapping up basics
vid
oct17 ps2 presentations
vid; Passcode: 90F#%6hd
- ps3.pdf
- 5min sharp: i will cut you off! (max 5 maps!!!) + 10min discussion
interactive maps (zoom, move, popup): folium
note: what we cover from now on is not necessary for the ps, but it does
help; pick from the following what is useful and helpful for your research
oct24 go over ps2 comments from listserv; dive into folium
vid
oct31 folium
vid
nov7 ps3 presentations: 7min! + 12min discussion
vid
- fa24 revisit tooltip/popup
- finish up/revisit folium
nov14 continue with ps3 presentations; Q&A, flip the class, work on ps4
vid
- ps5.pdf
- Q&A, wrap-up; listserv ps3 comments: https://groups.google.com/g/rugispy/c/6hJNYL8AF7Y
- do flip the class 45min at least, work on ps4
for ps4 take into account:
- pretty.pdf
- final_project.pdf
- theory.pdf esp: know your data (des sta) and simplicity (cut back, trim, fewer lines of code, simpler maps! (like initial ones with just couple lines of code))
spatial auto-correlation: pysal
nov21 pysal
vid
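Spatial autocorrelation asks whether nearby places tend to have similar values. As a rough illustration of the idea only (not the pysal code used in class), here is global Moran's I computed by hand with numpy. The four-cell row of values and the binary neighbor weights are made up, and `morans_i` is a hypothetical helper; pysal's esda package provides a ready-made `Moran` class that should be preferred for real work.

```python
import numpy as np

def morans_i(values, weights):
    """Global Moran's I for values y and a spatial weights matrix W."""
    y = np.asarray(values, dtype=float)
    w = np.asarray(weights, dtype=float)
    z = y - y.mean()          # deviations from the mean
    s0 = w.sum()              # sum of all weights
    return (len(y) / s0) * (z @ w @ z) / (z @ z)

# Made-up example: four cells in a row; each cell neighbors the cells next to it.
values = [1, 2, 3, 4]
weights = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]

moran = morans_i(values, weights)
print(round(moran, 3))  # 0.333: positive, so similar values sit next to each other
```

A value near zero would suggest no spatial pattern, and a negative value would suggest that neighbors tend to differ.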
nov26 tue continue pysal; class
wrapup; 3-4 ps4/final presentations (8min! + 12min discussion)
vid
- again see nov14 class: "for ps4 take into account"
- i added spiffy plot_local_autocorrelation in geoda.ipynb
- mapclassify under the hood
revisit gpd:
- I/O: gpd will read just about any geo data format just like shp: don't
shy away from json, kml, gdb, etc
- subset shapefile: you may not get data from the paraguay govt, but from
who, wb, imf; then just subset. if camden county is too busy in the North
East, just retain the North East and ditch the less populated South West
- edgecolors are pretty cool
- do annotate to tell the story
- other vis than gis/maps: sns.heatmap, nobody did it,
and for a story you need a couple of vars, see how they corr!;
sns.jointplot: you need a hist anyway, and a scatterplot gives more info than corr;
px.scatter is awesome to id interesting points and outliers; px.treemap is
inherently geo
- raster: eg ru-camden evolution since 1930; paraguay agriculture
dec5 remaining ps4/final presentations (8min! + 12min discussion)
- try to repeat https://github.com/sg2083/GIS_repo as I didn't have time to comment last week
- wrap.pdf
[I fork a couple of the best repos as examples for future classes]
rules
24fa NASPAA competency: To analyze, synthesize, think critically, solve problems and make decisions
attendance:
strongly recommended, you're
responsible for everything covered, incl discussions and announcements. If you miss a
class, consult with a fellow student and/or watch video.
academic integrity. I am very serious about this. Make no
mistake--I may appear accommodating and informal--but I am extremely
strict about academic integrity. Violations of academic integrity include cheating on tests or handing in
assignments that do not reflect your own work and/or the work of a study group in which you
actively participated. Handing in your own work that was performed not
for this class (e.g. other class, any other project) is cheating,
too. I have a policy of zero tolerance for cheating. Violations will be referred
to the appropriate university authorities.
For more information see http://fas.camden.rutgers.edu/student-experience/academic-integrity-policy
accommodating students with disabilities.
Any student with a disability affecting performance in the class
should contact the disability office ASAP.
do not share or link to class videos!
These videocasts and podcasts are the exclusive copyrighted property of Rutgers University and the Professor teaching the course. Rutgers University and the Professor grant you a license only to replay them for your own personal use during the course. Sharing them with others (including other students), reproducing, distributing, or posting any part of them elsewhere -- including but not limited to any internet site -- will be treated as a copyright violation and an offense against the honesty provisions of the Code of Student Conduct. Furthermore, for Law Students, this will be reported by the Law School to the licensing authorities in any jurisdiction in which you may apply to the bar.
civic engagement component (opportunity for extra credit!)
Start early. Start thinking about how you want to engage civically
today.
typical civic engagement
Universities and social science should serve society.
You are encouraged to engage with the local community.
The idea is that you engage civically using research methods. There are several
ways to do it. Ideally, you will partner with a local organization,
obtain data from them, do some analysis, and present results to them. You may also use government data, say from census bureau, and present relevant
information to locals. A local organization can be Rutgers research
institute such as WRI, CURE, LEAP or any other organization such as
school or soup kitchen or CamConnect. Rutgers Office of civic
engagement may be able to help
you contact them. The key idea is partnership: you will use tools
from this class to produce output useful to local community. This
is similar to taking a role of an apprentice at a local organization
or serving as a consultant.
Using real world data poses challenges, which is part of the
exercise. Presenting your findings to stakeholders outside of a class
is also challenging. At the same time, it is fairly easy to contribute
locally by using simple tools learned in this class. For instance,
simple comparison of means between two schools in Camden can be
revealing and helpful locally.
An obvious way would be to use data at your workplace or at a
workplace of someone you know. However, you need to make sure that it
serves society in some way. For instance, it would be straightforward
if you work at a hospital or school or fire department; but it would
be difficult if you work at Starbucks.
atypical civic engagement--CONTACT ME FIRST if you consider this!
Successful completion of atypical civic engagement is estimated to take at
least double the time of typical civic engagement.
You could try to engage at the regional or state level: for
instance, you may evaluate some policy in NJ as compared to NY, or
produce descriptive statistics of a region that would be useful
regionally (e.g. my South Jersey WRI paper
http://dept.camden.rutgers.edu/rand-institute/files/changes-across-the-region.pdf).
This type of engagement typically requires substantial research
experience, usually found at a late stage of a PhD program.
There may also be some other atypical ways; let me know your ideas.