note! if your data contain anything sensitive/private (address, name, date of birth, SSN, etc.): do not upload them anywhere! First, generate a new id (1, 2, 3, ...) and keep the full dataset secure on your own machine. Then deidentify: drop all sensitive info and upload only the deidentified file (later, if needed, you can merge it back to your initial file on id)
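The deidentify-then-merge-back idea can be sketched in Python as below. This is only a minimal sketch: the column names in SENSITIVE and the two made-up records are assumptions for illustration, and the helper names deidentify and merge_back are hypothetical, not part of any library.

```python
# Hypothetical list of sensitive columns; adjust it to your own data.
SENSITIVE = ["name", "address", "dob", "ssn"]

def deidentify(rows, sensitive=SENSITIVE):
    """Add ids 1, 2, 3, ... and return (full_rows, public_rows).

    full_rows keeps everything (store it securely, never upload it);
    public_rows has the sensitive columns dropped (this is what you upload).
    """
    full, public = [], []
    for new_id, row in enumerate(rows, start=1):
        full.append({"id": new_id, **row})
        kept = {k: v for k, v in row.items() if k not in sensitive}
        public.append({"id": new_id, **kept})
    return full, public

def merge_back(public_rows, full_rows):
    """Merge the deidentified rows back to the secure rows on id."""
    by_id = {r["id"]: r for r in full_rows}
    return [{**by_id[r["id"]], **r} for r in public_rows]

# Made-up example records
raw = [
    {"name": "Ann", "ssn": "000-00-0001", "income": 50},
    {"name": "Bob", "ssn": "000-00-0002", "income": 70},
]
full, public = deidentify(raw)
print(public)  # [{'id': 1, 'income': 50}, {'id': 2, 'income': 70}]
```

Only public would be written out and uploaded; full stays on your own machine so that merge_back(public, full) can restore the original records later.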
data posted on GitHub or behind a shared Google Drive link are public; if
you absolutely cannot share your data even after deidentifying it by
dropping sensitive info (say, highly private data from your company),
then as a last resort you can just email it to me and note in the
notebook that you shared the private data only with me
note: if the dataset is bigger than 25 MB, just take a random sample, say
in Stata "sample 10" to keep a 10 percent random sample; also: zipping
reduces the file size about 3x (Stata can unzip, just google 'Stata
unzipfile'; of course you can do the same in Python)
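A rough Python counterpart of the sampling and zipping tips, using only the standard library. The function names sample_lines and zip_file are hypothetical, and the sketch assumes the data are text lines with a header in the first line.

```python
import os
import random
import zipfile

def sample_lines(lines, percent=10, seed=1):
    """Keep the header plus a random `percent` of the data lines."""
    header, data = lines[0], lines[1:]
    k = round(len(data) * percent / 100)
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    keep = sorted(rng.sample(range(len(data)), k))  # preserve original order
    return [header] + [data[i] for i in keep]

def zip_file(path, zip_path):
    """Compress a single file into a zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(path, arcname=os.path.basename(path))

# Made-up data: a header and 100 data lines
lines = ["id,x"] + [f"{i},{i * i}" for i in range(1, 101)]
small = sample_lines(lines)
print(len(small))  # 11: the header plus 10 of the 100 data lines
```

How much zipping helps depends on the file, so it is worth checking the size of the resulting archive before uploading.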
we may practice by putting this file online:
https://drive.google.com/uc?id=1YH8DfzsQ8suZkVQBk7T9FTKvvm9Vyej8&export=download
Google Drive (up to 25 MB)
go to https://drive.google.com
first upload the file, then right-click on it and select
"Share...",
under General access change from "Restricted" to "Anyone with the
link", and hit "Copy link"
paste link into text editor; it should look like:
https://drive.google.com/file/d/1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd/view?usp=sharing
and then copy the FILE_ID from it, i.e., everything that is between "/d/" and "/view"
and then paste that FILE_ID into:
https://docs.google.com/uc?id=FILE_ID&export=download
so it would become:
https://docs.google.com/uc?id=1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd&export=download
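The copy-and-paste steps above can also be sketched in Python. The function name drive_download_url is hypothetical, and the pattern assumes the share link has the "/d/FILE_ID/view" shape shown above.

```python
import re

def drive_download_url(share_link):
    """Turn a Google Drive share link into a direct-download link."""
    # FILE_ID is everything between "/d/" and "/view"
    match = re.search(r"/d/([^/]+)/view", share_link)
    if match is None:
        raise ValueError("no FILE_ID found between '/d/' and '/view'")
    return f"https://docs.google.com/uc?id={match.group(1)}&export=download"

link = "https://drive.google.com/file/d/1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd/view?usp=sharing"
print(drive_download_url(link))
# https://docs.google.com/uc?id=1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd&export=download
```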
GitHub (<25 MB) EASY
can also upload files of up to 25 MB (maybe even 50) to GitHub
under the repo hit "Add file" and then "Upload files", and commit
then click on the uploaded file and, about middle-right, hit "Raw"
so the link is "https://raw.githubusercontent.com/USER/REPO/main/FILE" eg:
"https://raw.githubusercontent.com/blup321/vis/main/a.csv"
and remember to load the 'raw' file, not the regular GitHub page! eg:
insheet using https://github.com/sdegiorgis/test/raw/master/PhillyParcelsSubsample.csv
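A small Python sketch for building the 'raw' link. Both helper names are hypothetical, and blob_to_raw assumes the regular file page link has the form "https://github.com/USER/REPO/blob/BRANCH/FILE".

```python
def github_raw_url(user, repo, branch, path):
    """Build the raw link from its parts."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"

def blob_to_raw(page_url):
    """Convert a regular file page link (assumed .../blob/...) to the raw link."""
    prefix = "https://github.com/"
    if not page_url.startswith(prefix) or "/blob/" not in page_url:
        raise ValueError("expected a https://github.com/USER/REPO/blob/... link")
    user_repo, rest = page_url[len(prefix):].split("/blob/", 1)
    return f"https://raw.githubusercontent.com/{user_repo}/{rest}"

print(github_raw_url("blup321", "vis", "main", "a.csv"))
# https://raw.githubusercontent.com/blup321/vis/main/a.csv
print(blob_to_raw("https://github.com/blup321/vis/blob/main/a.csv"))
# https://raw.githubusercontent.com/blup321/vis/main/a.csv
```

The resulting link is the one to pass to whatever loads the data, since the regular page link returns an HTML page rather than the file itself.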
the following might be different/out of date; I did it a few years back
GitHub (<100 MB) NOT easy
can upload data files over 25 MB (but not larger than 100 MB) to GitHub if you upload them
from the command line on your computer (on my Mac it's called Terminal). It took a million articles and tutorials,
but I finally got my biggest dataset to upload this way! If it would be helpful, I could take the time to type up how I did
this for my classmates/future students to use to upload bigger datasets to GitHub. One of the key articles I used is here:
https://help.github.com/en/github/managing-files-in-a-repository/adding-a-file-to-a-repository-using-the-command-line