Note! If your data contain anything sensitive or private (address, name, DOB, SSN, etc.), do not upload them anywhere! First, generate a new ID (1, 2, 3, ...) and keep the full dataset secure in your own place. Then deidentify: drop all sensitive info and upload the deidentified file. Later, if needed, you can merge it back into your initial file on the ID.
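The deidentify steps above can be sketched in Python roughly as follows. This is a minimal sketch, not the only way to do it: the column names in SENSITIVE, the helper names deidentify and merge_back, and the toy rows are all made up for illustration, so swap in your own.

```python
# Hypothetical column names; swap in whatever is sensitive in your own data.
SENSITIVE = ("name", "address", "dob", "ssn")


def deidentify(rows, sensitive=SENSITIVE):
    """Add an id (1, 2, 3, ...) to each row; return (full, public) row lists."""
    full, public = [], []
    for new_id, row in enumerate(rows, start=1):
        keyed = {"id": new_id, **row}   # full record, keep this one private
        full.append(keyed)
        # public copy: same id, sensitive columns dropped
        public.append({k: v for k, v in keyed.items() if k not in sensitive})
    return full, public


def merge_back(public, full):
    """Re-attach the private columns to the public rows, matching on id."""
    by_id = {row["id"]: row for row in full}
    return [{**by_id[row["id"]], **row} for row in public]


# Toy rows, made up for illustration only.
rows = [
    {"name": "Ann", "ssn": "000-00-0001", "income": 50},
    {"name": "Bob", "ssn": "000-00-0002", "income": 60},
]
full, public = deidentify(rows)
print(public)  # [{'id': 1, 'income': 50}, {'id': 2, 'income': 60}]
```

Only the public rows would be written out and uploaded; the full rows stay on your own machine so you can merge on id later.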

Data are public on GitHub and the shared Google Drive. If you absolutely cannot share your data even after deidentifying cases by dropping sensitive info (say, super private data from your company), then as a last resort you can just email it to me and note in the notebook that you only shared the private data with me.

Note: if the dataset is bigger than 25 MB, just take a random sample, say in Stata "sample 10" to get a 10 percent random sample. Also, zipping reduces file size about 3x (and Stata can unzip, just google 'Stata unzipfile'; of course you can do the same in Python).
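For the Python route, something like the sketch below would do the sampling and zipping with the standard library only. The helper names (sample_rows, zip_file, unzip_file) and the demo file are hypothetical, and how much zipping actually saves depends on your data.

```python
import os
import random
import tempfile
import zipfile


def sample_rows(rows, percent=10, seed=0):
    """Keep `percent` percent of rows, drawn at random without replacement."""
    k = round(len(rows) * percent / 100)
    return random.Random(seed).sample(rows, k)


def zip_file(src_path, zip_path):
    """Compress one file into a .zip archive."""
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(src_path, arcname=os.path.basename(src_path))


def unzip_file(zip_path, out_dir):
    """Extract everything in the archive into out_dir."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out_dir)


if __name__ == "__main__":
    # Made-up demo data: 1000 rows, keep a 10 percent sample, then zip it.
    rows = [f"{i},{i * 2}" for i in range(1000)]
    kept = sample_rows(rows, percent=10)
    with tempfile.TemporaryDirectory() as tmp:
        csv_path = os.path.join(tmp, "sample.csv")
        with open(csv_path, "w") as f:
            f.write("\n".join(kept))
        zip_path = os.path.join(tmp, "sample.zip")
        zip_file(csv_path, zip_path)
        print(len(kept), os.path.getsize(csv_path), os.path.getsize(zip_path))
```

Fixing the seed makes the sample reproducible, which helps if you need to regenerate the same uploaded file later.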

We may practice by putting this file online: https://drive.google.com/uc?id=1YH8DfzsQ8suZkVQBk7T9FTKvvm9Vyej8&export=download

Google Drive (up to 25 MB)

  • go to https://drive.google.com
  • first upload the file; then right-click on it and select "Share...", under "General access" change "Restricted" to "Anyone with the link", and hit "Copy link"
  • paste the link into a text editor; it should look like:
    https://drive.google.com/file/d/1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd/view?usp=sharing
  • copy the FILE_ID from it, i.e., everything between "/d/" and "/view"
  • and then paste that FILE_ID into:
    https://docs.google.com/uc?id=FILE_ID&export=download
  • so it would become:
    https://docs.google.com/uc?id=1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd&export=download
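If you do this often, the copy-and-paste of FILE_ID can be scripted. Below is a small sketch; the function name drive_direct_link is made up, and it assumes the share link has the "/d/FILE_ID/" shape shown above.

```python
import re


def drive_direct_link(share_url):
    """Turn a Google Drive share link into a direct-download link."""
    # FILE_ID is whatever sits right after "/d/" up to the next slash.
    match = re.search(r"/d/([^/?#]+)", share_url)
    if match is None:
        raise ValueError("no /d/FILE_ID/ part found in: " + share_url)
    file_id = match.group(1)
    return f"https://docs.google.com/uc?id={file_id}&export=download"


share = "https://drive.google.com/file/d/1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd/view?usp=sharing"
print(drive_direct_link(share))
# https://docs.google.com/uc?id=1F4ZfRhKzJAlQKGRDCTEZBGuWjti6JcRd&export=download
```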

GitHub (<25 MB) EASY

  • you can also upload files of up to 25 MB (maybe even 50) to GitHub
  • in your repo, hit "Add file" and then "Upload files"
  • once it is uploaded, click on the file and then, about middle-right, hit "Raw"
  • the link then looks like "https://raw.githubusercontent.com/USER/REPO/main/FILE", e.g.:
    "https://raw.githubusercontent.com/blup321/vis/main/a.csv"
  • and remember to load the 'raw' file! E.g.: insheet using https://github.com/sdegiorgis/test/raw/master/PhillyParcelsSubsample.csv
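The raw link can also be put together without clicking "Raw". A minimal sketch follows; the function names are made up, the default branch is assumed to be "main" (older repos may use "master"), and the second helper assumes the usual "github.com/USER/REPO/blob/BRANCH/FILE" address of a file's page.

```python
import re


def github_raw_url(user, repo, path, branch="main"):
    """Build the raw link from its parts."""
    return f"https://raw.githubusercontent.com/{user}/{repo}/{branch}/{path}"


def blob_to_raw(page_url):
    """Convert the address of a file's GitHub page into its raw link."""
    match = re.match(r"^https://github\.com/([^/]+)/([^/]+)/blob/(.+)$", page_url)
    if match is None:
        raise ValueError("not a github.com/USER/REPO/blob/... link: " + page_url)
    user, repo, rest = match.groups()   # rest is BRANCH/FILE
    return f"https://raw.githubusercontent.com/{user}/{repo}/{rest}"


print(github_raw_url("blup321", "vis", "a.csv"))
# https://raw.githubusercontent.com/blup321/vis/main/a.csv
```

In Python, a raw CSV link like this can be passed straight to pandas.read_csv, which accepts URLs.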

    The following might be different or out of date; I did it a few years back.

    GitHub (<100 MB) NOT easy

    You can upload data files over 25 MB (but not larger than 100 MB) to GitHub if you upload them from the command line in your computer's command prompt (on my Mac it's called Terminal). It took a million articles and tutorials, but I finally got my biggest dataset to upload this way! If it would be helpful, I could take the time to type up how I did this for my classmates/future students to use to upload bigger datasets to GitHub. One of the key articles I used is here: https://help.github.com/en/github/managing-files-in-a-repository/adding-a-file-to-a-repository-using-the-command-line