Getting Started With Git: Difference between revisions

From genomewiki
So, you think you know git? Take this [http://genecats.soe.ucsc.edu/eng/gitQuiz.txt quiz] to get permission to work with the kent source code git repository.


== Gitting Started With Git ==
(This page is intended for UCSC Genome Browser developers and staff.)<br>


'''Git''' is a modern source code management (SCM) system written by Linus Torvalds.


== Setting Up Your Own Personal Git Kent Repository ==

To create your personal git repository of the kent source,
please use the following simple directions. (You can also copy someone else's .gitconfig file and change the name and email, but the steps below produce the same settings.)

   cd $HOME
    
   git config --global user.name "Your Name Here"
   git config --global user.email yourlogin@ucsc.edu
 
To enable automatic merge without requiring a comment (the way git used to be):
 
'''tcsh''' uses this syntax (see .tcshrc):
 
  setenv GIT_MERGE_AUTOEDIT no
 
'''bash''' uses '''this syntax''':
 
  GIT_MERGE_AUTOEDIT=no
  export GIT_MERGE_AUTOEDIT
 
As an alternative, there is a per-repository option.  It does not work globally,
so you must run it in every git repo that you use:
 
  git config core.mergeoptions --no-edit


If you like colors
(warning: this may require particular terminal settings to work right):

   git config --global color.diff auto
   git config --global color.status auto
   git config --global color.branch auto


[and if you want to turn it off?]
If you are security-conscious:
  chmod 644 ~/.gitconfig
Turn off a nagging message about not having a default value set for git push.
git config --global push.default simple
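
To double-check what the configuration steps above stored, you can read the values back. The following is only a sketch: it points HOME at a scratch directory (the variable name demo_home is made up for this example) so that your real ~/.gitconfig is not touched while you experiment.

```shell
# Sketch only: use a scratch HOME so the real ~/.gitconfig is left alone.
demo_home=$(mktemp -d)
HOME="$demo_home"
export HOME

# The same settings as in the directions above
git config --global user.name "Your Name Here"
git config --global user.email yourlogin@ucsc.edu
git config --global color.diff auto
git config --global push.default simple

# Show everything stored in the global config file
git config --global --list

# Read back single values
configured_email=$(git config --global --get user.email)
configured_push=$(git config --global --get push.default)
echo "email=$configured_email push.default=$configured_push"
```

In your real account you would skip the scratch HOME and just run the last few commands.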
'''These steps get you the repository desired:'''
'''/data/git/kent.git is our shared kent repository.
We access it via SSH.'''
Clone your kent and other git repos:
  # only do this the first time, when you have no repos:
  cd $HOME
  git clone '''yourlogin'''@hgwdev.gi.ucsc.edu:/data/git/kent.git
  git clone '''yourlogin'''@hgwdev.gi.ucsc.edu:/data/git/htdocsExtras.git
  git clone '''yourlogin'''@hgwdev.gi.ucsc.edu:/data/git/genecats.git
  git clone '''yourlogin'''@hgwdev.gi.ucsc.edu:/data/git/hgdownload.git
We also have a post-receive hook sending your pushes
to a redmine clone repo.  To make sure that it will work,
you need to ssh to the redmine machine one time,
and answer "y" to the question.
  ssh redmine  # answer y to any questions
  exit          # close the shell on redmine, we don't need to do anything with it
'''Installing standard hooks'''
cd $HOME/kent    # do not do this with the other git repos. only for kent.
./install-hooks.sh
The limit is 2MB maximum for kent repo files.
We want small source code text files in this repo.
Previously, people only found out they had a too-large file when they pushed to the shared repo,
and then they typically had to go through an involved process using git rebase to repair the problem.
This very handy hook will check for too-large files when you run git commit, and give you an error message.
The commit will not have taken place because of the error,
and your too-large file(s) will be still in your staging/index area.
You can replace the big file with a smaller version and re-run git add on it.
Or, you can just use "git rm --cached someFileTooLarge" or equivalent command to remove it from staging.
Once there are no more files that are too big, finish your "git commit" as usual.
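
To give an idea of how such a check can work, here is a minimal sketch of a size check on staged files. It is NOT the actual hook installed by install-hooks.sh; the function name check_staged_sizes and the variable max_bytes are made up for this example, and the 2MB limit is taken from the text above.

```shell
# Hypothetical sketch of a pre-commit size check; not the real kent hook.
max_bytes=$((2 * 1024 * 1024))   # 2MB limit mentioned above

check_staged_sizes() {
    # Collect staged files (added, copied or modified) whose staged
    # content is larger than the limit.
    too_big=$(
        git diff --cached --name-only --diff-filter=ACM |
        while IFS= read -r f; do
            # ":path" names the version of the file in the index
            size=$(git cat-file -s ":$f")
            if [ "$size" -gt "$max_bytes" ]; then
                echo "$f ($size bytes)"
            fi
        done
    )
    if [ -n "$too_big" ]; then
        echo "error: files over $max_bytes bytes are staged:" >&2
        echo "$too_big" >&2
        return 1
    fi
    return 0
}
```

A pre-commit hook built on this idea would call the function and exit non-zero when it fails, which makes git abandon the commit while leaving the staging area as it was.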
'''Making git pulls without passwords:'''
New users need to authorize their key for hgwdev.  Create a key pair, then add the public key as a line in the authorized_keys file in your .ssh directory.
Create the key if it doesn't exist:
::ssh-keygen -t rsa
Add the public key to the authorized_keys
::cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
::chmod 600 ~/.ssh/authorized_keys
Then you should be able to do a <code> git pull</code> and not be asked for your password.
== Some simple real-world git usage ==
Here is an example of some git commands used to modify a file,
check on it, diff it, check it in, push it up to shared repo:
  vi doSomethingCool.csh
  git diff --help
  git diff  # show detailed diff between working dir and staged
  git diff --stat  # show condensed view of the names of files that changed
  git diff --name-status  # show only the names of files that changed
  git diff --cached # show detailed diff between staged and HEAD of current branch
  git diff HEAD # show detailed diff between working dir and HEAD of current branch
  git diff master origin/master # show detailed diff between local master and central master
  git status    # another way to see info about changes
  git add doSomethingCool.csh  # notice that this adds it to our list of things
                            # to be committed together in the next commit.
                            # Whether it is a new file, or just a change, you do a git add.
                            # Several related things can and should be committed at the same time as a unit.
  git commit -m 'made some useful changes ...'  # Commits to your local repo only.
                            # Make the comments useful.
  git diff      # check that the changes got committed
  git status    # another way to see info about changes and so on
  git fetch              # make sure things are up to date including origin/* refs
  git diff origin/master  # diffing work-dir to the shared-repo
  git push          # push my change up to be shared with all
                    # works great for default branch master or for tracking branches
  git push origin HEAD:master    # push head of my branch to central repo master branch
                    # Only do this if your branch is related to the central repo master branch
  git diff origin/master  # verify that there are no more outstanding changes
  git status  # check info again
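
If you want to rehearse the sequence above without touching the real shared repo, the following sketch runs it against a throwaway bare repository. Everything under the scratch directory (shared.git, work) is made up for this example; only the git commands themselves come from the walk-through above.

```shell
# Sketch: practice the edit / add / commit / push cycle in a scratch area.
scratch=$(mktemp -d)
git init -q --bare "$scratch/shared.git"        # stand-in for the shared repo
git clone -q "$scratch/shared.git" "$scratch/work" 2>/dev/null
cd "$scratch/work"
git symbolic-ref HEAD refs/heads/master         # call the first branch master
git config user.name "Your Name Here"           # local identity for the demo
git config user.email yourlogin@ucsc.edu

echo 'echo cool' > doSomethingCool.csh
git add doSomethingCool.csh
git diff --cached --stat                        # what is staged for the commit
git commit -q -m 'add doSomethingCool.csh'      # local repo only
git push -q -u origin master                    # share it, set up tracking

git fetch -q                                    # refresh origin/* refs
outstanding=$(git diff origin/master)           # empty when nothing is left to push
git status --short
```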
Warning:
  git commit -a      # '''DO NOT USE'''. Commits ALL changes in your local-repo (dangerous!)
    
    
-a automatically adds ANY and ALL changes in your local-repo
to the "cache/commit index" list of things to commit
for existing tracked files.  Unlike CVS, it is not influenced
by your current directory location.  Any tracked files that you
have modified, for instance common.mk, will get checked in.

instead use:
   git add . -u # this gets all tracked changes in current dir and subdirs.
              # do not forget the -u or it will add ALL files including untracked files.
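
The difference is easy to see in a scratch repo. This sketch (directory and file names are made up for the example) modifies tracked files in two directories, adds an untracked file, and then runs git add -u . from inside one of the directories.

```shell
# Sketch: "git add -u ." stages tracked changes under the current directory only.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git config user.name "Your Name Here"
git config user.email yourlogin@ucsc.edu

mkdir a b
echo 1 > a/one.txt
echo 2 > b/two.txt
git add a b
git commit -q -m 'initial files'

echo change >> a/one.txt        # tracked, modified, inside a/
echo change >> b/two.txt        # tracked, modified, outside a/
echo new > a/untracked.txt      # untracked

cd a
git add -u .                    # current dir and subdirs, tracked files only
staged=$(git diff --cached --name-only)
echo "$staged"                  # paths are shown relative to the repo top
```

Here only a/one.txt ends up staged, whereas git commit -a would also have picked up b/two.txt.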


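== Sharing Changes With Others ==

Git is a distributed SCM so it works a bit differently from CVS.
Each user has their own local project repository which includes
the full history of all changes ever made.  This allows one to work
offline and has other advantages besides.  However, for simplicity
there is a shared repository that people push changes to from their
local repository.  More complex configurations are possible,
such as hierarchical.  Probably the shared
repository approach is fine for our group.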


   git pull 
equivalent to cvs up -dP; this pulls in changes by others.  However, files that you have removed from your working directory will not be restored.
 
  git add somefile; git commit; git push
equivalent to cvs commit, only do this when your changes are ready to share with others.
 
 
If you are working on a development branch that does not have tracking set up,
you can use these commands to push and pull to master, but
 
  ONLY IF IT MAKES SENSE TO DO SO.
 
  git pull origin master 
    # ONLY do this if your branch is a recent relative of the central repo master.
    # Note that with older versions of git (before 1.8.4) this does not update "origin/*" including origin/master,
    # so a plain git fetch will still be needed if you wish to
    # do git log or git diff against origin/master
 
  git push origin HEAD:master 
    # ONLY do this if your branch is a recent relative of the central repo master.
 
"origin" refers to the shared repository from which your local repo was cloned.
 
== Stash ==
 
Git stash is handy when you are not keeping your
sandbox clean with other methods such as using
development branches, and you are doing
something potentially dangerous such as pulling, merging,
or switching branches.  Git refuses to lose your stuff
due to failures in these operations, so you may
be required to use git stash to save away those
changes you are not ready to commit yet.
 
Git stash supports several operations:
 git stash # save to stash stack
 git stash list  # list all the stashes you have
 git stash show <stash>  # show --stat level details about a stash, e.g. stash@{0}
 git stash show -p <stash>  # show diff details about a stash
 git stash save <message>  # save a stash with a descriptive message
 git stash apply <stash> # apply a stash, i.e. merge it in, keeping it on the stack
 git stash drop <stash>  # delete the stash
 git stash pop          # apply the most recent stash and remove it from the stack
 git stash clear        # be careful, wipes out all of your stashes
 
If when pulling or merging you get an error saying
"not uptodate", this is usually because you have changes
in your working directory that are in danger of being
lost because they are not checked in anywhere.
 
Should recovery from a failed merge get ugly,
you might lose your edits and git refuses to be
responsible for that.
 
Newer versions of git have an improved error message
that advises you to commit or stash your working dir changes.
 
If this error occurs, here are your options:
 
1. Check it in. If it's ready, don't be afraid of commitment.
 git add myfile
 git commit -m 'new great thing'
 git pull
 # note that you may now see a regular merge conflict to be resolved.

2. Use stash.
 git stash
 git pull
 git stash pop
 # Note that the pop is a merge, so you may see a conflict to be resolved.
 # This only restores the working directory.
 # You can use git stash apply --index to restore staging too.

3. Check it in on another development branch.
 git checkout -b brandNewTopic
 git add myfile
 git commit -m 'new great thing'
 git checkout master
 # now I can pull or merge
 # the myfile changes are no longer in the working dir.
 git pull

4. Abandon the changes.
 # I really didn't want those changes anyway
 git checkout myfile  # over-write myfile with last committed version on HEAD.
 # can alternatively use git reset commands with caution, see the section on reset.
 git pull

5. Copy it aside and restore with checkout or reset(yuck).
 # Not recommended.  You can't simply do a unix mv,
 # because git will think you deleted the file and that
 # this deletion is just another unchecked-in change.
 cp myfile myfileX  # If you try to restore later, BEWARE of losing other people's changes!!!
 git checkout myfile  # over-write myfile with last committed version on HEAD.
 # can alternatively use git reset commands with caution, see the section on reset.
 git pull
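
Option 2 above can be tried out safely in a scratch area. In this sketch two clones (named alice and bob purely for illustration) share a throwaway bare repo; bob has an uncommitted edit when alice pushes something new.

```shell
# Sketch: stash, pull, pop with an uncommitted local edit.
scratch=$(mktemp -d)
git init -q --bare "$scratch/shared.git"
git clone -q "$scratch/shared.git" "$scratch/alice" 2>/dev/null
cd "$scratch/alice"
git symbolic-ref HEAD refs/heads/master
git config user.name Alice
git config user.email alice@example.com
echo 'line1' > notes.txt
git add notes.txt
git commit -q -m 'first'
git push -q -u origin master

git clone -q -b master "$scratch/shared.git" "$scratch/bob"

# Alice shares another change
echo more > other.txt
git add other.txt
git commit -q -m 'add other'
git push -q

# Bob has an edit he is not ready to commit
cd "$scratch/bob"
git config user.name Bob
git config user.email bob@example.com
echo 'local edit' >> notes.txt

git stash -q        # tuck the edit away
git pull -q         # bring in Alice's change
git stash pop -q    # merge the edit back into the working dir
git status --short
```

Because the two changes touch different files, the pop applies cleanly; if they overlapped you would get a conflict to resolve at that point.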
 
When git makes a stash it is really a commit.
The git stash command is primarily a convenience feature. 
The stash works roughly equivalent to:
 git checkout -b mystash0
 git add -u   # add all tracked but dirty changes
 git commit -m 'my stash 0'
 git checkout master  # return to master branch, or whatever branch you were on.
     # note that there are no more dirty files on either branch
 
The real stashes are stored in a separate namespace
from tags and branches.
 
Git stash apply is roughly like:
 git merge mystash0
 git reset HEAD^
 
== CVS Equivalents ==
'''More equivalents [[Git:_CVS_equivalent_operations | here]]'''
 
  cvs add somefile ==> git add somefile
 
  cvs commit -m 'comment' somefile ==> git add somefile; git commit -m 'comment'; git push
 
  cvs log somefile  ==> git log somefile
  You may find it useful to pipe some outputs into the "tig" utility.
 
  cvs ann somefile ==> git blame somefile
  You can see who did what when.
 
  cvs up -dP ==>  git pull 
  (git fetch only pulls in new objects but does no merging)
 
  cvsup ==> git status 
  (but nicely git status does not need to update
  and thereby mess with your working dir to do it)
 
  cvs rm somefile ==> git rm somefile
 
  git mv somepath newpath (cvs has no real equivalent)
 
== git diff ==
 
  git diff does a lot of things.
  You can see just names or full details.
  You can diff between different specific commits,
  between branches, between repositories,
  between your sandbox and your commit-list,
  between your commit-list and the head, etc.
 
== Ignoring files ==
Configuring git to ignore certain files [[Git ignore | here]]
 
 
== Terminology ==
 
We say sandbox or working-dir interchangeably here.
We also say commit-list, stage, cached, or index,
but they all mean the same thing.  These are the changes
you have said you want in the next commit, but that
commit has not been frozen in yet.
 
== Comments ==
 
Please use good comments.
This is often all people have to go on when
looking through the git log commit history.
Try to make it a meaningful one-liner if you can.
It's worth taking a moment to get it right.
And with git, you can fix the comment that
is messed up before pushing to central repo
and shared history.
  git commit --amend
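
As a small illustration, this sketch fixes a sloppy comment on a commit that has not been pushed yet. The repo and the messages are made up for the example; giving -m to --amend simply avoids opening the editor.

```shell
# Sketch: repair the comment of the most recent, not yet pushed, commit.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git config user.name "Your Name Here"
git config user.email yourlogin@ucsc.edu

echo x > file.txt
git add file.txt
git commit -q -m 'fix stuf'                            # oops, poor comment

git commit -q --amend -m 'Fix typo in file.txt header' # replaces that commit

subject=$(git log -1 --format=%s)
commit_count=$(git rev-list --count HEAD)
echo "$commit_count commit(s), latest: $subject"
```

The amended commit replaces the original one rather than adding a second commit, which is why this is only appropriate before the commit is in the shared history.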
 
== Branches ==
 
Read this wiki page on [[Working with branches in Git]] for details on how we use branches within our group.
 
Your default branch in your own repository is called master.
Because we imported cvs history, we all have a lot of branches already.
There is also a master or head branch on the shared-repository.
 
You can and should easily create additional branches in your
local repository.  This requires NO TAGGING, and it's fast
and convenient.  You can switch back and forth between
the master branch for a quick fix and some more involved
detailed development branch, or make a quick branch
to test some idea, or a friend's code.  It's cheap
to leave these local branches, they don't clog up the
shared repository, and you can also clean up ones that
you no longer need.  Merging stuff between branches
is usually pretty easy and smooth.  Note that if you
have outstanding changes that would be lost when switching
branches, you can tuck them away with the git stash command.
Then you should be able to switch branches.  But you need
to later use your stash and delete it to tidy up.
 
== Tree-ish ==
 
(Do not be alarmed.  You are not experiencing perl hell.  Breathe deeply.)
 
  You can use a hashId from a commit.
  You can use a symbol such as a tag or a branch-tag.
  You can use tree-ish commands:
  Remember that merge commits have two parents,
  the first parent is the mainline branch master^1,
  the second parent is the other merged-in branch master^2.
 
  master^2^1 means the first parent of the second parent of master.
  master^1 == master^
  master^^^ == master~3
  master^  == master~ == master^1 == master~1
 
  You can use this to choose ancestors relative
  to a branch-tag or other symbol or SHA1-id.
  Use git rev-parse <some-tree-expression>
  to actually resolve a tree-ish expression into a specific hash-id.
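
The equivalences above can be checked with git rev-parse in a scratch repo. This sketch builds four commits on master, merges in a side branch (called topic here, a made-up name), and resolves a few expressions.

```shell
# Sketch: resolve tree-ish expressions with git rev-parse.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Your Name Here"
git config user.email yourlogin@ucsc.edu

for n in 1 2 3 4; do
    echo "$n" > file.txt
    git add file.txt
    git commit -q -m "commit $n"
done

# Side branch forked two commits back, then merged into master
git checkout -q -b topic master~2
echo topic > topic.txt
git add topic.txt
git commit -q -m 'topic work'
git checkout -q master
git merge -q --no-edit topic

first_parent=$(git rev-parse 'master^1')    # mainline parent of the merge
second_parent=$(git rev-parse 'master^2')   # the merged-in branch
topic_tip=$(git rev-parse topic)
three_carets=$(git rev-parse 'master^^^')
tilde_three=$(git rev-parse 'master~3')
echo "master^2 = $second_parent"
echo "topic    = $topic_tip"
```

The quotes around expressions containing ^ are only there to keep the shell from interpreting the caret.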
 
== Git Diff and Git Log ==
 
  git diff and git log (and other commands too) use tree-ish.
  But they use it in different ways.
 
  git diff x..y means give me the diff from x to y (but not including x itself).
  It simply does a diff between those two commit-endpoints.
  It does not consider the exact history between them.
  Notice that git diff y..x is the inverse of x..y
  so that insertion becomes deletion etc.
 
  Unlike git diff, git log is concerned with all
  the history between the two points.
  git log x..y means git log ^x y.
  ^x means not in x (^==NOT==EXCLUSION)
  Notice that the caret(^) is on the LEFT.
  So git log ^x y means y and its ancestors
  not including x and its ancestors.
  But git log y..x is nothing like git log x..y
  git log y..x means git log ^y x which means
  x and its ancestors not including y and its ancestors.
 
  So x..y means very different things to diff and log.
 
  There is a x...y (triple dots) also for both diff and log.
  In the case of git log, x...y means all things
  that are in x and its ancestors
  and y and its ancestors, but not in any of their common ancestors.
 
  git diff x...y means find common ancestor of x and y,
  and then diff from there to y.
 
  Here is a handy diff if you have been working on a branch,
  and you do not want to have to do a git pull from master,
  you can use this method to see only the changes ON YOUR BRANCH,
  and ignore changes on master since your branch started:
  git diff --stat origin/master...HEAD
  git log --stat ^origin/master HEAD
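
To see the two-dot and three-dot forms side by side, here is a sketch using a tiny scratch history: one base commit, two commits on a branch called topic, and one later commit on master. All names are made up for the example, and git rev-list --count is used in place of git log just to count the commits that git log would show.

```shell
# Sketch: x..y and x...y for log (via rev-list) and for diff.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Your Name Here"
git config user.email yourlogin@ucsc.edu

echo base > base.txt; git add base.txt; git commit -q -m 'base'
git checkout -q -b topic
echo 1 > t1.txt; git add t1.txt; git commit -q -m 'topic 1'
echo 2 > t2.txt; git add t2.txt; git commit -q -m 'topic 2'
git checkout -q master
echo m > m1.txt; git add m1.txt; git commit -q -m 'master 1'

only_topic=$(git rev-list --count master..topic)    # in topic, not in master
only_master=$(git rev-list --count topic..master)   # in master, not in topic
either_side=$(git rev-list --count master...topic)  # in one side but not both

# diff from the common ancestor to topic: only the branch's own changes
three_dot_files=$(git diff --name-only master...topic)
# plain endpoint diff: also reflects what changed on master
two_dot_files=$(git diff --name-only master..topic)

echo "log: $only_topic / $only_master / $either_side"
echo "three-dot diff:"; echo "$three_dot_files"
echo "two-dot diff:";   echo "$two_dot_files"
```

The three-dot diff lists only t1.txt and t2.txt, while the two-dot diff also mentions m1.txt because it compares the two endpoints directly.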
 
== Re-basing ==
 
Don't go crazy with re-basing.
It's not usually necessary.
 
Another important rule is,
do not change shared history.
 
Re-basing is a special technique used in your local repo
before you push your changes to the shared repo.
It allows you to make the history appear tidier and
more linear. For instance, you may have been
working and checked in 3 small closely related changes.
You want to just have them all be one single commit
with a good comment.  Re-basing is one way.
 
Re-basing can also be used to linearize the history,
which is sometimes helpful for making a nice change list.
It basically takes your changes since your branch was
forked off, updates all your changes and re-writes them
as if they had been patched in after the current shared
state.  This creates a simpler merge and makes reading
the shared history easier.  There is some fluidity to
changes in your own repo, but once it gets pushed up
to the shared repo, it's not so easy to change, in part
because everyone's history would have to be modified
at that point.
 
== Figuring Out What Commits Are Already Pushed To Shared Repo ==
 
We know that it is sometimes safe to carefully
modify commits in the personal repo which have not
yet been pushed to the shared repo, and therefore
are not in the shared history.
 
You probably have a little bit of a clue because you remember
what you have been working on recently.
 
If you are thinking of using git reset to wipe out the
most recent commit for instance, please be sure to
be very careful and check with git log what commits you have.
 
Sometimes people forget that doing a git pull can create
a new merge commit on top of whatever they were working on.
So looking with the git log command is important.
 
Any time you see a commit on your git log history
which was NOT authored by you, then that is definitely
shared history and should not be messed with.
 
But on occasion, even things authored by you may have
been pushed already.
 
What are some things we can do to test the commits on the git log history?
 
<pre>
git branch -r --contains commitId
</pre>
This should list remote (-r) branches on the shared repo which contain the specified commitId shaHash.
<pre>
  origin/HEAD -> origin/master
  origin/master
</pre>
So this means that the commitId is for sure in the shared history
and you should NOT modify it under any circumstances.
If it does not appear in any branches, then the output of the above command will be blank.
 
Another idea is that you can find the most recent common ancestor
between your branch and the shared repo:
 
How to find the most recent common ancestor of two Git branches?
 
<pre>
git merge-base master origin/master
050dc022f3a65bdc78d97e2b1ac9b595a924c3f2
</pre>
So this means that this commit is the most recent one in the shared history; it and its ancestors should not be modified.
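
Both checks can be wrapped up in a few lines. This is a sketch, with a made-up helper name is_pushed and a throwaway bare repo standing in for the shared repo; it relies on your origin/* refs being up to date, so run git fetch first in real use.

```shell
# Sketch: test whether a commit is already part of the shared history.
is_pushed() {
    # True when at least one remote-tracking branch contains the commit
    [ -n "$(git branch -r --contains "$1" 2>/dev/null)" ]
}

scratch=$(mktemp -d)
git init -q --bare "$scratch/shared.git"
git clone -q "$scratch/shared.git" "$scratch/work" 2>/dev/null
cd "$scratch/work"
git symbolic-ref HEAD refs/heads/master
git config user.name "Your Name Here"
git config user.email yourlogin@ucsc.edu

echo one > file.txt
git add file.txt
git commit -q -m 'pushed change'
git push -q -u origin master
pushed_commit=$(git rev-parse HEAD)

echo two >> file.txt
git add file.txt
git commit -q -m 'local only change'
local_commit=$(git rev-parse HEAD)

is_pushed "$pushed_commit" && echo "$pushed_commit: shared, leave it alone"
is_pushed "$local_commit"  || echo "$local_commit: still local only"

# Most recent commit that master and origin/master have in common
shared_base=$(git merge-base master origin/master)
```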
 
== Quick Repo Updates ==
CVS update was getting very slow
because the source had grown to thousands of files
and CVS update has to check each one for changes.
With git, a change log is kept and only new changes
need to be processed.  This is usually very fast.
 
== No undetected corruption ==
Git is also big on making sure digital corruption does
not creep in, and it will detect it automatically if it happens.
Every time that any object is accessed in git, it runs the SHA1
hash on the extracted content and compares it to its ID.
If any corruption has occurred, either accidentally or
intentionally, you will be informed right away.
 
== No more hanging locks ==
No more concerns about hanging CVS locks, for example:
  cvs log somefile | more
You are working in your own repo, and have full access to all the history.
You are not holding up anybody else.
 
== Mirror site access to git repository ==
Mirror sites will have read-only access here:
 
  git://genome-source.soe.ucsc.edu/kent.git
 
If you have firewall issues, this will also be provided:
 
  http://genome-source.soe.ucsc.edu/kent.git
 
== Browse the kent source online ==
  http://genome-source.soe.ucsc.edu/gitlist/
 
== Make a permanent URL link ==
Make a permanent link to the latest version of a file in the kent source
  http://genome-source.soe.ucsc.edu/gitlist/kent.git/raw/master/src/makefile
 
== External git Documentation ==
The official git [http://git-scm.com home page], [http://git-scm.com/documentation documentation], and [http://www.kernel.org/pub/software/scm/git/docs/gittutorial.html tutorial]. Another interesting [http://wiki.freegeek.org/index.php/Git_for_dummies tutorial].
 
A recommended [https://www.codecademy.com/learn/learn-git codecademy session about git].
 
A good complete resource for git is the book [http://progit.org/book/ "Pro Git" by Scott Chacon]. The complete text of the book is available online [http://progit.org/book/ here].
 
Here is a link you can access to a book via our VPN library connection. [http://proquest.safaribooksonline.com/book/bioinformatics/9781449367480/firstchapter#X2ludGVybmFsX0h0bWxWaWV3P3htbGlkPTk3ODE0NDkzNjc0ODAlMkZjaDA1X21lcmdlX2NvbmZsaWN0X2h0bWwmcXVlcnk9Qk9PSw== Merge Conflict]
 
Here is a GitHub repository to clone in two directories where you can model a git hub conflict (you won't be able to push unless you have been added as a contributor -ask BrianL- but you can model everything up to that and still see git conflict messages): [https://github.com/brianleetest/testGit/blob/master/README.md https://github.com/brianleetest/testGit/blob/master/README.md]
 
== Installing git on a Mac OS ==
 
See also: [[Installing git]] for other operating systems.
 
$ sudo port install git-core
 
That seems pretty straightforward, but the problem is you may not
have the 'port' command installed yet.  That is a little more involved.
To get this 'MacPorts' system installed, please follow their installation
procedures at: [http://guide.macports.org/ Guide MacPorts].  This could be
an extensive procedure depending upon what you may not yet have installed
on your Mac since it needs the X11 system and the Xcode tools installed
to get MacPorts installed.  But once you have this 'port' command, you
can install a vast array of interesting software quite easily.
 
Like Windows, the default Mac disk is case-insensitive.
If there are two versions of the file with different cases,
this will cause problems.  Google for work-arounds.
 
== Genome Browser Mirror Sites ==
 
Move your existing cvs kent directory out of the way:
 mv kent kent.cvs

Start a new kent git repository.  This clone command will establish a new kent directory:
 git clone git://genome-source.soe.ucsc.edu/kent.git

If you have a firewall that interferes with that operation, use http:
 git clone http://genome-source.soe.ucsc.edu/kent.git

Then mark your repository with the beta tag so it will track along
with our beta releases.  '''Important:''' you need to be in the newly
created kent directory for this command to function:
 cd kent
 git checkout -t -b beta origin/beta

On other versions of git, this may be:
 git checkout --track -b beta origin/beta
 
Some older versions of git do not allow this tracking option.
 
== [[Resolving merge conflicts in Git]] ==
 
== Galt's tips for avoiding merge conflicts ==
 
1. Check your stuff in before merge/pull
whenever possible.  You will thank yourself for it.
You can make a branch if you need to.
 
2. Sometimes git will not allow you to start the
pull/merge because it knows you will lose unchecked in stuff.
So then you check it in or stash it.
 
3. ONCE THE MERGE HAS BEGUN AND FAILED, it shows an error,
then you need to realize that you are in a kind of critical
state where you want to be careful not to lose other people's
changes.
 
* You need to edit each file which git status shows as "merge".
* You need to git add the file.  Yes, really, just do it!
* Repeat until merge conflicts are all gone.
* git commit -m 'good message'
    !!!! Do not try to specify a filename here.
      i.e. DO NOT DO: git commit -m 'msg' somefile(s)
    Hopefully git will not allow it, realizing that you are in the middle of a merge.
 
You are committing EVERYBODY ELSE'S changes
along with your merge-conflict resolution.  Do not lose their changes.
Their changes are sitting in your sandbox and staging area.
Get the conflicts resolved, git add them,
then do the commit for everything involved in the merge.
 
You want to commit it ALL.
 
(It would be helpful to say "Merge" somewhere in your commit message too.)
 
ESPECIALLY DO NOT DO DURING A MERGE FAILURE:
 
  git stash    // DO NOT DO !!!
      just messes up your merge.  Too bad git does not disallow it.
 
  git reset    // DO NOT DO !!!
      this screws up your staging area, clearing it and
          LOSING other people's changes, while leaving your sandbox
            full of files from the messed-up merge
              that will be a pain to find and remove
              to recover when you finally fix what's wrong.
 
If you have goofed up and suspect you might have lost other people's changes, STOP AND GET HELP.  DO NOT GIT PUSH.
 
I spent an hour this morning going over how this happened to someone
and showing them what they needed to do right instead,
which was quite easy when you know what you're doing.
 
NOTE that once you have safely and correctly gotten past the
merge resolution and checkin, you are no longer in a critical-path,
so to speak, and git use can return to normal.
 
IF you get yourself messed up, I can help you recover.
 
Yes, there is a way to abandon a merge, but it's not that much fun.
It's almost always better to just resolve the conflicts,
add them, commit. Go forward! You can do it!  Ganbatte!
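
To practice the resolve, add, commit sequence where nothing can go wrong, this sketch manufactures a conflict in a scratch repo. The branch name topic and the file names are made up; theirs.txt plays the role of other people's changes that arrive with the merge.

```shell
# Sketch: create a merge conflict, resolve it, and commit the whole merge.
scratch=$(mktemp -d)
cd "$scratch"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Your Name Here"
git config user.email yourlogin@ucsc.edu

echo 'original' > shared.txt
git add shared.txt
git commit -q -m 'base'

git checkout -q -b topic
echo 'topic version' > shared.txt
echo extra > theirs.txt
git add shared.txt theirs.txt
git commit -q -m 'topic change'

git checkout -q master
echo 'master version' > shared.txt
git add shared.txt
git commit -q -m 'master change'

# The merge stops because both sides changed shared.txt
git merge topic >/dev/null 2>&1 || echo "merge stopped with conflicts"
git status --short

# Resolve by editing the file, then git add it
echo 'master and topic version' > shared.txt
git add shared.txt

# Commit everything involved in the merge: no filenames on the command line
git commit -q -m 'Merge topic; resolve conflict in shared.txt'

parent_count=$(git cat-file -p HEAD | grep -c '^parent ')
```

The resulting commit has two parents and includes theirs.txt, even though you never touched that file yourself.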
[[Category:Browser Development]]
[[Category:Git]]

Latest revision as of 09:36, 29 April 2021

So, you think you know git? Take this [quiz] to get permission to work with the kent source code git repository.

Gitting Started With Git

(This page is intended for UCSC Genome Browser developers and staff.)

Git is a modern (SCM) source code management system written by Linus Torvalds.

Setting Up Your Own Personal Git Kent Repository

To create your personal git repository of the kent source, please use the following simple directions (note, you can also copy someone's .gitconfig file, changing the name and email, but the below steps will reproduce):

 cd $HOME
 
 git config --global user.name "Your Name Here"
 git config --global user.email yourlogin@ucsc.edu

To enable automatic merge without requiring a comment (the way git used to be):

tcshell uses this syntax (see .tcshrc):

 setenv GIT_MERGE_AUTOEDIT no

bash uses this syntax:

 GIT_MERGE_AUTOEDIT=no
 export GIT_MERGE_AUTOEDIT

As an alternative, this option does not work globally, so you run it in every git repo that you use:

 git config core.mergeoptions --no-edit

If you like colors: (warning this may require particular terminal settings to work right)

 git config --global color.diff auto
 git config --global color.status auto
 git config --global color.branch auto

[and if you want to turn it off?]

If you are security-conscious:

 chmod 644 ~/.gitconfig 

Turn off a nagging message about not having a default value set for git push.

git config --global push.default simple

These steps get you the repository desired: /data/git/kent.git is our shared kent repository. We access it via SSH. Clone your kent and other git repos:

  1. only do this the first time when you have no repos:
 cd $HOME
 git clone yourlogin@hgwdev.gi.ucsc.edu:/data/git/kent.git
 git clone yourlogin@hgwdev.gi.ucsc.edu:/data/git/htdocsExtras.git
 git clone yourlogin@hgwdev.gi.ucsc.edu:/data/git/genecats.git
 git clone yourlogin@hgwdev.gi.ucsc.edu:/data/git/hgdownload.git

We also have a post-receieve hook sending your pushes to a redmine clone repo. To make sure that it will work, you need to ssh to the redmine machine one time, and answer "y" to the question.

 ssh redmine   # answer y to any questions
 exit          # close the shell on redmine, we don't need to do anything with it

Installing standard hooks

cd $HOME/kent    # do not do this with the other git repos. only for kent.
./install-hooks.sh

The limit is 2MB maximum for kent repo files. We want small souce code text files in this repo. Previously, people only found out they had a too-large file when they pushed to the shared repo, and then they typically have to do an involved process using git rebase to repair the problem. This very handy hook will check for too-large files when you run git commit, and give you an error message. The commit will not have taken place because of the error, and your too-large file(s) will be still in your staging/index area. You can replace the big file with a smaller version and re-run git add on it. Or, you can just use "git rm --cached someFileTooLarge" or equivalent command to remove it from staging. Once there are no more files that are too big, finish your "git commit" as usual.

Making git pulls without passwords: For new people they need to add hgwdev to their authorized_keys. They create some keys and then add a line to their .ssh directory authorized_keys file. Create the key if it doesn't exist:

ssh-keygen -t rsa

Add the public key to the authorized_keys

cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

Then you should be able to do a git pull and not be asked for your password.

Some simple real-world git usage

Here is an example of some git commands used to modify a file, check on it, diff it, check it in, push it up to shared repo:

 vi doSomethingCool.csh
 git diff --help
 git diff  # show detailed diff between working dir and staged
 git diff --stat  # show condensed view of the names of files that changed
 git diff --name-status  # show only the names of files that changed
 git diff --cached # show detailed diff between staged and HEAD of current branch
 git diff HEAD # show detailed diff between working dir and HEAD of current branch
 git diff master origin/master # show detailed diff between local master and central master
 git status     # another way to see info about changes
 git add doSomethingCool.csh   # notice that this adds it our list of things
                           # to be committed together in the next commit.
                           # Whether it is a new file, or just a change, you do a git add.
                           # Several related things can and should be committed at the same time as a unit.
 git commit -m 'made some useful changes ...'  # Commits to your local repo only.
                           # Make the comments useful.
 git diff       # check that the changes got committed
 git status     # another way to see info about changes and so on
 git fetch               # make sure things are up to date including origin/* refs
 git diff origin/master  # diffing work-dir to the shared-repo
 git push          # push my change up to be shared with all
                   # works great for default branch master or for tracking branches
 git push origin HEAD:master    # push head of my branch to central repo master branch
                   # Only do this if your branch is related to the central repo master branch
 git diff origin/master  # verify that there are no more outstanding changes
 git status  # check info again

Warning:

 git commit -a      # DO NOT USE. Commits ALL changes in your local-repo (dangerous!)
 

-a automatically adds ANY and ALL changes to existing tracked files in your local repo to the "cache/commit index" list of things to commit. Unlike CVS, it is not influenced by your current directory location. Any tracked files that you have modified, for instance common.mk, will get checked in.

Instead use:

 git add . -u # this gets all tracked changes in current dir and subdirs.
              # do not forget the -u or it will add ALL files including untracked files.
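To see the difference for yourself, here is a small sketch in a throwaway repo. The file names and the user identity are made up for the demonstration.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.com"

echo one > tracked.txt
git add tracked.txt
git commit -q -m 'add tracked.txt'

echo two >> tracked.txt      # a change to a tracked file
echo scratch > notes.tmp     # a new, untracked file

git add -u .                 # stages changes to tracked files only
staged=$(git diff --cached --name-only)
untracked=$(git ls-files --others)
echo "staged: $staged"
echo "still untracked: $untracked"
```

Only tracked.txt ends up staged. Without the -u, notes.tmp would have been staged as well.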

Sharing Changes With Others

Git is a distributed SCM so it works a bit differently from CVS. Each user has their own local project repository which includes the full history of all changes ever made. This allows one to work offline and has other advantages besides. However, for simplicity there is a shared repository that people push changes to from their local repository. More complex configurations, such as hierarchical ones, are possible. The shared repository approach is probably fine for our group.

 git pull 

Equivalent to cvs up -dP, this pulls in changes made by others. Unlike cvs up, however, it will not bring back a file that you have removed from your working directory. Those files will not be recovered by the pull.

 git add somefile; git commit; git push 

Equivalent to cvs commit. Only do this when your changes are ready to share with others.


If you are working on a development branch that does not have tracking set up, you can use these commands to push and pull to master, but ONLY IF IT MAKES SENSE TO DO SO:

 git pull origin master  
    # ONLY do this if your branch is a recent relative of the central repo master.
    # Note that this does not update "origin/*" including origin/master,
    # so a plain git fetch will still be needed if you wish to 
    # do git log or git diff against origin/master
 git push origin HEAD:master  
    # ONLY do this if your branch is a recent relative of the central repo master.

"origin" refers to the shared repository from which your local repo was cloned.

Stash

Git stash is handy when you are not keeping your sandbox clean with other methods such as using development branches, and you are doing something potentially dangerous such as pulling, merging, or switching branches. Git refuses to lose your stuff due to failures in these operations, so you may be required to use git stash to save away those changes you are not ready to commit yet.

Git stash supports several operations:

git stash # save to stash stack
git stash list  # list all the stashes you have
git stash show <stash>  # show --stat level details about a stash, e.g. stash@{0}
git stash show -p <stash>  # show diff details about a stash
git stash save <message>  # save a stash with a descriptive message (newer git: git stash push -m <message>)
git stash apply <stash> # apply the given stash, i.e. merge it in
git stash drop <stash>  # delete the stash
git stash pop          # apply the stash and clear it from the stack
git stash clear        # be careful, wipes out all of your stashes
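A minimal round trip, sketched in a throwaway repo (the file name and user identity are made up), shows how the stash stack behaves:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.com"
echo v1 > myfile
git add myfile
git commit -q -m 'first version'

echo 'work in progress' >> myfile
git stash > /dev/null                 # tuck the edit away
clean=$(git status --porcelain)       # empty: working dir is clean again
count=$(git stash list | wc -l | tr -d ' ')

git stash pop > /dev/null             # bring the edit back, drop the stash
restored=$(tail -n 1 myfile)
left=$(git stash list | wc -l | tr -d ' ')
echo "stashed=$count restored='$restored' left=$left"
```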

If when pulling or merging you get an error saying "not uptodate", this is usually because you have changes in your working directory that are in danger of being lost because they are not checked in anywhere.

Should recovery from a failed merge get ugly, you might lose your edits and git refuses to be responsible for that.

Newer versions of git have an improved error message that advises you to commit or stash your working dir changes.

If this error occurs, here are your options:

1. Check it in. If it's ready, don't be afraid of commitment.

git add myfile
git commit -m 'new great thing'
git pull
# note that you may now see a regular merge conflict to be resolved.

2. Use stash.

git stash
git pull 
git stash pop
# Note that the pop is a merge and you may see a conflict to be resolved.
# This only restores the working directory.  
# You can use git stash apply --index to restore staging too.

3. Check it in on another development branch.

git checkout -b brandNewTopic
git add myfile
git commit -m 'new great thing'
git checkout master
# now I can pull or merge
# the myfile changes are no longer in the working dir.
git pull
 

4. Abandon the changes.

# I really didn't want those changes anyway
git checkout myfile  # over-write myfile with last committed version on HEAD.
# can alternatively use git reset commands with caution, see the section on reset.
git pull

5. Copy it aside and restore with checkout or reset (yuck).

# Not recommended.  You can't simply do a unix mv,
# because git will think you deleted the file and that 
# this deletion is just another unchecked-in change.
cp myfile myfileX  # If you try to restore later, BEWARE of losing other people's changes!!!
git checkout myfile  # over-write myfile with last committed version on HEAD.
# can alternatively use git reset commands with caution, see the section on reset.
git pull

When git makes a stash it is really a commit. The git stash command is primarily a convenience feature. Making a stash is roughly equivalent to:

git checkout -b mystash0
git add -u    # add all tracked but dirty changes 
git commit -m 'my stash 0'
git checkout master  # return to master branch, or whatever branch you were on.
   # note that there are no more dirty files on either branch

The real stashes are stored in a separate namespace from tags and branches.

Git stash apply is roughly like:

git merge mystash0
git reset HEAD^

CVS Equivalents

More equivalents here

 cvs add somefile ==> git add somefile
 
 cvs commit -m 'comment' somefile ==> git add somefile; git commit -m 'comment'; git push
 
 cvs log somefile  ==> git log somefile
 You may find it useful to pipe some outputs into the "tig" utility.
 
 cvs ann somefile ==> git blame somefile
 You can see who did what when.
 
 cvs up -dP ==>  git pull  
 (git fetch only pulls in new objects but does no merging)
 
 cvsup ==> git status  
 (but nicely git status does not need to update 
 and thereby mess with your working dir to do it)
 
 cvs rm somefile ==> git rm somefile
 
 git mv somepath newpath (cvs has no real equivalent)

git diff

 git diff does a lot of things.
 You can see just names or full details.
 You can diff between different specific commits,
 between branches, between repositories,
 between your sandbox and your commit-list,
 between your commit-list and the head, etc.

Ignoring files

Configuring git to ignore certain files here


Terminology

We say sandbox or working-dir interchangeably here. We also say commit-list, stage, cached, or index, but they all mean the same thing. These are the changes you have said you want in the next commit, but that commit has not been frozen in yet.

Comments

Please use good comments. This is often all people have to go on when looking through the git log commit history. Try to make it a meaningful one-liner if you can. It's worth taking a moment to get it right. And with git, you can fix a messed-up comment on your most recent commit before pushing to the central repo and shared history.

 git commit --amend
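For example, here is a sketch in a throwaway repo (the file name, messages, and user identity are invented). The amended commit replaces the sloppy one rather than adding a second commit:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git config user.name "Example User"
git config user.email "example@example.com"

echo hi > somefile
git add somefile
git commit -q -m 'fixd stuf'                         # oops, sloppy comment
git commit -q --amend -m 'Fix typo in somefile'      # rewrite it before pushing

msg=$(git log -1 --format=%s)
n=$(git rev-list --count HEAD)
echo "$n commit(s), message: $msg"
```

Because amending rewrites the commit, only do this for commits that have not been pushed yet.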

Branches

Read this wiki page on Working with branches in Git for details on how we use branches within our group.

Your default branch in your own repository is called master. Because we imported cvs history, we all have a lot of branches already. There is also a master or head branch on the shared-repository.

You can and should easily create additional branches in your local repository. This requires NO TAGGING, and it's fast and convenient. You can switch back and forth between the master branch for a quick fix and some more involved development branch, or make a quick branch to test some idea or a friend's code. It's cheap to leave these local branches around: they don't clog up the shared repository, and you can clean up ones that you no longer need. Merging stuff between branches is usually pretty easy and smooth. Note that if you have outstanding changes that would be lost when switching branches, you can tuck them away with the git stash command. Then you should be able to switch branches. But you need to apply your stash later and delete it to tidy up.

Tree-ish

(Do not be alarmed. You are not experiencing perl hell. Breathe deeply.)

 You can use a hashId from a commit.
 You can use a symbol such as a tag or a branch-tag.
 You can use tree-ish commands:
 Remember that merge commits have two parents,
 the first parent is the mainline branch master^1,
 the second parent is the other merged-in branch master^2.
 
 master^2^1 means the first parent of the second parent of master.
 master^1 == master^
 master^^^ == master~3
 master^  == master~ == master^1 == master~1
 
 You can use this to choose ancestors relative
 to a branch-tag or other symbol or SHA1-id.
 Use git rev-parse <some-tree-expression> 
 to actually resolve a tree-ish expression into a specific hash-id.
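The parent notation is easier to follow with a concrete history. This sketch builds a tiny throwaway repo with one merge commit (branch names, file names, and messages are invented) and resolves a few expressions:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the branch is named master
git config user.name "Example User"
git config user.email "example@example.com"

echo a > a; git add a; git commit -q -m 'A'
git checkout -q -b topic
echo b > b; git add b; git commit -q -m 'B on topic'
git checkout -q master
echo c > c; git add c; git commit -q -m 'C on master'
git merge -q -m 'Merge topic' topic       # master now has two parents

first=$(git log -1 --format=%s master^1)    # mainline parent
second=$(git log -1 --format=%s master^2)   # merged-in parent
grand=$(git log -1 --format=%s master^2^1)  # first parent of the second parent
echo "master^1   = $first"
echo "master^2   = $second"
echo "master^2^1 = $grand"
git rev-parse master^ master~1 master^1     # three spellings, same hash
```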

Git Diff and Git Log

 git diff and git log (and other commands too) use tree-ish.
 But they use it in different ways.
 
 git diff x..y means give me the diff from x to y (but not including x itself).
 It simply does a diff between those two commit-endpoints.
 It does not consider the exact history between them.
 Notice that git diff y..x is the inverse of x..y
 so that insertion becomes deletion etc.
 
 Unlike git diff, git log is concerned with all 
 the history between the two points.
 git log x..y means git log ^x y.
 ^x means not in x (^==NOT==EXCLUSION)
 Notice that the caret(^) is on the LEFT.
 So git log ^x y means y and its ancestors
 not including x and its ancestors.
 But git log y..x is nothing like git log x..y
 git log y..x means git log ^y x which means
 x and its ancestors not including y and its ancestors.
 
 So x..y means very different things to diff and log.
 
 There is a x...y (triple dots) also for both diff and log.
 In the case of git log, x...y means all things
 that are in x and its ancestors 
 and y and its ancestors, but not in any of their common ancestors.
 
 git diff x...y means find common ancestor of x and y,
 and then diff from there to y.
 Here is a handy diff if you have been working on a branch,
 and you do not want to have to do a git pull from master,
 you can use this method to see only the changes ON YOUR BRANCH,
 and ignore changes on master since your branch started:

 git diff --stat origin/master...HEAD
 git log --stat ^origin/master HEAD
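The following sketch makes the two-dot and three-dot behavior concrete. It uses a throwaway repo in which master and a branch named topic have diverged by one commit each (all names are invented for the demonstration):

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.com"

echo a > a; git add a; git commit -q -m 'A'
git checkout -q -b topic
echo b > b; git add b; git commit -q -m 'B on topic'
git checkout -q master
echo c > c; git add c; git commit -q -m 'C on master'

only_topic=$(git log --format=%s master..topic)    # in topic, not in master
only_master=$(git log --format=%s topic..master)   # in master, not in topic
either=$(git log --format=%s master...topic | wc -l | tr -d ' ')  # on one side only
two_dot=$(git diff --name-only master..topic)      # plain endpoint diff
three_dot=$(git diff --name-only master...topic)   # merge base to topic
echo "log master..topic:    $only_topic"
echo "log topic..master:    $only_master"
echo "log master...topic:   $either commits"
echo "diff master..topic:   $(echo $two_dot)"
echo "diff master...topic:  $three_dot"
```

The two-dot diff reports both b and c because it compares the two endpoints directly, while the three-dot diff reports only b, the change made on the topic branch.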

Re-basing

Don't go crazy with re-basing. It's not usually necessary.

Another important rule is, do not change shared history.

Re-basing is a special technique used in your local repo before you push your changes to the shared repo. It allows you to make the history appear tidier and more linear. For instance, you may have been working and checked in 3 small closely related changes. You want them all to be one single commit with a good comment. Re-basing is one way to do that.

Re-basing can also be used to linearize the history, which is sometimes helpful for making a nice change list. It basically takes your changes since your branch was forked off, updates all your changes and re-writes them as if they had been patched in after the current shared state. This creates a simpler merge and makes reading the shared history easier. There is some fluidity to changes in your own repo, but once it gets pushed up to the shared repo, it's not so easy to change, in part because everyone's history would have to be modified at that point.
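Here is a sketch of the linearizing case only, in a throwaway repo with invented branch names and messages. A local branch named topic forked from master before commit C was made; after the rebase, its commit sits on top of C as if it had been written afterwards. None of these commits have been pushed anywhere, which is what makes this safe.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.com"

echo a > a; git add a; git commit -q -m 'A'
git checkout -q -b topic
echo b > b; git add b; git commit -q -m 'B on topic'
git checkout -q master
echo c > c; git add c; git commit -q -m 'C on master'

git checkout -q topic
git rebase -q master                      # replay B on top of C

base=$(git merge-base topic master)
tip=$(git rev-parse master)
history=$(git log --format=%s topic)
echo "$history"                           # B on topic, C on master, A
```

After the rebase, the merge base of topic and master is the tip of master, so merging topic into master is a simple fast-forward.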

Figuring Out What Commits Are Already Pushed To Shared Repo

We know that it is sometimes safe to carefully modify commits in the personal repo which have not yet been pushed to the shared repo, and therefore are not in the shared history.

You probably have a little bit of a clue because you remember what you have been working on recently.

If you are thinking of using git reset to wipe out the most recent commit, for instance, please be very careful and check with git log what commits you have.

Sometimes people forget that doing a git pull can create a new merge commit on top of whatever they were working on. So looking with the git log command is important.

Any time you see a commit on your git log history which was NOT authored by you, then that is definitely shared history and should not be messed with.

But on occasion, even things authored by you may have been pushed already.

What are some things we can do to test the commits on the git log history?

git branch -r --contains commitId

This should list remote (-r) branches on the shared repo which contain the specified commitId shaHash.

  origin/HEAD -> origin/master
  origin/master

So this means that the commitId is for sure in the shared history and you should NOT modify it under any circumstances. If it does not appear in any branches, then the output of the above command will be blank.

Another idea is that you can find the most recent common ancestor between your branch and the shared repo:

How to find the most recent common ancestor of two Git branches?

git merge-base master origin/master
 050dc022f3a65bdc78d97e2b1ac9b595a924c3f2

So this means that this commit is the most recent one that is in shared history. It and all of its ancestors should not be modified.
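The checks above can be tried out safely with a local stand-in for the shared repo. In this sketch the bare repo shared.git, the clone directory mine, and the commit messages are all invented. It also uses git log origin/master..master, which, following the x..y rule for git log, lists the commits that are on your master but not yet on the shared one.

```shell
set -e
work=$(mktemp -d) && cd "$work"
git init -q --bare shared.git               # plays the role of the shared repo
git clone -q shared.git mine 2>/dev/null    # cloning an empty repo prints a warning
cd mine
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.com"

echo a > a; git add a; git commit -q -m 'pushed commit'
git push -q origin master
echo b > b; git add b; git commit -q -m 'local only commit'

unpushed=$(git log --format=%s origin/master..master)
pushed_in=$(git branch -r --contains master^)    # lists origin/master
local_in=$(git branch -r --contains master)      # blank: not pushed yet
base=$(git log -1 --format=%s "$(git merge-base master origin/master)")
echo "not yet pushed:         $unpushed"
echo "newest shared commit:   $base"
```

Remember to run git fetch first in a real sandbox so that origin/master is up to date.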

Quick Repo Updates

CVS update was getting very slow because the source had grown to thousands of files and CVS update has to check each one for changes. With git, a change log is kept and only new changes need to be processed. This is usually very fast.

No undetected corruption

Git is also big on making sure digital corruption does not creep in, and it will detect it automatically if it happens. Every time any object is accessed in git, it runs the SHA1 hash on the extracted content and compares it to its ID. If any corruption has occurred, either accidentally or intentionally, you will be informed right away.

No more hanging locks

No more concerns about hanging CVS locks, for example:

 cvs log somefile | more

You are working in your own repo, and have full access to all the history. You are not holding up anybody else.

Mirror site access to git repository

Mirror sites will have read-only access here:

 git://genome-source.soe.ucsc.edu/kent.git

If you have firewall issues, this will also be provided:

 http://genome-source.soe.ucsc.edu/kent.git

Browse the kent source online

 http://genome-source.soe.ucsc.edu/gitlist/

Make a permanent URL link

Make a permanent link to the latest version of a file in the kent source

 http://genome-source.soe.ucsc.edu/gitlist/kent.git/raw/master/src/makefile

External git Documentation

The official git home page, documentation, and tutorial. Another interesting tutorial.

A recommended codecademy session about git.

A good complete resource for git is the book "Pro Git" by Scott Chacon. The complete text of the book is available online here.

Here is a link to a book that you can access via our VPN library connection.

Merge Conflict

Here is a GitHub repository to clone into two directories, where you can model a git merge conflict (you won't be able to push unless you have been added as a contributor, ask BrianL, but you can model everything up to that and still see git conflict messages): https://github.com/brianleetest/testGit/blob/master/README.md

Installing git on a Mac OS

See also: Installing git for other operating systems.

$ sudo port install git-core

That seems pretty straightforward, but the problem is you may not have the 'port' command installed yet. That is a little more involved. To get this 'MacPorts' system installed, please follow their installation procedures at: Guide MacPorts. This could be an extensive procedure depending upon what you may not yet have installed on your Mac, since it needs the X11 system and the Xcode tools installed to get MacPorts installed. But once you have this 'port' command, you can install a vast array of interesting software quite easily.

Like Windows, the default Mac disk is case-insensitive. If there are two versions of the file with different cases, this will cause problems. Google for work-arounds.

Genome Browser Mirror Sites

Move your existing cvs kent directory out of the way:

mv kent kent.cvs

Start a new kent git repository. This clone command will establish a new kent directory:

git clone git://genome-source.soe.ucsc.edu/kent.git

If you have a firewall that interferes with that operation, use http:

git clone http://genome-source.soe.ucsc.edu/kent.git

Then mark your repository with the beta tag so it will track along with our beta releases. Important: you need to be in the newly created kent directory for this command to function:

cd kent
git checkout -t -b beta origin/beta

On other versions of git, this may be:

git checkout --track -b beta origin/beta

Some older versions of git do not allow this tracking option.

Resolving merge conflicts in Git

Galt's tips for avoiding merge conflicts

1. Check your stuff in before merge/pull whenever possible. You will thank yourself for it. You can make a branch if you need to.

2. Sometimes git will not allow you to start the pull/merge because it knows you would lose unchecked-in stuff. In that case, check it in or stash it.

3. ONCE THE MERGE HAS BEGUN AND FAILED with an error, you need to realize that you are in a kind of critical state where you must be careful not to lose other people's changes.

  • You need to edit each file which git status shows as "merge".
  • You need to git add the file. Yes, really, just do it!
  • Repeat until merge conflicts are all gone.
  • git commit -m 'good message'
   !!!! Do not try to specify a filename here.
     i.e. DO NOT DO: git commit -m 'msg' somefile(s)
   Hopefully git will not allow it, realizing that you are in the middle of a merge.

You are committing EVERYBODY ELSE'S changes along with your merge-conflict resolution. Do not lose their changes. Their changes are sitting in your sandbox and staging area. Get the conflicts resolved, git add them, then do the commit for everything involved in the merge.

You want to commit it ALL.

(It would be helpful to say "Merge" somewhere in your commit message too.)
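The whole sequence can be rehearsed in a throwaway repo before you need it for real. In this sketch the branch named other, the file names, and the messages are invented. The branch stands in for everybody else's work: it changes f.txt in a conflicting way and also adds a non-conflicting file theirs.txt.

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.com"

echo 'original' > f.txt; git add f.txt; git commit -q -m 'base'
git checkout -q -b other
echo 'their change' > f.txt; echo 'new' > theirs.txt
git add f.txt theirs.txt; git commit -q -m 'their changes'
git checkout -q master
echo 'my change' > f.txt; git add f.txt; git commit -q -m 'my change'

git merge other > /dev/null 2>&1 || echo "merge stopped with conflicts"
conflicted=$(git diff --name-only --diff-filter=U)   # files still unmerged
echo "needs editing: $conflicted"

# Edit the file to resolve the conflict (here we just write the final text),
# then git add it.
printf 'my change\ntheir change\n' > f.txt
git add f.txt
git commit -q -m 'Merge other: keep both changes in f.txt'   # no file names!

parents=$(git log -1 --format=%P | wc -w | tr -d ' ')
echo "merge commit has $parents parents"
```

The final commit has two parents and includes theirs.txt, which is exactly the "commit it ALL" behavior described above.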

ESPECIALLY DO NOT DO DURING A MERGE FAILURE:

  git stash    // DO NOT DO !!!
      just messes up your merge.  Too bad git does not disallow it.
  git reset    // DO NOT DO !!!
      this screws up your staging area, clearing it and
      LOSING other people's changes, while leaving your sandbox
      full of files from the messed-up merge
      that will be a pain to find and remove
      to recover when you finally fix what's wrong.

If you have goofed up and suspect you might have lost other people's changes, STOP AND GET HELP. DO NOT GIT PUSH.

I spent an hour this morning going over how this happened to someone and showing them what they needed to do right instead, which was quite easy when you know what you're doing.

NOTE that once you have safely and correctly gotten past the merge resolution and checkin, you are no longer in a critical-path, so to speak, and git use can return to normal.

IF you get yourself messed up, I can help you recover.

Yes, there is a way to abandon a merge, but it's not that much fun. It's almost always better to just resolve the conflicts, add them, and commit. Go forward! You can do it! Ganbatte!
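For reference, in reasonably recent versions of git the command for abandoning an in-progress merge is git merge --abort. It throws away the conflict state and returns to the commit you were on before the merge started. It works best when your working directory was clean before the merge. A minimal sketch in a throwaway repo, with invented branch and file names:

```shell
set -e
repo=$(mktemp -d) && cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example User"
git config user.email "example@example.com"

echo 'original' > f.txt; git add f.txt; git commit -q -m 'base'
git checkout -q -b other
echo 'their change' > f.txt; git add f.txt; git commit -q -m 'their change'
git checkout -q master
echo 'my change' > f.txt; git add f.txt; git commit -q -m 'my change'
before=$(git rev-parse HEAD)

git merge other > /dev/null 2>&1 || echo "merge stopped with conflicts"
git merge --abort                    # give up on this merge attempt

after=$(git rev-parse HEAD)
state=$(git status --porcelain)      # empty: nothing left over from the merge
content=$(cat f.txt)
echo "f.txt is back to: $content"
```

The other branch is untouched, so the merge can be attempted again later.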