Contents: 1. The copyright-extractor script 2. Using the script 3. Adding copyright files to spec files ----------------------------------------------------------------------------- 1. The copyright-extractor script The copyright-extractor script can be used to find copyright and licensing comments in source files. The script looks at C, C++, Java, Python, Perl and shell script files (based on file name) and finds all comments that appear in the file before actual code starts. Then it finds identical comments. This works for some modules, where standard comment blocks are used, however even in those modules, the authors and copyright holders may be different in each module. So the next step is trying to merge the licenses. Consider these 2 headers: /* ATK - Accessibility Toolkit * Copyright 2007 Sun Microsystems Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Library General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Library General Public License for more details. * * You should have received a copy of the GNU Library General Public * License along with this library; if not, write to the * Free Software Foundation, Inc., 59 Temple Place - Suite 330, * Boston, MA 02111-1307, USA. */ /* ATK - Accessibility Toolkit * Copyright 2001 Sun Microsystems Inc. * * This library is free software; you can redistribute it and/or * modify it under the terms of the GNU Lesser General Public * License as published by the Free Software Foundation; either * version 2 of the License, or (at your option) any later version. * * This library is distributed in the hope that it will be useful, * but WITHOUT ANY WARRANTY; without even the implied warranty of * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU * Lesser General Public License for more details. * * You should have received a copy of the GNU Lesser General Public * License along with this library; if not, write to the * Free Software Foundation, Inc., 59 Temple Place - Suite 330, * Boston, MA 02111-1307, USA. */ The only difference is the copyright year. The merged license will be: ATK - Accessibility Toolkit Copyright 2001 Sun Microsystems Inc. Copyright 2007 Sun Microsystems Inc. This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2 of the License, or (at your option) any later version. This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Lesser General Public License for more details. You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA. There are usually bigger differences between comments. Merging only takes place if at least 10 lines match. There may be many more authors and copyright holders and additional comments at the end of the license comments. This can sometimes cause weird results. In general, the fewer files checked, the better the results. In the case of something like SUNWgnome-base-libs, I recommend extracting the licenses for each component (atk, glib, gtk, pango, cairo, ...) separately and manually merging them. In all cases, the output must be carefully reviewed, edited and the unnecessary lines deleted. Copyright notices should appear at the beginning of each license block (as per Sun guidelines). When you find copyright notices that only differ in the year or the addition of an email address, it's safe to merge them. Examples: if you find things like this: Copyright (C) 2003, 2006 Red Hat, Inc. Copyright (C) 2004, 2005 Red Hat, Inc. Copyright (C) 2003, 2004, 2007 Red Hat, Inc. Copyright (C) 2003, 2004, 2005 Red Hat, Inc. Copyright (C) 2003, 2004 Red Hat, Inc. Copyright (C) 2002-2006 Red Hat Inc. Copyright (C) 2002, 2003, 2006 Red Hat, Inc. Copyright (C) 2002, 2003, 2005 Red Hat Inc. Copyright (C) 2002, 2003, 2004, 2006 Red Hat Inc. Copyright (C) 2002, 2003, 2004, 2005 Red Hat, Inc. Copyright (C) 2002, 2003, 2004 Red Hat Inc. Copyright (C) 2002, 2006 Red Hat Inc. Copyright (C) 2002, 2005 Red Hat Inc. Copyright (C) 2002, 2004 Red Hat Inc. Copyright (C) 2002, 2003 Red Hat, Inc. Copyright (C) 2007 Red Hat Inc. Copyright (C) 2006 Red Hat, Inc. Copyright (C) 2005 Red Hat, Inc. Copyright (C) 2004 Red Hat, Inc. Copyright (C) 2003 Red Hat, Inc. Copyright (C) 2003 Red Hat, Inc. consolidate this down to: Copyright (C) 2002-2007 Red Hat, Inc. Or in this case: Copyright (C) 2007, Jamie McCracken Copyright (C) 2007, Jamie McCracken it's enought to keep the latter, it's clearly the same person. Note that if the source file contains only a reference to a file not easily accessible to the reader of the package copyright file (e.g., the example below with a reference to a file that must be ftp'ed), you will need to include the text from the COPYING file that is referenced: /* (C) 1998-2002 Red Hat, Inc. -- Licensing details are in the COPYING file accompanying popt source distributions, available from ftp://ftp.rpm.org/pub/rpm/dist */ If you suspect that something went wrong during the smart merge (the output looks bogus) or the merges result in weird license texts even if you run the script subdirectory by subdirectory, you can use the -r (or --raw) option. This causes the script to only unify identical comments. In this mode does not attempt to do the "smart merge". There's also -g (or --gpl) option that can be used to detect if any of the licenses in the output appears to be LGPL or GPL, in which case it puts Sun's standard disclaimer about the choice of GPLv2. The -c (or --copyright-first) option copies the copyright statements and author names to the beginning of each license block, as required by Sun guidelines. This option only works if the copyright lines don't extent to multiple lines, e.g. the following will break badly: Copyright (c) yeah Some Company, Inc All rights reserved. Use is subject to license terms. Authors: joe.bloggs@some-company.com The result will be that the Copyright line is moved, but the rest of the lines stay where they were. I would say that this option is less that useful. The -O (or --omitted) option prints the list of files NOT read by this script. 2. Using the script The copyright-extractor script is used for creating copyright files for SUNW packages. Since the contributors (who are usually copyright owners) and even licenses are subject to change, this process needs to be repeated each time we update GNOME in Nevada, typically once every 6 months. The recommended way to run the script is: pkgtool prep --download SUNWfoo.spec scripts/copyright-extractor -g /path/to/BUILD/SUNWfoo-version \ > copyright.txt As mentioned earlier, in the case of larger packages, it's a good idea and sometimes necessary to run the script for each component or ever each subdirectory of larger components: scripts/copyright-extractor -g /path/to/BUILD/SUNWfoo-version/libfoo \ > copyright.txt scripts/copyright-extractor -g /path/to/BUILD/SUNWfoo-version/libbar \ >> copyright.txt ... Then, review and edit the copyright.txt file to create SUNWfoo.copyright. 3. Adding copyright files to spec files Each SUNW spec file needs a copyright file, "base" spec files don't need copyright files. Place the copyright file in spec-files/copyright/ The naming convention is SUNWpackage-name.copyright. Then edit SUNWpackage-name.spec and add SUNW_Copyright: %{name}.copyright in the preamble (i.e. where Name, Version, etc. are). The subpackages (-devel, -root, etc.) inherit this setting from the main package so you don't need to repeat this line in the %package section.