
Licensing talk short


Academic year: 2017



Licensing is Software Too: Achievements and Challenges
(and how this relates to code provenance)

Massimiliano Di Penta

University of Sannio, Italy

[email protected]


Acknowledgements

Daniel M. Germán, Univ. Victoria, Canada
Julius Davies, Univ. Victoria, Canada
Giuliano Antoniol, Ecole Polyt. Montréal, Canada
Yann-Gaël Guéhéneuc, Ecole Polyt. Montréal, Canada


Reusing Open Source Software

When developing a software system, we try (if possible) not to reinvent the wheel
Components, libraries, and source code snippets are out there, ready to be reused
Code search engines are becoming popular
Open source code modification and redistribution are governed by
– Software licenses
– Copyright statements
Everything is contained in a licensing statement


What does a licensing statement contain?

    /* -*- Mode: C++; tab-width: 2; indent-tabs-mode: nil; c-basic-offset: 2 -*- */
    /* ***** BEGIN LICENSE BLOCK *****
     * Version: MPL 1.1/GPL 2.0/LGPL 2.1
     *
     * The contents of this file are subject to the Mozilla Public License Version
     * 1.1 (the "License"); you may not use this file except in compliance with
     * the License. You may obtain a copy of the License at
     * http://www.mozilla.org/MPL/
     ….
     * Portions created by the Initial Developer are Copyright (C) 2002
     * the Initial Developer. All Rights Reserved.
     *
     * Contributor(s):
     *   Brian Ryner <[email protected]>
     ….
     * decision by deleting the provisions above and replace them with the notice
     * and other provisions required by the GPL or the LGPL. If you do not delete
     * the provisions above, a recipient may use your version of this file under
     * the terms of any one of the MPL, the GPL or the LGPL.
     *
     * ***** END LICENSE BLOCK ***** */
    #include "nsXULAppAPI.h"
    #ifdef XP_WIN
    #include <windows.h>

Highlighted elements: the license (MPL+GPL+LGPL), the copyright statement, and the copyright year.


Restrictive vs. permissive licenses

Restrictive (aka copyleft or reciprocal)
Changed software must be made available under terms similar to those of the original
Example: GPL
Permissive
Modifications/enhancements may remain proprietary
Distribution of source code or binaries is permitted
– Provided that copyright notices and/or liability disclaimers are retained
– Contributor names do not imply endorsement
Examples: Berkeley Software Distribution (BSD), …


FOSS development teams care! (source: Debian)

I am in the process of trying to prepare 0.8.0 for Debian GNU/Linux I have started going over the copyright/license headers. In src/celeste many files are missing copyright information. Most of these are files imported with minimal changes from Gabor API http://www.kung-foo.tv/gaborapi.php or libsvm http://www.csie.ntu.edu.tw/~cjlin/libsvm/. The attached patch adds copyright and license statements to these files.[1]

Please apply and update the headers (adding copyright holders) if you make substantial changes.

thanks, cu andreas

[1] I have doublechecked with Gabor API's upstream author Adriaan Tijsseling that files like ContrastFilter.cpp are Copyright (c) Adriaan Tijsseling and licensed under GPLv2+, although the original headers just say:
Original Author: Yasunobu Honma


Conjectures

Since licenses determine the way software can be composed and redistributed:
They may change/evolve like any other part of the software
They might be subject to bugs too
– See our ICPC 2010 paper on how to identify licensing incompatibilities
They might determine the success/failure of a software project
Code provenance and licenses:
Licenses constrain source code migration between projects
Code provenance might be useful to determine …


Licenses influence the software lifetime

OpenBSD founder and project leader Theo de Raadt removed a security software package called IP-Filter [written by Darren Reed] after its author changed its license.
Stephen Shankland, CNET News, 2001/05/30.

Licenses evolve as software does
Failing to account for that could cause copyright infringements
Decisions on license changes have an impact comparable to other decisions on software evolution
So far, this has received little attention from the scientific community


Example: Java

Until November 2006, the license of Java JDK v1.2 said:
“Except as specifically authorized in any Supplemental License Terms, you may not make copies of Software, other than a single copy of Software for archival purposes”
This prevented Java from being included in Linux distributions
Java was then released (as OpenJDK) under the GPL v2 with the CLASSPATH exception:
Java could be modified/updated under the GPL v2
Java programs could be released under any license, as long as they satisfy the conditions stated in the CLASSPATH exception
Changing the license of a system can promote and ease the distribution and reuse of a …


Example: Qt

Qt was first released under a free but non-open-source license, the FreeQt License, and under a commercial license
Qt became the basis for KDE
Qt v2.0 was released under a new license, the Q Public License (QPL), which is incompatible with the GPL
The GNOME project started as a Qt-free alternative to KDE
The Harmony project started as a GPL replacement for Qt
Trolltech changed the license of Qt v3 to the GPL v2
The Harmony project was abandoned
Changing the license of a FOSS system …


Empirical Study

Goal: analyze licensing evolution
Purpose: investigate how developers change licensing statements
Context: CVS/SVN repositories of ArgoUML, Eclipse-JDT, the FreeBSD and …


Research Questions

RQ1: To what extent are files changing their licenses?
RQ2: How are copyright years changed in licensing statements?
RQ3: Who are the contributors of a …


Licensing Analysis Method – Extracting licensing statements

Example: in the MPL-licensed file shown earlier, the licensing statement is the leading comment, from BEGIN LICENSE BLOCK to END LICENSE BLOCK, which precedes the first #include.
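A minimal Python sketch of this extraction step, assuming (as in the MPL example) that the licensing statement is the run of comments at the very top of a C/C++ file. The function name `extract_licensing_statement` is hypothetical, not taken from the authors' tooling; the sketch ignores comment markers inside string literals and keeps editor mode lines such as `-*- Mode: C++ -*-` as part of the statement.

```python
import re

# Leading run of /* ... */ and // comments, possibly separated by whitespace.
_LEADING_COMMENTS = re.compile(r"\A(?:\s*(?:/\*.*?\*/|//[^\n]*))+", re.DOTALL)


def extract_licensing_statement(source: str) -> str:
    """Return the comments that precede the first line of code, without comment markers."""
    match = _LEADING_COMMENTS.match(source)
    if match is None:
        return ""
    lines = []
    for raw in match.group(0).splitlines():
        line = raw.strip()
        # Drop an opening "/*", a closing "*/", a leading "*" or a "//" marker.
        line = re.sub(r"^(?:/\*+|\*+/|\*+|//)", "", line)
        line = re.sub(r"\*+/$", "", line).strip()
        if line:
            lines.append(line)
    return "\n".join(lines)


sample = (
    "/* ***** BEGIN LICENSE BLOCK *****\n"
    " * Version: MPL 1.1/GPL 2.0/LGPL 2.1\n"
    " * ***** END LICENSE BLOCK ***** */\n"
    '#include "nsXULAppAPI.h"\n'
)
print(extract_licensing_statement(sample))
```

The `#include` line is not part of the result, because the regular expression stops at the first text that is neither whitespace nor a comment.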


Licensing Analysis Method – Classifying licenses

FOSSology [Gobeille, MSR 2008]: detects licenses using the Binary Symbolic Alignment Matrix (bSAM)
Ninka [German et al., ASE 2010]: uses a pattern-matching approach
Example (same MPL-licensed file as above): the statement is classified as a triple license, MPL 1.1/GPL 2.0/LGPL 2.1
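To illustrate the idea of matching license text sentence by sentence, here is a toy Python classifier. It is not Ninka: the three rules in `LICENSE_RULES` are hypothetical, hand-written regular expressions (with SPDX-style names used only as labels), whereas real tools rely on large, curated sentence databases.

```python
import re
from typing import Dict, List, Set

# Illustrative rules only: a license is recognized when every one of its
# patterns matches some normalized sentence of the comment.
LICENSE_RULES: Dict[str, List[str]] = {
    "MPL-1.1": [r"subject to the mozilla public license version 1\.1"],
    "GPL-2.0+": [r"gnu general public license.*version 2.*any later version"],
    "Apache-2.0": [r"licensed under the apache license, version 2\.0"],
}


def split_sentences(comment: str) -> List[str]:
    """Lower-case the text, drop comment stars and line breaks, and split at '. '."""
    text = re.sub(r"[\s*]+", " ", comment).strip().lower()
    return [s for s in re.split(r"(?<=\.)\s+", text) if s]


def classify(comment: str) -> Set[str]:
    """Return every license whose patterns all match some sentence of the comment."""
    sentences = split_sentences(comment)
    return {
        name
        for name, patterns in LICENSE_RULES.items()
        if all(any(re.search(p, s) for s in sentences) for p in patterns)
    }
```

Normalizing before matching is what lets the MPL sentence be recognized even though the file breaks it across lines with ` * ` prefixes.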


Licensing Analysis Method – Identifying changes in copyright years

Mining references to years in licensing…
Example (same MPL-licensed file as above): “Portions created by the Initial Developer are Copyright (C) 2002”
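A possible way to mine copyright years and compare two revisions of a file, sketched in Python. The regular expressions and the helper names `copyright_years` and `year_changes` are assumptions for illustration; they cover forms such as "Copyright (C) 2002" and "Copyright (c) 1999, 2001-2003", and expand year ranges.

```python
import re
from typing import Dict, Set

# "Copyright", an optional "(C)" or "©", then the list of years that follows.
_COPYRIGHT = re.compile(r"copyright\s*(?:\(c\)|©)?\s*([0-9,\s-]+)", re.IGNORECASE)
# A single year or a range such as 2001-2003.
_YEARS = re.compile(r"((?:19|20)\d{2})(?:\s*-\s*((?:19|20)\d{2}))?")


def copyright_years(statement: str) -> Set[int]:
    """All years mentioned in copyright lines, with ranges expanded."""
    years: Set[int] = set()
    for match in _COPYRIGHT.finditer(statement):
        for start, end in _YEARS.findall(match.group(1)):
            first = int(start)
            last = int(end) if end else first
            years.update(range(first, last + 1))
    return years


def year_changes(old: str, new: str) -> Dict[str, Set[int]]:
    """Years added to or removed from the licensing statement between two revisions."""
    before, after = copyright_years(old), copyright_years(new)
    return {"added": after - before, "removed": before - after}
```

For example, going from "Copyright (C) 2002" to "Copyright (C) 2002-2004" reports 2003 and 2004 as added years.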


Licensing Analysis Method – Identifying contributor names

Mining e-mail addresses, plus various patterns, e.g., “Copyright … year name” and “Contributor(s): …”
Names are then mapped to committers, whenever possible
Example (same MPL-licensed file as above): the Contributor(s) section lists Brian Ryner
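A Python sketch of the "Contributor(s):" pattern and of the mapping to committers. It assumes one contributor per line, each with an e-mail address, and that the list ends at the first line without one; the alias table passed to `map_to_committers` (e-mail to repository login) is a hypothetical input, since the slides do not say how the mapping was built.

```python
import re
from typing import Dict, List, Optional, Tuple

EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def contributors(statement: str) -> List[Tuple[str, str]]:
    """(name, e-mail) pairs listed one per line after a 'Contributor(s):' marker."""
    pairs: List[Tuple[str, str]] = []
    in_section = False
    for raw in statement.splitlines():
        line = raw.strip().lstrip("*").strip()  # drop comment decoration
        if line.lower().startswith("contributor(s):"):
            in_section = True
            continue
        if in_section:
            match = EMAIL.search(line)
            if match is None:
                break  # assumption: the first line without an e-mail ends the list
            name = line[: match.start()].strip(" <(")
            pairs.append((name, match.group(0).lower()))
    return pairs


def map_to_committers(
    pairs: List[Tuple[str, str]], aliases: Dict[str, str]
) -> Dict[str, Optional[str]]:
    """Map contributor names to repository logins through an e-mail alias table."""
    return {name: aliases.get(email) for name, email in pairs}
```

Contributors with no entry in the alias table map to `None`, which keeps unmatched names visible instead of silently dropping them.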


RQ1: Most relevant license changes

Project      Type    Count  From                        To
Eclipse-JDT  CHANGE  2394   Common Public License v1.0  Eclipse Public License v1.0
Eclipse-JDT  UPDATE  808    Common Public License v0.5  Common Public License v1.0
Mozilla      DUAL    2914   NPL                         'NPL v1.1'-style+GPL v2+LGPL v2.1
Mozilla      DUAL    1274   NPL                         'Dual MPL GPL'-style+MPL
Mozilla      BUG     1194   'Dual MPL GPL'-style+MPL    NPL

Licensing was updated as new licenses were developed
Eclipse JDT: CPL 0.5 → CPL 1.0 → EPL 1.0
– IBM relinquished control of the licenses to the Eclipse Foundation
Mozilla: NPL → MPL + GPL (+ LGPL)
– The NPL allowed Netscape 6 to be released as a proprietary system
– The MPL only allows the source code to be redistributed under the MPL


RQ1: Most relevant license changes

Project  Type    Count  From                              To
FreeBSD  UPDATE  491    BSD UCRegents (4-cl BSD)          'BSD UCRegents'-style (4-cl BSD)
FreeBSD  UPDATE  300    'BSD UCRegents'-style (4-cl BSD)  'INRIA-OSL'-style (3-cl BSD)
OpenBSD  UPDATE  964    'BSD UCRegents'-style (4-cl BSD)  'INRIA-OSL'-style (3-cl BSD)
OpenBSD  UPDATE  414    BSD UCRegents (4-cl BSD)          'BSD UCRegents'-style (4-cl BSD)

FreeBSD and OpenBSD are more eclectic than other projects
Moving from the 4-clause BSD license to the more …


RQ1: Most relevant license changes

Project  Type  Count  From  To
ArgoUML  ADD   127    None  'Free with copyright clause'-style+'UC Regents free with copyright clause'-style
Samba    ADD   15     None  GPL v2

ArgoUML and Samba kept the same licenses over the analyzed time span
The change is from None to a simple license
Authors realized the importance of including a …
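The change types in the RQ1 tables (ADD, UPDATE, DUAL, CHANGE) can be approximated from the sets of licenses detected before and after a commit. The rules below are a hypothetical Python sketch, not the study's classification: license identifiers are assumed to look like "CPL-1.0" (family, then version), and the BUG category, which needs human judgment, is not modeled.

```python
from typing import Set


def family(license_id: str) -> str:
    """'CPL-1.0' -> 'CPL'; identifiers without a numeric version are returned unchanged."""
    head, sep, tail = license_id.rpartition("-")
    return head if sep and tail[:1].isdigit() else license_id


def classify_change(before: Set[str], after: Set[str]) -> str:
    """Label the license change between two revisions of a file (hypothetical rules)."""
    if before == after:
        return "NONE"
    if not before:
        return "ADD"      # e.g., None -> GPL v2
    if not after:
        return "REMOVE"
    if len(after) > len(before):
        return "DUAL"     # e.g., NPL -> NPL + GPL + LGPL
    if sorted(map(family, before)) == sorted(map(family, after)):
        return "UPDATE"   # same family, new version: CPL 0.5 -> CPL 1.0
    return "CHANGE"       # different license: CPL 1.0 -> EPL 1.0
```

Comparing license families rather than exact identifiers is what separates a version bump (UPDATE) from a switch to a different license (CHANGE).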


RQ2: How and why were copyright years changed?

Files whose copyright years were updated underwent a significantly higher number of changes than other files
When developers perform substantial changes to a file, they also update its copyright years
– Required by copyright regulations
– Failing to update the years after substantial changes would allow an infringer to claim “innocent infringement”
Some commits explicitly target copyright years, e.g., “Updated copyrights”
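The slides do not say which statistical test supports "significantly higher"; a Mann-Whitney (Wilcoxon rank-sum) test is a common non-parametric choice for comparing such change counts, so the Python sketch below assumes it. It uses average ranks for ties and the normal approximation without tie or continuity correction, so its p-values are only indicative for small samples, and both samples must be non-empty.

```python
import math
from typing import List, Tuple


def mann_whitney_u(x: List[float], y: List[float]) -> Tuple[float, float]:
    """U statistic of x and a one-sided p-value for 'x tends to be larger than y'."""
    values = list(x) + list(y)
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Find the run of tied values starting at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1  # ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean = n1 * n2 / 2
    sd = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean) / sd
    p_value = 0.5 * math.erfc(z / math.sqrt(2))  # P(Z >= z) under H0
    return u1, p_value


# Hypothetical numbers of changes per file, with and without copyright-year updates.
with_year_updates = [10, 12, 15, 20]
without_year_updates = [1, 2, 3, 4]
u, p = mann_whitney_u(with_year_updates, without_year_updates)
```

In this made-up example every file in the first group has more changes than every file in the second, so U reaches its maximum, n1 · n2 = 16.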


RQ3: When do contributors change?

Changes in which contributor names are added are significantly bigger than other changes
Contributors are often added when they make substantial changes
Contributor names are important assets in source code
– Like the signature on a picture
However…
– contributors can change over time
– there is no standard way of reporting them
– there is no clear rule on when one should become a contributor


Free (software) as a bird…

As birds migrate differently during different seasons…, code might have a preferential migration direction
Given two systems, e.g., FreeBSD and Linux, we find the same code in both systems
Three scenarios:
– Migration FreeBSD → Linux
– Migration Linux → FreeBSD
– Migration from a third party


Sibling(s) Origin

Identify siblings between systems using clone detection
– CCFinderX, with >100 tokens as threshold, plus other heuristics
Trace back into past siblings, i.e., their code fragments in the same files
– Again via clone detection, matching the sibling fragment against previous file revisions
– When the fragments disappear, we have found their origins
Take the older of the two origins as the true origin
[Figure: Sys 1 – File i and Sys 2 – File j contain cloned fragments (siblings)]
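A Python sketch of the origin-tracing idea. Instead of CCFinderX, it uses a deliberately crude stand-in: two texts are "clones" if they share a run of at least `min_tokens` identical tokens (101 by default, reading ">100 tokens" literally). `Revision`, `origin_date`, and `migration_direction` are hypothetical names, and dates are assumed to be ISO strings so that string order equals time order.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

MIN_TOKENS = 101  # "more than 100 tokens", as on the slide


def tokens(code: str) -> List[str]:
    """Crude lexer: words/numbers and single punctuation characters."""
    return re.findall(r"\w+|[^\w\s]", code)


def shares_clone(a: str, b: str, min_tokens: int = MIN_TOKENS) -> bool:
    """True if a and b share a run of at least min_tokens identical tokens."""
    ta, tb = tokens(a), tokens(b)
    windows = {tuple(ta[i:i + min_tokens]) for i in range(len(ta) - min_tokens + 1)}
    return any(tuple(tb[i:i + min_tokens]) in windows for i in range(len(tb) - min_tokens + 1))


@dataclass
class Revision:
    date: str     # ISO 8601 date, e.g. "2001-05-30"
    content: str  # file content at that revision


def origin_date(fragment: str, history: List[Revision], min_tokens: int = MIN_TOKENS) -> Optional[str]:
    """Walk back from the newest revision; the last one still containing the fragment is its origin."""
    origin = None
    for rev in sorted(history, key=lambda r: r.date, reverse=True):
        if not shares_clone(fragment, rev.content, min_tokens):
            break  # the fragment disappears here
        origin = rev.date
    return origin


def migration_direction(fragment: str, history1: List[Revision], history2: List[Revision],
                        name1: str, name2: str, min_tokens: int = MIN_TOKENS) -> str:
    """The system where the sibling appeared first is taken as the true origin."""
    d1 = origin_date(fragment, history1, min_tokens)
    d2 = origin_date(fragment, history2, min_tokens)
    if d1 is None or d2 is None or d1 == d2:
        return "undetermined"
    return f"{name1} -> {name2}" if d1 < d2 else f"{name2} -> {name1}"
```

A real clone detector would also tolerate renamed identifiers and small edits, which this exact-token comparison does not.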


Code Migration and Licenses

FreeBSD    Linux    Files
BSD        GPL      8
BSD        MIT      2
BSD        None     2
Corporate  BSD+GPL  89
GPL        None     1
Phrase     BSD+GPL  1
X.Net+BSD  MIT      1

Linux       FreeBSD      Files
BSD+GPL     Corporate    8
GPL         BSD          17
GPL         BSD+GPL      1
GPL         CPL+BSD+GPL  1
MIT         BSD          1
MIT+GPL     None         2
None        BSD          1
Phrase+GPL  MIT          2

OpenBSD     Linux       Files
BSD         BSD+GPL     1
BSD         MIT         2
BSD         Unknown     1
BSD+GPL     GPL         1
BSD+Phrase  Phrase+GPL  1
MIT         GPL         23

Period annotations on the slide: after Jan 1, 2002; nothing before; before Jan 1, 2002


Discussion

Siblings have a preferential flow
– Initially from the BSDs to Linux: frequent
– Today from Linux to FreeBSD: less frequent
This is due to licenses, but also to each system's level of development
Companies directly contribute to code in different kernels, e.g., Intel drivers with dual licenses
– In this case, code migrates from a third party


Motivations

Very often, Java open source software is distributed in jar archives
– See http://mvnrepository.com/
Problem: the jar might not contain licensing information
– Under what conditions can we integrate the component?
– The jar might not be legally usable
– Even if it comes from open source code, we …


Search-driven approach

Extracting information from the class bytecode
– Class and package names… or a fingerprint…
– We use the ASM library (http://asm.ow2.org/)
Querying Google Code Search
– Using the fully qualified class name
– Using the package only
– Queries are performed using the Google Code API (http://code.google.com/apis/gdata/)
If the same class is not found, its license is …
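A Python sketch of how the two queries per class could be derived from a jar. It swaps in a simpler technique than the one on the slide: class names are read from the archive's entry paths with `zipfile` instead of parsing bytecode with ASM. Google Code Search has since been discontinued, so the sketch stops at building the query strings; `class_names` and `search_queries` are hypothetical helper names.

```python
import zipfile
from typing import BinaryIO, List, Tuple, Union


def class_names(jar_file: Union[str, BinaryIO]) -> List[str]:
    """Fully qualified names of the top-level classes stored in a jar archive."""
    with zipfile.ZipFile(jar_file) as jar:
        return sorted(
            entry[: -len(".class")].replace("/", ".")
            for entry in jar.namelist()
            if entry.endswith(".class")
            and "$" not in entry  # skip inner and anonymous classes
            and not entry.startswith("META-INF/")
        )


def search_queries(fqcn: str) -> Tuple[str, str]:
    """The two queries listed on the slide: the fully qualified class name and its package."""
    package = fqcn.rpartition(".")[0]
    return fqcn, package
```

Querying by package as a fallback trades precision for recall: it can match a different release of the same library even when the exact class is not indexed.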


% of correct classifications

Found licenses: min. 29% (commons-codec), avg. 82%, median 89.5%
Inferred licenses: min. 62% (JLayer 1.0), avg. 95%, median 100%
The inferring heuristic performs significantly better, both in terms of …
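The slides do not spell out the inference heuristic, so the following is only an assumed illustration of how licenses of classes that were not found could be inferred: a majority vote over the licenses found for classes in the same package, falling back to the whole jar. `infer_licenses` is a hypothetical name, and ties are broken by the order in which licenses were first seen.

```python
from collections import Counter
from typing import Dict, Optional


def infer_licenses(found: Dict[str, Optional[str]]) -> Dict[str, Optional[str]]:
    """Fill in classes whose license was not found (None) by majority vote.

    Votes come from classes of the same package; if there are none, from the
    whole jar.
    """
    def package(fqcn: str) -> str:
        return fqcn.rpartition(".")[0]

    jar_votes = Counter(lic for lic in found.values() if lic)
    inferred: Dict[str, Optional[str]] = {}
    for cls, lic in found.items():
        if lic:
            inferred[cls] = lic
            continue
        package_votes = Counter(
            other for name, other in found.items() if other and package(name) == package(cls)
        )
        votes = package_votes or jar_votes
        inferred[cls] = votes.most_common(1)[0][0] if votes else None
    return inferred
```

Preferring package-level votes reflects the intuition that classes in the same package usually come from the same project and share its license.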


Incorrect classifications

Most of them are between LGPL and GPL, and between BSD and Apache
commons-codec: mismatch between Apache and BSD
– Files licensed under the Apache License v1.1, which is derived from the BSD license
JLayer: mismatch between GPL and LGPL
– Same inferred licenses in both releases (0.4 and 1.0)
– However, JLayer moved from GPL to …


Conclusions

We proposed a code analysis method to support lawyers as well as software engineers
We studied how licensing statements are used and how they evolve
– License type, copyright year, contributors
Main findings:
– Licenses influence project outcomes
– Licenses influence code migration
– Projects are moving towards more permissive licenses
– Copyright years and contributor names are updated …


Licensing and code provenance

Licensing influences the direction in which code flows from one system to another
– Often code flows in the direction of more permissive licenses…
– …but there are many other factors influencing how code flows
Search-driven approaches can be adopted to determine which code a closed component comes from
– And thus its licensing…
– Issues related to the capabilities of the code …


References

Daniel M. Germán, Jens H. Weber-Jahnke, Massimiliano Di Penta: Lawful Software Engineering. FoSER 2010 (Working Conference on the Future of Software Engineering Research), Santa Fe, USA, November 2010, ACM.

Daniel M. Germán, Massimiliano Di Penta, Julius Davies: Understanding and Auditing the Licensing of Open Source Software Distributions. ICPC 2010: 84-93.

Massimiliano Di Penta, Daniel M. Germán, Yann-Gaël Guéhéneuc, Giuliano Antoniol: An exploratory study of the evolution of software licensing. ICSE 2010: 145-154.

Massimiliano Di Penta, Daniel M. Germán, Giuliano Antoniol: Identifying licensing of jar archives using a code-search approach. MSR 2010: 151-160.

Massimiliano Di Penta, Daniel M. Germán: Who are Source Code Contributors and How do they Change? WCRE 2009: 11-20.

Daniel M. Germán, Massimiliano Di Penta, Yann-Gaël Guéhéneuc, Giuliano Antoniol: Code siblings: Technical and legal implications of copying code between applications. MSR 2009: 81-90.

Daniel M. Germán, Yuki Manabe, Katsuro Inoue: A sentence-matching method for automatic license identification of source code files. ASE 2010: 437-446.

Daniel M. Germán, Ahmed E. Hassan: License integration patterns: Addressing license mismatches in component-based development. ICSE 2009: 188-198.
