Ninka, a license identification tool for source code

1 Ninka

Ninka is a lightweight license identification tool for source code. It is sentence-based and provides a simple way to identify the open source licenses in a source code file. It can identify several dozen different licenses (and their variants).

Ninka was designed with the following goals:

  1. To be lightweight.
  2. To be fast.
  3. To avoid making errors.
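The idea behind "sentence-based" identification can be illustrated with a toy sketch in Python. This is not Ninka's implementation: Ninka's own pipeline (including its comments and splitter components, its sentence database and its license rules) is more elaborate. Every name below, including RAW_SENTENCES, RULES, the token names such as GPLv2orLater, and the naive splitter, is a hypothetical stand-in. The sketch splits comment text into sentences, normalizes each one, looks it up in a small database of known license sentences, and maps the resulting sequence of sentence tokens to a license name.

```python
import re

# HYPOTHETICAL toy sentence database: raw license sentences -> sentence tokens.
# Ninka's real database, token names and rules differ and are far larger;
# these three GPL-notice sentences are for illustration only.
RAW_SENTENCES = {
    "This program is free software: you can redistribute it and/or modify it "
    "under the terms of the GNU General Public License as published by the Free "
    "Software Foundation, either version 2 of the License, or (at your option) "
    "any later version.": "GPLv2orLater",
    "This program is distributed in the hope that it will be useful, but WITHOUT "
    "ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or "
    "FITNESS FOR A PARTICULAR PURPOSE.": "GPLnoWarranty",
    "See the GNU General Public License for more details.": "GPLseeDetails",
}

# HYPOTHETICAL rule table: an ordered sequence of sentence tokens -> license name.
RULES = {
    ("GPLv2orLater", "GPLnoWarranty", "GPLseeDetails"): "GPLv2+",
}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def normalize(sentence):
    """Keep only lowercase alphanumeric words, so line breaks, comment markers
    such as '*' or '//', and punctuation do not prevent an exact match."""
    return " ".join(re.findall(r"[a-z0-9]+", sentence.lower()))


SENTENCE_DB = {normalize(s): token for s, token in RAW_SENTENCES.items()}


def identify(comment_text):
    """Return (license_name, unmatched_sentences) for one block of comment text."""
    tokens, unmatched = [], []
    for sentence in split_sentences(comment_text):
        key = normalize(sentence)
        if not key:
            continue  # e.g. a lone "*/" left over from a C comment
        token = SENTENCE_DB.get(key)
        if token is None:
            unmatched.append(sentence)  # e.g. copyright or author lines
        else:
            tokens.append(token)
    # Only an exact, in-order sequence of known sentences yields a license.
    return RULES.get(tuple(tokens), "UNKNOWN"), unmatched
```

In this sketch, matching whole normalized sentences exactly keeps the matcher fast and conservative: a sentence that differs in any word is reported as unmatched rather than guessed at, and an incomplete notice yields "UNKNOWN" instead of a wrong license.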

2 Background

Ninka is the result of a research project aimed at identifying licenses in source code. It is documented in the paper:

A sentence-matching method for automatic license identification of source code files, by D.M. German, Y. Manabe and K. Inoue. In Proceedings of the IEEE/ACM International Conference on Automated Software Engineering (ASE 2010), pp. 437–446.


Abstract: The reuse of free and open source software (FOSS) components is becoming more prevalent. One of the major challenges in finding the right component is finding one that has a license that is appropriate for its intended use. The license of a FOSS component is determined by the licenses of its source code files. In this paper, we describe the challenges of identifying the license under which source code is made available, and propose a sentence-based matching algorithm to do it automatically. We demonstrate the feasibility of our approach by implementing a tool named Ninka. We performed an evaluation that shows that Ninka outperforms other methods of license identification in precision and speed. We also performed an empirical study on 0.8 million source code files of Debian that highlights interesting facts about the manner in which licenses are used by FOSS.

If you use Ninka for research purposes, we would appreciate it if you cited the paper. Its BibTeX entry (as provided by ACM) is:

author = {German, Daniel M. and Manabe, Yuki and Inoue, Katsuro},
title = {A sentence-matching method for automatic license identification of source code files},
booktitle = {Proceedings of the IEEE/ACM international conference on Automated software engineering},
series = {ASE '10},
year = {2010},
isbn = {978-1-4503-0116-9},
location = {Antwerp, Belgium},
pages = {437--446},
numpages = {10},
url = {http://doi.acm.org/10.1145/1858996.1859088},
doi = {http://doi.acm.org/10.1145/1858996.1859088},
acmid = {1859088},
publisher = {ACM},
address = {New York, NY, USA},
keywords = {automated license identification, open source licenses, software licenses},

3 Download

The latest version can be downloaded from GitHub, where its source code is hosted in a git repository: https://github.com/dmgerman/ninka

Except for the directories comments and splitter, Ninka is open source and is licensed under the terms of the GNU Affero General Public License, version 3 or (at your option) any later version, as published by the Free Software Foundation.

Copyright (C) 2009-2010  Yuki Manabe and Daniel M. German

This program is free software: you can redistribute it and/or modify it under the terms of the GNU Affero General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU Affero General Public License for more details.

You should have received a copy of the GNU Affero General Public License along with this program. If not, see http://www.gnu.org/licenses/.

splitter.pl is a derivative work of the rule-based sentence splitter script by Paul Clough. Please see splitter/README for details.

comments is based on a comment-removal program by Jon Newman; it is released under the GNU General Public License, version 2 or (at your option) any later version.

4 Contact

For further information about Ninka, kudos, patches, and/or success stories, please email Daniel M. German at dmg at uvic dot ca.


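5 FAQ

  1. How do I use it?
    ninka.pl <filename>
  2. How do I scan all the files in a package? Use find together with xargs. For example, to scan every regular file in the current directory (and its subdirectories), with -print0 and -0 so that file names containing spaces are passed intact, use:
    find * -type f -print0 | xargs -0 -n1 ninka.pl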
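If you prefer to drive Ninka from a script, for example to keep each file's result in a data structure, the same per-file loop can be written in Python. This is a minimal sketch, assuming ninka.pl is executable and on your PATH; scan_tree and its command parameter are hypothetical names. The sketch makes no assumption about the format of Ninka's output: it simply records whatever ninka.pl prints for each file.

```python
import os
import subprocess


def scan_tree(root, command=("ninka.pl",)):
    """Run `command <file>` once for every regular file under `root`.

    Returns a dict mapping each file path to the command's stripped stdout.
    By default `command` is ninka.pl, assumed to be executable and on PATH;
    it is a parameter so the traversal can be exercised with another program.
    """
    results = {}
    for dirpath, dirnames, filenames in os.walk(root):
        # Skip hidden directories such as .git (a choice of this sketch),
        # and sort so the traversal order is deterministic.
        dirnames[:] = sorted(d for d in dirnames if not d.startswith("."))
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            if not os.path.isfile(path):
                continue  # broken symlinks, sockets, FIFOs, ...
            proc = subprocess.run(
                [*command, path], capture_output=True, text=True, check=False
            )
            results[path] = proc.stdout.strip()
    return results
```

For example, scan_tree("path/to/package") returns a dict keyed by file path. Like the xargs pipeline, it starts one ninka.pl process per file; if ninka.pl cannot be found, subprocess.run raises FileNotFoundError.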

Author: Daniel M. German and Yuki Manabe

Date: 2011-01-22 00:40:15 JST
