Installation

This guide explains how to install the components needed to evaluate the AI-generated code search features of AboutCode.

The search for AI-generated code requires the following three sub-systems to be installed and deployed:

  • PurlDB: a database of software package metadata, and an index of their files and file fingerprints.

  • MatchCode: a backend code matching service that has direct access to the PurlDB database and executes a rich pipeline of code matching steps, from exact matching of packages, file trees and files to approximate matching of code trees, files and code fragments, including possibly generated or refactored code fragments.

  • ScanCode.io: a frontend code scanning service that receives the codebase to analyse as input, scans it and computes fingerprints, sends these to the MatchCode backend for matching, and then presents the consolidated matching results.
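
To make the division of labour between the three components concrete, here is a toy sketch of the simplest step only (exact file matching). It is not the AboutCode implementation: the function names, the in-memory index and the sample file contents are all hypothetical, and the real pipeline also covers approximate matching that is not shown here.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return the SHA-1 hex digest of a file's content (an exact-match fingerprint)."""
    return hashlib.sha1(content).hexdigest()


def build_index(packages: dict) -> dict:
    """Toy stand-in for the PurlDB index: fingerprint -> [(purl, path), ...]."""
    index = {}
    for purl, files in packages.items():
        for path, content in files.items():
            index.setdefault(fingerprint(content), []).append((purl, path))
    return index


def match_codebase(codebase: dict, index: dict) -> dict:
    """Toy stand-in for scan-then-match: fingerprint each file, then look it up."""
    results = {}
    for path, content in codebase.items():
        hits = index.get(fingerprint(content), [])
        if hits:
            results[path] = hits
    return results


# Hypothetical file content, used only for illustration.
SAMPLE_SOURCE = b"module.exports = require('util').inherits\n"

# "PurlDB" side: one indexed package with one file.
index = build_index({"pkg:npm/inherits@2.0.3": {"inherits.js": SAMPLE_SOURCE}})

# "ScanCode.io" side: a codebase with one vendored copy and one original file.
codebase = {
    "vendor/inherits.js": SAMPLE_SOURCE,
    "src/app.js": b"console.log('hello')\n",
}

matches = match_codebase(codebase, index)
print(matches)
```

Only the vendored file is reported, because its fingerprint is the only one present in the toy index.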

This guide describes, and provides pointers on, how to install each of the three components.

Alternatively, we maintain a publicly accessible, fully deployed evaluation system with each of these three components. Please reach out to hello@aboutcode.org to gain access to this environment.

Install the backend: PurlDB and MatchCode.io

Follow instructions to get started at: https://aboutcode.readthedocs.io/projects/PURLdb/en/latest/getting-started/index.html

Then follow installation instructions at:

This will also install MatchCode.io.

Install the frontend: ScanCode.io

Follow the instructions at: https://scancodeio.readthedocs.io/en/latest/installation.html

Configure ScanCode.io for PurlDB access

Follow the instructions at: https://scancodeio.readthedocs.io/en/latest/application-settings.html#purldb
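
As a rough illustration of what this configuration amounts to, the sketch below renders a pair of environment-style settings as text. The setting names PURLDB_URL and PURLDB_API_KEY are assumptions here and should be checked against the settings page linked above; the URL and key values are placeholders, and render_env is a hypothetical helper, not part of ScanCode.io.

```python
def render_env(settings: dict) -> str:
    """Render settings as KEY=value lines, refusing empty values."""
    lines = []
    for key, value in settings.items():
        if not value:
            raise ValueError(f"Missing value for setting: {key}")
        lines.append(f"{key}={value}")
    return "\n".join(lines) + "\n"


# Assumed setting names with placeholder values; replace with your own deployment details.
purldb_settings = {
    "PURLDB_URL": "https://purldb.example.org/",
    "PURLDB_API_KEY": "replace-with-your-api-key",
}

env_text = render_env(purldb_settings)
print(env_text)
```

The printed lines could then be placed wherever your ScanCode.io deployment reads its settings from, as described in the linked documentation.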

Run a simple test

See instructions at: https://aboutcode.readthedocs.io/projects/PURLdb/en/latest/how-to-guides/matchcode.html

You can also index the package pkg:npm/inherits@2.0.3 to support the step-by-step, end-to-end tutorial "Step-by-step: Find AI-generated code with AboutCode".
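
For readers unfamiliar with Package URLs (PURLs), the following minimal sketch splits a PURL such as the one above into its parts. It is a simplified illustration that assumes a PURL of the form pkg:type/namespace/name@version and ignores qualifiers and subpaths; parse_simple_purl is a hypothetical helper, and a complete parser such as the packageurl-python library should be preferred for real use.

```python
from urllib.parse import unquote


def parse_simple_purl(purl: str) -> dict:
    """Split a simple PURL into type, namespace, name and version."""
    scheme, sep, rest = purl.partition(":")
    if scheme != "pkg" or not sep:
        raise ValueError(f"Not a PURL: {purl!r}")

    # Drop the optional subpath and qualifiers; this sketch does not interpret them.
    rest = rest.lstrip("/").partition("#")[0].partition("?")[0]

    # The version, if any, follows the last "@".
    path, at, version = rest.rpartition("@")
    if not at:
        path, version = rest, ""

    # The first path segment is the type, the last is the name,
    # and anything in between is the namespace.
    package_type, _, remainder = path.partition("/")
    namespace, _, name = remainder.rpartition("/")
    if not package_type or not name:
        raise ValueError(f"PURL needs a type and a name: {purl!r}")

    return {
        "type": package_type.lower(),
        "namespace": unquote(namespace) or None,
        "name": unquote(name),
        "version": unquote(version) or None,
    }


parsed = parse_simple_purl("pkg:npm/inherits@2.0.3")
print(parsed)
```

Here the type is npm, the name is inherits, the version is 2.0.3, and there is no namespace.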

See also https://zenodo.org/records/15449709 for a larger dataset of popular package PURLs.