SJI Network Explorer

Explore protein similarity through Signal Jaccard Index networks.

ProtDC provides a pre-built SJI network and a local web application for extracting subnetworks around proteins of interest, filtering by species, highlighting proteins, and exporting protein sets for downstream analysis.

Figure: SJI framework overview showing signal proteins, noise proteins, and network structure. SJI compares protein-specific signal sets rather than only direct pairwise sequence similarity.

What SJI Measures

The Signal Jaccard Index compares proteins by the overlap of their signal homolog neighborhoods across proteomes. A high SJI means two proteins share much of the same signal set.
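
To make the overlap idea concrete, here is a minimal sketch. It assumes SJI is the standard Jaccard index, |A ∩ B| / |A ∪ B|, applied to the signal sets of two proteins. The function name `signal_jaccard` and the homolog identifiers are hypothetical, and the step that derives the signal sets is not shown.

```python
from __future__ import annotations


def signal_jaccard(signals_a: set[str], signals_b: set[str]) -> float:
    """Jaccard index of two signal sets: |A & B| / |A | B|."""
    union = signals_a | signals_b
    if not union:
        # Convention chosen for this sketch: two empty sets score 0.
        return 0.0
    return len(signals_a & signals_b) / len(union)


# Hypothetical signal homolog sets for two proteins.
protein_x = {"h1", "h2", "h3", "h4"}
protein_y = {"h2", "h3", "h4", "h5"}

score = signal_jaccard(protein_x, protein_y)
print(f"SJI = {score:.2f}")  # 3 shared homologs out of 5 distinct ones
```

Identical signal sets score 1 and disjoint sets score 0, so the value can be read directly as an edge weight between two proteins.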

Neighborhood based

SJI uses many homologs associated with each protein, so it captures broader evolutionary and genomic context than a single pairwise comparison.

Protein-specific boundaries

Signal and noise homologs are separated by data-derived gaps in two-dimensional similarity plots, not by one fixed cutoff.

Network ready

The Explorer turns SJI relationships into interactive networks where users can expand clusters and inspect local structure.
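
As an illustration of what extracting a local subnetwork can look like, the sketch below runs a breadth-first search around a seed protein. It is not the Explorer's implementation: the function names, the `min_weight` and `max_hops` parameters, the protein identifiers, and the treatment of edges as undirected are all assumptions made for this example. Edges are given as (node1, node2, weight) triples.

```python
from collections import defaultdict, deque


def build_adjacency(edges, min_weight=0.0):
    """Build an undirected adjacency map, keeping edges with weight >= min_weight."""
    adjacency = defaultdict(dict)
    for node1, node2, weight in edges:
        if weight >= min_weight:
            adjacency[node1][node2] = weight
            adjacency[node2][node1] = weight
    return adjacency


def extract_subnetwork(adjacency, seed, max_hops=1):
    """Return the set of nodes reachable from seed in at most max_hops steps."""
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in adjacency.get(node, {}):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return seen


# Hypothetical edges; the weights stand in for SJI values.
edges = [("P1", "P2", 0.8), ("P2", "P3", 0.6), ("P3", "P4", 0.2), ("P1", "P5", 0.1)]
adjacency = build_adjacency(edges, min_weight=0.5)

print(sorted(extract_subnetwork(adjacency, "P1", max_hops=1)))
print(sorted(extract_subnetwork(adjacency, "P1", max_hops=2)))
```

Raising `min_weight` keeps only strongly related proteins, while raising `max_hops` widens the neighborhood around the seed.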

Currently Available Data

The current release includes pre-built SJI data for bacterial and eukaryotic UniProt reference proteomes, plus the taxonomy files needed for species-aware browsing.

406 UniProt reference proteomes represented in the taxonomy map

Eukaryotic reference proteomes

355 bacterial reference proteomes

Available Species Tree

Browse the NCBI common tree used by the Explorer's species-aware filtering. Highlighted leaves are the 406 reference proteomes included in the current release. This browser loads its public taxonomy data from commontree.txt and NCBI_txID.csv in this folder.

Installation Options

Use Docker for the simplest local viewer setup. Use Git if you want to inspect or modify the preprocessing and analysis scripts.

Option 1: Docker

Install Docker Desktop, pull the published image, download the release data, and mount the data folder into the container.

docker pull ghcr.io/gang-fang/sji-network-explorer:latest

docker run --rm \
  --name sji-network-explorer \
  -p 3000:3000 \
  -v "$PWD/data:/app/data" \
  ghcr.io/gang-fang/sji-network-explorer:latest

Links: Docker Desktop, data release, and Tutorial data-download instructions.

Option 2: Git clone

Clone the repository when you want the source code, runtime scripts, and preprocessing workflows.

git clone https://github.com/gang-fang/network-viz-platform.git
cd network-viz-platform

After cloning, follow the local setup below to install Node.js dependencies, install Python dependencies, and build topN. The advanced workflows in the tutorial run outside the Docker container and use scripts under tools/preprocessing and tools/bin.

Links: GitHub repository, Git clone setup tutorial, and source workflow tutorial.

Running Locally from a Git Clone

A cloned repository needs Node.js dependencies, Python dependencies, and a compiled topN executable before ingestion and startup.

# 1. Install Node.js dependencies
npm install

# 2. Install Python dependencies
python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt

# 3. Build topN for preprocessing workflows
cd tools/preprocessing/topN_cpp
make
cd ../../..

# 4. Download and organize the release data
# Follow tutorial.html#first-run:
# download all .gz release assets plus install_data.sh, then run:
chmod 755 install_data.sh
./install_data.sh

# 5. Configure environment
cp .env.example .env
# Edit .env as needed: PORT, DB_PATH, DATA_PATH, PYTHON_COMMAND, etc.
# If using the virtual environment above, set:
# PYTHON_COMMAND=.venv/bin/python

# 6. Ingest data files into the database
npm run ingest

# 7. Start the server
npm start

# Or combine steps 6 and 7:
npm run start:ingest-and-serve

Data Files

Place data files in the expected folders before ingestion. To download and organize the release data, follow the data-download section of the Docker setup in tutorial.html.

Directory or file	Environment variable	Expected content
`data/networks/`	`DATA_PATH`	Network CSV files. Each line stores one edge as `node1,node2,weight`.
`data/indexes/`	`INDEXES_PATH`	Preprocessed graph index triplets such as `Bacteria.adj.bin`, `Bacteria.adj.index.bin`, and `Bacteria.node_ids.tsv`.
`data/nodes_attr/`	`NODE_ATTRIBUTES_PATH`	Exactly one `*.nodes.attr` file with a header row including `node_id`, `NCBI_txID`, `NH_ID`, and `NH_Size`.
`data/NCBI_txID/NCBI_txID.csv`	`SPECIES_PATH`	Two columns: `ncbi_txid,species_name`.
`data/NCBI_txID/commontree.txt`	`TAXON_TREE_PATH`	NCBI common tree text used to render the species filter hierarchy.
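
The two CSV formats in the table can be read with Python's standard `csv` module. The sketch below is a hedged illustration, not the project's ingestion code: the function names are hypothetical, the sample rows are made up for the example rather than taken from the release, and the code assumes that any header row can be recognized by a non-numeric weight or taxonomy ID.

```python
import csv
import io


def read_edges(handle):
    """Yield (node1, node2, weight) from lines shaped like node1,node2,weight."""
    for row in csv.reader(handle):
        if len(row) != 3:
            continue  # skip blank or malformed lines
        node1, node2, weight = (field.strip() for field in row)
        try:
            yield node1, node2, float(weight)
        except ValueError:
            continue  # non-numeric weight, e.g. a header row if one is present


def read_species(handle):
    """Map ncbi_txid to species_name from a two-column CSV."""
    species = {}
    for row in csv.reader(handle):
        if len(row) == 2 and row[0].strip().isdigit():
            species[row[0].strip()] = row[1].strip()
    return species


# In-memory sample data so the sketch runs without the release files.
network_csv = io.StringIO("P1,P2,0.82\nP2,P3,0.41\n")
species_csv = io.StringIO("9606,Homo sapiens\n562,Escherichia coli\n")

edges = list(read_edges(network_csv))
species = read_species(species_csv)
print(edges)
print(species)
```

With real files, replace the `io.StringIO` objects with `open(path, newline="")` handles pointing at a file under `data/networks/` and at `data/NCBI_txID/NCBI_txID.csv`.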

The same paths can be overridden in .env. Important variables include DB_PATH, DATA_PATH, INDEXES_PATH, NODE_ATTRIBUTES_PATH, SPECIES_PATH, PYTHON_COMMAND, SUBNETWORK_SCRIPT_PATH, and PORT.
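
Before running ingestion, it can help to check that the layout matches the table. The following sketch is a hypothetical pre-flight check, not part of the repository. It assumes the table's paths are the defaults, reads overrides from the process environment (it does not parse `.env` itself), and assumes the `*.nodes.attr` header is tab- or comma-delimited.

```python
import os
import re
from pathlib import Path

# Defaults taken from the table; treating them as defaults is an assumption.
DEFAULTS = {
    "DATA_PATH": "data/networks",
    "INDEXES_PATH": "data/indexes",
    "NODE_ATTRIBUTES_PATH": "data/nodes_attr",
    "SPECIES_PATH": "data/NCBI_txID/NCBI_txID.csv",
    "TAXON_TREE_PATH": "data/NCBI_txID/commontree.txt",
}
REQUIRED_COLUMNS = {"node_id", "NCBI_txID", "NH_ID", "NH_Size"}


def check_layout(root=".", env=None):
    """Return a list of problems found in the data layout (empty if none)."""
    env = os.environ if env is None else env
    root = Path(root)
    paths = {name: root / env.get(name, default) for name, default in DEFAULTS.items()}
    problems = [f"{name}: {path} does not exist"
                for name, path in paths.items() if not path.exists()]

    attr_files = sorted(paths["NODE_ATTRIBUTES_PATH"].glob("*.nodes.attr"))
    if len(attr_files) != 1:
        problems.append(f"expected exactly one *.nodes.attr file, found {len(attr_files)}")
    else:
        with attr_files[0].open() as handle:
            header = handle.readline().strip()
        # Delimiter is an assumption: accept tabs or commas.
        columns = {name.strip() for name in re.split(r"[,\t]", header)}
        missing = REQUIRED_COLUMNS - columns
        if missing:
            problems.append(f"{attr_files[0].name} is missing columns: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    for problem in check_layout():
        print("WARNING:", problem)
```

Run from the repository root, the script prints one warning per missing path or header column and prints nothing when the layout looks complete.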