API Reference

class GenomeSpy(height=600, server_port=18089)[source]

Bases: object

A Python wrapper for GenomeSpy visualization library.

Parameters:

height (int, optional) – The height of the visualization in pixels, by default 600
server_port (int, optional) – The port number of the local HTTP server, by default 18089

height

The height of the visualization in pixels

Type: int

spec

The GenomeSpy specification defining the visualization structure

Type: dict

_server_port

The port number of the local HTTP server

Type: int

_template

The HTML template for rendering the visualization

Type: str

Notes

GenomeSpy is a toolkit for interactive visualization of genomic and other data. It enables tailored visualizations through a declarative grammar inspired by Vega-Lite, allowing mapping of data to visual channels (position, color, etc.) and composing complex visualizations from primitive graphical marks (points, rectangles, etc.).

Key Features:

- GPU-accelerated rendering for fluid interaction with large datasets
- Support for specialized genomic file formats (BigWig, BigBed, Indexed FASTA)
- Built-in genomic coordinate handling and transformations
- Interactive zooming and navigation
- Composable visualization grammar

load_spec(spec, is_url=False)[source]

Load a GenomeSpy specification.

GenomeSpy specifications define how data should be visualized, including data sources, transformations, and visual encodings. Specifications can be loaded from a JSON file or directly as a dictionary.

Parameters:

spec (Union[str, Dict[str, Any]]) – Either a JSON string/dict containing the spec or a URL to a spec file.
is_url (bool, optional) – Whether the spec is a URL to a JSON file. Defaults to False.

Returns:

The current instance for method chaining.

Return type:

GenomeSpy
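
As a rough illustration of the two input forms described above (a JSON string or a dictionary), the sketch below normalizes either one into a dictionary. The helper name `normalize_spec` and the example spec are hypothetical and are not part of the GenomeSpy wrapper; the URL case (`is_url=True`) is deliberately left out.

```python
import json
from typing import Any, Dict, Union


def normalize_spec(spec: Union[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Hypothetical helper: return a spec dictionary from a JSON string or a dict."""
    if isinstance(spec, str):
        # A string is assumed to hold JSON text here, not a URL.
        return json.loads(spec)
    # Shallow copy, so adding top-level keys does not modify the caller's dict.
    return dict(spec)


spec_dict = {
    "data": {"url": "genes.bed"},
    "mark": "rect",
    "encoding": {"x": {"chrom": "chr", "pos": "start", "type": "locus"}},
}
spec_text = json.dumps(spec_dict)

# Both input forms yield an equal dictionary.
from_text = normalize_spec(spec_text)
from_dict = normalize_spec(spec_dict)
```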

save_html(filename)[source]

Save the visualization as a standalone HTML file.

Parameters: filename (str) – Output HTML file path.

show(filename=None)[source]

Display the visualization in a browser or Jupyter notebook.

Parameters: filename (str, optional) – Path where the HTML file is saved. If None, a temporary file is created.

Notes

When running in a Jupyter notebook, the visualization will be displayed inline. Otherwise, it will open in the default web browser.

Examples

>>> plot = GenomeSpy()
>>> # Configure visualization...
>>> plot.show()  # Display inline in notebook
>>>
>>> # Save to specific file
>>> plot.show("visualization.html")

close()[source]

Close the server if it’s running and clean up temporary files.

Notes

Call this method when you’re done with the visualization to:

- Stop the local HTTP server if running
- Remove any temporary files created during visualization
- Free up system resources

Examples

>>> plot = GenomeSpy()
>>> # Create visualization...
>>> plot.show()
>>> plot.close()  # Cleanup when done

cleanup()[source]: Cleanup all temporary files, including from previous runs.

data(data, format='json')[source]

Set the data for the visualization.

Parameters:

data (Union[pd.DataFrame, np.ndarray, str]) – The data to visualize. Can be:
- pandas DataFrame: Converted to records format
- numpy array: Converted to list format
- str: URL or path to data file
format (str, optional) – The format of the data file when a URL or path is given, by default “json”. Options include:
- “json”: JSON data
- “csv”: Comma-separated values
- “tsv”: Tab-separated values
- “bigwig”: BigWig genomic data
- “bigbed”: BigBed genomic data
- “fasta”: FASTA sequence data
- “gff3”: GFF3 genomic features

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Notes

GenomeSpy utilizes a tabular data structure as its fundamental data model, similar to a spreadsheet or database table. Each dataset consists of records containing named data fields.

Data Sources:

- Eager data: Fully loaded during initialization (CSV, TSV, JSON)
- Lazy data: Loaded on-demand (BigWig, BigBed, Indexed FASTA)
- Named data: Can be dynamically updated using the API

Examples

>>> import pandas as pd
>>> from genomespy import GenomeSpy
>>>
>>> # Using pandas DataFrame
>>> df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
>>> plot = GenomeSpy()
>>> plot.data(df)
>>>
>>> # Using file path
>>> plot.data("data.bigwig", format="bigwig")
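
"Records format" presumably means one dictionary per row, the shape that pandas produces with `DataFrame.to_dict(orient="records")`. The sketch below reproduces that shape in plain Python so the result is easy to inspect; `columns_to_records` is a hypothetical name, not part of the wrapper.

```python
from typing import Any, Dict, List


def columns_to_records(columns: Dict[str, List[Any]]) -> List[Dict[str, Any]]:
    """Turn column-oriented data into a list of row dictionaries."""
    names = list(columns)
    # zip(*...) walks all columns in parallel, yielding one row at a time.
    return [dict(zip(names, row)) for row in zip(*columns.values())]


# Same values as the DataFrame in the example above.
records = columns_to_records({"x": [1, 2, 3], "y": [4, 5, 6]})
print(records)
```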

transform(transform)[source]

Add transformations to the visualization specification.

Parameters: transform (List[Dict[str, Any]]) – A list of transformation specifications. Each transformation is a dictionary with at least a “type” field plus transformation-specific parameters.
Returns: The current instance for method chaining
Return type: GenomeSpy

Notes

Transformations allow data manipulation before visualization. GenomeSpy provides specialized transformations for genomic data visualization and analysis tasks.

Common Transformations:

- formula: Calculate new fields using expressions
- filter: Filter data based on conditions
- flatten: Flatten nested data structures
- coverage: Calculate coverage from interval data
- pileup: Create piled-up layout for overlapping features
- flattenSequence: Split sequences into individual bases
- collect: Group and sort data
- project: Select and rename fields

Examples

>>> plot = GenomeSpy()
>>> plot.transform([
...     {
...         "type": "formula",
...         "expr": "datum.end - datum.start",
...         "as": "length"
...     },
...     {
...         "type": "filter",
...         "expr": "datum.length > 1000"
...     }
... ])
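
To make the effect of the two steps above concrete, the following sketch applies the same logic to a small list of records in plain Python. It only imitates what the `formula` and `filter` transforms compute; the function names and sample intervals are made up for illustration, and GenomeSpy itself evaluates these transforms in the browser.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def add_length(records: List[Record]) -> List[Record]:
    """Imitate the formula step: length = end - start."""
    return [{**r, "length": r["end"] - r["start"]} for r in records]


def keep_long(records: List[Record], min_length: int = 1000) -> List[Record]:
    """Imitate the filter step: keep records whose length exceeds min_length."""
    return [r for r in records if r["length"] > min_length]


intervals = [
    {"chrom": "chr1", "start": 100, "end": 600},     # length 500, dropped
    {"chrom": "chr1", "start": 2000, "end": 4500},   # length 2500, kept
]
result = keep_long(add_length(intervals))
```

The order matters: the filter refers to the `length` field, so the formula step has to run first.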

mark(mark_type, **kwargs)[source]

Set the mark type for the visualization.

Parameters:

mark_type (str) – The type of mark to use
**kwargs (dict) – Additional mark properties to configure appearance and behavior

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Notes

Marks are the basic graphical elements used to represent data. GenomeSpy provides various mark types suitable for genomic data visualization.

Mark Types:

- rect: Rectangles (good for intervals, exons)
- point: Points (good for variants, peaks)
- line: Lines (good for continuous data)
- rule: Rules (good for boundaries)
- text: Text labels
- area: Filled areas

Mark Properties:

- size: Size of the mark
- color: Color of the mark
- opacity: Transparency
- strokeWidth: Width of stroke
- tooltip: Tooltip configuration
- minWidth: Minimum width for visibility
- minOpacity: Minimum opacity for visibility

Examples

>>> plot = GenomeSpy()
>>> plot.mark("rect",
...     size=5,
...     minWidth=0.5,
...     tooltip={"content": "data"}
... )

encode(**kwargs)[source]

Set the encoding for the visualization.

Encodings map data fields to visual properties. GenomeSpy supports various encoding types and provides special support for genomic coordinates.

Parameters:

**kwargs (dict) – Encoding specifications for different channels. Each specification should be a dictionary defining the encoding properties.

Returns:

GenomeSpy – The current instance for method chaining.

Supported Channels

- x, y (Position encoding)
- x2, y2 (Secondary position for intervals)
- color (Color encoding)
- opacity (Transparency)
- size (Size of marks)
- text (Text content)
- tooltip (Tooltip content)
- sample (Sample ID for multi-sample visualizations)

Data Types

- quantitative (Numerical values)
- nominal (Categorical values)
- ordinal (Ordered categories)
- locus (Genomic coordinates; requires chrom and pos fields)

Examples

>>> plot = GenomeSpy()
>>> plot.encode(
...     x={"chrom": "chr", "pos": "start", "type": "locus"},
...     y={"field": "value", "type": "quantitative"},
...     color={"field": "category", "type": "nominal"}
... )

scale(**kwargs)[source]

Set the scales for the visualization.

Scales are functions that map abstract data values (e.g., a type of mutation) to visual values (e.g., colors). GenomeSpy implements most of Vega-Lite’s scale types and adds specialized scales for genomic data.

Parameters:

**kwargs (dict) – Scale specifications for different channels. Each specification can include:
- type: The type of scale to use
- domain: Input domain range
- range: Output range values
- nice: Whether to extend domain to nice round numbers
- padding: Padding to add around domain
- scheme: Color scheme for color scales

Returns:

GenomeSpy – The current instance for method chaining.

Supported Scale Types

- linear (Linear mapping for quantitative data)
- pow (Power scale for quantitative data)
- sqrt (Square root scale for quantitative data)
- symlog (Symmetric log scale)
- log (Logarithmic scale)
- ordinal (Discrete mapping for categorical data)
- band (Special scale for discrete ranges)
- point (Position-based scale)
- quantize (Binning for continuous data)
- threshold (Threshold-based binning)

Examples

>>> plot = GenomeSpy()
>>> plot.scale(
...     y={
...         "type": "linear",
...         "domain": [0, 1],
...         "range": [0, 100],
...         "nice": True
...     },
...     color={
...         "type": "ordinal",
...         "domain": ["A", "C", "G", "T"],
...         "range": ["red", "blue", "green", "yellow"]
...     }
... )
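
The mapping performed by a scale can be sketched in a few lines of Python. The code below is an illustration of the general idea behind `linear` and `ordinal` scales, using the domains and ranges from the example above; it is not GenomeSpy's implementation, and it ignores options such as `nice` and `padding`.

```python
from typing import Callable, Sequence


def linear_scale(domain: Sequence[float], range_: Sequence[float]) -> Callable[[float], float]:
    """Return a function that maps domain values linearly onto the range."""
    d0, d1 = domain
    r0, r1 = range_
    if d0 == d1:
        raise ValueError("domain must span a non-zero interval")

    def scale(value: float) -> float:
        t = (value - d0) / (d1 - d0)  # relative position within the domain
        return r0 + t * (r1 - r0)     # values outside the domain extrapolate

    return scale


# Linear scale: data values in [0, 1] map to output values in [0, 100].
y = linear_scale([0, 1], [0, 100])

# Ordinal scale: each category maps to the range entry at the same index.
base_colors = dict(zip(["A", "C", "G", "T"], ["red", "blue", "green", "yellow"]))
```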

view(view_spec)[source]

Add a view to the visualization.

Views in GenomeSpy allow for hierarchical composition of visualizations. Views can be concatenated, layered, or arranged in other ways. Each view inherits data and encoding from its parent but can override them with its own specifications.

Parameters:

view_spec (Dict[str, Any]) – The view specification defining the visualization properties, data, marks, and encodings for this view.

Returns:

GenomeSpy – The current instance for method chaining.

View Properties

- data (Data source for the view)
- transform (Data transformations)
- mark (Visual marks to represent data)
- encoding (Visual encodings)
- height (View height)
- width (View width)
- name (Unique identifier for the view)
- title (View title)
- description (View description)
- padding (Space around the view)
- opacity (View opacity)
- configurableVisibility (Whether view can be toggled)

Examples

>>> plot = GenomeSpy()
>>> plot.view({
...     "name": "genes",
...     "height": 120,
...     "data": {"url": "genes.bed"},
...     "mark": "rect",
...     "encoding": {
...         "x": {"chrom": "chr", "pos": "start", "type": "locus"},
...         "x2": {"chrom": "chr", "pos": "end"}
...     }
... })
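
The inheritance described above can be pictured with a plain dictionary: properties placed on the parent are shared by the child views. The sketch below assumes the `vconcat` key of the GenomeSpy grammar for vertical concatenation; the helper `vconcat` and the second view (`starts`) are hypothetical, and how `view()` stores views internally is not covered by this reference.

```python
from typing import Any, Dict

View = Dict[str, Any]


def vconcat(*views: View, **shared: Any) -> View:
    """Hypothetical helper: stack views vertically under one parent spec.

    Keyword arguments become parent-level properties (for example a shared
    data source) that child views inherit unless they override them.
    """
    return {**shared, "vconcat": list(views)}


genes = {
    "name": "genes",
    "height": 120,
    "mark": "rect",
    "encoding": {
        "x": {"chrom": "chr", "pos": "start", "type": "locus"},
        "x2": {"chrom": "chr", "pos": "end"},
    },
}
starts = {
    "name": "starts",
    "height": 40,
    "mark": "point",
    "encoding": {"x": {"chrom": "chr", "pos": "start", "type": "locus"}},
}

# Neither child declares data; both rely on the parent-level source.
spec = vconcat(genes, starts, data={"url": "genes.bed"})
```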

import_view(url)[source]

Import a view from a URL.

This function allows importing external view specifications, enabling reuse and sharing of visualization components. Common uses include importing standard genomic tracks like:

- Chromosome ideograms
- Gene annotation tracks
- Reference genome sequences

Parameters:

url (str) – The URL or path to the view specification to import. Can be absolute URL or relative to the base URL.

Returns:

GenomeSpy – The current instance for method chaining.

Built-in Views

The following views are available in the .genomespy_shared/ directory:

- cytobands.json (Chromosome ideogram track)
- genes.json (Gene annotation track)
- hg38.json (Reference genome sequence)

Examples

>>> plot = GenomeSpy()
>>> # Import chromosome ideogram
>>> plot.import_view(".genomespy_shared/cytobands.json")
>>>
>>> # Import gene annotations
>>> plot.import_view(".genomespy_shared/genes.json")
>>>
>>> # Import reference genome
>>> plot.import_view(".genomespy_shared/hg38.json")

expression(name, expr)[source]

Add an expression to the visualization.

Expressions in GenomeSpy allow for computing new data fields or modifying existing ones. They use a JavaScript-like syntax and can access the current data object using ‘datum’. Expressions can be used in transforms, encodings, and other places where dynamic computation is needed.

Parameters:

name (str) – The name of the expression to be referenced elsewhere in the specification.
expr (str) – The expression string using GenomeSpy’s expression syntax. Can access current data object via ‘datum’.

Returns:

GenomeSpy – The current instance for method chaining.

Common Uses

- Computing derived values
- Conditional logic
- String manipulation
- Mathematical calculations
- Accessing parameters

Examples

>>> plot = GenomeSpy()
>>> # Calculate length of genomic interval
>>> plot.expression("length", "datum.end - datum.start")
>>>
>>> # Compute log ratio
>>> plot.expression("logRatio", "log2(datum.value / datum.control)")
>>>
>>> # Create conditional label
>>> plot.expression(
...     "label",
...     "datum.score > 0.05 ? 'High impact' : 'Low impact'"
... )

parameter(name, value)[source]

Add a parameter to the visualization.

Parameters enable dynamic behaviors and interactions in GenomeSpy visualizations. They can be used for interactive selections, conditional encoding, data filtering, and parameterizing imported specifications.

Parameters:

name (str) – The name of the parameter to be referenced in expressions and conditions.
value (Any) – The parameter value or configuration. Can be a simple value or a parameter definition object.

Returns:

GenomeSpy – The current instance for method chaining.

Parameter Types

- Selection parameters (Enable interactive data selection)
- Value parameters (Store single values)
- Range parameters (Store numeric ranges)
- Vector parameters (Store arrays of values)

Common Uses

- Interactive filtering
- Conditional encoding
- Dynamic thresholds
- Coordinated selections
- View parameterization

Examples

>>> plot = GenomeSpy()
>>> # Selection parameter for interactive highlighting
>>> plot.parameter("highlight", {
...     "select": {"type": "point", "on": "pointerover"}
... })
>>>
>>> # Value parameter for filtering
>>> plot.parameter("threshold", 0.05)
>>>
>>> # Use in encoding
>>> plot.encode(
...     opacity={
...         "condition": {"param": "highlight", "value": 1.0},
...         "value": 0.3
...     }
... )

to_json()[source]

Convert the specification to a JSON string.

This function serializes the current GenomeSpy specification into a JSON string, which can be used for saving or sharing the visualization configuration.

Returns: The JSON string representation of the specification.
Return type: str

Examples

>>> plot = GenomeSpy()
>>> plot.encode(x={"field": "value", "type": "quantitative"})
>>> json_spec = plot.to_json()

heatmap(data, x_label='x', y_label='y')[source]

Create a heatmap from a pandas DataFrame.

Heatmaps are a common way to visualize matrix-like data, where values are represented by colors. This function prepares the data and sets up the GenomeSpy specification for rendering a heatmap.

Parameters:

data (pd.DataFrame) – A pandas DataFrame containing the data for the heatmap.
x_label (str, optional) – The label for the x-axis. Defaults to “x”.
y_label (str, optional) – The label for the y-axis. Defaults to “y”.

Returns:

The current instance for method chaining.

Return type:

GenomeSpy

Examples

>>> import pandas as pd
>>> plot = GenomeSpy()
>>> data = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     'C': [7, 8, 9]
... })
>>> plot.heatmap(data, x_label="Samples", y_label="Features")
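
A heatmap drawn with rectangular marks typically needs one record per cell rather than a wide matrix. The sketch below shows one plausible reshaping in plain Python. The helper name, the `value` field, and the choice of putting columns on the x-axis are assumptions for illustration; the wrapper's actual preparation step is not documented here.

```python
from typing import Any, Dict, List


def matrix_to_long(
    matrix: Dict[str, List[float]], x_label: str = "x", y_label: str = "y"
) -> List[Dict[str, Any]]:
    """Flatten a column-oriented matrix into one record per cell."""
    records = []
    for column, values in matrix.items():
        for row_index, value in enumerate(values):
            # Each cell keeps its column name, row position, and value.
            records.append({x_label: column, y_label: row_index, "value": value})
    return records


cells = matrix_to_long({"A": [1, 2, 3], "B": [4, 5, 6]}, "Samples", "Features")
```

In an encoding, the two label fields would then be mapped to the position channels and `value` to color.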

clustermap(data, x_label='x', y_label='y', method='ward', metric='euclidean', z_score=None, standard_scale=None, row_cluster=True, col_cluster=True, vmax=None, vmin=None, center=None, cmap='viridis')[source]

Create a clustermap from a pandas DataFrame.

A clustermap combines a heatmap with hierarchical clustering dendrograms on both axes. The clustering helps reveal patterns and relationships in the data by grouping similar rows and columns together.

Parameters:

data (pd.DataFrame) – Input data matrix to be clustered and visualized
x_label (str, optional) – Label for x-axis, by default “x”
y_label (str, optional) – Label for y-axis, by default “y”
method (str, optional) – Linkage method for hierarchical clustering, by default “ward”
metric (str, optional) – Distance metric for clustering, by default “euclidean”
z_score (int, optional) – Standardize the data along rows (0) or columns (1), by default None
standard_scale (int, optional) – Scale data along rows (0) or columns (1), by default None
row_cluster (bool, optional) – Whether to cluster rows, by default True
col_cluster (bool, optional) – Whether to cluster columns, by default True
vmax (float, optional) – Maximum value for color scaling, by default None
vmin (float, optional) – Minimum value for color scaling, by default None
center (float, optional) – Center value for diverging colormaps, by default None
cmap (str, optional) – Colormap name, either “viridis” or “blues”, by default “viridis”

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Examples

>>> import pandas as pd
>>> from genomespy import GenomeSpy
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [2, 4, 6],
...     'C': [3, 6, 9]
... })
>>>
>>> # Create and display clustermap
>>> plot = GenomeSpy()
>>> plot.clustermap(
...     data,
...     x_label="Samples",
...     y_label="Features",
...     z_score=1,
...     method="ward"
... )
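
The `z_score` option standardizes each row or column before clustering. A minimal sketch of that computation follows, using only the standard library. It assumes the sample standard deviation (`statistics.stdev`) and follows the axis convention stated in the parameter list (0 for rows, 1 for columns); the wrapper's own computation may differ in these details.

```python
from statistics import mean, stdev
from typing import List

Matrix = List[List[float]]


def z_score_rows(matrix: Matrix) -> Matrix:
    """Standardize each row to mean 0 and sample standard deviation 1."""
    scaled = []
    for row in matrix:
        mu, sigma = mean(row), stdev(row)
        # A constant row has sigma == 0 and would raise ZeroDivisionError.
        scaled.append([(value - mu) / sigma for value in row])
    return scaled


def z_score(matrix: Matrix, axis: int) -> Matrix:
    """Standardize along rows (axis=0) or columns (axis=1)."""
    if axis == 0:
        return z_score_rows(matrix)
    if axis == 1:
        # Transpose, standardize the former columns as rows, transpose back.
        transposed = [list(col) for col in zip(*matrix)]
        return [list(row) for row in zip(*z_score_rows(transposed))]
    raise ValueError("axis must be 0 (rows) or 1 (columns)")


# Rows of the example DataFrame above (columns A, B, C).
rows = [[1, 2, 3], [2, 4, 6], [3, 6, 9]]
by_row = z_score(rows, axis=0)
by_col = z_score(rows, axis=1)
```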

dendrogram(data, method='ward', metric='euclidean')[source]

Create a dendrogram using GenomeSpy.

Dendrograms are tree-like diagrams used to visualize the arrangement of clusters produced by hierarchical clustering.

Parameters:

data (pd.DataFrame) – Input data matrix for clustering
method (str, optional) – Linkage method for clustering, by default “ward”
metric (str, optional) – Distance metric for clustering, by default “euclidean”

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Examples

>>> import pandas as pd
>>> plot = GenomeSpy()
>>> data = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6]
... })
>>> plot.dendrogram(data, method="ward", metric="euclidean")

show_gradio(filename=None)[source]

Return the HTML content for Gradio integration.

Returns: The HTML representation of the visualization.
Return type: str

igv(file_dict, region=None, height=600, server_port=18089, gs=None)[source]

Create a GenomeSpy visualization with custom tracks in IGV style.

This function creates a genome browser visualization similar to IGV (Integrative Genomics Viewer), with support for various genomic data formats and customizable tracks.

Parameters:

file_dict (Dict[str, Dict[str, Any]]) – A dictionary mapping track names to their configurations. Each track configuration should specify:
- url or path : Path to the data file
- type : Data format (e.g., “bigwig”, “bigbed”)
- height : Track height in pixels
region (Optional[Dict[str, Any]], optional) – The genomic region to display, by default None. Should contain:
- chrom : Chromosome name
- start : Start position
- end : End position
height (int, optional) – The height of the visualization in pixels, by default 600
server_port (int, optional) – The port number for the GenomeSpy server, by default 18089
gs (GenomeSpy, optional) – An existing GenomeSpy instance to reuse, by default None

Returns:

The configured GenomeSpy instance ready for display

Return type:

GenomeSpy

Examples

>>> from genomespy import igv
>>> # Configure tracks
>>> tracks = {
...     "ZBTB7A": {
...         "url": "https://chip-atlas.dbcls.jp/data/hg38/eachData/bw/SRX3161009.bw",
...         "height": 40,
...         "type": "bigwig"
...     }
... }
>>> # Create visualization
>>> plot = igv(
...     tracks,
...     region={"chrom": "chr7", "start": 66600000, "end": 66800000}
... )
>>> plot.show()

Core Functionality

class RangeRequestHandler(*args, directory=None, **kwargs)[source]

Bases: SimpleHTTPRequestHandler

HTTP handler that supports range requests for bigwig/bigbed files.

This handler extends the SimpleHTTPRequestHandler to support HTTP range requests, which are necessary for serving large genomic data files like bigwig and bigbed.
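
For readers unfamiliar with range requests: a client asks for part of a file by sending a header such as `Range: bytes=0-499`, and the server replies with only those bytes. The sketch below parses the single-range forms of that header. It illustrates the HTTP mechanism in general and is not the handler's actual code; `parse_byte_range` is a hypothetical name.

```python
import re
from typing import Optional, Tuple


def parse_byte_range(header: str, file_size: int) -> Optional[Tuple[int, int]]:
    """Parse a single-range header value such as "bytes=0-499".

    Returns inclusive (start, end) offsets, or None if the value is
    malformed or cannot be satisfied for a file of the given size.
    """
    match = re.fullmatch(r"bytes=(\d*)-(\d*)", header.strip())
    if match is None:
        return None
    first, last = match.groups()
    if not first and not last:
        return None
    if not first:
        # Suffix form "bytes=-N": the final N bytes of the file.
        start = max(file_size - int(last), 0)
        end = file_size - 1
    else:
        start = int(first)
        # Open-ended form "bytes=N-" runs to the end of the file.
        end = min(int(last), file_size - 1) if last else file_size - 1
    if start > end:
        return None
    return start, end


first_chunk = parse_byte_range("bytes=0-499", 1000)
tail = parse_byte_range("bytes=500-", 1000)
suffix = parse_byte_range("bytes=-200", 1000)
```

A server would typically answer a satisfiable range with status 206 and a `Content-Range` header, and an unsatisfiable one with status 416.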

BINARY_EXTENSIONS

List of file extensions considered as binary.

Type: list

BINARY_EXTENSIONS = ['.bw', '.bigwig']

log_message(format, *args)[source]

Log an arbitrary message.

This is used by all other logging functions. Override it if you have specific logging wishes.

The first argument, FORMAT, is a format string for the message to be logged. If the format string contains any % escapes requiring parameters, they should be specified as subsequent arguments (it’s just like printf!).

The client ip and current date/time are prefixed to every message.

Unicode control characters are replaced with escaped hex before writing the output to stderr.

guess_type(path)[source]

Guess the type of a file based on its extension.

Parameters: path (str) – The file path.
Returns: The MIME type of the file.
Return type: str

send_head()[source]

Common code for GET and HEAD commands.

Returns: The file object to be sent to the client, or None if an error occurs.
Return type: file object or None

handle()[source]: Handle multiple requests if necessary.

handle_one_request()[source]: Handle a single HTTP request.

copyfile(source, outputfile)[source]

Copy all data between two file objects.

Parameters:

source (file object) – The source file object.
outputfile (file object) – The destination file object.

class GenomeSpy(height=600, server_port=18089)[source]

Bases: object

A Python wrapper for GenomeSpy visualization library.

Parameters:

height (int, optional) – The height of the visualization in pixels, by default 600
server_port (int)

height

The height of the visualization in pixels

Type:: int

spec

The GenomeSpy specification defining the visualization structure

Type:: dict

_server_port

The port number of the local HTTP server

Type:: int

_template

The HTML template for rendering the visualization

Type:: str

Notes

GenomeSpy is a toolkit for interactive visualization of genomic and other data. It enables tailored visualizations through a declarative grammar inspired by Vega-Lite, allowing mapping of data to visual channels (position, color, etc.) and composing complex visualizations from primitive graphical marks (points, rectangles, etc.).

Key Features: - GPU-accelerated rendering for fluid interaction with large datasets - Support for specialized genomic file formats (BigWig, BigBed, Indexed FASTA) - Built-in genomic coordinate handling and transformations - Interactive zooming and navigation - Composable visualization grammar

load_spec(spec, is_url=False)[source]

Load a GenomeSpy specification.

GenomeSpy specifications define how data should be visualized, including data sources, transformations, and visual encodings. Specifications can be loaded from a JSON file or directly as a dictionary.

Parameters:

spec (Union[str, Dict[str, Any]]) – Either a JSON string/dict containing the spec or a URL to a spec file.
is_url (bool, optional) – Whether the spec is a URL to a JSON file. Defaults to False.

Returns:

The current instance for method chaining.

Return type:

GenomeSpy

save_html(filename)[source]

Save the visualization as a standalone HTML file.

Parameters:: filename (str) – Output HTML file path.

show(filename=None)[source]

Display the visualization in a browser or Jupyter notebook.

Parameters:: filename (str, optional) – Optional filename to save the HTML file. If None, creates a temporary file.

Notes

When running in a Jupyter notebook, the visualization will be displayed inline. Otherwise, it will open in the default web browser.

Examples

>>> plot = GenomeSpy()
>>> # Configure visualization...
>>> plot.show()  # Display inline in notebook
>>>
>>> # Save to specific file
>>> plot.show("visualization.html")

close()[source]

Close the server if it’s running and cleanup temporary files.

Notes

This method should be called when you’re done with the visualization to: - Stop the local HTTP server if running - Remove any temporary files created during visualization - Free up system resources

Examples

>>> plot = GenomeSpy()
>>> # Create visualization...
>>> plot.show()
>>> plot.close()  # Cleanup when done

cleanup()[source]: Cleanup all temporary files, including from previous runs.

data(data, format='json')[source]

Set the data for the visualization.

Parameters:

data (Union[pd.DataFrame, np.ndarray, str]) – The data to visualize. Can be: - pandas DataFrame: Converted to records format - numpy array: Converted to list format - str: URL or path to data file
format (str, optional) – The format of the data file if using URL/path, by default “json” Options include: - “json”: JSON data - “csv”: Comma-separated values - “tsv”: Tab-separated values - “bigwig”: BigWig genomic data - “bigbed”: BigBed genomic data - “fasta”: FASTA sequence data - “gff3”: GFF3 genomic features

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Notes

GenomeSpy utilizes a tabular data structure as its fundamental data model, similar to a spreadsheet or database table. Each dataset consists of records containing named data fields.

Data Sources: - Eager data: Fully loaded during initialization (CSV, TSV, JSON) - Lazy data: Loaded on-demand (BigWig, BigBed, Indexed FASTA) - Named data: Can be dynamically updated using the API

Examples

>>> import pandas as pd
>>> from genomespy import GenomeSpy
>>>
>>> # Using pandas DataFrame
>>> df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6]})
>>> plot = GenomeSpy()
>>> plot.data(df)
>>>
>>> # Using file path
>>> plot.data("data.bigwig", format="bigwig")

transform(transform)[source]

Add transformations to the visualization specification.

Parameters:: transform (List[Dict[str, Any]]) – A list of transformation specifications. Each transformation is a dictionary with at least a “type” field and transformation-specific parameters.
Returns:: The current instance for method chaining
Return type:: GenomeSpy

Notes

Transformations allow data manipulation before visualization. GenomeSpy provides specialized transformations for genomic data visualization and analysis tasks.

Common Transformations: - formula: Calculate new fields using expressions - filter: Filter data based on conditions - flatten: Flatten nested data structures - coverage: Calculate coverage from interval data - pileup: Create piled-up layout for overlapping features - flattenSequence: Split sequences into individual bases - collect: Group and sort data - project: Select and rename fields

Examples

>>> plot = GenomeSpy()
>>> plot.transform([
...     {
...         "type": "formula",
...         "expr": "datum.end - datum.start",
...         "as": "length"
...     },
...     {
...         "type": "filter",
...         "expr": "datum.length > 1000"
...     }
... ])

mark(mark_type, **kwargs)[source]

Set the mark type for the visualization.

Parameters:

mark_type (str) – The type of mark to use
**kwargs (dict) – Additional mark properties to configure appearance and behavior

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Notes

Marks are the basic graphical elements used to represent data. GenomeSpy provides various mark types suitable for genomic data visualization.

Mark Types: - rect: Rectangles (good for intervals, exons) - point: Points (good for variants, peaks) - line: Lines (good for continuous data) - rule: Rules (good for boundaries) - text: Text labels - area: Filled areas

Mark Properties: - size: Size of the mark - color: Color of the mark - opacity: Transparency - strokeWidth: Width of stroke - tooltip: Tooltip configuration - minWidth: Minimum width for visibility - minOpacity: Minimum opacity for visibility

Examples

>>> plot = GenomeSpy()
>>> plot.mark("rect",
...     size=5,
...     minWidth=0.5,
...     tooltip={"content": "data"}
... )

encode(**kwargs)[source]

Set the encoding for the visualization.

Encodings map data fields to visual properties. GenomeSpy supports various encoding types and provides special support for genomic coordinates.

Parameters:

**kwargs (dict) – Encoding specifications for different channels. Each specification should be a dictionary defining the encoding properties.

Returns:

GenomeSpy – The current instance for method chaining.
Supported Channels
—————-
- x, y (Position encoding)
- x2, y2 (Secondary position for intervals)
- color (Color encoding)
- opacity (Transparency)
- size (Size of marks)
- text (Text content)
- tooltip (Tooltip content)
- sample (Sample ID for multi-sample visualizations)
Data Types
———
- quantitative (Numerical values)
- nominal (Categorical values)
- ordinal (Ordered categories)
- locus (Genomic coordinates (requires chrom and pos fields))

Examples

>>> plot = GenomeSpy()
>>> plot.encode(
...     x={"chrom": "chr", "pos": "start", "type": "locus"},
...     y={"field": "value", "type": "quantitative"},
...     color={"field": "category", "type": "nominal"}
... )

scale(**kwargs)[source]

Set the scales for the visualization.

Scales are functions that map abstract data values (e.g., a type of mutation) to visual values (e.g., colors). GenomeSpy implements most of Vega-Lite’s scale types and adds specialized scales for genomic data.

Parameters:

**kwargs (dict) – Scale specifications for different channels. Each specification can include: - type: The type of scale to use - domain: Input domain range - range: Output range values - nice: Whether to extend domain to nice round numbers - padding: Padding to add around domain - scheme: Color scheme for color scales

Returns:

GenomeSpy – The current instance for method chaining.
Supported Scale Types
——————-
- linear (Linear mapping for quantitative data)
- pow (Power scale for quantitative data)
- sqrt (Square root scale for quantitative data)
- symlog (Symmetric log scale)
- log (Logarithmic scale)
- ordinal (Discrete mapping for categorical data)
- band (Special scale for discrete ranges)
- point (Position-based scale)
- quantize (Binning for continuous data)
- threshold (Threshold-based binning)

Examples

>>> plot = GenomeSpy()
>>> plot.scale(
...     y={
...         "type": "linear",
...         "domain": [0, 1],
...         "range": [0, 100],
...         "nice": True
...     },
...     color={
...         "type": "ordinal",
...         "domain": ["A", "C", "G", "T"],
...         "range": ["red", "blue", "green", "yellow"]
...     }
... )

view(view_spec)[source]

Add a view to the visualization.

Views in GenomeSpy allow for hierarchical composition of visualizations. Views can be concatenated, layered, or arranged in other ways. Each view inherits data and encoding from its parent but can override them with its own specifications.

Parameters:

view_spec (Dict[str, Any]) – The view specification defining the visualization properties, data, marks, and encodings for this view.

Returns:

GenomeSpy – The current instance for method chaining.
View Properties
————–
- data (Data source for the view)
- transform (Data transformations)
- mark (Visual marks to represent data)
- encoding (Visual encodings)
- height (View height)
- width (View width)
- name (Unique identifier for the view)
- title (View title)
- description (View description)
- padding (Space around the view)
- opacity (View opacity)
- configurableVisibility (Whether view can be toggled)

Examples

>>> plot = GenomeSpy()
>>> plot.view({
...     "name": "genes",
...     "height": 120,
...     "data": {"url": "genes.bed"},
...     "mark": "rect",
...     "encoding": {
...         "x": {"chrom": "chr", "pos": "start", "type": "locus"},
...         "x2": {"chrom": "chr", "pos": "end"}
...     }
... })

import_view(url)[source]

Import a view from a URL.

This function allows importing external view specifications, enabling reuse and sharing of visualization components. Common uses include importing standard genomic tracks like: - Chromosome ideograms - Gene annotation tracks - Reference genome sequences

Parameters:

url (str) – The URL or path to the view specification to import. Can be absolute URL or relative to the base URL.

Returns:

GenomeSpy – The current instance for method chaining.

Built-in Views

The following views are available in the .genomespy_shared/ directory:

- cytobands.json (Chromosome ideogram track)
- genes.json (Gene annotation track)
- hg38.json (Reference genome sequence)

Examples

>>> plot = GenomeSpy()
>>> # Import chromosome ideogram
>>> plot.import_view(".genomespy_shared/cytobands.json")
>>>
>>> # Import gene annotations
>>> plot.import_view(".genomespy_shared/genes.json")
>>>
>>> # Import reference genome
>>> plot.import_view(".genomespy_shared/hg38.json")
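
For orientation, the sketch below shows what the three imports above could look like as a plain specification. It assumes each import becomes an `{"import": {"url": ...}}` entry in a vertical concatenation (`vconcat`). How the wrapper stores imported views internally is not documented here, and `import_entry` is a hypothetical helper.

```python
import json
from typing import Any, Dict, List


def import_entry(url: str) -> Dict[str, Any]:
    """Build a single import entry (hypothetical helper, assumed layout)."""
    return {"import": {"url": url}}


shared = ".genomespy_shared/"
tracks: List[Dict[str, Any]] = [
    import_entry(shared + name)
    for name in ("cytobands.json", "genes.json", "hg38.json")
]

# Assumption: imported tracks are stacked vertically in the order they were added.
spec = {"vconcat": tracks}
print(json.dumps(spec, indent=2))
```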

expression(name, expr)[source]

Add an expression to the visualization.

Expressions in GenomeSpy allow for computing new data fields or modifying existing ones. They use a JavaScript-like syntax and can access the current data object using ‘datum’. Expressions can be used in transforms, encodings, and other places where dynamic computation is needed.

Parameters:

name (str) – The name of the expression to be referenced elsewhere in the specification.
expr (str) – The expression string using GenomeSpy’s expression syntax. Can access current data object via ‘datum’.

Returns:

GenomeSpy – The current instance for method chaining.

Common Uses

- Computing derived values
- Conditional logic
- String manipulation
- Mathematical calculations
- Accessing parameters

Examples

>>> plot = GenomeSpy()
>>> # Calculate length of genomic interval
>>> plot.expression("length", "datum.end - datum.start")
>>>
>>> # Compute log ratio
>>> plot.expression("logRatio", "log2(datum.value / datum.control)")
>>>
>>> # Create conditional label
>>> plot.expression(
...     "label",
...     "datum.score > 0.05 ? 'High impact' : 'Low impact'"
... )
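
To make the meaning of these example expressions concrete, the sketch below evaluates their Python equivalents on one record. It does not use GenomeSpy's expression engine, and the field values in `datum` are made up for illustration.

```python
import math

# A hypothetical data object; in GenomeSpy this would be one row of the data.
datum = {"start": 1000, "end": 2500, "value": 8.0, "control": 2.0, "score": 0.2}

# "datum.end - datum.start"
length = datum["end"] - datum["start"]

# "log2(datum.value / datum.control)"
log_ratio = math.log2(datum["value"] / datum["control"])

# "datum.score > 0.05 ? 'High impact' : 'Low impact'"
label = "High impact" if datum["score"] > 0.05 else "Low impact"

print(length, log_ratio, label)
```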

parameter(name, value)[source]

Add a parameter to the visualization.

Parameters enable dynamic behaviors and interactions in GenomeSpy visualizations. They can be used for interactive selections, conditional encoding, data filtering, and parameterizing imported specifications.

Parameters:

name (str) – The name of the parameter to be referenced in expressions and conditions.
value (Any) – The parameter value or configuration. Can be a simple value or a parameter definition object.

Returns:

GenomeSpy – The current instance for method chaining.

Parameter Types

- Selection parameters (Enable interactive data selection)
- Value parameters (Store single values)
- Range parameters (Store numeric ranges)
- Vector parameters (Store arrays of values)

Common Uses

- Interactive filtering
- Conditional encoding
- Dynamic thresholds
- Coordinated selections
- View parameterization

Examples

>>> plot = GenomeSpy()
>>> # Selection parameter for interactive highlighting
>>> plot.parameter("highlight", {
...     "select": {"type": "point", "on": "pointerover"}
... })
>>>
>>> # Value parameter for filtering
>>> plot.parameter("threshold", 0.05)
>>>
>>> # Use in encoding
>>> plot.encode(
...     opacity={
...         "condition": {"param": "highlight", "value": 1.0},
...         "value": 0.3
...     }
... )
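
The distinction between a simple value and a parameter definition object can be sketched as follows. This is an assumption about how such a call might map onto a `params` list in the specification, not a description of the wrapper's internals. `make_param` is a hypothetical helper.

```python
from typing import Any, Dict, List


def make_param(name: str, value: Any) -> Dict[str, Any]:
    """Turn a (name, value) pair into a parameter entry (hypothetical helper)."""
    if isinstance(value, dict):
        # A definition object (e.g. a selection) is kept as-is and given a name.
        return {"name": name, **value}
    # A plain value is wrapped so that it can be referenced by name.
    return {"name": name, "value": value}


params: List[Dict[str, Any]] = [
    make_param("highlight", {"select": {"type": "point", "on": "pointerover"}}),
    make_param("threshold", 0.05),
]
```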

to_json()[source]

Convert the specification to a JSON string.

This function serializes the current GenomeSpy specification into a JSON string, which can be used for saving or sharing the visualization configuration.

Returns:: The JSON string representation of the specification.
Return type:: str

Examples

>>> plot = GenomeSpy()
>>> plot.encode(x={"field": "value", "type": "quantitative"})
>>> json_spec = plot.to_json()
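
Because the result is an ordinary JSON string, it can be written to disk and read back with the standard library. The sketch below uses a stand-in dictionary instead of a real `GenomeSpy` instance, and the file name `spec.json` is arbitrary.

```python
import json
import os
import tempfile

# Stand-in for the string returned by to_json(); the content is illustrative.
spec = {"encoding": {"x": {"field": "value", "type": "quantitative"}}}
json_spec = json.dumps(spec, indent=2)

# Save the string and load it again to confirm the round trip is lossless.
path = os.path.join(tempfile.mkdtemp(), "spec.json")
with open(path, "w", encoding="utf-8") as handle:
    handle.write(json_spec)
with open(path, encoding="utf-8") as handle:
    restored = json.load(handle)
```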

heatmap(data, x_label='x', y_label='y')[source]

Create a heatmap from a pandas DataFrame.

Heatmaps are a common way to visualize matrix-like data, where values are represented by colors. This function prepares the data and sets up the GenomeSpy specification for rendering a heatmap.

Parameters:

data (pd.DataFrame) – A pandas DataFrame containing the data for the heatmap.
x_label (str, optional) – The label for the x-axis. Defaults to “x”.
y_label (str, optional) – The label for the y-axis. Defaults to “y”.

Returns:

The current instance for method chaining.

Return type:

GenomeSpy

Examples

>>> import pandas as pd
>>> plot = GenomeSpy()
>>> data = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6],
...     'C': [7, 8, 9]
... })
>>> plot.heatmap(data, x_label="Samples", y_label="Features")
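
Grammar-based heatmaps are usually drawn from "long" records with one row per cell. The sketch below shows that reshaping in plain Python for the matrix above. It is an assumption that the wrapper prepares data this way, and `matrix_to_records` is a hypothetical helper that reuses the axis labels as field names purely for illustration.

```python
from typing import Any, Dict, List


def matrix_to_records(
    columns: Dict[str, List[float]], x_label: str = "x", y_label: str = "y"
) -> List[Dict[str, Any]]:
    """Flatten a column-oriented matrix into one record per cell (hypothetical)."""
    records = []
    for column_name, values in columns.items():
        for row_index, value in enumerate(values):
            records.append({x_label: column_name, y_label: row_index, "value": value})
    return records


records = matrix_to_records(
    {"A": [1, 2, 3], "B": [4, 5, 6], "C": [7, 8, 9]},
    x_label="Samples",
    y_label="Features",
)
```

Each record can then be mapped to a rectangle whose position comes from the two label fields and whose color comes from `value`.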

clustermap(data, x_label='x', y_label='y', method='ward', metric='euclidean', z_score=None, standard_scale=None, row_cluster=True, col_cluster=True, vmax=None, vmin=None, center=None, cmap='viridis')[source]

Create a clustermap from a pandas DataFrame.

A clustermap combines a heatmap with hierarchical clustering dendrograms on both axes. The clustering helps reveal patterns and relationships in the data by grouping similar rows and columns together.

Parameters:

data (pd.DataFrame) – Input data matrix to be clustered and visualized
x_label (str, optional) – Label for x-axis, by default “x”
y_label (str, optional) – Label for y-axis, by default “y”
method (str, optional) – Linkage method for hierarchical clustering, by default “ward”
metric (str, optional) – Distance metric for clustering, by default “euclidean”
z_score (int, optional) – Standardize the data along rows (0) or columns (1), by default None
standard_scale (int, optional) – Scale data along rows (0) or columns (1), by default None
row_cluster (bool, optional) – Whether to cluster rows, by default True
col_cluster (bool, optional) – Whether to cluster columns, by default True
vmax (float, optional) – Maximum value for color scaling, by default None
vmin (float, optional) – Minimum value for color scaling, by default None
center (float, optional) – Center value for diverging colormaps, by default None
cmap (str, optional) – Colormap name, either “viridis” or “blues”, by default “viridis”

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Examples

>>> import pandas as pd
>>> from genomespy import GenomeSpy
>>>
>>> # Create sample data
>>> data = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [2, 4, 6],
...     'C': [3, 6, 9]
... })
>>>
>>> # Create and display clustermap
>>> plot = GenomeSpy()
>>> plot.clustermap(
...     data,
...     x_label="Samples",
...     y_label="Features",
...     z_score=1,
...     method="ward"
... )
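
What `z_score=1` (standardizing along columns) does to the sample data can be sketched with the standard library. This is a minimal illustration, not the wrapper's code. It assumes the sample standard deviation is used, and `z_score_columns` is a hypothetical helper.

```python
from statistics import mean, stdev
from typing import Dict, List


def z_score_columns(columns: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Standardize each column to zero mean and unit variance (hypothetical)."""
    scaled = {}
    for name, values in columns.items():
        mu = mean(values)
        sigma = stdev(values)  # sample standard deviation (assumption)
        # A constant column has sigma == 0 and would raise ZeroDivisionError.
        scaled[name] = [(value - mu) / sigma for value in values]
    return scaled


scaled = z_score_columns({"A": [1, 2, 3], "B": [2, 4, 6], "C": [3, 6, 9]})
```

Since the three columns are multiples of one another, they become identical after standardization, so clustering on the scaled values treats them as equally similar.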

dendrogram(data, method='ward', metric='euclidean')[source]

Create a dendrogram using GenomeSpy.

Dendrograms are tree-like diagrams used to visualize the arrangement of clusters produced by hierarchical clustering.

Parameters:

data (pd.DataFrame) – Input data matrix for clustering
method (str, optional) – Linkage method for clustering, by default “ward”
metric (str, optional) – Distance metric for clustering, by default “euclidean”

Returns:

The current instance for method chaining

Return type:

GenomeSpy

Examples

>>> import pandas as pd
>>> plot = GenomeSpy()
>>> data = pd.DataFrame({
...     'A': [1, 2, 3],
...     'B': [4, 5, 6]
... })
>>> plot.dendrogram(data, method="ward", metric="euclidean")
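
Hierarchical clustering starts from pairwise distances between observations. The sketch below computes the Euclidean distances for the rows of the example data in plain Python. Whether the wrapper clusters rows or columns is not stated here, so treating rows as observations is an assumption, and `pairwise_euclidean` is a hypothetical helper.

```python
import math
from itertools import combinations
from typing import Dict, List, Tuple


def pairwise_euclidean(rows: List[List[float]]) -> Dict[Tuple[int, int], float]:
    """Euclidean distance for every unordered pair of rows (hypothetical)."""
    return {
        (i, j): math.dist(rows[i], rows[j])
        for i, j in combinations(range(len(rows)), 2)
    }


# Rows of the DataFrame above: columns A and B side by side.
rows = [[1, 4], [2, 5], [3, 6]]
distances = pairwise_euclidean(rows)
```

A linkage method such as "ward" then merges the closest groups step by step, and the merge order defines the tree that the dendrogram draws.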

show_gradio(filename=None)[source]

Return the HTML content for Gradio integration.

Returns:: The HTML representation of the visualization.
Return type:: str

create_track_spec(track_name, track_config, region)[source]

Create a track specification for GenomeSpy.

Parameters:

track_name (str) – The name of the track
track_config (Dict[str, Any]) – The configuration for the track
region (Dict[str, Any]) – The genomic region for the track

Returns:

The complete track specification

Return type:

Dict[str, Any]

Examples

>>> region = {"chrom": "chr1", "start": 1000, "end": 2000}
>>> config = {
...     "type": "bigwig",
...     "url": "data.bw",
...     "height": 100
... }
>>> spec = create_track_spec("Coverage", config, region)

create_base_spec(region)[source]

Create the base specification for GenomeSpy visualization.

Parameters:: region (Dict[str, Any]) – The genomic region for the visualization
Returns:: The base specification including schema and default tracks
Return type:: Dict[str, Any]

Examples

>>> region = {"chrom": "chr1", "start": 1000, "end": 2000}
>>> base_spec = create_base_spec(region)

igv(file_dict, region=None, height=600, server_port=18089, gs=None)[source]

Create a GenomeSpy visualization with custom tracks in IGV style.

This function creates a genome browser visualization similar to IGV (Integrative Genomics Viewer), with support for various genomic data formats and customizable tracks.

Parameters:

file_dict (Dict[str, Dict[str, Any]]) – A dictionary mapping track names to their configurations. Each track configuration should specify:
- url or path : Path to the data file
- type : Data format (e.g., “bigwig”, “bigbed”)
- height : Track height in pixels
region (Optional[Dict[str, Any]], optional) – The genomic region to display, by default None. Should contain:
- chrom : Chromosome name
- start : Start position
- end : End position
height (int, optional) – The height of the visualization in pixels, by default 600
server_port (int, optional) – The port number for the GenomeSpy server, by default 18089
gs (GenomeSpy, optional) – An existing GenomeSpy instance to reuse, by default None

Returns:

The configured GenomeSpy instance ready for display

Return type:

GenomeSpy

Examples

>>> from genomespy import igv
>>> # Configure tracks
>>> tracks = {
...     "ZBTB7A": {
...         "url": "https://chip-atlas.dbcls.jp/data/hg38/eachData/bw/SRX3161009.bw",
...         "height": 40,
...         "type": "bigwig"
...     }
... }
>>> # Create visualization
>>> plot = igv(
...     tracks,
...     region={"chrom": "chr7", "start": 66600000, "end": 66800000}
... )
>>> plot.show()
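
Before calling `igv`, it can help to check that `file_dict` and `region` have the shape described in the parameter list. The sketch below is a hypothetical pre-flight check, not part of the library. It treats the listed keys as required, which the documentation only recommends.

```python
from typing import Any, Dict, List, Optional


def check_tracks(
    file_dict: Dict[str, Dict[str, Any]],
    region: Optional[Dict[str, Any]] = None,
) -> List[str]:
    """Return a list of human-readable problems (hypothetical helper)."""
    problems = []
    for name, config in file_dict.items():
        if "url" not in config and "path" not in config:
            problems.append(f"{name}: missing 'url' or 'path'")
        for key in ("type", "height"):
            if key not in config:
                problems.append(f"{name}: missing '{key}'")
    if region is not None:
        for key in ("chrom", "start", "end"):
            if key not in region:
                problems.append(f"region: missing '{key}'")
        if "start" in region and "end" in region and region["start"] >= region["end"]:
            problems.append("region: 'start' must be smaller than 'end'")
    return problems


tracks = {
    "ZBTB7A": {
        "url": "https://chip-atlas.dbcls.jp/data/hg38/eachData/bw/SRX3161009.bw",
        "height": 40,
        "type": "bigwig",
    }
}
region = {"chrom": "chr7", "start": 66600000, "end": 66800000}
problems = check_tracks(tracks, region)
```

An empty list means the inputs match the documented shape. It says nothing about whether the remote file exists or is readable.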

create_ccre_track(region)[source]

Create the cCRE track specification.

Parameters:

region (Dict[str, Any]) – The genomic region for the track.

Returns:

The cCRE track specification.

Return type:

Dict[str, Any]

create_gencode_track(region)[source]

Create the Gencode track specification.

Parameters:

region (Dict[str, Any]) – The genomic region for the track.

Returns:

The Gencode track specification.

Return type:

Dict[str, Any]

create_gencode_encoding(region)[source]

Create the encoding specification for the Gencode track.

Parameters:: region (Dict[str, Any]) – The genomic region for the track.
Returns:: The encoding specification.
Return type:: Dict[str, Any]

create_gencode_layers()[source]

Create the layer specifications for the Gencode track.

Returns:: The list of layer specifications.
Return type:: list

create_gencode_exons_layer()[source]

Create the exons layer specification for the Gencode track.

Returns:: The exons layer specification.
Return type:: Dict[str, Any]

create_exon_layer()[source]

Create the exon sublayer specification.

Returns:: The exon sublayer specification.
Return type:: Dict[str, Any]

create_feature_layer()[source]

Create the feature sublayer specification.

Returns:: The feature sublayer specification.
Return type:: Dict[str, Any]

create_utr_label_layer()[source]

Create the UTR label sublayer specification.

Returns:: The UTR label sublayer specification.
Return type:: Dict[str, Any]

create_gencode_labels_layer()[source]

Create the labels layer specification for the Gencode track.

Returns:: The labels layer specification.
Return type:: Dict[str, Any]