Handling of Network File Formats

For All Network Types

  • sourceFormat
    • Each network has a "sourceFormat" attribute that records the format in which it was imported or otherwise created.
    • It is maintained by the NDEx Server and currently cannot be changed by the user.

SIF and Extended Binary SIF Networks

The simple interaction format is convenient for building a graph from a list of interactions. It also makes it easy to combine different interaction sets into a larger network, or add new interactions to an existing data set.

A . If a tab ('\t') character is found in the first line of the file, the SIF is treated as tab-delimited; otherwise it is parsed as a whitespace-delimited file.

B . In NDEx, each line in a SIF network file is mapped to an NDEx edge object. The "relationship type" field in that line maps to the predicate of that edge. Each edge has one source node and one or more target nodes, depending on the number of target nodes listed in that line.

C . Each node field in the SIF file is mapped to an NDEx node object. If the value of the node field is a URI- or CURIE-formatted string, the NDEx server creates a BaseTerm object from the string and then creates a Node to represent that base term. If the value is simple literal text, no BaseTerm is created; only a Node is created, and its "name" attribute is set to the value of the node field.

D . If the SIF file is an Extended Binary SIF file, a header line will define columns that are treated in the following manner:

1 . The "INTERACTION_PUBMED_ID" field is used to create linked Citation objects.

2 . The "PARTICIPANT_NAME" field is used to populate the "name" attribute of the node.

3 . The "UNIFICATION_XREF" field is used to create an alias of a node.

4 . The "RELATIONSHIP_XREF" field is used to create related terms of a node.

5 . The "NAME" field in the Extended Binary SIF Property header is used to set the name of the network. "ORGANISM" and "URI DATASOURCE" are treated as properties of the network.
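Rules A through C can be sketched in a few lines of Python. This is only an illustration, not the NDEx importer: the function names, the regular expressions used to recognize a URI or CURIE, and the choice to skip lines with fewer than three fields are all assumptions made for the example.

```python
import re

# Hypothetical URI/CURIE tests; the exact check NDEx applies is not given here.
_URI_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.-]*://\S+$")
_CURIE_RE = re.compile(r"^[A-Za-z][\w.-]*:\S+$")


def looks_like_term(value: str) -> bool:
    """Return True if the value resembles a URI or a prefix:local CURIE."""
    return bool(_URI_RE.match(value) or _CURIE_RE.match(value))


def parse_sif(text: str):
    """Parse SIF text into (source, relationship, [targets]) tuples."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    if not lines:
        return []
    # Rule A: a tab in the first line selects tab-delimited parsing.
    tab_delimited = "\t" in lines[0]
    edges = []
    for ln in lines:
        fields = ln.split("\t") if tab_delimited else ln.split()
        fields = [f.strip() for f in fields if f.strip()]
        if len(fields) < 3:
            continue  # assumption: lines without a target yield no edge here
        # Rule B: one edge per line, one source, one or more targets.
        source, relationship, *targets = fields
        edges.append((source, relationship, targets))
    return edges


for source, relationship, targets in parse_sif("uniprot:P31749\tbinds\tB\tC\n"):
    # Rule C: a URI or CURIE would get a BaseTerm, plain text only a name.
    kind = "BaseTerm-backed node" if looks_like_term(source) else "named node"
    print(source, relationship, targets, kind)
```

Because the delimiter is chosen from the first line only, a file whose first line uses spaces is split on any whitespace throughout, so node names containing spaces are only safe in tab-delimited files.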

OpenBEL Networks

The OpenBEL Language

OpenBEL (www.openbel.org) is the public standard for the BEL language. It is designed to represent scientific findings by capturing causal and correlative relationships in context, where context can include information about the biological and experimental system in which the relationships were observed, the supporting publications cited and the curation process used.

A BEL document is a set of statements that represent specific assertions from cited information resources. Statements are, in most cases, triples with context annotations. The most common context annotations are specialized structures that cite specific supporting evidence from knowledge sources, but a more general mechanism allows the annotation of biological contexts such as species, cell type, or cell line. When encoded as a network, a BEL document may have multiple edges of the same type between two nodes, each edge representing a different assertion from a different citation.

BEL documents are not primarily intended as a format for biological inference, but rather as a means to store reusable facts in a form that is well suited to the assembly of purpose-built biological models. Assembly can be automated or may be the result of manual selection and incorporation of findings to produce a specialized model. The choice of assembly algorithm and parameters will lead to different output models for the same input BEL documents.

A particular form of assembled biological model suitable for some types of qualitative causal reasoning and for visualization is the "Knowledge Assembly Model" (KAM). NDEx networks are in principle capable of expressing KAM structures, but as of NDEx v1.2, there are no examples of KAMs in NDEx.

BEL is distinct from many other biological representation schemes in that all concepts referenced in statements, such as protein abundances, complexes, modified proteins, or reactions, are represented by functional composition of terms. This system is supported directly in NDEx networks using FunctionTerm network elements.
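As a rough illustration of functional composition, the sketch below builds a BEL-style complex term out of nested terms. The two dataclasses are hypothetical stand-ins that borrow the BaseTerm and FunctionTerm names; they are not the NDEx object model, and the field names are assumptions.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass(frozen=True)
class BaseTerm:
    prefix: str  # namespace prefix, e.g. "HGNC"
    name: str    # identifier within that namespace

    def __str__(self) -> str:
        return f"{self.prefix}:{self.name}"


@dataclass(frozen=True)
class FunctionTerm:
    function: str  # BEL function name, e.g. "p" or "complex"
    args: Tuple[Union["BaseTerm", "FunctionTerm"], ...]

    def __str__(self) -> str:
        return f"{self.function}({','.join(str(a) for a in self.args)})"


# Protein abundances are function terms over base terms; a complex is a
# function term whose arguments are themselves function terms.
akt1 = FunctionTerm("p", (BaseTerm("HGNC", "AKT1"),))
pdpk1 = FunctionTerm("p", (BaseTerm("HGNC", "PDPK1"),))
cplx = FunctionTerm("complex", (akt1, pdpk1))
print(cplx)  # complex(p(HGNC:AKT1),p(HGNC:PDPK1))
```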

BEL documents are expressed in:

– XBEL, an XML format.

– BELScript, a line-oriented text format designed for human readability and composition.

– BEL RDF

NDEx currently supports import and export utilities for XBEL. The following section describes the rules used to transform BEL documents between XBEL and NDEx Networks.

NDEx Import rules

XBEL is an XML format in which XML nodes representing BEL statements are grouped by nested nodes that set the biological and citation context annotations for each statement that they contain. The context annotations from outer contexts apply to the statements of inner contexts unless specifically contradicted by annotations in inner contexts.
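The inheritance of context annotations can be pictured as a merge from the outermost context inward. The following sketch assumes each nesting level has already been reduced to a plain dictionary of annotation name to value; the function name and the sample annotation values are illustrative only.

```python
def effective_annotations(context_stack):
    """Merge annotation dicts from outermost to innermost; inner values win."""
    merged = {}
    for annotations in context_stack:
        merged.update(annotations)
    return merged


outer = {"Species": "9606", "CellLine": "HeLa"}
inner = {"CellLine": "MCF-7"}  # contradicts the outer CellLine annotation
print(effective_annotations([outer, inner]))
# {'Species': '9606', 'CellLine': 'MCF-7'}
```

A statement in the inner context inherits Species from the outer context but uses its own CellLine value.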

The following rules are applied based on the type of XML node processed:

  • Header

name, description and version are mapped to Network.name, Network.description and Network.version respectively.

"copyright", "contactInfo" and "Disclaimer" are stored as network properties.

The author list in AuthorGroup is flattened, and each author name is stored as an individual property in the network. LicenseGroup is stored in a similar way.

NamespaceGroup

Elements are stored as Namespace objects in the NDEx network.

annotationDefinitionGroup

internalAnnotationDefinition

the "id" attribute is mapped to a Namespace object.

"description" and "usage" are stored as properties in the Namespace object.

"listAnnotation" elements are flattened and stored as properties in the Namespace object.

annotationDefinitionGroup

Each element is stored as a Namespace object in the Network.

statementGroup

If a "name" or "comment" element exists in the statement group:

if a citation exists in the annotationGroup at the same level, "name" and "comment" are treated as properties of the citation.

if a support exists in the annotationGroup at the same level, "name" and "comment" are treated as properties of the support.

otherwise, "comment" is stored as a property of each statement in the current statementGroup and is also passed on to the next level of statementGroup. "name" is ignored in this case.

annotationGroup

evidence is mapped to a Support object in the Network.

citation is mapped to a Citation object in the Network.

annotations are stored as NDExPropertyValuePair objects on each edge (or node if the statement is mapped to an orphan node).

statement

Case 1: statement has subject, object, and predicate

statement maps to NDEx Edge element

Case 2: statement does NOT have object

statement maps to NDEx Node element

node may be an "orphan" with no edges, or possibly other edges will reference the node.

Case 3: statement object is a statement expression, S2

statement object is encoded by a node that represents a ReifiedEdgeTerm

The ReifiedEdgeTerm references an edge that is created based on S2

A comment attribute of a statement is stored as a property of the Network element that it is mapped to, i.e. either an edge or node.
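The three statement cases can be sketched as a small recursive function. The Statement class, the dictionary used as a network, and the "reified_edge(n)" label are hypothetical and stand in for the real NDEx Edge, Node, and ReifiedEdgeTerm objects. The sketch also assumes that a nested statement S2 always has an object of its own.

```python
from dataclasses import dataclass
from typing import Optional, Union


@dataclass
class Statement:
    subject: str
    predicate: Optional[str] = None
    obj: Union[str, "Statement", None] = None


def map_statement(stmt: Statement, network: dict) -> tuple:
    """Add stmt to the network; return ("edge", index) or ("node", label)."""
    if stmt.obj is None:
        # Case 2: no object, so the statement becomes a (possibly orphan) node.
        network["nodes"].add(stmt.subject)
        return ("node", stmt.subject)
    if isinstance(stmt.obj, Statement):
        # Case 3: the object is a statement S2. Create the edge for S2 first,
        # then use a node that stands for a reified-edge term pointing at it.
        _, inner_edge = map_statement(stmt.obj, network)
        target = f"reified_edge({inner_edge})"
    else:
        # Case 1: subject, predicate, and a plain term as object.
        target = stmt.obj
    network["nodes"].update({stmt.subject, target})
    network["edges"].append((stmt.subject, stmt.predicate, target))
    return ("edge", len(network["edges"]) - 1)


net = {"nodes": set(), "edges": []}
s2 = Statement("p(HGNC:B)", "decreases", "p(HGNC:C)")
map_statement(Statement("p(HGNC:A)", "increases", s2), net)
for edge in net["edges"]:
    print(edge)
```

The edge for S2 is created before the outer edge, so the outer edge can refer to it by index.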

XGMML Networks

The XGMML standard is defined by the Cytoscape application. The versions of XGMML exported by different versions of Cytoscape are annotated with version strings. The current version of Cytoscape produces an XML document in which the <graph> element has the property cy:documentVersion="3.0".

Handling of XGMML Network Properties

Properties of a network in XGMML are stored in several places within the document. Some of these properties are shared by all XGMML files.

The main <graph> element has the following properties in XGMML 3.0:

  • id=<id of the graph at the time it was exported from Cytoscape>
  • label=<label on the graph at the time it was exported from Cytoscape>
  • directed="1"
    • whether the edges should be treated as directional
  • cy:documentVersion="3.0"
    • XGMML version

The <graph> element also has these constant properties, common to all XGMML 3.0 files. Note that the URI for the XGMML namespace does not respond as of the NDEx v1.2 release.

  • xmlns:dc="http://purl.org/dc/elements/1.1/"
    • Dublin Core namespace
  • xmlns:xlink="http://www.w3.org/1999/xlink"
  • xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    • RDF namespace root
  • xmlns:cy="http://www.cytoscape.org"
    • Cytoscape namespace root
  • xmlns="http://www.cs.rpi.edu/XGMML"
    • XGMML namespace

Within the <graph> element is an attribute, <att name="networkMetadata">, which contains RDF that expresses properties of the network using standard ontologies, especially Dublin Core. Properties typically include:

  • <dc:type>
    • A type descriptor of the network – intended for semantic categories, such as "Protein-Protein Interaction".
  • <dc:description>
    • NOT SUPPORTED in Cytoscape, always outputs "N/A"
  • <dc:identifier>
    • Some standard identifier for the network, defaults to "N/A"
  • <dc:date>
    • Creation date of network
  • <dc:title>
    • Title of network.
    • Often identical to label property of <graph>, but not clear that this is always true
  • <dc:source>
  • <dc:format>Cytoscape-XGMML</dc:format>
    • All XGMML networks have this value for their dc:format attribute
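As an illustration of where these values live, the sketch below reads the label and the Dublin Core entries from a small hand-written XGMML fragment using Python's standard xml.etree.ElementTree module. The sample document, including its rdf:RDF and rdf:Description nesting, is an assumption about the layout rather than a file exported from Cytoscape, and read_network_metadata is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

XGMML_NS = "http://www.cs.rpi.edu/XGMML"
DC_NS = "http://purl.org/dc/elements/1.1/"

# Hand-written sample; real exports contain many more elements.
SAMPLE = """<graph id="42" label="demo" directed="1" cy:documentVersion="3.0"
       xmlns:dc="http://purl.org/dc/elements/1.1/"
       xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns:cy="http://www.cytoscape.org"
       xmlns="http://www.cs.rpi.edu/XGMML">
  <att name="networkMetadata">
    <rdf:RDF>
      <rdf:Description rdf:about="http://www.cytoscape.org/">
        <dc:type>Protein-Protein Interaction</dc:type>
        <dc:title>demo</dc:title>
        <dc:format>Cytoscape-XGMML</dc:format>
      </rdf:Description>
    </rdf:RDF>
  </att>
</graph>
"""


def read_network_metadata(xgmml_text):
    """Return (graph label, {"dc:<name>": value}) for an XGMML document."""
    root = ET.fromstring(xgmml_text)
    metadata = {}
    # Only direct <att> children of <graph> are considered.
    for att in root.findall(f"{{{XGMML_NS}}}att"):
        if att.get("name") != "networkMetadata":
            continue
        for elem in att.iter():
            if elem.tag.startswith(f"{{{DC_NS}}}"):
                local = elem.tag.split("}", 1)[1]
                metadata[f"dc:{local}"] = (elem.text or "").strip()
    return root.get("label"), metadata


label, metadata = read_network_metadata(SAMPLE)
print(label, metadata)
```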

An XGMML network may have additional attribute <att> elements within the <graph> element.

In XGMML 1.1, a number of graphics properties of the entire network are expressed as attributes, such as

<att type="real" name="GRAPH_VIEW_ZOOM" value="0.41322728443244305"/>

In XGMML 3.0, a separate <graphics> element within the <graph> element separates the graphic <att> elements of the network from other attributes.

Treatment of XGMML Properties

XGMML->NDEx

  • All Graphics attributes are ignored as of NDEx v1.2
    • Note that if a particular XGMML network has general properties whose names suggest graphics attributes, they are handled like any other property. For example, a node might have a general property 'color = blue' set by the user; this is not encoded as a graphics attribute, as it would be if it had been set in Cytoscape using the graphic attribute facilities.
  • RDF properties in the XGMML networkMetadata attribute are stored as NDEx Network properties
    • (Note that the stored properties reference the Dublin Core namespace (dc) when used)
  • All other XGMML graph attributes are stored as NDEx Network properties

NDEx->XGMML

  • No presentation properties are output to XGMML.
  • All properties recognized as networkMetadata are expressed in the RDF section
  • All other NDEx Network properties are expressed as attributes of the <graph> section

Special XGMML Properties Mapped to Attributes in NDEx Network Objects

  • name
    • XGMML->NDEx
      • if dc:title exists in networkMetadata <att> then dc:title -> Network.name
      • else if name attribute exists in networkMetadata <att> then name -> Network.name
      • else if label property exists in <graph> node then label -> Network.name
      • else: filename -> Network.name
    • NDEx->XGMML
      • Network.name -> dc:title
  • description
    • dc:description <-> Network.description
  • version
    • dc:version <-> Network.version
  • UUID
    • XGMML->NDEx
      • if NDEX:UUID exists as an <att>: NDEx will ignore this attribute and assign a new UUID
    • NDEx->XGMML
      • Network.UUID -> <graph><att>NDEX:UUID
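The fallback order for the network name on import can be written as a short function. This is a sketch only: the function and parameter names are hypothetical, and treating an empty string the same as a missing value is an assumption.

```python
def resolve_network_name(dc_title=None, name_att=None, graph_label=None, filename=""):
    """Pick Network.name using the XGMML->NDEx precedence described above."""
    for candidate in (dc_title, name_att, graph_label):
        if candidate:  # assumption: empty strings count as missing
            return candidate
    return filename


print(resolve_network_name(dc_title="My PPI", graph_label="net1", filename="net1.xgmml"))
print(resolve_network_name(graph_label="net1", filename="net1.xgmml"))
print(resolve_network_name(filename="net1.xgmml"))
```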

BioPAX Networks

NDEx uses the BioPAX Paxtools library to parse each imported BioPAX network into an org.biopax.paxtools.model.Model object, and then transforms the Paxtools Model object into an NDEx network.

Translation Rules:

  • The xmlBase attribute of the Paxtools model is stored as an "xmlBase" property in the Network.
  • Each BioPAXElement object is mapped to an NDEx node.
    • BioPAX type is stored in the "ndex:bioPAXType" property of the NDEx Node.
    • For each property on that BioPAXElement:
      • If the value of the property is a BioPAXElement object, create an Edge.
        • The subject Node of the Edge is based on this BioPAXElement node
        • The predicate of the Edge is a BaseTerm derived from the name of the property.
        • The object Node of the Edge is based on the value of the property
      • If the value of the property is a literal value, create an NDExPropertyValuePair object and add it to the properties of the Node.
  • For each Xref element, in addition to creating a Node, NDEx adds additional objects to ensure that the citations and controlled vocabulary references for the Network will be handled consistently with other Networks.
    • Each PublicationXref will result in a corresponding Citation object linked to the annotated Node.
    • Each UnificationXref will add a value to the aliases attribute for the annotated Node linking to a corresponding BaseTerm.
    • Each RelationshipXref will add a value to the relatedTerms attribute for the annotated Node, linking to a corresponding BaseTerm.