Semantic spectrum

From Wikipedia, the free encyclopedia
The '''[[Semantics|semantic]] [[spectrum]]''', sometimes referred to as the '''ontology spectrum''', the '''smart data continuum''', or '''semantic precision''', is a series of increasingly precise or rather [[semantics|semantically]] expressive definitions for [[data element]]s in [[knowledge representation]]s, especially for machine use.


At the low end of the spectrum is a simple binding of a single word or phrase and its definition. At the high end is a full [[ontology (computer science)|ontology]] that specifies relationships between data elements using precise [[URI]]s for relationships and properties.


With increased [[Specificity (tests)|specificity]] comes increased precision and the ability to use tools to automatically [[System integration|integrate]] systems, but also increased cost to build and maintain a [[metadata registry]].


Some steps in the semantic spectrum include the following:
# <u>[[Glossary]]</u>: A simple list of terms and their definitions. A glossary focuses on creating a complete list of the terminology of domain-specific terms and [[Acronym|acronyms]]. It is useful for creating clear and unambiguous definitions for terms, and because it can be created with simple word processing tools, few technical tools are necessary.
# [[controlled vocabulary|<u>Controlled vocabulary</u>]]: A simple list of terms, definitions and naming conventions. A controlled vocabulary frequently has some type of oversight process associated with adding or removing data element definitions to ensure consistency. Terms are often defined in relationship to each other.
# [[data dictionary|<u>Data dictionary</u>]]: Terms, definitions, naming conventions and one or more representations of the data elements in a computer system. Data dictionaries often define data types, validation checks such as enumerated values and the formal definitions of each of the enumerated values.
# [[data model|<u>Data model</u>]]: Terms, definitions, naming conventions, and one or more representations of the data elements, as well as the beginning of a specification of the relationships between data elements, including abstractions and containers.
# [[Taxonomy (general)|<u>Taxonomy</u>]]: A complete data model in an inheritance hierarchy where all data elements inherit their behaviors from a single "super data element". The difference between a data model and a formal taxonomy is the arrangement of data elements into a formal tree structure where each element in the tree is a formally defined concept with associated properties.
# [[ontology (computer science)|<u>Ontology</u>]]: A complete, machine-readable specification of a conceptualization using [[URI]]s (and later [[Internationalized Resource Identifier|IRI]]s) for all data elements, properties and relationship types. The [[W3C]] standard language for representing ontologies is the [[Web Ontology Language]] (OWL). Ontologies frequently contain formal business rules formed in discrete logic statements that relate data elements to one another.
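As a rough illustration of the steps above (a sketch, not drawn from any particular standard; the element name, values and URIs below are hypothetical), the same data element can be described at several points on the spectrum:

```python
# Hypothetical illustration: one data element ("country_code") described at
# increasingly precise points on the semantic spectrum.

# Glossary: a term bound to a prose definition.
glossary = {
    "country_code": "A short identifier for a country.",
}

# Data dictionary: the same term with a data type and a validation
# check (an enumerated value list, each value with a formal definition).
data_dictionary = {
    "country_code": {
        "definition": "A short identifier for a country.",
        "type": "string",
        "enumeration": {"US": "United States", "FR": "France"},
    },
}

# Toward an ontology: elements, properties and relationships identified
# by URIs, so machines can resolve what each element means.
triples = [
    ("http://example.org/id/US",
     "http://example.org/prop/isCodeFor",
     "http://example.org/id/UnitedStates"),
]

def validate(element, value):
    """Check a value against the data dictionary's enumeration."""
    return value in data_dictionary[element]["enumeration"]

print(validate("country_code", "US"))   # True
print(validate("country_code", "XX"))   # False
```

The point of the sketch is only that each step adds machine-checkable structure: the glossary supports nothing beyond lookup, the data dictionary supports validation, and the URI triples support automated integration.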


==Typical questions for determining semantic precision==
The following is a list of questions that may arise in determining semantic precision.


;Correctness: How can correct syntax and semantics be enforced? Are tools (such as [[XML Schema (W3C)|XML Schema]]) readily available to validate syntax of data exchanges?
;Adequacy/Expressiveness/Scope: Does the system represent everything that is of practical use for the purpose? Is an emphasis being placed on data that is externalized (exposed or transferred between systems)?
;Efficiency: How efficiently can the representation be searched/queried and possibly [[automatic reasoning|reasoned]] on?
;Complexity: How steep is the [[learning curve]] for defining new concepts, querying for them or constraining them? Are there appropriate tools for simplifying typical workflows? (See also: [[ontology editor]])
;Translatability: Can the representation easily be transformed (e.g. by [[Vocabulary-based transformation]]) into an equivalent representation so that [[semantic equivalence]] is ensured?
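For the correctness question, a minimal sketch of syntax checking with the Python standard library (this checks well-formedness only; validating against an actual XML Schema requires a dedicated validator, and the sample document is invented for illustration):

```python
import xml.etree.ElementTree as ET

def is_well_formed(document):
    """Return True if the XML document parses (syntactic correctness only).

    Note: this verifies well-formedness, not conformance to an XML Schema;
    full schema validation needs a separate tool.
    """
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False

print(is_well_formed("<exchange><code>US</code></exchange>"))  # True
print(is_well_formed("<exchange><code>US</exchange>"))         # False: mismatched tag
```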


===Determining location on the semantic spectrum===


Many organizations today are building a [[metadata registry]] to store their data definitions and to perform [[metadata publishing]]. The question of where they are on the semantic spectrum frequently arises. To determine where your systems are, some of the following questions are frequently useful.


# Is there a centralized glossary of terms for the subject matter?
# Does the glossary of terms include precise definitions for each term?
# Is there a central repository to store data elements that includes data type information?
# Is there an approval process associated with the creation and changes to data elements?
# Are coded data elements fully enumerated? Does each enumeration have a full definition?
# Is there a process in place to remove duplicate or redundant data elements from the metadata registry?
# Are one or more classification schemes used to classify data elements?
# Are document exchanges and [[web services]] created using the data elements?
# Can the central metadata registry be used as part of a [[Model-driven architecture]]?
# Are there staff members trained to extract data elements that can be reused in metadata structures?
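The checklist above can be sketched as an informal self-assessment; the questions are paraphrased from the list, and the score thresholds below are an assumption for illustration, not an established scale:

```python
# Informal sketch: counting "yes" answers to the checklist above to get a
# rough sense of where a metadata registry sits on the semantic spectrum.
CHECKLIST = [
    "Centralized glossary of terms?",
    "Precise definitions for each term?",
    "Central repository with data type information?",
    "Approval process for data element changes?",
    "Coded data elements fully enumerated and defined?",
    "Process to remove duplicate data elements?",
    "Classification schemes in use?",
    "Document exchanges and web services built from the data elements?",
    "Registry usable in a model-driven architecture?",
    "Staff trained to reuse metadata structures?",
]

def assess(answers):
    """Map ten yes/no answers to an informal position on the spectrum.

    The thresholds are illustrative only.
    """
    score = sum(answers)
    if score <= 3:
        return "glossary / controlled vocabulary"
    if score <= 7:
        return "data dictionary / data model"
    return "taxonomy / ontology"

print(assess([True] * 9 + [False]))  # taxonomy / ontology
```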


==Strategic nature of semantics==


Today, much of the World Wide Web is stored as [[Hypertext Markup Language]]. Search engines are severely hampered by their inability to understand the meaning of published web pages. These limitations have led to the advent of the [[Semantic web]] movement.<ref>{{Cite web |last1=Berners-Lee |first1=Tim |last2=Hendler |first2=James |last3=Lassila |first3=Ora |date=2001-05-01 |title=The Semantic Web |url=https://www.scientificamerican.com/article/the-semantic-web/ |access-date=2024-05-05 |website=Scientific American |language=en}}</ref>


In the past, many organizations that created custom database applications used isolated teams of developers that did not formally publish their data definitions. These teams frequently used internal data definitions that were incompatible with other computer systems. This made [[Enterprise Application Integration]] and [[Data warehousing]] extremely difficult and costly. Many organizations today require that teams consult a centralized data registry before new applications are created.


The job title of an individual who is responsible for coordinating an organization's data is [[data architect]].


== History ==
The first reference to this term was at the 1999 [http://www.aaai.org/ AAAI] [https://web.archive.org/web/20070118085450/http://www.cs.vassar.edu/faculty/welty/presentations/aaai-99/ Ontologies Panel]. The panel was organized by Chris Welty, who at the prodding of Fritz Lehmann and in collaboration with the panelists (Fritz, [[Mike Uschold]], [[Mike Gruninger]], and [[Deborah McGuinness]]) came up with a "spectrum" of kinds of information systems that were, at the time, referred to as ontologies. The "ontology spectrum" picture appeared in print in the introduction to
''[http://portal.acm.org/toc.cfm?id=505168&coll=GUIDE&dl=GUIDE&type=proceeding&CFID=10676962&CFTOKEN=42478708 Formal Ontology and Information Systems: Proceedings of the 2001 Conference].'' The ontology spectrum was also featured in a talk at the Semantics for the Web meeting in 2000 at Dagstuhl by Deborah McGuinness. McGuinness produced a [http://www.ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm paper] describing the points on that spectrum that appeared in the book that emerged (much later) from that workshop called [https://web.archive.org/web/20070327234919/http://mitpress.mit.edu/catalog/item/default.asp?ttype=2&tid=9182 "Spinning the Semantic Web."] Later, Leo Obrst extended the spectrum into two dimensions (which technically is not really a spectrum anymore) and added a lot more detail, which was included in his book, ''The Semantic Web: A Guide to the Future of XML, Web Services, and Knowledge Management.''


The concept of semantic precision in [[business system]]s was popularized by [[Dave McComb]] in his book ''Semantics in Business Systems: The Savvy Manager's Guide'', published in 2003, where he frequently uses the term '''semantic precision'''.
This discussion centered around a 10-level partition that included the following levels (listed in order of increasing semantic precision):
# Simple Catalog of Data Elements
# Glossary of Terms and Definitions
# Thesauri, Narrow Terms, Relationships
# Informal "[[Is-a]]" relationships
# Formal "Is-a" relationships
# Formal [[Instantiation principle|instances]]
# [[Frame (data structure)|Frames]] (properties)
# [[value restriction|Value Restrictions]]
# Disjointness, Inverse, Part-of
# [[Constraint logic programming|General Logical Constraints]]


Note that the special emphasis formerly placed on adding formal ''is-a'' relationships to the spectrum has since been dropped.


The company [http://cerebra.com Cerebra] has also popularized this concept by characterizing the data formats that exist within an enterprise according to their ability to store semantically precise [[metadata]]. Their list includes:
# [[HTML]]
# [[PDF]]
# Word processing documents
# [[Microsoft Excel]]
# [[Relational database]]s
# [[XML]]
# [[XML Schema (W3C)|XML Schema]]
# [[Taxonomy (general)|Taxonomies]]
# [[Ontologies]]


What these concepts share in common is the ability to store information with increasing precision to facilitate intelligent agents.


== See also ==
* [[Semantics]]
* [[SKOS]]
* [[ontology (computer science)]]
* [[collabulary]]
* [[Web service]]
* [[Classification scheme (information science)]]
* [[Conceptual interoperability]]


== References ==
{{reflist}}
* ''Semantics in Business Systems: The Savvy Manager's Guide'', [[Dave McComb]], 2003
* [https://web.archive.org/web/20090211182933/http://www-ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm Ontologies Come of Age] by [[Deborah L. McGuinness]]
* [https://web.archive.org/web/20090211182933/http://www-ksl.stanford.edu/people/dlm/papers/ontologies-come-of-age-mit-press-(with-citation).htm#_ftnref2 Figure 2 that includes Ontological Spectrum]


{{DEFAULTSORT:Semantic Spectrum}}

Latest revision as of 03:12, 16 June 2024