New Spec
From Auzigog Wiki
Metome Project Notes
~~~~~~~~~~~~~~~~~~~~
The Metome project has six major components:
1. MusicML, a set of XML elements for packaging metadata,
2. Controlled Vocabularies that define the permitted values for certain
MusicML fields,
3. The database that accumulates MusicML-specified metadata,
4. Tools for getting metadata in and out of the database,
5. Discovery tools that associate content with metadata, and
6. Web pages that provide a human interface to the database.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A Short Overview Of MusicML
~~~~~~~~~~~~~~~~~~~~~~~~~~~
MusicML is a set of XML elements for describing metadata. It originally
covered only music; it now covers all media, but we don't have a better name.
~~~~~~~
Unlike many of the rigorous XML standards like RDF, MusicML contains lots of
duplicate information. This information is checked for consistency. The
duplicate information allows systems without complete information to function
better.
~~~~~~~
Metadata is enclosed in a <musicml> element as shown below. The persistent
URL needs to be registered once we're done thrashing on names.
<?xml version="1.0" encoding="ISO-8859-1"?>
<musicml xmlns="http://www.purl.org/net/musiclml/elements/1.1">
metadata goes here
</musicml>
~~~~~~~
One or more "chunks" of metadata go inside the <musicml> element. Metadata
chunks are enclosed by the <metadata> element as follows:
<metadata language="language-code">
chunk information
</metadata>
The language attribute value is the national language used in the chunk.
If unspecified it defaults to en-us.
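As a rough illustration of the default, here is a minimal Python sketch that reads the language of each chunk. The function name chunk_languages and the sample document are made up for this example; the namespace is copied from the wrapper shown above and is not yet registered.

```python
import xml.etree.ElementTree as ET

# Namespace copied from the <musicml> example; it may change once registered.
NS = "http://www.purl.org/net/musiclml/elements/1.1"
DEFAULT_LANGUAGE = "en-us"


def chunk_languages(document):
    """Return the language of each <metadata> chunk, defaulting to en-us."""
    root = ET.fromstring(document)
    return [chunk.get("language", DEFAULT_LANGUAGE)
            for chunk in root.findall(f"{{{NS}}}metadata")]


# Hypothetical document: one chunk declares German, the other says nothing.
SAMPLE = f"""<musicml xmlns="{NS}">
  <metadata language="de">chunk information</metadata>
  <metadata>chunk information</metadata>
</musicml>"""

print(chunk_languages(SAMPLE))
```

The second chunk carries no language attribute, so it falls back to en-us.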
Note that xml:lang attributes are ignored for the most part. If one
exists on an element in the chunk information it means that the element
value was intentionally specified in a particular language and should
not be translated. A good example is the German version of The Beatles'
"I Want To Hold Your Hand"; the song name is "Komm, Gib Mir Deine Hand".
~~~~~~~
The chunk organization allows factual information to be separated from
licensable content, and allows different versions of the same content
available under different licenses to be associated with each other and
with the factual information.
~~~~~~~
A metadata chunk looks like this:
<metadata language="language-code">
<publisher catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean" created="timestamp">
string that can contain name elements
</publisher>
<id catalog="string" identifier="string" additional="string" description="string" type="string"/>
<name catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean" variation="string">
string that can contain name elements
</name>
<creator catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
string that can contain name elements
</creator>
<event catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
string that can contain name elements
</event>
<place catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
string that can contain name elements
</place>
<contributor>
<name catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
string that can contain name elements
</name>
<contribution catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean" style="string">
string that can contain name elements
</contribution>
</contributor>
<gain-adjust> floating-point </gain-adjust>
<play-time> floating-point </play-time>
<date description="string" type="string" show="string"> string </date>
<note start="floating-point" end="floating-point" description="string" type="string"
auto-translated="boolean" auto-generated="boolean" sample-start="floating-point"
sample-end="floating-point" catalog="string" identifier="string" additional="string">
string that can contain XHTML and also the following MusicML elements:
creator, date, event, id, place, name, contributor, contribution, license
</note>
<license catalog="string" identifier="string" additional="string" description="string"
type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
string
</license>
<fingerprint catalog="string" identifier="string" additional="string"> string </fingerprint>
<lineage description="string" medium="string">
string that can contain XHTML and also the following MusicML elements:
creator, date, event, id, place, name, contributor, contribution, license
</lineage>
<child part="string" group="string" sequence="integer" catalog="string" identifier="string" additional="string">
string
</child>
<parent catalog="string" identifier="string" additional="string" description="string" type="string"> string </parent>
</metadata>
~~~~~~~
Each piece of licensable information should be in a separately published
metadata chunk. This allows all of the metadata for a particular set of
ids to be kept together while keeping the license information separate.
~~~~~~~
Most elements have catalog, identifier, and additional attributes. Just about all
information has a unique identifier in one or more catalogs.
Each chunk of metadata must have at least one catalog/identifier specified by an
<id> element.
An example of a catalog and identifier is "Parlophone" "PMC 1202".
MusicML maintains its own set of catalogs and identifiers out of necessity.
While it is not necessary to use the MusicML catalogs, all metadata in our
system must have MusicML catalogs and identifiers. The MusicML catalogs
have rules for identifier construction.
MusicML includes a catalog that contains information on catalogs including
how to map identifiers in one catalog to another, for example between an
identifier in a MusicML catalog and an identifier in the Parlophone catalog.
MusicML catalogs things down to the "atomic" level. For example, there is an
identifier in a catalog for each song on an album. This level of detail does
not exist in any freely available catalogs. The Parlophone example above
references the British monaural release of "Please Please Me". But, there is
no way to describe track 2, "Misery", using the catalog/identifier pair
"Parlophone" "PMC 1202".
That's where the additional attribute comes into play. This track would be
defined as
<id catalog="Parlophone" identifier="PMC 1202" additional="2"/>
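One way to picture the catalog/identifier/additional triple is as a small tuple. The sketch below is only an illustration: the names Identifier, to_identifier and chunk_ids are invented here, and the un-namespaced fragments are a simplification.

```python
import xml.etree.ElementTree as ET
from typing import NamedTuple, Optional


class Identifier(NamedTuple):
    """Hypothetical in-memory form of an <id> element."""
    catalog: str
    identifier: str
    additional: Optional[str] = None


def to_identifier(element):
    """Build an Identifier from an element carrying the three attributes."""
    return Identifier(element.attrib["catalog"],
                      element.attrib["identifier"],
                      element.get("additional"))


def chunk_ids(chunk_xml):
    """Return the ids in a chunk, insisting on at least one."""
    chunk = ET.fromstring(chunk_xml)
    ids = [to_identifier(el) for el in chunk.findall("id")]
    if not ids:
        raise ValueError("a metadata chunk needs at least one <id>")
    return ids


album = Identifier("Parlophone", "PMC 1202")
misery = chunk_ids('<metadata><id catalog="Parlophone" '
                   'identifier="PMC 1202" additional="2"/></metadata>')[0]
```

The album and the track share a catalog and identifier; only the additional value tells them apart.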
~~~~~~~
Most elements have auto-translated, auto-generated, the, sort, description,
and type attributes.
auto-translated  means that the content was automatically translated
                 from another language.
auto-generated   means that the content was automatically generated
                 and probably needs human review.
the              means that a "the" should prefix the value in a
                 sentence.
sort             is a string used to sort the value in lists.
description      says what the value is, like a person, band, city, etc.
                 These values come out of controlled vocabularies.
type             says what type of the description it is, like a birth
                 name, official name, alias, etc. These values come
                 out of controlled vocabularies.
~~~~~~~
A publisher is defined using the <publisher> element. A publisher is an
entity that is providing the metadata in the chunk.
The created attribute is the date and time of publication in UTC. It's in
UTC because MySQL doesn't handle timezone stuff easily.
<publisher created="timestamp" c/i/a a/a/t/s/d/t> xhtml </publisher>
Here's an overly complete example:
<publisher created="2007-02-23 01:20:07" catalog="MusicML/People"
identifier="Jonathan_Eliot_Steinhart" description="person" type="birth name">
<name catalog="MusicML/People" identifier="Jonathan"
description="person" type="first name">Jonathan</name>
<name catalog="MusicML/People" identifier="Eliot"
description="person" type="middle name">Eliot</name>
<name catalog="MusicML/People" identifier="Steinhart"
description="person" type="last name">Steinhart</name>
</publisher>
Note that it would be fine to just have
<publisher created="2007-02-23 01:20:07" catalog="MusicML/People"
identifier="Jonathan_Eliot_Steinhart" description="person" type="birth name">
Jonathan Eliot Steinhart
</publisher>
~~~~~~~
Every chunk of metadata has a publisher element and one or more id elements.
~~~~~~~
A chunk of metadata is published by some entity (the publisher) about some thing
(the id) at an instant in time (created).
There are a bunch of rules about publishers and ids.
o A publisher can publish metadata about one or more ids in a chunk.
o A publisher can update metadata for any ids for which it has published
metadata by having a newer created timestamp. Publishing older data
has no effect.
o When a publisher updates metadata, the effect is as if all prior
metadata for those ids by that publisher is deleted, and then the
new metadata is entered. This means that if a publisher publishes
A, B, and C about some ids, and then updates it only publishing
information about A and C, the information B will vanish.
o When a publisher publishes metadata about more than one id in the
same chunk, these ids are treated as equivalent.
o Publishing metadata that conjoins unrelated ids is considered unsporting.
For example, publishing a chunk of metadata that links ids for two
different songs. Nothing in the metadata format can prevent this, but
publishers who try this may find their submissions rejected, at least
from our system. In order to make this work, all metadata submitted
to our database must include an identifier in a MusicML catalog that
does not map to any other identifier in a MusicML catalog.
These rules are awkward if a publisher wants to publish more than one copy of
the same content under different licenses. The way to handle this is to use
the "additional" attribute on the publisher element. The publisher essentially
defines a namespace using the catalog/identifier, and can then use the additional
value in that namespace to distinguish among published instances.
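The update rules can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: the MetadataStore class, treating an equal timestamp as "not newer", rejecting the whole chunk if any of its ids already has newer data, and comparing timestamps as strings (which works for the fixed "YYYY-MM-DD hh:mm:ss" form used in the examples). It is not how the real database works.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Toy store keyed by (publisher, id); a hypothetical stand-in."""
    records: dict = field(default_factory=dict)

    def publish(self, publisher, ids, created, metadata):
        """Apply one chunk. Returns False and changes nothing if not newer."""
        for item in ids:
            held = self.records.get((publisher, item))
            if held is not None and created <= held[0]:
                return False
        for item in ids:
            # Replace, never merge: anything missing from the update vanishes.
            self.records[(publisher, item)] = (created, dict(metadata))
        return True

    def lookup(self, publisher, item):
        """Return the metadata this publisher holds for this id, if any."""
        held = self.records.get((publisher, item))
        return None if held is None else held[1]


# Publisher and id are (catalog, identifier, additional) triples.
publisher = ("MusicML/People", "Jonathan_Eliot_Steinhart", None)
song = ("MusicML/Titles/Songs", "Garcia_Hunter/The_Wheel", None)

store = MetadataStore()
store.publish(publisher, [song], "2007-02-23 01:20:07", {"A": 1, "B": 2, "C": 3})
store.publish(publisher, [song], "2007-02-24 09:00:00", {"A": 1, "C": 3})
print(store.lookup(publisher, song))
```

After the second publish, B is gone. Because the publisher key includes the additional value, the same publisher with a different additional value gets its own independent set of records.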
~~~~~~~
Our web site will most likely include information from all publishers. We
won't host their information if we don't like it for some reason. Media
players that use our software should include configuration information that
lets users select which information from which publishers is preferred.
This configuration also allows end users to access information purchased
from third parties. We wouldn't host this information since it isn't ours.
~~~~~~~
<id> element
This assigns identifiers to metadata objects.
~~~~~~~
<name> element
This specifies the name of a metadata object. The variation attribute is used to
call out variations of objects that are otherwise identical. For example,
<name catalog="MusicML/Titles/Songs" identifier="Al_Green/Take_Me_To_The_River"
description="song" type="official">
Take Me To The River
</name>
and
<name catalog="MusicML/Titles/Songs" identifier="Al_Green/Take_Me_To_The_River"
description="song" type="official" variation="instrumental">
Take Me To The River
</name>
Objects can have more than one name. The definition of a person could include
both their birth name and their stage name:
<name catalog="MusicML/People" identifier="Bob_Dylan" description="person" type="stage name"> Bob Dylan </name>
<name catalog="MusicML/People" identifier="Bob_Dylan" description="person" type="birth name"> Robert Zimmerman </name>
<name catalog="MusicML/People" identifier="Bob_Dylan" description="person" type="alias"> Raspy Bob </name>
Users can personalize their experience by publishing their own aliases:
<name catalog="MusicML/Titles/Songs" identifier="Garcia_Hunter/The_Wheel" description="song" type="official">
The Wheel
</name>
<name catalog="MusicML/Titles/Songs" identifier="Garcia_Hunter/The_Wheel" description="song" type="alias">
The Meal Is Burning And It Won't Go Out
</name>
~~~~~~~
<creator> element
This is just like the name element without a variation attribute.
It specifies the name of the creator of the object such as a band.
~~~~~~~
<event> element
This element specifies the name of the event that included the object.
Things like Woodstock, AIDS Benefit, Bill Graham Memorial. Most objects
do not have events.
~~~~~~~
<place> element
This specifies the place at which the object was created. Things like
Radio City Music Hall, San Francisco. Zero or more place elements are
permitted.
~~~~~~~
<contributor> element
Contributors are (usually) people involved in the creation of the object.
The contributor element associates a name with zero or more contributions.
The name is optional; we may know that a trumpet was played without knowing
the identity of the player. Likewise, we may know that someone contributed
without knowing their contribution.
The name may be a band instead of a person.
The style attribute on the contribution sub-element indicates variations on
a mode of contribution. For example, "slide" could be a variation on playing
guitar.
All contributions by a particular contributor must be enclosed by the same
<contributor> element.
~~~~~~~
<gain-adjust> element
This is the amount to adjust the volume of a track or collection of tracks
in order to have a consistent perceived loudness. It should be calculated
using the flac replay gain. The album gain is used for objects that contain
more than one track; the track gain for objects with a single track.
~~~~~~~
<play-time> element
Length in seconds of the audio/video. Should only exist for simple items
(tracks), not composite items such as albums and movies, as it can be
calculated from the referenced items.
~~~~~~~
<date> element
Specifies the date and time. Partial values are accepted, such as 2007-04.
The show attribute contains things like "late show", "afternoon performance",
etc. It would be best to include the times in the element values, but that
information isn't commonly available.
The type attribute can provide a short description. This is intended for
albums where the same content is released on different dates. For example:
<metadata>
<id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
<name catalog="MusicML/Titles/Albums" identifier="The_Beatles/Please_Please_Me" description="album" type="official">
Please Please Me
</name>
</metadata>
<metadata>
<id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
<id catalog="Parlophone" identifier="PMC 1202" description="audio" type="album"/>
<lineage description="studio" medium="LP"/>
<date description="released" type="British monaural release"> 1963-03-22 </date>
</metadata>
<metadata>
<id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
<id catalog="Parlophone" identifier="PCS 3042" description="audio" type="album"/>
<lineage description="studio" medium="LP"/>
<date description="released" type="British stereophonic release"> 1963-03-22 </date>
</metadata>
<metadata>
<id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
<id catalog="Parlophone" identifier="CDP 7 46435 2" description="audio" type="album"/>
<lineage description="studio" medium="CD"/>
<date description="released" type="re-release"> 1987-02-26 </date>
</metadata>
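Partial date values could be handled along these lines. This is a sketch only: the function name is invented, and it covers just year, year-month, and year-month-day, since these notes don't say how a time of day would be written.

```python
import re

# Year, optional month, optional day. Time of day is deliberately left out.
PARTIAL_DATE = re.compile(r"(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?")


def parse_partial_date(value):
    """Return (year,), (year, month) or (year, month, day) as integers."""
    match = PARTIAL_DATE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a (partial) date: {value!r}")
    return tuple(int(part) for part in match.groups() if part is not None)


releases = [" 1987-02-26 ", " 1963-03-22 ", "2007-04"]
print(sorted(parse_partial_date(r) for r in releases))
```

Returning tuples keeps partial and complete dates comparable, so the release dates above sort oldest first.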
~~~~~~~
<license> element
Describes a license. The value can be a short description of the license.
~~~~~~~
<fingerprint> element
Contains a fingerprint for an object.
~~~~~~~
<lineage> element
Defines the origin of the object.
Description is studio/soundboard/audience
Medium is LP/CD/Cassette/DAT/DCC/DVD/MD/SACD/Single/EP and is omitted
for digital bits that were never officially released on any medium.
The value describes the lossy chain, and is omitted for most studio
stuff as it's unavailable. The XHTML allows pieces of equipment to
be referenced.
~~~~~~~
<child> element
Lists components of the object, like songs on an album.
The part is stuff like Set 1, Set 2, Encore, Side A, etc.
The group defines a group of included elements of the same type
such as the audio tracks or the album artwork images.
The sequence is the ordering of the included elements. Sequence
numbers are interpreted with respect to named groups. If they are
unique they define the ordering in that named group. For example,
an album may have a group that is named "tracks" which has the
songs in order. It may also have another group which is "booklet"
which has the pages in the booklet included with the album in order.
~~~~~~~
<parent> element
Lists objects that encompass this object. For example, a
state may list its country as a parent.
This is not used for songs on albums and stuff like that. It's
used for places, musical instrument and equipment relationships,
personal relationships, roots of songs, etc.
The description and type attributes describe the relationship.
These should match the parent object description and type for
direct relationships, like that of a city to a county. They
can also be used for indirect relationships like father and mother
and agent.
~~~~~~~
<note> element
The note element collects all sorts of things. Notes have optional
start time and end time attributes. The note applies to the entire
object if these are omitted. A missing start defaults to the end; a
missing end defaults to the start. If both are missing then start
defaults to 0 and end defaults to the length of the object. An
empty start defaults to 0, an empty end defaults to the length of the
object.
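The start/end defaulting rules are fiddly enough that a sketch may help. Here a missing attribute is represented by None and an empty one by an empty string. The function name and that representation are assumptions for this example.

```python
def resolve_note_span(start, end, length):
    """Apply the defaulting rules to a note's start and end attributes.

    start and end are attribute strings, '' if present but empty, or None
    if the attribute is missing. length is the object length in seconds.
    """
    def parse(value, empty_default):
        if value is None:
            return None
        value = value.strip()
        return empty_default if value == "" else float(value)

    begin = parse(start, 0.0)      # empty start means 0
    finish = parse(end, length)    # empty end means the object length
    if begin is None and finish is None:
        return 0.0, length         # note applies to the whole object
    if begin is None:
        begin = finish             # missing start defaults to the end
    if finish is None:
        finish = begin             # missing end defaults to the start
    return begin, finish


print(resolve_note_span(None, None, 180.0))
print(resolve_note_span("30.5", None, 180.0))
print(resolve_note_span("10", "", 180.0))
```

A note with only one of the two attributes therefore marks a single instant, while an explicitly empty attribute stretches the note to the beginning or end of the object.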
Factual notes have their content as the element value. Notes reference
external content when a non-fact license applies to that content. This
content is located using the catalog/identifier/additional attributes.
The actual display of the content depends on the license for the metadata
chunk containing the note. The element value for referenced content should
be a URI that can be used for reasonable attribution, such as "Image courtesy
of <a ...> foo.com </a>" or "Review courtesy of <a ...> bar.com </a>". The
value should just be the URI; we can generate the attribution text automatically.
The attribution is displayed regardless of whether or not the license allows
the content to be displayed inline, and provides a link to the content provider.
This has to be done carefully. If the reference is to an image, the attribution
should refer to some page containing the image. Likewise with a review, if the
reference is specific to just the review text, the attribution should point
to some page that contains that text.
The following sections describe the behavior for different descriptions.
altitude Altitude for places.
defect Describes problems resulting from the recording chain.
Such as "left channel drops out".
general General information about the object.
genre Defines a musical genre, whatever that is. Also used
for movie genres and image genres.
key Defines a musical key, which is ABCDEFG optionally
followed by # for sharp or b for flat, optionally
followed by the word major or minor. A missing
type defaults to the Western musical system. Other
values may be defined someday.
latitude Latitude for places.
license Code that implements the behavior of different licenses.
Things like in-line, reference, excerpt, thumbnail, etc.
longitude Longitude for places.
middle-c The frequency in Hertz of middle C for the object.
patch A patch to the data from a different source than the
object lineage.
performer This indicates a guest artist. Guest artists should
also be contributors. The element value should be
something readable in the language, such as
"With Branford Marsalis on saxophone."
post There are two types: intro and outro. No value is
needed. The intro is the moment in time when vocals
start; the outro is the moment in time immediately
after the vocals end.
postal-code The value is a postal code. This note description is
only used when providing information about a place.
review A review of the object.
sample A reference to another piece of content, such as
snippets used for hip-hop mixes. The sample-start
and sample-end are the start and end of the referenced
content. The catalog/identifier/additional reference
the content. The element value is a short description
such as "played backwards and half speed."
Work is needed here to specify sampling images and video.
street-address The value is a street address. This note description is
only used when providing information about a place.
tempo The tempo in beats per minute.
time-signature The time signature as two slash-separated numbers. The
first is number of beats per measure, the second is the
type of note that gets one beat. This is in standard
Western notation, the default if no type attribute.
Other systems might eventually get defined.
transcription A textual representation of the content.
transition The change from one track to another. Can be used for
changes internal to a track but that's less useful.
There are three transition types: audio, conceptual,
and visual. Audio values are audience, cut, fade, segue,
silence. Conceptual values are related and unrelated.
Visual values are blend, clean, cut, fade, segue.
xslt Specific to catalog definitions. The value is XSLT code
that translates identifiers in the catalog.
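Two of the note descriptions above, key and time-signature, have values that are simple enough to check mechanically. The sketch below assumes the default Western system and assumes the word major or minor is separated from the note by whitespace. The function names are invented for this example.

```python
import re

# Note letter, optional accidental, optional mode. Separating the mode with
# whitespace is an assumption; the notes do not spell out the exact layout.
KEY = re.compile(r"([A-G])([#b]?)(?:\s+(major|minor))?")
TIME_SIGNATURE = re.compile(r"(\d+)/(\d+)")


def parse_key(value):
    """Return (letter, accidental, mode); accidental may be '', mode None."""
    match = KEY.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a Western key: {value!r}")
    return match.groups()


def parse_time_signature(value):
    """Return (beats per measure, type of note that gets one beat)."""
    match = TIME_SIGNATURE.fullmatch(value.strip())
    if match is None:
        raise ValueError(f"not a time signature: {value!r}")
    beats, unit = match.groups()
    return int(beats), int(unit)


print(parse_key("F# minor"))
print(parse_time_signature("6/8"))
```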
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Controlled Vocabularies
~~~~~~~~~~~~~~~~~~~~~~~
The controlled vocabularies cover three areas:
1. Names of catalogs controlled by the metome project,
2. The structure of identifiers in certain metome catalogs, and
3. Description and type attribute values for MusicML elements.
~~~~~~~~~~~~~
MusicML includes a bunch of catalogs. These are not really part of the XML
specification, but knowing about them makes the rest easier to understand.
All of these catalog names begin with "MusicML/"; nobody else is allowed to
use this prefix.
TODO EEEK! Should probably change this from MusicML to metome.
Identifiers in MusicML catalogs are ASCII strings with punctuation characters
removed and spaces converted to underscores. This makes it easy to access
things using shell commands. Concatenating a catalog, a '/', and an identifier
makes a file name. Cheesy but effective.
Some of the MusicML identifiers have a structure that provides additional
information. More on this in a bit.
There are cases where identifiers could conflict. For example, if there
are two people named "John Smith". The system appends distinguishing
characters as needed, so the first one would be John_Smith and the second
would be John_Smith_1. This does not happen automatically and requires the
services of a community editor sort of person.
The additional attribute is used sparingly in MusicML catalogs. Right now
it's only used for distinguishing different items by the same publisher as
discussed above.
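The identifier rules might be sketched like this. Several details are guesses: hyphens are kept because examples elsewhere in these notes (Ile-de-France, etree numbers) keep them, accented letters are reduced to their ASCII base letter, and the function names are made up. The disambiguation helper only suggests a candidate, since the real decision belongs to a community editor.

```python
import string
import unicodedata

# Assumption: hyphens and underscores survive; other punctuation is dropped.
DROPPED = "".join(c for c in string.punctuation if c not in "-_")


def make_identifier(name):
    """Turn a display name into an ASCII identifier with underscores."""
    decomposed = unicodedata.normalize("NFKD", name)
    ascii_name = decomposed.encode("ascii", "ignore").decode("ascii")
    cleaned = ascii_name.translate(str.maketrans("", "", DROPPED))
    return "_".join(cleaned.split())


def suggest_identifier(identifier, taken):
    """Suggest identifier, identifier_1, identifier_2, ... for an editor."""
    if identifier not in taken:
        return identifier
    suffix = 1
    while f"{identifier}_{suffix}" in taken:
        suffix += 1
    return f"{identifier}_{suffix}"


def file_name(catalog, identifier):
    """Catalog, a slash, and the identifier make a file name."""
    return f"{catalog}/{identifier}"


print(make_identifier("A Hard Day's Night"))
print(file_name("MusicML/People", suggest_identifier("John_Smith", {"John_Smith"})))
```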
MusicML/People Used for names of people. Identifiers are made
by concatenating the components of the person's name.
MusicML/Corporations Used for names of companies. Identifiers are made
by concatenating the components of the company
name. For example, Lulu_Incorporated.
MusicML/Organizations Used for names of organizations such as Internet_Archive.
MusicML/Labels Used for names of record labels.
MusicML/Titles/Songs Used for song titles. Identifiers are the song author
name followed by a slash followed by the song name, such
as "Noah_Lewis/Minglewood_Blues". If there is more than
one author then the name portion is the last names of the
authors, such as "Leiber_Stoller/Kansas_City".
MusicML/Titles/Albums Used for album titles. Identifiers are the creator (band)
followed by the album title, such as "The_Beatles/Please_Please_Me".
Some creativity is required when a band releases different albums
with the same name, such as A Hard Day's Night by The Beatles.
MusicML/Titles/Concerts Used for the names of concerts unreleased on albums. Identifiers
are the creator (band) followed by the concert title followed by
the concert date and some other identifier such as an etree number.
For example: "The_Grateful_Dead/1977-12-30-etree-30624".
MusicML/Audio/Tracks Used for individual audio tracks. Identifiers are either album
or concert identifiers followed by a sequence number and song
name, such as "The_Beatles/Please_Please_Me/01-Misery".
MusicML/Audio/Collections Used for collections of audio tracks such as albums and concerts.
MusicML/Places Used for geographic names. Identifiers are of the form
Planet/Continent/Country/State/County/City/Venue. The
general form is taken from the Getty Thesaurus of
Geographic Names. State/County/City is mapped as appropriate
to the rules of the country. If the three levels do not
exist the missing ones are left blank, for example
Earth/Europe/France/Ile-de-France//Paris/Pathe_Marconi_Studios
MusicML/Events Used for event names.
MusicML/Instruments Used for musical instruments. The form is based on the
Hornbostel-Sachs taxonomy and still needs work.
Chordophones/Plucked/Guitar/Electric_Guitar/Guild/Starfire
MusicML/Roles Used for non-musical contributions like author, engineer,
producer, grip. Needs work.
MusicML/Licenses Used for the names of different content licenses. Currently
defined identifiers include Facts, Strictly_Commercial, and
The_Grateful_Dead.
MusicML/Fingerprints Used to keep track of content fingerprints. There are
several subcatalogs:
MusicML/Fingerprints/MD5/FLAC contains fingerprints of
lossless flac files.
MusicML/Fingerprints/MD5/SHN contains fingerprints of
lossless shorten files.
MusicML/Fingerprints/MD5/WAV contains fingerprints of
lossless wav files.
TODO maybe these should be md5s of the wavs and the
subcatalog should be the sampling rate. Or do we just
want the originally distributed material?
MusicML/Fingerprints/PUID contains MusicDNS puids.
MusicML/Bands Contains band names.
MusicML/Equipment Contains names of things like recording equipment.
An organization is needed.
MusicML/Images For images. There are lots of sub-catalogs including
Concert_Tickets, Backstage_Passes, Posters, Laminates,
Album_Covers.
MusicML/Titles/Movies Movie titles.
MusicML/Catalogs Information about catalogs.
MusicML/URI Identifiers in this catalog are URIs.
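Two of the structured identifiers in the list above, places and tracks, can be assembled mechanically. This is a sketch under stated assumptions: the function names are invented, dropping trailing empty levels is a guess (the notes only cover blank State/County/City levels), and the two-digit sequence padding is inferred from the "01-Misery" example.

```python
PLACE_LEVELS = ("planet", "continent", "country",
                "state", "county", "city", "venue")


def place_identifier(**levels):
    """Join place levels with slashes, leaving missing middle levels blank."""
    unknown = set(levels) - set(PLACE_LEVELS)
    if unknown:
        raise ValueError(f"unknown place levels: {sorted(unknown)}")
    path = "/".join(levels.get(level, "") for level in PLACE_LEVELS)
    return path.rstrip("/")  # assumption: trailing empty levels are dropped


def track_identifier(collection, sequence, song):
    """Album or concert identifier, then a padded sequence number and song."""
    return f"{collection}/{sequence:02d}-{song}"


print(place_identifier(planet="Earth", continent="Europe", country="France",
                       state="Ile-de-France", city="Paris",
                       venue="Pathe_Marconi_Studios"))
print(track_identifier("The_Beatles/Please_Please_Me", 1, "Misery"))
```

The missing county shows up as the empty slot between the two adjacent slashes, matching the Paris example in the list.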
~~~~~~~
OK, most things in MusicML have a description and a type. Don't know if
this is the best choice of terms but it seems to work.
description: This is the category into which an object falls.
type: This is the subdivision of the category.
All MusicML objects have an identifier. For starters, description and type
values for object identifiers are below. Keep in mind that these values are
for the object itself, not for other things that may get hung on the object
such as names. Oh yeah, by identifier I mean a catalog/identifier/additional
packaged into an <id> element. If you think that terms are overloaded now
wait until we get into the database! Note that all equivalent identifiers
must have the same description and type.
TODO where do tours and collections fit in?
collection tour, playlist
Description     Type                    Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
audio ................................. Pure audio content.
                album                   A commercial or bootleg album. This is independent of
                                        the medium (LP, CD, etc.) which we'll get to later.
                                        This includes short versions of albums like EPs and
                                        singles.
                concert                 This is a recording of a performance that has not been
                                        released on an album. Note that there are plenty of
                                        cases where concert recordings and commercial albums are
                                        available for the same performance.
                track                   A single indivisible audio item, such as a song, poem,
                                        newscast, etc.
video ................................. Video content that includes an audio track.
silent ................................ Video content with no audio track.
                movie                   A complete piece of video/silent.
                episode                 One of a series of related movies/silents.
                scene                   A logical division of a movie/silent.
image ................................. A still image.
                photo                   A photograph.
                poster                  Concert poster.
                laminate                Backstage laminate.
                concert ticket          Ticket to a concert.
                backstage pass          Backstage pass.
                album cover             Album cover art.
                TODO jeremy will get more stuff
print ................................. Printed media.
                book                    These should be self-explanatory. Some library research is needed here.
                journal
                periodical
                magazine
text .................................. Online text content.
                opinion                 An opinion piece about something.
place ................................. A physical place. Might also be for virtual places.
                planet                  These are all self-explanatory. There may need to be some
                continent               additions for countries and states that use different
                country                 geographic divisions.
                state
                county
                city
                venue                   A performance venue.
event ................................. A named event, such as "Woodstock" or "Merlefest".
                festival                A music or film festival.
                benefit                 A fund raiser.
catalog ............................... A catalog of identifiers external to us.
                record label            A record company catalog.
                trading site            A content trading web site.
                government              A government catalog like the Copyright Office.
                industry                An industry catalog like ISBN.
equipment ............................. A piece of equipment.
                musical instrument      All of these are just what they say.
                amplifier
                microphone
                cable
                recorder
                tape
                preamp
                converter
                power supply
                mixing board
                camera
                lens
                software program
song TODO
date .................................. A calendar date.
person ................................ A person.
group ................................. A group of people.
                band                    A group of people playing music together.
                performance             A group of people doing things like juggling.
meeting ............................... A company meeting.
                normal                  Normal alcohol consumption.
                Belgian                 A lulu of a meeting.
~~~~~~~~~~~~~
Description and type values for other elements:
Element Description Type Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
publisher ............................................. Publisher of metadata.
person ................................ Name of a person.
(none) ................ Unsure of name.
birth name ............ What it says on their birth certificate.
stage name ............ Name that they use as a performer.
alias ................. Another name for the person.
corporation ........................... Name of a company.
(none) ................ Nothing special.
alias ................. Another name for the company.
organization .......................... Name of an organization.
(none) ................ Nothing special.
alias ................. Another name for the organization.
web site .............................. A web site.
(none) ................ Nothing special.
fan ................... A site run by fans of something.
trading ............... A media trading site.
alias ................. Another name for the site.
name .................................................. Name of something.
person ................................ Name of a person.
(none) ................ Unsure of name.
birth name ............ What it says on their birth certificate.
stage name ............ Name that they use as a performer.
alias ................. Another name for the person.
corporation ........................... Name of a company.
(none) ................ Nothing special.
alias ................. Another name for the company.
organization .......................... Name of an organization.
(none) ................ Nothing special.
alias ................. Another name for the organization.
web site .............................. A web site.
(none) ................ Nothing special.
fan ................... A site run by fans of something.
trading ............... A media trading site.
alias ................. Another name for the site.
band .................................. Musical group.
(none) ................ Nothing special.
alias ................. Another name for the band.
performance group ..................... Non-musical group.
(none) ................ Nothing special.
alias ................. Another name for the group.
planet ................................ Geographical stuff.
continent ............................. Geographical stuff.
country ............................... Geographical stuff.
state ................................. Geographical stuff.
county ................................ Geographical stuff.
city .................................. Geographical stuff.
venue ................................. Geographical stuff.
(none) ................ Nothing special.
alias ................. Another name for the geographical thing.
song .................................. Name of a song.
(none) ................ Nothing special.
alias ................. Another name for the song.
event ................................. Name of an event.
(none) ................ Nothing special.
alias ................. Another name for the event.
album ................................. Name of an album.
(none) ................ Nothing special.
alias ................. Another name for the album.
concert ............................... Name of a concert, usually constructed.
(none) ................ Nothing special.
alias ................. Another name for the concert.
tour .................................. Name of a concert tour.
(none) ................ Nothing special.
alias ................. Another name for the tour.
catalog ............................... Name of a catalog.
(none) ................ Nothing special.
trading site .......... Media trading site.
record label .......... Record label catalog.
reference ............. Reference material.
place ................................................. Name of a place.
planet ................................ Geographical stuff.
continent ............................. Geographical stuff.
country ............................... Geographical stuff.
state ................................. Geographical stuff.
county ................................ Geographical stuff.
city .................................. Geographical stuff.
venue ................................. Geographical stuff.
(none) ................ Nothing special.
alias ................. Another name for the geographical thing.
creator ............................................... Content creator.
person ................................ Name of a person.
(none) ................ Unsure of name.
birth name ............ What it says on their birth certificate.
stage name ............ Name that they use as a performer.
alias ................. Another name for the person.
corporation ........................... Name of a company.
(none) ................ Nothing special.
alias ................. Another name for the company.
organization .......................... Name of an organization.
(none) ................ Nothing special.
alias ................. Another name for the organization.
web site .............................. A web site.
(none) ................ Nothing special.
fan ................... A site run by fans of something.
trading ............... A media trading site.
alias ................. Another name for the site.
band .................................. Musical group.
(none) ................ Nothing special.
alias ................. Another name for the band.
performance group ..................... Non-musical group.
(none) ................ Nothing special.
alias ................. Another name for the group.
event ................................................. An event.
event ................................. Name of an event.
(none) ................ Nothing special.
benefit ............... A benefit for some cause.
festival .............. An arts festival.
memorial .............. A remembrance of someone or something.
wake .................. A celebration of someone's life.
alias ................. Another name for the event.
contribution .......................................... A contribution to an object.
author ................................ Wrote the material.
(none) ................ Nothing special.
lyrics ................ The words.
music ................. The music.
arranger .............................. Arranged the material.
performer ............................. Performed the material.
(none) ................ Nothing special.
voice ................. Sang.
instrument ............ Played an instrument.
conductor ............. Conducted the other performers.
non-performer ......................... Some other contribution.
engineer .............. Engineering.
producer .............. Production.
TODO do we need types when we have taxonomy?
parent ................................................ An element relation.
person ................................ A person.
father ................ Their father.
mother ................ Their mother.
date .................................................. Date something happened.
performed ............................. Date of performance.
recorded .............................. Date of recording. Implies performance.
released .............................. Date material was released. Might be by broadcast.
mixed ................................. Mix-down date.
born .................................. Birth date of a person.
died .................................. Death date of a person.
formed ................................ Birth date of a group.
disbanded ............................. Death date of a group.
~~~~~~~~~~~~~
Lineage element:
Description:
soundboard ............ Live concert feed from soundboard, includes FM and matrix.
audience .............. Audience microphones.
studio ................ Studio recording.
Medium:
CD .................... Compact disc.
DVD ................... Digital video disk.
LP .................... Long playing vinyl.
Single ................ Very short vinyl.
EP .................... Longer than a single, shorter than an LP.
Cassette .............. Compact cassette.
VHS ................... VHS video tape.
Betamax ............... Betamax video tape.
HD-DVD ................ High-definition DVD.
Blu-ray ............... The other high-definition disc format.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Short Overiew of the Database Organization
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The database contains two main tables and a collection of indices. The indices
are generated from the two main tables and exist to speed searching. The biggest
performance issue is sorting; the indices are pre-sorted to avoid it.
~~~~~~~~~~~~~
Main tables:
CREATE TABLE IF NOT EXISTS Id (
Id BIGINT UNSIGNED NOT NULL PRIMARY KEY, # internal identifier for this metadata source
Catalog CHAR(64) NOT NULL DEFAULT '', # Catalog for this chunk of metadata
Identifier CHAR(255) NOT NULL DEFAULT '', # Identifier in the catalog for this metadata chunk
Additional CHAR(64) NOT NULL DEFAULT '' # Additional information for catalog mapping
);
CREATE TABLE IF NOT EXISTS Value (
File TEXT, #TODO temporary to simplify debugging
Publisher BIGINT UNSIGNED NOT NULL, # internal identifier for this publisher from Id table
Created CHAR(19) NOT NULL, # time stamp for this chunk of metadata by this publisher
Id BIGINT UNSIGNED NOT NULL DEFAULT 0, # internal identifier of element from Id table
Chunk BIGINT UNSIGNED NOT NULL, # internal identifier for all elements in this metadata chunk
Link SMALLINT UNSIGNED NOT NULL DEFAULT 0, # internal identifier that links elements in a group
Equivalent BIGINT UNSIGNED NOT NULL DEFAULT 0, # internal identifier that maps all elements cojoined by id
Direct BOOL DEFAULT TRUE, # element specified directly or indirectly via note
Language CHAR(10) NOT NULL DEFAULT 'en-us', # National language code for element value
Description CHAR(64) NOT NULL DEFAULT '', # element description
Type CHAR(128) NOT NULL DEFAULT '', # element type
The BOOL DEFAULT FALSE, # put "the" in front of value if set
AutoGenerated BOOL DEFAULT FALSE, # set if automatically generated
AutoTranslated BOOL DEFAULT FALSE, # set if automatically translated
Sort CHAR(255) NOT NULL, # string used for sorting the value
Value TEXT NOT NULL DEFAULT '', # element value
Search TEXT NOT NULL DEFAULT '', # search string for main index use
Extra CHAR(64) NOT NULL DEFAULT '', # extra spot for extra attributes like variation and style
Start FLOAT NOT NULL DEFAULT 0, # starting time for portion of an object
End FLOAT NOT NULL DEFAULT 0.0000001, # ending time for portion of an object
Sequence SMALLINT UNSIGNED NOT NULL DEFAULT 0, # for sequence numbers, also flag for contributors
Dark BOOL DEFAULT FALSE, # set if data exists but should not be shown
Cache TEXT NOT NULL DEFAULT '', # location of cached data, if any
Element SET( # element that defined this row
'Child', 'Contribution', 'Creator', 'Date', 'Event', 'Fingerprint', 'Gain-adjust',
'Id', 'License', 'Lineage', 'Name', 'Note', 'Parent', 'Place', 'Play-time', 'Publisher'
)
);
~~~~~~
The CHAR columns are an attempt to increase speed and save space. Not sure
that it'll work; they may need to be turned into TEXT fields at a later date.
~~~~~~
All catalog/identifier/additional attribute sets are mapped into internal
identifiers via the Id table. This is done for performance reasons. When
you see "id" below, it's one of these internal identifiers.
~~~~~~
Each row of the Value table includes the id of the Publisher and the Created
timestamp. These values are used for two things: to order data for display
according to user preference, and to locate data for updates.
~~~~~~
The Element column indicates the type of MusicML element that produced the row.
~~~~~~
The Id column is stuffed with an id generated from the element
catalog/identifier/additional. A value of 0 means "none".
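
As a rough illustration of the mapping, here is a minimal in-memory sketch.
The IdTable class and its lookup method are made-up names, and starting the
allocation at 1 is an assumption based only on 0 meaning "none"; the real
thing would sit on top of the Id table.

```python
class IdTable:
    """Hypothetical stand-in for the Id table: triple -> internal id."""

    def __init__(self):
        self._ids = {}     # (catalog, identifier, additional) -> internal id
        self._next = 1     # assumption: 0 is never handed out, it means "none"

    def lookup(self, catalog, identifier, additional=""):
        """Return the internal id for a triple, allocating one on first use."""
        key = (catalog, identifier, additional)
        if key not in self._ids:
            self._ids[key] = self._next
            self._next += 1
        return self._ids[key]


ids = IdTable()
first = ids.lookup("musicml", "abc123")    # made-up catalog and identifier
again = ids.lookup("musicml", "abc123")    # same triple, same id
other = ids.lookup("musicml", "xyz789")    # new triple, new id
```

The point is only that the same triple always comes back as the same small
integer, so the Value table can join and compare on integers.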
~~~~~~
Each chunk of metadata is assigned a Chunk id. That is, a Chunk id is
assigned for every Publisher, <id> element combination. A publisher may
only publish one chunk of data for each id. If a publisher publishes
data for an <id> that is newer than what's in the database the old data
is deleted and replaced by the new data. The internal Chunk id is reused
when data is updated.
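
The replace-if-newer rule might look something like the sketch below. The
ChunkStore class is hypothetical and keeps everything in memory. It also
assumes the Created timestamp is a fixed-width string along the lines of
"2007-03-01 12:00:00" (19 characters), so plain string comparison orders
timestamps correctly.

```python
class ChunkStore:
    """Hypothetical model: one chunk per (publisher, <id>) pair."""

    def __init__(self):
        self._chunks = {}      # (publisher, element id) -> (chunk id, created, rows)
        self._next_chunk = 1

    def publish(self, publisher, element_id, created, rows):
        """Store rows; return the Chunk id, or None if the data is not newer."""
        key = (publisher, element_id)
        existing = self._chunks.get(key)
        if existing is None:
            chunk_id = self._next_chunk        # first data for this pair
            self._next_chunk += 1
        else:
            chunk_id, old_created, _ = existing
            if created <= old_created:
                return None                    # not newer: keep what we have
        # Old rows are dropped and the Chunk id is reused for the new rows.
        self._chunks[key] = (chunk_id, created, list(rows))
        return chunk_id

    def rows(self, publisher, element_id):
        return self._chunks[(publisher, element_id)][2]


store = ChunkStore()
a = store.publish(7, 42, "2007-03-01 12:00:00", ["old rows"])
b = store.publish(7, 42, "2007-03-02 12:00:00", ["new rows"])
c = store.publish(7, 42, "2007-02-01 12:00:00", ["stale rows"])
```

Here a and b are the same Chunk id, and c is None because the third
publication is older than what is already stored.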
~~~~~~
Some entries take more than one row, such as <contributor> elements that
include <name> and <contribution> elements. Multi-row entries are connected
using the Link column. A link counter is initialized for each Chunk. This
counter is incremented once for each entry, so all rows of a multi-row entry
share one Link value. The result is that two rows in a chunk with the same
Link value are connected. Besides contributor, Link is also used for
<note description="sample" ... />.
Rows that are linked also have their Sequence set to 1. Sequence is used
because it happens to be available. The flag is needed because things that
are linked, such as contributor information, need to be recognized so that
they can be displayed appropriately. There are degenerate cases where we have
a contributor name but the contribution is unknown, and vice versa. In these
cases we can't tell that something is linked by there being more than one
row with the same Link value, hence the additional flag.
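
A small sketch of how Link and the Sequence flag could be handed out for one
chunk. The assign_links function, the entry layout, and the counter starting
at 1 are all assumptions made for illustration; only contributor entries are
treated as linked here.

```python
LINKED_KINDS = {"contributor"}    # assumption: entry kinds that are "linked"


def assign_links(entries):
    """entries: (kind, [row values]) pairs for one chunk, in document order.
    Returns (value, link, sequence) tuples, one per table row."""
    rows = []
    link = 0                              # counter is fresh for every chunk
    for kind, values in entries:
        link += 1                         # one Link value per entry, not per row
        sequence = 1 if kind in LINKED_KINDS else 0
        for value in values:
            rows.append((value, link, sequence))
    return rows


chunk = [
    ("name", ["Howlin' At The Moon"]),
    ("contributor", ["Sam Bush", "performer"]),    # name + contribution
    ("contributor", ["Unknown Fiddler"]),          # degenerate: name only
]
rows = assign_links(chunk)
```

The last entry shows why the flag exists: it has a single row, so only
Sequence=1 marks it as contributor information.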
~~~~~~
Equivalent is the last of the ids. All metadata chunks that reference the
same MusicML id (an id in a MusicML catalog) are assigned the same Equivalent
id. The Equivalent id is the same id assigned to the <id> element for that
MusicML id. This allows all information about a particular object to be
obtained using a single query.
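
To make the single-query claim concrete, here is a toy version using SQLite
in memory. The table is cut down to four columns and the data is invented;
the production schema and database engine differ, so treat this as a sketch
of the idea only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Cut-down stand-in for the Value table; only the columns needed here.
conn.execute(
    """CREATE TABLE "Value" (
           Publisher   INTEGER,
           Equivalent  INTEGER,
           Description TEXT,
           "Value"     TEXT
       )"""
)
conn.executemany(
    'INSERT INTO "Value" VALUES (?, ?, ?, ?)',
    [
        (1, 42, "audio", "Howlin' At The Moon"),   # publisher 1 on object 42
        (2, 42, "audio", "Howlin' at the Moon"),   # publisher 2 on the same object
        (1, 43, "audio", "Some Other Album"),      # a different object
    ],
)

# Everything any publisher has said about object 42, in one query.
rows = conn.execute(
    'SELECT Publisher, "Value" FROM "Value" WHERE Equivalent = ? ORDER BY Publisher',
    (42,),
).fetchall()
```

Both publishers' rows come back together because they share the Equivalent
id, even though they arrived in separate chunks.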
~~~~~~
The Direct flag is TRUE for all rows defined by top-level MusicML elements.
It is FALSE for everything else. For example, if a metadata chunk has a
<place> element that defines a venue, the Direct flag is set to TRUE. But if
there is a <note> element, and there is a <place> element in the value of
that <note> element, then Direct is set to FALSE for that <place>. The
interpretation is that the flag is set for things directly referenced by
a metadata chunk, and clear for things indirectly referenced.
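
The <note> case can be sketched with a few lines of XML walking. The
direct_flags function is hypothetical, the sample chunk is invented and has
no namespace, and nesting other than inside <note> is deliberately not
handled because the rule for it isn't spelled out here.

```python
import xml.etree.ElementTree as ET


def direct_flags(metadata_xml):
    """Return (tag, direct) pairs for the elements of one metadata chunk."""
    root = ET.fromstring(metadata_xml)
    flags = []
    for top in root:                      # children of <metadata> are top-level
        flags.append((top.tag, True))
        if top.tag == "note":
            # Anything inside the note's value is only indirectly referenced.
            for nested in top.iter():
                if nested is not top:
                    flags.append((nested.tag, False))
    return flags


chunk = """<metadata>
  <place description="venue">Red Rocks</place>
  <note>Recorded at <place description="venue">The Fillmore</place></note>
</metadata>"""
flags = direct_flags(chunk)
```

The first <place> is flagged direct, the <note> itself is direct, and the
<place> inside the note is flagged indirect.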
~~~~~~
The Language column holds the national language code. The value comes from
any xml:lang attributes, not from the <metadata language= /> attribute. The
latter is handled by having a different database for each national language.
~~~~~~
The AutoGenerated, AutoTranslated, and The flags map to the related attributes.
~~~~~~
The Dark flag is set for data that is in the database but hidden from
external users. This might get set in response to a DMCA takedown notice.
We wouldn't delete the content since so many of these notices are bogus.
But we would hide it until things are resolved.
~~~~~~
The Description and Type columns hold the values of those attributes.
They are overloaded; they are also used for the <includes> element
part and group attributes.
The Extra column is overloaded to hold the various extra attributes sported
by some elements:
<contribution> element "style" attribute
<name> element "variation" attribute
<lineage> element "medium" attribute
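
One way to picture the overloading is as a lookup from element name to the
attribute whose value lands in Extra. The EXTRA_SOURCE table and the
extra_value function are illustrative names, and the "live" variation value
is invented.

```python
# Element name -> attribute whose value is stored in the Extra column.
EXTRA_SOURCE = {
    "contribution": "style",
    "name": "variation",
    "lineage": "medium",
}


def extra_value(element, attributes):
    """Return what would go in Extra, or '' (the column default) if nothing applies."""
    attribute = EXTRA_SOURCE.get(element)
    if attribute is None:
        return ""
    return attributes.get(attribute, "")


medium = extra_value("lineage", {"description": "studio", "medium": "CD"})
variation = extra_value("name", {"variation": "live"})
unused = extra_value("date", {"description": "recorded"})
```

Elements without an extra attribute, like <date> here, simply leave the
column at its default empty string.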
~~~~~~
The Start and End columns hold the values of the <note> element Start and
End attributes. These columns are also overloaded.
Start is used for the value of the <play-time> and <gain-adjust> elements.
Start and End are also used for the <note description="sample" ...>
sample-start and sample-end elements. This type of note is stored as two
table rows. The first row gets Sequence=0 and holds the start and end,
the second row gets Sequence=1 and holds the sample-start and sample-end.
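
The two-row layout for a sample note, written out as a sketch. The
sample_note_rows function and the dictionary layout are made up, and the
numbers are arbitrary since the units aren't stated here. Both rows carry
the same Link value so they can be found together.

```python
def sample_note_rows(start, end, sample_start, sample_end, link):
    """Return the two rows for a <note description="sample" ...> entry."""
    return [
        # First row: the note's own start and end.
        {"Link": link, "Sequence": 0, "Start": start, "End": end},
        # Second row: sample-start and sample-end reuse the same two columns.
        {"Link": link, "Sequence": 1, "Start": sample_start, "End": sample_end},
    ]


note_rows = sample_note_rows(30.0, 45.0, 12.5, 27.5, link=4)
```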
~~~~~~
The Sequence column holds sequence numbers for things like tracks on an
album.
~~~~~~
The Value column holds the element value. Any XHTML in the value is intact.
~~~~~~
The Sort column is used for sorting the Value for display. It is smaller than
the Value column because sorting beyond what someone is likely to read is
wasted effort. An example of use is <name sort="Steinhart,Jon"> Jon Steinhart </name>.
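
A sketch of how the Sort string might be derived. The sort_key function is
hypothetical, and falling back to the trimmed value when no sort attribute
is given is an assumption. The 255 limit comes from Sort being CHAR(255).

```python
SORT_WIDTH = 255    # matches Sort CHAR(255) in the Value table


def sort_key(value, sort_attr=None):
    """Pick the string to sort on and cut it to the width of the Sort column."""
    key = sort_attr if sort_attr else value.strip()
    return key[:SORT_WIDTH]


explicit = sort_key(" Jon Steinhart ", "Steinhart,Jon")
fallback = sort_key(" Jon Steinhart ")
clipped = sort_key("x" * 300)
```

With the sort attribute present the name files under "Steinhart,Jon";
without it the sketch just uses the trimmed value.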
~~~~~~
The Search column is what is searched, not the Value column. There are two
differences. First, the Search column is text-only; all XHTML and MusicML
elements are removed. Second, it may contain data not in the Value column.
For example, the Value may be the name of a song, and the Search may be the
lyrics.
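
Roughly, building the Search string means dropping the tags and keeping the
text. The sketch below uses Python's standard HTML parser for that. The
search_text function is a made-up name, and appending the extra text (such
as lyrics) after the value is an assumption about how the two get combined.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects character data and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def search_text(value, extra=""):
    """Strip markup from value, append extra text, and normalize whitespace."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()                       # flush any text still buffered
    combined = "".join(parser.parts) + " " + extra
    return " ".join(combined.split())


plain = search_text("Komm, <em>Gib Mir</em> Deine Hand")
with_lyrics = search_text("<b>Song</b>", "la la la")
```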
~~~~~~
Cache is the location of cached data for the element. This is not well thought
out yet. This may go into a separate table since few rows would actually use this.
~~~~~~
The biggest performance problem results from needing sorted results. We work
around this by building pre-sorted tables. The downside is that it takes time
to build those tables, meaning that changes don't immediately appear. The long
term plan is to hack the database back end so that it does insertion sorts.
~~~~~~
On place elements the additional value is also stored in the Extra field.
~~~~~~
The "search everything" page uses the MainIndex table.
CREATE TABLE IF NOT EXISTS MainIndex (
Description CHAR(64) NOT NULL,
Type CHAR(128) NOT NULL,
Value CHAR(255) NOT NULL,
Search TEXT NOT NULL
);
This table is built from all rows in the Value table where Direct is TRUE, the Value
is not empty, and the Element is not Note, Play-time, Gain-adjust, Date, Lineage,
Child, or Parent. It is ordered by the Sort, of course.
Not all data is copied as is; some tinkering is performed.
The creator value is prepended to the value of names of description audio and type
album or concert. So rather than something showing up as "Howlin' At The Moon" it
would show up as "Sam Bush, Howlin' At The Moon".
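
The filtering and the creator prefix can be sketched as follows. The
build_main_index function, the row dictionaries, and the creators map (Chunk
id to creator name) are all hypothetical, and sorting on the original Sort
value rather than the prefixed one is an assumption.

```python
EXCLUDED = {"Note", "Play-time", "Gain-adjust", "Date", "Lineage", "Child", "Parent"}


def build_main_index(rows, creators):
    """rows: dicts with Element, Direct, Value, Sort, Description, Type, Chunk.
    Returns (description, type, value) tuples ordered by Sort."""
    index = []
    for row in rows:
        if not row["Direct"] or not row["Value"] or row["Element"] in EXCLUDED:
            continue
        value = row["Value"]
        creator = creators.get(row["Chunk"])
        if (row["Element"] == "Name" and row["Description"] == "audio"
                and row["Type"] in ("album", "concert") and creator):
            value = f"{creator}, {value}"      # prepend the creator
        index.append((row["Sort"], row["Description"], row["Type"], value))
    index.sort(key=lambda entry: entry[0])
    return [entry[1:] for entry in index]


rows = [
    {"Element": "Name", "Direct": True, "Value": "Howlin' At The Moon",
     "Sort": "Howlin' At The Moon", "Description": "audio", "Type": "album", "Chunk": 1},
    {"Element": "Note", "Direct": True, "Value": "liner notes",
     "Sort": "liner notes", "Description": "", "Type": "", "Chunk": 1},
    {"Element": "Place", "Direct": False, "Value": "Nashville",
     "Sort": "Nashville", "Description": "place", "Type": "city", "Chunk": 1},
    {"Element": "Name", "Direct": True, "Value": "Sam Bush",
     "Sort": "Bush,Sam", "Description": "person", "Type": "", "Chunk": 2},
]
main_index = build_main_index(rows, {1: "Sam Bush"})
```

The note and the indirect place are filtered out, and the album name picks
up its creator.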
Some data is also invented. "untitled" is inserted for items with description audio
that do not have a name element.
this won't work well
additional indices?
contribution, content
person, contribution, content
song, concert or album
on gdxml
albums go into a completely different catalog than unreleased shows
where does publisher reputation go?
WHAT DO WE NEED FOR INDICES?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
main index
name, creator, includer (description type)
name index
place index
contributor index
