New Spec

From Auzigog Wiki

MusicML Spec

        Metome Project Notes
       ~~~~~~~~~~~~~~~~~~~~

The Metome project has six major components:

 1.  MusicML, a set of XML elements for packaging metadata,

 2.  Controlled Vocabularies that define the permitted values for certain
    MusicML fields,

 3.  The database that accumulates MusicML-specified metadata,

 4.  Tools for getting metadata in and out of the database,

 5.  Discovery tools that associate content with metadata, and

 6.  Web pages that provide a human interface to the database.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       A Short Overview Of MusicML
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~

MusicML is a set of XML elements for describing metadata.  It originally
covered only music and now covers all media, but we don't have a better name.

~~~~~~~

Unlike many of the rigorous XML standards like RDF, MusicML contains lots of
duplicate information.  This information is checked for consistency.  The
duplicate information allows systems without complete information to function
better.

~~~~~~~

Metadata is enclosed in a <musicml> element as shown below.  The persistent
URL needs to be registered once we're done thrashing on names.

<?xml version="1.0" encoding="ISO-8859-1"?>
<musicml xmlns="http://www.purl.org/net/musiclml/elements/1.1">
       metadata goes here
</musicml>

~~~~~~~

One or more "chunks" of metadata go inside the <musicml> element.  Metadata
chunks are enclosed by the <metadata> element as follows:

<metadata language="language-code">
       chunk information
</metadata>

The language attribute value is the national language used in the chunk.
If unspecified it defaults to en-us.
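
As a rough sketch of the default, here is how a reader might pull the language
out of each chunk using Python's standard xml.etree.ElementTree.  The function
name chunk_languages and the sample document are made up for illustration, and
the namespace is copied from the <musicml> example above, which is still a
placeholder until the persistent URL is registered.

```python
import xml.etree.ElementTree as ET

# Namespace copied verbatim from the <musicml> example; not yet registered.
MUSICML_NS = "http://www.purl.org/net/musiclml/elements/1.1"
DEFAULT_LANGUAGE = "en-us"


def chunk_languages(xml_text):
    """Return the language of each <metadata> chunk, in document order."""
    root = ET.fromstring(xml_text)
    chunks = root.findall(f"{{{MUSICML_NS}}}metadata")
    # A chunk without a language attribute falls back to en-us.
    return [chunk.get("language", DEFAULT_LANGUAGE) for chunk in chunks]


# Hypothetical document: one German chunk, one chunk with no language.
SAMPLE = f"""
<musicml xmlns="{MUSICML_NS}">
    <metadata language="de"></metadata>
    <metadata></metadata>
</musicml>
"""

print(chunk_languages(SAMPLE))
```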

Note that xml:lang attributes are ignored for the most part.  If one
exists on an element in the chunk information it means that the element
value was intentionally specified in a particular language and should
not be translated.  A good example is the German version of The Beatles'
"I Want To Hold Your Hand"; the song name is "Komm, Gib Mir Deine Hand".

~~~~~~~

The chunk organization allows factual information to be separated from
licensable content, and allows different versions of the same content
available under different licenses to be associated with each other and
with the factual information.

~~~~~~~

A metadata chunk looks like this:

<metadata language="language-code">
       <publisher catalog="string" identifier="string" additional="string" description="string"
        type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean" created="timestamp">
               string that can contain name elements
       </publisher>

       <id catalog="string" identifier="string" additional="string" description="string" type="string"/>

       <name catalog="string" identifier="string" additional="string" description="string"
        type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean" variation="string">
               string that can contain name elements
       </name>

       <creator catalog="string" identifier="string" additional="string" description="string"
        type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
               string that can contain name elements
       </creator>

       <event catalog="string" identifier="string" additional="string" description="string"
        type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
               string that can contain name elements
       </event>

       <place catalog="string" identifier="string" additional="string" description="string"
        type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
               string that can contain name elements
       </place>

       <contributor>
               <name catalog="string" identifier="string" additional="string" description="string"
                type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
                       string that can contain name elements
               </name>

               <contribution catalog="string" identifier="string" additional="string" description="string"
                type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean" style="string">
                       string that can contain name elements
               </contribution>
       </contributor>

       <gain-adjust> floating-point </gain-adjust>

       <play-time> floating-point </play-time>

       <date description="string" type="string" show="string"> string </date>

       <note start="floating-point" end="floating-point" description="string" type="string"
        auto-translated="boolean" auto-generated="boolean" sample-start="floating-point"
        sample-end="floating-point" catalog="string" identifier="string" additional="string">
               string that can contain XHTML and also the following MusicML elements:
                       creator, date, event, id, place, name, contributor, contribution, license
       </note>

       <license catalog="string" identifier="string" additional="string" description="string"
        type="string" sort="string" auto-translated="boolean" auto-generated="boolean" the="boolean">
               string
       </license>

       <fingerprint catalog="string" identifier="string" additional="string"> string </fingerprint>

       <lineage description="string" medium="string">
               string that can contain XHTML and also the following MusicML elements:
                       creator, date, event, id, place, name, contributor, contribution, license
       </lineage>

       <child part="string" group="string" sequence="integer" catalog="string" identifier="string" additional="string">
               string
       </child>

       <parent catalog="string" identifier="string" additional="string" description="string" type="string"> string </parent>
</metadata>

~~~~~~~

Each piece of licensable information should be in a separately published
metadata chunk.  This allows all of the metadata for a particular set of
ids to be kept together while keeping the license information separate.

~~~~~~~

Most elements have catalog, identifier, and additional attributes.  Just about all
information has a unique identifier in one or more catalogs.

Each chunk of metadata must have at least one catalog/identifier specified by an
<id> element.

An example of a catalog and identifier is "Parlophone" "PMC 1202".

MusicML maintains its own set of catalogs and identifiers out of necessity.
While it is not necessary to use the MusicML catalogs, all metadata in our
system must have MusicML catalogs and identifiers.  The MusicML catalogs
have rules for identifier construction.

MusicML includes a catalog that contains information on catalogs including
how to map identifiers in one catalog to another, for example between an
identifier in a MusicML catalog and an identifier in the Parlophone catalog.

MusicML catalogs things down to the "atomic" level.  For example, there is an
identifier in a catalog for each song on an album.  This level of detail does
not exist in any freely available catalogs.  The Parlophone example above
references the British monaural release of "Please Please Me".  But, there is
no way to describe track 2, "Misery", using the catalog/identifier pair
"Parlophone" "PMC 1202".

That's where the additional attribute comes into play.  This track would be
defined as
       <id catalog="Parlophone" identifier="PMC 1202" additional="2"/>
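
To make the catalog/identifier/additional triple concrete, here is a minimal
sketch that reads the <id> elements out of a chunk and enforces the
"at least one id" rule.  The function name chunk_ids is hypothetical, and the
chunk is written without a namespace to keep the example short.

```python
import xml.etree.ElementTree as ET


def chunk_ids(chunk_xml):
    """Return (catalog, identifier, additional) for each <id> in a chunk."""
    chunk = ET.fromstring(chunk_xml)
    ids = [
        (el.get("catalog"), el.get("identifier"), el.get("additional"))
        for el in chunk.findall("id")
    ]
    if not ids:
        raise ValueError("a metadata chunk needs at least one <id> element")
    return ids


# Track 2 of the British monaural "Please Please Me", as in the text above.
CHUNK = """
<metadata>
    <id catalog="Parlophone" identifier="PMC 1202" additional="2"/>
</metadata>
"""

print(chunk_ids(CHUNK))
```

An <id> without an additional attribute comes back with None in the third
slot, which is one way to tell "the whole album" from "a part of it".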

~~~~~~~

Most elements have auto-translated, auto-generated, the, sort, description,
and type attributes.

       auto-translated means that the content was automatically translated
                       from another language.

       auto-generated  means that the content was automatically generated
                       and probably needs human review.

       the             means that a "the" should prefix the value in a
                       sentence.

       sort            is a string used to sort the value in lists

       description     says what the value is, like a person, band, city, etc.
                       These values come out of controlled vocabularies.

       type            says what type of the description it is, like a birth
                       name, official name, alias, etc.  These values come
                       out of controlled vocabularies.

~~~~~~~

A publisher is defined using the <publisher> element.  A publisher is an
entity that is providing the metadata in the chunk.

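The created attribute is the date and time of publication in UTC.  It's in
UTC because MySQL doesn't handle timezone stuff easily.

       <publisher created="timestamp" c/i/a a/a/t/s/d/t> xhtml </publisher>

Here c/i/a is shorthand for the catalog, identifier, and additional attributes,
and a/a/t/s/d/t for the auto-translated, auto-generated, the, sort, description,
and type attributes.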

Here's an overly complete example:

       <publisher created="2007-02-23 01:20:07" catalog="MusicML/People"
        identifier="Jonathan_Eliot_Steinhart" description="person" type="birth name">

               <name catalog="MusicML/People" identifier="Jonathan"
                description="person" type="first name">Jonathan</name>

               <name catalog="MusicML/People" identifier="Eliot"
                description="person" type="middle name">Eliot</name>

               <name catalog="MusicML/People" identifier="Steinhart"
                description="person" type="last name">Steinhart</name>
       </publisher>

Note that it would be fine to just have

       <publisher created="2007-02-23 01:20:07" catalog="MusicML/People"
        identifier="Jonathan_Eliot_Steinhart" description="person" type="birth name">
               Jonathan Eliot Steinhart
       </publisher>
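
A small sketch of reading a publisher element: it flattens the value to a
display name whether or not nested <name> elements are used, and parses the
created attribute as UTC.  The function name read_publisher is made up, and the
timestamp format is an assumption inferred from the example value
"2007-02-23 01:20:07".

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timezone


def read_publisher(publisher_xml):
    """Return (display name, created as an aware UTC datetime)."""
    el = ET.fromstring(publisher_xml)
    # itertext() also walks nested <name> elements; split/join collapses
    # the indentation whitespace between them.
    display = " ".join("".join(el.itertext()).split())
    # Assumed format, taken from the example timestamp in this spec.
    created = datetime.strptime(el.get("created"), "%Y-%m-%d %H:%M:%S")
    return display, created.replace(tzinfo=timezone.utc)


NESTED = """
<publisher created="2007-02-23 01:20:07" catalog="MusicML/People"
 identifier="Jonathan_Eliot_Steinhart" description="person" type="birth name">
    <name description="person" type="first name">Jonathan</name>
    <name description="person" type="middle name">Eliot</name>
    <name description="person" type="last name">Steinhart</name>
</publisher>
"""

FLAT = """
<publisher created="2007-02-23 01:20:07" catalog="MusicML/People"
 identifier="Jonathan_Eliot_Steinhart" description="person" type="birth name">
    Jonathan Eliot Steinhart
</publisher>
"""

print(read_publisher(NESTED))
print(read_publisher(FLAT))
```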

~~~~~~~

Every chunk of metadata has a publisher element and one or more id elements.

~~~~~~~

A chunk of metadata is published by some entity (the publisher) about some thing
(the id) at an instant in time (created).

There are a bunch of rules about publishers and ids.

 o  A publisher can publish metadata about one or more ids in a chunk.

 o  A publisher can update metadata for any ids for which it has published
   metadata by having a newer created timestamp.  Publishing older data
   has no effect.

 o  When a publisher updates metadata, the effect is as if all prior
   metadata for those ids by that publisher is deleted, and then the
   new metadata is entered.  This means that if a publisher publishes
   A, B, and C about some ids, and then updates it only publishing
   information about A and C, the information B will vanish.

 o  When a publisher publishes metadata about more than one id in the
   same chunk, these ids are treated as equivalent.

 o  Publishing metadata that conjoins unrelated ids is considered unsporting.
   For example, publishing a chunk of metadata that links ids for two
   different songs.  Nothing in the metadata format can prevent this, but
   publishers who try this may find their submissions rejected, at least
   from our system.  In order to make this work, all metadata submitted
   to our database must include an identifier in a MusicML catalog that
   does not map to any other identifier in a MusicML catalog.
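
The update rules above can be sketched with a toy in-memory store.  This is
only an illustration, not the real database: the class name ChunkStore and its
methods are hypothetical, ids are plain (catalog, identifier) tuples, a chunk
with an equal timestamp is treated like an older one, and the "ids in one chunk
are equivalent" rule is not modeled.

```python
from datetime import datetime


class ChunkStore:
    """Toy in-memory model of the publisher/id update rules."""

    def __init__(self):
        # (publisher, id) -> (created, metadata dict)
        self._records = {}

    def publish(self, publisher, ids, created, metadata):
        """Apply one chunk; return True if it was accepted."""
        stamp = datetime.strptime(created, "%Y-%m-%d %H:%M:%S")
        keys = [(publisher, id_) for id_ in ids]
        # Assumption: the chunk is ignored unless it is strictly newer than
        # everything this publisher already published for any of its ids.
        for key in keys:
            if key in self._records and self._records[key][0] >= stamp:
                return False
        # Replace wholesale: anything not republished vanishes.
        for key in keys:
            self._records[key] = (stamp, dict(metadata))
        return True

    def lookup(self, publisher, id_):
        """Return the current metadata dict, or None if nothing is stored."""
        record = self._records.get((publisher, id_))
        return None if record is None else record[1]


store = ChunkStore()
song = ("MusicML/Titles/Songs", "Garcia_Hunter/The_Wheel")
store.publish("pub", [song], "2007-02-23 01:20:07", {"A": 1, "B": 2, "C": 3})
store.publish("pub", [song], "2007-03-01 00:00:00", {"A": 10, "C": 30})
print(store.lookup("pub", song))  # B is gone after the update
```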

These rules are awkward if a publisher wants to publish more than one copy of
the same content under different licenses.  The way to handle this is to use
the "additional" attribute on the publisher element.  The publisher essentially
defines a namespace using the catalog/identifier, and can then use the additional
value in that namespace to distinguish among published instances.

~~~~~~~

Our web site will most likely include information from all publishers.  We
won't host their information if we don't like it for some reason.  Media
players that use our software should include configuration information that
lets users select which information from which publishers is preferred.

This configuration also allows end users to access information purchased
from third parties.  We wouldn't host this information since it isn't ours.

~~~~~~~

<id> element

This assigns identifiers to metadata objects.

~~~~~~~

<name> element

This specifies the name of a metadata object.  The variation attribute is used to
call out variations of objects that are otherwise identical.  For example,

       <name catalog="MusicML/Titles/Songs" identifier="Al_Green/Take_Me_To_The_River"
        description="song" type="official">
               Take Me To The River
       </name>
and
       <name catalog="MusicML/Titles/Songs" identifier="Al_Green/Take_Me_To_The_River"
        description="song" type="official" variation="instrumental">
               Take Me To The River
       </name>

Objects can have more than one name.  The definition of a person could include
both their birth name and their stage name:

       <name catalog="MusicML/People" identifier="Bob_Dylan" description="person" type="stage name"> Bob Dylan </name>
       <name catalog="MusicML/People" identifier="Bob_Dylan" description="person" type="birth name"> Robert Zimmerman </name>
       <name catalog="MusicML/People" identifier="Bob_Dylan" description="person" type="alias"> Raspy Bob </name>

Users can personalize their experience by publishing their own aliases:

       <name catalog="MusicML/Titles/Songs" identifier="Garcia_Hunter/The_Wheel" description="song" type="official">
               The Wheel
       </name>

       <name catalog="MusicML/Titles/Songs" identifier="Garcia_Hunter/The_Wheel" description="song" type="alias">
               The Meal Is Burning And It Won't Go Out
       </name>

~~~~~~~

<creator> element

This is just like the name element without a variation attribute.
It specifies the name of the creator of the object such as a band.

~~~~~~~

<event> element

This element specifies the name of the event that included the object.
Things like Woodstock, AIDS Benefit, Bill Graham Memorial.  Most objects
do not have events.

~~~~~~~

<place> element

This specifies the place at which the object was created.  Things like
Radio City Music Hall, San Francisco.  Zero or more place elements are
permitted.

~~~~~~~

<contributor> element

Contributors are (usually) people involved in the creation of the object.
The contributor element associates a name with zero or more contributions.
The name is optional; we may know that a trumpet was played without knowing
the identity of the player.  Likewise, we may know that someone contributed
without knowing their contribution.

The name may be a band instead of a person.

The style attribute on the contribution sub-element indicates variations on
a mode of contribution.  For example, "slide" could be a variation on playing
guitar.

All contributions by a particular contributor must be enclosed by the same
<contributor> element.
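
Here is a minimal sketch of reading contributors that copes with a missing
name and with zero or more contributions.  The function name read_contributors
and the sample chunk (including the name "Jane Example") are hypothetical.

```python
import xml.etree.ElementTree as ET


def _flat_text(element):
    """Collapse an element's text, including nested elements, to one line."""
    return " ".join("".join(element.itertext()).split())


def read_contributors(chunk_xml):
    """Return a list of (name or None, [(contribution, style or None), ...])."""
    chunk = ET.fromstring(chunk_xml)
    result = []
    for contributor in chunk.findall("contributor"):
        name_el = contributor.find("name")
        name = None if name_el is None else _flat_text(name_el)
        contributions = [
            (_flat_text(c), c.get("style"))
            for c in contributor.findall("contribution")
        ]
        result.append((name, contributions))
    return result


CHUNK = """
<metadata>
    <contributor>
        <name>Jane Example</name>
        <contribution style="slide">guitar</contribution>
        <contribution>vocals</contribution>
    </contributor>
    <contributor>
        <contribution>trumpet</contribution>
    </contributor>
</metadata>
"""

for name, contributions in read_contributors(CHUNK):
    print(name, contributions)
```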

~~~~~~~

<gain-adjust> element

This is the amount to adjust the volume of a track or collection of tracks
in order to have a consistent perceived loudness.  It should be calculated
using the flac replay gain.  The album gain is used for objects that contain
more than one track; the track gain for objects with a single track.

~~~~~~~

<play-time> element

Length in seconds of the audio/video.  Should only exist for simple items
(tracks), not composite items such as albums and movies, as it can be
calculated from the referenced items.

~~~~~~~

<date> element

Specifies the date and time.  Partial values are accepted, such as 2007-04.
The show attribute contains things like "late show", "afternoon performance",
etc.  It would be best to include the times in the element values, but that
information isn't commonly available.
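
A sketch of accepting partial values, assuming dates are written as YYYY,
YYYY-MM, or YYYY-MM-DD.  The function name parse_partial_date is made up, and
the time-of-day portion is deliberately not handled here.

```python
import re

# Assumption: year, optionally followed by month, optionally followed by day.
_PARTIAL_DATE = re.compile(r"^(\d{4})(?:-(\d{2})(?:-(\d{2}))?)?$")


def parse_partial_date(value):
    """Return (year, month, day); parts that were not given are None."""
    match = _PARTIAL_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"unrecognized date: {value!r}")
    return tuple(None if part is None else int(part) for part in match.groups())


print(parse_partial_date("2007-04"))
print(parse_partial_date(" 1963-03-22 "))
```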

The type attribute can provide a short description.  This is intended for
albums where the same content is released on different dates.  For example:

<metadata>
       <id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
       <name catalog="MusicML/Titles/Albums" identifier="The_Beatles/Please_Please_Me" description="album" type="official">
               Please Please Me
       </name>
</metadata>

<metadata>
       <id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
       <id catalog="Parlophone" identifier="PMC 1202" description="audio" type="album"/>
       <lineage description="studio" medium="LP"/>
       <date description="released" type="British monaural release"> 1963-03-22 </date>
</metadata>

<metadata>
       <id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
       <id catalog="Parlophone" identifier="PCS 3042" description="audio" type="album"/>
       <lineage description="studio" medium="LP"/>
       <date description="released" type="British stereophonic release"> 1963-03-22 </date>
</metadata>

<metadata>
       <id catalog="MusicML/Audio/Collections" identifier="The_Beatles/Please_Please_Me" description="audio" type="album"/>
       <id catalog="Parlophone" identifier="CDP 7 46435 2" description="audio" type="album"/>
       <lineage description="studio" medium="CD"/>
       <date description="released" type="re-release"> 1987-02-26 </date>
</metadata>

~~~~~~~

<license> element

Describes a license.  The value can be a short description of the license.

~~~~~~~

<fingerprint> element

Contains a fingerprint for an object.

~~~~~~~

<lineage> element

Defines the origin of the object.

Description is studio/soundboard/audience

Medium is LP/CD/Cassette/DAT/DCC/DVD/MD/SACD/Single/EP and is omitted
for digital bits that were never officially released on any medium.

The value describes the lossy chain, and is omitted for most studio
stuff as it's unavailable.  The XHTML allows pieces of equipment to
be referenced.

~~~~~~~

<child> element

Lists components of the object, like songs on an album.

The part is stuff like Set 1, Set 2, Encore, Side A, etc.

The group defines a group of included elements of the same type
such as the audio tracks or the album artwork images.

The sequence is the ordering of the included elements.  Sequence
numbers are interpreted with respect to named groups.  If they are
unique they define the ordering in that named group.  For example,
an album may have a group that is named "tracks" which has the
songs in order.  It may also have another group which is "booklet"
which has the pages in the booklet included with the album in order.
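
The group/sequence idea can be sketched as follows.  The function name
children_by_group is hypothetical, every child is assumed to carry a sequence
attribute, and the element value is treated as a simple label since the spec
only says it is a string.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def children_by_group(chunk_xml):
    """Map each group name to its children's text, ordered by sequence."""
    chunk = ET.fromstring(chunk_xml)
    groups = defaultdict(list)
    for child in chunk.findall("child"):
        sequence = int(child.get("sequence"))
        label = (child.text or "").strip()
        groups[child.get("group")].append((sequence, label))
    # Sequence numbers only order children within their own group.
    return {name: [label for _, label in sorted(items)] for name, items in groups.items()}


CHUNK = """
<metadata>
    <child group="tracks" sequence="2">Misery</child>
    <child group="booklet" sequence="1">Front cover</child>
    <child group="tracks" sequence="1">I Saw Her Standing There</child>
</metadata>
"""

print(children_by_group(CHUNK))
```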

~~~~~~~

<parent> element

Lists objects that encompass this object.  For example, a
state may list its country as a parent.

This is not used for songs on albums and stuff like that.  It's
used for places, musical instrument and equipment relationships,
personal relationships, roots of songs, etc.

The description and type attributes describe the relationship.
These should match the parent object description and type for
direct relationships, like that of a city to a county.  They
can also be used for indirect relationships like father and mother
and agent.

~~~~~~~

<note> element

The note element collects all sorts of things.  Notes have optional
start time and end time attributes.  The note applies to the entire
object if these are omitted.  A missing start defaults to the end; a
missing end defaults to the start.  If both are missing then start
defaults to 0 and end defaults to the length of the object.  An
empty start defaults to 0, an empty end defaults to the length of the
object.
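
The start/end defaults are easy to get wrong, so here is one possible reading
of them in code.  The function name resolve_note_span is made up; None stands
for an attribute that is absent and "" for one that is present but empty.

```python
def resolve_note_span(start, end, length):
    """Apply the start/end defaults for a <note>.

    None means the attribute is absent; "" means it is present but empty.
    """
    # Both missing: the note covers the whole object.
    if start is None and end is None:
        return 0.0, float(length)
    # Empty attributes take the extreme values.
    if start == "":
        start = 0.0
    if end == "":
        end = float(length)
    # A missing attribute copies the other one, giving an instant in time.
    if start is None:
        start = end
    if end is None:
        end = start
    return float(start), float(end)


print(resolve_note_span(None, None, 180))
print(resolve_note_span("12.5", None, 180))
print(resolve_note_span("", "40", 180))
```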

Factual notes have their content as the element value.  Notes reference
external content when a non-fact license applies to that content.  This
content is located using the catalog/identifier/additional attributes.

The actual display of the content depends on the license for the metadata
chunk containing the note.  The element value for referenced content should
be a URI that can be used for reasonable attribution, such as "Image courtesy
of <a ...> foo.com </a>" or "Review courtesy of <a ...> bar.com </a>".  The
value should just be the URI; we can generate the attribution text automatically.
The attribution is displayed regardless of whether or not the license allows
the content to be displayed inline, and provides a link to the content provider.
This has to be done carefully.  If the reference is to an image, the attribution
should refer to some page containing the image.  Likewise with a review, if the
reference is specific to just the review text, the attribution should point
to some page that contains that text.

The following sections describe the behavior for different descriptions.

altitude        Altitude for places.

defect          Describes problems resulting from the recording chain,
               such as "left channel drops out".

general         General information about the object.

genre           Defines a musical genre, whatever that is.  Also used
               for movie genres and image genres.

key             Defines a musical key, which is ABCDEFG optionally
               followed by # for sharp or b for flat, optionally
               followed by the word major or minor.  A missing
               type defaults to the Western musical system.  Other
               values may be defined someday.

latitude        Latitude for places.

license         Code that implements the behavior of different licenses.
               Things like in-line, reference, excerpt, thumbnail, etc.

longitude       Longitude for places.

middle-c        The frequency in Hertz of middle C for the object.

patch           A patch to the data from a different source than the
               object lineage.

performer       This indicates a guest artist.  Guest artists should
               also be contributors.  The element value should be
               something readable in the language, such as
               "With Branford Marsalis on saxophone."

post            There are two types: intro and outro.  No value is
               needed.  The intro is the moment in time when vocals
               start; the outro is the moment in time immediately
               after the vocals end.

postal-code     The value is a postal code.  This note description is
               only used when providing information about a place.

review          A review of the object.

sample          A reference to another piece of content, such as
               snippets used for hip-hop mixes.  The sample-start
               and sample-end are the start and end of the referenced
               content.  The catalog/identifier/additional reference
               the content.  The element value is a short description
               such as "played backwards and half speed."

               Work is needed here to specify sampling images and video.

street-address  The value is a street address.  This note description is
               only used when providing information about a place.

tempo           The tempo in beats per minute.

time-signature  The time signature as two slash-separated numbers.  The
               first is number of beats per measure, the second is the
               type of note that gets one beat.  This is standard
               Western notation, which is the default if there is no
               type attribute.  Other systems might eventually get
               defined.

transcription   A textual representation of the content.

transition      The change from one track to another.  Can be used for
               changes internal to a track but that's less useful.
               There are three transition types: audio, conceptual,
               and visual.  Audio values are audience, cut, fade, segue,
               silence.  Conceptual values are related and unrelated.
               Visual values are blend, clean, cut, fade, segue.

xslt            Specific to catalog definitions.  The value is XSLT code
               that translates identifiers in the catalog.
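
Two of the descriptions above, key and time-signature, have values with enough
structure to check mechanically.  The sketch below covers only the default
Western system; the function names are hypothetical, and the single space
between the note name and "major"/"minor" is an assumption since the text does
not name a separator.

```python
import re

# Assumption: one space separates the note name from "major"/"minor".
_KEY = re.compile(r"^([A-G])([#b]?)(?: (major|minor))?$")
_TIME_SIGNATURE = re.compile(r"^(\d+)/(\d+)$")


def parse_key(value):
    """Return (note, accidental, mode); accidental may be '', mode may be None."""
    match = _KEY.match(value.strip())
    if match is None:
        raise ValueError(f"not a Western key: {value!r}")
    return match.group(1), match.group(2), match.group(3)


def parse_time_signature(value):
    """Return (beats per measure, note value that gets one beat)."""
    match = _TIME_SIGNATURE.match(value.strip())
    if match is None:
        raise ValueError(f"not a time signature: {value!r}")
    return int(match.group(1)), int(match.group(2))


print(parse_key("F# minor"))
print(parse_time_signature("6/8"))
```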

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       Controlled Vocabularies
       ~~~~~~~~~~~~~~~~~~~~~~~

The controlled vocabularies cover three areas:

 1.  Names of catalogs controlled by the Metome project,

 2.  The structure of identifiers in certain Metome catalogs, and

 3.  Description and type attribute values for MusicML elements.

~~~~~~~~~~~~~

MusicML includes a bunch of catalogs.  These are not really part of the XML
specification, but knowing about them makes the rest easier to understand.
All of these catalog names begin with "MusicML/"; nobody else is allowed to
use this prefix.

TODO EEEK!  Should probably change this from MusicML to metome.

Identifiers in MusicML catalogs are ASCII strings with punctuation characters
removed and spaces converted to underscores.  This makes it easy to access
things using shell commands.  Concatenating a catalog, '/', and an identifier
makes a file name.  Cheesy but effective.
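
A minimal sketch of that rule, with two assumptions labeled in the comments:
hyphens are kept (the examples in this spec keep them) and accented letters are
folded to their ASCII base letter.  The function names make_identifier and
file_name are made up for illustration.

```python
import string
import unicodedata

# Assumption: hyphens survive, since examples in this spec such as
# "Ile-de-France" and "1977-12-30-etree-30624" keep them.
_DROPPED = "".join(c for c in string.punctuation if c not in "-_")


def make_identifier(text):
    """Turn a display string into a MusicML-style identifier component."""
    # Assumption: accented letters are folded to their ASCII base letter.
    ascii_text = (
        unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    )
    cleaned = ascii_text.translate(str.maketrans("", "", _DROPPED))
    return "_".join(cleaned.split())


def file_name(catalog, identifier):
    """Concatenate catalog, '/', and identifier into a relative path."""
    return f"{catalog}/{identifier}"


print(make_identifier("A Hard Day's Night"))
print(file_name("MusicML/People", make_identifier("Bob Dylan")))
```

Note that make_identifier also strips '/', so it should be applied to one
component at a time rather than to a whole structured identifier.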

Some of the MusicML identifiers have a structure that provides additional
information.  More on this in a bit.

There are cases where identifiers could conflict.  For example, if there
are two people named "John Smith".  The system appends distinguishing
characters as needed, so the first one would be John_Smith and the second
would be John_Smith_1.  This does not happen automatically and requires the
services of a community editor sort of person.
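
Since a person makes the final call, the most code can do is suggest the next
free identifier.  The helper below is a hypothetical sketch; continuing the
pattern past _1 with _2, _3, and so on is an assumption.

```python
def suggest_identifier(base, existing):
    """Return base, or base_1, base_2, ... if base is already taken."""
    if base not in existing:
        return base
    suffix = 1
    while f"{base}_{suffix}" in existing:
        suffix += 1
    return f"{base}_{suffix}"


taken = {"John_Smith"}
print(suggest_identifier("John_Smith", taken))
```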

The additional attribute is used sparingly in MusicML catalogs.  Right now
it's only used for distinguishing different items by the same publisher as
discussed above.

MusicML/People          Used for names of people.  Identifiers are made
                       by concatenating the components of the person's name.

MusicML/Corporations    Used for names of companies.  Identifiers are made
                       by concatenating the components of the company
                       name.  For example, Lulu_Incorporated.

MusicML/Organizations   Used for names of organizations such as Internet_Archive.

MusicML/Labels          Used for names of record labels.

MusicML/Titles/Songs    Used for song titles.  Identifiers are the song author
                       name followed by a slash followed by the song name, such
                       as "Noah_Lewis/Minglewood_Blues".  If there is more than
                       one author then the name portion is the last names of the
                       authors, such as "Leiber_Stoller/Kansas_City".

MusicML/Titles/Albums   Used for album titles.  Identifiers are the creator (band)
                       followed by the album title, such as "The_Beatles/Please_Please_Me".
                       Some creativity is required when a band releases different albums
                       with the same name, such as A Hard Day's Night by The Beatles.

MusicML/Titles/Concerts Used for the names of concerts unreleased on albums.  Identifiers
                       are the creator (band) followed by the concert title followed by
                       the concert date and some other identifier such as an etree number.
                       For example: "The_Grateful_Dead/1977-12-30-etree-30624".

MusicML/Audio/Tracks    Used for individual audio tracks.  Identifiers are either album
                       or concert identifiers followed by a sequence number and song
                       name, such as "The_Beatles/Please_Please_Me/01-Misery".

MusicML/Audio/Collections Used for collections of audio tracks such as albums and concerts.

MusicML/Places          Used for geographic names.  Identifiers are of the form
                       Planet/Continent/Country/State/County/City/Venue.  The
                       general form is taken from the Getty Thesaurus of
                       Geographic Names.  State/County/City is mapped as appropriate
                       to the rules of the country.  If any of the three levels does
                       not exist, it is left blank, for example

                       Earth/Europe/France/Ile-de-France//Paris/Pathe_Marconi_Studios

MusicML/Events          Used for event names.

MusicML/Instruments     Used for musical instruments.  The form is based on the
                       Hornbostel-Sachs taxonomy and still needs work.

                       Chordophones/Plucked/Guitar/Electric_Guitar/Guild/Starfire

MusicML/Roles           Used for non-musical contributions like author, engineer,
                       producer, grip.  Needs work.

MusicML/Licenses        Used for the names of different content licenses.  Currently
                       defined identifiers include Facts, Strictly_Commercial, and
                       The_Grateful_Dead.

MusicML/Fingerprints    Used to keep track of content fingerprints.  There are
                       several subcatalogs:

                       MusicML/Fingerprints/MD5/FLAC contains fingerprints of
                       lossless flac files.

                       MusicML/Fingerprints/MD5/SHN contains fingerprints of
                       lossless shorten files.

                       MusicML/Fingerprints/MD5/WAV contains fingerprints of
                       lossless wav files.

                       TODO maybe these should be md5s of the wavs and the
                       subcatalog should be the sampling rate.  Or do we just
                       want the originally distributed material?

                       MusicML/Fingerprints/PUID contains MusicDNS puids.

MusicML/Bands           Contains band names.

MusicML/Equipment       Contains names of things like recording equipment.
                       An organization is needed.

MusicML/Images          For images.  There are lots of sub-catalogs including
                       Concert_Tickets, Backstage_Passes, Posters, Laminates,
                       Album_Covers.

MusicML/Titles/Movies   Movie titles.

MusicML/Catalogs        Information about catalogs.

MusicML/URI             Identifiers in this catalog are URIs.
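
Two of the structured identifier forms above, places and song titles, can be
sketched as small helpers.  Both function names are hypothetical.  The song
helper assumes a person's last name is the final word of their name, and
neither helper strips punctuation.

```python
def place_identifier(planet, continent, country, state="", county="", city="", venue=""):
    """Join the seven levels with '/', leaving missing levels blank."""
    return "/".join([planet, continent, country, state, county, city, venue])


def song_identifier(authors, title):
    """Build Author/Title, using last names when there are several authors."""
    names = [author.split() for author in authors]
    if len(names) == 1:
        author_part = "_".join(names[0])
    else:
        # Assumption: the last word of each name is the last name.
        author_part = "_".join(name[-1] for name in names)
    title_part = "_".join(title.split())
    return f"{author_part}/{title_part}"


print(place_identifier("Earth", "Europe", "France", "Ile-de-France",
                       city="Paris", venue="Pathe_Marconi_Studios"))
print(song_identifier(["Noah Lewis"], "Minglewood Blues"))
print(song_identifier(["Jerry Garcia", "Robert Hunter"], "The Wheel"))
```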

~~~~~~~

OK, most things in MusicML have a description and a type.  Don't know if
this is the best choice of terms but it seems to work.

       description:    This is the category into which an object falls.

       type:           This is the subdivision of the category.

All MusicML objects have an identifier.  For starters, description and type
values for object identifiers are below.  Keep in mind that these values are
for the object itself, not for other things that may get hung on the object
such as names.  Oh yeah, by identifier I mean a catalog/identifier/additional
packaged into an <id> element.  If you think that terms are overloaded now
wait until we get into the database!  Note that all equivalent identifiers
must have the same description and type.

TODO where do tours and collections fit in?

collection      tour, playlist

Description     Type                    Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

audio ................................. Pure audio content.

               album                   A commercial or bootleg album.  This is independent of
                                       the medium (LP, CD, etc.) which we'll get to later.
                                       This includes short versions of albums like EPs and
                                       singles.

               concert                 This is a recording of a performance that has not been
                                       released on an album.  Note that there are plenty of
                                       cases where concert recordings and commercial albums are
                                       available for the same performance.

               track                   A single indivisible audio item, such as a song, poem,
                                       newscast, etc.

video ................................. Video content that includes an audio track.
silent ................................ Video content with no audio track.

               movie                   A complete piece of video/silent.

               episode                 One of a series of related movies/silents.

               scene                   A logical division of a movie/silent.

image ................................. A still image.

               photo                   A photograph.

               poster                  Concert poster.

               laminate                Backstage laminate.

               concert ticket          Ticket to a concert.

               backstage pass          Backstage pass.

               album cover             Album cover art.

               TODO jeremy will get more stuff

print ................................. Printed media.

               book                    These should be self-explanatory.  Some library research is needed here.
               journal
               periodical
               magazine

text .................................. Online text content.

               opinion                 An opinion piece about something.

place ................................. A physical place.  Might also be for virtual places.

               planet                  These are all self-explanatory.  There may need to be some
               continent               additions for countries and states that use different
               country                 geographic divisions.
               state
               county
               city

               venue                   A performance venue.

event ................................. A named event, such as "Woodstock" or "Merlefest".

               festival                A music or film festival.

               benefit                 A fund raiser.

catalog ............................... A catalog of identifiers external to us.

               record label            A record company catalog.

               trading site            A content trading web site.

               government              A government catalog like the Copyright Office.

               industry                An industry catalog like ISBN.

equipment ............................. A piece of equipment.

               musical instrument      All of these are just what they say.
               amplifier
               microphone
               cable
               recorder
               tape
               preamp
               converter
               power supply
               mixing board
               camera
               lens
               software program

song    TODO

date .................................. A calendar date.

person ................................ A person.

group ................................. A group of people.

               band                    A group of people playing music together.

               performance             A group of people doing things like juggling.

meeting ............................... A company meeting.

               normal                  Normal alcohol consumption.

               Belgian                 A lulu of a meeting.

~~~~~~~~~~~~~
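
The table above amounts to a controlled vocabulary, so a tool could check a
description/type pair before loading it.  A minimal sketch, assuming that an
empty type is acceptable for any known description (the notes don't say) and
covering only a few rows of the table; OBJECT_TYPES and check_object_kind are
made-up names for illustration:

```python
# Partial copy of the object description/type table; not the full vocabulary.
OBJECT_TYPES = {
    "audio": {"album", "concert", "track"},
    "video": {"movie", "episode", "scene"},
    "silent": {"movie", "episode", "scene"},
    "group": {"band", "performance"},
}


def check_object_kind(description, type_=""):
    """Return True if the pair appears in the partial table above.

    Assumption: a known description with no type is accepted.
    """
    if description not in OBJECT_TYPES:
        return False
    return type_ == "" or type_ in OBJECT_TYPES[description]
```

Note that video and silent share the same types, so the type alone does not
identify the description.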

Description and type values for other elements:

Element         Description     Type                    Definition
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
publisher ............................................. Publisher of metadata.

               person ................................ Name of a person.
                               (none) ................ Unsure of name.
                               birth name ............ What it says on their birth certificate.
                               stage name ............ Name that they use as a performer.
                               alias ................. Another name for the person.

               corporation ........................... Name of a company.
                               (none) ................ Nothing special.
                               alias ................. Another name for the company.

               organization .......................... Name of an organization.
                               (none) ................ Nothing special.
                               alias ................. Another name for the organization.

               web site .............................. A web site.
                               (none) ................ Nothing special.
                               fan ................... A site run by fans of something.
                               trading ............... A media trading site.
                               alias ................. Another name for the site.

name .................................................. Name of something.

               person ................................ Name of a person.
                               (none) ................ Unsure of name.
                               birth name ............ What it says on their birth certificate.
                               stage name ............ Name that they use as a performer.
                               alias ................. Another name for the person.

               corporation ........................... Name of a company.
                               (none) ................ Nothing special.
                               alias ................. Another name for the company.

               organization .......................... Name of an organization.
                               (none) ................ Nothing special.
                               alias ................. Another name for the organization.

               web site .............................. A web site.
                               (none) ................ Nothing special.
                               fan ................... A site run by fans of something.
                               trading ............... A media trading site.
                               alias ................. Another name for the site.

               band .................................. Musical group.
                               (none) ................ Nothing special.
                               alias ................. Another name for the band.

               performance group ..................... Non-musical group.
                               (none) ................ Nothing special.
                               alias ................. Another name for the group.

               planet ................................ Geographical stuff.
               continent ............................. Geographical stuff.
               country ............................... Geographical stuff.
               state ................................. Geographical stuff.
               county ................................ Geographical stuff.
               city .................................. Geographical stuff.
               venue ................................. Geographical stuff.
                               (none) ................ Nothing special.
                               alias ................. Another name for the geographical thing.

               song .................................. Name of a song.
                               (none) ................ Nothing special.
                               alias ................. Another name for the song.

               event ................................. Name of an event.
                               (none) ................ Nothing special.
                               alias ................. Another name for the event.

               album ................................. Name of an album.
                               (none) ................ Nothing special.
                               alias ................. Another name for the album.

               concert ............................... Name of a concert, usually constructed.
                               (none) ................ Nothing special.
                               alias ................. Another name for the concert.

               tour .................................. Name of a concert tour.
                               (none) ................ Nothing special.
                               alias ................. Another name for the tour.

               catalog ............................... Name of a catalog.
                               (none) ................ Nothing special.
                               trading site .......... Media trading site.
                               record label .......... Record label catalog.
                               reference ............. Reference material

place ................................................. Name of a place.

               planet ................................ Geographical stuff.
               continent ............................. Geographical stuff.
               country ............................... Geographical stuff.
               state ................................. Geographical stuff.
               county ................................ Geographical stuff.
               city .................................. Geographical stuff.
               venue ................................. Geographical stuff.
                               (none) ................ Nothing special.
                               alias ................. Another name for the geographical thing.

creator ............................................... Content creator.

               person ................................ Name of a person.
                               (none) ................ Unsure of name.
                               birth name ............ What it says on their birth certificate.
                               stage name ............ Name that they use as a performer.
                               alias ................. Another name for the person.

               corporation ........................... Name of a company.
                               (none) ................ Nothing special.
                               alias ................. Another name for the company.

               organization .......................... Name of an organization.
                               (none) ................ Nothing special.
                               alias ................. Another name for the organization.

               web site .............................. A web site.
                               (none) ................ Nothing special.
                               fan ................... A site run by fans of something.
                               trading ............... A media trading site.
                               alias ................. Another name for the site.

               band .................................. Musical group.
                               (none) ................ Nothing special.
                               alias ................. Another name for the band.

               performance group ..................... Non-musical group.
                               (none) ................ Nothing special.
                               alias ................. Another name for the group.

event ................................................. An event.

               event ................................. Name of an event.
                               (none) ................ Nothing special.
                               benefit ............... A benefit for some cause.
                               festival .............. An arts festival.
                               memorial .............. A remembrance of someone or something.
                               wake .................. A celebration of someone's life.
                               alias ................. Another name for the event.

contribution .......................................... A contribution to an object.
               author ................................ Wrote the material.
                               (none) ................ Nothing special.
                               lyrics ................ The words.
                               music ................. The music.

               arranger .............................. Arranged the material.

               performer ............................. Performed the material.
                               (none) ................ Nothing special.
                               voice ................. Sang.
                               instrument ............ Played an instrument.
                               conductor ............. Conducted the other performers.

               non-performer ......................... Some other contribution.
                               engineer .............. Engineering.
                               producer .............. Production.

               TODO do we need types when we have taxonomy?

parent ................................................ An element relation.
               person ................................ A person.
                               father ................ Their father.
                               mother ................ Their mother.

date .................................................. Date something happened.
               performed ............................. Date of performance.
               recorded .............................. Date of recording.  Implies performance.
               released .............................. Date material was released.  Might be by broadcast.
               mixed ................................. Mix-down date.
               born .................................. Birth date of a person.
               died .................................. Death date of a person.
               formed ................................ Birth date of a group.
               disbanded ............................. Death date of a group.
~~~~~~~~~~~~~

Lineage element:

Description:
       soundboard ............ Live concert feed from soundboard, includes FM and matrix.
       audience .............. Audience microphones.
       studio ................ Studio recording.

Medium:
       CD .................... Compact disc.
       DVD ................... Digital video disk.
       LP .................... Long playing vinyl.
       Single ................ Very short vinyl.
       EP .................... Longer than a single, shorter than an LP.
       Cassette .............. Compact cassette.
       VHS ................... VHS video tape.
       Betamax ............... Betamax video tape.
       HD-DVD ................ High-definition DVD.
       Blu-ray ............... The other high-definition DVD.

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

       Short Overiew of the Database Organization
       ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

The database contains two main tables and a collection of indices.  The indices
are generated from the two main tables and exist to speed searching.  The biggest
performance issue is sorting; the indices are pre-sorted to avoid it.

~~~~~~~~~~~~~

Main tables:

CREATE TABLE IF NOT EXISTS Id   (
       Id              BIGINT UNSIGNED NOT NULL PRIMARY KEY,   # internal identifier for this metadata source
       Catalog         CHAR(64) NOT NULL DEFAULT '',           # Catalog for this chunk of metadata
       Identifier      CHAR(255) NOT NULL DEFAULT '',          # Identifier in the catalog for this metadata chunk
       Additional      CHAR(64) NOT NULL DEFAULT ''            # Additional information for catalog mapping
);

CREATE TABLE IF NOT EXISTS Value (
       File            TEXT,                                   #TODO temporary to simplify debugging
       Publisher       BIGINT UNSIGNED  NOT NULL,              # internal identifier for this publisher from Id table
       Created         CHAR(19) NOT NULL,                      # time stamp for this chunk of metadata by this publisher
       Id              BIGINT UNSIGNED NOT NULL DEFAULT 0,     # internal identifier of element from Id table
       Chunk           BIGINT UNSIGNED NOT NULL,               # internal identifier for all elements in this metadata chunk
       Link            SMALLINT UNSIGNED NOT NULL DEFAULT 0,   # internal identifier that links elements in a group
       Equivalent      BIGINT UNSIGNED NOT NULL DEFAULT 0,     # internal identifier that maps all elements conjoined by id
       Direct          BOOL DEFAULT TRUE,                      # element specified directly or indirectly via note
       Language        CHAR(10) NOT NULL DEFAULT 'en-us',      # National language code for element value
       Description     CHAR(64) NOT NULL DEFAULT '',           # element description
       Type            CHAR(128) NOT NULL DEFAULT '',          # element type
       The             BOOL DEFAULT FALSE,                     # put "the" in front of value if set
       AutoGenerated   BOOL DEFAULT FALSE,                     # set if automatically generated
       AutoTranslated  BOOL DEFAULT FALSE,                     # set if automatically translated
       Sort            CHAR(255) NOT NULL,                     # string used for sorting the value
       Value           TEXT NOT NULL DEFAULT '',               # element value
       Search          TEXT NOT NULL DEFAULT '',               # search string for main index use
       Extra           CHAR(64) NOT NULL DEFAULT '',           # extra spot for extra attributes like variation and style
       Start           FLOAT NOT NULL DEFAULT 0,               # starting time for portion of an object
       End             FLOAT NOT NULL DEFAULT 0.0000001,       # ending time for portion of an object
       Sequence        SMALLINT UNSIGNED NOT NULL DEFAULT 0,   # for sequence numbers, also flag for contributors
       Dark            BOOL DEFAULT FALSE,                     # set if data exists but should not be shown
       Cache           TEXT NOT NULL DEFAULT '',               # location of cached data, if any
       Element         SET(                                    # element that defined this row
                               'Child', 'Contribution', 'Creator', 'Date', 'Event', 'Fingerprint', 'Gain-adjust',
                               'Id', 'License', 'Lineage', 'Name', 'Note', 'Parent', 'Place', 'Play-time', 'Publisher'
                       )
);

~~~~~~

The CHAR columns are an attempt to increase speed and save space.  Not sure
that it'll work; they may need to be turned into TEXT fields at a later date.

~~~~~~

All catalog/identifier/additional attribute sets are mapped into internal
identifiers via the Id table.  This is done for performance reasons.  When
you see "id" below, it's one of these internal identifiers.
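
A minimal sketch of that mapping, using Python's sqlite3 as a stand-in for the
real database.  The UNIQUE constraint and the names open_id_table and
internal_id are assumptions for illustration, not part of the schema above.
SQLite starts numbering rows at 1, which happens to keep 0 free to mean "none".

```python
import sqlite3


def open_id_table():
    """Create an in-memory stand-in for the Id table (SQLite, not MySQL)."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE Id (
               Id         INTEGER PRIMARY KEY,
               Catalog    TEXT NOT NULL DEFAULT '',
               Identifier TEXT NOT NULL DEFAULT '',
               Additional TEXT NOT NULL DEFAULT '',
               UNIQUE (Catalog, Identifier, Additional)  -- assumption
           )"""
    )
    return db


def internal_id(db, catalog, identifier, additional=""):
    """Return the internal id for a triple, allocating one on first sight."""
    key = (catalog, identifier, additional)
    row = db.execute(
        "SELECT Id FROM Id"
        " WHERE Catalog = ? AND Identifier = ? AND Additional = ?",
        key,
    ).fetchone()
    if row is not None:
        return row[0]
    cur = db.execute(
        "INSERT INTO Id (Catalog, Identifier, Additional) VALUES (?, ?, ?)",
        key,
    )
    return cur.lastrowid
```

Asking twice for the same catalog/identifier/additional gives back the same
id, which is what lets the Value table store one integer instead of three
strings.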

~~~~~~

Each row of the Value table includes the id of the Publisher and the Created
timestamp.  These values are used for two things.  They're used to order data
for display according to user preference, and also to locate data for updates.

~~~~~~

The Element column indicates the type of MusicML element that produced the row.

~~~~~~

The Id column is stuffed with an id generated from the element
catalog/identifier/additional.  A value of 0 means "none".

~~~~~~

Each chunk of metadata is assigned a Chunk id.  That is, a Chunk id is
assigned for every Publisher, <id> element combination.  A publisher may
only publish one chunk of data for each id.  If a publisher publishes
data for an <id> that is newer than what's in the database the old data
is deleted and replaced by the new data.  The internal Chunk id is reused
when data is updated.
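
The replace-if-newer rule can be sketched roughly as follows, again with
sqlite3 standing in for the real database and a cut-down Value table.  It
assumes that the row produced by the chunk's <id> element has Element 'Id',
and that Created timestamps are in a format that compares correctly as text;
publish_chunk and open_value_table are made-up names.

```python
import sqlite3


def open_value_table():
    """Create a cut-down, in-memory stand-in for the Value table."""
    db = sqlite3.connect(":memory:")
    db.execute(
        """CREATE TABLE Value (
               Publisher INTEGER NOT NULL,
               Created   TEXT NOT NULL,
               Id        INTEGER NOT NULL DEFAULT 0,
               Chunk     INTEGER NOT NULL,
               Element   TEXT NOT NULL,
               Value     TEXT NOT NULL DEFAULT ''
           )"""
    )
    return db


def publish_chunk(db, publisher, created, object_id, rows):
    """Store one chunk; rows is a list of (element, value) pairs.

    Returns the Chunk id used, or None if the stored data is not older.
    """
    old = db.execute(
        "SELECT Chunk, Created FROM Value"
        " WHERE Publisher = ? AND Id = ? AND Element = 'Id'",
        (publisher, object_id),
    ).fetchone()
    if old is None:
        # First chunk from this publisher for this <id>: new Chunk id.
        chunk = db.execute(
            "SELECT COALESCE(MAX(Chunk), 0) + 1 FROM Value"
        ).fetchone()[0]
    else:
        chunk, old_created = old
        if created <= old_created:
            return None
        # Newer data: drop the old rows and reuse the Chunk id.
        db.execute("DELETE FROM Value WHERE Chunk = ?", (chunk,))
    db.execute(
        "INSERT INTO Value (Publisher, Created, Id, Chunk, Element)"
        " VALUES (?, ?, ?, ?, 'Id')",
        (publisher, created, object_id, chunk),
    )
    db.executemany(
        "INSERT INTO Value (Publisher, Created, Chunk, Element, Value)"
        " VALUES (?, ?, ?, ?, ?)",
        [(publisher, created, chunk, element, value) for element, value in rows],
    )
    return chunk
```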

~~~~~~

Some entries take more than one row, such as <contributor> elements that
include <name> and <contribution> elements.  Multi-row entries are connected
using the Link column.  A link counter is initialized for each Chunk.  This
counter is incremented for each element, unless it is a multi-row entry.  The
result is that two rows in a chunk with the same Link value are connected.

Other than contributor, Link is used for <note description="sample" ... />.

Rows that are linked also have their Sequence set to 1.  Sequence is used
because it happens to be available.  The reason for this is that things that
are linked, such as contributor information, need to be recognized so that
they can be displayed appropriately.  There are degenerate cases where we have
a contributor name but the contribution is unknown, and vice versa.  In these
cases we can't tell that something is linked by there being more than one
element with the same Link value, hence the additional flag.
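
One way to picture the Link counter and the Sequence flag, as a sketch only.
The counter's starting value isn't stated here, so starting at 1 is an
assumption, and assign_links is a made-up name.  Real rows may also carry a
genuine sequence number in Sequence, which this ignores.

```python
def assign_links(entries):
    """Assign Link and Sequence values to the rows of one chunk.

    entries is a list of (linked, rows) pairs, where linked says whether
    the entry is a linked kind (such as a contributor) and rows is a list
    of (element, value) pairs.  Returns one dict per table row.
    """
    out = []
    link = 0  # assumption: counter restarts for each chunk, first value is 1
    for linked, rows in entries:
        link += 1  # one Link value per entry, shared by all of its rows
        for element, value in rows:
            out.append({
                "Element": element,
                "Value": value,
                "Link": link,
                # Flag linked rows even when only one row is present.
                "Sequence": 1 if linked else 0,
            })
    return out
```

A contributor with a name but no known contribution still comes out with
Sequence 1, so it can be told apart from an ordinary single-row element.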

~~~~~~

Equivalent is the last of the ids.  All metadata chunks that reference the
same MusicML id (an id in a MusicML catalog) are assigned the same Equivalent
id.  The Equivalent id is the same id assigned to the <id> element for that
MusicML id.  This allows all information about a particular object to be
obtained using a single query.
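
The single query might look something like this sketch, which uses sqlite3
and a cut-down Value table with made-up rows.  The function name
everything_about and the ORDER BY are illustrative choices, not part of the
design.

```python
import sqlite3


def everything_about(db, equivalent):
    """Fetch every row, from every publisher's chunk, for one object."""
    return db.execute(
        "SELECT Publisher, Element, Value FROM Value"
        " WHERE Equivalent = ? ORDER BY Publisher, Element, Value",
        (equivalent,),
    ).fetchall()


# Tiny in-memory stand-in for the Value table, with made-up rows.
demo_db = sqlite3.connect(":memory:")
demo_db.execute(
    "CREATE TABLE Value (Publisher INTEGER, Chunk INTEGER,"
    " Equivalent INTEGER, Element TEXT, Value TEXT)"
)
demo_db.executemany(
    "INSERT INTO Value VALUES (?, ?, ?, ?, ?)",
    [
        (7, 1, 100, "Name", "Example Album"),
        (8, 2, 100, "Creator", "Example Band"),
        (8, 3, 200, "Name", "Unrelated Album"),
    ],
)
```

Here chunks 1 and 2 come from different publishers but share Equivalent 100,
so one WHERE clause picks up both.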

~~~~~~

The Direct flag is TRUE for all rows defined by top-level MusicML elements.
It is FALSE for everything else.  What this means is that, for example, if
a metadata chunk has a <place> element that defines a venue, the Direct flag
is set to TRUE.  But, if there is a <note> element, and there is a <place>
element in the value of that <note> element, then Direct is set to FALSE.

The interpretation is that the flag is set for things directly referenced by
a metadata chunk, and clear for things indirectly referenced.
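
A rough sketch of how the flag could be derived while walking a chunk, using
Python's xml.etree.ElementTree.  The sample markup is made up and has no
namespace, and a real loader would presumably skip XHTML elements inside the
note rather than treating them as rows; direct_flags is a hypothetical name.

```python
import xml.etree.ElementTree as ET


def direct_flags(chunk_xml):
    """Return (tag, text, direct) for each element in one <metadata> chunk."""
    chunk = ET.fromstring(chunk_xml)
    rows = []
    for top in chunk:
        # Children of <metadata> are top-level elements: Direct is TRUE.
        rows.append((top.tag, (top.text or "").strip(), True))
        for nested in top.iter():
            if nested is not top:
                # Anything inside another element's value: Direct is FALSE.
                rows.append((nested.tag, (nested.text or "").strip(), False))
    return rows
```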

~~~~~~

The Language column holds the national language code.  The value comes from
any xml:lang attributes, not from the <metadata language= /> attribute.  The
latter is handled by having a different database for each national language.

~~~~~~

The AutoGenerated, AutoTranslated, and The flags map to the related attributes.

~~~~~~

The Dark flag is set for data that is in the database that is hidden from
external users.  This might get set in response to a DMCA takedown notice.
We wouldn't delete the content since so many of these notices are bogus.
But we would hide it until things are resolved.

~~~~~~

The Description and Type columns hold the values of those attributes.
They are overloaded; they are also used for the <includes> element
part and group attributes.

The Extra column is overloaded to hold the various extra attributes sported
by some elements:

       <contribution> element "style" attribute
       <name> element "variation" attribute
       <lineage> element "medium" attribute

~~~~~~

The Start and End columns hold the values of the <note> element Start and
End attributes.  These columns are also overloaded.

Start is used for the value of the <play-time> and <gain-adjust> elements.

Start and End are also used for the <note description="sample" ...>
sample-start and sample-end elements.  This type of note is stored as two
table rows.  The first row gets Sequence=0 and holds the start and end,
the second row gets Sequence=1 and holds the sample-start and sample-end.

~~~~~~

The Sequence column holds sequence numbers for things like tracks on an
album.

~~~~~~

The Value column holds the element value.  Any XHTML in the value is intact.

~~~~~~

The Sort column is used for sorting the Value for display.  It is smaller than
the Value column because sorting beyond what someone is likely to read is
wasted effort.  An example of use is <name sort="Steinhart,Jon"> Jon Steinhart </name>.

~~~~~~

The Search column is what is searched, not the Value column.  There are two
differences.  First, the Search column is text-only, all XHTML and MusicML
elements are removed.  Second, it may contain data not in the Value column.
For example, the Value may be the name of a song, and the Search may be the
lyrics.
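
A sketch of how a Search string could be produced from a Value, using
Python's html.parser to drop the markup.  Joining text fragments with a space
and collapsing whitespace are choices made for this example, and search_text
and its extra parameter are made-up names.

```python
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collect character data and ignore every tag."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def search_text(value, extra=""):
    """Strip elements from value and append extra text such as lyrics."""
    parser = _TextOnly()
    parser.feed(value)
    parser.close()
    text = " ".join(parser.parts) + " " + extra
    # Collapse runs of whitespace left behind by the removed elements.
    return " ".join(text.split())
```

The extra argument is where something like song lyrics would go, so that a
search can hit text that never appears in the Value itself.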

~~~~~~

Cache is the location of cached data for the element.  This is not well thought
out yet.  This may go into a separate table since few rows would actually use this.

~~~~~~

The biggest performance problem results from needing sorted results.  We work
around this by building pre-sorted tables.  The downside is that it takes time
to build those tables, meaning that changes don't immediately appear.  The long
term plan is to hack the database back end so that it does insertion sorts.

~~~~~~

On place elements the additional value is also stored in the Extra field.

~~~~~~

The "search everything" page uses the MainIndex table.

CREATE TABLE IF NOT EXISTS MainIndex (
       Description     CHAR(64) NOT NULL,
       Type            CHAR(128) NOT NULL,
       Value           CHAR(255) NOT NULL,
       Search          TEXT NOT NULL
);

This table is built from all rows in the Value table where Direct is TRUE, the Value
is not empty, and the Element is not Note, Play-time, Gain-adjust, Date, Lineage,
Child, or Parent.  It is ordered by the Sort, of course.

Not all data is copied as is; some tinkering is performed.

The creator value is prepended to the value of names of description audio and type
album or concert.  So rather than something showing up as "Howlin' At The Moon" it
would show up as "Sam Bush, Howlin' At The Moon".
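
A sketch of the build step described so far, with sqlite3 standing in for the
real database and a cut-down Value table.  Two things are assumed: that the
row from the chunk's <id> element carries the object's description and type,
and that a chunk has at most one direct creator.  build_main_index is a
made-up name.

```python
import sqlite3

EXCLUDED = ("Note", "Play-time", "Gain-adjust", "Date", "Lineage", "Child", "Parent")


def build_main_index(db):
    """Rebuild MainIndex from a (simplified) Value table."""
    db.execute("DROP TABLE IF EXISTS MainIndex")
    db.execute(
        "CREATE TABLE MainIndex (Description TEXT NOT NULL, Type TEXT NOT NULL,"
        " Value TEXT NOT NULL, Search TEXT NOT NULL)"
    )
    # Assumption: the <id> row holds the object's own description and type.
    kinds = {
        chunk: (description, type_)
        for chunk, description, type_ in db.execute(
            "SELECT Chunk, Description, Type FROM Value WHERE Element = 'Id'"
        )
    }
    # Assumption: at most one directly specified creator per chunk.
    creators = dict(
        db.execute(
            "SELECT Chunk, Value FROM Value"
            " WHERE Element = 'Creator' AND Direct = 1"
        )
    )
    marks = ", ".join("?" * len(EXCLUDED))
    rows = db.execute(
        "SELECT Chunk, Element, Description, Type, Value, Search FROM Value"
        f" WHERE Direct = 1 AND Value <> '' AND Element NOT IN ({marks})"
        " ORDER BY Sort",
        EXCLUDED,
    ).fetchall()
    for chunk, element, description, type_, value, search in rows:
        is_release = kinds.get(chunk) in (("audio", "album"), ("audio", "concert"))
        if element == "Name" and is_release and chunk in creators:
            # Prepend the creator to album and concert names.
            value = f"{creators[chunk]}, {value}"
        db.execute(
            "INSERT INTO MainIndex VALUES (?, ?, ?, ?)",
            (description, type_, value, search),
        )
```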

Some data is also invented.  "untitled" is inserted for items with description audio
that do not have a name element.

this won't work well




additional indices?
       contribution, content
       person, contribution, content
       song, concert or album

on gdxml
       albums go into a completely different catalog than unreleased shows

where does publisher reputation go?

WHAT DO WE NEED FOR INDICES?
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
main index
       name, creator, includer (description type)

name index

place index

contributor index

