Why is raw data only a fact and not an expression?

This post extends the conversation on the issues raised in the post https://metastudio.org/t/map-data-sets-as-cc-4-0/5410 . I am creating a separate thread, since the question is more general than the specific issue that post is about.

Data represent facts, and facts cannot be copyrighted. A compilation of data can be an expression, which can be copyrighted. These are some take-home points that one can gather from any 101 course on copyright.

This raises serious philosophical questions about what a fact is and how to represent one. When I say “Tom is a cat.”, that is certainly an expression of a fact. It is not in a spreadsheet or a table. But it is possible to make a generic table for it with three columns: subject, predicate, and object. I can then add more rows to represent more facts, e.g. “Jerry is a rat” as well as “Tom chases Jerry”. By now I am already telling a story through a table. Now, is this compilation of facts sufficient to say that it is an expression of a story and therefore copyrightable? Apart from this, there are multiple ways in which the data can be represented: as an SQL dump, a CSV file, JSON, RDF, or XML. It can also be embedded in an expressive programming language like LISP or PROLOG, where the distinction between data and function begins to blur.
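To make the "same facts, many representations" point concrete, here is a minimal Python sketch. The three rows are the ones from the paragraph above; the column names and the helper functions (`to_csv`, `to_json`, `from_csv`, `from_json`) are my own illustrative choices, not part of any standard. It only shows that the CSV and JSON forms carry exactly the same triples.

```python
import csv
import io
import json

# The three facts from the post, as (subject, predicate, object) rows.
TRIPLES = [
    ("Tom", "is a", "cat"),
    ("Jerry", "is a", "rat"),
    ("Tom", "chases", "Jerry"),
]
COLUMNS = ("subject", "predicate", "object")  # assumed column names


def to_csv(triples):
    """Serialize the triples as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(triples)
    return buffer.getvalue()


def to_json(triples):
    """Serialize the triples as a JSON array of objects."""
    return json.dumps([dict(zip(COLUMNS, t)) for t in triples], indent=2)


def from_csv(text):
    """Read the triples back from CSV text, skipping the header row."""
    rows = list(csv.reader(io.StringIO(text)))
    return [tuple(row) for row in rows[1:]]


def from_json(text):
    """Read the triples back from the JSON array of objects."""
    return [tuple(item[c] for c in COLUMNS) for item in json.loads(text)]


# Both formats round-trip to the same list of facts.
print(from_csv(to_csv(TRIPLES)) == from_json(to_json(TRIPLES)))
```

Whichever format is chosen, the content that comes back out is identical; only the surface form differs.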

Give me any data set, and I can convert it into a triple store (a triple store is a generic way of representing data in the form of subject, predicate, and object). Or give me a triple store, and I can convert it into a paragraph of sentences, each representing a fact. The paragraph may appear boring, but who cares. I just want to make the point that there seems to be not much difference after all between data/facts and what we can express as sentences.
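The two conversions described above can be sketched in a few lines of Python. This is only an illustration under simple assumptions: the data set is a list of rows, one column (here `name`, a made-up key) identifies the subject, and every other column header is treated as a predicate. The function names `rows_to_triples` and `triples_to_paragraph` are hypothetical.

```python
def rows_to_triples(rows, key):
    """Turn each non-key cell of a table into one (subject, predicate, object) triple."""
    triples = []
    for row in rows:
        subject = row[key]
        for column, value in row.items():
            if column != key:
                triples.append((subject, column, value))
    return triples


def triples_to_paragraph(triples):
    """Render each triple as one plain sentence and join them into a paragraph."""
    return " ".join(f"{s} {p} {o}." for s, p, o in triples)


# A tiny "spreadsheet": one row per subject, one column per predicate.
rows = [
    {"name": "Tom", "is a": "cat", "chases": "Jerry"},
    {"name": "Jerry", "is a": "rat"},
]

triples = rows_to_triples(rows, key="name")
paragraph = triples_to_paragraph(triples)
print(paragraph)  # Tom is a cat. Tom chases Jerry. Jerry is a rat.
```

Real data sets need more care (units, missing values, predicates that do not read as verbs), but the mapping from table to triples to sentences stays mechanical.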

When data is presented in the form of a table, it tells us a story much better than when each of the facts is told as a sentence. The table is then a creative expression of the data, helping us gather the meaning more effectively. So it is not clear to me why we consider only charts and graphs of data to be a compilation/expression, and not a table or a spreadsheet. One may say that a chart or a table is an image and therefore an expression, but it is an image only because it is represented as a table (at least in a digital computer). I am interpreting a 2D representation of an image as a table of pixel values too.

Can I tell a story in the form of a triple store? It is challenging but possible. Who would like to read a story in that format? Who cares, when I am doing that for the sake of an argument.

So, the moral of this story: the assumption that data/facts represented in a table format or a database are not expressions may stand in a court, but it does not appeal to @G_N.

One question that I did not address in this post is what makes an expression creative. We can discuss that some other time.


Is it a fact irrespective of the factuality of what is represented?

E.g. if the Mahabharata were written as a collection of triples using English words, would it be fact or expression?


That is why I find it difficult to digest how legal experts consider that data represents a fact and therefore can’t be considered an expression, but the compilation of data/facts can be.

That is why I used Tom and Jerry to expose the fallacy: what is represented as data need not be factual.

An expression is either true or false. One cannot get into the business of finding out factuality or falsity unless it is expressed.

A data set can be incorrect. It does not become a fact because it is represented in a spreadsheet. That is why I don’t understand why the law depends on such baseless assumptions.


Mahabharata is a creative work of fiction (?) by author(s) unknown and predates copyright of any form by more than 1500 years. You can write it in any form a thousand ways from Sunday and it will never be considered fact for the purposes of any copyright law. The form in which something is written doesn’t make something a fact or an expression. If something is a fact, it starts off as a fact. A fact can be expressed in many different ways. Some ways are more creative than others. When a work meets a certain baseline level of creativity, it can be copyrighted. Well, it can be “copyrighted” (note the quotes) at any point. But a baseline level of creativity is required to successfully defend the copyright if challenged in court.


Yes, it is an expression of a fact, but it is not creative enough to be copyrightable. You could slap a copyright symbol on it, but I would laugh at you and “reuse” your work, and you wouldn’t be able to do anything.

I mentioned this in my reply to @planemad but I repeat it here: a fact is a fact because it starts off as a fact. When you put it in any tangible form, it becomes an expression (actually, it becomes an expression when you put it in any form, but it has to be in a tangible form to be copyrightable). If the expression has some minimum level of creativity, it can be copyrighted. That doesn’t mean its copyright can be defended in court, for that is yet another story.

Think of a fact as something that you discover, and an expression as something you invent/create. That Tom is a cat is something you discover. Nature made Tom that way. That you decided to name that cat Tom or Tommy or Thomson and wrote in your diary that “Tommy is a fiery feline though feckless and fickle of fiber” is a creative expression.


Almost a decade ago I explored this in a series of posts on raw data vs interpreted/processed data.


Do we have a baseline for creativity? Is novelty the same as creativity? When does an expression transform into a creative expression?


Unfortunately, there is no established baseline for creativity or a formula for determining it. It is established on a case-by-case basis by a judge, and only when challenged in a court of law. The problem is that if you don’t challenge a potentially bogus copyright, then that work goes unshared/underused, and if you challenge it, it is expensive to do so. Which is why a copyright is usually only challenged when doing so is worth it. Conversely, if someone uses my copyrighted work wrongly, I can take them to court, but doing so is expensive. So mostly I will just rant and rave and write an angry blog post about it or wring my hands on Twitter.


That is why this distinction supports inequity.

From what you say, what exists is not a baseline for creativity but a bias line. The bias line is determined case by case by an expensive legal system that can push and pull the bar through hundreds of pages of argument, to which commoners have no access even if the judgment itself is in the public domain.

This is a very unhappy situation for the commoners and a very happy situation for the non-commoners. The absence of clear criteria is the main reason for most exploitative situations in society.

If the criteria are so difficult that only an expensive legal system can identify them, then this entire mechanism is not meant for the commons. Until the criteria for a creative expression become clear to every literate citizen, it is better to protect the raw “material” as well as the “value-added” expressions extracted from the raw material. If we don’t do that, only the wealthy and powerful can benefit from using the raw material and become wealthier. A system preserved in this way will undermine the value of what commoners produce by calling it mere ‘raw data’, so that little or no compensation need be granted for the use of that raw data.

Since there are no clear criteria, it is good to treat every cultural practice of humans as creative, even if it is an act of imitation. Protect every copy, because copying is a fundamental means and a fundamental right of every being on earth, and a primary process of learning. But this kind of copyright does not exist. The so-called copyright law was originally, and continues to be, a restriction on copying.

Public domain declares no rights reserved, which makes it appear (let me underline: appear) that there are no restrictions on copying. But at the same time, there are no restrictions on restricting the freedom to copy derived works. In the current social systems, this will not protect the commoners. But it will certainly help those among them who, by restricting the right to copy, could become wealthy and powerful. This is dubbed creative freedom, but it is actually the freedom to restrict sustained participation.

In the end, learning cannot happen without imitation (copy), and innovations cannot happen from scratch, and raw data is not scratch. The best way to acknowledge that all innovations stand on the shoulders of preexisting innovations is by preserving access to the derived resources.

There exists legislative protection to prevent others from learning, and it is called copyright law (meaning copy restriction). The current legal system was not invented to protect equitable learning. The way I understand it, copyleft (ShareAlike) is a way for commoners to gain default protection under existing copyright law without seeking any modification of the legislation. It does so by declaring that you are granted the freedom to copy or mix without any restriction, but you have no freedom to restrict that same freedom in the derivative. Thus copyleft is a creative way of negotiating within the current system to promote universal access.


Please explain what happened, or link the post if you have already explained it.