This is about Erel's Natural Language Processing library. I can't post in that thread, so I am putting it here.
What has been bugging me is what we do with the parsed data.
Language processors lack something: real semantic processing.
No matter what language we speak, we all have the same ideas or entities in our minds. For example, everyone knows what a river or a mountain is, independently of the language.
And our thinking is clearly object-oriented, with inheritance. We don't need to memorize, one by one, that a table has a color, a size, etc., and that a chair also has a color, a size, etc. These properties are "inherited" in our minds.
Processed texts should end up in an entity structure and a "shape" structure. The shape is the "shape" of an idea: it may be written or spoken (or even pictorial), and it may be tied to a language.
I have a project that maps ideas and words to different classes. There are three basic types:
Type TEntity(ID As Long, EntityType As Int, Relevance As Long)
Type TShape(ID As Long, Language As Int, Shape As String, PartOfSpeech As Int, Probability As Long, Relevance As Long)
Type TRelationship(ID As Long, ID1 As Long, RelationshipType As Int, ID2 As Long, Probability As Long, Importance As Long, Relevance As Long)
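For readers outside B4X, the same three records could be sketched in Python roughly like this (field names follow the Type declarations above; the integer fields such as EntityType or PartOfSpeech are just numeric codes whose meanings would be defined by the project):

```python
from dataclasses import dataclass

# Rough Python equivalents of the three B4X Types above.
# The Int fields (entity_type, language, part_of_speech,
# relationship_type) hold numeric codes defined elsewhere.

@dataclass
class TEntity:
    id: int
    entity_type: int
    relevance: int

@dataclass
class TShape:
    id: int
    language: int
    shape: str           # the written/spoken form of the idea
    part_of_speech: int
    probability: int
    relevance: int

@dataclass
class TRelationship:
    id: int
    id1: int             # first entity in the relationship
    relationship_type: int
    id2: int             # second entity in the relationship
    probability: int
    importance: int
    relevance: int
```

A TShape ties a language-specific form to an entity, while TEntity and TRelationship stay language-independent, which is the point of the design.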
The idea is that after processing a text in any language, we map it to some general language, which is object-oriented. If I say: "He went home late yesterday", we can break it down like this:
He.Go(home, late, yesterday)
And from this sentence we can draw the conclusion that:
1) There is a "he". This instantiates some object of class "person". Who "he" is depends on the "session" from where we got this sentence.
2) This person is a male.
3) "he" has a home. (Creates an instance of "home").
4) "he" went home.
5) The action #4 was yesterday.
6) The action #4 happened late.
These conclusions are broken down into relationship objects.
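The six conclusions above can be sketched as (ID1, RelationshipType, ID2) triples. Every ID and type code below is invented for illustration; the real codes would come from the project's database:

```python
# Hypothetical relationship type codes (illustration only).
REL_INSTANCE_OF = 1   # X is an instance of class Y
REL_HAS = 2           # X possesses Y
REL_ACTION = 3        # X performs action Y
REL_OBJECT = 4        # action X has target Y
REL_TIME = 5          # action X happened at time Y
REL_MANNER = 6        # action X happened in manner Y

# Class-level entities assumed to already exist in the database.
CLASS_PERSON, CLASS_MALE, CLASS_HOME = 100, 101, 102
TIME_YESTERDAY, MANNER_LATE = 103, 104

# New instances created by parsing "He went home late yesterday".
HE, HIS_HOME, GO_EVENT = 1, 2, 3

relationships = [
    (HE, REL_INSTANCE_OF, CLASS_PERSON),    # 1) there is a "he", a person
    (HE, REL_INSTANCE_OF, CLASS_MALE),      # 2) this person is male
    (HIS_HOME, REL_INSTANCE_OF, CLASS_HOME),# 3) a "home" instance exists...
    (HE, REL_HAS, HIS_HOME),                #    ...and "he" has it
    (HE, REL_ACTION, GO_EVENT),             # 4) "he" went...
    (GO_EVENT, REL_OBJECT, HIS_HOME),       #    ...home
    (GO_EVENT, REL_TIME, TIME_YESTERDAY),   # 5) the action was yesterday
    (GO_EVENT, REL_MANNER, MANNER_LATE),    # 6) the action happened late
]
```

Note that the time and manner relationships attach to the action entity, not to "he", which is what makes conclusions 5 and 6 refer back to action #4.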
Once we have a list of objects (entities) and relationships stored, we have a language-independent structure. From that we can recreate sentences in any language, translating the meaning rather than the sentence.
No matter what language we use, that language has to provide the following functions:
1) Create an instance of a class: "I saw A DOG on the street yesterday." From here on, we use "the dog", because the first sentence created an instance.
2) Assign a value to a property: "The car was blue" is Car.Color = Blue, a shortcut that leaves out the property name.
3) Apply a method to an object: "He ran away."
...and so on. Quite like programming languages.
The functions a language can perform are finite, so we can look for these function patterns and break the sentences up into entities and relationships. The shapes for a language are probably already in our database; otherwise we could not parse the sentences.
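As a toy illustration of spotting such function patterns, here is a small Python sketch. The regexes and names are mine, not the project's; a real parser would consult the shape database rather than hard-coded patterns:

```python
import re

def detect_functions(sentence):
    """Toy detector for two of the 'language functions' above."""
    found = []
    # Pattern 1: "a/an <noun>" introduces a new instance.
    for m in re.finditer(r"\ban? (\w+)", sentence, re.IGNORECASE):
        found.append(("create_instance", m.group(1)))
    # Pattern 2: "The <noun> was <adjective>" assigns a property value
    # (the property name itself is left implicit in natural language).
    m = re.match(r"The (\w+) was (\w+)", sentence)
    if m:
        found.append(("assign_property", m.group(1), m.group(2)))
    return found

detect_functions("I saw a dog on the street yesterday")
# -> [('create_instance', 'dog')]
detect_functions("The car was blue")
# -> [('assign_property', 'car', 'blue')]
```

Real sentences would need the part-of-speech information from the shapes to disambiguate, but the principle is the same: a finite set of patterns mapping to a finite set of language functions.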
This is what I am working on, in B4J. We'll see how far it can get.
If someone has a similar project or is interested in this topic, I am open to any discussion.