Natural Language Processing

Heuristx

Active Member
Licensed User
Longtime User
This is about Erel's Natural Language Processing library here:

I can't post there, so I put it here.

What has been bugging me is what we do with the parsed data.

Language processors lack something: real semantic processing.

No matter what language we speak, we all have the same ideas or entities in our minds. Everyone knows what a river or a mountain is, independently of the language.

And our thinking is clearly object-oriented, with inheritance. We don't need to memorize one by one that a table has color, size, etc, and a chair also has color, size, etc. These are "inherited" in our mind.

Processed texts should end up in an entity structure and a "shape" structure. The shape is the "shape" of an idea: it may be written or spoken (or even pictorial), and it may be tied to a language.

I have a project that maps ideas and words in different classes. There are three basic classes:

Type TEntity(ID As Long, EntityType As Int, Relevance As Long)
Type TShape(ID As Long, Language As Int, Shape As String, PartOfSpeech As Int, Probability As Long, Relevance As Long)
Type TRelationship(ID As Long, ID1 As Long, RelationshipType As Int, ID2 As Long, Probability As Long, Importance As Long, Relevance As Long)
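
As a rough illustration only, the three types above could be mirrored as plain records outside B4J. The sketch below uses Python dataclasses. The field names come from the declarations above; the default values, the numeric codes, and the assumption that IDs are unique across all three kinds of record are invented for the example.

```python
from dataclasses import dataclass


@dataclass
class TEntity:
    ID: int
    EntityType: int
    Relevance: int = 0


@dataclass
class TShape:
    ID: int
    Language: int
    Shape: str
    PartOfSpeech: int
    Probability: int = 0
    Relevance: int = 0


@dataclass
class TRelationship:
    ID: int
    ID1: int
    RelationshipType: int
    ID2: int
    Probability: int = 0
    Importance: int = 0
    Relevance: int = 0


# Hypothetical codes, invented for this example only.
LANG_ENGLISH = 1
POS_NOUN = 1
REL_HAS_SHAPE = 1

# One idea, one English shape, and the relationship that ties them together.
river = TEntity(ID=1, EntityType=0)
river_en = TShape(ID=2, Language=LANG_ENGLISH, Shape="river", PartOfSpeech=POS_NOUN)
link = TRelationship(ID=3, ID1=river.ID, RelationshipType=REL_HAS_SHAPE, ID2=river_en.ID)
```

Keeping the idea and its shape in separate records is what would let a second language attach its own shape to the same entity.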

The idea is that after processing a text of any language, we map it to some general language, which is object-oriented. If I say: "He went home late yesterday", we can break it down like this:

He.Go(home, late, yesterday)

And from this sentence we can draw the conclusion that:

1) There is a "he". This instantiates some object of class "person". Who "he" is depends on the "session" from where we got this sentence.
2) This person is a male.
3) "he" has a home. (Creates an instance of "home").
4) "he" went home
5) The action #4 was yesterday.
6) The action #4 happened late.

These are broken into relationship objects.
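
To make the breakdown concrete, here is a minimal Python sketch of how the six conclusions might turn into relationship triples. The relationship-type names and the helper functions are hypothetical, and treating the action "go" as an instance with its own ID is an assumption of the sketch, not something stated above.

```python
from itertools import count

# Hypothetical relationship-type codes, invented for this example.
IS_A, HAS_GENDER, HAS, AGENT_OF, DESTINATION, WHEN, MODIFIER = range(1, 8)

next_id = count(1)
names = {}          # ID -> readable label, for inspection only
relationships = []  # (ID1, RelationshipType, ID2) triples


def new_id(label):
    """Create an ID for a class, an entity instance, or an action instance."""
    ident = next(next_id)
    names[ident] = label
    return ident


def relate(id1, rel_type, id2):
    """Store one relationship triple."""
    relationships.append((id1, rel_type, id2))


# Ideas assumed to exist in the database already.
person, male, home_class = new_id("Person"), new_id("Male"), new_id("Home")
yesterday, late = new_id("yesterday"), new_id("late")

he = new_id("he#1")
relate(he, IS_A, person)        # 1) there is a "he", an instance of Person
relate(he, HAS_GENDER, male)    # 2) this person is male
home = new_id("home#1")
relate(home, IS_A, home_class)  # 3) "he" has a home, which creates an instance
relate(he, HAS, home)
go = new_id("go#1")
relate(he, AGENT_OF, go)        # 4) "he" went home
relate(go, DESTINATION, home)
relate(go, WHEN, yesterday)     # 5) the action was yesterday
relate(go, MODIFIER, late)      # 6) the action happened late
```

The six numbered conclusions become eight triples here because points 3 and 4 each need two relationships.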

Once we have a list of objects (entities) and relationships stored, we have a language-independent structure. From that we can recreate sentences in any language, translating the meaning rather than the sentence.
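
A toy Python sketch of that last step, assuming a stored fact such as "a car whose color is blue" and a small per-language shape table. The table contents, the function name, and the fixed word order are all assumptions made for the example; real generation would need grammar rules for each language.

```python
# Hypothetical shape tables: idea -> surface form, per language.
SHAPES = {
    "en": {"car": "the car", "blue": "blue", "is": "is"},
    "de": {"car": "das Auto", "blue": "blau", "is": "ist"},
}


def render_property(entity, value, language):
    """Render the stored fact 'entity has this property value' in one language.

    Assumes a fixed noun-copula-adjective order, which holds for this simple
    statement in English and German but not in general.
    """
    shapes = SHAPES[language]
    sentence = f"{shapes[entity]} {shapes['is']} {shapes[value]}."
    return sentence[0].upper() + sentence[1:]


fact = ("car", "color", "blue")  # stored once, independent of any language
for lang in ("en", "de"):
    print(render_property(fact[0], fact[2], lang))
```

The fact is stored once and only the shapes differ, so neither sentence is a translation of the other.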

No matter what language we use, that language has to provide the following functions:
1) Create an instance of a class: "I saw A DOG on the street yesterday." From here on we use "the dog", because the first sentence created an instance.
2) Assign a value to a property: "The car was blue" is Car.Color = Blue, a shortcut that leaves out the property name.
3) Assign an instance of a method to an object: "He ran away."

...and so on. Quite like the programming languages.
These functions that a language can perform are finite, so we can look for these function patterns and break the sentences up into entities and relationships. The shapes for a language are probably already in our database; otherwise we could not parse the sentences.
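
One way to picture "looking for function patterns" is a short list of patterns tried in order. The Python sketch below uses regular expressions purely as a stand-in: the three patterns and the function names are hypothetical, they only cover the three example sentences above, and the real project works from semantics rather than surface patterns.

```python
import re

# Toy patterns for the three language functions listed above.
PATTERNS = [
    ("create_instance",
     re.compile(r"^I saw an? (?P<cls>\w+)\b", re.I)),
    ("assign_property",
     re.compile(r"^The (?P<obj>\w+) (?:was|is) (?P<value>\w+)\.?$", re.I)),
    ("assign_method",
     re.compile(r"^(?P<obj>He|She|It|The \w+) (?P<method>\w+) away\.?$", re.I)),
]


def detect_function(sentence):
    """Return (function_name, captured_parts) for the first matching pattern."""
    for name, pattern in PATTERNS:
        match = pattern.match(sentence.strip())
        if match:
            return name, match.groupdict()
    return None, {}


for text in ("I saw a dog on the street yesterday.",
             "The car was blue.",
             "He ran away."):
    print(text, "->", detect_function(text))
```

A sentence that matches no pattern returns `(None, {})`, which is where the next candidate function, or a split into clauses, would be tried.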

This is what I am working on, in B4J. We'll see how far it can get.

If someone has a similar project or is interested in this topic, I am open to any discussion.
 

Beja

Expert
Licensed User
Longtime User
Hi Heuristx,
Brilliant idea. I see it as a good start, even though it applies only to a special case; you can develop it further and open it up to a more general method.
For example:
He.Go(home, late, yesterday)
He.Go.Home.Late.Yesterday
and later...
He.CameFrom.Office.Early.Today.... etc..

BTW: I posted a question here a long time ago. I wanted to know how to get a single unique value from the 3 RGB color values. I don't remember getting any solutions.
E.g.: 234,233,10. I know that if I just concatenated the numbers by removing the separators (23423310) I would have a unique value, but then the number would be very long.
 

Heuristx

Hi, Beja,

The above is just a notation.

In reality the information ends up in a database in the form of a lot of TRelationships.
 

Heuristx

MyColor = xui.Color_RGB(234, 233, 10)
 

Beja

Heuristx said: MyColor = xui.Color_RGB(234, 233, 10)
Hi Heuristx,
Thanks so much for this, and sorry, but I am away from my computer. I think this function will return a single unique number.
 

Heuristx

Beja said: "Hi Heuristx, thanks so much for this, and sorry, but I am away from my computer. I think this function will return a single unique number."
It returns one integer for a specific color, with each component in the 0 to 255 range that JPEG pictures also use. RGB = Red, Green, Blue.
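
For readers wondering how three numbers can become one, the usual trick is to give each channel its own 8 bits. The Python sketch below shows that packing and its inverse. It is not a reimplementation of xui.Color_RGB, whose returned value may differ (for instance by also carrying an alpha component); the function names are made up for the example.

```python
def pack_rgb(r, g, b):
    """Pack three 0-255 channel values into one integer, 8 bits per channel."""
    for channel in (r, g, b):
        if not 0 <= channel <= 255:
            raise ValueError("channel out of range: %r" % (channel,))
    return (r << 16) | (g << 8) | b


def unpack_rgb(value):
    """Recover the (r, g, b) triple from a packed integer."""
    return (value >> 16) & 0xFF, (value >> 8) & 0xFF, value & 0xFF


color = pack_rgb(234, 233, 10)
print(hex(color))         # 0xeae90a
print(unpack_rgb(color))  # (234, 233, 10)

# Gluing the decimal digits together is not unique unless each value is
# padded to three digits: (1, 23, 4) and (12, 3, 4) both give "1234".
```

Because each channel occupies its own bit range, two different colors can never produce the same packed value.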
 

hatzisn

Expert
Licensed User
Longtime User

I was designing something like this more than 20 years ago, when the movie "Bicentennial Man" gave me royal-size homework. My aim, though, was for a machine to take meaning from a speaking person and react to it. I also included a linear algebra space of experiences.

So if I tell you "Go and get some bread", what will you think? Something like: I must get some bread > Where do I buy bread? In the bakery. > How do I go to the bakery? I go out, turn left, and at the second junction I turn right and it is there > How do I get out on the road? I put on my jeans, I put on my shoes, I comb my hair, I get my keys, I leave my apartment, I take the elevator, I go to the ground floor and I get on the road.

This construct of the linear algebra space of experiences is the foundation stone on which everyone builds the meanings and entities you describe. It is also what our mamas and grandmas did when they took us for a walk or told us stories: they connected real-life experiences with meanings. Because of this I do not believe that meanings are built in. That is why a computer can never become conscious unless it also gets sensors that correspond to our five senses, and becomes a robot by getting feet and hands. Surely it can gain experiences from an error/punishment function, but they will be limited to what we program into it.
 

Heuristx

Hi, Hatzisn,

You are looking WAY ahead, but you are very right to raise this question. It is a truly interesting one!

I do think there are solutions even to this problem. You see, I have been tinkering with this since 1996.

We have some algorithm in our head. It does not matter what the hardware does, the pure algorithm can be modelled in a computer. I do think one day computers will think. But the road is long and we are just at the beginning of it. And it will not come from real numbers, but integers. That is, today's "AI" neural networks are just calculators, like a chess engine is just a calculator.

Of course I would not ask my computer to buy bread. Unless it means that it orders it online. You are right, though, that a sentence like this has to invoke associations in the program. But that is what we humans do when we think: connect classes, methods and instances through connecting properties and "modifiers", which are roughly the equivalent of method property value instances (or, in linguistics, adverbs).

But I could ask it to answer business email.

And who knows, in a hundred years there may be robots who go and buy bread. If there will be stores at that time...
 

hatzisn


Heuristx

hatzisn said: "If there will be humankind at that time, which I doubt a lot."
We always think the problems of our times are the most terrible.
In reality, think of what people thought when the Mongols invaded Europe in 1240, or what they thought in 1920, just after WWI. It really looked like the end of the world and civilization.
And we are still here.
 

hatzisn


I really hope you are right. But we are also chemistry machines that work by certain rules in equilibrium with our environment. I can only speak for your second example, which I know; I am not a history guy and was not aware of the first. On second thought, let's accept the first as well. In both examples the equilibrium that was disrupted was the social one. This time the equilibrium being disrupted is the environmental one, and we are the ones to blame. Since we are forced to live in equilibrium with our environment, this will undoubtedly move our living point to the disrupted equilibrium point. I am really afraid for the future of our kids, and I hope that the Great Filter of the Fermi paradox is not in front of us.
 

Heuristx

I confess I have some irreverent thoughts about it, too. People say that trees cool the atmosphere, so we need a lot more trees.
But humanity survived an ice age! How drastic was that! But even the cave men survived it, without the tools we have now...
And I think that if the current leaders had been around during the ice age, they would have started the "cut all the trees, they cool down the atmosphere" campaign, would have allowed no discussion about it, and they would have caused more damage than necessary... but they could not have beaten nature.
I remember well that in the '70s and early '80s we were told about the "global ice age coming", so I wish there were a more balanced discussion about it.
 

rabbitBUSH

Well-Known Member
Licensed User
Heuristx said: "Language processors lack something: real semantic processing."
Mmmm - interesting - takes me back to days in the linguistics department lecture rooms.

but - what would this do with / how would it deal with - complex sentences and ones containing homonyms and homophones - not to mention multiple sub-clauses?

for instance :
The bailiff chased down the man who chased up the chaste woman, who, being fond of the chase, chased the game, and, chasing his name in silver made him go chase himself across Cannock Chase.

[[ I constructed this sentence in an email to a friend, based on my having just read a book relating the development of the OED. The tip here is to look up the word CHASE in the extended OED, i.e. the full-text unabridged version. ]]
 

Heuristx

An excellent question!

It also demonstrates why we should not think in linguistic terms.

My initial thought, many years ago, was: if a human being can figure it out, then the program must be able to figure it out; just follow how your own mind works with sentences like this.
Again, the reason we can do it is that we break it up not by morphology and syntax only, but by semantics.
Try to look for "an agent", I'd call it entity here, which "does something" or "something happens to it". Try to fit it into the basic functions of a language, like "assigning a property value to something, assigning an instance of a method to something" etc.

When you notice that something is wrong (there are multiple, conflicting solutions), go further and try to break it up into sections. Linguistically, they would be "clauses", semantically "basic language functions".

In my database now I have ideas and "shapes" (words) separately. So there is the IDEA of chase and the English shape: "chase". So when we look up the English word "chase", we will find that it points to several ideas (shall I call them sememes? Meanings?).

We can try one by one to see in what combination it fits an assumed language function. If none fits, we will try the next language function.
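
A small Python sketch of that lookup, assuming a table from shapes to ideas and a table saying what kind of slot each idea can fill. Both tables and the four senses chosen for "chase" are invented for the example, not taken from the actual database.

```python
# Hypothetical data: one English shape pointing to several ideas.
SHAPE_TO_IDEAS = {
    "chase": ["pursue", "hunt", "engrave", "unenclosed_land"],
}

# Which slot each idea can fill in a language function (assumed values).
IDEA_KIND = {
    "pursue": "method",
    "hunt": "entity",
    "engrave": "method",
    "unenclosed_land": "entity",
}


def candidates(shape, wanted_kind):
    """Ideas behind a shape that fit the slot of the assumed language function."""
    return [idea for idea in SHAPE_TO_IDEAS.get(shape, [])
            if IDEA_KIND[idea] == wanted_kind]


# "He chased the man": the function needs a method in this slot.
print(candidates("chase", "method"))
# "Cannock Chase": the slot needs an entity.
print(candidates("chase", "entity"))
```

If the list comes back empty, the assumed language function does not fit and the next one would be tried.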

As to nesting one into another, it does not present a problem for a computer: if you can do one, you can do a million, recursively, or otherwise if you can escape recursion.

We need, of course, the database that tells us what property or method CAN belong to what entity or relationship.

So the above example can be figured out more easily than this one:

The table ran away.

Here the program can see that
1) The table does not have the "run" method, so it does not make sense. We could stop here and pronounce the sentence "invalid", but humans don't do that.

2) The program can investigate and find that the table has a property of legs, and legs have the "run" method.

3) So the program can look up similar relationships in the past, find the properties of those relationships, and list possibilities:
   a) The speaker made a mistake.
   b) This is in a fictional story, like a children's tale.
   c) The speaker was lying.
   d) The speaker was drunk (a subclass of lying).
   e) This is a joke (a subclass of fiction).
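
The three steps can be sketched in Python as a check against the capability database mentioned earlier. The tables below are hypothetical stand-ins for that database, and the list of explanations is shortened; the point is only the order of the checks: direct method first, then methods of the entity's parts, then the list of possible readings.

```python
# Hypothetical capability data, invented for this example.
METHODS = {"leg": {"run", "walk"}, "person": {"run", "speak"}}
PARTS = {"table": ["leg", "top"], "person": ["leg", "arm"]}
EXPLANATIONS = ["mistake", "fiction", "lie", "joke"]


def check(entity, method):
    """Classify an entity/method pairing as direct, indirect, or unsupported."""
    if method in METHODS.get(entity, set()):
        return "direct", []
    via = [part for part in PARTS.get(entity, [])
           if method in METHODS.get(part, set())]
    if via:
        # The entity cannot do it itself, but one of its parts can:
        # keep the sentence and list the possible readings.
        return "indirect via " + ", ".join(via), EXPLANATIONS
    return "unsupported", EXPLANATIONS


print(check("person", "run"))
print(check("table", "run"))
print(check("table", "speak"))
```

Returning the explanations instead of rejecting the sentence mirrors the point that humans do not stop at "invalid".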

It may sound far-fetched, but as an example, this is what I was working on today:

"The Tragedy of Hamlet, Prince of Denmark, often shortened to Hamlet, is a tragedy written by William Shakespeare sometime between 1599 and 1601."

(I don't do the difficult ones first.)
I hope to have it done soon and then eventually multiple clauses can come up, too. And then: multiple clauses with lazy and careless typing, where the writer forgets to use punctuation correctly, and even mistypes some words... But we can fail. How many times do humans misunderstand one another? If they can do it, we are allowed to do it, too. Perfection is not the goal.
 

rabbitBUSH

Heuristx said: "if a human being can figure it out, then the program must be able to figure it out"
Mmmm, except that, whatever one calls it, OOP or Goop, programs are simply linear (barring the prospects attached to multi-threading CPUs, but ultimately everything is in a queue at the check-out), while the human mind performs at another level of interconnection (welllll OK?!?). If I remember from years ago, this is what researchers bumped into. So, seeing / hearing language happens through "instant" recognition: without much cognitive activity we simply 'see' the context [a kind of quantum something-or-other]. As you're describing above, for a computer to detect the context is one BIG spiderweb. For instance, computers don't do that well at writing poetry. OK, that is sort of the reverse of what you seem to be aiming at, I think(?), but the same process of assignation around word and syntactic construction must apply. Approaching from a different direction, but does it leave the same footprints in the protons and electrons?
Heuristx said: "As to nesting one into another, it does not present a problem for a computer: if you can do one, you can do a million, recursively, or otherwise if you can escape recursion."
I guess for this search you're using, or would have to use, a binary tree search to run through things a bit quicker?

LONG LONG ago, before the Rinderpest, as we say here, there was a development thread called Expert Systems. They began by trying to link together observations (environmental, for example) and then work out from a "table" of possibilities what the condition was and how to deal with it (of course based on the inexact science of environmental corrective action; that was among other things like language stuff). Not sure whether Expert Systems became AI as we talk about it now. There might be an argument that would separate the two.

Heuristx said: "We need, of course, the database that tells us what property or method CAN belong to what entity or relationship."
Eish! Yes, there are those quite simple monosyllabic dictionary/vocabulary "entities" that have 118 usage references, each one differing in property characteristic from the next.


Um - overall -
-
-
-
Heuristx said: "And then: multiple clauses with lazy and careless typing, where the writer forgets to use punctuation correctly, and even mistypes some words"

Do you have a Cray at your disposal?
 

rabbitBUSH

Heuristx said: "It also demonstrates why we should not think in linguistic terms."
I read #1 and #14 again: apart from a number of further intriguing discussion points arising in the latter, is this all working towards using natural language programming to construct machine code?

I forgot a point in my last post : somewhere in the chit chat I posted a link to a listing of almost every known programming language - I'm sure there must be one or two in there that try to provide tools for natural language processing and some that try to give a program as a result. For instance, a friend once said the shortest program specification his bosses had given him was written (literally) on a cigarette box - from which they had to obviously do the obvious.

intriguing topic this......
 

Heuristx

What you are raising are: problems. Problems are there to be solved.
Mind you, I am not interested in electron paths in the brain, the goal is not building a brain-emulator, but a software model of the algorithm, ignoring the actual working of the brain.
You are right, the recognition is one big spiderweb, but there are ways to cut it down. In the classes I use, there is a common property: relevance, which gets incremented every time the program touches the object in a meaningful manner. More relevant items come to the top, so we don't have to "think deep" for most cases.
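
The relevance idea can be sketched in a few lines of Python. The class and function names are hypothetical; the only parts taken from the description are a counter that goes up on each meaningful touch and an ordering that puts the most relevant items first.

```python
from dataclasses import dataclass


@dataclass
class Item:
    name: str
    relevance: int = 0


def touch(item):
    """Record one meaningful use of the item."""
    item.relevance += 1


def by_relevance(items):
    """Most relevant first; ties keep their original order (sorted is stable)."""
    return sorted(items, key=lambda item: item.relevance, reverse=True)


senses = [Item("engrave"), Item("pursue"), Item("hunt")]
touch(senses[1])  # "pursue" is used twice
touch(senses[1])
touch(senses[2])  # "hunt" is used once

print([item.name for item in by_relevance(senses)])
```

With this ordering the common meaning is tried first, and the rarer ones are only reached when it fails to fit.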
The fact that "computers don't do well at writing poetry" is true today. But it is exactly the approach I use that I hope can change that. Computers didn't use to do well at chess, either, and then that changed. I know what I am doing is more complex.

And because of this complexity, I am not trying to use binary trees or anything else that is so mechanical and limiting. In exchange the speed is terrible - we'll see how terrible - compared to what it could be, but speed is not important for now.

Monosyllabic shapes have no structure, so they are easy. In every culture, dialect, sociolect and usage group, they mean something and that is just remembered by us. The word "hi" just points to the entity "greeting" and that's that.

I have a Windows 7 machine with 3 TB SSDs and 16 GB of memory, and B4J is just zipping through tens of thousands of these User Type records a second, so far I have no complaints. SQLite is SQLightning, too.
 