Posts filed under 'SpaceMapper'
After the first public release of SpaceMapper DataStore and MN8 last week I have some data to draw some conclusions from. Unfortunately, on the release date, the FreshMeat announcement contained links which did not pass through the SourceForge counters
(the link went directly to prdownloads instead of the downloads section
on the status page), so I have no idea about the downloads on the first
day. However, it seems that there is a lot more interest in an XML database
than in a new scripting language, and even so 78% are interested in the
binaries and only 22% in the source of the database project. With the
scripting language the situation is reversed: 79% interest in the
source and only 21% in the binaries. I guess people are more
interested in how to write an interpreter than in using a new one.
No feedback, no bugs, no mailing list interest, no contributors, which is reasonable for a first public release.
What is not reasonable is that the Klez virus on somebody's computer
noticed the release and is sending thousands of infected mails with my
email address in the From field. In case anyone receives one, I'm really
sorry; it's not my fault, it's not from me, and you can verify that by
looking at the source of the message. All the mails I send go through
our server (194.102.233.6), which I'm sure you won't find in the
Received headers.
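For anyone who wants to automate that check, here is a minimal sketch in Python. It only looks for the server address quoted above in the Received headers of a raw message. The two sample messages, their host names and the 192.0.2.55 address are made up for illustration, and since Received headers can themselves be forged, a match is a hint, not proof.

```python
import re
from email import message_from_string

TRUSTED_RELAY = "194.102.233.6"  # the server address quoted in the post


def passed_through_relay(raw_message, relay_ip=TRUSTED_RELAY):
    """Return True if any Received header mentions relay_ip as a whole address."""
    msg = message_from_string(raw_message)
    # Guard the match so that e.g. 194.102.233.60 does not count as a hit.
    pattern = re.compile(r"(?<![\d.])" + re.escape(relay_ip) + r"(?!\d)")
    return any(pattern.search(str(h)) for h in msg.get_all("Received", []))


# Hypothetical sample messages, not real mail.
GENUINE = (
    "Received: from mail.example.org ([194.102.233.6]) by mx.example.net; "
    "Mon, 11 Nov 2002 10:00:00 +0200\n"
    "From: author@example.org\n"
    "Subject: release notes\n"
    "\n"
    "Hello.\n"
)
FORGED = (
    "Received: from infected-pc ([192.0.2.55]) by mx.example.net; "
    "Mon, 11 Nov 2002 10:05:00 +0200\n"
    "From: author@example.org\n"
    "Subject: a very funny game\n"
    "\n"
    "Attachment removed.\n"
)

print(passed_through_relay(GENUINE))
print(passed_through_relay(FORGED))
```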
November 11th, 2002
Just wanted to let you know that after many months of hard work a first public release of SpaceMapper DataStore and MN8 is available.
What is SpaceMapper DataStore?
DataStore is a Java based document repository server for storing,
querying and fetching XML based documents. It is built on practical
needs, allowing the storage of semi-structured documents (well-formed,
optionally validated XML, XHTML and HTML) and unstructured documents
(TXT).
The documents are stored in a conventional relational database
(PostgreSQL, MySQL, DB2, SAP DB), which brings the advantages and
reliability of these products. Being built on top of the Avalon Phoenix
framework, it allows server components to be easily developed, deployed
and shared. The documents are managed through a BEEP and/or XML-RPC
interface using a subset of the SEP (Simple Exchange Profile) protocol.
What is SpaceMapper mn8?
mn8 is an experimental object oriented scripting language, tightly
integrated with the net, which emulates the concepts at the core of XML
in order to simplify and make as transparent as possible information
extraction and manipulation from the WWW and XML documents.
Written in Java, it works with most operating systems and allows easy
reuse of the huge number of libraries available through simple wrappers.
At this point mn8 has concepts for: HTML, HTML-Forms, Cookies, RSS,
OPML, HTTP, FTP, POP3, SMTP, Jabber, BEEP, XML-RPC, SOAP, MBox.
Then what is SpaceMapper?
The SpaceMapper
effort was born from the classic Internet desire to see if there is a
better way. The effort evolved from an early RFP on the now-defunct
SourceXchange which was awarded to the Romanian open source development
firm noLimits Technologies. The project is Open Source (Apache-like license) and was sponsored by the 501(c)(3) non-profit arm of media.org (Internet Multicasting Service) and noLimits Technologies.
For any questions related to the SpaceMapper and/or mn8 project please write to the spacemapper-user@lists.sourceforge.net mailing list.
MN8 and DataStore
are still very young and far away from reaching their purpose, so any
feedback, ideas, questions and constructive criticism are more than
welcome.
November 5th, 2002
URL: http://radio.weblogs.com/0109405/2002/09/03.html#a30
Started the Java Community Server. It's an implementation of the Radio Community Server, in Java of course. It consists of an XML-RPC backend, HSQL for the DB, and Pnuts to script all the XML-RPC stuff.
My main focus is to get the xmlStorageServer stuff working first, which is the bulk of the community server anyway.
So far I've got the basic stuff in place to allow all the XML-RPC stuff to be scripted via Pnuts.
[Miceda]
This is cool; it would be even cooler to have it as a block in Apache Phoenix (part of the Avalon project).
What's more, the storage part already exists and works well (though it is still in alpha stage). Check out DataStore, which is our XML data store on top of regular RDBMSs. What makes DataStore unique is that it is not only for XML: you can store well-formed documents (think XML, XHTML), well-formed and validated documents (DTD only for now) or regular text documents. It's a block in Phoenix
with an XML-RPC server block which gives you access to the storage.
You can work with a cool XML based query
language called SEP (Simple Exchange Profile) which allows you to search, update and add documents over BEEP or XML-RPC. You even have a Java client API similar to JDBC. Plus, the license is Apache-like.
We tested it with PostgreSQL, SAP DB and IBM DB2. Unfortunately MySQL 4 doesn't yet support SELECT ... INTERSECT, so the intersect part of SEP doesn't work yet with MySQL.
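To make the INTERSECT point concrete, here is a small sketch in Python using the bundled SQLite (chosen only because it runs without a server). The entity table and its columns are invented for the example and are not DataStore's real schema. It shows an INTERSECT query next to a self-join that returns the same rows, which is one common workaround on databases without INTERSECT; nothing here claims that SEP is or will be implemented this way.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per (document, word) occurrence.
conn.executescript("""
    CREATE TABLE entity (doc_id INTEGER, word TEXT);
    INSERT INTO entity VALUES
        (1, 'xml'), (1, 'beep'),
        (2, 'xml'),
        (3, 'beep'), (3, 'xml'), (3, 'xml');
""")

# Documents containing both words, using INTERSECT.
intersect_sql = """
    SELECT doc_id FROM entity WHERE word = 'xml'
    INTERSECT
    SELECT doc_id FROM entity WHERE word = 'beep'
    ORDER BY doc_id
"""

# The same question asked with a self-join instead of INTERSECT.
join_sql = """
    SELECT DISTINCT a.doc_id
    FROM entity AS a JOIN entity AS b ON a.doc_id = b.doc_id
    WHERE a.word = 'xml' AND b.word = 'beep'
    ORDER BY a.doc_id
"""

with_intersect = [row[0] for row in conn.execute(intersect_sql)]
with_join = [row[0] for row in conn.execute(join_sql)]
print(with_intersect, with_join)
```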
September 4th, 2002
Today I took some time and wrote an mn8 script to harvest my subscription file from Radio, using a simple XSL to render it in some basic HTML. Once I had the HTML files rendered locally I only had to make a main page so that Plucker would convert them into its internal format. Here are some screen shots.

Not quite satisfied yet: I'm not sure whether I should leave the regular
HTML tags in the description or convert it to plain text, since the tags
don't make too much sense here. The text would be a lot more readable. I
also have to make it display only the differences since the last run. The
buy line in the shots comes from the screen dumper program.
Updated (8 June 2005): This was a simple experiment and in no way a final product or solution. You can check Plucker for a complete Palm based news reader.
August 14th, 2002
After I managed to install PostgreSQL under Windows last week and tested DataStore with disappointing results, today I installed SAPDB and tested again.
Fantastic! I used the RFC reference XML documents from xml.resource.org (3124 XML documents, small ones, around 4 MB in total), DataStore over SAPDB, and an MN8 script to load all these documents and store them using SEP (the Simple Exchange Profile) over XML-RPC.
The procedure lasted less than an hour, which means more than 50 documents per minute. This is the same as PostgreSQL, but what is different is that by the time the storing of the documents was completed the indexing was too. With PostgreSQL this took around 30 hours.
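The arithmetic behind those figures, as a quick Python check. The only inputs are the numbers from the post (3124 documents, under an hour for the load, around 30 hours for the PostgreSQL indexing); since the exact run time is not given, the result is a lower bound on the rate, not a measurement.

```python
DOCUMENTS = 3124
UPPER_BOUND_MINUTES = 60       # "less than an hour"
POSTGRES_INDEXING_HOURS = 30   # "around 30 hours"

# If the run had taken the full hour, this would be the rate;
# a shorter run means a higher rate.
min_rate = DOCUMENTS / UPPER_BOUND_MINUTES

# How short the run must have been to reach 60 documents per minute.
minutes_for_60_per_minute = DOCUMENTS / 60

# Indexing time ratio, taking one hour as the upper bound for SAPDB.
slowdown = POSTGRES_INDEXING_HOURS * 60 / UPPER_BOUND_MINUTES

print(f"at least {min_rate:.1f} documents per minute")
print(f"60 per minute needs a run of about {minutes_for_60_per_minute:.0f} minutes")
print(f"PostgreSQL indexing took at least {slowdown:.0f}x longer")
```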
Now, DataStore
stores the document as it receives it (it does not alter it), but to
allow very fast structured searches (get me the documents which have
this element containing this value and this attribute with this value)
it breaks the documents into little entities (pretty much as Google
does), entities which relational databases can use efficiently (they
actually use their indexes in the queries), along with keeping
information about the structure of that entity (part of that element
...). That way you can actually query tons of well or not so well
formatted documents using complex structure information and get the
results instantly.
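A rough sketch of that idea in Python with SQLite. The table layout, the function names and the three tiny documents are all invented for illustration and say nothing about DataStore's real schema. The raw document is kept untouched in one table, while a second, indexed table holds one row per word or attribute value together with the element path it came from.

```python
import re
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE document (id INTEGER PRIMARY KEY, raw TEXT);
    -- attr is NULL for words taken from element text.
    CREATE TABLE entity (doc_id INTEGER, path TEXT, attr TEXT, word TEXT);
    CREATE INDEX entity_lookup ON entity (word, path, attr);
""")


def index_document(doc_id, raw_xml):
    """Store the document unaltered and break it into searchable entities."""
    conn.execute("INSERT INTO document VALUES (?, ?)", (doc_id, raw_xml))
    rows = []

    def walk(elem, path):
        here = f"{path}/{elem.tag}"
        # Text directly inside the element (tail text is ignored in this sketch).
        for word in re.findall(r"\w+", (elem.text or "").lower()):
            rows.append((doc_id, here, None, word))
        for name, value in elem.attrib.items():
            rows.append((doc_id, here, name, value.lower()))
        for child in elem:
            walk(child, here)

    walk(ET.fromstring(raw_xml), "")
    conn.executemany("INSERT INTO entity VALUES (?, ?, ?, ?)", rows)


def find_docs(word_path, word, attr_path, attr, value):
    """Documents where word_path contains word and attr_path has attr=value."""
    sql = """
        SELECT doc_id FROM entity WHERE path = ? AND attr IS NULL AND word = ?
        INTERSECT
        SELECT doc_id FROM entity WHERE path = ? AND attr = ? AND word = ?
        ORDER BY doc_id
    """
    return [r[0] for r in conn.execute(sql, (word_path, word, attr_path, attr, value))]


DOCS = {
    1: '<rfc status="draft"><title>The BEEP core</title></rfc>',
    2: '<rfc status="final"><title>BEEP over TCP</title></rfc>',
    3: '<rfc status="draft"><title>XML-RPC notes</title></rfc>',
}
for doc_id, raw in DOCS.items():
    index_document(doc_id, raw)

hits = find_docs("/rfc/title", "beep", "/rfc", "status", "draft")
stored = conn.execute("SELECT raw FROM document WHERE id = 1").fetchone()[0]
print(hits)
```

The query touches only the indexed entity table; the original text is fetched afterwards by id, exactly as it was received.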
From the 3124 documents, DataStore extracted 27000 entities (most of the time words).
And all this time I kept working as usual on the same computer.
I can't explain the poor performance of PostgreSQL;
it does not perform much better on the Linux server either, and we tried
many versions. It must be something related to their JDBC drivers; in
comparison to IBM DB2 or MySQL it is way slower.
However SAPDB rocks and it's free (GPL/LGPL),
with versions for Linux and Windows (and many others), so if you are
looking for a free, industrial strength relational database look no
further. Should I mention that the Windows version comes with
excellent GUI management tools?
June 14th, 2002
So after finishing the first fully functional internal release of mn8 we considered that running mn8 on the web server side would also be useful, so we started to make a servlet which will run mn8 based scripts and concepts.
A couple of days should have been enough, still it's not ready. When I started I tried to make an accurate picture of the steps involved in order to try to improve the estimated time.
The reasons are: a malfunction of the custom scheme based URLs when mn8 is run inside Tomcat, and a synchronization problem, as now multiple mn8 scripts can be run concurrently inside the same Java instance.
The synchronization issue could have been foreseen, but the Tomcat thing could not.
So, what does it take to make accurate project estimates? Experience?
May 18th, 2002
Hm, it took me 3 weeks to solve a bug, that's definitely a record.
Truth is I'm not sure whether it was a bug or a feature. It is a bug because
it didn't work; it did when I implemented the feature, but then "mn8"
changed and nobody tested those two methods. Now it works, and it does so
nicely, and that's what counts. It is also covered with test cases, so the next
time it gets screwed up we will know. Cool, we now have 1114 tests for "mn8",
tests containing "mn8" script code running in an "mn8" script based test
framework. Cool.
We also have some new scrapers and a primitive "Jabber" client. For
now it is only able to send and receive normal messages. We will see
later what more we can do with it.
RSS, RDF, OPML, scriptingNews2.xml concepts implemented. A cool thing they all do is give you a new RSS, RDF ... feed containing only the items that are new since it was last invoked.
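The "only new items since last invoked" behaviour can be sketched in a few lines of Python. This is not the mn8 implementation; the class name, the choice of guid/link/title as the item key and the two sample feeds are assumptions made for the example, and the state lives only in memory here.

```python
import xml.etree.ElementTree as ET


class NewItemsFeed:
    """Remembers which items were already returned and yields only new ones."""

    def __init__(self):
        self._seen = set()

    def new_items(self, rss_xml):
        fresh = []
        for item in ET.fromstring(rss_xml).iter("item"):
            # Use the first available identifier as the item's key.
            key = (item.findtext("guid") or item.findtext("link")
                   or item.findtext("title"))
            if key and key not in self._seen:
                self._seen.add(key)
                fresh.append(item.findtext("title", default=""))
        return fresh


# Hypothetical feeds: the second fetch contains one extra item.
FIRST = """<rss><channel>
  <item><title>A</title><link>http://example.org/a</link></item>
  <item><title>B</title><link>http://example.org/b</link></item>
</channel></rss>"""
SECOND = """<rss><channel>
  <item><title>C</title><link>http://example.org/c</link></item>
  <item><title>A</title><link>http://example.org/a</link></item>
  <item><title>B</title><link>http://example.org/b</link></item>
</channel></rss>"""

feed = NewItemsFeed()
first_run = feed.new_items(FIRST)
second_run = feed.new_items(SECOND)
third_run = feed.new_items(SECOND)
print(first_run, second_run, third_run)
```

To survive between separate runs, the set of seen keys would have to be written to disk after each invocation.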
The only remaining things to do until we hand it over to our publisher are:
finishing the documents for the language syntax (the API for the
concepts is almost done), tackling the basic error handling a bit and doing
some more examples.
April 19th, 2002
We said six months and it will be one year when the first phase of the project
is over. That means 100% late. True, it is a bit of a research project,
and the number of features needed and implemented for a pleasant
release is double that of the initial proposal. Still, how could I be this
wrong?
Ever since I was leading a team at my old company I have kept a time
log. What I notice is that for 80% of the tasks I have to do
I'm below or right at the estimate. But with the remaining 20%
I'm chaos. Actually that is where all the extra time goes. Maybe one day I will
have enough information and will manage to find all the correlations
about the type of tasks, or maybe the sphere of tasks, which are
rebelling against my management.
March 19th, 2002
On 14 March it will be one year since we actually started coding on
all of the "SpaceMapper" projects, a very difficult one but full of
rewards.
So, I took JMetric
and did a couple of measurements on mn8. Here are the results:
- Lines of code: 20311
- Statements: 14218
- Classes: 211
- Methods: 2296
- Variables: 981
- Public Methods: 1877
This is just the code, no documentation, no unit tests, just the core Java
source files. This was till recently a man/month effort. The actual
code was started only in August; until August I was working on the prototype.
Not bad, I guess.
I only have two features open, with
one task in each, so I'm really at the
end of a first serious release. It's not a bad feeling but it is not a good one
either: it's exhaustion, accomplishment and fear. In a couple of days/weeks
your secret will be publicly exposed. It's like having a child and giving
it away to strangers to take care of it.
Enough of this mumbling, this is not what I had to say. I was about to tell
you about testing and bugs.
In the last two months, while closing the remaining features, we started
intense testing. What I found is that bugs come in layers. Three
particular types of bugs, each type with its own schedule.
The first layer is the soft and easy bugs. Plenty of them, quick to catch
and fix. Unit tests are a great investment for this layer.
Then comes the more complex layer. Not difficult to find but a bit
trickier to fix. Most of these bugs can be caught by unit tests and can
easily be kept under control for the future, again through the unit tests.
But then comes the last layer, at the end, when you are really tired and
sick of bugs. These bugs are nasty ones. Very hard to catch, very hard to
reproduce, very hard to understand what the heck is going on. Should I
even mention fixing them? I spent the last two days chasing such a bug;
I'm not there yet, but I will be. Sometimes I wonder if it is a good idea at
all to spend so much time on just one bug.
Unit tests won't help you with these bugs, except maybe after you have fixed
them, to make sure they don't reappear.
It was also interesting to notice that whoever said that 80% of bugs are located in 20% of
the code was absolutely right. The majority of bugs were around 3 classes which were
extremely complicated. Hard to believe that 80% of the bugs were actually in around 200
lines of code out of 20,000. The problem is that when I designed those particular portions
of code I was aware of the degree of difficulty involved, so I tried to code the way Kent Beck recommends,
explicitly expressing intention: using meaningful names, breaking the code into many
minuscule methods and so on. Still, even if the functionality was a lot easier to understand
that way, it continues not to be extremely easy.
Another interesting conclusion was that even if at the beginning all of us blame somebody
or something else, always, and I mean always, we are the stupid ones, and the
debugging time would probably be reduced considerably if we always started by checking the code
instead of trying to catch what we imagine is happening, which is almost always miles away
from what is actually happening.
March 5th, 2002
Somehow I remembered today the old CGI days and how arguments were
passed to CGIs as environment variables. Now I already knew how to make
an mn8 script executable under bash: it is as simple as having #!/usr/bin/env
mn8 as the first line of the script and making the script executable. BTW,
another very simple trick which improves life a great deal is to make a
symbolic link to your mn8.sh somewhere in your path, like: ln -s
/dev/spacemapper/mn8/mn8.sh /usr/local/bin/mn8. That way from anywhere you
can say mn8 xxxx and voila, mn8 works like any executable.
So, back to our track: this week I was playing with Vanilla and noticed that
Vanilla is in fact written in Rebol. That gave me the idea that if I can write
bash CGIs and Rebol CGIs, why wouldn't I be able to write mn8 CGIs? Quickly I
made a small script, threw it into my cgi-bin directory and pointed my browser
to it. Worked, the first time! Isn't life wonderful?
Cool, I could write mn8 CGIs and use any kind of GET request. But what
about POST requests? In the case of GET, name/value pairs are sent as URL
parameters, but in the case of the POST method the name/value pairs are sent
through the standard input stream. But I remembered that in order to make pipes
work with mn8 I check the standard input stream at start-up, and if I find
something on it I give it as a parameter to the script.
Exactly! That meant that I could use POST and I would have the parameters in
the argument of the main method. Quickly another script and a test. Yep, it worked
like magic.
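For readers who have not written a CGI by hand, here is the GET versus POST difference as a small Python sketch (not mn8). The variable names REQUEST_METHOD, QUERY_STRING and CONTENT_LENGTH are the standard CGI ones; the function name and the sample values are made up for the example.

```python
import io
from urllib.parse import parse_qs


def read_cgi_params(environ, stdin):
    """Return the request's name/value pairs as a dict (first value per name)."""
    method = environ.get("REQUEST_METHOD", "GET").upper()
    if method == "POST":
        # POST: the pairs arrive on standard input, CONTENT_LENGTH bytes long.
        length = int(environ.get("CONTENT_LENGTH") or 0)
        encoded = stdin.read(length).decode("utf-8")
    else:
        # GET: the pairs are part of the URL and arrive in the environment.
        encoded = environ.get("QUERY_STRING", "")
    return {name: values[0] for name, values in parse_qs(encoded).items()}


# Simulated requests; a real script would pass os.environ and sys.stdin.buffer.
get_params = read_cgi_params(
    {"REQUEST_METHOD": "GET", "QUERY_STRING": "a=1&b=two"},
    io.BytesIO(b""),
)
body = b"text=hello+world&submit=submit"
post_params = read_cgi_params(
    {"REQUEST_METHOD": "POST", "CONTENT_LENGTH": str(len(body))},
    io.BytesIO(body),
)
print(get_params, post_params)
```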
The moral of the story? Good design and a good model always pay off.
There is only one small issue. Every time a script is invoked the Java
machine is started from scratch, which is very slow. How could I keep an instance
of a Java VM in memory and convince Apache to use that one to run my
scripts?
BTW, here is the test script:
#!/usr/bin/env mn8
$pathInfo from "env:/system/properties/PATH_INFO"
$scriptName from "env:/system/properties/SCRIPT_NAME"
$queryString from "env:/system/properties/QUERY_STRING"
$remoteHost from "env:/system/properties/REMOTE_HOST"
$osName from "env:/system/properties/os.name"
$javaHome from "env:/system/properties/JAVA_HOME"
print "Content-type: text/html\n"
print "<html><title>mn8 CGI Script</title><body bgcolor='#fefefe'>"
if $pathInfo != "" then [
    print "<form action='" + $scriptName + "' method='post'>"
    print "<textarea name='text' cols='65' rows='15'></textarea><br>"
    print "<input type='submit' name='submit' value='submit'>"
    print "</form>"
] else [
    print "<b>Os Name</b>: " + $osName + "<br>"
    print "<b>Java Home</b>: " + $javaHome + "<br>"
    print "<b>Path Info</b>: " + $pathInfo + "<br>"
    print "<b>Script Name</b>: " + $scriptName + "<br>"
    print "<b>Query String</b>: " + $queryString + "<br>"
    print "<b>Number of arguments</b>: " + $args@length + "<br>"
    if $args@length > 0 then [
        each $i in $args do [
            print "<p>" + $i + "</p>"
        ]
    ]
]
print "</body></html>"
February 15th, 2002