Posts filed under 'XDStore'
After last week's first public release of SpaceMapper DataStore and MN8, I have some data to draw conclusions from. Unfortunately, on the release date the FreshMeat announcement contained links that did not pass through the SourceForge counters
(the link went directly to prdownloads instead of the downloads section
on the status page), so I have no idea about the downloads on the first
day. However, it seems there is a lot more interest in an XML database
than in a new scripting language, and even so 78% are interested in the
binaries and only 22% in the source of the database project. With the
scripting language the situation is reversed: 79% interest in the
source and only 21% in the binaries. I guess people are more
interested in how to write an interpreter than in using a new one.
No feedback, no bugs, no mailing list interest and no contributors, which is reasonable for a first public release.
What is not reasonable is that the Klez virus on somebody's computer
noticed the release and is sending thousands of infected mails with my
email address in the From field. In case anyone receives one, I'm really
sorry; it's not my fault, it's not from me, and you can verify that by
looking at the source of the message. All the mail I send goes through
our server (194.102.233.6), which I'm sure you won't find in the
Received headers.
November 11th, 2002
Just wanted to let you know that after many months of hard work a first public release of SpaceMapper DataStore and MN8 is available.
What is SpaceMapper DataStore?
DataStore is a Java-based document repository server for storing,
querying and fetching XML-based documents. It is built on practical
needs, allowing the storage of semi-structured (well-formed, maybe
validated, XML, XHTML and HTML) documents and unstructured documents
(TXT).
The documents are stored in a conventional relational database
(PostgreSQL, MySQL, DB2, SAP DB), which brings the full advantages
and reliability of these products. Being built on top of the
Avalon Phoenix framework, it allows server components to be easily
developed, deployed and shared. The documents are managed through a
BEEP and/or XML-RPC interface using a subset of the SEP (Simple
Exchange Profile) protocol.
What is SpaceMapper mn8?
mn8 is an experimental object-oriented scripting language, tightly
integrated with the net, which emulates the concepts at the core of XML
in order to simplify and make as transparent as possible information
extraction and manipulation from the WWW and XML documents.
Written in Java, it works with most operating systems and allows easy
reuse of the huge number of libraries available through simple wrappers.
At this point mn8 has concepts for: HTML, HTML-Forms, Cookies, RSS,
OPML, HTTP, FTP, POP3, SMTP, Jabber, BEEP, XML-RPC, SOAP, MBox.
Then what is SpaceMapper?
The SpaceMapper effort was born from the classic Internet desire to
see if there is a better way. The effort evolved from an early RFP on
the now-defunct SourceXchange, which was awarded to the Romanian open
source development firm noLimits Technologies. The project is Open Source (Apache-like license) and was sponsored by the 501(c)(3) non-profit arm of media.org (Internet Multicasting Service) and noLimits Technologies.
For any questions related to the SpaceMapper and/or mn8 project, please write to the spacemapper-user@lists.sourceforge.net mailing list.
MN8 and DataStore are still very young and far from reaching their
purpose, so any feedback, ideas, questions and constructive criticism
are more than welcome.
November 5th, 2002
After I managed to install PostgreSQL under Windows last week and tested DataStore with disappointing results, today I installed SAPDB and tested again.
Fantastic! I used the RFC reference XML documents from xml.resource.org: 3124 small XML documents, around 4M in total, with DataStore over SAPDB and an MN8 script to load all these documents and store them using SEP (the Simple Exchange Profile) over XML-RPC.
The procedure took less than an hour, which means over 60 documents per minute. This is the same as with PostgreSQL, but what is different is that by the time the storing of the documents was completed, the indexing was too. With PostgreSQL the indexing required around 30 hours.
Now, DataStore stores the document as it receives it (it does not
alter it), but to allow very fast structured searches (get me the
documents which have this element containing this value and this
attribute with this value) it breaks the documents into little
entities (pretty much as Google does), entities which relational
databases can use efficiently (they actually use their indexes in
their queries), and it keeps information about the structure around
each entity (which element it is part of ...). That way you can
actually query tons of well or not so well formatted documents using
complex structure information and get the results instantly.
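To make the "little entities" idea concrete, here is a minimal sketch in Java. It is not DataStore's actual code: the class name EntityShredder, the Entity shape and the path format are all assumptions made for illustration. It walks an XML document and emits one row per word, together with the path of the element the word belongs to, which is the kind of row a relational database can index.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration, not the DataStore implementation.
class EntityShredder {
    // One indexable row: the path of the containing element plus one word.
    static final class Entity {
        final String path;
        final String word;
        Entity(String path, String word) { this.path = path; this.word = word; }
    }

    // Parses the XML text and returns one Entity per word found in it.
    static List<Entity> shred(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<Entity> out = new ArrayList<>();
            walk(doc.getDocumentElement(), "", out);
            return out;
        } catch (Exception e) {
            throw new IllegalArgumentException("not parseable as XML", e);
        }
    }

    // Depth-first walk that remembers the element path on the way down.
    private static void walk(Element el, String parentPath, List<Entity> out) {
        String path = parentPath + "/" + el.getTagName();
        NodeList children = el.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node child = children.item(i);
            if (child.getNodeType() == Node.TEXT_NODE) {
                for (String word : child.getNodeValue().trim().split("\\s+")) {
                    if (!word.isEmpty()) {
                        out.add(new Entity(path, word.toLowerCase()));
                    }
                }
            } else if (child.getNodeType() == Node.ELEMENT_NODE) {
                walk((Element) child, path, out);
            }
        }
    }

    public static void main(String[] args) {
        for (Entity e : shred("<rfc><title>Simple Exchange Profile</title></rfc>")) {
            System.out.println(e.path + " -> " + e.word);
        }
    }
}
```

With rows like these in a table indexed on (path, word), a structured search becomes an ordinary indexed lookup, while the original document stays stored untouched next to them.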
After the 3124 documents, DataStore had extracted 27000 entities (most of the time words).
And in all this time I worked as usual on the same computer.
I can't explain the poor performance of PostgreSQL. It does not
perform much better on the Linux server either, and we tried many
versions; it must be something related to their JDBC drivers. In
comparison to IBM DB2 or MySQL it is way slower.
However, SAPDB rocks and it's free (GPL/LGPL), with versions for
Linux and Windows (and many others), so if you are looking for a
free, industrial-strength relational database, look no further.
Should I mention that the Windows version comes with excellent GUI
management tools?
June 14th, 2002
If I look at the mn8 CVS commits, I end up (again) at the conclusion
that this was yet another unproductive week, with half the number of
commits of last week. Hmm, this is very hard to believe.
"mn8" is working better than ever, everything is on its way, happily
doing as planned, and every feature I was afraid of seems to work. This
should be the time to frenetically code for the last 100 meters and
still ... Maybe it is just a matter of discipline?
Finally DataStore seems to be reaching frozen status, with no new bugs
discovered, so Crow and Atech are 100% on "mn8". There is only
one tricky issue to be solved around the net-centered part. It is very
important and decisive for "mn8" and is about integrating the net
part with the query language in a transparent way. In this way you
could do an each query from a URL and the from command would get the
query, select only the variables that can be transformed into a
protocol-specific query (HTTP and HTML pages, BEEP and SEP queries,
file and filtering, ...) and filter out the documents before they are
filtered by the where clause. If this doesn't work (but it will) we
have serious problems: if you want to work on a couple of documents
from a SEP subtree, that would mean getting all the documents from
the subtree and filtering them, which is a waste of bandwidth and
processing power. This is what Atech is doing now.
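The split described here is close to what database people usually call predicate pushdown (a term this post does not use). A rough sketch in Java rather than mn8, with everything in it hypothetical (the Condition class, the field names, the idea that a protocol advertises which fields it can filter on):

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical illustration of splitting a where clause, not mn8 code.
class PushdownSketch {
    // Assumed shape of one where-clause condition: field = value.
    static final class Condition {
        final String field;
        final String value;
        Condition(String field, String value) { this.field = field; this.value = value; }
        @Override public String toString() { return field + "=" + value; }
    }

    // Key true  -> conditions the remote protocol can evaluate itself.
    // Key false -> conditions left for the local where clause.
    static Map<Boolean, List<Condition>> split(List<Condition> where, Set<String> remoteFields) {
        return where.stream()
                .collect(Collectors.partitioningBy(c -> remoteFields.contains(c.field)));
    }

    public static void main(String[] args) {
        List<Condition> where = Arrays.asList(
                new Condition("path", "/rfc"),
                new Condition("word", "beep"));
        // Assume the remote side can only filter on "path".
        Map<Boolean, List<Condition>> parts = split(where, Set.of("path"));
        System.out.println("sent with the request: " + parts.get(true));
        System.out.println("checked locally:       " + parts.get(false));
    }
}
```

Only the documents matching the first group travel over the wire; the second group is applied to whatever comes back, so the result is the same as filtering everything locally, minus the wasted bandwidth.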
Crow was busy doing some word wrapping and is working on stripping out
text from HTML pages. This could prove useful to convert them to XML
and also to get some results formatted in HTML and send them by email.
"HyperPad" is quite usable and stable now. Borzy still has a few bugs
to crush, but it looks and works quite decently for a Java
application.
Got a funny idea last week! "mn8" doesn't have many user
interaction possibilities right now, and I don't think the usual ones
(button, text field, ...) have to be implemented directly, but
concepts could be developed to emulate XForms. Now, having the
concepts for some basic XForms widgets, someone could easily write
the requested interface to the respective concept. Pass the simple
output from the wrapper concept (the one which produces the desired
interface to the concept, just like in MVC) to an external program
which knows how to render it, send the resulting XML back to the
initial concept, and voilà, you have the much wanted user
interaction.
The good parts? First, you are not forced into a Java-based GUI
application to get the interaction; there are and will be some native
XForms implementations. Second, but also important, you could provide
a web-based form starting from an XForms XML and a nice style sheet
:). Heh, you could even have a CLI or a fake user interaction this
way. I can't wait to get there, but this will be later, after the
basic "mn8" works, in a second phase.
November 4th, 2001
mn8: O God, I'm running so late, like never before. I barely think about
or do anything else, yet progress is still so slow. The real problem is
that designing an OO interpreter is not so trivial. All the time I have
to come back to revise some design flaws. I think that is called
refactoring. At least I do it!
Spent the whole weekend working like crazy to refactor a few
things instead of adding functionality, which means four more days of
delay. But I had to do it; it was just not looking and functioning
right. This was also the first time (that I remember, anyway) I hated Java.
I just don't understand why they made static behave the way it does
during inheritance. In the end I had to use the Singleton pattern; it
works, but I'm still not very happy. I will leave it as it is anyway;
I can't afford to change it and don't think there is another
solution.
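The post doesn't say exactly which static behaviour got in the way. One common surprise, assumed here, is that Java static methods are hidden rather than overridden: the call is bound to the declared type at compile time, so a subclass never gets a say. A small sketch with made-up class names (Concept and HtmlConcept are not taken from the mn8 sources), including the Singleton workaround of moving the behaviour into an instance method on a single shared object:

```java
// Hypothetical illustration of static hiding and a Singleton workaround.
class StaticHidingDemo {
    static class Concept {
        // Static: bound at compile time to the declared type.
        static String staticKind() { return "concept"; }
        // Instance: dispatched at run time on the actual object.
        String kind() { return "concept"; }
    }

    static class HtmlConcept extends Concept {
        // Singleton: exactly one shared instance, reachable statically.
        private static final HtmlConcept INSTANCE = new HtmlConcept();
        private HtmlConcept() {}
        static HtmlConcept getInstance() { return INSTANCE; }

        // Hides Concept.staticKind(); it does not override it.
        static String staticKind() { return "html"; }
        @Override String kind() { return "html"; }
    }

    @SuppressWarnings("static-access")
    static String viaStatic(Concept c) {
        return c.staticKind();   // always Concept.staticKind()
    }

    static String viaInstance(Concept c) {
        return c.kind();         // uses the subclass override
    }

    public static void main(String[] args) {
        Concept c = HtmlConcept.getInstance();
        System.out.println(viaStatic(c));   // prints "concept"
        System.out.println(viaInstance(c)); // prints "html"
    }
}
```

The Singleton keeps the "one per class" feel of static while letting ordinary overriding work, which may be why it ended up being the way out here.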
Being so late and still having to work on it always makes me wonder
whether I'm like the cowboy programmer in the project management
examples. But mn8 is (something which never stops amazing me) working
as planned, more ready and more complete every day. But as I
look around me, I don't really see people doing as radical and as much
refactoring as me, and this worries me. Is it possible that their
design was right the first time? I don't think so, there is no
such thing. I'm afraid that the others patch things rather than
refactor.
This time being late had some benefits too. DataStore got an alpha, but released, Avalon (finally we are not going to release it with a CVS version of Avalon),
lots of bug fixes, a brand new SEP interpreter plus a more
stable XML-RPC server, and a full-blown PHP/XML-RPC example. Yep, it
works great. Crow is working on some small Java tools to let us
transform mails from mbox format into XML and then feed them to
DataStore. We will need them later anyway, plus it is a good way of
testing DataStore.
Atech did the BEEP handler, so now you can open a URLConnection to a
beep://xxx URL and it will work. You still have to know what to say
over the connection, but at least it will allow mn8 to open URLs
transparently. Now he is working on the XML-RPC handler. Let's see
how that works out.
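For readers who haven't written one, this is roughly how a custom scheme plugs into java.net. The sketch below is not Atech's handler: FakeBeepHandler and its canned reply are stand-ins, and no BEEP traffic happens at all. It only shows the hook, a URLStreamHandler that returns a URLConnection for beep:// URLs.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.URL;
import java.net.URLConnection;
import java.net.URLStreamHandler;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of a custom URL scheme handler.
class BeepUrlSketch {
    // Stand-in handler: a real one would open a BEEP session here.
    static class FakeBeepHandler extends URLStreamHandler {
        @Override
        protected URLConnection openConnection(URL target) {
            return new URLConnection(target) {
                @Override public void connect() { connected = true; }
                @Override public InputStream getInputStream() {
                    String reply = "stub reply for " + getURL().getHost();
                    return new ByteArrayInputStream(reply.getBytes(StandardCharsets.UTF_8));
                }
            };
        }
    }

    // Opens the URL through the stand-in handler and reads the whole reply.
    static String fetch(String spec) {
        try {
            // Passing the handler explicitly avoids a JVM-wide registration.
            URL url = new URL(null, spec, new FakeBeepHandler());
            URLConnection conn = url.openConnection();
            try (InputStream in = conn.getInputStream()) {
                return new String(in.readAllBytes(), StandardCharsets.UTF_8);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(fetch("beep://example.org/docs"));
    }
}
```

A JVM-wide alternative is URL.setURLStreamHandlerFactory, which can be called only once per JVM; either way the calling code just sees an ordinary URL, which is the transparency the paragraph is after.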
Ah, not to forget about "HyperPad". It got a pair of skin handlers
so you can have skins in it (it doesn't really work well, but I don't
think it is our fault). It amazes me how well the new Linux Java works.
It has better font rendering than the IBM one, and it is definitely
faster than under Windows 2000. It wasn't always like that. Linux rulez!
Thanks to neurogato for pointing out that Alan's code crew text is actually lyrics to the tune of Motorhead's "(We Are) The Road Crew";
yes, indeed, it fits beautifully. BTW, while we are on the SmoothWall chapter, last week we happened to replace an old router based on LRP with a firewall running SmoothWall.
It took us about two hours, but only because we went for the
installation first and read the manuals afterwards, just like any
(in)sane person would do. Great piece of software!
October 9th, 2001