NGL

Unicode problem


papaiking

Jul 13, 2011; 4:35am

Unicode problem

NewGenLib supports Unicode 32, but search results in the OPAC are empty.
This seems to be caused by Apache Solr not supporting Unicode.

I want to enter Vietnamese characters as Unicode UTF-8 (this is the standard character encoding in Vietnam) and have the search engine support them.
How do I solve this issue?

Thanks to the developer team for your support.

verussolutions

Jul 13, 2011; 5:16am

Re: Unicode problem

Administrator

1. Stop NewGenLib server (apache-tomcat)
2. Go to the apache-tomcat-6.0.XX/conf directory. Open server.xml with your favourite text editor
3. At approximately line numbers 69 to 71 you will find the lines below
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443"/>

You need to add the URIEncoding="UTF-8" and maxHttpHeaderSize attributes.
4. Hence the lines above must be replaced with these
<Connector port="8080" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443" URIEncoding="UTF-8" maxHttpHeaderSize="8192" />
5. Start NewGenLib Server
6. Reindex. (Double-click the BuildIndex.bat file available in the downloaded InstallNGL3.0 directory)

You should now see your Vietnamese records
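
To illustrate why the URIEncoding attribute matters for search, here is a minimal sketch using only the JDK (Java 10 or later). It assumes the browser sends the search term percent-encoded as UTF-8, and that a Tomcat 6 connector without URIEncoding decodes the query string as ISO-8859-1. The class name and the sample term are hypothetical and are not part of NewGenLib.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class UriEncodingDemo {
    // Percent-encode a search term as UTF-8, as a browser typically would.
    static String encode(String term) {
        return URLEncoder.encode(term, StandardCharsets.UTF_8);
    }

    // Decode a percent-encoded value with the charset the server is configured for.
    static String decode(String encoded, Charset serverCharset) {
        return URLDecoder.decode(encoded, serverCharset);
    }

    public static void main(String[] args) {
        // Hypothetical Vietnamese search term, written with escapes so the
        // source file encoding does not matter.
        String term = "B\u00e0n v\u1ec1";
        String encoded = encode(term);
        System.out.println(encoded); // B%C3%A0n+v%E1%BB%81

        // With URIEncoding="UTF-8" the server recovers the original term.
        String asUtf8 = decode(encoded, StandardCharsets.UTF_8);
        System.out.println(asUtf8.equals(term)); // true

        // Decoded as ISO-8859-1, every UTF-8 byte becomes its own character,
        // so the term handed to the search engine no longer matches the index.
        String asLatin1 = decode(encoded, StandardCharsets.ISO_8859_1);
        System.out.println(asLatin1.equals(term)); // false
    }
}
```

If the decoded term differs from what was typed, an empty result list is the expected symptom even when the index itself is fine.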

papaiking

Jul 13, 2011; 7:39am

Re: Unicode problem

I did that, but the result did not change.
I think the problem is at:
MarcReader reader = new MarcStreamReader(input);

MARC4J may use ISO-8859-1 as the default instead of UTF-8.
We need to specify UTF-8 when using MarcStreamReader.
Here is the console output on my server:

org.marc4j.MarcException: error parsing data field for tag: 245 with data: aBàn về t�
at org.marc4j.MarcStreamReader.next(MarcStreamReader.java:220)
at newgenlib.marccomponent.conversion.Converter.getMarcModelsFromMarc(Converter.java:469)
at org.verus.ngl.indexing.NewBibliographicSolrIndexCreator.indexingData(NewBibliographicSolrIndexCreator.java:113)
at eof.techProcessing.BuildIndexingPanel.buildIndex(BuildIndexingPanel.java:216)
at eof.techProcessing.BuildIndexingPanel$2.construct(BuildIndexingPanel.java:198)
at tools.SwingWorker$2.run(SwingWorker.java:119)
at java.lang.Thread.run(Thread.java:619)
Caused by: java.io.IOException: subfield not terminated
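
The trace above is consistent with a charset mismatch, although it does not prove one. As a rough illustration using only the JDK (not MARC4J itself), the sketch below shows that a Vietnamese string occupies more bytes in UTF-8 than it has characters, and that decoding those bytes as ISO-8859-1 yields different text. MARC records store field lengths in bytes, so a reader that assumes the wrong encoding can plausibly lose track of subfield boundaries. The class name is hypothetical, and the sample string is the readable part of the 245 field in the trace.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MarcEncodingDemo {
    // Decode raw bytes the way a reader configured with the given charset would.
    static String decodeAs(byte[] raw, Charset charset) {
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Readable part of the title from the trace, written with escapes.
        String title = "B\u00e0n v\u1ec1";
        byte[] raw = title.getBytes(StandardCharsets.UTF_8);

        System.out.println(title.length()); // 6 characters
        System.out.println(raw.length);     // 9 bytes in UTF-8

        // Wrong charset: one character per byte, so the text is garbled.
        String garbled = decodeAs(raw, StandardCharsets.ISO_8859_1);
        System.out.println(garbled.length());      // 9
        System.out.println(garbled.equals(title)); // false

        // Right charset: the original title comes back intact.
        String restored = decodeAs(raw, StandardCharsets.UTF_8);
        System.out.println(restored.equals(title)); // true
    }
}
```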

verussolutions

Jul 13, 2011; 7:55am

Re: Unicode problem

Administrator

Please download the latest Indexer file from http://sourceforge.net/projects/newgenlib/files/NewGenLib/NGL3Indexer/Indexer.zip/download

1. Extract the above zip file. You will get a directory called Indexer. In that directory there is a directory called lib and a file named NGLIndexer.jar. Copy them
2. Paste the file and directory into your InstallNGL3.0 directory. You will be replacing the old lib directory and the old NGLIndexer file already present in the InstallNGL3.0 directory
3. Run NewGenLib Server
4. Double-click BuildIndex.bat in the InstallNGL3.0 directory
Indexing should now take place

The latest Indexer is already part of InstallNGL3.0U1 (Update 1)

papaiking

Jul 14, 2011; 5:47am

Re: Unicode problem

Many functions cause errors like the one above.
I want to point out that MarcReader reader = new MarcStreamReader(input, "UTF-8"); does not run correctly when the input stream is in Unicode.
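
One way to narrow this down is to check whether the bytes handed to the reader are well-formed UTF-8 in the first place. The following is a minimal sketch using only the JDK; the class and method names are hypothetical and nothing here is taken from the NewGenLib or MARC4J sources. It uses a strict decoder that reports malformed input instead of silently substituting replacement characters.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class Utf8Check {
    // Returns true only if the whole byte array is well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String sample = "B\u00e0n v\u1ec1";

        // Bytes that really are UTF-8 pass the check.
        byte[] utf8 = sample.getBytes(StandardCharsets.UTF_8);
        System.out.println(isValidUtf8(utf8)); // true

        // The same prefix stored as ISO-8859-1 fails: 0xE0 starts a
        // multi-byte sequence but is followed by a plain ASCII byte.
        byte[] latin1 = "B\u00e0n".getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(isValidUtf8(latin1)); // false
    }
}
```

Running such a check on the stored record bytes would indicate whether the data or the reader's configuration is the more likely culprit.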

verussolutions

Jul 14, 2011; 7:25am

Re: Unicode problem

Administrator

Hi,
That's true.
1. Did you install the new Indexer as instructed earlier?
2. Is your database encoding set to UTF-8?

Could you please send us a database backup for examination by the development team?

verussolutions

Jul 14, 2011; 7:26am

Re: Unicode problem

Administrator

Please email the backup to info@verussolutions.biz

papaiking

Jul 14, 2011; 8:58am

Re: Unicode problem

These are my database properties:

CREATE DATABASE newgenlib
WITH OWNER = newgenlib
ENCODING = 'UTF8';

verussolutions

Jul 14, 2011; 9:45am

Re: Unicode problem

Administrator

In reply to this post by verussolutions

Hi,
The database has been examined. A new Indexer has been uploaded to sourceforge.net. I am pasting the installation instructions again for your convenience.
Please download the latest Indexer file from http://sourceforge.net/projects/newgenlib/files/NewGenLib/NGL3Indexer/Indexer.zip/download

1. Extract the above zip file. You will get a directory called Indexer. In that directory there is a directory called lib and a file named NGLIndexer.jar. Copy them
2. Paste the file and directory into your InstallNGL3.0 directory. You will be replacing the old lib directory and the old NGLIndexer file already present in the InstallNGL3.0 directory
3. Run NewGenLib Server
4. Double-click BuildIndex.bat in the InstallNGL3.0 directory
Indexing should now take place

-------Reason for the problem
We saw that you created a new library whose library id is 2, and that all the catalog records were created under that new library. We strongly recommend having only one library, with library id 1.
The NewGenLib multi-library creation procedure is different and cannot be done just by creating a row in the library table. Hence we request that you wait until the multi-library routines are available.
For now, to change your library name, go to Administration -> Configure System -> General Menu -> Library.
Enter the library name and put the same string in the Network name field as well. Save it and restart the NewGenLib server.