=================================================
RELEASE NOTES
FOR THE
INFORMIX TIMESERIES REAL-TIME LOADER 1.01.UC2
DATE: 4/16/2001
=================================================

TABLE OF CONTENTS

   I.     OVERVIEW OF RELEASE NOTES
   II.    INFORMIX SERVER COMPATIBILITY
   III.   THE INFORMIX TIMESERIES DATABLADE MODULE
   IV.    REGISTERING THE TIMESERIES REAL-TIME LOADER
   V.     INTRODUCING THE TIMESERIES REAL-TIME LOADER
   VI.    WORKING WITH YOUR CONSULTANT
   VII.   TIMESERIES REAL-TIME LOADER ARCHITECTURE
   VIII.  PREPARING TO USE THE TIMESERIES REAL-TIME LOADER
   IX.    WORKING WITH THE TIMESERIES REAL-TIME LOADER
   X.     THE UTILITY PROGRAMS: RTLMODE, RTLSTAT, AND RTLSHMDMP
   XI.    EXAMPLES
   XII.   PERFORMANCE
   XIII.  GLS SUPPORT
   XIV.   KNOWN PROBLEMS

+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

I. OVERVIEW OF RELEASE NOTES
=============================

The purpose of these release notes is to make you aware of any
special actions required to configure and use the Informix TimeSeries
Real-Time Loader on your computer. This document also describes new
features and contains information about known bugs and their
workarounds.

These release notes are written for the following audience:

o System administrators who install the TimeSeries Real-Time Loader

o Database administrators who control access to databases that use
  the TimeSeries Real-Time Loader

o Developers who write applications using the TimeSeries Real-Time
  Loader

II. INFORMIX SERVER COMPATIBILITY
==================================

Version 1.01.UC2 of the TimeSeries Real-Time Loader has been tested
on Informix Dynamic Server, Version 9.21.

Version 1.01.UC2 is not compatible with Informix Dynamic Server with
Universal Data Option, Version 9.14, or earlier versions of the
server.

III. THE INFORMIX TIMESERIES DATABLADE MODULE
=============================================

Version 1.01.UC2 of the Informix TimeSeries Real-Time Loader requires
that you also register the Informix TimeSeries DataBlade module,
Version 4.01, in your database.

IV. REGISTERING THE TIMESERIES REAL-TIME LOADER
===============================================

To use the TimeSeries Real-Time Loader, you must register it in each
database in which it will be used. Before you register the TimeSeries
Real-Time Loader, your consultant will customize your installation.

See the "DataBlade Module Installation and Registration Guide"
(formerly known as the "BladeManager User's Guide") for instructions
on how to register DataBlade modules.

V. INTRODUCING THE TIMESERIES REAL-TIME LOADER
===============================================

The TimeSeries Real-Time Loader provides real-time access to data
from data feeds such as Reuters, via SQL queries to an Informix
database.

The TimeSeries Real-Time Loader consists of two parts:

o RTLoader, which resides outside the database.

o The Real-Time Loader DataBlade module, which runs as a DataBlade
  module within the database.

RTLoader reads from data feeds and places data into shared memory.
(Shared memory is memory available to all processes, as opposed to
memory allocated for one program or application.) The Real-Time
Loader DataBlade module periodically flushes data from shared memory
to disk using the TimeSeries DataBlade module.

The Real-Time Loader DataBlade module also provides access to data
whether it is on disk or in shared memory. It does this
transparently: you do not need to know where the data actually
resides.

VI. WORKING WITH YOUR CONSULTANT
================================

The TimeSeries Real-Time Loader is only available with the purchase
of consulting. Your consultant builds feed handlers that are
customized to the particular data feeds you are using and the type of
data you want to extract. The first step is for your consultant to
produce a document that outlines the work and time involved in
customizing your installation.

VII. TIMESERIES REAL-TIME LOADER ARCHITECTURE
=============================================

The TimeSeries Real-Time Loader runs a client outside the database
(RTLoader) that listens to data feeds and sends data to shared
memory. The time lag for data to arrive in shared memory is less than
1/10 second, so the data is still real-time for our purposes.

Periodically, data in shared memory is flushed to the database and to
disk. You can control the time interval, which is usually set to
about 20 seconds. Data is *not* removed from shared memory at this
point, so this data can still be accessed in real time; you don't
need to retrieve it from the database.

To avoid shared memory filling up, you control when to clear out
space by setting high- and low-water marks in the Real-Time Loader
configuration file, rtlconfig. When the high-water mark is reached,
the TimeSeries Real-Time Loader frees ticks that have already been
flushed to disk until the low-water mark is reached. (A tick is a
record consisting of a timestamp plus one or more fields of data.)

The TimeSeries Real-Time Loader provides SQL functions and a C-API to
enable you to query for data and manipulate it, as described in
"Working With The TimeSeries Real-Time Loader", below.

The TimeSeries Real-Time Loader requires that you also register the
Informix TimeSeries DataBlade module, Version 4.01, in your database.

VIII. PREPARING TO USE THE TIMESERIES REAL-TIME LOADER
======================================================

To use the TimeSeries Real-Time Loader, you must:

o Create a "loader" virtual processor.

o Create a database table to store data (your consultant helps you
  with this step).

A. Creating a Loader VP Class
---------------------------------------------

Define a loader VP (virtual processor) class in your ONCONFIG file
using the following format:

   VPCLASS Loader, noyield, #_of_classes_to_run_simultaneously

Consider the following when you decide how many classes to run
simultaneously.
Ideally, you want one loader VP per container and one container on
each disk. However, there is no advantage in having more loader VPs
than the number of processors available on your machine. It is
permissible to have more containers than loader VPs.

B. Inserting Data into a Table
------------------------------------------

To insert data, you provide a table with a TimeSeries column that
contains a dummy time series of the correct data type for each
container that you plan to populate. Your consultant creates the data
type.

You can either add data to each container in round-robin fashion (the
default), or you can load balance between containers. If you want to
load balance, discuss this with your consultant.

The metadata of your time series must contain a string of the format
'xx|sym_name', where xx is an integer between 0 and 99 and sym_name
is a name identifying your data. For example, '01|IBM' may identify
that the data is IBM stock data from the Reuters feed. The system
table, TSF_MAP, maintains this information.

IX. WORKING WITH THE TIMESERIES REAL-TIME LOADER
================================================

The TimeSeries Real-Time Loader provides SQL functions and a C-API to
enable you to query for data and manipulate it. You can also create
virtual tables if you want to view time series data as regular
tabular data.

A. RTLoader Command-Line Options
-----------------------------------------------------

You use the rtloader command to start the RTLoader client program.
The rtloader command has the options shown below.

   Usage: rtloader -cdt
      -c   The configuration file to use.
      -d   The database to connect to.
      -t   The table to hold tick data.
      -e   Locates the error message file.

B. Using the Error Message File
----------------------------------------------

The error message file, rtlerrmsg.txt, contains all the error
messages that can be displayed by RTLoader. When an error is raised,
the error message is written to the syslog.
You can put your own error messages in the rtlerrmsg.txt file.
Informix recommends you use error numbers 800 and higher, with a
maximum error number of 999. Do not alter the predefined messages in
the file; if you do, you may miss important warnings that are written
to the syslog.

RTLoader locates the error message file when you specify its full
path name with the -e option in the rtloader command.

C. Using the Configuration File
--------------------------------------------

The TimeSeries Real-Time Loader configuration file, rtlconfig, is
located in the directory
$INFORMIXDIR/extend/RealTimeLoader.1.01.UC2/rtlroot/examples.
Its format is identical to that of the Informix ONCONFIG file that
helps configure your server. The rtlconfig file contains explanatory
comments, and you can make adjustments in this file.

Among other things, the configuration file controls parameters for
recovery, such as secondary host and port parameters that can direct
reader threads to alternate sources for the incoming tick data.

D. Running Queries
----------------------------

When you issue a query, the TimeSeries Real-Time Loader always checks
shared memory first for the data your query is searching for. If
historically older data is not in shared memory, it checks for the
data in the database.

For maximum performance, write queries that match the data actually
available. For example, do not query for data for the period 8am to
4pm if the data on the feed starts at 9am. The TimeSeries Real-Time
Loader will not know why the first hour of data is missing from
shared memory and will be forced to query the database.

E. SQL Functions
--------------------------

RTL_NElems(TimeSeries,
           datetime year to fraction(5) default null,
           datetime year to fraction(5) default null,
           integer default 0)
returns integer

The RTL_NElems function returns the number of elements in the given
time series between the two specified dates. If the first date is
NULL, the count starts at the beginning of the time series.
If the second date is NULL, the count ends at the last element of the
time series. The flags argument must be either 0 or 0x80 (128
decimal); setting flags to 0x80 causes the Real-Time Loader to search
only in shared memory, not on disk.

RTL_Elem(TimeSeries,
         datetime year to fraction(5),
         lvarchar default NULL,
         integer default 0)
returns row

The RTL_Elem function searches for an element in the given time
series. The arguments are:

1. The time series to be searched.

2. The search date.

3. A string that contains one of the following comparison operators:
   '<', '<=', '=', '==', '>=', '>'

4. A flags argument that can be either 0 or 0x80 (128 decimal);
   setting flags to 0x80 causes the Real-Time Loader to search only
   in shared memory, not on disk.

RTL_Clip(TimeSeries,
         datetime year to fraction(5) default null,
         datetime year to fraction(5) default null,
         integer default 0)
returns TimeSeries

The RTL_Clip function extracts data between two timepoints in a time
series and returns a new time series containing that data.

RTL_LastElem(TimeSeries) returns row

The RTL_LastElem function returns the last element in the given time
series. If the time series is empty, NULL is returned.

RTL_FirstElem(TimeSeries) returns row

The RTL_FirstElem function returns the first element in the given
time series. If the time series is empty, NULL is returned.

TSCreateVirtualRtlTab(lvarchar,
                      lvarchar,
                      integer default 0,
                      lvarchar default NULL)

The TSCreateVirtualRtlTab procedure creates a virtual table for the
time series loaded by the TimeSeries Real-Time Loader. The arguments
are:

1. The name of the virtual table.

2. The name of the original time series table.

3. The flags argument. This can be any of the values documented for
   the TSCreateVirtualTab procedure in the Informix TimeSeries
   DataBlade Module User's Guide, plus the additional values:

   o 0x400 (1024 decimal)
     Causes the TimeSeries Real-Time Loader to start a scan of an
     irregular time series with the first element found in the given
     range.
     (The default behavior is to take the first element before the
     starting timestamp and give it the new start timestamp.)

   o 0x20000000 (536870912 decimal)
     Causes the TimeSeries Real-Time Loader to access only shared
     memory, not disk.

4. The TimeSeries column in the original table that should be used to
   populate the virtual table.

RTL_Release() returns lvarchar

The RTL_Release function returns version information for the
TimeSeries Real-Time Loader.

F. C-API Functions
----------------------------

ts_rtldesc *
ts_rtl_open_by_symbol(MI_CONNECTION *conn, mi_string *symbol,
                      mi_integer tstruct_id, mi_integer flags)

The ts_rtl_open_by_symbol() function opens a time series.

ts_rtldesc *
ts_rtl_open_by_ts(MI_CONNECTION *conn, ts_timeseries *ts,
                  MI_TYPEID *typeid, mi_integer flags)

The ts_rtl_open_by_ts() function opens a time series.

void
ts_rtl_close(ts_rtldesc *rtldesc)

The ts_rtl_close() procedure closes the associated time series.

ts_rtlscan *
ts_rtl_begin_scan(ts_rtldesc *rtldesc, mi_integer flags,
                  mi_datetime *start, mi_datetime *end)

The ts_rtl_begin_scan() function begins a scan of elements in a time
series.

mi_integer
ts_rtl_current_offset(ts_rtlscan *scan)

The ts_rtl_current_offset() function returns the offset for the last
element returned by ts_rtl_next().

mi_datetime *
ts_rtl_current_timestamp(ts_rtlscan *scan)

The ts_rtl_current_timestamp() function finds the timestamp that
corresponds to the current element retrieved from the scan.

mi_integer
ts_rtl_next(ts_rtlscan *scan, sm_tick_rec *ret_tick)

After a scan has been started with the ts_rtl_begin_scan() function,
elements can be retrieved from the time series with the ts_rtl_next()
function.

void
ts_rtl_end_scan(ts_rtlscan *scan)

The ts_rtl_end_scan() procedure ends a scan of a time series. It
releases resources acquired by ts_rtl_begin_scan(). Upon return, no
more elements can be retrieved using the given ts_rtlscan pointer.
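
The scan functions above are meant to be used as a bracketed sequence:
begin the scan, fetch elements until none remain, then end the scan.
The following is a minimal sketch of that calling order, not code from
the product. The function signatures are taken from the descriptions
above, but everything else is an assumption made so that the sketch
compiles on its own: the type definitions and function bodies are
stand-ins, the SCAN_ELEM and SCAN_EOS return codes are hypothetical
(these notes do not document the return values of ts_rtl_next()), and
count_ticks() is an illustrative helper.

```c
#include <stddef.h>

/* ---- Stand-ins so the sketch compiles on its own (all hypothetical). ----
 * The real types and functions are supplied by the DataBlade API and the
 * Real-Time Loader; only the function signatures come from these notes. */
typedef int mi_integer;
typedef struct { long secs; } mi_datetime;
typedef struct { mi_datetime stamp; } sm_tick_rec;
typedef struct { const sm_tick_rec *ticks; mi_integer nticks; } ts_rtldesc;
typedef struct { ts_rtldesc *desc; mi_integer pos; } ts_rtlscan;

/* Assumed return codes for ts_rtl_next(); not documented in these notes. */
enum { SCAN_ELEM = 0, SCAN_EOS = 1 };

static ts_rtlscan the_scan;   /* the stub supports one scan at a time */

ts_rtlscan *ts_rtl_begin_scan(ts_rtldesc *rtldesc, mi_integer flags,
                              mi_datetime *start, mi_datetime *end)
{
    (void)flags; (void)start; (void)end;   /* the stub ignores the range */
    the_scan.desc = rtldesc;
    the_scan.pos = 0;
    return &the_scan;
}

mi_integer ts_rtl_next(ts_rtlscan *scan, sm_tick_rec *ret_tick)
{
    if (scan->pos >= scan->desc->nticks)
        return SCAN_EOS;
    *ret_tick = scan->desc->ticks[scan->pos++];
    return SCAN_ELEM;
}

void ts_rtl_end_scan(ts_rtlscan *scan)
{
    scan->desc = NULL;
}

/* ---- The pattern itself: begin a scan, fetch until done, end the scan. ---- */
mi_integer count_ticks(ts_rtldesc *rtldesc, mi_datetime *start,
                       mi_datetime *end)
{
    sm_tick_rec tick;
    mi_integer count = 0;
    /* Flags value 0 is assumed to mean default behavior. */
    ts_rtlscan *scan = ts_rtl_begin_scan(rtldesc, 0, start, end);

    if (scan == NULL)
        return -1;
    while (ts_rtl_next(scan, &tick) == SCAN_ELEM)
        count++;
    ts_rtl_end_scan(scan);   /* release what ts_rtl_begin_scan() acquired */
    return count;
}
```

In a real routine, the descriptor would come from
ts_rtl_open_by_symbol() or ts_rtl_open_by_ts() and would be released
with ts_rtl_close() after the scan has ended.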

sm_tick_rec *
ts_rtl_elem(ts_rtldesc *rtldesc, mi_datetime *tstamp,
            mi_integer *isNull, mi_integer cmp, sm_tick_rec *ret_tick)

The ts_rtl_elem() function returns an element from the time series at
the given time.

sm_tick_rec *
ts_rtl_first_elem(ts_rtldesc *rtldesc, mi_integer *isNull,
                  sm_tick_rec *ret_tick)

The ts_rtl_first_elem() function returns the first element in the
time series.

sm_tick_rec *
ts_rtl_last_elem(ts_rtldesc *rtldesc, mi_integer *isNull,
                 sm_tick_rec *ret_tick)

The ts_rtl_last_elem() function returns the last element from a time
series.

MI_ROW_DESC *
ts_rtl_rowdesc(MI_CONNECTION *conn, MI_TYPEID *typeid)

The ts_rtl_rowdesc() function returns a row descriptor for the
specified type ID.

mi_string *
ts_rtl_ts_to_symbol(ts_timeseries *ts, mi_integer *tstruct_id)

The ts_rtl_ts_to_symbol() function returns the information stored in
the time series metadata.

void *
ts_rtl_column_by_number(ts_rtldesc *rtldesc, ts_tselem elem,
                        mi_integer colnum, mi_boolean *isNull)

The ts_rtl_column_by_number() function extracts the individual pieces
(columns) of data from an element. Column 0 (zero) is always the
timestamp.

ts_timeseries *
ts_rtl_create_with_tsdesc(MI_CONNECTION *conn, ts_rtldesc *tsdesc,
                          mi_integer flags, mi_datetime *origin,
                          mi_integer threshold,
                          mi_string *container_name,
                          mi_integer nelems, mi_lvarchar *metadata,
                          MI_TYPEID *meta_typeid)

The ts_rtl_create_with_tsdesc() function creates a time series with
user-defined metadata attached.

mi_integer
ts_rtl_inmem_size(ts_rtldesc *rtldesc)

The ts_rtl_inmem_size() function returns the number of ticks that are
in shared memory for the specified time series.

X. THE UTILITY PROGRAMS: RTLMODE, RTLSTAT, AND RTLSHMDMP
========================================================

The TimeSeries Real-Time Loader includes three utility programs that
perform the following functions:

o rtlmode     Controls how RTLoader shuts down and which port its
              reader threads listen on.

o rtlstat     Reports the status of the shared memory segment used by
              RTLoader.

o rtlshmdmp   Produces various reports directly from shared memory,
              mainly used for debugging.

A. RTLMODE
---------------------

Communicates with RTLoader using a message queue to control how it
shuts down or which port its readers listen on.

   Usage: ./rtlmode -fkmrzxy
      -f {p/s/o}#   Change the feed source {Primary/Secondary/Off}.
      -k            Shut down RTLoader completely.
      -m            Re-start ALL feeds to RTLoader.
      -r            Force RTLoader to re-read the configuration file.
      -z            Shut down ALL feeds to RTLoader.
      -x            Shut down immediately without flushing shared
                    memory.
      -y            Do not require confirmation.

-f option
------------

The host and port that supply each reader with data are set in the
configuration file, rtlconfig (described in section IX.C, "Using the
Configuration File"). This allows you to identify a secondary source;
if the primary source fails, RTLoader automatically switches to the
secondary source.

You can switch manually to the secondary source by using the -f
option. The argument to the -f option contains one of the following
letters:

   p for primary
   s for secondary
   o for off

plus a number that corresponds to the feed number in the
configuration file. For example, to switch feed02 from the secondary
to the primary source, use the following command:

   rtlmode -fp2

If a feed has been switched off, it is switched back on again when
you specify the primary or secondary feed.

-k option
-------------

When you use the -k option to shut down RTLoader, RTLoader first
closes all the reader threads, then allows the inserter threads to
clear any pending inserts. After these are complete, the inserter
threads are closed down. Lastly, the flusher threads are forced to do
a full flush and then terminate. This way, all the data in shared
memory is written to disk before RTLoader shuts down.
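
The argument format of the -f option described earlier in this
section (one letter, p, s, or o, followed by a feed number) is easy to
validate before a command is issued, for example from a monitoring
script or wrapper program. The following is a minimal sketch of such a
check. The names feed_source and parse_feed_arg are hypothetical and
are not part of rtlmode, and accepting leading zeros in the feed
number (as in "s02") is an assumption.

```c
#include <ctype.h>
#include <limits.h>
#include <stdlib.h>

/* Hypothetical names for illustration; not taken from the RTLoader sources. */
typedef enum { FEED_PRIMARY, FEED_SECONDARY, FEED_OFF } feed_source;

/*
 * Parse the argument of "rtlmode -f", for example "p2":
 * one letter (p, s, or o) followed by a decimal feed number.
 * Returns 0 on success and -1 if the argument is malformed.
 * The output parameters are written only on success.
 */
int parse_feed_arg(const char *arg, feed_source *source, int *feed_no)
{
    feed_source src;
    char *end;
    long n;

    if (arg == NULL || source == NULL || feed_no == NULL)
        return -1;

    switch (arg[0]) {
    case 'p': src = FEED_PRIMARY;   break;
    case 's': src = FEED_SECONDARY; break;
    case 'o': src = FEED_OFF;       break;
    default:  return -1;            /* also rejects the empty string */
    }

    /* The letter must be followed by at least one digit and nothing else. */
    if (!isdigit((unsigned char)arg[1]))
        return -1;
    n = strtol(arg + 1, &end, 10);
    if (*end != '\0' || n > INT_MAX)
        return -1;

    *source = src;
    *feed_no = (int)n;
    return 0;
}
```

With this sketch, the argument "p2" from the example above parses to
the primary source and feed number 2, while arguments such as "x1" or
"p" are rejected.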

-m and -z options
-------------------------

You can switch off all feeds using the -z option, or switch them all
back on again with the -m option.

-r option
------------

You can adjust the parameters listed below dynamically, by first
changing them in the configuration file and then using the rtlmode -r
command. RTLoader re-reads the configuration file and makes the
necessary changes.

o PURGE_HWM         Start purging the tick pool when shared memory
                    becomes this full. For example, 70%.

o PURGE_LWM         Stop purging the tick pool when shared memory is
                    this full. For example, 65%.

o FLUSH_PERIOD      Minimum number of seconds between flushes.

o FULL_FLUSH        Number of partial flushes before a full flush.

o MAX_FLUSH_COUNT   Maximum number of ticks to flush in a single
                    flush.

o MIN_TICK_COUNT    Minimum number of ticks required to enable a
                    symbol to be flushed.

o NODISKWRITE       Whether to write to disk.

o ENABLE_NEWSYM     Whether symbols can be created on the fly.

-x option
------------

When you use the -x option to shut down RTLoader, threads used by
RTLoader are terminated immediately, without clearing any pending
jobs.

-y option
------------

Options -x and -k require confirmation, unless you also specify the
-y option.

B. RTLSTAT
-----------------

Attaches to the shared memory segment used by RTLoader and reports
its current status.

   Usage: ./rtlstat -dfirm
      -d   Show reader stats.
      -f   Show flusher stats.
      -i   Show inserter stats.
      -l   Show spinlock stats.
      -r   Repeat the report at the given interval in seconds
           (default: 5).
      -m   Show shared memory usage.

-d option
------------

Reader stats show how many times a reader thread has tried to connect
to its source and which source it is currently listening to. The
information includes how many messages and ticks have been received
and how many bytes that represents.

-f option
-----------

Flush stats show the time of the last flush for each flusher thread,
how long it took to complete, and how many ticks were written to
disk.
The information includes a cumulative count of ticks flushed by each
flusher thread and the total time spent flushing.

-i option
-----------

Inserter stats show how many symbols have been inserted dynamically
by each of the insert threads, and how many are currently waiting to
be inserted.

-l option
-----------

Displays spinlock stats. Spinlocks are used to keep the various
groups of threads (such as readers, flushers, and purgers) from
clashing with each other and from accessing a data entity currently
being processed by another thread. The -l option displays a report
that details how many locks have been obtained by each group of
threads and how often these locks have failed. Since each failure
puts the thread into a spin, a cumulative total of spins is reported,
along with the maximum spin.

-r option
------------

You can set the report generated by rtlstat to repeat automatically
by using the -r option. If you do not add an argument to the -r
option, the default is 5 seconds.

-m option
--------------

The shared memory screen shows how many ticks have been received
during the current run, how many are currently available on the free
list, and how many are actually in shared memory. It also shows the
percent utilization of both the security and tick pools.

C. RTLSHMDMP
------------------------

This program outputs the TimeSeries container records and security
headers held in shared memory. You cannot view individual ticks,
because the structure of a tick is determined by each customer's
situation.

   Usage: ./rtlshmdmp -defhimptrx
      -d   List symbols by hash bucket.
      -e   Dump error messages in the specified file.
      -f   File to hold output.
      -h   Show hash bucket distribution (summary or detail).
      -i   Show initialization information only.
      -m   Show tsf_map table.
      -p   Show symbol profile: purgeable or flushable symbols.
      -t   Arrange by TimeSeries (summary or detail).
      -r   Raw output: TSC, Sec.hdr & Sec.hdr Flush.
      -x   Only print details for specified symbol.

Reports are written to stdout by default.

-d option
------------

The basic report lists all symbols currently in shared memory,
grouping them by the hash bucket to which they have been assigned.
The hash bucket is a mechanism to allow high performance lookups of
individual symbols.

-e option
------------

The -e option allows you to dump the error messages held in the
specified error message file, usually for debugging purposes. Since
the error message file is in ASCII format, you can view the file with
any editor. However, by using rtlshmdmp, you can determine whether
any messages will fail to be loaded by RTLoader.

-f option
-----------

You can redirect output to a file by using the -f option.

WARNING: The output from this program can be extremely verbose,
especially if there are large volumes of securities stored in the
security pool.

-h option
------------

You can analyze the distribution of securities among hash buckets by
using the -h option. This report lists all hash buckets and the
securities within them (the summary report lists only counts of
securities within hash buckets). For optimum performance, all hash
buckets should have similar counts of securities.

-i option
-----------

The -i option shows where the various pools in shared memory are
located, how large the security and tick structures are, and how many
ticks are currently held in the tick pool.

-m option
-------------

The tsf_map structure held in shared memory can be dumped using the
-m option. This maps a time series row type to an internal
tstruct_id.

-p option
------------

The profile reports produced by the -p option show how many
securities have ticks that are either flushable or purgeable. The
flushable report also lists ticks that would be flushed during a
partial flush.

-t option
------------

To view the securities grouped by TimeSeries container, use the -t
option. This report lists all securities (the summary report lists
only counts).

-r option
------------

The raw outputs dump either TimeSeries container records or security
header records directly from their respective pools. The records are
not logically ordered; the sequence is determined by the time the
element was created. During re-starts it is possible for data that is
no longer valid to appear in this report, because the various shared
memory pools are not re-initialized completely. Instead, each element
is initialized before it is re-used.

-x option
------------

The -x option lists details for an individual symbol. There may be
more than one record in shared memory for any given symbol, because
symbols can come from different groups of feeds, such as Reuters and
Nasdaq, or because gap filling has been performed.

XI. EXAMPLES
============

This product includes an example that demonstrates how to write a
clip function. The example is called test_clip.c and is located in
the $INFORMIXDIR/extend/RealTimeLoader.1.01.UC2/rtlroot/examples
directory. This directory includes a make file, Makefile.sol, that
you can use to compile test_clip.c, for example:

   make -f Makefile.sol MY_DATABASE=

In this command, set MY_DATABASE to the name of your database.

XII. PERFORMANCE
================

After you load data into a TimeSeries column, run the following
command:

   update statistics for table tsinstancetable;

This improves performance for any subsequent load, insert, and delete
operations.

XIII. GLS SUPPORT
=================

This section describes the support for GLS that the TimeSeries
Real-Time Loader, Version 1.01.UC2, provides.

A. CHARACTER I/O
-----------------

Character I/O is not GLS compliant. This affects:

o Conversions between time series and character strings

o Use of the bulkload function

B. MULTIBYTE CHARACTERS
------------------------

The following character strings can contain multibyte characters:

o Calendar names
o Calendar pattern names
o Container names
o Table names
o Column names
o Character fields inside a time series

C. DATETIME AND NUMBER INPUT
-----------------------------

Datetime and number input is not GLS compliant. You must use the
default (U.S.) format for datetime and number input:

   "%Y-%m-%d %H:%M:%S:%F5"

(See the "Informix Guide to GLS Functionality" for more information.)

If you are not in the default locale, set the GL_DATETIME environment
variable to the U.S. locale to avoid confusion and conflicting
datetime interpretations in other parts of the server or other
DataBlade modules.

D. FLOATING POINT INPUT
------------------------

Floating point input uses the default (U.S.) format:

o An ASCII period '.' is the decimal separator.

o The ASCII plus and minus signs '+' and '-' are used.

This is not affected by the locale or by the DBMONEY environment
variable.

E. DECIMAL NUMBER AND MONEY INPUT
----------------------------------

Decimal number and money input is largely GLS compliant. However,
plus and minus signs must be the ASCII '+' and '-'.

F. ERROR MESSAGES
------------------

Error messages can be fully localized.

XIV. KNOWN PROBLEMS
=====================

There are no known bugs in this release of the TimeSeries Real-Time
Loader.