The problems of seismological data mining via the Internet[1]
Gitis V.G., Weinstock A.P.
Institute for Information Transmission Problems,
Russian Academy of Sciences,
B.Karetnyi Lane, 19, 101447, Moscow, GSP-4, RUSSIA,
e-mail: gitis@iitp.ru,
http://gis.iitp.ru
1. Introduction
The Internet opens wide possibilities for specialists to access remote databases. Because of society's considerable interest in ecology and its importance for everyday life, it is reasonable to provide the Internet with databases on regional natural hazards and with information tools for processing and analyzing geological and geophysical data. These tools should help specialists in the spatial data mining of natural phenomena and processes.
The analytical GIS GeoProcessor [1] (http://www.iitp.ru/projects/geo, http://gis.iitp.ru/, http://borneo.gmd.de/and/geoprocessor) is an analytical Web GIS for the presentation, modeling, and analysis of environmental data, in particular seismological data. The system supports remote access to geological, geophysical, and geographical databases, as well as their processing, modeling, analysis, and spatial data mining. It helps to estimate and detect properties of the geological environment using a set of plausible inference methods, such as similarity with precedents, similarity with expert expressions in fuzzy logic constructions, membership functions, and nonparametric regression.
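As an illustration of one of these methods, nonparametric regression, the following minimal Python sketch assumes a Nadaraya-Watson estimator with a Gaussian kernel: the target value at a query point is a weighted average of the values at precedents, with weights that decrease with distance in attribute space. The kernel, the `bandwidth` parameter, and the function name are our assumptions and are not taken from GeoProcessor.

```python
import numpy as np

def nadaraya_watson(x_train, y_train, x_query, bandwidth):
    """Gaussian-kernel Nadaraya-Watson estimate of the target at each query point.

    x_train: (n, d) attribute vectors of precedents with known target values
    y_train: (n,) known target values of the precedents
    x_query: (m, d) attribute vectors at which the target is estimated
    bandwidth: kernel width in attribute units (hypothetical tuning parameter)
    """
    x_train = np.asarray(x_train, dtype=float)
    y_train = np.asarray(y_train, dtype=float)
    x_query = np.asarray(x_query, dtype=float)
    # Squared Euclidean distances between every query point and every precedent: (m, n)
    d2 = ((x_query[:, None, :] - x_train[None, :, :]) ** 2).sum(axis=2)
    # Subtracting the row minimum keeps the largest weight at 1 and avoids underflow;
    # it does not change the normalized weights
    weights = np.exp(-0.5 * (d2 - d2.min(axis=1, keepdims=True)) / bandwidth ** 2)
    # Similarity-weighted average of the precedent targets
    return weights @ y_train / weights.sum(axis=1)
```

A small bandwidth makes the estimate follow the nearest precedent; a large one smooths toward the overall mean of the precedents.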
The motivation of this paper is the development of the seismological problem domain for information technology within the scope of the IST Project SPIN! [2]. The paper outlines the basic problems of seismic impact forecasting, specifies the WWW users of seismological GI, considers four basic methods of data mining with reference to seismological applications, and, in conclusion, discusses the peculiarities of seismological data mining.
2. Seismological problem domain
2.1. General methodology
There are three principal problems in seismic impact forecasting:
1. Seismic hazard assessment.
2. Earthquake prediction.
3. Induced hazard assessment.
The general methodology for solving these problems consists of three main steps:
(i) Assessment of the relationships between seismotectonic attributes or between natural and man-made objects.
(ii) Prediction of the target seismological attributes or detection of the target seismological objects.
(iii) Representation of the result for cartographic exploration.
2.2. Seismic hazard assessment
Seismic hazard assessment consists of two problems: estimation of the seismic regime parameters and estimation of seismic shakeability (Fig. 1). According to the Gutenberg-Richter model [3], the seismic regime is determined by three parameters: the intensity of the seismic flow, the coefficient of the decreasing linear relationship between the logarithm of the number of events and their magnitude, and the maximum possible magnitude (Mmax) of an expected earthquake. Shakeability describes the seismic impact on the Earth's surface in the MSK-64 or EMS-98 [4] intensity scales or in an acceleration scale. Shakeability at a point is calculated as the sum of the seismic impacts of all seismic sources, taking into account the spatial attenuation of seismic energy with distance from the sources.
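The Gutenberg-Richter law is usually written as log10 N(M) = a - bM, where N(M) is the number of events with magnitude at least M and b is the slope of the decreasing linear relationship mentioned above. The sketch below is a hedged illustration, not the method used in the paper: it estimates b with Aki's maximum-likelihood formula, assuming the catalogue is complete above a known magnitude `m_c`, and evaluates the annual recurrence rate for a distribution truncated at Mmax, with the flow intensity given as the annual rate `rate_m0` of events above a reference magnitude `m0`. All names are hypothetical.

```python
import math

def b_value_mle(magnitudes, m_c, bin_width=0.0):
    """Aki's maximum-likelihood b-value from events with magnitude >= m_c.

    bin_width > 0 applies Utsu's correction for magnitudes rounded to bins.
    """
    sample = [m for m in magnitudes if m >= m_c]
    if len(sample) < 2:
        raise ValueError("need at least two events above the completeness magnitude")
    mean_m = sum(sample) / len(sample)
    return math.log10(math.e) / (mean_m - (m_c - bin_width / 2.0))

def annual_rate_above(m, rate_m0, b, m0, m_max):
    """Annual rate of events with magnitude >= m for a Gutenberg-Richter law
    truncated at m_max (doubly truncated exponential distribution).

    rate_m0 is the intensity of the seismic flow: the annual rate of events >= m0.
    """
    if m >= m_max:
        return 0.0
    m = max(m, m0)
    tail = 10.0 ** (-b * (m_max - m0))
    # Equals rate_m0 at m = m0 and falls to zero as m approaches m_max
    return rate_m0 * (10.0 ** (-b * (m - m0)) - tail) / (1.0 - tail)
```

Truncation at Mmax is what makes the third regime parameter enter the recurrence: without it, the rate of very large events would decay only exponentially and never reach zero.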
The following initial data are used for seismic hazard assessment: earthquake catalogues, data on the state and evolutionary tendencies of the geological environment, expert knowledge and hypotheses about regional geodynamic models, and expert decisions.
The most difficult and important problem is the estimation of the spatial distribution of Mmax [5, 6]. The difficulty lies in the need to estimate rare, extreme events from incomplete information. The importance of the problem follows from the economic and social consequences of underestimating or overestimating Mmax. The solution is based on historical facts about earthquake recurrence and on the assumption that Mmax is related to seismotectonic properties of the Earth's crust, which change very slowly over time.

Fig. 1. Seismic hazard assessment.
2.3. Earthquake prediction
The problem [7, 8, 9, 10, 11, 12, 13] consists in detecting effective earthquake precursors and using them to estimate the time, place, and energy of an expected earthquake (Fig. 2). Despite the great joint efforts of specialists in different disciplines, no effective solution of this problem exists to date. Earthquake catalogues, geomonitoring time series, data on the stationary seismotectonic properties of the geological environment, and expert knowledge and hypotheses about earthquake precursors are used to solve it. The problem is complicated by the great uncertainty in models of the earthquake preparation process, the limited volume of measurements, the high level of noise in the data, and the presence of various effects that trigger earthquakes.

Fig. 2. Earthquake prediction.
2.4. Induced hazard assessment
An earthquake can induce other natural and man-made catastrophes (Fig. 3). A model of induced hazard assessment was proposed in [14, 15]. It considers a system of interacting subsystems, such as different elements of the environment, economic structures, or nodes of energy or information networks. Each subsystem can be in one of several states: the normal state and a number of states representing some degree of damage. It is assumed that damage in one subsystem can cause damage in another. The model can incorporate the following data: a set of hazard objects, expert knowledge about the possible damage states of the objects, expert knowledge about pairwise links between the objects, estimates of the probabilities of spontaneous catastrophes, and expert estimates of the probability that one catastrophe directly induces another. The model is represented as a directed graph whose nodes are all possible catastrophes and whose arcs carry the probabilities of induced catastrophes.

Fig. 3. Induced hazard assessment.
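A minimal sketch of how probabilities could be propagated through such a graph, assuming the graph is acyclic and that the causes of each catastrophe act independently (a noisy-OR combination). This independence assumption is ours: it is exact only when the paths reaching a node share no upstream catastrophe, and the model in [14, 15] may treat dependent paths differently. The function name and the node names and probabilities in the check are hypothetical.

```python
from graphlib import TopologicalSorter

def induced_probabilities(spontaneous, links):
    """Approximate probability of each catastrophe in an acyclic catastrophe graph.

    spontaneous: {node: probability of the catastrophe occurring on its own};
                 every node, including pure causes, must appear here
    links: {(cause, effect): probability that cause directly induces effect}
    Causes are combined as independent (noisy-OR).
    """
    parents = {node: [] for node in spontaneous}
    for (cause, effect), p in links.items():
        parents[effect].append((cause, p))
    # TopologicalSorter takes {node: predecessors} and raises CycleError on cycles
    order = TopologicalSorter({n: [c for c, _ in ps] for n, ps in parents.items()}).static_order()
    prob = {}
    for node in order:
        # Probability that neither the spontaneous trigger nor any cause fires
        p_none = 1.0 - spontaneous[node]
        for cause, p_link in parents[node]:
            p_none *= 1.0 - prob[cause] * p_link
        prob[node] = 1.0 - p_none
    return prob
```

For example, with a spontaneous earthquake probability of 0.1 and link probabilities 0.5 (earthquake to landslide) and 0.4 (landslide to mudflow), the sketch gives approximately 0.05 for the landslide and 0.02 for the mudflow.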
3. Users of seismological Web GISs
Seismological Web GISs [17] have two classes of users: seismological GI suppliers and seismological GI consumers.
The objectives of seismological GI suppliers are the following:
1. Dissemination of knowledge on historical and current seismicity.
2. Training individuals in actions for seismic disaster mitigation.
3. Support of administrative decisions on seismic risk mitigation.
4. Support of scientific research.
GI consumers can be divided into four groups: citizens, administration, students, and experts. The queries of seismological GI consumers are the following:
Citizens:
1. Current and historical seismicity.
2. Documents on seismic hazard and seismic risk mitigation for individuals.
Administration:
1. Seismic hazard and induced hazard assessment.
2. Actions on seismic risk mitigation and insurance policy.
Students:
2. Training in seismic hazard assessment.
2. Training in earthquake precursor analysis.
3. Training in estimation of seismological parameters.
Experts:
1. Seismic hazard assessment and development of seismic zonation maps.
2. Investigation of new earthquake precursors.
3. Earthquake prediction in real time.
4. Induced hazard assessment.
5. Seismic risk assessment.
4. Spatio-temporal seismic data analysis
4.1. Seismological data and types of data analysis
The seismological database contains information about seismological and seismotectonic entities, their attributes, and the relationships between entities and attributes. Entities are represented by geographical objects and by 2D or 3D grid data. The objects include polygons representing geological and administrative zones; points representing earthquake catalogues, cities, hazardous man-made objects, and seismological stations; lines representing geological faults, lineaments, roads, and topographic elements; and the network of geomonitoring stations that measure seismotectonic time series. Grid data are used for the spatial and spatio-temporal representation of seismotectonic properties of the geological medium.
The following four types of GI analysis are used in seismological GISs to solve the problems specified in the previous part of the paper:
· Estimation of the relationships between the attributes.
· Estimation of the relationships between the objects.
· Estimation and prediction of the target GI attributes.
· Detection of the target GI objects.
4.2. Estimation of the relationships between the attributes
There are two basic methods for analyzing the relationships between seismotectonic attributes:
(i) Correlation and factor analysis of the attributes.
(ii) Statistical and logical inference of the relationships between the attributes.
Correlation and factor analysis help to discover the peculiarities of the seismotectonic structure and the evolutionary tendencies of the region under study. Plausible inference methods aim to determine relationships that can be applied to solving forecasting problems.
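A minimal sketch of the correlation step, assuming the attributes are stored as co-registered 2D grids with NaN marking missing cells. As a simple stand-in for factor analysis it extracts the principal components of the correlation matrix; this is principal component analysis rather than a full factor model, and the function names are hypothetical.

```python
import numpy as np

def layer_correlation(layers):
    """Pearson correlation matrix between co-registered grid layers.

    layers: list of 2-D arrays on the same grid; NaN marks missing cells.
    Only cells that are valid in every layer are used.
    """
    stack = np.stack([np.asarray(layer, dtype=float).ravel() for layer in layers])
    valid = ~np.isnan(stack).any(axis=0)
    # Each row of the stack is one attribute, so rows are the variables
    return np.corrcoef(stack[:, valid])

def principal_factors(corr):
    """Eigenvalues (descending) and eigenvectors (as columns) of a correlation matrix."""
    vals, vecs = np.linalg.eigh(corr)
    order = np.argsort(vals)[::-1]
    return vals[order], vecs[:, order]
```

A dominant first eigenvalue indicates that most of the variation among the layers is shared by a single underlying factor.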
An example of an inferred relationship between the maximum magnitude of an expected earthquake, Mmax, and geological and geophysical attributes for the Caucasus region [5] is presented in graphical form in Fig. 4. The relationship is expressed through increasing piecewise-linear functions of the following attributes:
x1 is a spatial attribute of the faults, equal to half the sum of the closeness to thrusts active in the Cenozoic period (y1) and the closeness to strike-slip faults active in the same period (y2), that is, x1 = (y1 + y2)/2; the closeness yi(t), i = 1, 2, is evaluated at raster point number t from the distance between grid point t and the nearest thrust (fault), with R = 50 km;
x2 is the absolute value of the gradient of the post-Sarmatian vertical tectonic movement velocity, in 10⁻⁹/year;
x3 is the anomaly of the upper-mantle P-wave travel-time variation, in seconds, computed from the upper-mantle P-wave travel-time variation at the point with geographic coordinates l and j, with R = 30 km.

Fig. 4. Relationship between Mmax and thematic attributes for the Caucasus region.
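The text gives the radius R = 50 km used for the closeness to faults but not the closeness formula itself. Purely for illustration, the sketch below assumes that closeness decreases linearly from 1 on the fault to 0 at distance R. It computes x1 = (y1 + y2)/2 from distances to the nearest thrust and the nearest strike-slip fault, and evaluates an increasing piecewise-linear function through hypothetical knots with `numpy.interp`. How the piecewise-linear functions are combined into Mmax is not assumed here, and all function names are hypothetical.

```python
import numpy as np

R_KM = 50.0  # radius from the text; the closeness formula below is an assumption

def closeness(distance_km, radius_km=R_KM):
    """Assumed closeness: 1 on the fault, decreasing linearly to 0 at radius_km."""
    d = np.asarray(distance_km, dtype=float)
    return np.clip(1.0 - d / radius_km, 0.0, 1.0)

def fault_attribute_x1(dist_thrust_km, dist_strike_slip_km):
    """x1 = (y1 + y2) / 2, half the sum of the two closeness values."""
    return 0.5 * (closeness(dist_thrust_km) + closeness(dist_strike_slip_km))

def increasing_piecewise_linear(x, knots_x, knots_y):
    """Piecewise-linear function through hypothetical knots, held constant beyond the ends."""
    if np.any(np.diff(knots_x) <= 0) or np.any(np.diff(knots_y) < 0):
        raise ValueError("knots must be strictly increasing in x and non-decreasing in y")
    return np.interp(x, knots_x, knots_y)
```

Requiring non-decreasing knot values enforces the monotonicity stated in the text: a larger attribute value never lowers the corresponding contribution.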
4.3. Estimation of the relationships between the objects
Three methods of analyzing the relationships between seismotectonic objects are the most typical:
(i) Object classification and clustering.
(ii) Estimation of the geometric properties of the seismic process.
(iii) Induced hazard assessment (modeling of scenarios and estimation of the probability of induced catastrophes).
Classification and cluster analysis methods are usually applied to select homogeneous groups of earthquakes or to analyze earthquake migration. The geometric properties of the seismic process describe spatio-temporal variations of the relationships between earthquakes. The fractal dimension of the earthquake distribution (D-value) is usually measured for this purpose. The D-value is close to 1 in areas where earthquakes form linear groups and close to 2 when the spatial distribution of earthquakes is diffuse. Induced hazard assessment methods use pairwise links between natural and man-made objects to model scenarios of catastrophe evolution.
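One common way to measure the D-value, assumed here rather than taken from the paper, is the correlation dimension. The fraction C(r) of epicenter pairs closer than r behaves approximately as r^D, so D is the slope of log C(r) against log r over a chosen range of radii. The sketch works in planar coordinates (for example, kilometres in a local projection), and the range of radii is the analyst's choice. The function name is hypothetical.

```python
import numpy as np

def correlation_dimension(points, radii):
    """Estimate the correlation (fractal) dimension D of a 2-D point set.

    Computes the correlation integral C(r), the fraction of point pairs closer
    than r, and returns the least-squares slope of log C(r) versus log r.
    """
    pts = np.asarray(points, dtype=float)
    n = len(pts)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    pair_d = dist[np.triu_indices(n, k=1)]  # each unordered pair counted once
    radii = np.asarray(radii, dtype=float)
    c = np.array([(pair_d < r).mean() for r in radii])
    if np.any(c == 0):
        raise ValueError("some radii are too small: no pairs fall within them")
    slope, _ = np.polyfit(np.log(radii), np.log(c), 1)
    return slope
```

On synthetic data the estimate behaves as the text describes: points scattered along a line give D close to 1, and points scattered uniformly over a square give D close to 2, with small deviations caused by edge effects at the larger radii. The pairwise distance matrix grows quadratically with the number of events, so large catalogues would need a spatial index.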
An example of a catastrophe chain induced directly by an earthquake is presented in Fig. 5. Earthquakes with energies e1, e2, e3 induce landslides with rock bodies m1 and m2. The landslides form dams that impound water bodies. The probabilities of the earthquakes are given by the vector p^T = (p1, p2, p3). The probabilities that earthquake ei induces landslide mj are given by the 3×2 matrix Pem = (P(ei, mj)). Two catastrophes of the dammed water body are possible: dam overtopping or dam failure. Let Do(m) be the 2×2 diagonal matrix of the probabilities of dam overtopping after landslide mj; the probabilities that the dam is not overtopped are then Dn(m) = E - Do(m), where E is the identity matrix. The probabilities of a mudflow after overtopping are Qo^T(m) = (Qo(m1), Qo(m2)), the probabilities of a mudflow after dam failure are Qn^T(m) = (Qn(m1), Qn(m2)), and U is the probability of structural damage after a mudflow. The probability of indirect structural damage after the earthquake is then given in matrix notation by
$$L = p^{T} P_{em} \left( D_o Q_o + D_n Q_n \right) U$$
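The matrix expression can be evaluated directly with NumPy. The sketch assumes the shapes given in the text (three earthquakes, two landslides) and treats U as a scalar. The function name is hypothetical, and the numbers in the check are hypothetical too; they serve only to show the dimensions working out to a single probability.

```python
import numpy as np

def indirect_damage_probability(p, P_em, d_o, q_o, q_n, u):
    """L = p^T P_em (D_o Q_o + D_n Q_n) U with D_n = E - D_o.

    p: (3,) earthquake probabilities; P_em: (3, 2) earthquake-to-landslide probabilities
    d_o: (2,) diagonal of D_o, the overtopping probability after each landslide
    q_o, q_n: (2,) mudflow probabilities after overtopping / after dam failure
    u: probability of structural damage after a mudflow (scalar)
    """
    D_o = np.diag(d_o)
    D_n = np.eye(len(d_o)) - D_o  # probability that the dam is not overtopped
    q_o = np.asarray(q_o, dtype=float)
    q_n = np.asarray(q_n, dtype=float)
    # (1x3)(3x2)(2x1) gives a scalar, which is then scaled by U
    return float(np.asarray(p) @ np.asarray(P_em) @ (D_o @ q_o + D_n @ q_n) * u)
```

Because D_o and D_n are diagonal, the bracketed term is simply, for each landslide, the overtopping-weighted mix of the two mudflow probabilities.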

4.4. Estimation and prediction of the target GI attributes
Three problems of this type are typical:
(i) Generation of new geographical object attributes or layers.
(ii) Estimation of the spatio-temporal distribution of seismotectonic attributes.
(iii) Forecasting of seismic parameters.
New geographical object attributes or layers are generated from the attributes and geometry of other layers. For example, seismological attributes of a polygon or point can be calculated from the earthquake epicenters located within the buffer zone of the polygon or point.
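A minimal sketch of the buffer-zone example for a point object. It assumes a circular buffer of given radius, great-circle (haversine) distances on a spherical Earth of radius 6371 km, and a catalogue of (latitude, longitude, magnitude) tuples. The attributes computed (the event count and the maximum magnitude) are illustrative choices, the function names are hypothetical, and polygon buffers would need a geometry library.

```python
import math

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    # min() guards against rounding slightly above 1 for near-antipodal points
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

def buffer_attributes(site_lat, site_lon, catalogue, radius_km):
    """Seismological attributes of a point object from the epicenters in its circular buffer.

    catalogue: iterable of (lat, lon, magnitude) tuples.
    Returns the event count and the maximum magnitude (None if the buffer is empty).
    """
    mags = [m for lat, lon, m in catalogue
            if haversine_km(site_lat, site_lon, lat, lon) <= radius_km]
    return {"count": len(mags), "max_magnitude": max(mags) if mags else None}
```

Running the same function for every city or station in a point layer produces a new attribute table for that layer.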
An example of estimating the spatio-temporal distribution of a seismotectonic attribute is the variation of the minimum representative earthquake magnitude Mmin for the East Mediterranean region. The result presented in Fig. 6 was obtained in cooperation with G. Papadopoulos within the scope of the EC Copernicus Project ASPELEA (contract IC 15 CT97 0200). The lower Mmin, the higher the sensitivity of the seismic network. The variations of Mmin show that the sensitivity of the seismic network improved significantly first in 1988-1991 and then again in 1997-2000 in the north-western part of the region.
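The paper does not state how Mmin was estimated. One widely used technique, swapped in here purely for illustration, is the maximum-curvature method: the magnitude of completeness is taken as the magnitude bin containing the most events in the non-cumulative frequency-magnitude distribution. Applying it to the events of each grid cell and time window would give a 3D raster of the kind shown in Fig. 6. The bin width of 0.1 and the function name are assumptions.

```python
from collections import Counter

def completeness_maxc(magnitudes, bin_width=0.1):
    """Minimum representative magnitude by the maximum-curvature method:
    the centre of the magnitude bin that contains the most events."""
    magnitudes = list(magnitudes)
    if not magnitudes:
        raise ValueError("empty catalogue")
    # Bin index of each event (bins centred on multiples of bin_width)
    bins = Counter(round(m / bin_width) for m in magnitudes)
    # Highest count wins; ties are resolved toward the lower magnitude
    best_bin = max(bins, key=lambda k: (bins[k], -k))
    return round(best_bin * bin_width, 10)
```

The maximum-curvature estimate is known to be somewhat low for gradually curved distributions, so in practice it is often combined with a small positive correction or checked against other methods.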
Another example concerns forecasting the properties of the seismic process. It is an application of the RTL criterion [8] for the detection of earthquake precursors, made within the ASPELEA project. An earthquake catalogue cleaned of aftershocks was analyzed. It contains 1551 events that occurred from 01.01.1964 to 06.09.1999 (one day before the earthquake) within a circle of 100 km radius centered on the Athens earthquake (M = 5.9, 07.09.1999, latitude 38.15, longitude 23.62). The RTL time series is presented in Fig. 7. The significant positive anomaly before the Athens earthquake reveals a set of foreshocks that could be interpreted as an earthquake precursor.


Fig. 6. Sections of the 3D raster of Mmin variation from 1967 to 2000 at 3-year intervals.

Fig. 7. Time series estimated by the RTL criterion for the Athens earthquake, M=5.9, 7.09.1999, lat=38.15, long=23.62.
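The RTL parameter combines distance (R), time (T), and rupture-length (L) weights of the earthquakes that precede a given place and time. The sketch below is a simplified, hedged version. It computes the raw product R·T·L from events within 2r0 and 2t0, and it assumes the empirical rupture-length relation log10 l = 0.44M - 1.289 (l in km). It omits the background-trend removal and normalization that the full criterion applies before anomalies such as the one in Fig. 7 are identified. The default values r0 = 50 km, t0 = 1 year, and p = 1, the minimum distance `r_min_km`, and the function names are assumptions.

```python
import math

def rupture_length_km(magnitude):
    # Assumed empirical relation log10 l = 0.44 M - 1.289 (l in km)
    return 10.0 ** (0.44 * magnitude - 1.289)

def rtl_raw(events, t, r0_km=50.0, t0_years=1.0, p=1.0, r_min_km=1.0):
    """Simplified RTL product at one site and time t (no background removal or normalization).

    events: iterable of (time_years, distance_km, magnitude); distance is measured from the site.
    Only events with t - 2*t0 <= time < t and distance <= 2*r0 contribute.
    """
    r_sum = t_sum = l_sum = 0.0
    for t_i, r_i, m_i in events:
        if not (t - 2 * t0_years <= t_i < t) or r_i > 2 * r0_km:
            continue
        r_sum += math.exp(-r_i / r0_km)           # closer events weigh more
        t_sum += math.exp(-(t - t_i) / t0_years)  # more recent events weigh more
        r_eff = max(r_i, r_min_km)                # avoid division by zero at the site itself
        l_sum += (rupture_length_km(m_i) / r_eff) ** p
    return r_sum * t_sum * l_sum
```

Evaluating `rtl_raw` on a sliding time grid for a fixed site gives a raw curve. In the full criterion, each of the three factors is first detrended against its long-term background before the product is formed.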
4.5. Detection of the target GI objects
Two types of seismological object detection can be considered:
(i) Delineation of seismic source zones.
(ii) Earthquake prediction.
In the first case, the problem consists in estimating the geometry of seismic sources from all available seismological, geophysical, and geological data. In the second case, the predictive problem of assessing the time, coordinates, and energy of an earthquake is solved using only the preceding data.
An example of retrospective earthquake prediction concerns the Tangshan earthquake of magnitude M=7.8, which occurred in north-eastern China on July 28, 1976 [13]. Daily time series of geophysical and hydrogeological parameters measured at 10 stations over the observation period from the beginning of 1972 to July 27, 1976 (24 hours before the Tangshan earthquake) were analyzed. Fig. 8 presents four sections of the 3D grid data corresponding to the earthquake precursor, with the seismic stations shown as triangles. It also shows tectonic faults and the epicenters of the future Tangshan earthquake and of its two strongest aftershocks, with magnitudes 6.9 and 7.0. The anomaly corresponding to the earthquake precursor develops from the Beijing and Tangshan areas and becomes located in the vicinity of the Tangshan epicenter.

Fig. 8. Precursor evolution 60-7 days before the Tangshan earthquake, China, M=7.8, 28.07.1976. The anomaly value is measured in standard deviations of the stationary process multiplied by 100.
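A minimal sketch of the anomaly measure used in Fig. 8. It assumes that the stationary process is characterised by the mean and standard deviation of the series over a reference period chosen by the analyst; the paper does not specify how the stationary process was estimated. The `baseline_end` index and the function name are hypothetical.

```python
import numpy as np

def anomaly_score(series, baseline_end):
    """Deviation of a time series from its stationary baseline, in standard deviations x 100.

    series: 1-D sequence of daily values
    baseline_end: index where the assumed-stationary reference period ends
    """
    x = np.asarray(series, dtype=float)
    base = x[:baseline_end]
    mu, sigma = base.mean(), base.std(ddof=1)
    if sigma == 0:
        raise ValueError("baseline has zero variance")
    return 100.0 * (x - mu) / sigma
```

Computing this score for each station and interpolating the values onto a grid for successive days would give spatio-temporal sections of the kind shown in Fig. 8.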
5. Some peculiarities of seismological data mining
(i) Two types of seismological predictive data mining are considered:
A) Prediction based on spatial and spatio-temporal geodata in the vicinity of the target object under study.
B) Prediction based on the relationships between the objects.
(ii) Seismological predictive data mining has the following peculiarities:
A) Incomplete information (incomplete data, absence of a mathematical model).
B) Uncertainty in data and expert knowledge (noise in data, ambiguity in expert knowledge).
(iii) Seismological GI has the following peculiarities:
A) Many types of spatio-temporal data (grid data, points, lines, polygons, images, time series, documents).
B) Huge earthquake catalogues containing up to tens or hundreds of thousands of events.
(iv) The principal peculiarity of seismological GIS architecture is the integration of vector and grid data processing. This integration is essential for the following three types of operations:
A) Estimation of the spatio-temporal properties of geographical objects by transforming their geometric and thematic data into 2D and 3D grid data.
B) Estimation of the attributes of geographical objects by evaluating functions of 2D and 3D grid data within the buffer zones of the objects.
C) Detection of target geographical objects from 2D and 3D grid data.
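Operation C can be illustrated with the simplest detection rule: threshold a 2D grid and label the connected groups of cells above the threshold as candidate objects. The sketch below assumes 4-neighbourhood connectivity and a single threshold. Both are our choices, and the function name is hypothetical; real delineation of seismic source zones would combine several layers.

```python
from collections import deque

import numpy as np

def detect_objects(grid, threshold):
    """Label 4-connected regions of grid cells whose value exceeds threshold.

    Returns a label array (0 = background, 1..n = detected objects) and n.
    """
    g = np.asarray(grid, dtype=float)
    mask = g > threshold
    labels = np.zeros(g.shape, dtype=int)
    rows, cols = g.shape
    n = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if not mask[r0, c0] or labels[r0, c0]:
                continue
            # New object found: flood-fill it with breadth-first search
            n += 1
            labels[r0, c0] = n
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    rr, cc = r + dr, c + dc
                    if 0 <= rr < rows and 0 <= cc < cols and mask[rr, cc] and not labels[rr, cc]:
                        labels[rr, cc] = n
                        queue.append((rr, cc))
    return labels, n
```

In practice, `scipy.ndimage.label` performs the same labelling (its default structuring element gives 4-connectivity in 2D). The explicit breadth-first search is shown only to keep the example free of dependencies other than NumPy.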
References
1. Gitis V., Dovgyallo A., Osher B., Gergely T. GeoNet: an information technology for WWW on-line intelligent Geodata analysis // Proceedings of the 4th EC-GIS Workshop. Hungary. 1998. JRC of EC. P. 124-135.
2. May M. Spatial Knowledge Discovery: The SPIN! System // Proceedings of the 6th EC-GIS Workshop. Lyon, France. 2000. JRC of EC.
3. Gutenberg B., Richter C.F. Earthquake magnitude, intensity, energy, and acceleration // Bull. Seism. Soc. Am. 1956. Vol. 46. P. 105-145.
4. European Seismological Commission. Activity Report 1996-1998 // Proceedings of the XXVI General Assembly, August 24-28, 1998. Tel Aviv, Israel: Geophysical Institute of Israel. 58 p.
5. Gitis V. GIS Technology for the Design of Computer-Based Models in Seismic Hazard Assessment // Geographical Information Systems in Assessing Natural Hazards, A. Carrara and F. Guzzetti (eds.). 1995. Kluwer Academic Publishers. P. 219-233.
6. Gitis V., Vainchtok A., Tatevosjan R. Maximum expected magnitude assessment in GEO computer environment: case study // Natural Hazards. Vol. 17. 1998. Kluwer Academic Publishers. Netherlands. P. 225-250.
7. European Macroseismic Scale 1998 (EMS-98). Ed. G. Grunthal // Cahiers du Centre Europeen de Geodynamique et de Seismologie. Vol. 15. 1998. 97 p.
8. Keilis-Borok V.I. Intermediate-term earthquake prediction // Proc. Natl. Acad. Sci. USA. Vol. 93. April 1996. P. 3748-3755.
9. Sobolev G.A., Zavialov A.D. Earthquake prediction. A concentration criterion for seismically active faults // Earthquake prediction. National Report of IASPEI and IUGG, 1995-1998. Geophysical Committee of RAS. 1999. P. 22-27.
10. Wyss M., Console R., Murru M. Seismicity rate change before the Irpinia (M=6.9) 1980 earthquake // Bull. Seism. Soc. Amer. 1997. Vol. 87. P. 318-326.
11. Zhang Guomin, Zhang Zhaocheng. The study of multidisciplinary earthquake prediction in China // Journal of Earthquake Prediction Research. 1992. Vol. 1, No. 1. P. 71-86.
12. Zschau J. et al. SEISMOLAP: A new approach to prediction // Proceedings of the International Conference on Earthquake Prediction: State of the Art. Council of Europe, Strasbourg. 1996. P. 444-453.
13. Gitis V.G., Osher B.V., Pirogov S.A., Ponomarev A.V., Sobolev G.A., Jurkov E.F. A System for Analysis of Geological Catastrophe Precursors // Journal of Earthquake Prediction Research. Vol. 3. 1994. P. 540-555.
14. Ponomarev A.V., Sobolev G.A., Gitis V.G., Zhang Zchaocheng, Wang Guixuan, Qin Xinxi. Complex analysis of geophysical fields for detection of spatio-temporal earthquake precursors // Electronic Journal of UIPhE RAS. 4 (10). 1999. URL: http://www.scgis.ru/russian/cp1251/h_dgggms/4-99/komp-an.zip.
15. Gitis V.G., Petrova E.N., Pirogov S.A. Catastrophe Chains: Hazard Assessment // Natural Hazards. Vol. 10. 1994. P. 117-127.
16. Gitis V.G., Petrova E.N., Pirogov S.A. Expert knowledge approach to catastrophe chains // Cahiers du Centre Europeen de Geodynamique et de Seismologie. Vol. 12. 1996. P. 67-72.
17. Mattarelli M., Boehner C., Peckham R.J. Web Access to Earthquake Catalogues // Proc. EOGEO98. Salzburg, Feb. 1998. http://www.sbg.ac.at/geo/eogeo/authors/mattarelli/mattarelli.htm
[1] The work is supported by the Russian Basic Research Foundation (projects 00-07-90100 and 99-07-90326) and the 5FP Program (project EU IST-10536 SPIN!).