Subject area of the study

Any research consists of observing the properties of objects in order to identify and evaluate the significant relationships between the indicators of these properties.

The subject area includes objects that differ in their properties and are interconnected in some respects. Solving a programming problem begins with studying the subject area.

The subject area is a part of the real world, which is infinite and contains both significant and nonessential data. The researcher should be able to isolate the essential part. For example, when deciding whether to grant a loan, all the data on the client's private life (whether the spouse works, whether the client is raising minor children, the client's education, etc.) will be considered significant. For another banking task, the same data may be completely irrelevant. The significance of the data depends on what we choose as the subject area.

In the course of the research, a domain model must be created. Knowledge from different sources has to be formalized, and the means of formalization can vary widely: a textual description of the domain, for example, or a specialized graphical notation. The domain model is used to describe the processes that take place in the subject area and to study its data.

The formulation of the problem also includes a description of the static and dynamic behavior of the objects under study. The description of static behavior characterizes the objects and their properties. The description of dynamic behavior characterizes how the objects behave and what causes that behavior.
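The distinction between static and dynamic description can be sketched in code. The example below is only an illustration under assumptions: the `LoanApplication` class, its fields, the state names, and the event names are all hypothetical and are not taken from any particular system. The fields stand for the static description (an object and its properties), and the `apply_event` method stands for the dynamic description (an event causes a change of state).

```python
from dataclasses import dataclass, field


@dataclass
class LoanApplication:
    # Static description: the object and its properties (hypothetical fields).
    client_name: str
    amount: float
    status: str = "submitted"
    history: list = field(default_factory=list)

    # Dynamic description: an event (the cause) moves the object to a new state.
    def apply_event(self, event: str) -> None:
        # Assumed transition table: (current state, event) -> next state.
        transitions = {
            ("submitted", "documents_verified"): "under_review",
            ("under_review", "approved"): "granted",
            ("under_review", "rejected"): "declined",
        }
        new_status = transitions.get((self.status, event))
        if new_status is None:
            raise ValueError(
                f"event {event!r} is not allowed in state {self.status!r}"
            )
        # Record what happened and why, then change the state.
        self.history.append((self.status, event, new_status))
        self.status = new_status


app = LoanApplication(client_name="Example Client", amount=5000.0)
app.apply_event("documents_verified")
app.apply_event("approved")
print(app.status)
```

Keeping the transition table in one place makes the causes of each state change explicit, which is the part of the model that a purely static description leaves out.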

Dynamic behavior of objects is often described along with static behavior.

Sometimes the analysis of the subject area and the statement of the problem are combined into a single stage.

At the stage of defining and analyzing data requirements, the data needed for Data Mining is modeled. For this purpose, the following are examined: the distribution of users, the analytical characteristics of the system, and access to the data needed for analysis.

The subject area is easier to analyze, and the analysis is more efficient, when the organization has a data warehouse. However, not all enterprises have one. In that case, the sources of initial data are operational databases and reference and archival materials, that is, data from existing information systems (IS).

Information may also be required from managers' IS, external and internal sources, various paper documents, as well as the knowledge of specialists and/or survey results.

During data preparation, developers should describe as many of the factors that affect the process as possible. Some data can be encoded at this point. For example, one characteristic of a client is income level, which can be defined as very low, low, medium, high, or very high. In this case, the boundaries between the income levels need to be determined.
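The income example can be sketched as a simple ordinal encoding. This is a minimal illustration, not a prescribed method: the category names come from the text, while the boundary values in `INCOME_BOUNDS`, the integer codes, and the function names are assumptions chosen only for the example.

```python
# Ordered income categories named in the text.
INCOME_LEVELS = ["very low", "low", "medium", "high", "very high"]

# Assumed integer codes that preserve the order of the categories.
INCOME_CODE = {level: code for code, level in enumerate(INCOME_LEVELS)}

# Hypothetical upper bounds (exclusive) for every level except the last one.
INCOME_BOUNDS = [10_000, 25_000, 60_000, 120_000]


def income_level(income: float) -> str:
    """Map a numeric income to a category using the assumed bounds."""
    for bound, level in zip(INCOME_BOUNDS, INCOME_LEVELS):
        if income < bound:
            return level
    # Anything at or above the last bound falls into the top category.
    return INCOME_LEVELS[-1]


def encode_income(income: float) -> int:
    """Return the ordinal code of the income category."""
    return INCOME_CODE[income_level(income)]


print(income_level(30_000), encode_income(30_000))
```

In practice the boundaries would come from the analysis of the subject area rather than from fixed constants like these.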

When determining how much data is needed, consider whether the data is ordered.

If the data is ordered in time, it is necessary to know whether the dataset includes a seasonal or cyclic component. If the data is not ordered, i.e. the set of events in the database is not tied to a timeline, the following rules should be observed during collection:

1) a small number of records in the database can result in an inadequate model;

2) the accuracy of the model can be improved by increasing the amount of data;

3) outdated data should be excluded from the set;

4) the algorithms used to build a model on very large databases must be scalable.
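Rules 1 and 3 above can be sketched as two small checks applied before modeling. The record layout, the `date` field, the one-year age limit, and the minimum record count are all assumptions made for the illustration; suitable values depend on the task.

```python
from datetime import date, timedelta


def drop_outdated(records, today, max_age_days):
    """Rule 3: keep only records dated within max_age_days of today."""
    cutoff = today - timedelta(days=max_age_days)
    return [r for r in records if r["date"] >= cutoff]


def enough_records(records, min_records):
    """Rule 1: report whether the set is large enough to model on."""
    return len(records) >= min_records


# Hypothetical records; only the 'date' field matters for the checks.
records = [
    {"id": 1, "date": date(2015, 3, 1)},
    {"id": 2, "date": date(2017, 11, 20)},
    {"id": 3, "date": date(2018, 1, 5)},
]

fresh = drop_outdated(records, today=date(2018, 6, 1), max_age_days=365)
print([r["id"] for r in fresh])
print(enough_records(fresh, min_records=3))
```

Filtering first and counting afterwards matters here: removing outdated records can leave a set that is too small, which is exactly the situation rule 1 warns about.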
