Types of test tasks. Methodology for compiling and using test tasks in computer science

TYPES OF TESTS AND FORMS OF TESTS

Plan

    The main types of pedagogical tests.

    Forms of test tasks.

    Empirical verification and statistical processing of the results.

    Principles of content selection. Criteria for evaluating the content of the test.

    The ratio of the form of the task and the type of knowledge, skills, and abilities being tested.

    The main types of pedagogical tests

There are two main types of tests: traditional and non-traditional.

The test has composition, integrity and structure. It consists of tasks, rules for their application, marks for completing each task, and recommendations for interpreting test results. The integrity of the test means the relationship of tasks, their belonging to a common measured factor. Each test task performs its assigned role and therefore none of them can be removed from the test without loss of measurement quality. The structure of the test is formed by the way the tasks are linked to each other. Basically, this is the so-called factorial structure, in which each task is related to others through the general content and the general variation of test results.

The traditional test is a unity of at least three systems:

a content system of knowledge, described in the language of the academic discipline being tested;

a formal system of tasks of increasing difficulty;

statistical characteristics of the tasks and of the results of the subjects.

The traditional pedagogical test should be considered in two essential senses: as a method of pedagogical measurement and as a result of applying the test. Surprisingly, texts in Russian gravitate towards the meaning of the method, while in most works by Western authors the concept of the test is more often considered in the sense of results. Meanwhile, both of these meanings characterize the test from different angles, because the test must be understood both as a method and as a result of pedagogical measurement. One complements the other. A test, as a method, cannot be conceived without results confirming the quality of the test itself and the quality of the assessments of subjects of various levels of preparedness.

In the above definition of the traditional test, several ideas have been developed.

The first idea is that the test is considered not as an ordinary collection or set of questions, tasks, etc., but as the concept of a "system of tasks". Such a system is formed not by any set, but only by one that gives rise to a new integrative quality distinguishing the test from an elementary set of tasks and from other means of pedagogical control. Of the many possible systems, the best is formed by that integral set in which the quality of the test is manifested to a relatively greater extent. From this follows the idea of singling out the first of the two main system-forming factors: the best composition of test tasks that form an integrated whole. On this basis, one of the shortest definitions can be given: a test is a system of tasks that form the best methodological integrity. The integrity of the test is a stable interaction of tasks that form the test as a developing system.

The second idea is that this definition of the test departs from the long-established tradition of considering the test as a simple means of verification, trial or probing. Any test includes an element of checking, but it is not reduced to that alone, for the test is also a concept, content, form, results and interpretation, all of which need justification. This implies that the test is a qualitative means of pedagogical measurement. In accordance with the provisions of the theory, test scores are not exact assessments of the subjects; it is correct to say that they only represent these values with some precision.

The third idea developed in our definition of a traditional test is the inclusion of a new concept - test effectiveness, which was not previously considered in the test literature as a criterion for the analysis and creation of tests. The leading idea of ​​the traditional test is to compare the knowledge of as many students as possible in a short time, quickly, efficiently and at the lowest cost with a minimum number of tasks.

Essentially, this reflects the idea of the efficiency of pedagogical activity in the field of knowledge control. One would like to think that no one objects, and there is no need to object, to this idea itself. But while our teacher can explain educational material no worse than a foreign colleague, he is unable to check the required knowledge of all students, on all the material studied, in a humane form of knowledge control. He is physically unable to do so. Because of a social policy that is, to put it mildly, erroneous, the salaries of our teachers have long ceased to compensate for the expenditure of even the physical energy necessary for good teaching, not to mention the increased expenditure of intellectual energy, which can only be made by a mind that is unencumbered, not preoccupied with the search for bread. As noted in the literature, a qualified worker in our country receives three to four times less than the level of wages beyond which normal life activity is disrupted and the destruction of labor potential begins.

Although there are hundreds of examples of test definitions in the literature that are difficult or impossible to agree with, this does not mean at all that this definition of the traditional test is the ultimate truth. Like all other concepts, it needs constant improvement. It simply seems to the author, so far, better argued than some other well-known conceptions of the pedagogical test. The desire to improve concepts is a completely normal phenomenon, necessary for a normally developing practice and science. Constructive attempts to give other definitions of the test, or to challenge existing ones, are always useful, but this is precisely what we lack.

Traditional tests include homogeneous and heterogeneous tests. A homogeneous test is a system of tasks of increasing difficulty, of a specific form and of definite content, created with the aim of an objective, qualitative and efficient method of assessing the structure and measuring the level of preparedness of students in one academic discipline. It is easy to see that, in essence, the definition of a homogeneous test coincides with the definition of a traditional test.

Homogeneous tests are more common than others. In pedagogy, they are created to control knowledge in one academic discipline or in one section of a voluminous academic discipline such as, for example, physics. In a homogeneous pedagogical test, the use of tasks that reveal properties of other disciplines is not allowed: the presence of the latter violates the requirement of disciplinary purity of the pedagogical test. After all, each test measures something predetermined.

For example, a test in physics measures the knowledge, skills, abilities and understanding of the subjects in that science. One of the difficulties of such measurement is that physical knowledge is closely intertwined with mathematical knowledge. Therefore, in a physics test, the level of mathematical knowledge used in solving the physics tasks is fixed in advance. Exceeding the accepted level leads to a bias in the results: as it is exceeded, the results increasingly begin to depend not so much on knowledge of physics as on knowledge of another science, mathematics. Another important aspect is the desire of some authors to include in tests not so much a check of knowledge as of the ability to solve physics problems, thereby involving the intellectual component in the measurement of preparedness in physics.

A heterogeneous test is a system of tasks of increasing difficulty, a specific form and a certain content - a system created with the aim of an objective, high-quality, and effective method for assessing the structure and measuring the level of preparedness of students in several academic disciplines. Often, such tests also include psychological tasks to assess the level of intellectual development.

Typically, heterogeneous tests are used for a comprehensive assessment of a school graduate, personality assessment when applying for a job, and for selecting the most prepared applicants for admission to universities. Since each heterogeneous test consists of homogeneous tests, the interpretation of the test results is carried out according to the answers to the tasks of each test (here they are called scales) and, in addition, through various methods of aggregating scores, attempts are made to give an overall assessment of the preparedness of the subject.

Recall that the traditional test is a method of diagnosing subjects in which they answer the same tasks, at the same time, under the same conditions and with the same assessment. With this orientation, the tasks of determining the exact volume and structure of the mastered educational material recede, of necessity, into the background. In the test, a minimally sufficient number of tasks is selected, which makes it possible to determine relatively accurately, figuratively speaking, not "who knows what" but "who knows more". The interpretation of test results is carried out mainly in the language of testology, relying on the arithmetic mean, the mode or the median and on so-called percentile norms, which show what percentage of the subjects have a test result worse than that of any subject taken for analysis with his test score. Such an interpretation is called normatively oriented. Here the procedure ends with a rating: tasks -> answers -> conclusions about the knowledge of the subject -> rating, understood as a conclusion about the place or rank of the subject.
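For illustration, the percentile norm for a particular subject can be computed directly from the raw scores of the group. A minimal sketch in Python (the data and the function name are invented for illustration):

    def percentile_rank(score, all_scores):
        # share of subjects whose test score is lower than the given one
        below = sum(1 for s in all_scores if s < score)
        return 100.0 * below / len(all_scores)

    group_scores = [12, 15, 15, 18, 20, 22, 25, 27, 28, 30]
    print(percentile_rank(25, group_scores))   # 60.0: 60% of the group scored worse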

Integrative tests. An integrative test is a test consisting of a system of tasks that meet the requirements of integrative content, test form and increasing difficulty, aimed at a generalized final diagnosis of the preparedness of a graduate of an educational institution. Diagnosis is carried out by presenting tasks whose correct answers require integrated (generalized, clearly interconnected) knowledge of two or more academic disciplines. The creation of such tests is given only to those teachers who know a number of academic disciplines, understand the important role of interdisciplinary connections in learning, and are able to create tasks whose correct answers require students to have knowledge of various disciplines and the ability to apply such knowledge.

Integrative testing is preceded by the organization of integrative learning. Unfortunately, the current class-lesson form of conducting classes, combined with excessive fragmentation of academic disciplines, together with the tradition of teaching individual disciplines (rather than generalized courses), will hamper the introduction of an integrative approach into the learning process and control of preparedness for a long time to come. The advantage of integrative tests over heterogeneous ones lies in the greater informative content of each task and in the smaller number of tasks themselves. The need to create integrative tests increases as the level of education and the number of disciplines studied increase. Therefore, attempts to create such tests are noted mainly in high school. Particularly useful are integrative tests to improve the objectivity and efficiency of the final state certification of pupils and students.

The methodology for creating integrative tests is similar to the methodology for creating traditional tests, with the exception of the work on determining the content of the tasks. For selecting the content of integrative tests, the use of expert methods is mandatory. This is because only experts can determine the adequacy of the content of the tasks to the goals of the test. But first of all the experts themselves will need to determine the goals of education and of studying particular educational programs, and then agree among themselves on fundamental issues, leaving for expert evaluation only variations in understanding the degree of importance of individual elements in the overall structure of preparedness. A group of experts who have agreed on fundamental issues is often called a panel in the foreign literature; or, given the difference in meaning of the last word in Russian, such a group can be called a representative expert group. The group is selected so as to adequately represent the approach used in creating the respective test.

Adaptive tests. The expediency of adaptive control follows from the need to rationalize traditional testing. Every teacher understands that there is no point in giving a well-prepared student easy and very easy tasks, because the probability of a correct answer is too high; besides, easy materials have no noticeable developmental potential. Symmetrically, because of the high probability of a wrong answer, it makes no sense to give difficult tasks to a weak student. It is known that difficult and very difficult tasks reduce the learning motivation of many students. It was necessary to find a measure of the difficulty of tasks and a measure of the level of knowledge comparable on one scale. This measure was found in the theory of pedagogical measurements; the Danish mathematician G. Rasch called it the "logit". After the advent of computers, this measure formed the basis of the method of adaptive knowledge control, which regulates the difficulty and the number of tasks presented depending on the students' answers. If the answer is correct, the computer selects a more difficult next task; if it is incorrect, an easier one. Naturally, this algorithm requires preliminary testing of all tasks, determination of their measure of difficulty, as well as the creation of a bank of tasks and a special program.
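The branching rule just described can be sketched in code. The sketch below assumes a bank of tasks already ordered by empirically determined difficulty; the halving step is only one possible choice (compare the pyramid variant described below), and all names are illustrative:

    # A sketch of adaptive task selection over a bank ordered by difficulty.
    # ask_student(task) is assumed to return True when the answer is correct.
    def adaptive_session(bank, ask_student, n_tasks=10):
        low, high = 0, len(bank) - 1
        pos = (low + high) // 2               # start with a task of medium difficulty
        protocol = []
        for _ in range(n_tasks):
            correct = ask_student(bank[pos])
            protocol.append((pos, correct))
            if correct:
                pos = min(high, pos + max(1, (high - pos) // 2))   # a harder task
            else:
                pos = max(low, pos - max(1, (pos - low) // 2))     # an easier task
        return protocol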

The use of tasks corresponding to the level of preparedness significantly increases the accuracy of measurement and reduces the time of individual testing to about 5 - 10 minutes.

In Western literature, there are three variants of adaptive testing. The first is called pyramid testing. In the absence of preliminary assessments, all subjects are given a task of medium difficulty, and only then, depending on the answer, each subject is given an easier or harder task; at each step it is useful to use the rule of dividing the scale of difficulty in half. In the second option, the control begins with any desired level of difficulty for the test subject, with a gradual approach to the real level of knowledge. The third option is when testing is carried out by means of a bank of tasks divided by difficulty levels.

Thus, an adaptive test is a variant of an automated testing system in which the parameters of the difficulty and the differentiating ability of each task are known in advance. This system is created in the form of a computer bank of tasks, ordered according to the task characteristics of interest. The most important characteristic of the tasks of an adaptive test is their level of difficulty, obtained empirically, which means that before getting into the bank each task is tried out empirically on a sufficiently large number of typical students of the contingent of interest. The phrase "contingent of interest" is intended here to convey the meaning of the more rigorous concept, well known in science, of the "general population".

The educational model of the adaptive school of E.A. Yamburg, widespread in our country, proceeds essentially from the general ideas of adaptive learning and adaptive knowledge control. The origins of this approach can be traced back to the pedagogical works of Comenius, Pestalozzi and Diesterweg, who are united by the ideas of natural and humane education; at the center of their pedagogical systems was the student. For example, in the work of A. Diesterweg "Didactic Rules", little known to us, one can read the following words: "Teach in accordance with nature... Teach without gaps... Start teaching where the student left off... Before you start teaching, you need to explore the point of departure... Without knowing where the student stopped, it is impossible to teach him decently." Insufficient awareness of the real level of knowledge of students, and natural differences in their ability to assimilate the knowledge offered, became the main reason for the emergence of adaptive systems based on the principle of the individualization of learning. This principle is difficult to implement in the traditional class-lesson form.

Before the advent of the first computers, the most well-known system close to adaptive learning was the so-called "System of Complete Knowledge Assimilation".

Criteria-oriented tests. With a criterion-oriented approach, tests are created to compare the educational achievements of each student with the amount of knowledge, skills or abilities planned for assimilation. In this case, a specific area of ​​content is used as an interpretative frame of reference, and not this or that sample of students. At the same time, the emphasis is on what the student can do and what he knows, and not on how he looks against the background of others.

There are also difficulties with the criteria-oriented approach. As a rule, they are associated with the selection of the content of the test. Within the criteria-based approach, the test tries to reflect the entire content of the controlled course, or at least what can be taken as this full volume. The percentage of correct completion of tasks is considered as a level of preparation or as a degree of mastery of the total volume of the course content. Of course, within the criteria-oriented approach, there is every reason for the latter interpretation, since the test includes everything that can be conditionally taken as 100%.

Criteria-oriented tests cover a fairly wide range of tasks. In particular, they help to collect complete and objective information about the educational achievements of each student individually and a group of students; compare the knowledge, skills and abilities of the student with the requirements laid down in state educational standards; select students who have reached the planned level of preparedness; evaluate the effectiveness of the professional activities of individual teachers and groups of teachers; evaluate the effectiveness of various training programs.

The emphasis on a content approach can have a beneficial effect on pedagogical testing in general. This approach benefits, for example, the interpretation of test scores in the current control. The student receives information not about how he looks against the background of others, but about what he can do and what he knows in comparison with the given requirements for the level of preparation in the subject. Of course, such an interpretation does not exclude a combination with the attribution of results to the norms, which, as a rule, occurs with the current control of students' knowledge in the daily educational process. In this case, testing is integrated with learning and helps the student to identify possible difficulties, as well as to correct errors in mastering the content of educational material in a timely manner.

    Forms of test tasks

In modern testology (Avanesov V.S., Chelyshkova M.B., Maiorov A.N., etc.) there are 4 types of tasks in test form: tasks for choosing one or more correct answers, tasks in an open form or for addition, tasks for establishing the correct sequence and tasks for establishing correspondences. The most common is the first form.

Let us consider in detail each form of tasks according to the classification of V.S. Avanesov.

Tasks with a choice of one or more correct answers are the most suitable for computer-based knowledge control. Such tasks are conveniently divided into the following types: tasks with two, three, four, five or more answers. The instruction for this form of task is the sentence: "Circle (tick, indicate) the number of the correct answer."

Example 1. Mark the number of the correct answer.

The place that a digit occupies in the notation of a number is called

    1) position;

    2) digit place;

    3) location;

    4) character cell.

The task should be formulated briefly and clearly, so that its meaning is clear on the first reading.

The content of the task is formulated as clearly and as briefly as possible. Brevity is ensured by careful selection of words, symbols and graphics, allowing maximum clarity of the meaning of the task to be achieved with a minimum of means. It is necessary to exclude repetitions of words, obscure and rarely used words, symbols unknown to students, and foreign words that make it difficult to perceive the meaning. It is good when the task contains no more than one subordinate clause.

To achieve brevity, it is better to ask about one thing in each task. Burdening a task with requirements to find something, solve it, and then also explain it has a negative effect on the quality of the task, although from a pedagogical point of view the reason for such a formulation is easy to understand.

It is even better when both the task and the answers are short. An incorrect but plausible answer is called, in the American testing literature, a distractor (from the English verb "to distract"). In general, the better the distractors are chosen, the better the task. The talent of a test developer is manifested first of all in the development of effective distractors. It is generally believed that the higher the percentage of test takers choosing a wrong answer, the better that answer is formulated. It should be noted, however, that this is true only up to a certain limit; in the pursuit of attractive distractors a sense of proportion is often lost. The attractiveness of each answer is checked empirically.
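In practice this empirical check amounts to counting how often each answer, including every distractor, is chosen by a trial group. A minimal sketch (the response data are invented):

    from collections import Counter

    # choices of a trial group for one task with four answers; answer 1 is the key
    choices = [1, 2, 1, 1, 4, 2, 1, 3, 1, 2, 1, 1, 4, 1, 2]
    counts = Counter(choices)
    for option in sorted(counts):
        print(f"answer {option}: chosen by {counts[option] / len(choices):.0%} of the group")
    # distractors that are almost never chosen do not work and are replaced or removed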

Single or multiple choice questions are the most criticized form. Proponents of habitual approaches argue that knowledge can only be truly tested in the process of direct communication with the student, asking him clarifying questions, which helps to better clarify the true depth, strength and validity of knowledge. We must agree with such statements. However, there are still issues of saving the living labor of teachers and students, saving time and problems of increasing the efficiency of the educational process.

It is often believed that finding the right answer is much easier than formulating it yourself. However, in well-done assignments to an ignorant student, incorrect answers often seem more plausible than correct ones. The talent of the test developer is revealed in the process of creating exactly wrong, but very plausible answers. Another objection is that a test task with a choice of one or more correct answers is suitable only for assessing knowledge of the so-called lower level.

A variant of this form is tasks with the choice of the one most correct answer from those proposed. Accordingly, the instruction for such tasks reads: "Circle the number of the most correct answer." Naturally, it is assumed that all the other answers to the task are correct, but to varying degrees.

There are three reasons for introducing such tasks into practice.

The first is the old idea of excluding from tasks the incorrect answers that weak students can supposedly memorize. If one follows this very controversial thesis, then no wrong answers can be given in testing at all.

The second reason for introducing such tasks into practice is more realistic. It concerns the need to develop in students not only the ability to distinguish correct answers from incorrect ones, but also the ability to differentiate the measure of the correctness of answers. This is really important, both in general secondary and higher professional education.

The third reason for using tasks with the choice of the most correct answer is the desire to check, with their help, the completeness of knowledge.

No matter how convincing the reasons for introducing such tasks into practice, the latter are unlikely to be widely used.

In open-form tasks, ready-made answers are not given: they must be devised or obtained by the person being tested. Sometimes, instead of the term "open-form tasks", the terms "completion tasks" or "constructed-response tasks" are used. For the open form, it is customary to use an instruction consisting of one word: "Complete".

Example 2. Complete.

In the binary number system, 10 - 1 = _________.

Add-on tasks are of two markedly different types:

1) with restrictions imposed on answers, the possibilities of obtaining which are appropriately determined by the content and form of presentation;

2) tasks with a freely constructed answer, in which it is necessary to compose a detailed answer in the form of a complete solution of a problem or to give an answer in the form of a micro-essay.

In tasks with restrictions, it is determined in advance what is unambiguously considered the correct answer, and the required degree of completeness of the answer is set. Usually it is quite short: one word, a number, a symbol, etc.; sometimes longer, but not exceeding two or three words. Naturally, the regulated brevity of the answers places certain requirements on what can be tested, so tasks of the first type are used mainly to assess a fairly narrow range of skills.

A distinctive feature of tasks with restrictions on the answer is that they must generate only one correct answer, the one planned by the developer.

Tasks of the second type with a freely constructed answer have no restrictions on the content and form of the answers. For a certain time, the student can write anything and how he wants. However, the careful formulation of such tasks implies the existence of a standard, which is usually the most correct answer with the characteristics and quality features that describe it.

In assignments for establishing correspondence, the teacher checks the knowledge of the relationships between the elements of two sets. The elements for comparison are written in two columns: on the left, the elements of the defining set containing the statement of the problem are usually given, and on the right, the elements to be selected.

The tasks are given a standard instruction: "Make a match."

Example 3. Match.

Property                                      Formula

a) commutativity                              1) a · (b + c) = a · b + a · c

b) associativity                              2) a · b = b · a

c) distributivity with respect to addition    3) (a · b) · c = a · (b · c)

Answers: a) ___, b) ___, c) ___.

It should be noted that it is desirable that there are more elements in the right column than in the left one. In this situation, there are certain difficulties associated with the selection of plausible redundant elements. Sometimes, for one element of the left set, it is necessary to select several correct answers from the right column. In addition, correspondences can be extended to three or more sets. The effectiveness of the task is significantly reduced if implausible options are easily distinguished even by ignorant students.

The effectiveness of matching tasks is also reduced in cases where the number of elements in the left and right columns is the same, and there is simply nothing left to choose from when matching the last element on the left: the last match, correct or incorrect, is established automatically by successive exclusion of the elements used in the previous matches.
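For automated checking, a matching task reduces to comparing the pairs indicated by the subject with the key. A minimal sketch, assuming the simplest all-or-nothing scoring rule (partial-credit rules are also used in practice; the data are illustrative):

    # key: element of the left column -> number of the chosen element in the right column
    key = {"a": 2, "b": 3, "c": 1}

    def score_matching(response, key):
        return 1 if response == key else 0

    print(score_matching({"a": 2, "b": 3, "c": 1}, key))   # 1, fully correct
    print(score_matching({"a": 1, "b": 3, "c": 2}, key))   # 0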

Test tasks for establishing the correct sequence are designed to assess the level of mastery of a sequence of actions, processes, etc. In such tasks the actions, processes or elements associated with a specific problem are given in an arbitrary, random order. The standard instruction for these tasks is: "Establish the correct sequence of actions".

Example 4. Establish the correct sequence.

The full form of the branching command in the school algorithmic language has the format:

    else <series 2>

    then <series 1>

    if <condition>
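Machine checking of such a task is a comparison of the order given by the subject with the reference order. A minimal sketch with all-or-nothing scoring (identifiers are illustrative):

    reference = ["if <condition>", "then <series 1>", "else <series 2>"]

    def score_sequence(response, reference):
        return 1 if response == reference else 0

    print(score_sequence(["if <condition>", "then <series 1>", "else <series 2>"], reference))  # 1
    print(score_sequence(["then <series 1>", "if <condition>", "else <series 2>"], reference))  # 0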

Tasks for establishing the correct sequence receive benevolent support from many teachers, which is explained by the important role of ordered thinking and of algorithms of activity.

The purpose of introducing such tasks into the educational process is the formation of algorithmic thinking, algorithmic knowledge, skills and abilities.

Algorithmic thinking can be defined as an intellectual ability that manifests itself in determining the best sequence of actions when solving learning and practical tasks. Typical examples of the manifestation of such thinking are the successful completion of various tasks in a short time, the development of the most effective computer program, etc.

The choice of task forms is determined by many, often conflicting, factors, including the features of the content, the goals of testing and the specifics of the contingent of subjects. Checking is easier with closed-form tasks, but such tasks are less informative. Open-form tasks are more informative, but it is more difficult to organize their checking. An even more difficult problem is the creation of computer programs for checking the correctness of answers to such tasks. This has to do with the richness of the subjects' vocabulary (synonyms can be used in the answer), attentiveness (typos, case mismatches) and so on.
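One common way to soften these difficulties is to normalize the constructed answer and compare it with a list of acceptable variants. A minimal sketch, assuming the list of acceptable synonyms is prepared by the developer (all names and data are illustrative):

    def normalize(text):
        # lower case, trimmed and collapsed whitespace
        return " ".join(text.lower().split())

    def check_open_answer(answer, accepted):
        return normalize(answer) in {normalize(a) for a in accepted}

    accepted = ["bit", "binary digit"]                       # acceptable synonymous answers
    print(check_open_answer("  Binary  digit ", accepted))   # True
    print(check_open_answer("byte", accepted))               # False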

For successful orientation in the forms of tasks, you can use a special table (see table 1) for a comparative analysis of tasks proposed by M.B. Chelyshkova.

According to the developer, this table is purely indicative, however, its use can facilitate the process of selecting test tasks of various forms for solving certain diagnostic problems.


Table 1

Comparative analysis of the characteristics of test items


The correspondence of tasks in test form to the requirements of pedagogical correctness of content and form is a necessary, but not a sufficient, condition for calling them test tasks.

The transformation of tasks in a test form into test tasks begins from the moment of statistical verification of each task for the presence of test-forming properties.

    Empirical verification and statistical processing of results

The presence of a sufficient number of test tasks allows you to proceed to the development of a test as a system with integrity, composition and structure. At the third stage, tasks are selected and tests are created, the quality and efficiency of the test are improved.

The integrity of the test is formed by the interrelation of the subjects' answers to the test tasks and by the presence of a common measured factor affecting the quality of knowledge.

The composition of the test is formed by a correct selection of tasks that makes it possible, with the minimum necessary number of them, to display the essential elements of the subjects' competence.

The level and structure of knowledge are revealed by analyzing the answers of each subject to all the test tasks. The more correct answers, the higher the subject's individual test score. Usually this test score is associated with the concept of "level of knowledge" and undergoes a refinement procedure based on one or another model of pedagogical measurement. The same level of knowledge can be obtained by answering different tasks. For example, in a test of thirty tasks a subject received ten points; these points were most likely obtained by correct answers to the first ten, relatively easy, tasks. The sequence of ones followed by zeros inherent in such a case can be called the correct structure of the subject's preparedness. If the opposite picture is revealed, when the subject answers difficult tasks correctly and easy ones incorrectly, this contradicts the logic of the test, and such a knowledge profile can be called inverted. It is rare, and most often it results from a defect of the test in which the tasks are arranged in violation of the requirement of increasing difficulty. Provided that the test is constructed correctly, each profile is indicative of a structure of knowledge. This structure can be called elementary (since there are also factor structures, which are revealed using the methods of factor analysis).

To determine the degree of structuredness of preparedness, one can use L. Guttman's coefficient, previously and inaccurately called a measure of "test reliability":

r_g = 1 - Σe / (N · k),

where r_g is the structuredness coefficient;

    Σe is the sum of erroneous elements in the individual structures, calculated over the row vectors of the subjects' scores;

    N is the number of subjects;

    k is the number of tasks.
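With the formula reconstructed in this way, the coefficient can be computed from a binary response matrix in which the tasks are ordered from easiest to hardest. A minimal sketch (the error count follows Guttman's idea of deviations from the ideal "ones first, then zeros" profile; the data are invented):

    # rows are subjects, columns are tasks ordered from easiest to hardest; 1 = correct
    matrix = [
        [1, 1, 1, 0, 0],
        [1, 1, 0, 1, 0],   # deviates from the ideal "ones, then zeros" profile
        [1, 0, 0, 0, 0],
        [1, 1, 1, 1, 1],
    ]

    def guttman_coefficient(matrix):
        n, k = len(matrix), len(matrix[0])
        errors = 0
        for row in matrix:
            s = sum(row)                        # individual test score
            ideal = [1] * s + [0] * (k - s)     # ideal profile for this score
            errors += sum(1 for a, b in zip(row, ideal) if a != b)
        return 1 - errors / (n * k)

    print(guttman_coefficient(matrix))   # 0.9 for the data above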

The level of knowledge largely depends on personal efforts and abilities, while the structure of knowledge significantly depends on the correct organization of the educational process, on the individualization of training, on the skill of the teacher, on the objectivity of control - in general, on everything that is usually lacking. The way to achieve this ideal lies through the difficulties of creating quality tests.

Test development begins with an analysis of the content of the knowledge being taught and with mastering the principles of formulating test tasks. Unfortunately, tests are still viewed as something easy to come up with, whereas the strong point of tests is their effectiveness, which results from theoretical and empirical validation.

At the third stage, developers of a new generation of tests will need some mathematical and statistical training, knowledge of test theory. Test theory can be defined as a set of consistent concepts, forms, methods, axioms, formulas and statements that improve the efficiency and quality of the test process. In addition, some experience in the application of multivariate statistical analysis methods and experience in the correct interpretation of test results may be required.

The question often arises: "How will the selected tasks behave in other groups of subjects?" The answer depends on the quality of the selection of the groups, or more precisely, on the statistical plan for forming the samples. The correct answer to this question should be sought in the meaning of the concept of the "target group": the set of subjects in the general population for whom the developed test is intended.

Accordingly, if the tasks of the designed test behave differently in different groups, then this is most likely an indication of errors in the formation of samples of subjects. The latter should be as homogeneous as the subjects in the target group. In the language of statistics, this means that the subjects in the target and experimental groups must belong to the same general population.

Logarithmic estimates, called logits, made it possible to compare directly, on one scale, such seemingly incomparable quantities as the level of knowledge of a subject and the level of difficulty of each task.
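The classical logit transformation takes the natural logarithm of the ratio of successes to failures: for a subject, the share of correct answers to the share of incorrect ones; for a task, the share of incorrect answers to the share of correct ones. A minimal sketch of these initial estimates (zero and perfect scores are assumed not to occur, since the logarithm is undefined for them):

    import math

    def person_logit(correct, total):
        p = correct / total
        return math.log(p / (1 - p))        # level of preparedness in logits

    def item_logit(solved_by, total):
        p = solved_by / total               # share of subjects who solved the task
        return math.log((1 - p) / p)        # difficulty of the task in logits

    print(round(person_logit(21, 30), 2))   # 0.85: the subject solved 70% of the tasks
    print(round(item_logit(6, 30), 2))      # 1.39: a hard task, solved by 20%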

According to V.P. Bespalko and Yu.G. Tatur, testing should be a measurement of the quality of the assimilation of knowledge, skills and abilities. Comparing the performance of the task proposed in the test with the answer standard makes it possible to determine the coefficient of knowledge assimilation (K_us): K_us = A / P, where A is the number of correct answers and P is the number of tasks in the proposed tests.

Determining K_us is an operation of measuring the quality of knowledge assimilation. K_us is normalized (0 ≤ K_us ≤ 1). If K_us > 0.7, the learning process can be considered completed; when knowledge has been acquired with K_us ≤ 0.7, the student systematically makes mistakes in professional activity and is unable to correct them because he cannot find them. The lower acceptable limit for completing the learning process is raised to the value required from the point of view of the safety of the activity.
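A minimal sketch of this criterion (the numbers are invented for illustration):

    def assimilation_coefficient(correct_answers, total_tasks):
        return correct_answers / total_tasks

    k_us = assimilation_coefficient(18, 25)
    print(k_us, "- learning can be considered completed" if k_us > 0.7 else "- learning is not completed")
    # 0.72 - learning can be considered completed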

    Principles of content selection. Criteria for evaluating the content of the test

When creating a test, the attention of the developer is primarily attracted by the selection of content, which can be defined as the optimal display of the content of the academic discipline in the system of test tasks. The requirement of optimality involves the use of a certain selection methodology, including the issues of goal setting, planning and assessing the quality of the test content.

The goal-setting stage is the most difficult and at the same time the most important: the quality of the test content primarily depends on the results of its implementation. In the process of goal-setting, the teacher needs to decide what results of students he wants to evaluate with the help of the test.

The grounds for errors in the teacher's conclusions are not always associated with the technological shortcomings of traditional means of control. Sometimes they are caused by shortcomings at the goal-setting stage, when the center of gravity of the test is shifted to secondary learning goals; sometimes the goal-setting stage is absent altogether, since some teachers are confident in the infallibility of their experience and intuition, especially if they have been working at school for many years. However, not even the most perfect control methods, and no amount of experience, will give grounds for reliable conclusions about the achievement of learning goals as long as there is no confidence in the correct setting of the control goals and in their correct, unbiased reflection in the content of the test.

When creating a test, the task is to reflect in its content the main things that students should know as a result of training; therefore, it is impossible to confine oneself to a simple enumeration of learning objectives. One would like to include everything in the test, but unfortunately this is impossible, so some of the goals have to be simply discarded and the degree of their achievement by students left unchecked. In order not to lose what is most important, it is necessary to structure the goals and introduce a certain hierarchy into their mutual arrangement. Without a doubt, there are no ready-made general recipes here, since each discipline has its own priorities. In addition, individual goals are noticeably interconnected, and therefore a simple idea of a system of goals as an ordered set, without considering the relationships between the elements, is clearly not enough.

After defining the goals of testing and specifying them, it is necessary to develop a plan and a specification of the test.

When developing the plan, an approximate layout of the percentage of the content of the sections is made and the required number of tasks is determined for each section of the discipline based on the importance of the section and the number of hours allotted for studying it in the program.

The layout starts with an estimate of the planned initial number of tasks in the test, which then, in the course of work on the test, will repeatedly change upwards or downwards. Usually the limiting number does not exceed 60 - 80 tasks, since the testing time is chosen within 1.5 - 2 hours and, on average, no more than 2 minutes is allotted for completing one task.

After completing the first step of planning the content, a test specification is developed, which fixes the structure, content of the test, and the percentage of items in the test. Sometimes the specification is made in an expanded form, containing indications of the type of items that will be used to assess student achievement in accordance with the intended goals of creating a test, the time of the test, the number of items, the features of testing that may affect the characteristics of the test, etc.

The specification in expanded form includes:

    the purpose of creating a test, the rationale for choosing an approach to its creation, a description of the possible areas of application of the test;

    a list of the normative documents used in planning the content of the test;

    description of the general structure of the test, including a list of subtests (if any) indicating approaches to their development;

    the number of tasks of various forms, indicating the number of answers to closed tasks, the total number of tasks in the test;

    the number of parallel test variants or a link to the cluster containing the number and numbers of cluster jobs;

    the ratio of tasks for various sections and types of educational activities of schoolchildren;

    coverage of standards requirements (for certification tests);

    a list of requirements not included in the test (for certification tests);

Knowledge and skills are divided as follows:

A - knowledge of concepts, definitions, terms;

B - knowledge of laws and formulas;

C - the ability to apply laws and formulas to solve problems;

D – the ability to interpret the results on graphs and diagrams;

E - the ability to make value judgments.

The following proportions are often set:

A - 10%, B - 20%, C - 30%, D - 30%, E - 10%.
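With these proportions, the planned number of tasks in the test is distributed across the categories. A minimal sketch for a hypothetical test of 60 tasks (simple rounding is used; a developer may round differently):

    proportions = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.30, "E": 0.10}
    total_tasks = 60

    allocation = {cat: round(share * total_tasks) for cat, share in proportions.items()}
    print(allocation)   # {'A': 6, 'B': 12, 'C': 18, 'D': 18, 'E': 6}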

In addition to the criteria, there are general principles that contribute, to a certain extent, to the correct selection of test content.

The principle of representativeness regulates not only the completeness of coverage but also the significance of the content elements of the test. The content of the tasks should be such that, from the answers to them, one can judge knowledge or ignorance of the entire program of the section or course being checked.

The principle of consistency involves the selection of content elements that meet the requirements of consistency and are interconnected by the general structure of knowledge. If the principle of consistency is observed, the test can be used not only to reveal the amount of knowledge but also to assess the quality of the structure of students' knowledge.

After selecting the content of the test, the most important stage of creating pre-test tasks begins. This work is usually entrusted to the most experienced teachers with a long history of work in the school. However, experience alone is not enough to create tasks. It also requires special knowledge on the theory and methodology of developing pedagogical tests, which provide a professional approach to the creation of pre-test tasks.

V.S. Avanesov identified 3 criteria for selecting the content of test items:

1) the certainty of the content of the test;

2) consistency of the content of tasks;

3) the validity of the content of test items.

1. The certainty of the content of the test forms the subject of the pedagogical measurement. In the case of a homogeneous test, the question arises of being sure that all test tasks check knowledge in one particular academic discipline and not in some other. Quite often it happens that the correct answers to some tasks require knowledge not only of the discipline of interest but also of a number of other, usually related and preceding, academic disciplines, whose closeness and connectedness make a precise definition of the subject matter of the measured knowledge difficult.

For example, in physical calculations a great deal of mathematical knowledge is used, and therefore the system of physical knowledge usually includes the mathematics used in solving physical problems. Failure in the mathematical calculations generates failure in the answers to the tasks of a physics test: a negative score is given, supposedly for ignorance of physics, although the subject in fact made mathematical errors. If such a test includes many tasks whose correct solution requires not so much physical knowledge as the skill of performing complicated calculations, this may be an example of inaccurately defined content of a physics test. The smaller the intersection of the knowledge of one academic discipline with the knowledge of another, the more clearly the content of the academic discipline is expressed in the test. Certainty of content is also required in all other tests. In a heterogeneous test, this is achieved by explicitly separating the tasks of one academic discipline into a separate scale. At the same time, there are often tasks that work well not only on one, but on two, three or even more scales.

In any test task, it is determined in advance what is unambiguously considered the answer to the task, with what degree of completeness the correct answer should be. It is not allowed to define a concept through enumeration of elements that are not included in it.

2. The consistency of the content of tasks requires that there be no judgments about the same thought that both affirm and deny it. The existence of two mutually exclusive answers to the same test task is inadmissible. If subjects are instructed to "Circle the number of the correct answer" and then one of the answers states that there is no correct answer, this is an example of inconsistency in the thinking of the test designer. In some tests there are answers that are not related to the content of the task at all. Such answers are quite easily recognized by the subjects as erroneous, and the task therefore turns out to be ineffective. To improve effectiveness, the test is tried out beforehand on a typical sample of subjects, and if answers are found that the subjects never choose, such answers are removed from the test, because they do not perform the function of distractors, which are designed to divert the attention of uninformed subjects from the correct answer. Moreover, such distractors are harmful to the test because they reduce the accuracy of measurement (more on this in the articles that will consider questions of test reliability).

3. The validity of the content of test tasks means that they have grounds for truth. Validity is related to the arguments that can be given in favor of one or another wording of the test items. In the absence of conclusive arguments in favor of the correctness of the formulated task, it is not included in the test, under any pretext. The same happens if at least one counterargument arises in the process of expert discussion, or a condition is allowed under which this statement may turn out to be ambiguous or false. The idea of ​​the validity of the content of the test is closely intertwined with the principle of meaningful correctness of test items, as already mentioned in the previous article. Recall that the test includes only that content of the academic discipline that is objectively true and that lends itself to some rational argumentation. Accordingly, controversial points of view, quite acceptable in science, are not recommended to be included in the content of test items.

The untruth of the content of test tasks differs from the incorrectness of their formulation. Untruth, as noted above, is determined by the corresponding answer, while an incorrectly formulated task can produce both correct and incorrect answers, and can even cause bewilderment. This also includes inaccurately or ambiguously formulated tasks that generate several correct or conditionally correct answers; hence the need to introduce additional conditions of truth, which lengthens the task itself and complicates its semantics. Incorrectness of wording is usually found out in the course of discussing the content of the tasks with experienced expert teachers. The success of such a discussion is possible when an appropriate cultural environment is created in which only constructive and tactful judgments are allowed. Alas, experience convinces us that this does not happen often. Meanwhile, only a joint and friendly discussion of the materials by developers and experts can create an atmosphere of searching for the best variants of the test content. This search is almost endless, and there is no ultimate truth in it.

    The ratio of the form of the task and the type of knowledge, skills, abilities being tested

As mentioned in previous articles, for testing purposes, knowledge can be divided into three types: offered, acquired and tested. Now let's look at this issue in a little more detail.

The knowledge offered is given to students in the form of teaching aids, materials, texts, lectures, stories, etc., reflecting the main part of the educational program. This knowledge is also formulated in a system of tasks, with which the students themselves can check the degree of their preparedness.

The knowledge acquired by students is usually only a part of the knowledge offered, greater or smaller depending on the students' learning activity. With the development of computer-based learning, conditions have appeared under which the amount of acquired knowledge can exceed the amount of knowledge offered. This is a new situation, associated with the possibility of mass immersion of students in the global educational space, in which the leading role of tasks in the process of acquiring knowledge is already quite well understood. Solving educational tasks is the main stimulus for activating learning, the students' own activity. This activity can take place in the form of work with a teacher, in a group, or independently. The discussions of levels of assimilation that are common in the literature refer exclusively to acquired knowledge.

The knowledge checked forms the main content of the document that may be called an examination or testing program, depending on the chosen form of knowledge control. The main feature of the knowledge checked is its relevance, which means the readiness of the subjects for the practical application of knowledge in solving the tasks used at the time of checking. In higher education, this same feature is sometimes called the effectiveness of knowledge.

In testing schoolchildren and applicants, only such knowledge is usually checked as is held in the student's working memory, knowledge that does not require consulting reference books, dictionaries, maps, tables, etc. Among the knowledge checked one can also single out normative knowledge, which is subject to compulsory assimilation by students and to subsequent control by the educational authorities through a system of tasks, problems and other control materials selected by experts and approved by the governing body.

In addition, properties of knowledge are distinguished. V.I. Ginetsinsky identifies the following properties of knowledge:

reflexivity (I not only know something, but I also know that I know it);

transitivity (if I know that someone knows something, then it follows that I know this something);

antisymmetry (if I know someone, it does not mean that he knows me).

Classification of types and levels of knowledge

A classification of types and levels of knowledge was formulated by Bloom to solve practical problems of pedagogical measurement.

    Knowledge of names. To Socrates belong the words: he who comprehends names will also comprehend that to which these names belong. As the well-known foreign philosopher J. Austin noted, knowledge of an object or phenomenon is largely determined by whether we know its name, or rather its correct name.

    Knowledge of the meanings of names and titles. It has long been known that as we understand, so we act. Understanding the meanings of names and titles helps to remember and use them correctly. For example, on hearing the name "Baikal" some younger students may think not of the famous lake, the pearl of Russia, but of the fruit drink sold under the same name. Another example can be taken from the sphere of political consciousness. As Yu.N. Afanasiev, A.S. Stroganov and S.G. Shekhovtsev note, the consciousness of the former Soviet people turned out to be unable to see the different meanings of such abstractions of language as "freedom", "power", "democracy", "state", "people", "society", considering them clear by default, which was one of the reasons that made it possible, with the active complicity of these people, to destroy their own life-support system.

    Factual knowledge. Knowledge of facts allows one not to repeat one's own and others' mistakes and to enrich the evidentiary basis of knowledge. Facts are often fixed in the form of scientific texts, observation results, recommendations such as safety precautions, worldly wisdom, proverbs and sayings. For example, from Ancient China comes the saying of the thinker Zhu Xi: do not boil sand in the hope of getting porridge.

    Knowledge of definitions. This is the weakest point in school education, because definitions cannot be taught directly; they can be understood and assimilated only as a result of independent efforts to master the required concepts. Knowledge of a system of definitions is one of the best pieces of evidence of theoretical preparedness. In the educational process, all four types of knowledge considered so far can be combined into a group of reproductive knowledge. As I.Ya. Lerner noted, over the years of schooling students complete over 10 thousand tasks; the teacher is forced to organize reproductive activity, without which the content is not initially assimilated.

This is knowledge that does not require any noticeable transformation during assimilation, and it is therefore reproduced in the same form in which it was perceived. With some convention, it can be called knowledge of the first level.

    Comparative knowledge. It is widespread in practice and in science and is inherent mainly in intellectually developed persons, especially specialists: they are able to analyze and choose the best options for achieving a particular goal. As Nicholas of Cusa noted, "all researchers judge the unknown by means of a commensurate comparison with something already familiar, so that everything is studied in comparison".

    Knowledge of opposites, contradictions, antonyms and similar objects. Such knowledge is valuable in training, especially at the very beginning. In some areas such knowledge is essential; for example, in the school course on life safety, students need to know exactly what they may do and what they must not do under any circumstances.

    Associative knowledge. It is characteristic of an intellectually developed and creative person. The richer the associations, the better the conditions and the higher the probability of the manifestation of creativity. To a large extent, it is on the richness of associations that the language culture of the individual and the work of the writer, the artist, the designer and people of other creative professions are built.

    Classification knowledge. It is used mainly in science; examples are the classifications of Linnaeus and of D.I. Mendeleev, classifications of tests, etc. Classification knowledge is generalized, systemic knowledge. This type of knowledge is inherent only in persons with sufficient intellectual development, since it requires developed abstract thinking and a holistic, interconnected vision of the totality of phenomena and processes. A system of knowledge is, first of all, the possession of effective definitions of the basic concepts of the sciences studied.

Knowledge of items 5-8 can be attributed to the second level. Such knowledge allows students to solve typical problems by subsuming each specific task under known classes of the phenomena and methods being studied.

    Causal knowledge, knowledge of cause-and-effect relationships, knowledge of foundations. As W. Shakespeare wrote, the time of the inexplicable has passed; everything must have its causes sought. In modern science, causal analysis is the main focus of research. As L. Wittgenstein noted, people say "I know" when they are ready to give undeniable grounds.

    Procedural, algorithmic and process knowledge. Such knowledge is the main kind in practical activity. Mastering it is an essential sign of professional preparedness and culture. This group also includes technological knowledge that makes it possible to obtain the planned result without fail.

    Technological knowledge. This knowledge is a special kind of knowledge that manifests itself at different levels of preparedness. This can be relatively simple knowledge about a separate operation of the technological chain, or a set of knowledge that makes it possible to achieve the set goals without fail at the lowest possible cost.

Knowledge of items 9-11 can be attributed to knowledge of a higher, third level. It is acquired mainly in the system of secondary and higher vocational education.

The following types of knowledge can be attributed to the highest, fourth level of knowledge:

    Probabilistic knowledge. Such knowledge is needed in cases of uncertainty, when existing knowledge is insufficient or the available information is inaccurate, and when it is necessary to minimize the risk of error in decision-making. This is knowledge about the patterns of data distribution, the significance of differences, and the degree of validity of hypotheses.

    Abstract knowledge. This is a special kind of knowledge that operates with idealized concepts and objects that do not exist in reality. There are many such objects in geometry, in the natural sciences, and in those social sciences that in the West are called behavioral: psychology, sociology, pedagogy. Probabilistic, abstract and special scientific knowledge in each separate branch of knowledge together form the basis of theoretical knowledge; this is the level of theoretical knowledge.

    Methodological knowledge. This is knowledge about the methods of transforming reality, scientific knowledge about building effective activities. This is knowledge of the highest, fifth level.

The listed types of knowledge do not yet form a complete classification system and therefore allow for the possibility of a noticeable expansion of the presented nomenclature, replacing some types of knowledge with others, and combining them into various groups.

Each of the listed types of knowledge is expressed by the corresponding form of test tasks.

To determine the degree of learning in each academic discipline, the amount of knowledge that must be mastered according to the curriculum is singled out; it constitutes the basic body of knowledge. Basic knowledge represents the minimum of the state educational standard. However, even among the basic knowledge, the knowledge that should remain in memory in any discipline is singled out; taken together, it forms worldview knowledge. B.U. Rodionov and A.O. Tatur (the MEPhI testing center) distinguish several layers of such knowledge: basic knowledge, program knowledge, and above-program knowledge. Pedagogical tests are the only tool that makes it possible to measure not only learning but also the ability to use knowledge. If we talk only about skills, then at all levels of learning four types of skills can be distinguished:

1) the ability to recognize objects, concepts, facts, laws, models;

2) the ability to act according to a model, according to a known algorithm, rule;

3) the ability to analyze the situation, isolate the main thing and build procedures from the mastered operations that allow obtaining a solution to the test task;

4) the skill and ability to find original solutions.

The four types of skills named by B.U. Rodionov and A.O. Tatur do not contradict the theory of the staged formation of mental actions, on which the method of developing automated testing for assessing the assimilation of knowledge, skills and abilities is based. This makes it possible not only to create expert systems for assessing the degree of student learning, but also to build a flexible, dynamic rating system of knowledge control.

As is known, the unit of the test, its structural element, is the test task. It can be defined as "the simplest and at the same time integral structural element of the test." The tasks included in the test can vary both in the form of presentation and in content. There are different approaches to classifying test tasks by the form of their presentation. The most common types of test items are shown in Figure 3.1.

The main factor influencing the form of a test task is the method of obtaining an answer (choice from the proposed options or self-formulation of the answer). Then this classification can be represented by the following scheme.


It should be noted that test tasks have a number of characteristics. Each test task has its own serial number. As a rule, items in the test are arranged in order of increasing difficulty, although it is not excluded that the difficulty of items fluctuates in different directions as you move along the test.

Each test task has a standard of the correct answer. Questions that do not have the correct answer, as a rule, are not included in the test.

Test items of one form are usually accompanied by a standard instruction that precedes the wording of the items in the test.

For each test task, a rule for scoring (scoring) is developed.

A test task is usually quite short both in its form of presentation and in the time needed to complete it. When formulating a task, care is taken that all statements of the test are understandable to all students without exception (formulated in simple expressions with commonly used vocabulary, without terms that use foreign or infrequent words). If possible, constructions with the negation "not" are avoided, since it is considered preferable to affirm something (whether positive or negative).

Open tasks. In tasks of an open form (tasks for additions), ready-made answers are not given, they must be received. There are two types of open tasks:

  • 1) with restrictions imposed on the answer;
  • 2) without restrictions imposed on the answer, in which the test-takers must compose a detailed answer in the form of a solution to the problem.

Tasks of the second type differ little from traditional written control work; they are expensive to check and more difficult to standardize.

When answering an open task with a limited answer, the student fills in the missing word, formula or number at the place of the dash or in a specially designated place on the answer sheet.

Instructions for tasks of an open type are usually accompanied by the words: "Enter the missing word in the place of the dash" or "Receive and write down the answer in the answer sheet", etc.

Tasks of the closed type. Questions with a choice of answers. A closed task with a choice of answers, as a rule, includes a question and several answers to it (they are indicated by the letters A, B, C, D, ... or the numbers 1, 2, 3, 4, ...). The student must choose the correct answers. In most tests, only one is correct, but sometimes test developers place several correct answers among the options. Plausible but incorrect answers are called distractors. Their number in a task is usually no more than five. Distractors are selected taking into account the common mistakes of schoolchildren.

A closed test item with a choice of answers is considered to be "working well" if students who know the educational material answer it correctly, while those who do not know it choose any of the answers with equal probability.

Multiple-choice tasks are usually preceded by the following instruction: Indicate the number (letter) of the correct answer (for blank testing) or: Press the key with the number (letter) of the correct answer (for computer testing).
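To illustrate how such closed tasks lend themselves to computer testing, the following minimal Python sketch shows one possible way to represent and score a single-choice item; the class and function names, the sample item, and the dichotomous one-point scoring rule are illustrative assumptions rather than a prescribed implementation.

    # A minimal sketch of representing and auto-scoring a closed task with a
    # single correct answer. Names, item content and scoring rule are illustrative.
    from dataclasses import dataclass

    @dataclass
    class ClosedItem:
        text: str              # the stem of the task
        options: list[str]     # answer options, including distractors
        correct_index: int     # index of the single correct answer

    def score(item: ClosedItem, chosen_index: int) -> int:
        """Dichotomous scoring: 1 point for the correct choice, 0 otherwise."""
        return 1 if chosen_index == item.correct_index else 0

    item = ClosedItem(
        text="The place that a digit occupies in a number is called...",
        options=["position", "digit place", "location", "character cell"],
        correct_index=1,
    )
    print(score(item, 1))  # 1
    print(score(item, 3))  # 0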

Test tasks with a choice of one correct answer, as a rule, have the following characteristics:

ambiguity and vagueness are avoided in the text of the task;

the task has a simple syntactic construction;

the main part of the task contains as many of the words as possible, leaving no more than 2-3 keywords of the given problem for the answer; all repeated words are removed from the answers by moving them into the main text of the task;

the answers to one task are usually of roughly the same length;

verbal associations that help guess the correct answer are excluded;

the frequency with which the same number appears as the correct answer in different tasks of the test is the same, or this number is chosen at random;

tasks that contain value judgments or ask for the opinion of the test-taker on any issue are usually excluded;

the number of answer options in each task is the same and usually no more than five (rarely seven);

when formulating distractors (plausible answers), expressions such as "none of the listed" and "all of the listed" are avoided, since they encourage guessing; words such as "all", "none", "never", "always", etc. are also avoided in the answers for the same reason;

distractors are chosen so that they are equally attractive to subjects who do not know the correct answer;

none of the distractors is a partially correct answer that, under certain conditions, becomes the correct answer;

incorrect answers that follow from one another are excluded;

answers are selected so that the key to one task does not serve as the key to the correct answers of another task, that is, distractors from one task are not used as the correct answers of another;

all answers, as a rule, are parallel in construction and grammatically consistent with the main part of the test item;

if there are alternative answers in the task, they are not placed next to the correct one, as this immediately focuses attention on them.

Comparative characteristics of the types of test tasks. The choice of types of test tasks is determined by many parameters: the specifics of the content of the academic subject, the goals of testing, the level of complexity of tasks, the professionalism of the developer, etc.

Each type of test items has its own advantages and disadvantages. For example, tasks of a closed form with a choice of answers are characterized by the advantages that all tests have, namely:

  • - objective evaluation of the results of the work;
  • - the speed of checking completed tasks;
  • - system check of a sufficiently large amount of educational material.

At the same time, they have positive characteristics inherent only in this type of task. For example, they are the easiest to process and allow organizing computerized collection and analysis of results without any special expense. But such tasks also have their drawbacks:

checking only the final results of the work;

the inability to trace the logic of the student's reasoning when performing tasks;

some probability of choosing an answer at random;

the impossibility of testing some types of educational activity (for example, independently finding an approach to a solution).

A sufficiently large number of tasks in the test (usually more than 20) and a large number of answer options (more than 4) often help to mitigate these shortcomings.
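The effect of random guessing can be illustrated with a simple probabilistic model (a simplifying assumption for illustration only): each of n items has k equally likely options and guesses are independent, so the chance of reaching a pass threshold purely by guessing falls sharply as n and k grow.

    # Probability of reaching a pass threshold purely by guessing, under the
    # simplifying assumption of independent guesses with k equally likely options.
    from math import comb

    def prob_pass_by_guessing(n_items: int, n_options: int, pass_score: int) -> float:
        p = 1 / n_options
        return sum(comb(n_items, m) * p**m * (1 - p)**(n_items - m)
                   for m in range(pass_score, n_items + 1))

    # 10 items with 4 options vs. 30 items with 5 options, pass mark = 70% correct
    print(prob_pass_by_guessing(10, 4, 7))   # ~0.0035
    print(prob_pass_by_guessing(30, 5, 21))  # effectively zero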

Some of these shortcomings (for example, guessing the answer) can be avoided by using open-type tasks. However, the results of such tasks are harder to process statistically, and the assessment of tasks with a detailed answer requires the involvement of experts, which in turn reduces the objectivity of control, complicates the standardization of the test, and increases the time and financial costs of processing the test results.

In test theory, the opinion is increasingly expressed that it is desirable to use as few different forms of test items as possible within one test. Professionally made tests are often distinguished precisely by the uniformity of their task forms. However, this requirement is not always feasible because of the specifics of a particular subject. Therefore, developers often combine different types of test items (for example, closed and open) within the same test.

For example, the tests used in centralized testing contain two parts (Part A and Part B): Part A contains closed-type tasks, and Part B open ones.

Tables 1.2 and 1.3 show the comparative characteristics of test items of various types.

Table 1.2. Comparative analysis of test tasks in accordance with the levels of assimilation of educational material

Guided by some of these characteristics, the creators of the test can choose the form of test items suitable for certain purposes. It should also be noted that only a reasonable combination of tests with traditional forms and methods of control will allow to get a comprehensive picture of the level of knowledge.

Table 1.3. Comparative analysis of test items in accordance with test design indicators

The table compares closed tasks (with a choice of answers, for establishing correspondence, for establishing the correct sequence) and open tasks (with a limited response, with a free answer) by the following design indicators: ease of construction (for several types - not always); the guessing effect; objectivity in evaluating the result of completion (in some cases it depends on the quality of the task, in others there is none and the rating is subjective); and the possibility of student errors when writing an answer.

The third idea developed in our definition of a traditional test is the inclusion of a new concept - test effectiveness, which was not previously considered in the test literature as a criterion for the analysis and creation of tests. The leading idea of ​​the traditional test is to compare the knowledge of as many students as possible in a short time, quickly, efficiently and at the lowest cost with a minimum number of tasks.

In essence, this reflects the idea of the effectiveness of pedagogical activity in the field of knowledge control. One would like to think that no one objects, and there is no need to object, to this idea itself. Even if our teacher can explain the educational material no worse than his foreign colleague, he is not able to properly check the required knowledge - for all students, on all the material studied - without programs for organizing automated self-control, the most humane form of knowledge control. He is physically unable to do so. Due to, to put it mildly, an erroneous social policy, the salaries of our teachers have long ceased to compensate even for the expenditure of physical energy necessary for good teaching, not to mention the increased expenditure of intellectual energy, which is possible only for a mind that is unconstrained rather than preoccupied with the search for bread. As noted in the literature, a qualified worker in our country receives three to four times less than the level of wages beyond which normal life activity is disrupted and the destruction of labor potential begins.

Although there are hundreds of examples of test definitions in the literature that are either difficult or impossible to agree with, this does not mean at all that this definition of a traditional test is the ultimate truth. Like all other concepts, it needs constant improvement. It just seems to the author so far more reasoned than some other well-known concepts of the pedagogical test. However, the desire to improve concepts is a completely normal phenomenon and necessary for a normally developing practice and science. Constructive attempts to give other definitions of the test or to challenge existing ones are always useful, but this is precisely what we lack.

Traditional tests include homogeneous and heterogeneous tests. A homogeneous test is a system of tasks of increasing difficulty, a specific form and a certain content - a system created with the aim of an objective, high-quality, and effective method for assessing the structure and measuring the level of preparedness of students in one academic discipline. It is easy to see that fundamentally the definition of a homogeneous test is the same as that of a traditional test.

Homogeneous tests are more common than others. In pedagogy, they are created to control knowledge in one academic discipline or in one section of such, for example, a voluminous academic discipline as physics. In a homogeneous pedagogical test, the use of tasks that reveal other properties is not allowed. The presence of the latter violates the requirement of disciplinary purity of the pedagogical test. After all, each test measures something predetermined.

For example, a test in physics measures the knowledge, skills, abilities and perceptions of the subjects in that science. One of the difficulties of such a measurement is that physical knowledge is closely intertwined with mathematical knowledge. Therefore, the level of mathematical knowledge used in solving the physics tasks of the test is established by experts. Exceeding the accepted level leads to a bias in the results: the more it is exceeded, the more the results begin to depend not so much on knowledge of physics as on knowledge of another science, mathematics. Another important aspect is the desire of some authors to include in tests not so much a check of knowledge as the ability to solve physics problems, thereby involving the intellectual component in the measurement of preparedness in physics.

A heterogeneous test is a system of tasks of increasing difficulty, a specific form and a certain content - a system created with the aim of an objective, high-quality, and effective method for assessing the structure and measuring the level of preparedness of students in several academic disciplines. Often, such tests also include psychological tasks to assess the level of intellectual development.

Typically, heterogeneous tests are used for a comprehensive assessment of a school graduate, personality assessment when applying for a job, and for selecting the most prepared applicants for admission to universities. Since each heterogeneous test consists of homogeneous tests, the interpretation of the test results is carried out according to the answers to the tasks of each test (here they are called scales) and, in addition, through various methods of aggregating scores, attempts are made to give an overall assessment of the preparedness of the subject.

Recall that the traditional test is a method of diagnosing subjects in which they answer the same tasks, at the same time, under the same conditions and with the same scoring. With this orientation, the task of determining the exact volume and structure of the mastered educational material recedes, of necessity, into the background. The test selects the minimum sufficient number of tasks that makes it possible to determine relatively accurately, figuratively speaking, not "who knows what" but "who knows more". The interpretation of test results is carried out mainly in the language of testology, relying on the arithmetic mean, the mode or the median, and on so-called percentile norms, which show what percentage of the subjects have a test result worse than that of any subject taken for analysis with his test score. Such an interpretation is called norm-referenced. Here the conclusion is completed by a rating: tasks - answers - conclusions about the knowledge of the subject - rating, understood as a conclusion about the place or rank of the subject.
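The percentile-norm interpretation mentioned above can be illustrated with a short sketch; the function name and the sample scores are purely illustrative, and real norms are, of course, built on large representative samples.

    # Percentile rank: what percentage of the tested group scored worse than
    # the given subject (a norm-referenced interpretation of a test score).
    def percentile_rank(score: int, all_scores: list[int]) -> float:
        worse = sum(1 for s in all_scores if s < score)
        return 100.0 * worse / len(all_scores)

    group_scores = [12, 15, 15, 18, 20, 22, 22, 25, 27, 30]
    print(percentile_rank(22, group_scores))  # 50.0 - half of the group scored lower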

Integrative tests. An integrative test is a test consisting of a system of tasks that meet the requirements of integrative content, test form and increasing difficulty, aimed at a generalized final diagnosis of the preparedness of a graduate of an educational institution. Diagnostics is carried out by presenting tasks whose correct answers require integrated (generalized, clearly interconnected) knowledge of two or more academic disciplines. The creation of such tests is within the reach only of those teachers who have knowledge of a number of academic disciplines, understand the important role of interdisciplinary connections in learning, and are able to create tasks whose correct answers require students to have knowledge of various disciplines and the ability to apply such knowledge.

Integrative testing is preceded by the organization of integrative learning. Unfortunately, the current class-lesson form of conducting classes, combined with excessive fragmentation of academic disciplines, together with the tradition of teaching individual disciplines (rather than generalized courses), will hamper the introduction of an integrative approach into the learning process and control of preparedness for a long time to come. The advantage of integrative tests over heterogeneous ones lies in the greater informative content of each task and in the smaller number of tasks themselves. The need to create integrative tests increases as the level of education and the number of disciplines studied increase. Therefore, attempts to create such tests are noted mainly in higher education. Particularly useful are integrative tests to improve the objectivity and efficiency of the final state certification of pupils and students.

The methodology for creating integrative tests is similar to the methodology for creating traditional tests, with the exception of the work on determining the content of tasks. For selecting the content of integrative tests, the use of expert methods is mandatory. This is because only experts can determine the adequacy of the content of tasks to the goals of the test. But first of all the experts themselves will need to determine the goals of education and of studying particular educational programs, and then agree among themselves on the fundamental issues, leaving for the examination only variations in understanding the degree of importance of individual elements in the overall structure of preparedness. A group of experts selected and coordinated on the fundamental issues is often called a panel in foreign literature. Or, given the different senses of that word in Russian, such a group can be called a representative expert group. The group is selected so as to adequately represent the approach used in creating the respective test.

Adaptive tests. The expediency of adaptive control follows from the need to rationalize traditional testing. Every teacher understands that there is no point in giving easy and very easy tasks to a well-prepared student, because the probability of a correct solution is too high. In addition, easy materials have no noticeable developmental potential. Symmetrically, because of the high probability of a wrong solution, it makes no sense to give difficult tasks to a weak student. It is known that difficult and very difficult tasks reduce the learning motivation of many students. A measure was needed that would allow the difficulty of tasks and the level of knowledge to be compared on one scale. Such a measure was found in the theory of pedagogical measurements: the Danish mathematician G. Rasch called it the "logit". After the advent of computers, this measure formed the basis of the method of adaptive knowledge control, which regulates the difficulty and number of the tasks presented depending on the students' responses. If the answer is correct, the computer selects a more difficult next task; if it is incorrect, an easier one. Naturally, this algorithm requires a preliminary trial of all tasks, determining their measure of difficulty, and creating a bank of tasks and a special program.

The use of tasks corresponding to the level of preparedness significantly increases the accuracy of measurements and minimizes the time of individual testing to about 5 - 10 minutes.

In Western literature, there are three variants of adaptive testing. The first is called pyramid testing. In the absence of preliminary assessments, all subjects are given a task of medium difficulty, and only then, depending on the answer, each subject is given an easier or harder task; at each step it is useful to use the rule of dividing the scale of difficulty in half. In the second option, the control begins with any desired level of difficulty for the test subject, with a gradual approach to the real level of knowledge. The third option is when testing is carried out by means of a bank of tasks divided by difficulty levels.

Thus, an adaptive test is a variant of an automated testing system in which the parameters of the difficulty and the discriminating ability of each task are known in advance. This system is created in the form of a computer bank of tasks ordered according to the task characteristics of interest. The most important characteristic of adaptive test items is their level of difficulty, obtained empirically, which means that before entering the bank each item is tried out empirically on a sufficiently large number of typical students from the population of interest. The phrase "population of interest" is intended here to convey the meaning of the more rigorous scientific concept of "general population".
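As an illustration of the pyramid variant described above, the following sketch selects items from a bank that is assumed to be already calibrated and sorted by difficulty; the halving of the step and the function names are illustrative assumptions, not a prescribed algorithm.

    # A minimal sketch of "pyramid" adaptive testing: start from an item of medium
    # difficulty, move up after a correct answer and down after an incorrect one,
    # halving the step each time. ask(item) presents an item and returns True/False.
    def adaptive_session(bank_sorted_by_difficulty, ask, max_items=10):
        n = len(bank_sorted_by_difficulty)
        index = n // 2                  # start with a task of medium difficulty
        step = max(n // 4, 1)
        answered = []
        for _ in range(max_items):
            item = bank_sorted_by_difficulty[index]
            correct = ask(item)
            answered.append((item, correct))
            # a harder item after a correct answer, an easier one after a mistake
            index = min(index + step, n - 1) if correct else max(index - step, 0)
            step = max(step // 2, 1)    # halve the difficulty step
        return answered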

The educational model of the adaptive school of E.A. Yamburg proceeds, in essence, from the general ideas of adaptive learning and adaptive knowledge control. The origins of this approach can be traced back to the pedagogical works of Comenius, Pestalozzi and Diesterweg, who are united by the ideas of natural and humane education. At the center of their pedagogical systems was the learner. For example, in A. Diesterweg's work "Didactic Rules", little known to us, one can read the following words: "Teach in accordance with nature... Teach without gaps... Begin teaching from where the student left off... Before you begin to teach, you must investigate the point of departure... Without knowing where the student has stopped, it is impossible to teach him decently." Lack of awareness of the real level of knowledge of students, and natural differences in their ability to assimilate the knowledge offered, became the main reason for the emergence of adaptive systems based on the principle of individualization of learning. This principle is difficult to implement in the traditional classroom form.

Before the advent of the first computers, the most well-known system close to adaptive learning was the so-called "System of Complete Knowledge Assimilation".

Criteria-oriented tests. With a criterion-oriented approach, tests are created to compare the educational achievements of each student with the amount of knowledge, skills or abilities planned for assimilation. In this case, a specific area of ​​content is used as an interpretative frame of reference, and not this or that sample of students. At the same time, the emphasis is on what the student can do and what he knows, and not on how he looks against the background of others.

There are also difficulties with the criteria-oriented approach. As a rule, they are associated with the selection of the content of the test. Within the criteria-based approach, the test tries to reflect the entire content of the controlled course, or at least what can be taken as this full volume. The percentage of correct completion of tasks is considered as a level of preparation or as a degree of mastery of the total volume of the course content. Of course, within the criteria-oriented approach, there is every reason for the latter interpretation, since the test includes everything that can be conditionally taken as 100%.

Criteria-oriented tests cover a fairly wide range of tasks. In particular, they help to collect complete and objective information about the educational achievements of each student individually and a group of students; compare the knowledge, skills and abilities of the student with the requirements laid down in state educational standards; select students who have reached the planned level of preparedness; evaluate the effectiveness of the professional activities of individual teachers and groups of teachers; evaluate the effectiveness of various training programs.

The emphasis on a content approach can have a beneficial effect on pedagogical testing in general. This approach benefits, for example, the interpretation of test scores in the current control. The student receives information not about how he looks against the background of others, but about what he can do and what he knows in comparison with the given requirements for the level of preparation in the subject. Of course, such an interpretation does not exclude a combination with the attribution of results to the norms, which, as a rule, occurs with the current control of students' knowledge in the daily educational process. In this case, testing is integrated with learning and helps the student to identify possible difficulties, as well as to correct errors in mastering the content of educational material in a timely manner.

  2. Forms of test tasks

In modern testology (Avanesov V.S., Chelyshkova M.B., Mayorov A.N., etc.) there are 4 types of tasks in the test form: tasks for choosing one or more correct answers, tasks in an open form or for addition, tasks to establish the correct sequence and tasks to establish correspondences. The most common is the first form.

Let us consider in detail each form of tasks according to the classification of V.S. Avanesov.

Tasks with a choice of one or more correct answers are the most suitable for computer-based knowledge control. Such tasks are conveniently divided into the following types: tasks with two, three, four, five or more answers. The instruction for this form of task is the sentence: "Circle (tick, indicate) the number of the correct answer."

Example 1. Mark the number of the correct answer.

The place that a digit occupies in a number is called

    position;

    digit place;

    location;

    character cell.

The task should be formulated briefly and clearly, so that its meaning is clear on the first reading.

The content of the task is formulated as clearly and as briefly as possible. Brevity is ensured by careful selection of words, symbols and graphics, allowing maximum clarity of the task's meaning to be achieved with a minimum of means. It is necessary to completely exclude repetitions of words and the use of obscure or rarely used words, symbols unknown to students, and foreign words that make it difficult to perceive the meaning. It is good when the task contains no more than one subordinate clause.

To achieve brevity in each task, it is better to ask about one thing. Making assignments heavier by the requirements to find something, solve it, and then explain it again negatively affects the quality of the assignment, although from a pedagogical point of view it is easy to understand the reason for such a formulation.

It is even better when both the task and the answers are short. An incorrect but plausible answer is called a distractor in the American test literature (from the English verb to distract). In general, the better the distractors are chosen, the better the task. The developer's talent is manifested primarily in the development of effective distractors. It is generally believed that the higher the percentage of test-takers choosing a given wrong answer, the better it is formulated. However, this is true only up to a certain limit; in pursuit of the attractiveness of distractors, a sense of proportion is often lost. The attractiveness of each answer is tested empirically.

Single or multiple choice questions are the most criticized form. Proponents of habitual approaches argue that knowledge can only be truly tested in the process of direct communication with the student, asking him clarifying questions, which helps to better clarify the true depth, strength and validity of knowledge. We must agree with such statements. However, there are still issues of saving the living labor of teachers and students, saving time and problems of increasing the efficiency of the educational process.

It is often believed that finding the right answer is much easier than formulating it yourself. However, in well-done assignments to an ignorant student, incorrect answers often seem more plausible than correct ones. The talent of the test developer is revealed in the process of creating exactly wrong, but very plausible answers. Another objection is that a test task with a choice of one or more correct answers is suitable only for assessing knowledge of the so-called lower level.

A variant of such tasks is distinguished in which one answer - the most correct - is chosen from among those proposed. Accordingly, the instruction for such tasks reads: "Circle the number of the most correct answer." Naturally, it is assumed that all the other answers to the task are also correct, but to a varying degree.

There are three reasons for introducing such tasks into practice.

The first is the old idea of ​​excluding incorrect answers from assignments that weak students can supposedly remember. If you follow this very controversial thesis, then you can’t give wrong answers when testing at all.

The second reason for introducing such tasks into practice is more realistic. It concerns the need to develop in students not only the ability to distinguish correct answers from incorrect ones, but also the ability to differentiate the measure of the correctness of answers. This is really important, both in general secondary and higher professional education.

The third reason for using tasks with the choice of the most correct answer is the desire to check the completeness of knowledge with their help.

No matter how convincing the reasons for introducing such tasks into practice, the latter are unlikely to be widely used.

In tasks of an open form, ready-made answers are not given: the test-taker must formulate or obtain them. Sometimes, instead of the term "tasks of an open form", the terms "tasks for addition" or "tasks with a constructed response" are used. For the open form, it is customary to use an instruction consisting of one word: "Complete".

Example 2. Add.

In the binary system, 10-1=_________.
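The reference answer to Example 2 can be verified with a short computation; the snippet below is simply an illustration of the binary arithmetic involved.

    # Checking the reference answer to Example 2: 10 - 1 in the binary system.
    a = int("10", 2)     # binary 10 is decimal 2
    b = int("1", 2)      # binary 1 is decimal 1
    print(bin(a - b))    # 0b1, i.e. the answer is 1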

Add-on tasks are of two markedly different types:

1) with restrictions imposed on answers, the possibilities of obtaining which are appropriately determined by the content and form of presentation;

2) tasks with a freely constructed answer, in which it is necessary to compose a detailed answer in the form of a complete solution to the problem or give an answer in the form of a micro essay.

In tasks with restrictions, it is determined in advance what is unambiguously considered the correct answer, and the degree of completeness of the answer is set. Usually it is quite short - one word, number, symbol, etc. Sometimes - longer, but not exceeding two or three words. Naturally, the regulated brevity of answers puts forward certain requirements for the scope, so the tasks of the first type are mainly used to assess a fairly narrow range of skills.

A distinctive feature of tasks with restrictions on the answers to be supplied is that they must generate only one correct answer, the one planned by the developer.

Tasks of the second type with a freely constructed answer have no restrictions on the content and form of the answers. For a certain time, the student can write anything and how he wants. However, the careful formulation of such tasks implies the existence of a standard, which is usually the most correct answer with the characteristics and quality features that describe it.

In assignments for establishing correspondence, the teacher checks the knowledge of the relationships between the elements of two sets. The elements for comparison are written in two columns: on the left, the elements of the defining set containing the statement of the problem are usually given, and on the right, the elements to be selected.

The tasks are given a standard instruction: "Make a match."


Example 3. Make a match:

a) ___; b) ___; c) ___.

It should be noted that it is desirable that there are more elements in the right column than in the left one. In this situation, there are certain difficulties associated with the selection of plausible redundant elements. Sometimes, for one element of the left set, it is necessary to select several correct answers from the right column. In addition, correspondences can be extended to three or more sets. The effectiveness of the task is significantly reduced if implausible options are easily distinguished even by ignorant students.

The effectiveness of a matching task is also reduced when the number of elements in the left and right columns is the same: there is simply nothing to choose from when matching the last element on the left, and the last match, correct or incorrect, is established automatically by the successive exclusion of the elements used in the previous matches.

Test tasks for establishing the correct sequence are designed to assess the level of proficiency in a sequence of actions, processes, etc. In tasks, actions, processes, elements associated with a specific task are given in an arbitrary, random order. The standard instruction for these tasks is: "Set the correct sequence of actions."

Example 4: Set the correct sequence

The full form of the branching command in the educational algorithmic language has the format:

    otherwise <series 2>

    then <series 1>

    if <condition>
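One possible (purely illustrative) way to check such a sequencing task automatically is to compare the order submitted by the student with the reference order; the strict all-or-nothing scoring below is an assumption, since partial-credit schemes are also used.

    # Illustrative automatic check of a sequencing task: the student's ordering
    # is compared with the reference ordering of the same elements.
    reference = ["if <condition>", "then <series 1>", "otherwise <series 2>"]

    def check_sequence(student_order: list[str]) -> bool:
        return student_order == reference

    print(check_sequence(["if <condition>", "then <series 1>", "otherwise <series 2>"]))  # True
    print(check_sequence(["then <series 1>", "if <condition>", "otherwise <series 2>"]))  # False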

Tasks to establish the correct sequence receive friendly support from many teachers, which is explained by the important role of ordered thinking and activity algorithms.

The purpose of introducing such tasks into the educational process is the formation of algorithmic thinking, algorithmic knowledge, skills and abilities.

Algorithmic thinking can be defined as an intellectual ability that manifests itself in determining the best sequence of actions in solving educational and practical problems. Typical examples of the manifestation of such thinking are the successful completion of various tasks in a short time, the development of the most effective computer program, etc.

The choice of task forms is determined by many, often conflicting, factors, including the features of the content, the goals of testing, and the specifics of the contingent of subjects. Checking is easier when closed-form tasks are used, but such tasks are less informative. Open-form tasks are more informative, but it is more difficult to organize their checking. An even more difficult problem is the creation of computer programs for checking the correctness of answers to such tasks. This is due to the richness of the subjects' vocabulary (synonyms can be used in an answer), their attentiveness (typos, mismatched letter case), and so on.
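The difficulties just listed are usually mitigated by normalizing the answer before comparing it with the reference; the sketch below, including the synonym set, is an illustrative assumption rather than a prescribed checking method.

    # A sketch of checking an open task with a restricted answer: unify letter
    # case and extra whitespace, then compare with a set of accepted synonyms.
    def normalize(answer: str) -> str:
        return " ".join(answer.lower().split())

    def check_open_answer(student_answer: str, accepted_answers: set[str]) -> bool:
        return normalize(student_answer) in {normalize(a) for a in accepted_answers}

    accepted = {"digit place", "place value"}               # hypothetical task
    print(check_open_answer("  Digit  Place ", accepted))   # True
    print(check_open_answer("position", accepted))          # False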

For successful orientation in the forms of tasks, you can use a special table (see table 1) for a comparative analysis of tasks proposed by M.B. Chelyshkova.

According to the developer, this table is purely indicative, however, its use can facilitate the process of selecting test tasks of various forms for solving certain diagnostic problems.


Table 1

Comparative analysis of the characteristics of test items

Characteristic | Closed-form tasks | Addition (open) tasks | Matching tasks | Sequencing tasks
Checking knowledge of facts | Good | Good | Good | Good
Application of knowledge according to a model | Suitable | Suitable | Suitable | Suitable
Application of knowledge in non-standard situations | Unsuitable | Suitable | Unsuitable | Suitable
Ease of construction | Yes | Yes | No | No
Exclusion of guessing | Not excluded | Not excluded | Not excluded | Not excluded
Objectivity of assessment | Yes | No | Yes | Yes
Possibility of misspellings in the answer | No | Yes | No | No
Possibility of an original answer | No | Yes | Yes/No | No

Compliance of tasks in the test form with the requirements of pedagogical correctness of content and form is a necessary but not sufficient condition for calling them test tasks.

The transformation of tasks in a test form into test tasks begins from the moment of statistical verification of each task for the presence of test-forming properties.

  3. Empirical verification and statistical processing of results

The presence of a sufficient number of test tasks allows you to proceed to the development of a test as a system with integrity, composition and structure. At the third stage, tasks are selected and tests are created, the quality and efficiency of the test are improved.

The integrity of the test is formed by the interrelation of the subjects' answers to the test tasks and the presence of a common measured factor that affects the quality of knowledge.

The composition of the test is formed by the correct selection of tasks, which makes it possible, with the minimum necessary number of them, to display the essential elements of the subjects' competence.

The level and structure of knowledge are revealed by analyzing the answers of each subject to all the test tasks. The more correct answers, the higher the subject's individual test score. Usually this test score is associated with the concept of "level of knowledge" and undergoes a refinement procedure based on one or another model of pedagogical measurement. The same level of knowledge can be obtained by answering different tasks. For example, in a test of thirty tasks a subject received ten points; these points were most likely obtained by correct answers to the first ten, relatively easy, tasks. The sequence of ones followed by zeros inherent in such a case can be called the correct structure of the subject's preparedness. If the opposite picture is revealed, when the subject answers difficult tasks correctly and easy ones incorrectly, this contradicts the logic of the test, and such a knowledge profile can be called inverted. It is rare and most often arises because of a defect in the test, in which the tasks are arranged in violation of the requirement of increasing difficulty. Provided that the test is constructed correctly, each profile is indicative of a knowledge structure. This structure can be called elementary (since there are also factor structures, which are revealed using methods of factor analysis).

To determine the degree of structuredness of preparedness, one can use the L. Guttman coefficient r_g (the coefficient of structuredness), which was earlier inaccurately called a measure of "test reliability".
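The formula itself is not reproduced in the text above; as an assumption, the sketch below uses the Guttman-style reproducibility idea - one minus the share of deviations of the observed response profile from the ideal "ones first, then zeros" profile with the same total score - which corresponds to the notion of profile structuredness described earlier.

    # An assumed Guttman-style structuredness index for a single response profile.
    # profile: 1/0 answers to items sorted from the easiest to the hardest.
    def structuredness(profile: list[int]) -> float:
        s = sum(profile)
        ideal = [1] * s + [0] * (len(profile) - s)
        errors = sum(1 for observed, expected in zip(profile, ideal) if observed != expected)
        return 1 - errors / len(profile)

    print(structuredness([1, 1, 1, 1, 0, 0]))  # 1.0  - fully ordered ("correct") profile
    print(structuredness([0, 0, 1, 1, 0, 1]))  # ~0.33 - an inverted-looking profile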

The level of knowledge largely depends on personal efforts and abilities, while the structure of knowledge significantly depends on the correct organization of the educational process, on the individualization of training, on the skill of the teacher, on the objectivity of control - in general, on everything that is usually lacking. The way to achieve this ideal lies through the difficulties of creating quality tests.

Test development begins with an analysis of the content of the knowledge being taught and mastering the principles of formulating test items. Unfortunately, tests are still viewed as a tool that is easy to come up with, while the strength of tests is their effectiveness, which comes from theoretical and empirical validity.

At the third stage, developers of a new generation of tests will need some mathematical and statistical training, knowledge of test theory. Test theory can be defined as a set of consistent concepts, forms, methods, axioms, formulas and statements that improve the efficiency and quality of the test process. In addition, some experience in the application of multivariate statistical analysis methods and experience in the correct interpretation of test results may be required.

The question often arises: "How will the selected tasks behave in other groups of subjects?" The answer depends on the quality of the selection of the groups, or more precisely on the statistical plan for forming the samples. The correct answer to this question should be sought in the meaning of the concept of "target group": this is the set of subjects in the general population for whom the test being developed is intended.

Accordingly, if the tasks of the designed test behave differently in different groups, then this is most likely an indication of errors in the formation of samples of subjects. The latter should be as homogeneous as the subjects in the target group. In the language of statistics, this means that the subjects in the target and experimental groups must belong to the same general population.

Logarithmic measures, called logits, of such seemingly disparate quantities as a subject's level of knowledge and the level of difficulty of each task are used to compare the difficulty of tasks directly with the preparedness of the subjects on a single scale.

According to V.P. Bespalko and Yu.G. Tatur, testing should be a measurement of the quality of mastering knowledge, skills and abilities. Comparing the results of completing the tasks proposed in the test with the reference answers makes it possible to determine the coefficient of knowledge assimilation (K_us). It should be noted that K_us = A / P, where A is the number of correct answers and P is the number of tasks in the proposed tests.

The determination of K_us is an operation of measuring the quality of knowledge assimilation. K_us lends itself to normalization (0 ≤ K_us ≤ 1), and the procedure of monitoring assimilation is easily automated. The coefficient is used to judge the completeness of the learning process: if K_us > 0.7, the learning process can be considered complete. When knowledge is mastered with K_us ≤ 0.7, the student systematically makes mistakes in professional activity and is unable to correct them because he cannot find them. The lower acceptable limit for considering the learning process complete is raised to the value required from the point of view of the safety of the activity.
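A direct illustration of the coefficient defined above and of the 0.7 completion criterion (the numbers in the example are arbitrary):

    # Assimilation coefficient K_us = A / P and the 0.7 completion criterion.
    def assimilation_coefficient(correct_answers: int, total_tasks: int) -> float:
        return correct_answers / total_tasks

    k_us = assimilation_coefficient(24, 30)
    print(round(k_us, 2))                                  # 0.8
    print("complete" if k_us > 0.7 else "not complete")    # complete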

  4. Principles of content selection. Criteria for evaluating the content of the test

When creating a test, the attention of the developer is primarily attracted by the selection of content, which can be defined as the optimal display of the content of the academic discipline in the system of test tasks. The requirement of optimality involves the use of a certain selection methodology, including the issues of goal setting, planning and assessing the quality of the test content.

The goal-setting stage is the most difficult and at the same time the most important: the quality of the test content primarily depends on the results of its implementation. In the process of goal-setting, the teacher needs to decide what results of students he wants to evaluate with the help of the test.

The grounds for errors in the teacher's conclusions are not always associated with the technological shortcomings of traditional means of control. Sometimes they are caused by the teacher's mistakes at the goal-setting stage, when the center of gravity of the test is shifted to secondary learning goals; sometimes the goal-setting stage is absent altogether, since some teachers are confident in the infallibility of their experience and intuition, especially if they have been working at school for many years. However, neither the most perfect control methods nor any amount of experience will give grounds for reliable conclusions about the achievement of learning goals as long as there is no confidence in the correct setting of the goals of control and in their correct, unbiased reflection in the content of the test.

When creating a test, the task is to display in its content the main thing that students should know as a result of training, therefore, it is impossible to confine oneself to a simple enumeration of learning objectives. I would like to include everything in the test, but, unfortunately, this is impossible, so some of the goals have to be simply discarded and the degree of their achievement by students not checked. In order not to lose the most important thing, it is necessary to structure the goals and introduce a certain hierarchy in their mutual arrangement. Without a doubt, there are no ready-made general recipes here, since each discipline has its own priorities. In addition, individual goals are noticeably interconnected, and therefore a simple idea of ​​a system of goals as an ordered set without considering the relationships between elements is clearly not enough.

After defining the goals of testing and specifying them, it is necessary to develop a plan and test specification.

When developing the plan, an approximate layout of the percentage of the content of the sections is made and the required number of tasks is determined for each section of the discipline based on the importance of the section and the number of hours allotted for studying it in the program.

The layout starts with counting the planned initial number of tasks in the test, which then, in the process of working on the test, will repeatedly change upwards or downwards. Usually the limit number does not exceed 60 - 80 tasks, since the testing time is chosen within 1.5 - 2 hours, and an average of no more than 2 minutes is given to complete one task.

After completing the first step of planning the content, a test specification is developed, which fixes the structure, content of the test, and the percentage of items in the test. Sometimes the specification is made in an expanded form, containing indications of the type of items that will be used to assess student achievement in accordance with the intended goals of creating a test, the time of the test, the number of items, the features of testing that may affect the characteristics of the test, etc.

The specification in expanded form includes:

    the purpose of creating a test, the rationale for choosing an approach to its creation, a description of the possible areas of application of the test;

    a list of regulatory documents used in planning the content of the test;

    description of the general structure of the test, including a list of subtests (if any) indicating approaches to their development;

    the number of tasks of various forms, indicating the number of answers to closed tasks, the total number of tasks in the test;

    the number of parallel test variants or a link to the cluster containing the number and numbers of cluster tasks;

    the ratio of tasks for various sections and types of educational activities of schoolchildren;

    coverage of standards requirements (for certification tests);

    a list of requirements not included in the test (for certification tests);

Knowledge and skills are divided as follows:

A - knowledge of concepts, definitions, terms;

B - knowledge of laws and formulas;

C - the ability to apply laws and formulas to solve problems;

D - the ability to interpret the results on graphs and diagrams;

E - the ability to make value judgments.

The following proportions are often set:


A - 10%, B - 20%, C - 30%, D - 30%, E - 10%.
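For a test of a given length, these percentages translate directly into the number of items in each category; the rounding rule in the sketch below is a simple illustrative choice.

    # Translating the blueprint proportions A-E into item counts for a 60-item test.
    proportions = {"A": 0.10, "B": 0.20, "C": 0.30, "D": 0.30, "E": 0.10}
    test_length = 60

    counts = {category: round(share * test_length) for category, share in proportions.items()}
    print(counts)                 # {'A': 6, 'B': 12, 'C': 18, 'D': 18, 'E': 6}
    print(sum(counts.values()))   # 60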

In addition to the criteria, there are general principles that contribute to a certain extent to the correct selection of test content.

The principle of representativeness regulates not only the completeness of the display, but also the significance of the content elements of the test. The content of the assignments should be such that, based on the answers to them, one can conclude that one knows or does not know the entire program of the section or course being checked.

The principle of consistency involves the selection of meaningful elements that meet the requirements of consistency and are interconnected by a common structure of knowledge. Subject to the principle of consistency, the test can be used to identify not only the amount of knowledge, but also to assess the quality of the structure of students' knowledge.

After selecting the content of the test, the most important stage of creating pre-test tasks begins. This work is usually entrusted to the most experienced teachers with a long history of work in the school. However, experience alone is not enough to create tasks. It also requires special knowledge on the theory and methodology of developing pedagogical tests, which provide a professional approach to the creation of pre-test tasks.

V.S. Avanesov identified 3 criteria for selecting the content of test items:

1) the certainty of the content of the test;

2) consistency of the content of tasks;

3) the validity of the content of test items.

1. The certainty of the content of the test determines the subject of the pedagogical measurement. In the case of a homogeneous test, the question arises of making sure that all test items test knowledge of the particular academic discipline and not of some other. Quite often the correct answers to some tasks require knowledge not only of the discipline of interest but also of a number of other, usually related and preceding, academic disciplines, whose proximity and interconnection make it difficult to determine precisely the subject matter of the measured knowledge.

For example, in physical calculations, a lot of mathematical knowledge is used, and therefore the system of physical knowledge usually includes the mathematics that is used in solving physical problems. Failure in mathematical calculations generates failure in answers to tasks of a physical test. A negative score is given, respectively, for ignorance of physics, although the subject made mathematical errors. If such a test includes many such tasks that, for a correct solution, require not so much physical knowledge as the ability to perform complicated calculations, then this may be an example of an inaccurately defined content of a physics test. The smaller the intersection of the knowledge of one academic discipline with the knowledge of another, the more clearly the content of the academic discipline is expressed in the test. Certainty of content is also required in all other tests. In a heterogeneous test, this is achieved by explicitly separating tasks from one academic discipline into a separate scale. At the same time, there are often tasks that work well not only on one, but also on two, three, or even more scales.

In any test task, it is determined in advance what is unambiguously considered the answer to the task, with what degree of completeness the correct answer should be. It is not allowed to define a concept through enumeration of elements that are not included in it.

2. The consistency of the content of tasks requires that judgments do not arise regarding the same thought, both affirming and denying it. Existence of two exclusive answers to the same task of the test is inadmissible. If subjects are instructed to "Circle the number of the correct answer" and then one of the answers states that there is no correct answer, then this is an example of inconsistency in the thinking of the test designer. In some tests, there are answers that are not related to the content of the task at all. Such answers are quite easily recognized by the subjects as erroneous, and therefore the test turns out to be ineffective. To improve efficiency, the test is preliminarily tested on a typical sample of subjects. And if such answers to tasks are found that the subjects do not choose at all, then such answers are removed from the test. Because they do not perform the function of the so-called distractors, designed to divert the attention of unknowing subjects from the correct answer. In addition, such distractors are harmful to the test, because they reduce the accuracy of measurements (but more on that in the articles, which will consider questions of test reliability).

3. The validity of the content of test tasks means that there are grounds for considering them true. Validity is related to the arguments that can be given in favor of one or another wording of a test item. If there are no conclusive arguments in favor of the correctness of a formulated task, it is not included in the test under any pretext. The same happens if even one counterargument arises in the process of expert discussion, or if a condition is admitted under which the statement may turn out to be ambiguous or false. The idea of the validity of the content of the test is closely intertwined with the principle of the substantive correctness of test items, as already mentioned in the previous article. Recall that the test includes only that content of the academic discipline which is objectively true and which lends itself to rational argumentation. Accordingly, controversial points of view, quite acceptable in science, are not recommended for inclusion in the content of test items.

The untruth of the content of test tasks differs from the incorrectness of their formulation. Untruth, as noted above, is determined by the corresponding answer, while an incorrectly formulated task can produce both correct and incorrect answers, and can even cause bewilderment. This also applies to inaccurately or ambiguously formulated tasks that generate several correct or conditionally correct answers; it then becomes necessary to introduce additional truth conditions, which lengthens the task and complicates its semantics. Incorrectness of wording is usually revealed in the process of discussing the content of tasks with experienced expert teachers. Such a discussion can succeed only if an appropriate culture is created in which only constructive and tactful judgments are allowed. Alas, experience shows that this does not happen often. Meanwhile, only a joint and friendly discussion of materials by developers and experts can create an atmosphere of searching for the best versions of the test content. This search is almost endless, and there is no ultimate truth in it.

  1. The ratio of the form of the task and the type of knowledge, skills, abilities being tested

As mentioned in previous articles, for testing purposes, knowledge can be divided into three types: offered, acquired and tested. Now let's look at this issue in a little more detail.

The offered knowledge is given to students in the form of textbooks, materials, texts, lectures, stories, etc., reflecting the main part of the educational program. This knowledge is also formulated in a system of tasks, against which students themselves can check the degree of their preparedness.

The knowledge acquired by students is usually only a part of the knowledge offered, greater or smaller depending on the students' learning activity. With the development of computer-based learning, conditions have appeared under which the amount of acquired knowledge can exceed the amount of knowledge offered. This is a new situation associated with the possibilities of mass immersion of students in the global educational space, in which the leading role of tasks in the process of acquiring knowledge is already quite well understood. Solving educational tasks is the main stimulus for activating learning, the students' own activity. This activity can take place in work with a teacher, in a group, or independently. The discussions of levels of assimilation that are widespread in the literature refer exclusively to acquired knowledge.

The knowledge being tested forms the main content of the document that may be called an examination or testing program, depending on the chosen form of knowledge control. The main feature of the tested knowledge is its relevance, meaning the readiness of the subjects to apply knowledge in practice to solve the tasks used at the time of testing. In higher education, the same feature is sometimes called the effectiveness of knowledge.

In the process of testing schoolchildren and applicants, usually only such knowledge is checked as is held in working (operative) memory and does not require recourse to reference books, dictionaries, maps, tables, etc. Among the knowledge being tested, one can also single out normative knowledge that is subject to mandatory assimilation by students and to subsequent control by the educational authorities through a system of assignments, tasks, and other control materials selected by experts and approved by the governing body.

In addition, properties of knowledge are distinguished. V.I. Ginetsinsky identifies the following properties of knowledge:

 reflexivity (I not only know something, but I also know that I know it);

 transitivity (if I know that someone knows something, then it follows that I know this something);

 antisymmetry (if I know someone, it does not mean that he knows me).

Classification of types and levels of knowledge

Below is a classification of types and levels of knowledge, following the approach formulated by Bloom for solving practical problems of pedagogical measurement.

    Knowledge of names. The words are attributed to Socrates: he who comprehends names will also comprehend that to which the names belong. As the well-known foreign philosopher J. Austin noted, knowledge of an object or phenomenon is largely determined by whether we know its name, or rather its correct name.

    Knowledge of the meanings of names and titles. It has long been known that as we understand, so we act. Understanding the meaning of names and titles helps to remember them and use them correctly. For example, the name "Baikal" may make some younger pupils think not of the famous lake, the pearl of Russia, but of the soft drink sold under the same name. Another example can be taken from the sphere of political consciousness. As Yu.N. Afanasiev, A.S. Stroganov and S.G. Shekhovtsev note, the consciousness of former Soviet people proved unable to see the different meanings of such abstractions of language as "freedom", "power", "democracy", "state", "people", "society", considering them self-evident, which was one of the reasons why, with the active complicity of these same people, their own life-support system was destroyed.

    Factual knowledge. Knowing the facts allows one not to repeat one's own and others' mistakes and to enrich the evidential basis of knowledge. Facts are often recorded in the form of scientific texts, results of observations, and recommendations such as safety rules, worldly wisdom, proverbs, and sayings. For example, from Ancient China comes the saying of the thinker Zhu Xi: do not boil sand in the hope of getting porridge.

    Knowledge of definitions. This is the weakest point in school education, because definitions cannot simply be taught; they can be understood and assimilated only as a result of one's own efforts to master the required concepts. Knowledge of a system of definitions is one of the best indicators of theoretical preparedness. In the educational process, all four types of knowledge considered so far can be combined into a group of reproductive knowledge. As I.Ya. Lerner noted, over the years of schooling students complete over ten thousand tasks; the teacher is forced to organize reproductive activity, without which the content is not initially assimilated.

Such knowledge does not require noticeable transformation during assimilation and is therefore reproduced in the same form in which it was perceived. With some convention, it can be called knowledge of the first level.

    Comparative knowledge. It is widespread in practice and in science and is inherent mainly in intellectually developed persons, especially specialists, who are able to analyze and choose the best options for achieving a particular goal. As Nicholas of Cusa (N. Kuzansky) noted, "all researchers judge the unknown by means of a commensurate comparison with something already familiar, so that everything is studied in comparison."

    Knowledge of opposites, contradictions, antonyms, and similar objects. Such knowledge is valuable in training, especially at the very beginning. In some areas such knowledge is essential: for example, in a school life-safety course, students need to know exactly what may and may not be done under any circumstances.

    Associative knowledge. It is characteristic of an intellectually developed and creative person. The richer the associations, the better the conditions and the higher the probability of creativity manifesting itself. To a large extent, it is on the richness of associations that the language culture of the individual and the work of the writer, artist, designer, and members of other creative professions are built.

    Classification knowledge. It is used mainly in science; examples are the classifications of Linnaeus and D.I. Mendeleev, classifications of tests, etc. Classification knowledge is generalized, systemic knowledge. This type of knowledge is inherent only in persons with sufficient intellectual development, since it requires developed abstract thinking and a holistic, interconnected vision of a totality of phenomena and processes. A system of knowledge is, first of all, the possession of effective definitions of the basic concepts of the sciences being studied.

Knowledge of items 5-8 can be attributed to the second level. Such knowledge allows students to solve typical tasks by subsuming each specific task under known classes of the phenomena and methods being studied.

    Causal knowledge: knowledge of cause-and-effect relationships, knowledge of foundations. As W. Shakespeare wrote, the time of the inexplicable has passed; a reason must be sought for everything. In modern science, causal analysis is the main direction of research. As L. Wittgenstein noted, one says "I know" when one is ready to give undeniable grounds.

    Procedural, algorithmic, and process knowledge. This is the main kind of knowledge in practical activity. Mastering it is an essential sign of professional preparedness and culture. This group also includes technological knowledge, which makes it possible to obtain the planned result without fail.

    Technological knowledge. This is a special kind of knowledge that manifests itself at different levels of preparedness. It may be relatively simple knowledge about a single operation in a technological chain, or a body of knowledge that makes it possible to achieve the set goals without fail and at the lowest possible cost.

Knowledge of items 9-11 can be attributed to a higher, third level. Such knowledge is acquired mainly in the system of secondary and higher vocational education.

The following types of knowledge can be attributed to the fourth level:

    Probabilistic knowledge. Such knowledge is needed in conditions of uncertainty, when existing knowledge is insufficient or the available information is inaccurate, and when the risk of error in decision-making must be minimized. This is knowledge about the patterns of data distribution, the reliability of differences, and the degree of validity of hypotheses.

    Abstract knowledge. This is a special kind of knowledge that operates with idealized concepts and objects that do not exist in reality. There are many such objects in geometry, in the natural sciences, and in those social sciences that in the West are called behavioral: psychology, sociology, and pedagogy. Probabilistic, abstract, and special scientific knowledge in each separate discipline together form the basis of theoretical knowledge; this is the level of theoretical knowledge.

    Methodological knowledge. This is knowledge about the methods of transforming reality, scientific knowledge about building effective activities. This is knowledge of the highest, fifth level.

The listed types of knowledge do not yet form a complete classification system and therefore allow for the possibility of a noticeable expansion of the presented nomenclature, replacing some types of knowledge with others, and combining them into various groups.

Each of the listed types of knowledge is expressed by the corresponding form of test tasks.

To determine the degree of training in each academic discipline, the body of knowledge that must be mastered according to the curriculum is identified; this constitutes the basic body of knowledge. Basic knowledge represents the minimum of the state educational standard. Even among basic knowledge, however, one singles out the knowledge that should remain in memory for any discipline; taken together, it forms worldview knowledge. B.U. Rodionov and A.O. Tatur (the MEPhI testing center) distinguish several tiers of such knowledge: basic knowledge, program knowledge, and above-program knowledge. Pedagogical tests are the only tool that makes it possible to measure not only learning but also the ability to use knowledge. If we speak only of skills, then at all levels of learning four types of skills can be distinguished:

1) the ability to recognize objects, concepts, facts, laws, models;

2) the ability to act according to a model, according to a known algorithm, rule;

3) the ability to analyze the situation, isolate the main thing and build procedures from the mastered operations that allow obtaining a solution to the test task;

4) the ability and ability to find original solutions.

The four types of skills named by B.U. Rodionov and A.O. Tatur do not contradict the theory of the staged formation of mental actions, on which the method of developing automated testing for assessing the assimilation of knowledge and the acquisition of skills and abilities is based. This makes it possible to create not only expert systems for assessing the degree of student learning but also a flexible, dynamic rating system for monitoring knowledge.

Classification of test items

According to their structure and the method of answering, test tasks are divided into closed-type tasks, i.e. tasks with prescribed answers, and open-type tasks, i.e. tasks with free answers.

Let's represent the classification of test items as follows:

1. Test tasks of the open type:

a) completion tasks (additions);

b) free-presentation tasks;

2. Test tasks of the closed type:

a) tasks with alternative answers;

b) multiple-choice tasks;

c) tasks for establishing correspondence (matching);

d) tasks for establishing the correct sequence;

e) classification tasks.
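
Since the subject here is testing in computer science, this taxonomy can also be written down directly as a data model. Below is a minimal illustrative sketch; the class and field names are assumptions made for this example, not part of any testing standard or library.

    from dataclasses import dataclass, field
    from enum import Enum, auto
    from typing import List, Tuple

    class OpenItemKind(Enum):
        COMPLETION = auto()       # 1a) completion tasks (additions)
        FREE_RESPONSE = auto()    # 1b) free-presentation tasks

    class ClosedItemKind(Enum):
        ALTERNATIVE = auto()      # 2a) alternative (yes/no) answers
        MULTIPLE_CHOICE = auto()  # 2b) multiple choice
        MATCHING = auto()         # 2c) establishing correspondence
        SEQUENCING = auto()       # 2d) establishing the correct sequence
        CLASSIFICATION = auto()   # 2e) classification

    @dataclass
    class ClosedItem:
        kind: ClosedItemKind
        stem: str                 # the statement or question
        options: List[str]        # prescribed answers, including distractors
        key: Tuple[int, ...]      # indices of the correct option(s)

    @dataclass
    class OpenItem:
        kind: OpenItemKind
        stem: str
        accepted_answers: List[str] = field(default_factory=list)  # for completion tasks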

V.S. Avanesov offers the following classification of test tasks.

Scheme 1. Forms of test items (the scheme distinguishes the types and variants of tasks with a choice of the correct answer).

The distinctive features of this classification are:

1. Open-form tasks are not classified.

2. Tasks are classified according to the number of correct answers and the number of answer options.

The Dutch Institute for Educational Evaluation (CITO) gives this classification in three schemes.

All questions that offer a choice of multiple answers are called multiple choice questions.

In this classification, multiple-choice tasks coincide with what we have called closed-type tasks.

CITO, using a broad understanding of testing, also distinguishes two more forms of tasks: oral tasks and tasks on performance technique. Unfortunately, there is as yet no experience of using oral test tasks in our country. Tasks on performance technique are undoubtedly of considerable interest: "In technical test papers the exam involves reviewing and assessing a skill, such as speaking, or a process, such as performing a small experiment or making a product."

Closed test items

Test tasks of the closed type express judgments in a complete form and provide several options for answering a question or task. The subject is offered a set of options, from which he chooses one or more correct (or incorrect) answers. Variants of incorrect answers are sometimes called distractors (in the American testing literature an incorrect but plausible answer is called a distractor, from the English verb "to distract").

1. Tasks with alternative answers. Each such task offers only two answers, of which the subject must choose one: "yes / no", "correct / incorrect", etc.

Alternative-answer questions are the simplest, but not the most common, when compiling tests. They are used to evaluate a single element of knowledge. Using such a question on its own, as a stand-alone item, generally results in trivial testing and is not advisable. The CITO recommendations say the same: "Alternative-answer questions offer only one alternative, which the test-taker either accepts as correct or rejects." Thus, a subject can guess the correct answer to a single question with a probability of 50%. It is therefore advisable to apply these tasks in series to one element of knowledge.
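
The 50% guessing risk is precisely why a series is recommended: the probability of guessing an entire series correctly falls off geometrically. A small illustrative calculation, assuming pure guessing and independent items:

    # Probability of guessing k independent yes/no items correctly by pure chance.
    def p_guess_series(k, p_single=0.5):
        return p_single ** k

    for k in (1, 3, 5, 10):
        print(k, p_guess_series(k))   # 1 -> 0.5; 3 -> 0.125; 5 -> 0.03125; 10 -> ~0.001

Even a short series of five items reduces the chance of a fully guessed result to about 3%.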

A feature of alternative-answer tasks is that the question must be formulated in the form of a statement, since it implies agreement or disagreement with that statement.

Let us give an example in which the elements of knowledge are well suited to this form of task.

Instructions: Circle the answer “yes” or “no”. (If you agree with the statement, circle 'yes'; if you disagree, circle 'no').

Question: Hydrolysis is a process in which:

Answer options:

1) salt is decomposed with the help of an electric current yes / no

2) salt is oxidized yes / no

3) the color of the indicator changes yes / no

4) salt crystallizes yes / no

5) salt interacts with water yes / no.

Tasks of this type are used most effectively when testing knowledge of lengthy definitions, understanding of complex processes, and the ability to read graphs of functions and to interpret diagrams and tables, i.e. elements of knowledge that can be structured or divided into smaller parts.

2. Multiple-choice tasks. Such tasks assume a choice among several answer options: the subject must choose one of the proposed options, among which most often only one is correct.

Tasks with a choice of answer (or answers) are quite widespread in testing practice. This is due to the convenience of this form for automating knowledge control, and also to the possibility of quantitatively assessing the quality of a task: its "distinctive" (differentiating, discriminating) power, i.e. the ability of the task to differentiate students of different levels of preparedness.
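
One common way to quantify this discriminating power (one of several possible indices; the data below are invented purely for illustration) is the upper-lower group index: the share of the strongest examinees who answered the item correctly minus the share of the weakest.

    def discrimination_index(item_correct, total_scores, group_share=0.27):
        # item_correct: 1/0 flags for one item; total_scores: the same examinees' totals.
        n = len(total_scores)
        k = max(1, int(n * group_share))          # size of each extreme group
        order = sorted(range(n), key=lambda i: total_scores[i])
        lower, upper = order[:k], order[-k:]      # weakest and strongest examinees
        p_upper = sum(item_correct[i] for i in upper) / k
        p_lower = sum(item_correct[i] for i in lower) / k
        return p_upper - p_lower                  # near zero or negative: the item does not differentiate

    # Invented data: 10 examinees, their total scores and their results on one item.
    scores  = [4, 5, 6, 7, 9, 10, 12, 13, 14, 15]
    correct = [0, 0, 1, 0, 1, 1, 1, 1, 1, 1]
    print(discrimination_index(correct, scores))  # positive: stronger examinees answer it more often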

Instructions for multiple choice questions: circle the letter corresponding to the correct answer.

In multiple-choice tasks, the number of correct answers is not limited in principle. If there are several correct answers, the instructions should be modified to indicate that all letters corresponding to correct answers must be marked, or should otherwise state that there is more than one correct option.

Closed tasks with two, three, or four answer options and one correct answer are usually used to check students' orientation in the discipline, for self-testing of knowledge, and for rapid assessment of the preparedness of applicants, course participants, etc.

Closed tasks with several correct answers are more difficult than tasks with one correct answer, and the correct answers are also harder to guess. Such tasks can be used in disciplinary and qualification tests. It is assumed that the student must mark all the correct answers to the task.
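
How much harder guessing becomes can be shown with a rough calculation; the sketch assumes a blind guesser who marks a uniformly random non-empty subset of the options.

    n = 5                            # number of answer options
    p_single = 1 / n                 # one correct option must be chosen: 0.2
    p_all_correct = 1 / (2**n - 1)   # all correct options must be marked: about 0.032
    print(p_single, p_all_correct)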

3. Tasks for establishing correspondence. In matching tasks it is necessary to find the correspondence (or to relate parts, elements, concepts) between the elements of two lists (sets).

A common form of instruction for this type of task uses arrows: draw arrows from the elements of the first list to the elements of the second, connect the corresponding concepts with arrows, etc.

Matching tasks make it possible to test mainly associative knowledge, which exists in every academic discipline: knowledge about the interconnections and relationships between the elements, properties, laws, formulas, dates, etc. of the two "lists" (columns).

The benefits of matching assignments include:

Compact arrangement of tasks in the test;

The ability to quickly assess knowledge and skills, both subject-specific and intellectual;

Activation of students' activities with the help of associations in the studied discipline.

The pedagogical meaning of using such tasks lies in the desire to intensify the students' own learning activity by strengthening associations between the elements being studied and by making the results of control and self-control meaningful. The subjects acquire knowledge about what they do not know, which is important for independent learning.

One of the formal requirements for matching tasks is an unequal number of elements in the right and left columns. The redundant (plausible but incorrect) answers appear in only one column and act as distractors. If the number of elements in the two columns were the same, the last pair would be determined automatically by sequential elimination.
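
The effect of this requirement on blind guessing can be shown by simple counting; the column size below is arbitrary, and it is assumed that each answer may be used at most once.

    from math import factorial, perm

    m = 4                                   # elements to be matched in the left column
    p_equal_columns = 1 / factorial(m)      # equal columns: 1/4! = 1/24
    p_with_distractor = 1 / perm(m + 1, m)  # one extra answer on the right: 1/120
    print(p_equal_columns, p_with_distractor)

With equal columns, a subject who knows all pairs but one gets the last one for free; an extra distractor removes that effect and lowers the probability of a fully guessed answer.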

4. Tasks for restoring a sequence. Such tasks make it possible to test the algorithmic knowledge, skills, and abilities needed to establish the correct sequence of various actions, operations, and calculations. They are suitable for any subject in which there is algorithmic activity or a temporal sequence of events: in technology subjects this may be the order of technological operations, in the humanities the restoration of the time sequence of events, in the exact sciences algorithms for solving problems, and this list is almost endless.

The role of algorithms of correct and effective activity is important at all stages of learning, and they are especially necessary at the final stages of vocational training. The purpose of introducing such tasks into the educational process is to check the formation of algorithmic thinking.

It is also necessary to note the low probability of guessing the correct answer, which is characteristic of this form of tasks.
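
The low guessing probability is easy to see: a random ordering of n elements matches the single correct sequence with probability 1/n!, which falls far faster than the 1/n of a single-choice task. A purely illustrative calculation:

    from math import factorial

    for n in (3, 5, 7):
        print(n, 1 / n, 1 / factorial(n))
    # n=5: single choice 0.2 vs full sequence 0.0083; n=7: 0.14 vs 0.0002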

Instructions for such tasks: set the correct sequence; put in the correct order.

Rules to be observed when developing a task:

1) the algorithm to be restored must be correct in purpose and content and unambiguous in interpretation, i.e. it is assumed that there is a single algorithm corresponding to the correct answer;

2) the keyword in the title and the words describing the elements are better written in the nominative case, since the endings of the words may otherwise suggest the correct answer.

5. Tasks for classification. The classification of objects is based on the ability to compare, i.e., to find similarities and differences. It is possible to compare objects only by a certain common property (attribute, parameter), which is the basis of classification.

If this property is not specified, the question of comparison cannot be resolved. Thus, in order to compare objects, one must first identify their general properties and only then establish how they differ, and by how much or by how many times.

Task design: a list of numbered objects (words, formulas, figures, etc.) and a table to be filled out. If the table contains a list of classification grounds, then the tasks are closed, otherwise they are open.

Instructions: classify by filling out the table.

Rules to be followed when classifying:

1. The essential properties (features) of objects should be taken as the basis for classification;

2. When classifying on a given basis, each object should fall into exactly one class.
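
The second rule can be checked mechanically while the answer key is being prepared. A minimal sketch; the data are an invented informatics example, not taken from any particular test.

    def multiply_classified(key):
        # Return objects assigned to more than one class in a classification key.
        counts = {}
        for cls, objects in key.items():
            for obj in objects:
                counts[obj] = counts.get(obj, 0) + 1
        return sorted(obj for obj, c in counts.items() if c > 1)

    # Invented key: classify devices as input or output devices.
    key = {"input": {"keyboard", "mouse", "touchscreen"},
           "output": {"monitor", "printer", "touchscreen"}}
    print(multiply_classified(key))  # ['touchscreen'] -> the basis of classification needs revision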

Advantages of closed tasks:

Items can be reliable because there are no factors associated with subjective assessments that reduce reliability;

Assessment of tasks is completely objective: there can be no differences between the assessments of different assessors;

It does not matter whether the subjects are able to formulate answers well;

Tasks are easily processed, testing is carried out quickly;

A simple filling algorithm reduces the number of random errors and typos;

These tasks allow you to cover large areas of knowledge;

Machine processing of responses is possible.
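
The last point is straightforward in practice: because the answers to closed tasks are fully prescribed, scoring reduces to comparing the examinee's selections with an answer key. A minimal sketch; the item numbers, keys, and responses are invented for illustration.

    ANSWER_KEY = {        # item number -> set of correct option numbers
        1: {3},           # one correct answer
        2: {1, 4},        # several correct answers: all must be marked
        3: {2},
    }

    def score_response(response):
        # One point for each item whose marked options exactly match the key.
        return sum(1 for item, key in ANSWER_KEY.items()
                   if response.get(item, set()) == key)

    print(score_response({1: {3}, 2: {1, 4}, 3: {5}}))  # -> 2 points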

2.1. Open type tasks

There are two types of open-type tasks: completion tasks (additions) and free-presentation tasks. A distinctive feature of free-presentation tasks is that, to complete them, the subject himself must write down one or more words (numbers, letters, possibly phrases or even sentences). They assume that the subjects answer freely on the substance of the task; there are no restrictions on the answers. The wording of such tasks must, however, ensure that there is only one correct answer.

2.2. Closed type tasks

Closed-type tasks include several kinds: tasks with alternative answers (AA) and multiple-choice tasks. Closed-type test tasks provide prescribed answers to the question posed: one or more correct answers are selected from those proposed, the correct (or incorrect) elements of a list are selected, etc. These are tasks with prescribed answers, which implies a set of pre-developed answers to the given question.

2.3. Correspondence (matching) tasks

In correspondence tasks (restoring correspondence) it is necessary to find the correspondence (or to relate parts, elements, concepts) between the elements of two lists (sets). This form of task is quite varied and can be used successfully in all academic subjects and subject areas; in almost every subject there are wide possibilities for its use. Matching tasks require the selection of a suitable answer.

2.4. Sequence-restoration tasks

Sequencing tasks can be considered a variant of matching tasks in which one of the series is time, distance, or another continuum implied as a series. Since this form of task requires special instructions, we have singled it out as a separate subsection.

2.5. Using tasks from psychological tests of the structure of intelligence in tests of educational achievement

In achievement tests one can often find attempts to use specific tasks originally designed by psychologists for intelligence tests. These are basically three types of tasks: analogies, classifications, and exclusion of the odd one out. The peculiarity of these tasks is that the result of performing them depends not only on knowledge of the subject content of the task but also on the complexity of the intellectual operation that the task involves.