In our definition, human activity can be expressed by five basic attributes: actor, action, object, time, and location. The goal of this paper is to describe a method that automatically extracts all of the basic attributes, and the transition between activities, from sentences in Japanese web pages. Previous work has had several limitations: high setup costs, inability to extract all attributes, restrictions on the types of sentences that can be handled, and insufficient consideration of the interdependency among attributes. To resolve these problems, this paper proposes a novel approach that uses conditional random fields and self-supervised learning. Given a small corpus sample as input, it automatically generates its own training data and a feature model. Based on the feature model, it automatically extracts all of the attributes and the transition between activities in each sentence retrieved from the Web corpus. This approach treats activity extraction as a sequence labeling problem, and its advantages include domain independence, scalability, and no need for human input. Because the number of elements in a tuple does not have to be fixed in advance, the approach can extract all of the basic attributes and the transition between activities in a single pass. Additionally, by converting complex sentences retrieved from the Web into simpler ones, the approach can handle them as well. In an experiment, the approach achieved high precision (activity: 88.9%, attributes: over 90%, transition: 87.5%).
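To illustrate what treating activity extraction as sequence labeling means in practice, the following minimal sketch groups per-token labels into an activity record. It is an illustration under stated assumptions, not the implementation used in this paper: the BIO-style label names (`B-ACTOR`, `I-OBJECT`, `O`, and so on), the function name `decode_activity`, and the English gloss tokens are all hypothetical, and the labels are assumed to have already been produced by a trained sequence labeler such as a conditional random field.

```python
from typing import Dict, List, Optional


def decode_activity(tokens: List[str], labels: List[str]) -> Dict[str, List[str]]:
    """Group BIO-labeled tokens into attribute spans (hypothetical label scheme)."""
    if len(tokens) != len(labels):
        raise ValueError("tokens and labels must have the same length")

    activity: Dict[str, List[str]] = {}
    current_attr: Optional[str] = None
    current_tokens: List[str] = []

    def flush() -> None:
        # Store the span collected so far, if any.
        if current_attr is not None:
            activity.setdefault(current_attr, []).append(" ".join(current_tokens))

    for token, label in zip(tokens, labels):
        prefix, _, attr = label.partition("-")
        attr = attr.lower()
        if prefix == "I" and attr == current_attr:
            # Continuation of the span that is currently open.
            current_tokens.append(token)
            continue
        flush()
        if prefix in ("B", "I") and attr:
            # Start a new span (a stray I- tag is treated as a span start).
            current_attr, current_tokens = attr, [token]
        else:
            # "O" or an unrecognized label closes any open span.
            current_attr, current_tokens = None, []
    flush()
    return activity


# English gloss used for readability; the labels are assumed, not predicted here.
tokens = ["Taro", "bought", "a", "book", "in", "Kyoto", "yesterday"]
labels = ["B-ACTOR", "B-ACTION", "B-OBJECT", "I-OBJECT", "O", "B-LOCATION", "B-TIME"]
print(decode_activity(tokens, labels))
```

Because the record is built from whatever labels appear in the sequence, an attribute that is absent from a sentence is simply omitted, which reflects why the number of elements in a tuple does not need to be fixed beforehand.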