Big data fuels 7th national population census

ZHANG LIPING and WANG GUANGZHOU | 2020-05-26
(Chinese Social Sciences Today)
A door-to-door census visit is an important way to get an accurate count of the population, but it is time-consuming and costly. The 2020 census features an additional online registration system to enrich data collection. Photo: CHINA DAILY


This year China will conduct its seventh national population census. Since 1953, China has successfully conducted six population censuses, providing a reliable basis for scientific decision-making and the formulation of national development strategies and programs as well as socioeconomic, ecological and other major policies. The quality of census data is an important indicator of a country’s comprehensive national strength, level of socioeconomic development and governance capacity. 
However, the census is often plagued by repeated, omitted and concealed information, as well as errors in mapping the size and structure of the population. In the face of new socioeconomic development and a new cultural environment, it is urgent to identify new changes and trends in population development so as to put forward forward-looking solutions and strategies.
New trends in past decade
Since the sixth census in 2010, great changes have taken place in China’s social and economic environment and population structure.
The rapid development of information technologies such as the internet, cloud computing, big data, mobile communications and social media applications has accelerated the innovation of e-government, and the national informatization project has generated fruitful results. 
The national population information database, which uses citizens’ ID card numbers as the unique identifier, covers China’s whole population, and it recorded 1.399 billion entries upon its completion in 2017. The database ensures the consistency and accuracy of information that used to be collected separately by different departments and was thus hard to verify, and it has shed light on China’s population status. The new database avoids redundant and dispersed data collection and maintenance while forming an information management system that covers the whole life cycle of China’s citizens.
In recent years, urbanization has accelerated, with the proportion of urban population rising from 49.95% in 2010 to 59.58% in 2018. The migrant population was 221 million in 2010 and reached nearly 245 million in 2017. Though the share of migrants in the total population fluctuates, it can be predicted that large-scale population migration will long remain an important phenomenon in China’s population and socioeconomic development.
In addition, with the reform of the household registration system and the formation and improvement of the permanent residence registration system, the national census will be able to meet whatever new demands may arise. 
Also, the family planning policy has finally been adjusted, so the pressure to conceal births or leave them unreported has been reduced. First and second births account for more than 90% of all new births. China’s universal two-child policy has fundamentally changed the mechanism for birth registration, providing favorable conditions for confirming the total number of new births.
Problems in population census
Despite the past achievements of the population census, the difficulty of door-to-door census visits, the decline in the response rate, and concealment and underreporting are still major problems to be tackled.
Every national census has had to start from scratch. Because respondents lack unique identifiers, everyone must be registered anew in each census, and tracking longitudinal changes in existing census records has not yet been possible. As a result, the achievements of previous population censuses cannot be consolidated at the technical level, which hinders in-depth mining of these information resources.
The identification of permanent residence faces great challenges. The results of previous censuses show that repetition and omission in reporting the migrant population remain relatively serious. In order to solve the problem of over-reporting and under-reporting, it is necessary to standardize the methods for surveying the total population, the permanent resident population and the household registration population. Given the disconnection between the place of household registration and a resident’s de facto permanent residence, as well as a large migrant population, it is extremely difficult to accurately report the permanent resident population.
The census is a household survey, and the United Nations has recommended that countries conduct population and housing censuses at the same time. Due to the close relationship between the population census and the housing census, it is important to count every resident in each household. In fact, many countries conduct the two together. For example, in the 15 years from 1965 to 1979, 215 countries and regions conducted population and housing censuses 289 times, 205 of which were simultaneous.
However, China’s population census data and housing property rights records lack unique identifiers and an effective link between them. It is therefore difficult to analyze in depth the housing data and the residential population data, together with their respective characteristics.
New measures
Through the census, we can obtain China’s basic population data, understand new changes in the population structure, and provide basic information for economic and social development. The 2020 census will make adjustments in identity information registration, administrative data application and survey methods.
First, ID number registration will be adopted to facilitate data integration. Unlike in previous censuses, the established national population information database will play a crucial role in the seventh census. This time, ID number registration will help make better use of existing information resources, which will in turn enrich the population information system.
At the same time, ID numbers will be set as keys to further perfect the national population information database, also serving as a baseline for comparative studies in the future, and providing updated information for the administrative departments.
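As a minimal sketch of how ID numbers could serve as linkage keys, the Python below compares a handful of census returns against a registry extract and flags possible repetitions and omissions. The `reconcile` function, the `id_number` field and the sample values are illustrative assumptions, not a description of the actual census system.

```python
from collections import Counter

def reconcile(census_rows, registry_ids):
    """Compare census returns with a registry, using the ID number as key.

    census_rows: list of dicts with a hypothetical "id_number" field.
    registry_ids: set of ID numbers known to the (hypothetical) registry.
    """
    counts = Counter(row["id_number"] for row in census_rows)
    seen = set(counts)
    return {
        # reported more than once: possible repetition
        "duplicates": sorted(i for i, n in counts.items() if n > 1),
        # in the registry but absent from the census: possible omission
        "missing": sorted(registry_ids - seen),
        # in the census but unknown to the registry: needs follow-up
        "unregistered": sorted(seen - registry_ids),
    }

# Made-up sample data for illustration only
census_rows = [
    {"id_number": "A001", "residence": "District 1"},
    {"id_number": "A002", "residence": "District 1"},
    {"id_number": "A002", "residence": "District 2"},
    {"id_number": "A004", "residence": "District 3"},
]
registry_ids = {"A001", "A002", "A003"}

result = reconcile(census_rows, registry_ids)
print(result)
```

Without a shared key, the same comparison would have to rely on names and addresses, which is far more error-prone for a large migrant population.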
Second, we will make full use of administrative data to prepare for the census. At present, the administrative departments have accumulated abundant population-related information, and how to make full use of these data resources has become a focus of the census. 
The 2020 census will take advantage of China’s informatization projects. Prior to the census, we will take into account information on household residents, migrant population, birth, death, property rights and living conditions from the public security, health, civil affairs, social security, housing and other departments, to promote data integration and utilization in different systems, so as to reduce the barriers between the data sets.
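The kind of cross-department integration described above might look roughly like the following sketch, which merges records from separate sources into one profile per ID number. The department names, field names and the `integrate` function are hypothetical and chosen only for illustration.

```python
def integrate(sources):
    """Merge per-department records into one profile per ID number.

    sources: dict mapping a department name to {id_number: {field: value}}.
    """
    profiles = {}
    for department, records in sources.items():
        for id_number, fields in records.items():
            profile = profiles.setdefault(id_number, {})
            for field, value in fields.items():
                # Prefix each field with its source so that conflicting
                # values from different departments remain visible
                profile[f"{department}.{field}"] = value
    return profiles

# Made-up sample data for illustration only
sources = {
    "public_security": {"A001": {"registered_place": "City X"}},
    "health": {"A001": {"birth_year": 1990}, "A002": {"birth_year": 2015}},
}

profiles = integrate(sources)
print(profiles)
```

Keeping the source department in each field name is one possible design choice; it leaves the decision about which department's value to trust to a later verification step.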
Third, we will improve survey methods to address the difficulty of door-to-door household visits. With the acceleration of urbanization and the increase of population migration, the mapping of buildings in each census area becomes more complex, and door-to-door visits become much harder, which adds to the time and cost of the census. Therefore, under the new technical conditions, online reporting of residents’ information has become a new highlight in the data collection of this census.
Big data application
Unlike previous censuses, the national census in the big data era is not a one-time deal. Rather, it is a process of standardizing the survey based on the reuse of previous census data.
First, it is necessary to establish a set of unique codes for housing, so that housing and residents can be precisely matched. The population census is registered according to current residence, and the population information is closely related to the housing information, but housing information has not been effectively applied in the population census registration. 
In the 2000 census, housing was added to the survey for the first time, but there was no comprehensive registration of home ownership. Therefore, we should establish a set of unique codes for housing and record standardized detailed addresses to paint a complete picture of urban and rural housing conditions. Such an effort will also be significant for perfecting the system of housing policy and its planning in the future.
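To illustrate what matching housing and residents through unique codes could involve, here is a small sketch. The code format (`H-001`), the field names and the `match_residents_to_housing` function are assumptions made for the example, not an existing coding scheme.

```python
from collections import defaultdict

def match_residents_to_housing(housing, residents):
    """Group residents by housing code and report unmatched records."""
    by_code = {unit["housing_code"]: unit for unit in housing}
    occupants = defaultdict(list)
    unmatched = []
    for person in residents:
        code = person["housing_code"]
        if code in by_code:
            occupants[code].append(person["name"])
        else:
            # The resident reported a code that is not in the housing register
            unmatched.append(person["name"])
    # Registered units with no resident attached
    vacant = sorted(code for code in by_code if code not in occupants)
    return dict(occupants), unmatched, vacant

# Made-up sample data for illustration only
housing = [
    {"housing_code": "H-001", "address": "1 Example Road, Unit 101"},
    {"housing_code": "H-002", "address": "1 Example Road, Unit 102"},
]
residents = [
    {"name": "Resident A", "housing_code": "H-001"},
    {"name": "Resident B", "housing_code": "H-001"},
    {"name": "Resident C", "housing_code": "H-999"},
]

occupants, unmatched, vacant = match_residents_to_housing(housing, residents)
print(occupants, unmatched, vacant)
```

With standardized addresses attached to each code, the same lookup would also show which units are unoccupied and which residents cannot be placed in any registered unit.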
Second, we need to strengthen data archiving and mining. Census data are an important part of human civilization and scientific heritage, and many countries attach great importance to their preservation, development and utilization. Preservation relies on both legal and scientific methods.
In developed countries, original data obtained decades or even hundreds of years ago are kept on file. They continue to be utilized in depth as science and technology develop, and they are not damaged or lost because of changes of institution or regime.
China has successfully conducted six population censuses, which have become valuable data for scientific research and government decision-making. In addition to the summary data of population censuses, the preservation of original population data and census-related survey data is a prerequisite for the better utilization of data. Therefore, it is necessary to archive the existing census data and establish a directory for the database.
At the same time, according to international practice, representative original sample data should be made available for researchers to conduct long-term and in-depth studies.
Zhang Liping is a research fellow from the Institute of Sociology at the Chinese Academy of Social Sciences (CASS) and Wang Guangzhou is a research fellow from the Institute of Population and Labor Economics at CASS.
edited by YANG XUE