README: Replication of “The Asymmetric Incidence of Business Taxes: Survey Evidence from German Firms”

by Richard Winter, Philipp Doerrenberg, Fabian Eble, Davud Rostam-Afschar, and Johannes Voget.

Overview

This replication package contains the R and Stata code used to conduct the analysis in “The Asymmetric Incidence of Business Taxes: Survey Evidence from German Firms”. The code is partitioned into 21 scripts and do-files, which execute the full analysis and generate five main figures, 10 appendix figures, 11 main tables, and six appendix tables. A full replication, including the transformation of the Orbis flatfiles and the geocoding of firm addresses, is expected to take about 17 hours.

Software Requirements

The analysis is conducted using R and Stata, with the following configurations:

  • R: Version 4.3.1. RTools is required and must be installed prior to running the analysis. This project uses the renv package for dependency management; see the .lock file for the dependency configuration and the instructions to replicators below.
  • Stata: Version 17.

Hardware Requirements

This analysis was performed on a machine with the following specifications:

  • CPU: Intel Xeon Gold 6254 CPU @ 3.10GHz
  • Memory: 1.00 TB
  • Operating System: Windows Server 2019 Standard

Without transforming the Orbis flatfiles into parquet and executing the geo-coding script, the total runtime is about 35 minutes. Reading and writing the financial and address data takes several hours, depending on the hardware. Georeferencing the unique firm addresses takes about 13 hours.

Data Availability and Provenance Statements

Statement about Rights

We certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.

Summary of Availability

Some data cannot be made publicly available.

The data source for this paper is a survey of German firms, the German Business Panel (GBP). The GBP data are subject to the General Data Protection Regulation (GDPR), which prevents us from making the data available online upon publication. In addition, Moody’s Orbis database, which is also used in this project, is proprietary and cannot be made publicly available.

Nevertheless, we aim to be as transparent as possible and can provide on-site or controlled remote data access to the original data for researchers interested in replicating our findings. Access requires signing a data use agreement, but we make the process as convenient as possible.

Details on each Data Source

Data name | Data files | Location | Provided | Citation
Municipal Scaling Factors 2003-2022 | Hebesätze Ausgabe YYYY.xlsx | Data/Hebesätze | TRUE | SÄBL (2022)
Verwaltungsgebiete 1:250 000 | Hebesätze Ausgabe YYYY.xlsx | Data/geo_daten | TRUE | BKG (2022)
Municipal Codes 1970-2022 | DDMMYYYY_Auszug_GV.xlsx | Data/AGS/ | TRUE | Destatis (2022)
GBP Wave 2 | Welle2_Safehouse_C01_only_2022-09-23.dta | Data/ | FALSE | GBP (2022)
Moody’s Orbis | All_addresses.txt; Industry-Global_financials_and_ratios-EUR.txt | Data/ | FALSE | Moody’s (2024)
OpenStreetMap | addresses_ags.parquet | Data/ | TRUE | OSM (2025)

Municipal Scaling Factors 2003-2022

The data on municipal scaling factors were downloaded from the website of the Federal and State Statistical Offices (SÄBL, 2022) and are available at https://www.statistikportal.de/de/veroeffentlichungen/hebesaetze-der-realsteuern-deutschland. A copy of the data is provided as part of this archive. The data are in the public domain.

Verwaltungsgebiete 1:250 000

The spatial data on municipalities were obtained from the Federal Agency for Cartography and Geodesy (BKG, 2022) and can be downloaded here https://gdz.bkg.bund.de/index.php/default/open-data/verwaltungsgebiete-1-250-000-stand-01-01-vg250-01-01.html. A copy of the data is provided as part of this archive. The data are in the public domain.

Municipal Codes 1970-2022

The data on municipal codes were downloaded from the website of the Federal Statistical Office (Destatis, 2022) and can be downloaded here https://www.destatis.de/EN/Themes/Countries-Regions/Regional-Statistics/_node.html. A copy of the data is provided as part of this archive. The data are in the public domain.

GBP Wave 2

The experimental data used in this paper were collected in the second wave of the German Business Panel (GBP). The codebook for this survey wave is available here https://backend.gbpanel.org/app/uploads/2022/11/Codebook_Welle2_2022_01_17_v3_2.pdf. Screenshots of the survey questions are provided in the Online Appendix of the paper. The data are confidential but may be obtained under a data use agreement with the GBP. Researchers interested in access to the data may contact the GBP at gbpinfo@uni-mannheim.de; see also https://gbpanel.org/page/data for the user agreement and the declaration of commitment. It can take some time to negotiate data use agreements and gain access to the data. The authors will assist with any reasonable replication attempts for two years following publication.

Moody’s Orbis

The financial and address data used in this paper were sourced from Moody’s (2024) Orbis database. The data are confidential, but may be purchased from Moody’s. Researchers interested in access to the data may contact Moody’s via their website https://www.moodys.com/web/en/us/capabilities/company-reference-data/orbis.html. It can take some time to negotiate data use agreements and gain access to the data. The authors will assist with any reasonable replication attempts for two years following publication.

OpenStreetMap

We used the OpenStreetMap API (OSM, 2025) to obtain the spatial coordinates of firm addresses in Orbis. The API was queried through the tidygeocoder::geocode() function.
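As a rough illustration of this step, the sketch below deduplicates address strings so that each unique address is sent to the API only once, and then wraps the geocoding call. The column name address, both helper names, and the example rows are hypothetical and are not taken from the authors' scripts. Only tidygeocoder::geocode() with method = "osm" (which queries the Nominatim service) comes from the package itself. The wrapper is defined but not called, because it needs the tidygeocoder package and internet access.

```r
# Hypothetical helper: keep one row per distinct address string.
unique_addresses <- function(df) {
  df[!duplicated(df$address), "address", drop = FALSE]
}

# Hypothetical wrapper around tidygeocoder::geocode().
# Assumes `df` has a single-line address column named `address`.
geocode_addresses <- function(df) {
  tidygeocoder::geocode(
    df,
    address = address,   # column holding the address string
    method  = "osm",     # OpenStreetMap / Nominatim
    lat     = latitude,  # name of the output latitude column
    long    = longitude  # name of the output longitude column
  )
}

# Made-up example rows (not from Orbis).
firms <- data.frame(
  address = c("A 1, 68161 Mannheim", "A 1, 68161 Mannheim", "B 2, 10115 Berlin"),
  stringsAsFactors = FALSE
)
to_geocode <- unique_addresses(firms)
# coords <- geocode_addresses(to_geocode)  # uncomment to query the API
```

The public service is rate-limited to roughly one request per second, which is one likely reason the geocoding step runs for many hours; querying only unique addresses keeps the number of requests down.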

Dataset list

The following datasets are derived from the raw Orbis flatfiles and can be provided to a replicator in a controlled environment to reduce compute time.

Data file | Source | Notes | Provided
data/unique_addresses.parquet | OSM (2025) | | Yes
data/addresses_all.parquet | Moody’s (2024) | Confidential, can be provided in controlled environment | No
data/financials_cleaned.parquet | Moody’s (2024) | Confidential, can be provided in controlled environment | No

Summary

Approximate time needed to reproduce the analyses on a standard desktop machine, excluding the Orbis data transformation and the geocoding: 35 minutes.

Parts of the code involve transforming large text files and downloading geolocations via an API. If these steps are to be performed, the data transformation will require more memory than standard desktop machines offer. In addition, communicating with the API to georeference the addresses increases compute time by about 13 hours.

Description of programs/code

  • The folder Data contains the data.
    • The subfolder AGS contains the municipality codes.
    • The subfolder geo_daten contains the shape files.
    • The subfolder Hebesätze contains the municipal scaling factors.
  • The folder Scripts contains the R scripts and Stata dofiles.
  • The folder Figures contains all Figures produced by the code.
  • The folder Tables contains all Tables produced by the code.
  • Because many of the figures and tables in the paper are generated programmatically, the mapping from program output files to the figure and table names used in the paper is stored in the folder Results.
  • The renv folder contains the R script activate.R, which the renv package uses to activate its dependency management system.
  • The program Scripts/00_Master.R sets important parameters, installs and loads the necessary dependencies and runs all scripts and dofiles in the correct order.
  • The program Scripts/01_Orbis_Financials.R prepares the flatfile of the Orbis Financial data.
  • The program Scripts/02_Orbis_Addresses.R prepares the flatfile of the Orbis Address data.
  • The program Scripts/03_Survey_Data_Prep.R prepares survey data and creates Figures A1, A10, and A11.
  • The program Scripts/04_Preparation.do conducts the necessary steps to prepare the data for the computation of survey weights.
  • The program Scripts/05_Raking.do implements the raking algorithm.
  • The program Scripts/06_Trimming.R trims the survey weights.
  • The program Scripts/07_Merge_Survey_Weights.do adds the survey weights to the survey data.
  • The program Scripts/08_Descriptives.R creates Tables 2 and 3, and Figures A12 and A13.
  • The program Scripts/09_Main_Analysis.R creates Tables 4, A4, and A5, and Figures 1, 2, 3, 4, 5, A14, A15, A16, A17, and A18.
  • The program Scripts/10_Robustness_fmlogit.do creates Table A6.
  • The program Scripts/11_Levy_Rates.R prepares the data on municipal scaling factors.
  • The program Scripts/12_Municipality_Links.R prepares the data on municipality and zip codes.
  • The program Scripts/13_Geocode_Address_Data.R geolocates the Orbis address data.
  • The program Scripts/14_Municipal_Level_Data.R combines the municipal scaling factors with the municipality codes and the zip codes.
  • The program Scripts/15_Merge_Data.R combines the financial information with the address data and the municipal-level data.
  • The program Scripts/16_Firm-level_Associations.R creates Tables 6, 7, 8, 9, and 10.
  • The program Scripts/17_Proxy_Test_Covid.R creates Table 11.
  • The program Scripts/18_Comparison_Sample_Orbis.do creates Tables A2 and A3.
  • The program Scripts/19_Comparison_Sample_PopulationGermany.do creates Table A1.
  • The program Scripts/20_Collect_Figs_Tabs.R copies the figures and tables to the Results directory and assigns them the names used in the paper.
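
The weighting steps (raking and trimming) can be pictured with a minimal base-R sketch. This is a generic illustration of raking (iterative proportional fitting) followed by weight trimming, not the authors' Stata or R code: the function names, the toy data, the target margins, and the trimming cap are all hypothetical.

```r
# Raking: rescale weights until weighted category shares match each target margin.
# `margins` is a named list; each element is a named vector of population shares.
rake_weights <- function(data, margins, max_iter = 100, tol = 1e-8) {
  w <- rep(1, nrow(data))
  for (iter in seq_len(max_iter)) {
    max_gap <- 0
    for (v in names(margins)) {
      target  <- margins[[v]]
      groups  <- as.character(data[[v]])
      current <- tapply(w, groups, sum) / sum(w)   # weighted sample shares
      current <- current[names(target)]
      max_gap <- max(max_gap, abs(current - target))
      w <- w * (target / current)[groups]          # adjust towards the target
    }
    if (max_gap < tol) break                       # all margins already matched
  }
  unname(w / mean(w))                              # normalise to mean one
}

# Trimming: cap weights at `cap` times the mean and renormalise until stable.
trim_weights <- function(w, cap = 5, max_iter = 100, tol = 1e-8) {
  w <- w / mean(w)
  for (iter in seq_len(max_iter)) {
    if (all(w <= cap + tol)) break
    w <- pmin(w, cap)
    w <- w / mean(w)
  }
  w
}

# Toy data and made-up population shares (not from the survey).
firms <- data.frame(
  size   = c("small", "small", "small", "large", "small", "large"),
  sector = c("manuf", "serv", "serv", "manuf", "manuf", "serv"),
  stringsAsFactors = FALSE
)
margins <- list(
  size   = c(small = 0.9, large = 0.1),
  sector = c(manuf = 0.3, serv = 0.7)
)
raked   <- rake_weights(firms, margins)
trimmed <- trim_weights(raked, cap = 5)
```

Trimming trades a small amount of bias for lower variance by limiting the influence of a few very large weights; the cap used here is arbitrary.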

Instructions to Replicators

  1. Ensure the correct R-version (4.3.1) is used.
  2. Ensure that RTools is installed.
  3. Open the computing environment via the .rproj file.
  4. Set up renv using renv::activate() to enable the library management system and renv::restore() to install the necessary dependencies.
  5. Ensure the location of the Stata executable is set correctly in the script Scripts/00_Master.R.
  6. If the derived data (see section Dataset list) are available, comment out the section “Prepare Orbis Flat Files” in Scripts/00_Master.R and the section “Geocode zip code + city” in Scripts/13_Geocode_Address_Data.R.
  7. Run Scripts/00_Master.R.
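
From the R console, the R-side steps above might look roughly like the following sketch. The helper r_version_matches() is a hypothetical convenience check and is not part of the package. The renv calls and the master script path are those named in the steps; they are left commented out here because they modify the local library and start the full run.

```r
# Hypothetical helper: TRUE if the running R version equals the required one.
r_version_matches <- function(required = "4.3.1", current = getRversion()) {
  package_version(as.character(current)) == package_version(required)
}

if (!r_version_matches()) {
  warning("This package was built with R 4.3.1; running R ",
          as.character(getRversion()), " may give different results.")
}

# Run from the project root (opened via the .rproj file):
# renv::activate()               # enable the project-specific library
# renv::restore()                # install the package versions recorded in the lockfile
# source("Scripts/00_Master.R")  # run all scripts and do-files in order
```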

List of tables and programs

Figure/Table # | Program | Output file (Program) | Output file (Mapping) | Note
Table 1 | n.a. (no data)
Table 2 | 08_Descriptives.R | Tables/descriptives.tex | Results/Tables/table2.tex | Requires confidential data
Table 3 | 08_Descriptives.R | Tables/share_table.tex | Results/Tables/table3.tex | Requires confidential data
Table 4 | 09_Main_Analysis.R | Tables/unweighted/no_controls/main_spec_levels.tex | Results/Tables/table4.tex | Requires confidential data
Table 5 | n.a. (no data)
Table 6 | 16_Firm-level_Associations.R | Tables/validation_rev.tex | Results/Tables/table6.tex | Requires confidential data
Table 7 | 16_Firm-level_Associations.R | Tables/validation_num_emp.tex | Results/Tables/table7.tex | Requires confidential data
Table 8 | 16_Firm-level_Associations.R | Tables/validation_lbt.tex | Results/Tables/table8.tex | Requires confidential data
Table 9 | 16_Firm-level_Associations.R | Tables/overview_tax_changes.tex | Results/Tables/table9.tex | Requires confidential data
Table 10 | 16_Firm-level_Associations.R | Tables/hypo_vs_real.tex | Results/Tables/table10.tex | Requires confidential data
Table 11 | 17_Proxy_Test_Covid.R | Tables/proxy_test_covid.tex | Results/Tables/table11.tex | Requires confidential data
Table A1 | 19_Comparison_Sample_PopulationGermany.do | Tables/comparison_xample_pop_<revenues,employees,sector>.tex | Results/Tables/tablea1.tex | Requires confidential data
Table A2 | 18_Comparison_Sample_Orbis.do | Tables/comparison_sample_orbis_fy2019.tex | Results/Tables/tablea2.tex | Requires confidential data
Table A3 | 18_Comparison_Sample_Orbis.do | Tables/comparison_link_nolink.tex | Results/Tables/tablea3.tex | Requires confidential data
Table A4 | 09_Main_Analysis.R | Tables/unweighted/controls/main_spec.tex | Results/Tables/tablea4.tex | Requires confidential data
Table A5 | 09_Main_Analysis.R | Tables/unweighted/controls/main_spec_controls.tex | Results/Tables/tablea5.tex | Requires confidential data
Table A6 | 10_Robustness_fmlogit.do | Tables/marginal_effects_fmlogit.csv | Results/Tables/tablea6.csv | Requires confidential data
Figure 1 | 09_Main_Analysis.R | Figures/unweighted/no_controls/main_pointrange_unweighted.pdf | Results/Figures/figure1.pdf | Requires confidential data
Figure 2a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_size_category_decrease.pdf | Results/Figures/figure2a.pdf | Requires confidential data
Figure 2b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_size_category_increase.pdf | Results/Figures/figure2b.pdf | Requires confidential data
Figure 2c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_size_category.pdf | Results/Figures/figure2c.pdf | Requires confidential data
Figure 3a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_sector_decrease.pdf | Results/Figures/figure3a.pdf | Requires confidential data
Figure 3b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_sector_increase.pdf | Results/Figures/figure3b.pdf | Requires confidential data
Figure 3c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_sector.pdf | Results/Figures/figure3c.pdf | Requires confidential data
Figure 4a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_legal_form_decrease.pdf | Results/Figures/figure4a.pdf | Requires confidential data
Figure 4b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_legal_form_increase.pdf | Results/Figures/figure4b.pdf | Requires confidential data
Figure 4c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_legal_form.pdf | Results/Figures/figure4c.pdf | Requires confidential data
Figure 5a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_profit_impact_decrease.pdf | Results/Figures/figure5a.pdf | Requires confidential data
Figure 5b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_profit_impact_increase.pdf | Results/Figures/figure5b.pdf | Requires confidential data
Figure 5c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_profit_impact.pdf | Results/Figures/figure5c.pdf | Requires confidential data
Figure A1 | 03_Survey_Data_Prep.R | Figures/progress_report.pdf | Results/Figures/figurea1.pdf | Requires confidential data
Figure A2 | n.a. (no data)
Figure A3 | n.a. (no data)
Figure A4 | n.a. (no data)
Figure A5 | n.a. (no data)
Figure A6 | n.a. (no data)
Figure A7 | n.a. (no data)
Figure A8 | n.a. (no data)
Figure A9 | n.a. (no data)
Figure A10 | 03_Survey_Data_Prep.R | Figures/missing_categories_decrease.pdf | Results/Figures/figurea10.pdf | Requires confidential data
Figure A11 | 03_Survey_Data_Prep.R | Figures/missing_categories_increase.pdf | Results/Figures/figurea11.pdf | Requires confidential data
Figure A12 | 08_Descriptives.R | Figures/balance_tests.pdf | Results/Figures/figurea12.pdf | Requires confidential data
Figure A13 | 08_Descriptives.R | Figures/reasons_investment.pdf | Results/Figures/figurea13.pdf | Requires confidential data
Figure A14 | 09_Main_Analysis.R | Figures/comparison_specifications.pdf | Results/Figures/figurea14.pdf | Requires confidential data
Figure A15 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Size Rev.pdf | Results/Figures/figurea15.pdf | Requires confidential data
Figure A16 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Sector.pdf | Results/Figures/figurea16.pdf | Requires confidential data
Figure A17 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Legal.pdf | Results/Figures/figurea17.pdf | Requires confidential data
Figure A18 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Impact On Net Income.pdf | Results/Figures/figurea18.pdf | Requires confidential data
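
The mapping in this table follows a simple pattern: each program output is copied into Results under the name it carries in the paper. A minimal sketch of such a copy step is shown below, assuming a hypothetical two-row mapping and a hypothetical helper collect_outputs(); neither is taken from Scripts/20_Collect_Figs_Tabs.R. The demonstration runs inside a temporary directory with empty placeholder files.

```r
# Hypothetical helper: copy each program output to its paper name under `root`.
collect_outputs <- function(mapping, root = ".") {
  from <- file.path(root, mapping$program_output)
  to   <- file.path(root, mapping$paper_name)
  for (d in unique(dirname(to))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }
  copied <- file.copy(from, to, overwrite = TRUE)
  if (!all(copied)) {
    warning("Not copied: ", paste(from[!copied], collapse = ", "))
  }
  invisible(copied)
}

# Two rows of the table above, used as an example mapping.
mapping <- data.frame(
  program_output = c("Tables/descriptives.tex", "Figures/balance_tests.pdf"),
  paper_name     = c("Results/Tables/table2.tex", "Results/Figures/figurea12.pdf"),
  stringsAsFactors = FALSE
)

# Demonstration in a temporary directory with empty placeholder files.
demo_root <- file.path(tempdir(), "replication_demo")
for (d in unique(dirname(mapping$program_output))) {
  dir.create(file.path(demo_root, d), recursive = TRUE, showWarnings = FALSE)
}
invisible(file.create(file.path(demo_root, mapping$program_output)))
copied <- collect_outputs(mapping, root = demo_root)
```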

References

Bundesamt für Kartographie und Geodäsie (BKG), 2022. Verwaltungsgebiete 1:250 000 Stand 01.01.2022 (VG250 01.01.). Available at: https://gdz.bkg.bund.de/index.php/default/open-data/verwaltungsgebiete-1-250-000-stand-01-01-vg250-01-01.html. Accessed 22 August 2024.

Statistisches Bundesamt (Destatis), 2022. Municipality directory (Gemeindeverzeichnis), Territorial status 01/01/1970-12/31/2022. Available at: https://www.destatis.de/EN/Themes/Countries-Regions/Regional-Statistics/_node.html. Accessed 30 September 2024.

Statistische Ämter des Bundes und der Länder (SÄBL), 2022. Hebesätze der Realsteuern in Deutschland. Editions 2003-2022. Available at: https://www.statistikportal.de/de/veroeffentlichungen/hebesaetze-der-realsteuern-deutschland. Accessed 27 September 2024.

German Business Panel (GBP), 2022. Wave 2: Cost Structure, Accounting Choices, Corona Support Programs, Tax Incidence, Organizational Trust During and Beyond the COVID-19 Crisis. Codebook available at: https://backend.gbpanel.org/app/uploads/2022/11/Codebook_Welle2_2022_01_17_v3_2.pdf. Request access at: https://www.gbpanel.org/page/data. Data version 09/23/2022.

Moody’s, 2024. Orbis database, flatfiles vintage June 2024. More information available at: https://www.moodys.com/web/en/us/capabilities/company-reference-data/orbis.html.

OpenStreetMap (OSM), 2025. Geolocations of German addresses. Obtained via API. Accessed 03/04/2025.