Readme Replication of “The Asymmetric Incidence of Business Taxes: Survey Evidence from German Firms”
by Richard Winter, Philipp Doerrenberg, Fabian Eble, Davud Rostam-Afschar, and Johannes Voget.
Overview
This replication package contains the code used to conduct the analysis in “The Asymmetric Incidence of Business Taxes: Survey Evidence from German Firms” using R and Stata. The code is partitioned into 21 scripts and do-files, which execute the full analysis and generate the data for five main figures, 10 appendix figures, 11 main tables, and six appendix tables. The full replication, including the transformation of the Orbis flatfiles and the geocoding, is expected to take about 17 hours.
Software Requirements
The analysis is conducted using R and Stata, with the following configurations:
- R: Version 4.3.1. RTools is required and must be installed prior to running the analysis. This project uses the renv package for dependency management; see the .lock file for the dependency configuration and the instructions to replicators below.
- Stata: Version 17.
Hardware Requirements
This analysis was performed on a machine with the following specifications:
- CPU: Intel Xeon Gold 6254 CPU @ 3.10GHz
- Memory: 1.00 TB
- Operating System: Windows Server 2019 Standard
Without transforming the Orbis flatfiles into parquet and executing the geo-coding script, the total runtime is about 35 minutes. Reading and writing the financial and address data takes several hours, depending on the hardware. Georeferencing the unique firm addresses takes about 13 hours.
Data Availability and Provenance Statements
Statement about Rights
We certify that the author(s) of the manuscript have legitimate access to and permission to use the data used in this manuscript.
Summary of Availability
Some data cannot be made publicly available.
The data source for this paper is a survey of German firms, the German Business Panel (GBP). The GBP data are subject to the General Data Protection Regulation (GDPR), which prevents us from making the data available online in case of publication. In addition, Moody’s Orbis database, which is also used in this project, is proprietary and cannot be made publicly available.
Nevertheless, we aim to be as transparent as possible and can provide on-site or controlled remote data access to the original data for researchers interested in replicating our findings. Access requires signing a data use agreement, but we make the process as convenient as possible.
Details on each Data Source
| Data.Name | Data.Files | Location | Provided | Citation |
|---|---|---|---|---|
| Municipal Scaling Factors 2003-2022 | Hebesätze Ausgabe YYYY.xlsx | Data/Hebesätze | TRUE | SÄBL (2022) |
| Verwaltungsgebiete 1:250 000 | Hebesätze Ausgabe YYYY.xlsx | Data/geo_daten | TRUE | BKG (2022) |
| Municipal Codes 1970-2022 | DDMMYYYY_Auszug_GV.xlsx | Data/AGS/ | TRUE | Destatis (2022) |
| GBP Wave 2 | Welle2_Safehouse_C01_only_2022-09-23.dta | Data/ | FALSE | GBP (2022) |
| Moody’s Orbis | All_addresses.txt; Industry-Global_financials_and_ratios-EUR.txt | Data/ | FALSE | Moody’s (2024) |
| OSM (2025) | addresses_ags.parquet | Data/ | TRUE | OSM (2025) |
Municipal Scaling Factors 2003-2022
The data on municipal scaling factors were downloaded from the website of the Federal and State Statistical Offices (SÄBL, 2022) and can be downloaded here https://www.statistikportal.de/de/veroeffentlichungen/hebesaetze-der-realsteuern-deutschland. A copy of the data is provided as part of this archive. The data are in the public domain.
Verwaltungsgebiete 1:250 000
The spatial data on municipalities were obtained from the Federal Agency for Cartography and Geodesy (BKG, 2022) and can be downloaded here https://gdz.bkg.bund.de/index.php/default/open-data/verwaltungsgebiete-1-250-000-stand-01-01-vg250-01-01.html. A copy of the data is provided as part of this archive. The data are in the public domain.
Municipal Codes 1970-2022
The data on municipal codes were downloaded from the website of the Federal Statistical Office (Destatis, 2022) and can be downloaded here https://www.destatis.de/EN/Themes/Countries-Regions/Regional-Statistics/_node.html. A copy of the data is provided as part of this archive. The data are in the public domain.
GBP Wave 2
The experimental data used in this paper were collected in the second wave of the German Business Panel (GBP). The codebook for this survey wave is available here https://backend.gbpanel.org/app/uploads/2022/11/Codebook_Welle2_2022_01_17_v3_2.pdf. Screenshots of the survey questions are provided in the Online Appendix of the paper. The data are confidential, but may be obtained through a data use agreement with the GBP. Researchers interested in access to the data may contact the GBP at gbpinfo@uni-mannheim.de; see also https://gbpanel.org/page/data for the user agreement and the declaration of commitment. It can take some time to negotiate data use agreements and gain access to the data. The authors will assist with any reasonable replication attempts for two years following publication.
Moody’s Orbis
The financial and address data used in this paper were sourced from Moody’s (2024) Orbis database. The data are confidential, but may be purchased from Moody’s. Researchers interested in access to the data may contact Moody’s via their website https://www.moodys.com/web/en/us/capabilities/company-reference-data/orbis.html. It can take some time to negotiate data use agreements and gain access to the data. The authors will assist with any reasonable replication attempts for two years following publication.
OpenStreetMap
We used the OpenStreetMap API (OSM, 2025) to obtain the spatial coordinates of firm addresses in Orbis. The API was contacted through the tidygeocoder::geocode() function.
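As an illustration of how such a geocoding call can look, the following is a minimal sketch and not the authors' script. The function name geocode_addresses, the column names zip and city, and the example row are hypothetical; only tidygeocoder::geocode() with the OSM method is taken from the description above.

```r
# Minimal sketch, assuming a data frame with hypothetical columns `zip` and `city`.
geocode_addresses <- function(addresses) {
  # Query each distinct zip code/city pair only once to limit API calls.
  unique_addresses <- unique(addresses[, c("zip", "city")])
  unique_addresses$country <- "Germany"

  # method = "osm" sends the queries to the OpenStreetMap Nominatim service;
  # the returned data gain the coordinate columns `lat` and `long`.
  tidygeocoder::geocode(
    unique_addresses,
    postalcode = zip,
    city = city,
    country = country,
    method = "osm"
  )
}

# Example input with a made-up row (not taken from Orbis).
example <- data.frame(zip = "68161", city = "Mannheim")
# geocode_addresses(example)  # requires internet access and the tidygeocoder package
```

Deduplicating before the call matters because each distinct address costs one API request, which is what makes the geocoding step slow for a large address file.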
Dataset list
The following datasets are derived from the raw Orbis flatfiles and can be provided to a replicator in a controlled environment to reduce compute time.
| Data file | Source | Notes | Provided |
|---|---|---|---|
| data/unique_addresses.parquet | OSM (2025) | | Yes |
| data/addresses_all.parquet | Moody’s (2024) | Confidential, can be provided in controlled environment | No |
| data/financials_cleaned.parquet | Moody’s (2024) | Confidential, can be provided in controlled environment | No |
Summary
Approximate time needed to reproduce the analyses on a standard desktop machine, excluding the transformation of the Orbis flatfiles and the geocoding: 35 minutes.
Parts of the code involve transforming large text files and downloading geolocations via an API. If these steps are to be performed, the data transformation will require more memory than standard desktop machines offer. In addition, the communication with the API for geo-referencing the addresses increases compute time by about 13 hours.
Description of programs/code
- The folder Data contains the data.
  - The subfolder AGS contains the municipality codes.
  - The subfolder geo_daten contains the shape files.
  - The subfolder Hebesätze contains the municipal scaling factors.
- The folder Scripts contains the R scripts and Stata do-files.
- The folder Figures contains all figures produced by the code.
- The folder Tables contains all tables produced by the code.
- As many of the figures and tables of the paper are generated programmatically, the mapping to the figure and table names used in the paper is stored in the folder Results.
- The renv folder contains the R script activate.R, which is used by the renv package to activate the library dependency management system.
- The program Scripts/00_Master.R sets important parameters, installs and loads the necessary dependencies, and runs all scripts and do-files in the correct order.
- The program Scripts/01_Orbis_Financials.R prepares the flatfile of the Orbis financial data.
- The program Scripts/02_Orbis_Addresses.R prepares the flatfile of the Orbis address data.
- The program Scripts/03_Survey_Data_Prep.R prepares the survey data and creates Figures A1, A10, and A11.
- The program Scripts/04_Preparation.do conducts the necessary steps to prepare the data for the computation of survey weights.
- The program Scripts/05_Raking.do implements the raking algorithm.
- The program Scripts/06_Trimming.R trims the survey weights.
- The program Scripts/07_Merge_Survey_Weights.do adds the survey weights to the survey data.
- The program Scripts/08_Descriptives.R creates Tables 2 and 3, and Figures A12 and A13.
- The program Scripts/09_Main_Analysis.R creates Tables 4, A4, and A5, and Figures 1, 2, 3, 4, 5, A14, A15, A16, A17, and A18.
- The program Scripts/10_Robustness_fmlogit.do creates Table A6.
- The program Scripts/11_Levy_Rates.R prepares the data on municipal scaling factors.
- The program Scripts/12_Municipality_Links.R prepares the data on municipality and zip codes.
- The program Scripts/13_Geocode_Address_Data.R geolocates the Orbis address data.
- The program Scripts/14_Municipal_Level_Data.R combines the municipal scaling factors with the municipality codes and the zip codes.
- The program Scripts/15_Merge_Data.R combines the financial information with the address data and the municipal-level data.
- The program Scripts/16_Firm-level_Associations.R creates Tables 6, 7, 8, 9, and 10.
- The program Scripts/17_Proxy_Test_Covid.R creates Table 11.
- The program Scripts/18_Comparison_Sample_Orbis.do creates Tables A2 and A3.
- The program Scripts/19_Comparison_Sample_PopulationGermany.do creates Table A1.
- The program Scripts/20_Collect_Figs_Tabs.R copies the figures and tables to the Results directory and assigns them their names in the paper.
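The final collection step can be pictured as a lookup from program output files to paper names. The following is a minimal sketch under that assumption, not the authors' script: the function name collect_outputs and the mapping data frame are hypothetical, and only two mappings from the list of tables and programs are shown.

```r
# Hypothetical sketch of the collection step; not the authors' actual script.
# `mapping` pairs each program output (source) with its paper name (target).
collect_outputs <- function(mapping, root = ".") {
  from <- file.path(root, mapping$source)
  to   <- file.path(root, mapping$target)

  # Create the target folders (e.g. Results/Tables) if they do not exist yet.
  for (d in unique(dirname(to))) {
    dir.create(d, recursive = TRUE, showWarnings = FALSE)
  }

  # Copy only outputs that exist and report the ones that are missing.
  found <- file.exists(from)
  if (any(!found)) {
    warning("Missing outputs: ", paste(mapping$source[!found], collapse = ", "))
  }
  copied <- file.copy(from[found], to[found], overwrite = TRUE)

  # Return the paths that were written, invisibly.
  invisible(to[found][copied])
}

# Two example rows taken from the list of tables and programs.
mapping <- data.frame(
  source = c("Tables/descriptives.tex", "Figures/balance_tests.pdf"),
  target = c("Results/Tables/table2.tex", "Results/Figures/figurea12.pdf")
)
```

Calling collect_outputs(mapping) from the project root would copy whichever of the two files exist. Warning about missing outputs rather than stopping is a design choice of this sketch, so that a partial run without the confidential data still collects what is available.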
Instructions to Replicators
- Ensure the correct R version (4.3.1) is used.
- Ensure that RTools is installed.
- Enter the computing environment via the .Rproj file.
- Set up renv using renv::activate() to enable the library management system and renv::restore() to install the necessary dependencies.
- Ensure the location of the Stata executable is set correctly in the script Scripts/00_Master.R.
- If the derived data (see section Dataset list) are available, comment out section “Prepare Orbis Flat Files” in Scripts/00_Master.R and section “Geocode zip code + city” in Scripts/13_Geocode_Address_Data.R.
- Run Scripts/00_Master.R.
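The R-side steps above can be sketched as follows. This is a minimal illustration that assumes the working directory is the project root and that the Stata path has already been set in Scripts/00_Master.R; the wrapper function run_replication is hypothetical and not part of the package.

```r
# Minimal sketch of the replication steps; `run_replication` is a hypothetical
# wrapper and assumes the working directory is the project root.
run_replication <- function(required_r = "4.3.1") {
  # Stop early if the running R version differs from the one used for the analysis.
  current_r <- as.character(getRversion())
  if (current_r != required_r) {
    stop("R ", required_r, " is required, but R ", current_r, " is running.")
  }

  renv::activate()               # enable the project-specific library
  renv::restore(prompt = FALSE)  # install the package versions from the lockfile

  # Runs all scripts and do-files in the correct order.
  source("Scripts/00_Master.R")
}

# run_replication()  # not run here: the full pipeline takes hours
```

In some environments renv::activate() may restart the R session; in that case it is simpler to issue the three calls one at a time from the console.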
List of tables and programs
| Figure/Table # | Program | Output file (Program) | Output file (Mapping) | Note |
|---|---|---|---|---|
| Table 1 | n.a. (no data) | |||
| Table 2 | 08_Descriptives.R | Tables/descriptives.tex | Results/Tables/table2.tex | Requires confidential data |
| Table 3 | 08_Descriptives.R | Tables/share_table.tex | Results/Tables/table3.tex | Requires confidential data |
| Table 4 | 09_Main_Analysis.R | Tables/unweighted/no_controls/main_spec_levels.tex | Results/Tables/table4.tex | Requires confidential data |
| Table 5 | n.a. (no data) | |||
| Table 6 | 16_Firm-level_Associations.R | Tables/validation_rev.tex | Results/Tables/table6.tex | Requires confidential data |
| Table 7 | 16_Firm-level_Associations.R | Tables/validation_num_emp.tex | Results/Tables/table7.tex | Requires confidential data |
| Table 8 | 16_Firm-level_Associations.R | Tables/validation_lbt.tex | Results/Tables/table8.tex | Requires confidential data |
| Table 9 | 16_Firm-level_Associations.R | Tables/overview_tax_changes.tex | Results/Tables/table9.tex | Requires confidential data |
| Table 10 | 16_Firm-level_Associations.R | Tables/hypo_vs_real.tex | Results/Tables/table10.tex | Requires confidential data |
| Table 11 | 17_Proxy_Test_Covid.R | Tables/proxy_test_covid.tex | Results/Tables/table11.tex | Requires confidential data |
| Table A1 | 19_Comparison_Sample_PopulationGermany.do | Tables/comparison_xample_pop_<revenues,employees,sector>.tex | Results/Tables/tablea1.tex | Requires confidential data |
| Table A2 | 18_Comparison_Sample_Orbis.do | Tables/comparison_sample_orbis_fy2019.tex | Results/Tables/tablea2.tex | Requires confidential data |
| Table A3 | 18_Comparison_Sample_Orbis.do | Tables/comparison_link_nolink.tex | Results/Tables/tablea3.tex | Requires confidential data |
| Table A4 | 09_Main_Analysis.R | Tables/unweighted/controls/main_spec.tex | Results/Tables/tablea4.tex | Requires confidential data |
| Table A5 | 09_Main_Analysis.R | Tables/unweighted/controls/main_spec_controls.tex | Results/Tables/tablea5.tex | Requires confidential data |
| Table A6 | 10_Robustness_fmlogit.do | Tables/marginal_effects_fmlogit.csv | Results/Tables/tablea6.csv | Requires confidential data |
| Figure 1 | 09_Main_Analysis.R | Figures/unweighted/no_controls/main_pointrange_unweighted.pdf | Results/Figures/figure1.pdf | Requires confidential data |
| Figure 2a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_size_category_decrease.pdf | Results/Figures/figure2a.pdf | Requires confidential data |
| Figure 2b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_size_category_increase.pdf | Results/Figures/figure2b.pdf | Requires confidential data |
| Figure 2c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_size_category.pdf | Results/Figures/figure2c.pdf | Requires confidential data |
| Figure 3a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_sector_decrease.pdf | Results/Figures/figure3a.pdf | Requires confidential data |
| Figure 3b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_sector_increase.pdf | Results/Figures/figure3b.pdf | Requires confidential data |
| Figure 3c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_sector.pdf | Results/Figures/figure3c.pdf | Requires confidential data |
| Figure 4a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_legal_form_decrease.pdf | Results/Figures/figure4a.pdf | Requires confidential data |
| Figure 4b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_legal_form_increase.pdf | Results/Figures/figure4b.pdf | Requires confidential data |
| Figure 4c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_legal_form.pdf | Results/Figures/figure4c.pdf | Requires confidential data |
| Figure 5a | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_profit_impact_decrease.pdf | Results/Figures/figure5a.pdf | Requires confidential data |
| Figure 5b | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_unweighted_profit_impact_increase.pdf | Results/Figures/figure5b.pdf | Requires confidential data |
| Figure 5c | 09_Main_Analysis.R | Figures/unweighted/no_controls/heterogeneity_margins_unweighted_profit_impact.pdf | Results/Figures/figure5c.pdf | Requires confidential data |
| Figure A1 | 03_Survey_Data_Prep.R | Figures/progress_report.pdf | Results/Figures/figurea1.pdf | Requires confidential data |
| Figure A2 | n.a. (no data) | |||
| Figure A3 | n.a. (no data) | |||
| Figure A4 | n.a. (no data) | |||
| Figure A5 | n.a. (no data) | |||
| Figure A6 | n.a. (no data) | |||
| Figure A7 | n.a. (no data) | |||
| Figure A8 | n.a. (no data) | |||
| Figure A9 | n.a. (no data) | |||
| Figure A10 | 03_Survey_Data_Prep.R | Figures/missing_categories_decrease.pdf | Results/Figures/figurea10.pdf | Requires confidential data |
| Figure A11 | 03_Survey_Data_Prep.R | Figures/missing_categories_increase.pdf | Results/Figures/figurea11.pdf | Requires confidential data |
| Figure A12 | 08_Descriptives.R | Figures/balance_tests.pdf | Results/Figures/figurea12.pdf | Requires confidential data |
| Figure A13 | 08_Descriptives.R | Figures/reasons_investment.pdf | Results/Figures/figurea13.pdf | Requires confidential data |
| Figure A14 | 09_Main_Analysis.R | Figures/comparison_specifications.pdf | Results/Figures/figurea14.pdf | Requires confidential data |
| Figure A15 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Size Rev.pdf | Results/Figures/figurea15.pdf | Requires confidential data |
| Figure A16 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Sector.pdf | Results/Figures/figurea16.pdf | Requires confidential data |
| Figure A17 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Legal.pdf | Results/Figures/figurea17.pdf | Requires confidential data |
| Figure A18 | 09_Main_Analysis.R | Figures/unweighted/controls/heterogeneity_margins_unweighted_Impact On Net Income.pdf | Results/Figures/figurea18.pdf | Requires confidential data |
References
Bundesamt für Kartographie und Geodäsie (BKG), 2022. Verwaltungsgebiete 1:250 000 Stand 01.01.2022 (VG250 01.01.). Available at: https://gdz.bkg.bund.de/index.php/default/open-data/verwaltungsgebiete-1-250-000-stand-01-01-vg250-01-01.html. Accessed 22 August 2024.
Statistisches Bundesamt (Destatis), 2022. Municipality directory (Gemeindeverzeichnis), Territorial status 01/01/1970-12/31/2022. Available at: https://www.destatis.de/EN/Themes/Countries-Regions/Regional-Statistics/_node.html. Accessed 30 September 2024.
Statistische Ämter des Bundes und der Länder (SÄBL), 2022. Hebesätze der Realsteuern in Deutschland. Editions 2003-2022. Available at: https://www.statistikportal.de/de/veroeffentlichungen/hebesaetze-der-realsteuern-deutschland. Accessed 27 September 2024.
German Business Panel (GBP), 2022. Wave 2: Cost Structure, Accounting Choices, Corona Support Programs, Tax Incidence, Organizational Trust During and Beyond the COVID-19 Crisis. Codebook available at: https://backend.gbpanel.org/app/uploads/2022/11/Codebook_Welle2_2022_01_17_v3_2.pdf. Request access at: https://www.gbpanel.org/page/data. Data version 09/23/2022.
Moody’s, 2024. Orbis database, flatfiles vintage June 2024. More information available at: https://www.moodys.com/web/en/us/capabilities/company-reference-data/orbis.html.
OpenStreetMap (OSM), 2025. Geolocations of German addresses. Obtained via API. Accessed 03/04/2025.