This is an open-access article distributed under the terms of the Creative Commons Attribution License (CC BY). The use, distribution or reproduction in other forums is permitted, provided the original author(s) and the copyright owner(s) are credited and that the original publication in this journal is cited, in accordance with accepted academic practice. No use, distribution or reproduction is permitted which does not comply with these terms.
Inferring causality from observational studies is difficult due to inherent differences in patient characteristics between treated and untreated groups. The randomised controlled trial is the gold standard study design as the random allocation of individuals to treatment and control arms should result in an equal distribution of known and unknown prognostic factors at baseline. However, it is not always ethically or practically possible to perform such a study in the field of transplantation. Propensity score and instrumental variable techniques have theoretical advantages over conventional multivariable regression methods and are increasingly being used within observational studies to reduce the risk of confounding bias. An understanding of these techniques is required to critically appraise the literature. We provide an overview of propensity score and instrumental variable techniques for transplant clinicians, describing their principles, assumptions, strengths, and weaknesses. We discuss the different patient populations included in analyses and how to interpret results. We illustrate these points using data from the Access to Transplant and Transplant Outcome Measures study examining the association between pre-transplant cardiac screening in kidney transplant recipients and post-transplant cardiac events.
Randomised controlled trials (RCTs) are the gold standard study design for determining causal associations between clinical interventions and outcomes (
In some situations RCTs are inappropriate or impractical, for example if there are ethical concerns or excessive costs (
When RCTs are impractical, observational data can inform practice. However, as the exposure is not randomly assigned, differences in case-mix can occur between exposed and unexposed groups. This generates confounding bias: a situation where the treatment and outcome have a common cause, resulting in a lack of exchangeability between treated and untreated groups. This can result in the association between treatment and outcome differing from the true effect measure (
In kidney transplantation, there is no contemporary RCT examining the utility of screening for asymptomatic coronary artery disease prior to transplant listing. Screening is frequently performed but there is variation in practice between centres, likely influenced by local opinion (
Given these challenges, we use observational data from the Access to Transplant and Transplant Outcome Measures (ATTOM) study (
The propensity score (PS) is an individual's predicted probability of receiving a treatment, obtained by collapsing their measured confounders into a single value that ranges from 0 (no probability of receiving the treatment of interest) to 1 (certainty of receiving it) (
The PS is typically estimated using a logistic regression model specifying the exposure as the dependent variable and measured confounders as independent variables. Measured confounders are those known at baseline that are predictive of both treatment and outcome. Variables that are predictive of treatment but not outcome should not be included as this may increase the variance of the estimated exposure effect (
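The estimation step can be illustrated with a minimal sketch in plain Python. It assumes simulated data with two hypothetical confounders (standardised age and diabetes) and fits the logistic model by Newton-Raphson rather than with a statistics package; the function names `fit_logistic` and `propensity_scores` are illustrative only. In practice the model would include all measured confounders, such as those listed for the worked example.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_logistic(X, y, n_iter=25):
    """Logistic regression of treatment (0/1) on confounders by Newton-Raphson.
    Returns coefficients with the intercept first."""
    rows = [[1.0] + list(x) for x in X]  # add intercept column
    k = len(rows[0])
    beta = [0.0] * k
    for _ in range(n_iter):
        grad = [0.0] * k
        info = [[0.0] * k for _ in range(k)]  # Fisher information X'WX
        for r, yi in zip(rows, y):
            mu = 1.0 / (1.0 + math.exp(-sum(b * v for b, v in zip(beta, r))))
            w = mu * (1.0 - mu)
            for j in range(k):
                grad[j] += (yi - mu) * r[j]
                for c in range(k):
                    info[j][c] += w * r[j] * r[c]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each individual."""
    return [1.0 / (1.0 + math.exp(-(beta[0] + sum(b * v for b, v in zip(beta[1:], x)))))
            for x in X]


# Simulated cohort with two hypothetical confounders
random.seed(1)
X, screened = [], []
for _ in range(2000):
    age_z = random.gauss(0, 1)                  # standardised age
    diabetes = 1 if random.random() < 0.15 else 0
    lp = -0.2 + 0.8 * age_z + 1.0 * diabetes    # assumed true screening model
    X.append([age_z, diabetes])
    screened.append(1 if random.random() < 1 / (1 + math.exp(-lp)) else 0)

beta = fit_logistic(X, screened)
ps = propensity_scores(X, beta)
print([round(b, 2) for b in beta])
```

Because a logistic model with an intercept reproduces the observed treatment proportion, the mean of the fitted scores equals the proportion screened; each score describes propensity, not whether screening actually occurred.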
Once the model has been created, each individual’s PS is generated based on their measured confounders. The score reflects their propensity for receiving the treatment, not whether this actually happened. Two balanced groups with a similar distribution of PS can then be created using matching or weighting techniques. Key features of PS analyses are shown in
Comparison of propensity score and instrumental variable techniques.
Feature  Propensity score matching  Propensity score weighting  Instrumental variable
Assumptions  Positivity; exchangeability/ignorability; consistency  Positivity; exchangeability/ignorability; consistency  Relevance assumption; exclusion restriction; independence assumption; monotonicity or homogeneity
Unmeasured confounding  Not eliminated  Not eliminated  Eliminated/reduced
Study application  Smaller studies or low event rate  Smaller studies or low event rate  Large multicentre studies
Analysis and interpretation  Patient-level  Patient-level  Instrument-level, e.g., centre, physician
Causal effect  Average treatment effect on the treated  Average treatment effect  Average treatment effect or local average treatment effect depending on assumptions
Advantages  Simple to analyse and interpret  Retains data from all patients  Does not require modelling of confounders; minimises unmeasured confounding
Disadvantages  Exclusion of unmatched patients means results may not be applicable to whole study population  Results can be unstable if extreme weights are present  Analysis assumptions difficult to test; challenging to find a suitable instrument
In propensity score matching, treated and untreated individuals are “paired” based on their PS (
Included subjects in propensity score analyses using matching and weighting techniques.
The matching technique should create two groups with an equal distribution of measured covariates (
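Balance of measured covariates after matching is commonly checked with the standardised mean difference (SMD), an absolute value below 0.1 often being taken, somewhat arbitrarily, to indicate adequate balance. The text does not name a specific diagnostic, so this is a minimal sketch under that assumption, using hypothetical ages.

```python
import statistics


def standardised_mean_difference(treated, control):
    """Difference in means divided by the pooled SD (square root of the mean of the two variances)."""
    pooled_sd = ((statistics.variance(treated) + statistics.variance(control)) / 2) ** 0.5
    return (statistics.mean(treated) - statistics.mean(control)) / pooled_sd


# Hypothetical ages (years) in a screened and an unscreened group
screened = [52, 61, 47, 58, 66, 55, 49, 63]
unscreened = [45, 50, 39, 57, 48, 52, 41, 46]
smd = standardised_mean_difference(screened, unscreened)
print(f"SMD for age: {smd:.2f}")  # |SMD| >= 0.1 would usually be read as imbalance
```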
Inverse probability weighting (IPW, also known as propensity score weighting) creates a pseudo-population informed by all patients with a balanced distribution of measured covariates between groups (
Each individual is assigned a “weight” depending on their measured covariates and the treatment they receive. For individuals who receive treatment, their weight is 1/PS, whilst individuals who do not receive treatment have a weight of 1/(1 − PS). This means individuals receiving an “unexpected” treatment contribute larger weights to the analysis than individuals receiving their “expected” treatment (
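A minimal sketch of the weight calculation just described, using four hypothetical patients: the two who received the "unexpected" treatment (treated with a PS of 0.2, or untreated with a PS of 0.8) receive a weight of 5, against 1.25 for the two who received the expected treatment.

```python
def ipw_weights(treated, ps):
    """Inverse probability of treatment weights: 1/PS if treated, 1/(1 - PS) if not."""
    return [1.0 / p if t == 1 else 1.0 / (1.0 - p) for t, p in zip(treated, ps)]


# Hypothetical patients: treatment received (1/0) and propensity score
treated = [1, 1, 0, 0]
ps = [0.8, 0.2, 0.8, 0.2]
w = ipw_weights(treated, ps)
print(w)
```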
PS techniques have several advantages over conventional multivariable regression models. First, conventional multivariable Cox models are commonly held to require around 10 events per covariate to produce a stable estimate; combining covariates into a single PS is therefore useful when the population is small, the event rate is low, or the number of covariates is large (
Second, in conventional regression models the treated and untreated groups can differ systematically. Estimating the effect of treatment on a patient who would never have been considered for treatment in real life can therefore be unreliable, as the estimate relies on model extrapolation beyond the support of the data. PS matched analyses include only those patients who could feasibly exist in either the “treated” or “untreated” group. Whilst PS matched analyses can therefore provide results that better reflect real-world practice, identifying the population to whom the results apply can be challenging, especially where treatment practice varies between centres.
Third, PS models highlight the limits within which results should be interpreted. If a large proportion of individuals are unmatched in PS matched analyses, or some patients have large weights in IPW analyses, this signifies poor overlap in covariate distributions between treated and untreated groups: for some individuals, the probability of receiving one of the two treatments is low. As traditional multivariable models extrapolate results to individuals in underrepresented covariate strata, this could lead to bias in effect estimates. PS methods can alert researchers to these issues and highlight the limits within which comparisons of treatment options can be made.
PS assumptions (exchangeability, positivity, and consistency) are described in
In PS matching, unmatched individuals are “lost,” reducing the study size. Individuals with the highest and lowest PS (the “always treated” and “never treated”) are less likely to be matched and are underrepresented in the regression models. Whilst there is no “required” proportion of patients that must be matched, the causal effect is only applicable to matched patients, not the whole study population.
In IPW, data from all participants are retained. However, if individuals contribute large weights to analyses, results may be unstable. There is no consensus on what a “large” weight is, and weight stabilisation is often used to minimise this risk. Some advocate truncating weights to a maximum of 10 for more precise estimates, (
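Truncation at a fixed maximum, as described above, simply caps each weight at that value. A minimal sketch with hypothetical weights and the maximum of 10 mentioned in the text:

```python
def truncate_weights(weights, max_weight=10.0):
    """Cap each weight at max_weight; weights below the cap are unchanged."""
    return [min(w, max_weight) for w in weights]


raw = [1.2, 3.5, 14.0, 40.0, 2.1]  # hypothetical IPW weights
capped = truncate_weights(raw)
print(capped)
```

Capping trades a little bias for lower variance, which is why the choice of maximum is a judgement rather than a fixed rule.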
For interested readers, more detailed information on propensity scores can be found at the following references (
Instrumental variable (IV) analyses were developed for economic studies and subsequently adopted in the medical setting. They aim to minimise confounding by indication by examining individuals according to an “instrumental variable”: a variable that influences treatment and shares no common cause (confounder) with the outcome. The IV can therefore be exploited as a form of natural randomisation (
To perform IV analyses, the IV should satisfy the following key assumptions (
(1) It must be strongly associated with the exposure (relevance assumption).
(2) It must only affect outcome through its association with the exposure (exclusion restriction).
(3) There must be no unmeasured confounders of the relationship between the instrumental variable and the outcome (independence assumption).
(4) A fourth assumption is either that of effect homogeneity or effect monotonicity. Effect homogeneity states that the treatment should have a constant effect on the outcome across all individuals. In effect monotonicity, no patient should receive the opposite of the expected treatment at all levels of the instrument, i.e., both at the instrument level to which they were assigned and at the level(s) to which they were not assigned (so-called “defier” patients;
A potential IV is initially identified using empirical evidence. The analysis then involves a two-stage regression model. As the technique originated in economics, this traditionally comprised two sequential linear regressions, the two-stage least squares procedure (
In the second stage, a regression model examines the outcome of interest as the dependent variable, and the “predicted treatment” generated in the first stage is included as an independent variable instead of the received treatment (“predictor substitution” method). This regression can be univariable or multivariable. A multivariable model enables adjustment for potential confounding of the instrument-outcome relationship. Whilst instrument-outcome confounding represents a violation of the independence assumption, conditioning on pre-exposure covariates in the first and second stages of the IV model can reduce its impact and also increase the plausibility of the homogeneity assumption. (
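The two stages can be illustrated with a small simulation. This is a minimal sketch under stated assumptions: a hypothetical binary instrument, an unmeasured confounder that drives both treatment and outcome, an assumed true effect of 0.5, and a linear outcome model (two-stage least squares) rather than the Cox model used in the worked example.

```python
import random


def ols(x, y):
    """Simple linear regression of y on x; returns (slope, intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


random.seed(42)
TRUE_EFFECT = 0.5  # assumed causal effect of treatment on the outcome
Z, D, Y = [], [], []
for _ in range(20000):
    z = 1 if random.random() < 0.5 else 0  # hypothetical binary instrument
    u = random.gauss(0, 1)                 # unmeasured confounder
    # Treatment depends on the instrument and on the unmeasured confounder
    d = 1 if random.random() < 0.1 + 0.5 * z + 0.3 * (u > 0) else 0
    y = TRUE_EFFECT * d + u + random.gauss(0, 1)  # outcome also depends on u
    Z.append(z)
    D.append(d)
    Y.append(y)

# Naive regression of outcome on received treatment: confounded by u
naive, _ = ols(D, Y)

# Stage 1: regress treatment on the instrument and keep the predicted treatment
b1, a1 = ols(Z, D)
D_hat = [a1 + b1 * z for z in Z]

# Stage 2 (predictor substitution): regress outcome on predicted treatment
iv_estimate, _ = ols(D_hat, Y)
print(f"naive: {naive:.2f}  IV: {iv_estimate:.2f}  (true: {TRUE_EFFECT})")
```

The naive estimate absorbs the effect of the unmeasured confounder, whereas the two-stage estimate recovers the assumed effect. Standard errors from a naively fitted second stage would need correction, for example by bootstrapping both stages together.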
As the analysis is performed, potential violations of IV assumptions should be assessed. Results must be interpreted in the context of how likely it is for the assumptions to be met.
(1) Relevance assumption: this is examined using the F statistic and partial R-squared values. An F statistic under 10 is typically used to identify a weak instrument (
(2) Exclusion restriction: there is no statistical test to definitively confirm that the IV does not influence the outcome other than through treatment allocation. (
(3) Independence assumption: this cannot be tested and is usually argued on the basis of empirical evidence.
(4) Effect monotonicity or homogeneity: these assumptions may be implausible and are complex to define and assess. In effect monotonicity, identifying which compliance group (
Finding a suitable IV can be challenging and large multicentre studies are often required. Ensuring assumptions of the IV are met may not be possible (
When analysing causal inference studies, it is necessary to consider to whom the causal effect applies. Terms used include the “average treatment effect” (ATE), “average treatment effect on the treated” (ATT) and “local average treatment effect” (LATE).
ATE refers to the effect of treatment on the whole population. This is typically estimated by IPW techniques, which include all study participants. ATT refers to the effect of treatment in those individuals who actually received it, and is typically estimated by PS matched analyses. In IV analyses, the causal effect depends on whether effect homogeneity or monotonicity holds. If homogeneity is assumed, the estimate refers to the ATE. If monotonicity is assumed, the estimate refers to the LATE. This reflects the effect of treatment on the subgroup of “complier” patients who receive the expected treatment given their instrument (
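In potential-outcome notation (our notation, not the article's: Y(1) and Y(0) are an individual's outcomes with and without treatment, D the treatment received and Z a binary instrument), the three estimands can be written as below. The final ratio, the Wald estimator, identifies the LATE only under the relevance, exclusion restriction, independence and monotonicity assumptions.

```latex
\begin{aligned}
\mathrm{ATE}  &= E\left[Y(1) - Y(0)\right] \\
\mathrm{ATT}  &= E\left[Y(1) - Y(0) \mid D = 1\right] \\
\mathrm{LATE} &= E\left[Y(1) - Y(0) \mid \text{complier}\right]
              = \frac{E[Y \mid Z = 1] - E[Y \mid Z = 0]}{E[D \mid Z = 1] - E[D \mid Z = 0]}
\end{aligned}
```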
As the ATE, ATT and LATE refer to different groups of patients, their effect sizes can differ. Differences can aid the interpretation of study findings by providing insights into the effect of treatment on different groups of patients, and do not necessarily signify failure of a technique.
In each of the above analyses, the final regression model that generates the causal effect can either be “marginal” or “conditional.” Models which contain only the treatment (or predicted treatment in the IV analysis) and outcome generate marginal treatment effects. Although the characteristics of treated and untreated individuals should be similar through the PS matching, IPW or IV techniques, generating truly “exchangeable” groups of treated and untreated patients remains difficult. Models which condition on (and hence adjust for) confounders in the final regression may reduce such residual imbalances and generate conditional treatment effects.
The effect sizes from marginal and conditional regression models differ and cannot be directly compared (
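Marginal and conditional estimates can differ even without any confounding, because ratio measures such as the odds ratio and hazard ratio are non-collapsible. A minimal sketch, using odds ratios rather than hazard ratios because the arithmetic is simpler, with two hypothetical equal-sized strata in which treatment is randomised and the conditional odds ratio is 3 in both:

```python
def odds(p):
    return p / (1 - p)


def risk_from_odds(o):
    return o / (1 + o)


conditional_or = 3.0  # assumed treatment OR within every stratum
untreated_risk = {"low risk": 0.1, "high risk": 0.5}  # hypothetical equal-sized strata

treated_risk = {s: risk_from_odds(odds(p) * conditional_or) for s, p in untreated_risk.items()}

# Marginal risks average over the equal-sized strata
p0 = sum(untreated_risk.values()) / len(untreated_risk)
p1 = sum(treated_risk.values()) / len(treated_risk)
marginal_or = odds(p1) / odds(p0)
print(f"conditional OR {conditional_or:.2f}, marginal OR {marginal_or:.2f}")
```

The marginal odds ratio (7/3, about 2.33) is closer to 1 than the conditional odds ratio of 3, although neither is biased: they answer different questions.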
To demonstrate the above techniques, a worked example is provided using data from the ATTOM study. ATTOM was designed to examine factors associated with transplantation in the UK, recruiting patients between 2011 and 2013 (
Design of a potential randomised controlled trial to investigate the utility of cardiac screening prior to kidney transplant listing, and the design of the worked example, highlighting areas of residual bias.
Component  Ideal randomised controlled trial  Worked example and residual bias
Eligibility  Individuals with chronic kidney disease being worked up for kidney transplantation  Patients who were recruited to the ATTOM study and received a kidney transplant. Whilst these patients are representative of the UK kidney transplant population, information was not available on all patients who commenced transplant workup and it is not known if results are applicable to this whole population. Selection bias and survivor bias may be present
Treatment strategies  Receive a cardiac screening test (and any subsequent recommended cardiac intervention) vs. not receive a cardiac screening test prior to kidney transplant listing  Receiving a cardiac screening test (and any subsequent recommended cardiac intervention) as per local standard practice vs. not receiving a screening test prior to kidney transplant listing
Treatment assignment  Eligible individuals would be randomly assigned to one of the two treatment strategies and would be aware of the treatment to which they were assigned  Patients were selected for screening based on predetermined local protocols or clinical judgement of the medical team. As treatment assignment was not randomised and there were no strict eligibility criteria, inferences are limited to those patients who might be considered for screening, rather than patients who would never or always be screened
Follow-up  Follow-up would start at the time of assignment to a treatment strategy (i.e., when randomised to receive cardiac screening or not) and would continue for a set period of time over which some patients would be activated on the waitlist and receive a transplant. This is likely to require long follow-up, for example 3–5 years  Follow-up started at the point of kidney transplantation and lasted up to 5 years. This start point was chosen as the date transplant workup commenced was unknown, and data were not available on patients who commenced workup but were not waitlisted. This risks survivor bias as all patients survived until the point of transplantation. Further, the misalignment of treatment assignment and follow-up start means there could be fundamental differences between patients who are transplanted after screening vs. those transplanted without screening. As screening may not have a uniform effect on individuals unobserved in this study, there is a risk of selection bias
Primary end point  Post-transplant MACE. The exact time frame post-transplant that should be examined could be debated, but given screening aims to reduce short-term morbidity and mortality a time frame of around 1 year could be considered  Post-transplant MACE at 90 days, 1 year and 5 years post-transplant. Patients were censored at non-cardiac death; therefore estimates refer to the direct effect of screening on MACE and not the total effect of screening on MACE through all causal pathways, including through any effect on non-cardiac death
Secondary end points  Activation on transplant waitlist; time to waitlisting; time to transplantation; waitlist MACE; patient-reported outcomes  Not captured
Causal contrast  Intention-to-treat effect: effect of being randomised to screening or no screening, even if off-protocol screening tests were performed; per-protocol effect: effect of adhering to the treatment strategy over follow-up  Per-protocol effect: effect of adhering to the treatment strategies over follow-up
Statistical analysis  Intention-to-treat; consideration would need to be made as to how to analyse patients not transplanted over follow-up  Per-protocol analysis
We wished to examine whether cardiac screening reduced post-transplant major adverse cardiac events (MACE). MACE was defined as unstable angina, myocardial infarction, coronary revascularisation, or cardiac death. Data on non-fatal cardiac events were obtained through linkage of the ATTOM dataset with routinely collected hospital data (
Over the study period, 2572 individuals received a transplant. The mean age was 50 years (SD 13) and 61% were male. Ethnicity was White in 76%, Black in 14% and Asian in 9%. There was a history of diabetes in 13% and ischaemic heart disease in 7%. Overall, 51% underwent screening for asymptomatic coronary artery disease with a stress test (exercise tolerance test, stress echocardiogram, myocardial perfusion scan), CT coronary angiogram or invasive coronary angiogram before transplant listing. The proportion of individuals screened across the 18 transplant centres in England ranged from 5%–100% (
Funnel plot demonstrating the number of individuals screened by transplant centre.
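Funnel plots of this kind are commonly drawn by plotting each centre's proportion against its number of patients, with control limits around the overall proportion from a normal approximation to the binomial. The sketch below assumes that construction (the text does not state how the limits were derived), uses the overall 51% screening rate reported above, and flags hypothetical centres lying outside approximate 99.8% limits.

```python
import math


def funnel_limits(p_overall, n, z):
    """Normal-approximation control limits for a proportion based on n patients, clipped to [0, 1]."""
    se = math.sqrt(p_overall * (1 - p_overall) / n)
    return max(0.0, p_overall - z * se), min(1.0, p_overall + z * se)


P_OVERALL = 0.51  # overall proportion screened, from the text
Z_998 = 3.09      # approximately 99.8% limits (1.96 would give 95% limits)

# Hypothetical centres: (number screened, number transplanted)
centres = {"A": (6, 120), "B": (70, 140), "C": (95, 100)}

outside = {}
for name, (n_screened, n_total) in centres.items():
    lower, upper = funnel_limits(P_OVERALL, n_total, Z_998)
    outside[name] = not lower <= n_screened / n_total <= upper
print(outside)
```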
Median follow-up was 5.0 years (IQR 3.8–5.5), over which time 211 individuals experienced MACE. Median time to MACE was 2.3 years (IQR 1.0–3.7; range 1 day–6.6 years). Over follow-up, 227 patients died (8.9%); 40 had a cardiac death that was counted as MACE.
To examine whether screening has a causal effect on MACE at 90 days, 1 year or 5 years post-transplant, Cox regression models were fitted using propensity score matching, inverse probability weighting, and instrumental variable analysis techniques.
Non-cardiac death is a competing risk for post-transplant MACE, as patients dying of non-cardiac causes cannot subsequently develop MACE. The analyses presented in the following section determine the “direct” effect of screening on MACE as patients are censored at non-cardiac death, as opposed to the “total” effect of screening on MACE which would include causal pathways involving non-cardiac death (
Interpreting direct treatment effects is challenging as they assume an unrealistic situation where competing events do not occur. Further, direct treatment effects have additional causal assumptions, such as no unaccounted confounding of the relationship between the competing event (non-cardiac death) and the outcome of interest (MACE). If there is likely to be a confounding relationship between the censoring event and the outcome of interest, techniques such as inverse probability of censoring weighting may be required to derive valid estimates of the direct treatment effect; such analyses require sufficient data for the probability of censoring (i.e., non-cardiac death) to be modelled accurately over time (
As the purpose of this paper is to demonstrate the application of different causal inference techniques, for pragmatic reasons the following analyses represent the direct effect of screening on MACE. Information on competing risk analyses, which can address this issue by generating total treatment effects, can be found in the following references (
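A minimal sketch of how each recipient's outcome might be coded for this direct-effect analysis, assuming a hypothetical record structure (the class and field names are illustrative and not taken from the ATTOM dataset): non-fatal MACE or cardiac death counts as an event, whereas non-cardiac death, survival to the end of follow-up, and reaching the analysis horizon are all treated as censoring.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Recipient:
    """Hypothetical follow-up record for one transplant recipient."""
    followup_years: float        # time from transplant to death or end of follow-up
    mace_years: Optional[float]  # time to first non-fatal MACE, if any
    death_cause: Optional[str]   # "cardiac", "noncardiac", or None if alive


def mace_outcome(r: Recipient, horizon_years: float):
    """(time, event) for the direct effect on MACE: non-fatal MACE or cardiac death
    is an event; non-cardiac death and end of follow-up are censored."""
    if r.mace_years is not None:
        time, event = r.mace_years, 1
    elif r.death_cause == "cardiac":
        time, event = r.followup_years, 1
    else:
        time, event = r.followup_years, 0
    if time > horizon_years:  # administrative censoring at the analysis horizon
        return horizon_years, 0
    return time, event


print(mace_outcome(Recipient(2.0, None, "noncardiac"), horizon_years=5.0))
```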
To generate the PS, variables deemed potentially related to both screening and MACE were identified and included in a logistic regression model. These comprised: age, sex, ethnicity, socioeconomic status, smoking status and history of ischaemic heart disease, diabetes, cerebrovascular disease, and peripheral vascular disease. Transplant centre was not included as it should not independently associate with MACE, would prevent us from capitalising on variation in practice to create groups of screened and unscreened patients, and could result in violation of the positivity assumption (
As the proportions of screened and non-screened individuals were roughly equal, PS matching was performed on a 1:1 basis without replacement using a caliper of 0.2 times the standard deviation of the logit of the propensity score. Matching was possible for 1760 individuals. The distribution of the PS before and after matching is shown in
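The matching step can be sketched as greedy 1:1 nearest-neighbour matching without replacement on the logit of the PS, with a caliper of 0.2 standard deviations. This is a minimal illustration under assumptions not stated in the text: treated individuals are processed in data order, and the standard deviation is taken over the whole sample. Software implementations differ on both points, and the scores below are hypothetical.

```python
import math
import statistics


def logit(p):
    return math.log(p / (1 - p))


def caliper_match(ps, treated, caliper_sd=0.2):
    """Greedy 1:1 nearest-neighbour matching without replacement on the logit PS.
    Returns a list of (treated_index, control_index) pairs."""
    lps = [logit(p) for p in ps]
    caliper = caliper_sd * statistics.stdev(lps)  # assumed: SD over the whole sample
    controls = {i for i, t in enumerate(treated) if t == 0}
    pairs = []
    for i, t in enumerate(treated):
        if t != 1 or not controls:
            continue
        j = min(controls, key=lambda c: abs(lps[c] - lps[i]))
        if abs(lps[j] - lps[i]) <= caliper:
            pairs.append((i, j))
            controls.remove(j)  # without replacement: each control used once
    return pairs


ps = [0.30, 0.32, 0.70, 0.90, 0.31, 0.69, 0.10]
treated = [1, 1, 1, 1, 0, 0, 0]
pairs = caliper_match(ps, treated)
print(pairs)
```

Here the second and fourth treated individuals stay unmatched because their nearest unused control lies outside the caliper, illustrating how matching "loses" individuals at the extremes of the PS distribution.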
Characteristics of screened and unscreened groups across the whole population and in propensity score matched and unmatched groups, followed by characteristics by centre screening use: low volume of screening (<25% of transplant patients screened;
Association between screening and posttransplant MACE at 90 days, 1 year and 5 years using propensity score matching, weighting and instrumental variable techniques.
MACE at 90 days post-transplant (14 events in PS matched group, 23 events in whole population)
Method and treatment effect  HR  95% CI  p value
PS match marginal  0.75  0.33–1.72  0.50
IPW marginal  0.93  0.45–1.89  0.83
IV marginal  2.91  0.82–10.33  0.10
PS match conditional  0.80  0.31–2.05  0.64
IPW conditional  0.95  0.44–2.05  0.90
IV conditional  1.37  0.29–6.55  0.69

MACE at 1 year post-transplant
Method and treatment effect  HR  95% CI  p value
PS match marginal  1.14  0.56–2.31  0.72
IPW marginal  1.30  0.77–2.20  0.33
IV marginal  4.18  1.79–9.76  0.001
PS match conditional  1.12  0.51–2.47  0.77
IPW conditional  1.28  0.72–2.26  0.40
IV conditional  1.85  0.65–5.29  0.25

MACE at 5 years post-transplant
Method and treatment effect  HR  95% CI  p value
PS match marginal  1.31  0.85–2.03  0.22
IPW marginal  1.39  0.94–2.06  0.10
IV marginal  3.19  2.09–4.87  <0.001
PS match conditional  1.31  0.86–1.99  0.20
IPW conditional  1.38  1.00–1.90  0.05
IV conditional  1.21  0.72–2.02  0.48
CI, confidence interval; HR, hazard ratio; IV, instrumental variable; PS, propensity score; IPW, inverse probability weighting. Conditional (multivariable) models include the variables used to estimate the propensity score in the outcome regression model.
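As a consistency check on the table, a p value can be approximately recovered from each hazard ratio and its 95% CI, assuming the interval was computed symmetrically on the log scale (a Wald interval). Because the reported values are rounded, the recovered p values are approximate.

```python
import math


def p_from_hr_ci(hr, lower, upper, z_crit=1.959964):
    """Approximate two-sided p value from a ratio estimate and its 95% CI,
    assuming the CI is symmetric on the log scale (Wald interval)."""
    se = (math.log(upper) - math.log(lower)) / (2 * z_crit)
    z = math.log(hr) / se
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))


p_ps90 = p_from_hr_ci(0.75, 0.33, 1.72)  # PS match marginal, 90 days (reported p = 0.50)
p_iv1y = p_from_hr_ci(4.18, 1.79, 9.76)  # IV marginal, 1 year (reported p = 0.001)
print(round(p_ps90, 3), round(p_iv1y, 4))
```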
For IPW, inverse probability of treatment weights were calculated. Weights were stabilised by multiplying them by the proportion of individuals who underwent screening in the exposed group, and proportion of individuals who did not undergo screening in the unexposed group (
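The stabilisation just described multiplies each treated individual's weight 1/PS by the overall proportion treated, and each untreated individual's weight 1/(1 − PS) by the overall proportion untreated. A minimal sketch with hypothetical values:

```python
def stabilised_weights(treated, ps):
    """Stabilised IPTW: P(T=1)/PS for treated individuals, P(T=0)/(1 - PS) for untreated."""
    p_treated = sum(treated) / len(treated)
    return [p_treated / p if t == 1 else (1 - p_treated) / (1 - p)
            for t, p in zip(treated, ps)]


treated = [1, 1, 0, 0]  # hypothetical: half the sample treated
ps = [0.8, 0.2, 0.8, 0.2]
sw = stabilised_weights(treated, ps)
print(sw)
```

With half the sample treated, every unstabilised weight is halved, which shrinks the extreme weights while leaving their relative sizes unchanged.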
In total 2502 individuals were examined in the IPW analysis; 70 individuals were excluded due to missing data in variables used to generate the PS. Cox regression models were performed incorporating the IPW (
It is important to note that these results represent a complete case analysis, as the 70 individuals with missing data were excluded. Complete case analyses assume data are missing completely at random, though other missing data mechanisms and their potential implications need to be considered (
Transplant centre is determined by geographical location so is largely randomly allocated. We determined centre had the potential to be an IV as it (at least partly) met the following assumptions (
(1) Relevance assumption: the likelihood of undergoing screening is associated with transplant centre (
(2) Exclusion restriction: this assumption cannot be guaranteed as there could be non-screening differences in centre-level practice that influence outcome, e.g., use of medical therapy, but this would not be expected given there is national guidance on cardiovascular risk management (
(3) Independence assumption: this assumption cannot be proven, as acknowledged in IV literature. Whilst it may be assumed that if measured confounders are balanced across IV groups, unmeasured confounders will be too, this is purely speculative.
(4) Homogeneity or monotonicity. Screening may not have a uniform effect on individuals, for example it could benefit those with high cardiovascular risk but not low risk patients, thus violating homogeneity. Monotonicity (no patients receiving the opposite treatment to what would be expected at any level of the instrument) may be more likely to hold as patients receive screening based on defined protocols at their transplant centre. This assumption however cannot be proven and defining the four compliance types (
Patient characteristics based on the prevalence of screening pre-transplant by centre. The Kruskal-Wallis test was used to examine continuous variables and the chi-square test for categorical variables.
Percentage of individuals screened by centre
Characteristic  <25% (4 centres)  25%–49% (5 centres)  50%–74% (6 centres)  ≥75% (3 centres)  p value
Median age (years)  50 (40–60)  50 (41–59)  52 (40–60)  52 (42–62)  0.22
Male sex (%)  58.8  61.5  63.6  58.2  0.17
White ethnicity (%)  64.7  78.6  72.9  86.3  <0.001
IMD quintile 1 (%)  27.1  28.0  23.0  13.6  <0.001
Diabetic nephropathy (%)  23.2  22.0  23.9  23.8  0.29
Diabetes (%)  14.2  12.5  14.4  10.2  0.12
Ischaemic heart disease (%)  6.3  6.2  8.8  7.7  0.20
Peripheral vascular disease (%)  2.6  2.0  2.9  2.0  0.56
Cerebrovascular disease (%)  2.6  4.0  5.4  4.8  0.09
Pre-emptive transplant (%)  20.9  20.9  24.1  20.7  0.34
Propensity score techniques
• Comparison of outcomes in recipients receiving a living versus standard criteria deceased donor kidney transplant (
• Comparison of outcomes in donation after brainstem death and donation after cardiac death donors in liver transplantation (
• Association between immunosuppression regime (triple or quadruple therapy) in heart transplant recipients and death and rejection episodes (
Instrumental variable techniques
• Association between dialysis duration and patient outcome following kidney transplantation, using blood group as an instrumental variable (
• Examining whether delayed graft function is associated with long-term outcomes after kidney transplantation using cold ischaemic time as an instrumental variable (
• Comparison of deceased and living organ donation rates in countries with opt-in and opt-out policies using legal system and non-health-based philanthropy as instrumental variables (
In the first stage, a linear regression containing potential confounders of the treatment-outcome relationship (deemed to be those used to create the PS) and transplant centre was used to predict the likelihood of an individual undergoing screening. Linear regression was selected for this analysis as opposed to logistic regression as described in IV literature (
The first stage generated a predicted value, representing the likelihood of each individual being screened. The F statistic was 70 and the partial R-squared value was 0.33, indicating centre was a strong IV.
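The F statistic and partial R-squared are linked by F = (partial R² / q) / ((1 − partial R²) / df), where q is the number of instrument terms and df the residual degrees of freedom of the first-stage model. The sketch below checks that the two reported values are mutually consistent under assumptions of our own: centre entered as 17 indicator terms for the 18 centres, and roughly 2470 residual degrees of freedom. The exact first-stage specification and sample size are not reported here.

```python
def first_stage_f(partial_r2, n_instrument_terms, df_residual):
    """F statistic for the excluded instrument terms, computed from their partial R-squared."""
    return (partial_r2 / n_instrument_terms) / ((1 - partial_r2) / df_residual)


# Assumptions: 18 centres entered as 17 indicator terms; about 2470 residual degrees of freedom
f = first_stage_f(0.33, 17, 2470)
print(round(f, 1))
```

Under these assumptions the formula gives an F statistic of about 72, in line with the reported value of 70.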
In the second stage, univariable and multivariable Cox regression models were fitted using the predicted value from the first stage (predictor substitution method). This step can be considered as including the proportion of patients screened by centre as a patient characteristic, rather than whether each individual was screened. The multivariable model included the same confounders used to create the PS, as these were deemed to potentially confound both the instrument-outcome and treatment-outcome relationships; including them therefore makes the independence assumption more likely to hold. Screening did not reduce MACE in the conditional model at 90 days (conditional HR 1.37, 95% CI 0.29–6.55), 1 year (conditional HR 1.85, 95% CI 0.65–5.29) or 5 years (conditional HR 1.21, 95% CI 0.72–2.02). These results reflect the LATE: the causal effect of screening on the ‘complier’ patients in the population.
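The reading of the predicted value as a centre-level screening proportion is exact in the simplest case. If the first-stage linear regression contained only centre indicators (an assumption for illustration; the worked example also included patient covariates, which make predicted values vary within a centre), each patient's fitted value equals the proportion of patients screened at their centre, as this sketch with hypothetical data shows.

```python
from collections import defaultdict


def centre_screening_proportion(centres, screened):
    """Fitted first-stage value when the only regressors are centre indicators:
    each patient is assigned the proportion screened at their centre."""
    totals, counts = defaultdict(int), defaultdict(int)
    for c, s in zip(centres, screened):
        totals[c] += s
        counts[c] += 1
    return [totals[c] / counts[c] for c in centres]


centres = ["A", "A", "A", "A", "B", "B", "B", "B"]  # hypothetical centres
screened = [1, 0, 0, 0, 1, 1, 1, 0]
predicted = centre_screening_proportion(centres, screened)
print(predicted)
```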
Results from PS matched, IPW and IV analyses are shown in
In the PS matched analysis, the results are only applicable to the 1760 transplant recipients with low-to-medium baseline risk of MACE, not the overall population. The 812 individuals excluded from the analysis were more likely to be male, of Asian ethnicity, have a history of cardiovascular disease and be of lower socioeconomic status, and thus have the greatest baseline cardiovascular risk. Whilst these results suggest no benefit to screening, this cannot be directly applied to these highest risk patients.
The IPW analysis includes all patients and represents the whole transplanted population. Findings at 90 days and 1 year were similar to those of the PS matched analysis. At 5 years, there was weak evidence that individuals who had undergone screening were more likely to experience MACE in the conditional model, but it should be noted that this analysis did not meet the proportional hazards assumption of the Cox model.
In the IV analysis, screening did not reduce MACE on conditional analyses, with a hazard ratio above 1 throughout, suggesting “complier” screened individuals had a higher risk of MACE than complier non-screened individuals, although confidence intervals were extremely wide. Given these results represent the LATE, it is not known whether the effect of screening on non-complier patients differs. Whilst the IV technique minimises unmeasured confounding, these results raise the possibility that unmeasured patient-level characteristics associate with centre and outcome (i.e., clinicians screen their patients as they see their population as being inherently higher risk), or there are unmeasured differences in centre-level practice, e.g., use of medical therapy, that could bias results. Alternatively, it is possible that the PS matched and IPW analyses are prone to bias due to unmeasured confounding, and the IV analysis provides a result that is closer to the truth. Some studies suggest IV techniques provide less biased results than PS analyses, (
The marginal hazard ratios presented in
Whilst the causal inference techniques applied to our worked example reduce confounding by indication, other forms of bias remain (
Propensity score and instrumental variable techniques reduce confounding in observational studies and are suited to areas where treatment decisions vary with clinician or facility preference. Whilst RCTs minimise confounding through the random allocation of treatment, results may not be generalisable if the individuals recruited to a trial are not representative of the population of interest, e.g., if individuals with less severe disease who are “lower risk” or with more severe disease who have “most to gain” are preferentially recruited. Population observational data allow all patients within clinical practice to be examined, but treatment effects from causal inference techniques may still not be applicable to the whole population due to limited overlap in confounder distributions between patient groups. Techniques deal with this issue in different ways. For example, in PS matching patients are excluded from analyses if a “suitable” match cannot be found. In IPW analyses, the presence of large weights can highlight instances where regression adjustment would result in the model being extrapolated to groups with little or no overlap in confounder distribution. Whilst large weights can make the ATE estimate unstable and result in wide confidence intervals, IPW techniques provide an “honest” reflection of the uncertainty in the estimate, which might be underestimated in regression adjustment. Causal effects from each technique therefore permit inferences on different populations, which is important when interpreting study results.
Our case study demonstrates how causal inference techniques can estimate the comparative effectiveness of interventions using observational data. However, they do not eliminate all forms of bias and may still not allow firm conclusions to be drawn. Differences in results may reflect the different populations to which the estimates apply, the presence of unmeasured confounding, or imperfections in the instrument. It is difficult to know which analysis provides the closest result to the “true” estimate, and results should be interpreted in the context of the limitations of each method.
Despite these challenges, the unique difficulties of performing RCTs in transplantation, combined with the increasing size and granularity of routine healthcare datasets, are likely to result in wider use of propensity score and instrumental variable techniques. Examples of transplantation studies using these techniques are shown in
AN performed the analyses, produced the figures and tables and wrote the manuscript under the supervision of RR, DT, and JF. JF contributed to study design, statistical analyses and manuscript preparation. NL contributed to statistical analyses and manuscript preparation. GO, RR, and DT contributed to study design and manuscript preparation.
AN, GO, RR, and DT received funding from the National Institute for Health Research (NIHR) under the Programme Grants for Applied Research scheme (RPPG010910116) for completion of the ATTOM study. This paper presents research from the Access to Transplantation and Transplant Outcome Measures (ATTOM) study which was funded by the National Institute for Health Research (NIHR).
The views expressed are those of the authors and not necessarily those of the NHS, the NIHR or the Department of Health.
JF received personal fees from Fresenius Medical Care and grants from Vifor Pharma and Novartis outside the submitted work. NL received personal fees from Pierre Fabre, Merck Sharp & Dohme, Vertex, Ferring, and Portola; and non-financial support from Amgen (provision of data to aid methodological research) outside the submitted work.
The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.
Many thanks to the ATTOM research team, the research nurses and to the patients in the study.
The Supplementary Material for this article can be found online at:
ATE, average treatment effect; ATT, average treatment effect on the treated; ATTOM, Access to Transplant and Transplant Outcome Measures Study; CI, confidence interval; HR, hazard ratio; IMD, index of multiple deprivation; IV, instrumental variable; IPW, inverse probability weighting (using propensity scores); IQR, interquartile range; LATE, local average treatment effect; MACE, major adverse cardiac event; PS, propensity score; RCT, randomised controlled trial; SD, standard deviation.
Propensity score: a value between 0 and 1 that summarises the probability of an individual receiving a treatment, given their measured covariates.
Propensity score matching: process through which individuals in treated and untreated groups are matched to each other based on their propensity score. This can be done on a 1:1 (1 patient in the untreated group matched to 1 treated individual) or many-to-one (many patients in the untreated group matched to 1 treated individual) basis.
Matching without replacement: once an individual from the untreated group has been matched, they cannot be used as a comparator for any further treated individuals.
Matching with replacement: an individual in the untreated group can be used as a match for more than 1 treated individual. Useful if the number of untreated individuals is small.
Nearest neighbour matching: matching process which pairs treated and untreated individuals based on their having the closest propensity scores, irrespective of whether the untreated individual is a better match for another treated individual.
Optimal matching: matching process which aims to minimise the difference in propensity scores between pairs across the whole population. May be preferred over nearest neighbour matching if the proportion of untreated individuals in the population is small.
Inverse probability weighting: technique which weights individuals based on their propensity score to create a pseudo-population with balanced measured covariates in treated and untreated groups.
Instrumental variable: a variable that is causally associated with the exposure, affects the outcome only through its association with that exposure, and shares no confounders with the outcome. Allows individuals to be compared according to their value of the instrument, minimising the risk of unmeasured confounding.
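The matching definitions above can be sketched in a few lines. This is a minimal greedy nearest neighbour implementation on precomputed propensity scores, with two options:

- an optional caliper, i.e., a maximum allowed difference in propensity score (an assumption added here to make “no suitable match” concrete);
- a switch for matching with or without replacement.

The function name and toy scores are hypothetical. Optimal matching is not shown; it would instead solve an assignment problem across all pairs, e.g., with `scipy.optimize.linear_sum_assignment`.

```python
import numpy as np


def nearest_neighbour_match(ps_treated, ps_untreated, caliper=None, replacement=False):
    """Greedy 1:1 nearest neighbour matching on the propensity score.

    Returns (treated_index, untreated_index) pairs. Treated individuals with no
    available untreated individual within the caliper are left unmatched.
    """
    ps_untreated = np.asarray(ps_untreated, dtype=float)
    available = np.ones(len(ps_untreated), dtype=bool)
    pairs = []
    for i, p in enumerate(ps_treated):          # greedy: treated processed in given order
        if not available.any():
            break                               # no comparators left to use
        dist = np.abs(ps_untreated - p)
        dist[~available] = np.inf               # already-used comparators are off limits
        j = int(np.argmin(dist))
        if caliper is not None and dist[j] > caliper:
            continue                            # no suitable match: excluded from analysis
        pairs.append((i, j))
        if not replacement:
            available[j] = False                # without replacement: each comparator used once
    return pairs


ps_treated = [0.30, 0.32, 0.90]
ps_untreated = [0.31, 0.50, 0.10]
print(nearest_neighbour_match(ps_treated, ps_untreated, caliper=0.1))
# [(0, 0)]
print(nearest_neighbour_match(ps_treated, ps_untreated, caliper=0.1, replacement=True))
# [(0, 0), (1, 0)]
```

In the toy example, the second treated individual (PS 0.32) loses its close match when untreated individuals can be used only once, so it is excluded. Allowing replacement retains it at the cost of reusing one comparator. The third treated individual (PS 0.90) is excluded either way because no untreated individual lies within the caliper.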