Lightbox Design

Following on from the background explanation of the difference between Lux, Lumens and other Output Measures, the next question is how to measure these things.

A few key points to keep in mind:

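  • Lux is the measurement of the maximum output from the brightest part of the beam (the center). Strictly speaking, lux is the illuminance at the meter; multiplying it by the square of the measurement distance (in meters) gives the peak beam intensity in candelas. Lux is directly affected by the focusing optics and reflector design.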
  • Lumens is the measurement of the total, overall output as typically measured in an integrating sphere. So, all of the light is measured no matter where it goes.
  • Depending on the optics, two lights could have identical overall outputs (lumens), but widely different center beam Lux.

Measuring Lux is easy – all you need is a properly calibrated light meter, with readings taken at an appropriate distance to ensure the beam is fully focused (see my Testing Standards page for more info). For many years now, I have had a NIST-certified light meter that I use for this purpose.

Lumens are another matter altogether.

Given that a properly-calibrated integrating sphere will easily run you >$30,000 today, true lumen measures are beyond the reach of your typical enthusiast. It is certainly possible to try and build a reasonable IS facsimile on the cheap – for example, one of the DIY coated hollow styrofoam sphere designs you can find on the internet. You could then try to calibrate it by shipping some of your lights out to be professionally tested, and see if you can then develop your own calibration standard by comparison. And if I were starting from scratch today, that’s exactly what I would do.

But I’m not starting from scratch. Instead, I first designed my lightbox based on the example of one of the leading flashlight reviewers when I got started: Doug P. (aka Quickbeam on CPF), whose review site is now defunct. He came up with a milk carton lightbox that actually served pretty well for quick-and-dirty relative output measures. As an aside, I recall Doug took a lot of complaints from IS purists about this approach – a point I’ll come back to at the end.

I took Doug’s idea a step further (and even a step cheaper). For example, I didn’t have a calibrated Lux light meter at that time – but I did have a decent data-logging digital multimeter. So I created my own light meter. I bought a dozen uncalibrated CdS photoresistive cells at Radio Shack for $5, and then picked the one that showed a linear relationship between resistance and light intensity over the widest dynamic range. Using some left-over bits of speaker cable, I wired it to my data-logging DMM and Bob’s your uncle – one relative output measuring lightbox at your service. I didn’t have an explicit calibration standard at that time (I developed one later, see below), so I just used a linear correction factor that gave me relative output values that fairly closely matched Doug’s readings.

One advantage is that my lightbox design is permanently mounted (i.e., the sensor never moves). I also maintain an ongoing internal calibration of the sensor using a standardized set of lights, tested at regular intervals. Fun fact – the sensor actually drifts out of alignment at a slow linear rate that I can easily correct for. From inception, I can confirm that there is ~1% drift per year, which I correct for on a monthly basis. So the relative output values I reported over the years were entirely consistent.
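To give a feel for what a linear drift correction like this can look like, here is a minimal Python sketch. Only the ~1% per year rate and the monthly schedule come from the description above. The function name, the direction of the drift (I assume the sensor reads low over time) and the exact form of the correction are assumptions made purely for illustration.

```python
def drift_corrected(raw_reading, months_since_baseline, drift_per_year=0.01):
    """Undo a slow linear sensor drift (illustrative sketch only).

    Assumes the sensor response falls off linearly, so that a reading
    taken t years after the baseline is low by drift_per_year * t.
    Pass a negative drift_per_year if the sensor reads high instead.
    """
    years = months_since_baseline / 12.0
    # Fraction of the baseline response that the sensor still has
    response = 1.0 - drift_per_year * years
    if response <= 0:
        raise ValueError("drift model is not valid this far from the baseline")
    return raw_reading / response


# A raw reading of 99.0 taken 12 months after the baseline is scaled
# back up by roughly 1%.
corrected = drift_corrected(99.0, 12)
```

In practice the correction would be anchored to the standardized set of reference lights rather than to a fixed rate, so the rate here is best treated as an input that gets re-checked at every calibration run.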

My lightbox also differs in one important way – I’ve reversed the light and sensor placement, to facilitate runtimes. In my case, the flashlight enters the flat bottom of the milk carton, and the sensor is located on the side of the carton near the base. The bezel of the flashlight thus serves as its own baffle, preventing any light from shining directly on the sensor.

As Doug noted, a milk carton is hardly a perfect integrating sphere. But it doesn’t have to be – as long as you realize the results are simply relative output values, you can still draw meaningful comparisons between lights. The point is that the internal standard for my lightbox (and all relative output runtime graphs) was tightly controlled and monitored so that results were directly comparable. If I were to run those lights in my lightbox again (with the current updated calibration), the graphs would be pretty much indistinguishable (I know, because I did this periodically to confirm the calibration).

As to the actual lumen estimates, that is a different matter – I have never made any claim to their absolute value accuracy. The method I have used to adjust my internally-consistent calibrated lightbox values to estimated lumens (and note that I always refer to them as such) is based on a statistical relationship developed from an extensive series of comparisons, described in detail further down this page.

The point is that the relative value accuracy of my measures remains remarkably high. So, for example, if I estimate one light at 300 lumens and another at 360 lumens, you can feel fairly comfortable with the conclusion that the second light is indeed about 20% brighter. But whether or not that is really 250 and 300 lumens (or 350 and 420 lumens, etc, etc.) I cannot say with any certainty. For that, I am relying on all the results of the ~150 or so comparison points in the analysis below.

Frankly, there is no way anyone without a properly maintained, properly-sized, NIST-certified, calibrated integrating sphere – used under controlled conditions by a knowledgeable and skilled operator – can assert true, absolutely accurate lumens. However, as you will see in the analysis below, I think I have gone to more effort than most in trying to make my estimates as good as they can be. That said, I typically find that my estimated lumens are a bit higher than most other enthusiasts’ estimates, suggesting my calibration standard is biased slightly upwards. But I have stuck with it due to the compelling correlation, and the importance of backward consistency to previous reviews.

In any case, I still make no claim to lumen estimate accuracy. But the runtime graphs remain a well-calibrated and internally-consistent relative set of results from my testing, using only new batteries properly examined for relative performance.

How to calculate Estimated Lumen values from my lightbox relative output values

As mentioned above, back in 2010 I decided to see if I could convert my relative output measures into estimated lumens, using a correlation to integrating sphere (IS) measures reported by others.

Below is a graph showing how lights in my lightbox correlated to the reported IS values by three CPF members (MrGman, ti-force, and bigchelis) and three manufacturers (Fenix, 4Sevens and Novatac) at the time. Each data point represented one output mode of a given light we had in common. I had matched the reported batteries and time post-activation for the lights in question (if multiple time points were available, I picked the last one we both had in common). There are about 150 unique data points in the total set, representing over 40 lights.


If my lightbox were a calibrated IS, you would expect to see a perfectly linear relationship as shown by the diagonal line. Obviously, it wasn’t.

But the relationship does look somewhat linear, just not 1:1. This is why some people used a rough scaling factor for their milk carton lightboxes. But is it really linear? Let’s improve the scale ranges and see:


Ok, clearly there was not a simple linear relationship of my lightbox to any of the IS results (although it isn’t that far off, either). Still, this means that you cannot simply multiply my readings by a specific number to get a really good lumen estimate (i.e. the classic “y = mx + b” linear relationship with a slope and y-intercept wouldn’t hold here).

But even though the relationship between my box and the reported IS values was not linear, it was most certainly not random. Rather, it was consistently curvilinear. In fact, to my eye, it looked like a simple power relationship (i.e. y = a * x^b).

Before the age of computers, it was certainly a complex problem to try and fit non-linear data. But nowadays, you can do all sorts of comparison modeling of non-linear systems with statistical validation.

I spent some time running analyses of this data set, including multi-order polynomials, and I couldn’t quite get one good relationship that fit the whole range of 0.1 to 800 lumens perfectly. However, I did find two simple power fits that worked well – one for < 20 on my lightbox relative output scale, and one for > 25. Here is how they look:



As you can see, these non-linear curves fit the data set very well. Correlation coefficients (r²) were rather good at 0.96 to 0.97. In fact, the fits were remarkably good considering that we are talking about different light samples, run in 6 different ISs!
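For readers who want to try this kind of fit on their own paired readings, here is a minimal Python sketch of one common way to fit y = a * x^b: take logs of both axes and fit a straight line. This is not necessarily the procedure used for the fits above, the r² it reports is computed in log-log space (so it may differ from an r² computed another way), and the sample points are made up and noise-free purely to show that the function recovers known values.

```python
import math


def fit_power_law(xs, ys):
    """Least-squares fit of y = a * x**b using a straight line in log-log space."""
    lx = [math.log(x) for x in xs]
    ly = [math.log(y) for y in ys]
    n = len(lx)
    mean_x = sum(lx) / n
    mean_y = sum(ly) / n

    # Ordinary least squares: the slope is b, the intercept is ln(a)
    sxx = sum((x - mean_x) ** 2 for x in lx)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(lx, ly))
    b = sxy / sxx
    ln_a = mean_y - b * mean_x

    # Coefficient of determination of the straight-line fit (log-log space)
    ss_tot = sum((y - mean_y) ** 2 for y in ly)
    ss_res = sum((y - (ln_a + b * x)) ** 2 for x, y in zip(lx, ly))
    r_squared = 1.0 - ss_res / ss_tot
    return math.exp(ln_a), b, r_squared


# Hypothetical data generated from y = 0.5 * x**1.4 (NOT the real data set)
rovs = [30.0, 60.0, 120.0, 240.0]
sphere_lumens = [0.5 * r ** 1.4 for r in rovs]
a, b, r2 = fit_power_law(rovs, sphere_lumens)
```

With real, noisy data the recovered `a` and `b` will only approximate the underlying relationship, and fitting separate ranges (as was done here for low and high readings) simply means calling the function once per range.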

As a result, I felt that you could be fairly confident in converting my lightbox relative output values (ROVs) into estimated lumens using the following formulas:

For lightbox readings < 20 ROV, Estimated Lumens = 0.56 * ROV^1.30

For lightbox readings > 25 ROV, Estimated Lumens = 0.28 * ROV^1.48

For values in-between 20 and 25 ROV, I recommended averaging the two methods.
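The three rules above can be written as a short Python function. The coefficients are the ones given in the formulas. The function name is hypothetical, and treating readings of exactly 20 or exactly 25 as part of the in-between range is my own assumption, since the text does not say which side they fall on.

```python
def estimated_lumens(rov):
    """Convert a lightbox relative output value (ROV) to estimated lumens."""
    if rov < 0:
        raise ValueError("ROV cannot be negative")
    low_fit = 0.56 * rov ** 1.30   # power fit for readings below 20 ROV
    high_fit = 0.28 * rov ** 1.48  # power fit for readings above 25 ROV
    if rov < 20:
        return low_fit
    if rov > 25:
        return high_fit
    # 20 to 25 ROV (boundaries included by assumption): average the two fits
    return (low_fit + high_fit) / 2.0


low_mode = estimated_lumens(10)    # roughly 11 estimated lumens
high_mode = estimated_lumens(100)  # roughly 255 estimated lumens
```

Note that the two fits do not meet exactly at the edges of the 20 to 25 range, so small steps in the converted values around those readings are expected.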

For those of you who have trouble visualizing non-linear data, below are some direct plots of my transformed lumen estimates against the original 6 different IS sources. If the conversions had worked well, the results should be perfectly linear.


My lumen estimates thus seem to correlate to the reported IS measures pretty linearly, wouldn’t you say?

Please note, there are a few caveats here, which I noted at the time of this analysis. First off, I didn’t have a lot of comparison data at the high-output range (i.e. >300 lumens). Thus, I recommended lumen estimates at the high end should be regarded with some degree of skepticism.

Another key point – a milk carton is not really a good integrating sphere! Not surprisingly, I had noticed in my lightbox that really strong throwers with narrow spillbeams typically reported lower values than similarly driven lights with wider spillbeams. Frankly, I was surprised my values correlated so well with so many other sources. This suggested to me that all lightboxes (including true ISs) have some difficulty in accurately integrating dedicated thrower lights. My point was that all attempts to compare overall output in lights with widely different beam patterns need to be considered carefully (i.e., there may be systematic biases in all measures).

But at the end of the day, I thought the analysis and correlation results told a pretty compelling story – especially since they consisted of the multiple output levels of over 40 lights taken from 6 different sources. You could thus feel fairly confident in converting my lightbox readings to something approximating lumens.

As an aside, a pet peeve of mine has always been the difference between precision and accuracy. Technically speaking, accuracy refers to the degree of conformity of a measure to the actual, true value. Precision is simply the degree of refinement with which a measure can be taken or stated.

In this case, I know my lightbox is precise to 3-4 significant figures (i.e. I can reliably get a value to that stated precision, on repeated testing, assuming identical placement in the box). Of course, that says nothing about how accurate the result is! My “feel” for the range of lumen estimates out there tells me that we really shouldn’t be reporting anything more than 2 sig figs for the output measures. Beyond that, I rather doubt any given IS is really all that accurate to the “true” lumen output.

Of course, scientific notation doesn’t work well for text, so what I’ve always done is round my estimates to the nearest 2 sig figs (e.g. 140, 210, 16, etc.), allowing a half-step (+/- 0.5) in the second figure for the larger values >1000 (e.g., 2550, 1300, 1450, etc.). That was really as far as I felt comfortable pushing my results.
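As a rough sketch of this rounding convention in Python: my reading of the examples (2550, 1300, 1450) is that values above 1000 are rounded to the nearest 50, while everything else goes to 2 significant figures. That interpretation and the function name are assumptions, not a stated rule.

```python
import math


def round_estimate(lumens):
    """Round a lumen estimate the way the examples above suggest."""
    if lumens <= 0:
        return 0.0
    if lumens > 1000:
        # Assumed reading of the "+/- 0.5" rule: nearest 50 lumens
        return float(round(lumens / 50.0) * 50)
    # Otherwise keep 2 significant figures
    exponent = math.floor(math.log10(lumens))
    return float(round(lumens, 1 - exponent))


examples = [round_estimate(v) for v in (143.2, 16.4, 2540, 1312)]
# -> [140.0, 16.0, 2550.0, 1300.0]
```

Python's built-in `round` resolves exact ties to the nearest even value, which only matters for readings that sit precisely on a rounding boundary.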

How well does my calibration hold up for modern >800 lumen flashlights? Then and now.

Even during the time of my original analysis, I had an issue with higher output lights. At the time, most of these high-power lights were actually too big to fit into my lightbox. Many of them had large multi-emitter heads, or massive reflectors for more throw.

So I had to find an alternative to sticking them in my lightbox. Since the original goal of a milk carton lightbox was to simulate a ceiling bounce measure in a small room/closet, that is exactly what I went back to for output measures for high-powered lights.

Back in 2012, I went and performed a comparable analysis of ceiling bounce-to-lumen correlations for a number of high output lights in a small powder room in my house with no windows, in much the same way as I did for the lightbox-to-lumen conversion. The results allowed me to fairly accurately provide estimated lumens for high output lights.

Indeed, I was actually surprised at how well this conversion continued to be consistent for new lights with properly-tested lumen measures (i.e., my method continued to appear to be accurate, within a low margin of error).

This is when I first started presenting graphs of high-output lights directly in estimated lumens, instead of ROVs (since I now had two different sets of ROVs, one for the lightbox and one for the room). But since I had always done ceiling bounce measures of outputs for all my lights, this gave me a large database to correlate lightbox readings to my ceiling bounce setup. So, I could easily represent my high-output lights on the same relative lightbox output scale, by converting backwards from the room lumen estimates. This was convenient when wanting to compare the output/runtime performance to older lights.
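One simple way to convert backwards from lumens to the lightbox scale is to invert the high-range power formula given earlier (Estimated Lumens = 0.28 * ROV^1.48). The Python sketch below does just that. Whether the backward conversion was done exactly this way is an assumption on my part, and the function name is hypothetical. It only makes sense for values on the > 25 ROV branch, which works out to roughly 33 estimated lumens and up.

```python
def rov_from_lumens(lumens, a=0.28, b=1.48):
    """Invert lumens = a * ROV**b to recover the equivalent lightbox ROV."""
    if lumens <= 0:
        raise ValueError("lumens must be positive")
    return (lumens / a) ** (1.0 / b)


# Equivalent lightbox reading for a light estimated at 1000 lumens
equivalent_rov = rov_from_lumens(1000)
```

Feeding the result back through the forward formula should return the starting lumen value, which is a handy sanity check when plotting old and new lights on the same scale.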

As an aside, if you want to follow the discussion thread of this method on CPF at the time, please see my original lumen estimation thread there.

Flash-forward to today (2023) – now, you can easily get small lights that fit into my lightbox and produce many thousands of lumens. So how does my original power relationship calibration hold up for these directly measured higher output lights?

Not surprisingly, it doesn’t – at least, not at higher outputs. Up to ~700-800 lumens or so, the main lumen estimation power calibration described above still seems to work very well. But fear not – using a similar methodology as before, I tested a series of new high-output lights from makers with IS-verified lumens, to see if I could find a new calibration standard. And I did. It turns out, there’s a simple adjustment factor that I can apply to the lightbox’s calibrated photoresistive measures between ~800 and ~5000 lumens that results in consistently comparable lumen estimates to published specs for those lights using the exact same previous <800 lumen power relationship.

So we are back in business: I can accurately report relative estimated lumens for the runtimes on all the new lights I’m testing. 🙂 You can thus feel confident that the lumen estimates I provide today are very comparable to what I provided in my heyday of reviewing back in ~2008-2015 (i.e., the estimates are backward compatible).

But just like before, I make no claim that these are accurate lumen estimates. They are simply consistent ones – both internally consistent among the new models, and backwards consistent with my old reviews. As mentioned above, I feel my lumen estimates are on the high side, but I don’t see a point in trying to start over from scratch at this stage with a whole new setup.  Best just to continue with an updated calibration standard, so all my new reviews and older reviews remain consistent with one another.

That said, a comparison to other recent reviewers suggests that my lightbox calibration is ~20% higher on average. So you might want to take that into account if you want to compare to other reviewers.
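If you want to put one of my estimates on roughly the same footing as other reviewers' numbers, the arithmetic is a single division. The Python sketch below assumes the ~20% figure is a flat average offset that applies equally at every output level, which is a simplification, and the function name is hypothetical.

```python
def comparable_to_other_reviewers(my_estimated_lumens, my_offset=0.20):
    """Scale an estimate down by the average amount my calibration reads high."""
    return my_estimated_lumens / (1.0 + my_offset)


# An estimate of 1200 lumens here corresponds to about 1000 lumens elsewhere
adjusted = comparable_to_other_reviewers(1200)
```

Dividing by 1.2 (rather than subtracting 20%) is the consistent choice, since "20% higher" means my value is about 1.2 times the other one.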

I hope you find the ongoing analyses useful. If you don’t, well, perhaps you can relate to this perspective: like Doug P., I too was criticized back in the day by some output purists for my lightbox approach (although they couldn’t argue the math). But hey, if a jury-rigged solution for turning a square into a circle was apparently good enough to get the Apollo 13 astronauts back home safely, I think it is ok for a few runtime graphs. 🙂

Since Doug’s site is long gone, I figure it is only fitting that I repeat here his final words, that I long ago captured from his lightbox design page:

On a side note: If anyone decides to aggressively demonstrate their scientific prowess by sending a nasty e-mail stating that this is the most poorly conceived, horribly executed, and grossly inaccurate thing they have ever seen, please consider the following: All such responses will be ignored unless accompanied by a cashiers check sent by US mail for $20,000 to cover the cost of an integrating sphere and the associated equipment. Thank you! 🙂